A Quick Guide to momspider

This is a quick introduction to momspider. More detailed information is also available.

Momspider is a web robot that checks the validity and freshness of links. It traverses a tree or site from a given starting point and checks that each referenced document is accessible. For documents on other sites, it normally checks only that they exist and does not traverse them.
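The difference between traversing a document and merely testing it can be sketched in a few lines of Python. This is only an illustration of the rule as described in this guide, not momspider's own code. The function name classify_link, its parameters, and the example.org URLs are all made up for the example, and matching an Exclude prefix (see Important Directives below) at the start of the URL is an assumption.

```python
def classify_link(url, top_url, exclude_prefixes=()):
    """Return "traverse" or "test" for a link found during a Tree traversal.

    Hypothetical helper: the names and the exact matching rules are
    assumptions made for illustration, not momspider internals.
    """
    # Links that do not start with the top URL are only checked for existence.
    if not url.startswith(top_url):
        return "test"
    # Assumption: an excluded prefix is matched at the start of the URL.
    if any(url.startswith(prefix) for prefix in exclude_prefixes):
        return "test"
    # Everything else inside the tree is fetched and its links are followed.
    return "traverse"


if __name__ == "__main__":
    top = "http://www.example.org/docs/"
    excluded = ["http://www.example.org/docs/archive/"]
    for link in (
        "http://www.example.org/docs/guide.html",
        "http://www.example.org/docs/archive/old.html",
        "http://other.example.com/page.html",
    ):
        print(link, "->", classify_link(link, top, excluded))
```

In this sketch only the first link would be traversed; the other two would be tested but not followed.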

Because momspider can use large amounts of memory and network bandwidth, please use it with care. Also, since it is more efficient to run it once with multiple trees than multiple times with one tree each, we will run it from a cron job. Once you have tested your instruction file and fixed the problems it uncovered, send a message to webmaster@fortnet.org with your instruction file enclosed and we will add it to the main instruction file.

Run momspider with:   momspider [ -i instruct.txt ]

Important Directives

Listed below are the directives you are most likely to use; there are others. If you copy the sample instruction file and change only the directives you need, you should be fine.
Name infostructure_name
Specifies the infostructure name. This is used both to identify the infostructure in generated messages and as the owner name for Owner traversals. The name is required for all tasks and must be a single word (no whitespace).
TopURL URL
Specifies the URL of the top of the infostructure to be traversed. In a Tree traversal, momspider only traverses documents whose URLs start with this URL, although it still tests links that point elsewhere. The top URL is required for all tasks and must be a single word (no whitespace).
IndexURL URL
Specifies the URL of the HTML index file that will be produced for this task. This directive is required and the URL must be in absolute form.
IndexFile pathname.html
Specifies the pathname of the actual file for the HTML index. This directive is required and must specify a valid pathname. If the file already exists, it will be renamed pathname.old.html and a link to it will be included in the new index.
IndexTitle string
Specifies the character string to use as the HTML index title and also the subject line of any e-mail message. This directive is optional. If not present, the title will be "MOMspider Index for Name" where Name is the infostructure name.
EmailAddress email_addresses
Specifies the e-mail addresses to which an automatically generated message should be sent if one or more of the other Email directives below applies to any of the URLs tested during this task. This directive is optional only if no other Email directives are given. The format should be exactly the same as that given to the "To:" header when sending normal e-mail messages.
EmailBroken
Specifies that an e-mail message should be generated if any of the tested links in this task are found to be broken. This directive is optional and, if present, requires that EmailAddress also be given.
EmailRedirected
Specifies that an e-mail message should be generated if any of the tested links in this task are found to be redirected. This directive is optional and, if present, requires that EmailAddress also be given.
Exclude URLprefix
Specifies that the given URLprefix should be added to the Leaf Table, so that any URL encountered during this task's traversal that contains the given prefix is only tested and not traversed. Multiple Exclude directives can be specified for any task. The IndexURL is automatically excluded at the beginning of every task.
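
Before sending in an instruction file, it may help to check a task against the rules listed above. The following Python sketch does that for a task held in a dictionary. It is not part of momspider and does not read the instruction file format. The names check_task and index_title, the dictionary layout, and the example values are hypothetical, and treating "absolute" as "has a scheme and a host" is an assumption.

```python
from urllib.parse import urlparse

# Email directives covered in this guide that require EmailAddress.
EMAIL_TRIGGERS = ("EmailBroken", "EmailRedirected")


def check_task(task):
    """Return a list of problems found in one task's directives.

    `task` maps directive names to values; directives without a value,
    such as EmailBroken, map to True. (Hypothetical layout.)
    """
    problems = []
    # Name and TopURL are required and must not contain whitespace.
    for key in ("Name", "TopURL"):
        value = task.get(key, "")
        if not value:
            problems.append(f"{key} is required")
        elif any(ch.isspace() for ch in value):
            problems.append(f"{key} must be a single word (no whitespace)")
    # IndexURL is required and must be absolute.
    index_url = task.get("IndexURL", "")
    if not index_url:
        problems.append("IndexURL is required")
    else:
        parts = urlparse(index_url)
        # Assumption: "absolute" means both a scheme and a host are present.
        if not (parts.scheme and parts.netloc):
            problems.append("IndexURL must be an absolute URL")
    # IndexFile is required.
    if not task.get("IndexFile"):
        problems.append("IndexFile is required")
    # The Email trigger directives need an address to send to.
    if not task.get("EmailAddress"):
        for flag in EMAIL_TRIGGERS:
            if task.get(flag):
                problems.append(f"{flag} requires EmailAddress")
    return problems


def index_title(task):
    """Return IndexTitle, or the default title built from Name."""
    return task.get("IndexTitle") or f"MOMspider Index for {task['Name']}"


if __name__ == "__main__":
    task = {
        "Name": "Example",
        "TopURL": "http://www.example.org/docs/",
        "IndexURL": "http://www.example.org/admin/index.html",
        "IndexFile": "/usr/local/www/admin/index.html",
        "EmailAddress": "webmaster@example.org",
        "EmailBroken": True,
    }
    print(check_task(task))
    print(index_title(task))
```

An empty list from check_task means the task satisfies the rules covered here; it says nothing about directives this guide does not list.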


If you learn of a better way to do or explain any of this, please send me mail.