A Quick Guide to momspider
This is a quick introduction to momspider. More detailed
information is also available.
Momspider is a web robot that checks the validity and freshness of
links. It traverses a tree or site from a given starting point and
checks that each document referenced is accessible. Documents on
other sites are normally only checked for existence, not traversed.
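To make the traverse-versus-test distinction concrete, here is a rough Python sketch of the decision as this guide describes it. It is not momspider's own code. The function name classify_link and the example.org URLs are made up for illustration, and I am assuming that "starts with" matching is what decides whether a link belongs to the tree or to an excluded prefix.

```python
def classify_link(url, top_url, exclude_prefixes=()):
    """Return 'traverse' or 'test-only' for a link found in a Tree traversal.

    Hypothetical helper: a sketch of the rule described in this guide,
    not momspider's actual implementation.
    """
    # Excluded prefixes are treated as leaves: checked, never followed.
    if any(url.startswith(prefix) for prefix in exclude_prefixes):
        return "test-only"
    # Documents under the top URL belong to the tree and are followed.
    if url.startswith(top_url):
        return "traverse"
    # Everything else (for example, other sites) is only checked.
    return "test-only"


if __name__ == "__main__":
    top = "http://www.example.org/docs/"          # made-up top URL
    excluded = ["http://www.example.org/docs/archive/"]
    for link in (
        "http://www.example.org/docs/intro.html",
        "http://www.example.org/docs/archive/1994.html",
        "http://elsewhere.example.com/page.html",
    ):
        print(link, "->", classify_link(link, top, excluded))
```

Either way the link itself is still checked; the only question is whether the robot goes on to follow the links inside that document.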
Momspider can use large amounts of memory and network bandwidth, so
please use it with care. Because it is more efficient to run it once
with multiple trees than several times with one tree each, we will
be running it from a cron job. Once you have tested your
instruction file and fixed the problems it uncovered, send a message
to webmaster@fortnet.org with your instruction file enclosed and we
will add it to the main instruction file.
Run momspider with: momspider [ -i instruct.txt ]
Important Directives
I have listed the directives you are most likely to use. There are
more.
If you copy the sample instruction file and change whichever of
these you need, you should be OK.
- Name infostructure_name
- Specifies the infostructure name. This is used both to identify
the infostructure in generated messages and also as the owner name
for Owner traversals. The name is required for all tasks and
must be a single word (no whitespace).
- TopURL URL
- Specifies the URL of the top of the infostructure
to be traversed. In a Tree traversal, momspider will only
traverse documents whose URLs start with this URL.
Links to other locations will still be tested.
The top URL is required for all tasks
and must be a single word (no whitespace).
- IndexURL URL
- Specifies the URL of the HTML index file that will be
produced for this task. This directive is required and the URL
must be in absolute form.
- IndexFile pathname.html
- Specifies the pathname of the actual file for the HTML index.
This directive is required and must specify a valid pathname.
If the file already exists, it will be renamed
pathname.old.html
and a link to it will be included in the new index.
- IndexTitle string
- Specifies the character string to use as the HTML index title and
also the subject line of any e-mail message. This directive
is optional.
If not present, the title will be "MOMspider Index for Name"
where Name is the infostructure name.
- EmailAddress email_addresses
- Specifies the e-mail addresses to which an automatically generated
message should be sent if one or more of the other Email directives
below applies to any of the URLs tested during this task. This
directive may be omitted only if no other Email directives are given.
The format should be exactly the same as that given to the "To:"
header when sending normal e-mail messages.
- EmailBroken
- Specifies that an e-mail message should be generated if any of the
tested links in this task are found to be broken. This directive
is optional and, if present, requires that EmailAddress also
be given.
- EmailRedirected
- Specifies that an e-mail message should be generated if any of the
tested links in this task are found to be redirected. This directive
is optional and, if present, requires that EmailAddress also
be given.
- Exclude URLprefix
- Specifies that the given URLprefix should be added to the
Leaf Table, so that all URLs encountered
during this task's traversal which contain the given prefix are only
tested and not traversed. Multiple Exclude directives can
be specified for any task. The IndexURL is automatically
excluded at the beginning of every task.
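Before sending in an instruction file, it can help to check your settings against the rules above. The following is a small Python sketch of such a check. It is not part of momspider. The function names, the dict layout and the example.org values are all made up for illustration. The output of directive_lines follows the "keyword value" form used in the list above, which is an assumption. This guide does not cover how a task is wrapped in the real instruction file, so copy that part from the sample instruction file.

```python
from urllib.parse import urlparse

REQUIRED = ("Name", "TopURL", "IndexURL", "IndexFile")
EMAIL_FLAGS = ("EmailBroken", "EmailRedirected")


def check_task(task):
    """Return a list of problems in a dict of directive -> value.

    Hypothetical helper: it checks only the rules stated in this guide.
    """
    problems = []
    for key in REQUIRED:
        if not task.get(key):
            problems.append(f"{key} is required")
    for key in ("Name", "TopURL"):
        value = task.get(key, "")
        if value and len(value.split()) != 1:
            problems.append(f"{key} must be a single word (no whitespace)")
    index_url = task.get("IndexURL", "")
    if index_url:
        parts = urlparse(index_url)
        # An absolute URL has both a scheme and a host.
        if not (parts.scheme and parts.netloc):
            problems.append("IndexURL must be an absolute URL")
    wants_mail = [flag for flag in EMAIL_FLAGS if task.get(flag)]
    if wants_mail and not task.get("EmailAddress"):
        problems.append(f"{', '.join(wants_mail)} requires EmailAddress")
    return problems


def directive_lines(task):
    """Render the dict as 'Keyword value' lines (assumed layout)."""
    lines = []
    for key, value in task.items():
        if key == "Exclude":
            # Multiple Exclude directives are allowed, one per prefix.
            lines.extend(f"Exclude {prefix}" for prefix in value)
        elif value is True:
            lines.append(key)  # flag directives take no value
        elif value:
            lines.append(f"{key} {value}")
    return lines


if __name__ == "__main__":
    task = {  # all values below are invented examples
        "Name": "Example",
        "TopURL": "http://www.example.org/",
        "IndexURL": "http://www.example.org/mom/index.html",
        "IndexFile": "/var/www/mom/index.html",
        "EmailAddress": "webmaster@example.org",
        "EmailBroken": True,
        "Exclude": ["http://www.example.org/cgi-bin/"],
    }
    print(check_task(task) or "no problems found")
    print("\n".join(directive_lines(task)))
```

An empty list from check_task only means that the rules listed in this guide are satisfied. Momspider itself may complain about things this sketch knows nothing about.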
If you learn of a better way to do or explain any of this, please send
me mail.