This section describes the formats of the various LinkScan configuration files:
linkscan.sys
Purpose: The primary LinkScan system configuration file
Location: The LinkScan directory only
Required: Always
Applies to: All configured Projects
Inheritance: Not applicable
LicenseNumber: Your 10-digit License Number provided
with your License Key.
[Example: LicenseNumber = 2000000000 ]
Licensee: Your name (or company name) provided with
your License Key. This field is case- and whitespace-sensitive.
[Example: Licensee = Electronic Software Publishing ]
Key: Your License Key.
[Example: Key = 1234:12345:12345:1234 ]
Linkscandir: The absolute pathname to the directory
in which you installed the LinkScan files. See
Pathnames for a definition of the
correct pathname formats.
[Example: Linkscandir = /home/elsop/public_html/linkscan/ ]
Linkscanurl: The absolute URL to the directory in which
you installed the LinkScan files. See
URL's for a definition of the correct URL format. It is essential
that Linkscandir and Linkscanurl point to the same directory using
the absolute pathname and URL respectively.
[Example: Linkscanurl = http://www.elsop.com/linkscan/ ]
Perlpath: The absolute pathname to the 'Perl 5'
executable on your server.
[Default: Perlpath = /usr/local/bin/perl ]
Sendmailpath: The absolute pathname to the 'sendmail'
executable on your server, including any command line parameters.
[Default: Sendmailpath = /usr/lib/sendmail -t]
Smtphost: The hostname of your SMTP mail server (used by
LinkScan/Dispatch). Reports will be mailed to 'Owner@Smtphost' unless
you configure a Mailalias.
[Default: none]
Sortpath: The absolute pathname to the 'sort' utility
executable on your server. You must include the input and output
file parameters appropriate to your version of "sort" using the
strings '*in' and '*out'. The default setting is
designed for use with the GNU textutils version of 'sort'.
[Unix Default: Sortpath = /usr/bin/sort -o *out *in ]
[Windows Default: Sortpath = sort > *out < *in ]
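The '*in' and '*out' strings in Sortpath act as placeholders for the input and output filenames. The following is a minimal Python sketch of how such a template might be expanded; the helper name `expand_sortpath` and the filenames are hypothetical and are not part of LinkScan, which performs its own expansion internally.

```python
def expand_sortpath(template: str, infile: str, outfile: str) -> str:
    """Replace the '*in' and '*out' placeholders in a Sortpath template."""
    # Hypothetical helper for illustration only.
    return template.replace("*out", outfile).replace("*in", infile)


# Filenames below are made up for the example.
unix_cmd = expand_sortpath("/usr/bin/sort -o *out *in", "links.tmp", "links.srt")
win_cmd = expand_sortpath("sort > *out < *in", "links.tmp", "links.srt")
print(unix_cmd)
print(win_cmd)
```

If your version of 'sort' expects its parameters in a different order, only the template needs to change; the placeholders stay the same.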
Weblintpath: The absolute pathname to the "weblint"
executable on your server.
[Default: Weblintpath = /usr/local/bin/weblint ]
Weblintoptions: Any special command line options
for weblint.
[Default: Weblintoptions = -x netscape ]
Server: On Windows systems, enable the LinkScan
HTTP Server.
[Unix Default: Server = 0 ]
[Windows Default: Server = 1 ]
Wwwpath: The base pathname for the LinkScan HTTP Server.
[Example: Wwwpath = C:/Www/ ]
Wwwurl: The base URL for the LinkScan HTTP Server.
[Example: Wwwurl = http://localhost/ ]
Msiis: Set to 1 if using the Microsoft IIS Server.
[Default: Msiis = 0 ]
Cgibinpath: Pathname where the LinkScan CGI scripts are
installed.
[Example: Cgibinpath = /home/elsop/public_html/cgi-bin/ ]
Cgibinurl: If the LinkScan CGI scripts are moved to a
special directory (e.g. 'cgi-bin/') this parameter must be set to
the absolute URL of that directory. If the LinkScan CGI scripts
are installed in the LinkScan directory (e.g. 'linkscan/') this
parameter must be set equal to Linkscanurl.
[Example: Cgibinurl = http://www.elsop.com/cgi-bin/ ]
Docspath: The absolute pathname to the directory where
the LinkScan documentation is installed.
[Example: Docspath = /home/elsop/public_html/linkscan/docs/ ]
Docsurl: The absolute URL to the Docspath directory.
[Example: Docsurl = http://www.elsop.com/linkscan/docs/ ]
Reportsdir: The absolute pathname to the directory where
any command-line generated reports are to be stored.
[Example: Reportsdir = /home/elsop/public_html/linkscan/reports/ ]
Reportsurl: The absolute URL to the Reportsdir directory.
[Example: Reportsurl = http://www.elsop.com/linkscan/reports/ ]
Wrapperurl: Must be left blank unless your server requires
the use of a cgi "wrapper". In such cases, you must enter the URL of the
system "wrapper" together with any required parameters. For example:
[Example: Wrapperurl = http://www.elsop.com/htbin/cgiwrap?user=elsop&script=linkscan.cgi ]
Defaultpages: When the LinkScan HTTP Server encounters a
reference to a directory without an explicit filename, it will search
for one of the following files (in the order specified).
[Default: Defaultpages = index.html, index.shtml, index.htm, home.html, home.shtml, home.htm ]
Proxyserver: If your access to the Web is via a Proxy
Server, enter the hostname of your HTTP Proxy Server here. Do not
include the "http://".
[Default: none]
Proxyport: The Port number of your HTTP Proxy Server.
[Default: Proxyport = 80]
Noproxy: A comma delimited list of domains and/or
servers that should be accessed directly, bypassing any
configured Proxy Server.
[Default: none]
Example: Noproxy = www.xyz.com,mydomain.com
The above entry will cause LinkScan to bypass the specific server "www.xyz.com" and all servers within the domain "mydomain.com" (e.g. www.mydomain.com, internal.mydomain.com etc.)
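The matching rule described above can be sketched in Python as follows. This is only an illustration under an assumption: an entry is taken to match a host when the host equals the entry or ends with "." followed by the entry. LinkScan's actual matching logic is not documented here, and the function name `bypass_proxy` is hypothetical.

```python
def bypass_proxy(host: str, noproxy: str) -> bool:
    """Return True if host equals, or lies within, any Noproxy entry."""
    # Assumption: suffix matching on whole domain labels, case-insensitive.
    entries = [e.strip().lower() for e in noproxy.split(",") if e.strip()]
    host = host.lower()
    return any(host == e or host.endswith("." + e) for e in entries)


noproxy = "www.xyz.com,mydomain.com"
print(bypass_proxy("www.xyz.com", noproxy))
print(bypass_proxy("internal.mydomain.com", noproxy))
print(bypass_proxy("www.elsop.com", noproxy))
```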
Proxyauth: Enter the username and password if your
Proxy Server requires HTTP/1.1 Proxy-Authentication. This information
must be entered exactly as shown below.
[Default: none]
Example: Proxyauth = "username:password"
Notes: If you do not know how to configure your Proxy Server, consult your network administrator. You may also be able to determine the settings by examining the configuration of your existing web browser. Users who wish to configure a SOCKS Server should consult the following Application Note.
FTPUser: The username to be used for anonymous FTP.
[Default: FTPUser = anonymous]
FTPPass: The password to be used for anonymous FTP.
It is normally considered "good manners" to use your Email address
as the password for anonymous FTP.
[Default: FTPPass = me@mydomain.com]
Masterhist: Instructs LinkScan to maintain a Global
history file for external links that is shared between all Projects.
[Default: Masterhist = 1 ]
Maxhist: The maximum number of entries maintained
in the History File for each external link.
[Default: Maxhist = 10 ]
Maxgoodhours: The maximum number of hours between
attempts to retest good external links. URLs that have
been checked within the specified period are not rescanned;
the LinkScan Reports display the Status Code from the
prior test (which is highlighted with a "*").
[Default: Maxgoodhours = 4 ]
Maxbadhours: The maximum number of hours between
attempts to retest bad external links. URLs that have
been checked within the specified period are not rescanned;
the LinkScan Reports display the Status Code from the
prior test (which is highlighted with a "*").
[Default: Maxbadhours = 0 (Disabled) ]
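The interaction between Maxgoodhours and Maxbadhours can be sketched as a simple decision function. This is an illustrative Python sketch, not LinkScan's code: the function name `should_skip_retest` is hypothetical, and it assumes that a value of 0 disables skipping and that a link checked exactly at the limit is retested.

```python
def should_skip_retest(hours_since_check: float, was_good: bool,
                       maxgoodhours: float = 4, maxbadhours: float = 0) -> bool:
    """Decide whether an external link is recent enough to skip retesting."""
    limit = maxgoodhours if was_good else maxbadhours
    # Assumption: a limit of 0 means "disabled", so the link is always retested.
    return limit > 0 and hours_since_check < limit


print(should_skip_retest(2, was_good=True))     # good link checked 2 hours ago
print(should_skip_retest(5, was_good=True))     # good link checked 5 hours ago
print(should_skip_retest(0.5, was_good=False))  # bad link, Maxbadhours disabled
```

With the defaults shown, good links checked within the last four hours are skipped, while bad links are retested on every scan.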
Timeout1: The timeout (in seconds) that LinkScan
uses for its first attempt to contact the target site.
[Default: Timeout1 = 20 ]
Timeout2: The timeout (in seconds) that LinkScan
uses for its second attempt to contact the target site.
[Default: Timeout2 = 40 ]
Masterport: A TCP/IP Port Number that is used
internally by LinkScan.
[Default: Masterport = 8010 ]
Dprocs: The number of processes used to test
external links when operating in the "Normal" (default) mode.
[Default: Dprocs = 3 ]
Fprocs: The number of processes used to test
external links when operating in the "Fast" mode (selected
with the -fast option).
[Default: Fprocs = 12 ]
Notapmapoptions: This parameter controls access to the LinkScan/TapMap feature and its Options Menu. The following settings are valid:
[Default: Notapmapoptions = 0 (Unrestricted access) ]
Noprojectlist: Setting this parameter to 1 will
force the user to enter a Project Name in a text box on the
LinkScan Reports rather than displaying a drop-down list of
configured Projects from the 'linkscan.mas' file.
[Default: Noprojectlist = 0 (Drop-down list enabled) ]
Httpauth: The name of an Environment Variable
which, if present, will be used to set the current
Username. This may be used to integrate LinkScan's
access controls with HTTP Authentication schemes.
[Default: Httpauth = REMOTE_USER ]
Serverdeny: The LinkScan HTTP Server will deny access
from IP addresses that match this Regular Expression.
[Default: Serverdeny = ]
Serverallow: The LinkScan HTTP Server will allow access
from IP addresses that match this Regular Expression.
[Default: Serverallow = ]
Note: Serverdeny is processed before Serverallow.
Serverauth: The LinkScan HTTP Server will force basic
authentication.
[Default: Serverauth = ]
[Example: Serverauth = username:password ]
Serverindex: Controls whether the LinkScan HTTP
Server will display a directory listing when no index page is
present.
[Default: Serverindex = 1]
Access: Access commands are used to control user access to the LinkScan CGI scripts. Multiple Access commands are permitted and the format of each is:
Access = username : password : project-list : owner-list : menu-options
An asterisk character may be used as a wildcard for any or all of the above parameters.
Indeed, a default LinkScan installation will create an entry providing unrestricted access:
Access = * : * : * : * : *
See How to define new Users and their access controls for more examples.
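To make the five-field layout of an Access command concrete, here is a small Python sketch that splits one line into its named parts. The names `AccessRule` and `parse_access`, and the sample username and password, are hypothetical; the sketch keeps each field as a raw string because the internal syntax of the project-list, owner-list and menu-options fields is not described in this section.

```python
from typing import NamedTuple


class AccessRule(NamedTuple):
    username: str
    password: str
    project_list: str
    owner_list: str
    menu_options: str


def parse_access(line: str) -> AccessRule:
    """Split 'Access = a : b : c : d : e' into its five fields."""
    key, _, value = line.partition("=")
    if key.strip() != "Access":
        raise ValueError("not an Access command")
    fields = [f.strip() for f in value.split(":")]
    if len(fields) != 5:
        raise ValueError("expected five colon-separated fields")
    return AccessRule(*fields)


rule = parse_access("Access = * : * : * : * : *")
print(rule)
restricted = parse_access("Access = webmaster : secret : * : * : *")
print(restricted.username, restricted.project_list)
```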
Other required parameters and settings are included in linkscan.sys. We suggest you do not modify these unless you are completely familiar with LinkScan's features.
linkscan.mas
Purpose: To maintain a list of configured Projects
Location: The LinkScan directory only
Required: When multiple Projects are configured
Applies to: All configured Projects
Inheritance: Not applicable
This file contains a one line entry for each configured Project. The syntax is:
directory-name [*]
You may configure additional Projects manually or ask LinkScan to help you by executing:
perl linkscan.pl -newproject my-new-project
See How to define new Projects or remove old ones for more examples.
linkscan.cfg
Purpose: The Project configuration file
Location: The LinkScan directory and the Project Directory
Required: Always
Applies to: The selected Project
Inheritance: The LinkScan directory is checked first. Those
settings are overridden by the configuration file in the selected
Project Directory
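The inheritance rule can be pictured as a two-layer merge in which Project settings take precedence over Global ones. The Python sketch below is an assumption-laden illustration: `load_settings` is a hypothetical helper that only understands simple "Name = value" lines and treats lines beginning with "#" as comments; it is not how LinkScan itself reads the files.

```python
def load_settings(text: str) -> dict:
    """Parse simple 'Name = value' lines into a dictionary."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Global file in the LinkScan directory, then the Project Directory file.
global_cfg = load_settings("Casesensitive = 1\nIndexoptions = 0\n")
project_cfg = load_settings("Indexoptions = 1\nHomefile = index.html\n")

# Later entries win, so Project values override Global ones.
effective = {**global_cfg, **project_cfg}
print(effective)
```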
Homedir: The absolute pathname to the
Home Directory of the web site to be tested.
[Example: Homedir = /home/elsop/public_html/ ]
Homeurl: The URL to the Home Directory
of the web site to be tested. It is essential that Homedir and Homeurl point
to the same directory using the absolute pathname and URL respectively.
[Example: Homeurl = http://www.elsop.com/ ]
Homefile: The relative pathname of your
Home Page. If Homefile is left blank or points at a directory entry, LinkScan will
search for the list of filenames specified in
Defaultpages.
[Default: Homefile = index.html ]
Organization: A simple description of this
Project, as you wish it to appear on the LinkScan menus
and reports.
[Example: Organization = Electronic Software Publishing ]
To select a Home Page that is beneath the server root directory use:
Do not use:
Autohttp: Automatically attempt HTTP Access to
a document if File System Access fails.
[Default: Autohttp = 0 ]
Casesensitive: Pathnames on the local server
are assumed to be case sensitive.
[Default: Casesensitive = 1 ]
Htmlfiles: A comma delimited list of file extensions.
Files with these extensions are considered to be HTML files that
are thoroughly explored by LinkScan. LinkScan checks for the
existence of other file types but does not examine them for links
to other files. The special token NULL may be used to include files
without any extension.
[Default: Htmlfiles = html, shtml, htm]
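As an illustration of how the Htmlfiles list and the NULL token might be applied, the Python sketch below classifies a pathname by its extension. The function name `is_html_file` is hypothetical, and the sketch assumes extensions are compared exactly as written; LinkScan's own rules may differ.

```python
import os


def is_html_file(path: str, htmlfiles: str = "html, shtml, htm") -> bool:
    """Return True if the file extension appears in the Htmlfiles list."""
    allowed = {e.strip() for e in htmlfiles.split(",") if e.strip()}
    ext = os.path.splitext(path)[1].lstrip(".")
    if not ext:
        # The special token NULL stands for "no extension at all".
        return "NULL" in allowed
    return ext in allowed


print(is_html_file("index.html"))
print(is_html_file("logo.gif"))
print(is_html_file("docs/README", "html, shtml, htm, NULL"))
```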
Mapfiles: A comma delimited list of file extensions.
Files with these extensions are considered to be server-side
imagemap files.
[Default: Mapfiles = map ]
Pdffiles: A comma delimited list of file extensions.
Files with these extensions are considered to be PDF documents.
[Default: Pdffiles = ]
Defaultpages: If LinkScan finds a reference to a
directory without a specific filename, it will search for
files with these names (in the order specified).
[Default: Defaultpages = index.html, index.shtml, index.htm, home.html, home.shtml, home.htm ]
Indexoptions: Controls the behavior of LinkScan
when it finds a reference to a directory that does not contain
a valid index page (see Defaultpages above). If
Indexoptions is set to zero, LinkScan will report a
"No_Default_Page_Found" error. If Indexoptions is set to one,
LinkScan will automatically generate a "Virtual Page" based on
a directory listing and follow the links accordingly.
[Default: Indexoptions = 0 ]
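The combined effect of Defaultpages and Indexoptions can be sketched as a short lookup. This Python example is illustrative only: `resolve_directory` is a hypothetical name, the directory contents are passed in as a plain set rather than read from disk, and the string "<virtual page>" merely stands in for the generated directory listing.

```python
DEFAULTPAGES = ["index.html", "index.shtml", "index.htm",
                "home.html", "home.shtml", "home.htm"]


def resolve_directory(files_in_dir: set, defaultpages=DEFAULTPAGES,
                      indexoptions: int = 0) -> str:
    """Pick the page used for a directory reference with no filename."""
    for name in defaultpages:          # searched in the order specified
        if name in files_in_dir:
            return name
    if indexoptions == 1:
        return "<virtual page>"        # placeholder for a generated listing
    return "No_Default_Page_Found"


print(resolve_directory({"home.htm", "index.htm", "logo.gif"}))
print(resolve_directory({"logo.gif"}))
print(resolve_directory({"logo.gif"}, indexoptions=1))
```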
For flexibility, the following linkscan.sys parameters may be overridden within the Project-specific linkscan.cfg file:
Timeout1
Timeout2
Dprocs
Fprocs
Masterport
Multiple customization commands may be included in either the Global linkscan.cfg file (applies to all Projects) and/or the Project linkscan.cfg file (applies to that specific Project). Please see Customizing LinkScan for details.
Alias relative-path-expression absolute-path-expression
Auth server-name "realm-name" username password
Cookie server-name cookie-value
Defaultowner owner-name
Exclude relative-path-expression
Exclude absolute-url-expression
Execute relative-path-expression
Extraheader http-header-line
Hostalias server-url
Mailalias owner-name list-of-addresses
Nofollow relative-path-expression
Noorphans relative-path-expression
Onlyfollow relative-path-expression
Onlyinclude relative-path-expression
Onlyorphans relative-path-expression
Owner relative-path-expression owner-name
Owner *1
Redirect relative-path-expression absolute-url-expression
Selecturl expression
Statuscode = statuscode, severity
Multiple customization commands may be included in either the Global linkscan.cfg file (applies to all Projects) and/or the Project linkscan.cfg file (applies to that specific Project). Please see How To Customize the SiteMap/TapMap for details.
Mapdefaulttitle, [ string ] [ !PATH | !FILE ] [ string ]
Mapinclude, relative-path-expression
Maphide, relative-path-expression
Mapmove, relative-document-path, new-parent-relative-path, position, new-title
Maptitle, relative-document-path, new-title
In a default configuration, the following parameters are set in the Global linkscan.cfg file. You may edit these settings to reconfigure all Projects, or insert one or more of these commands in an individual Project linkscan.cfg file to modify a specific Project.
Httpdlogfile: The absolute pathname to the HTTPD access
log file on your server. This optional feature will allow you to display
per-document hit counts on the SiteMap reports.
[Default: none]
Expandssi: SSI Include tags are expanded when using File
System navigation. SSI EXEC/CGI tags are not processed.
[Default: Expandssi = 1 ]
Collectmeta: Reserved for future use.
[Default: Collectmeta = 0 ]
Maxgoodint: The maximum number of links to any
given document that are stored in the database for Good
Internal Links.
[Default: Maxgoodint = 100 ]
Maxbadint: The maximum number of links to any
given document that are stored in the database for Bad
Internal Links.
[Default: Maxbadint = 100 ]
Maxext: The maximum number of links to any
given URL that are stored in the database for External Links.
[Default: Maxext = 100 ]
Maxservertries: The maximum number of links that
should be tested on any given server when that server is
apparently "dead". Once this limit is exceeded, all other
links to that server are skipped and assigned the
URL Skipped - Bad Server (801)
Status Code.
[Default: Maxservertries = 10 ]
Maxcgi: The maximum number of times any single URL
should be probed with different query parameters. This prevents
LinkScan from trying to validate a CGI script or dynamic page
with a potentially infinite number of query parameters.
[Default: Maxcgi = 10 ]
Maxftp: The maximum number of links to any single
FTP server that should be validated. Once this limit is exceeded,
all other FTP links to that server are skipped and assigned the
URL Skipped - FTP Limit (802)
Status Code.
[Default: Maxftp = 10 ]
Note: See the Controlling Excessively Frequent Testing and Controlling Duplicate Links Sections of this document for further discussion.
Maxdirlevels: Limit the File System scan for orphaned
files to "n" levels from www root.
[Default: 10 ]
Linkprefix: On the LinkScan Reports, prefix all
Internal Link names with this string.
[Default: none ]
Hidelinkprefix: On the LinkScan Reports, remove
this string from the prefix of all Internal Link names.
[Default: none ]
Ownerdir: Name of a subdirectory (within the current
Project directory) where LinkScan/Dispatch will save reports as
'Owner.*'.
[Default: users/ ]
Mailheadtext: The contents of this file (in the
current Project Directory) will be included in the header of
each LinkScan/Dispatch Text report.
[Default: Mailheadtext = mailhead.txt ]
Mailfoottext: The contents of this file (in the
current Project Directory) will be included in the footer of
each LinkScan/Dispatch Text report.
[Default: Mailfoottext = mailfoot.txt ]
Mailheadhtml: The contents of this file (in the
current Project Directory) will be included in the header of
each LinkScan/Dispatch HTML report.
[Default: Mailheadhtml = mailhead.html ]
Mailfoothtml: The contents of this file (in the
current Project Directory) will be included in the footer of
each LinkScan/Dispatch HTML report.
[Default: Mailfoothtml = mailfoot.html ]
Maxsev: Maximum severity of errors to include
in LinkScan/Dispatch reports.
[Default: Maxsev = 3 ]
Sorterr: Sort LinkScan/Dispatch reports so that
documents with the most errors are listed first rather than
listing documents in alphabetical order.
[Default: Sorterr = 1 ]
Mailnoerr: When creating LinkScan/Dispatch Email
reports, send mail to File Owners even if there are no errors.
[Default: Mailnoerr = 0 ]
Ownertags: Allow users to override the per-document
ownership attributes through the
Special HTML Owner tag.
[Default: Ownertags = 1 ]
The mime.types file controls the MIME-type header that the LinkScan Server transmits for each request based on the file extension of the requested document/file. The version of the mime.types file installed with LinkScan includes most of the common/standard associations. For example:
# MIME type     Extension
text/html       shtml html htm
This entry causes the LinkScan Server to transmit the following HTTP response header with each request for a .htm, .html or .shtml file:
Content-Type: text/html
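The extension-to-type lookup described above can be sketched in Python as follows. This is a minimal illustration, assuming each non-comment line holds a MIME type followed by whitespace-separated extensions; the function name `parse_mime_types` is hypothetical and the LinkScan Server's own parser may behave differently.

```python
def parse_mime_types(text: str) -> dict:
    """Map each file extension to its MIME type."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


sample = "# MIME type     Extension\ntext/html       shtml html htm\n"
types = parse_mime_types(sample)
header = "Content-Type: " + types["htm"]
print(header)
```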
# LINKSCAN CUSTOMIZATION FILE - LINKSCAN.REP
#
# Lines beginning with "#" are comments
#
# Purpose: Select options for command line reports
# Location: The LinkScan Project Directory
# Required: Yes (for command line reports)
# Applies to: The selected Project
# Inheritance: Not applicable
#
# DO NOT EDIT SECTION HEADERS - lines within [square brackets]
#
[sr Summary/Detail Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
Unclean = 0 # 0 = List all documents; 1 = List only documents with errors
Sort = 1 # 1 = Most errors first; 2 = Alphabetically; 3 = Newest first
# 4 = Least errors first; 5 = Reverse alphabetically; 6 = Oldest first
Incl = # relative-path-expression
Excl = # relative-path-expression
[xr Summary Statistics Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
[dr Detailed Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
Intext = 3 # 1 = Internal only; 2 = External only; 3 = Internal and External
Sev0 = 0 # 1 = Display No Status
Sev1 = 1 # 1 = Display Errors
Sev2 = 1 # 1 = Display Possible Errors
Sev3 = 1 # 1 = Display Warnings
Sev4 = 0 # 1 = Display Advisories
Sev5 = 0 # 1 = Display Good Links
Sort = 1 # 1 = By referer; 2 = By status code; 3 = By links alphabetically
Match = 3 # 1 = Match on referer; 2 = Match on target; 3 = Match on either
Incl = # relative-path-expression
Excl = # relative-path-expression
[cr Selected Status Codes Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
Intext = 3 # 1 = Internal only; 2 = External only; 3 = Internal and External
Stat1 = 1 # Good HTML Files
Stat2 = 1 # Missing HTML Files
Stat3 = 1 # Good non-HTML Files
Stat4 = 1 # Missing non-HTML Files
Stat5 = 1 # Good Anchors
Stat6 = 1 # Missing Anchors
Stat7 = 1 # Unsafe Characters
Stat8 = 1 # Status Unknown
Stat9 = 1 # Good URL
Stat10 = 1 # Moved Permanently
Stat11 = 1 # Moved Temporarily
Stat12 = 1 # Trailing / Missing from URL
Stat13 = 1 # Server Not Found - No DNS Entry
Stat14 = 1 # URL Not Found
Stat15 = 1 # Timed Out
Stat16 = 1 # Other
Sort = 1 # 1 = By referer; 2 = By status code; 3 = By links alphabetically
Match = 3 # 1 = Match on referer; 2 = Match on target; 3 = Match on either
Incl = # relative-path-expression
Excl = # relative-path-expression
[mr SiteMap Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
Custom = 0 # 1 = Use alternate header/footer
Levels = 10 # Maximum number of levels to display
Filenames = 0 # 1 = Display relative-path on report
Truncate = 100 # Maximum line length (characters)
Decimal = 1 # 1 = Display dot-decimal notation
Font = -1 # Relative font size for titles
New = 1 # 1 = Flag new files
Newdays = 5 # Defines "New" (in days)
Anchors = 1 # 1 = Display anchors (Link order format only)
Indent = 0 # Number of spaces to indent (default is to use tab)
Files = 0 # 1 = Display file size and date on report
Linkmap = 0 # 0 = Use directory structure format; 1 = Link order format
[hr Site History Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
Incl = # absolute-url-expression
[or Orphaned Files Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
Sort = 2 # 2 = Alphabetically; 3 = Newest first
# 5 = Reverse alphabetically; 6 = Oldest first
Incl = # relative-path-expression
Excl = # relative-path-expression
[ar All Files Linking to ... Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
Intext = 3 # 1 = Internal only; 2 = External only; 3 = Internal and External
Match = 5 # 4 = Exact match; 5 = Partial match
Incl = # relative-path-expression | absolute-url-expression
[rr Redirections Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
[pr System Configuration Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
[qr LinkScan/QuickCheck]
Html = 1 # 0 = TEXT format; 1 = HTML format
Graphics = 1 # 0 = Graphics off; 1 = Graphics on
Sev0 = 0 # 1 = Display No Status
Sev1 = 1 # 1 = Display Errors
Sev2 = 1 # 1 = Display Possible Errors
Sev3 = 1 # 1 = Display Warnings
Sev4 = 0 # 1 = Display Advisories
Sev5 = 0 # 1 = Display Good Links
Source = 1 # 1 = Display full source code
Linkscan = 1 # 1 = Display link status
Weblint = 1 # 1 = Display weblint errors
Combo = 1 # 1 = Combined format
Http = 2 # 0 = Read via file system; 2 = Read via HTTP; 3 = Automatic
Now = 0 # 0 = Link status from database; 1 = Check link status now
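For readers who want to inspect or generate a linkscan.rep file from a script, the layout above (bracketed section headers, "Name = value" options, "#" comments) can be read with a few lines of Python. The sketch below is an assumption-based illustration: `parse_rep` is a hypothetical helper, and it treats everything after the first "#" on a line as a comment, so it would not preserve a value that itself contains "#".

```python
def parse_rep(text: str) -> dict:
    """Read report sections and their options from linkscan.rep text."""
    reports = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Header form: [code Report Title], e.g. [sr Summary/Detail Report]
            code, _, title = line[1:-1].partition(" ")
            current = reports.setdefault(code, {"title": title, "options": {}})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current["options"][key.strip()] = value.strip()
    return reports


sample = """\
# Lines beginning with "#" are comments
[sr Summary/Detail Report]
Html = 1 # 0 = TEXT format; 1 = HTML format
Sort = 1 # 1 = Most errors first; 2 = Alphabetically; 3 = Newest first
Incl = # relative-path-expression
[or Orphaned Files Report]
Sort = 2 # 2 = Alphabetically; 3 = Newest first
"""
reports = parse_rep(sample)
print(reports["sr"]["title"], reports["sr"]["options"])
```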
This tab-delimited file contains an audit trail of each scan on a per-Project basis and may be imported into spreadsheets or other applications for management reports. The file is formatted with one record per scan. The data fields are tab-delimited and include:
Field 0 LinkScan Version Number
Field 1 Date and Time of Scan (Seconds since 00:00:00 UTC, January 1, 1970)
Field 2 Total HTML Documents Scanned
Field 3 Total HTML Documents Missing
Field 4 Total HTML Documents Containing Hard Errors
Field 5 Total non-HTML Files Scanned
Field 6 Total non-HTML Files Missing
Field 7 Total Anchors Found
Field 8 Total Anchors Broken
Field 9 External URLs - Total Checked
Field 10 External URLs - Errors
Field 11 External URLs - Possible Errors
Field 12 External URLs - Warnings
Field 13 Total Orphaned Files
These data items correspond to those displayed on the Summary Statistics Report.
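Because Field 1 is stored as seconds since January 1, 1970 (UTC), it usually needs converting before it is useful in a management report. The Python sketch below shows one way to read a single record. The field names in `FIELDS`, the function name `parse_record`, and every value in the sample record (including the version number) are invented for illustration.

```python
from datetime import datetime, timezone

# Hypothetical short names for Fields 0 to 13, in order.
FIELDS = [
    "version", "scan_time", "html_scanned", "html_missing", "html_hard_errors",
    "nonhtml_scanned", "nonhtml_missing", "anchors_found", "anchors_broken",
    "ext_checked", "ext_errors", "ext_possible_errors", "ext_warnings",
    "orphaned_files",
]


def parse_record(line: str) -> dict:
    """Turn one tab-delimited audit record into a dictionary."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(FIELDS, values))
    # Field 1 is an epoch timestamp; convert it to a UTC datetime.
    record["scan_time"] = datetime.fromtimestamp(int(record["scan_time"]),
                                                 tz=timezone.utc)
    for name in FIELDS[2:]:
        record[name] = int(record[name])
    return record


# Made-up sample record, not real LinkScan output.
sample = "\t".join(["6.1", "943747200", "120", "2", "5", "300", "1",
                    "80", "3", "45", "4", "2", "6", "7"])
rec = parse_record(sample)
print(rec["scan_time"].isoformat(), rec["html_scanned"], rec["orphaned_files"])
```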
Electronic Software Publishing Corporation (Elsop)
© Copyright 1997-99 Electronic
Software Publishing Corporation
Updated: November 28, 1999