webfgrep

Language: en

Version: Feb 1999 (mandriva - 01/05/08)

Section: 1 (User commands)

NAME

webfgrep - a poor man's web search engine.

SYNOPSIS

webfgrep [-ahist] [-p prefix] -- [key1,...] html-files

DESCRIPTION

webfgrep uses memory-mapped file access and can therefore search a large number of HTML pages in a short time. With webfgrep and a cgi-bin front-end it is possible to build a fast web search engine for small web sites with about 1 MB of HTML pages. You can specify up to 3 keywords. A web page matches only when it contains all of the specified keys.

Please note that a cgi-bin front-end raises a number of important security issues. At a minimum, escape all non-word characters ([^A-Z_a-z0-9]) before passing the search keys to the webfgrep command line. A better approach is to remove any "garbage characters" and use the -s option to feed the user input directly to webfgrep, without passing this data through the shell. Two sample cgi-bin front-ends are provided in the webfgrep distribution. They are designed for searching English web pages but can easily be modified to search pages that use other character sets; the main issue to address is how characters specific to your language are represented in HTML.
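
The second approach might be sketched as follows. This is only an illustration, not one of the sample cgi-bins from the distribution: the function names sanitize_keys and search_site are hypothetical, and the input format assumed for -s (all keys on one line, separated by commas, as on the command line) is an assumption that should be checked against your webfgrep version.

```shell
#!/bin/sh
# Hypothetical helper: keep only word characters and the comma that
# separates keys, then keep at most the first 3 keys.
sanitize_keys() {
    # tr -cd deletes every character that is NOT in the listed set
    cleaned=$(printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9_,')
    printf '%s\n' "$cleaned" | cut -d, -f1-3
}

# Hypothetical wrapper: $1 = directory holding the html files,
# $2 = raw user input. The keys reach webfgrep on stdin (-s), so they
# never appear on a command line.
search_site() {
    keys=$(sanitize_keys "$2")
    # Refuse a query that is empty after sanitizing
    [ -n "$keys" ] || return 1
    # Assumption: -s expects the comma-separated keys on one line
    ( cd "$1" && printf '%s\n' "$keys" |
        webfgrep -s -p http://some.hostname.com/ -- *.html )
}
```

A cgi-bin could then call search_site /home/http/html "$user_input" after decoding the query string. Deleting unwanted characters, rather than escaping them, keeps the filter simple, at the cost of silently altering keys that contain punctuation.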

OPTIONS

-a
Anchored search: match whole words only, not substrings.
-h
Prints a little help/usage information.
-i
Case-insensitive search (works only with ISO-8859-1 character sets)
-p prefix
Path prefix to add when displaying the result
-s
Read the keys from stdin rather than from the command line.
-t
Text output (default is html)

EXAMPLE

Search for the complete words guido and File in all web pages in the current directory. Only web pages that contain both words match:
webfgrep -a -p http://some.hostname.com/ -- guido,File *.html

Search all HTML files for the substring linux:
(cd /home/http/html;webfgrep -p http://some.hostname.com/ -- linux `find . -name '*.htm*' -print`)

BUGS

No known bugs.

AUTHOR

Guido Socher (guido.s@writeme.com)

SEE ALSO

fgrep(1), hrefgrep(1), srcgrep(1), blnkcheck(1), lshtmlref(1), taggrep(1)