urifind.1p

Language: en

Version: 2010-08-15 (ubuntu - 24/10/10)

Section: 1 (User Commands)

NAME

urifind - find URIs in a document and dump them to STDOUT.

SYNOPSIS

     $ urifind [options] [file ...]
 
 

DESCRIPTION

urifind is a simple script that finds URIs in one or more files (using "URI::Find") and outputs them to STDOUT. That's it.

To find all the URIs in file1, use:

     $ urifind file1
 
 

To find the URIs in multiple files, simply list them as arguments:

     $ urifind file1 file2 file3
 
 

urifind will read from "STDIN" if no files are given or if a filename of "-" is specified:

     $ wget http://www.boston.com/ -O - | urifind
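The input rules above (named files, or STDIN when no files are given or when a file is named "-") can be sketched in Python as follows. This is a hypothetical illustration, not urifind's Perl source: the names sources, find_uris and scan are invented here, and the deliberately naive NAIVE_URI pattern stands in for URI::Find, whose heuristics are far more thorough.

```python
import re
import sys

# Deliberately naive stand-in for URI::Find (an assumption of this sketch):
# it only knows a few schemes and stops at whitespace, quotes and angle brackets.
NAIVE_URI = re.compile(r"\b(?:https?|ftp|mailto):[^\s<>\"']+")


def sources(args):
    """No file arguments means read STDIN, represented here by '-'."""
    return list(args) if args else ["-"]


def find_uris(text):
    # Trim trailing sentence punctuation that the naive pattern swallows.
    return [m.rstrip(".,;:!?)") for m in NAIVE_URI.findall(text)]


def scan(args):
    """Yield (source name, URI) pairs in the order they are found."""
    for name in sources(args):
        if name == "-":
            text = sys.stdin.read()
        else:
            with open(name, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
        for uri in find_uris(text):
            yield name, uri


if __name__ == "__main__":
    for _name, uri in scan(sys.argv[1:]):
        print(uri)
```

Run with no arguments, the sketch reads STDIN, so it can sit at the end of the same wget pipeline shown above.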
 
 

When multiple files are listed, urifind prefixes each found URI with the file from which it came:

     $ urifind file1 file2
     file1: http://www.boston.com/index.html
     file2: http://use.perl.org/
 
 

Prefixing can be turned on for a single file with the "-p" ("prefix") switch:

     $ urifind -p file3
     file3: http://fsck.com/rt/
 
 

It can also be turned off for multiple files with the "-n" ("no prefix") switch:

     $ urifind -n file1 file2
     http://www.boston.com/index.html
     http://use.perl.org/
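The prefix rules above amount to a small decision: prefix by default only when more than one file is scanned, force it on with "-p", force it off with "-n". A minimal Python sketch, assuming (the man page does not say) that "-n" wins when both switches are given; use_prefix and format_hit are hypothetical names.

```python
def use_prefix(num_files, prefix=False, no_prefix=False):
    """Decide whether each URI is printed as 'file: uri'.

    Assumption of this sketch: -n (no_prefix) overrides -p (prefix).
    """
    if no_prefix:
        return False
    if prefix:
        return True
    # Default: prefix only when several files are named.
    return num_files > 1


def format_hit(name, uri, prefixed):
    """Render one result line, with or without the file-name prefix."""
    return f"{name}: {uri}" if prefixed else uri


if __name__ == "__main__":
    # Mirrors "urifind -p file3": a single file, prefix forced on.
    print(format_hit("file3", "http://fsck.com/rt/", use_prefix(1, prefix=True)))
```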
 
 

By default, URIs are displayed in the order found; to sort them ASCII-betically, use the "-s" ("sort") option. To sort them in reverse order, use the "-r" ("reverse") flag ("-r" implies "-s").

     $ urifind -s file1 file2
     http://use.perl.org/
     http://www.boston.com/index.html
     mailto:webmaster@boston.com
 
     $ urifind -r file1 file2
     mailto:webmaster@boston.com
     http://www.boston.com/index.html
     http://use.perl.org/
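The three orderings ("as found", "-s" and "-r") can be modelled in a few lines of Python. This is an illustrative sketch rather than urifind's code: order is a hypothetical name, and it relies on Python comparing strings by code point, which for ASCII text gives the same ASCII-betical order described above.

```python
def order(uris, sort=False, reverse=False):
    """Return URIs in the order urifind's -s / -r options would print them."""
    if reverse:  # -r implies -s
        return sorted(uris, reverse=True)
    if sort:
        return sorted(uris)
    return list(uris)  # default: the order in which they were found


if __name__ == "__main__":
    found = [
        "http://www.boston.com/index.html",
        "mailto:webmaster@boston.com",
        "http://use.perl.org/",
    ]
    for uri in order(found, sort=True):
        print(uri)
```

With sort=True this prints the same three lines, in the same order, as the "urifind -s file1 file2" example.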
 
 

Finally, urifind can limit the returned URIs by scheme or by arbitrary pattern, using the "-S" option (for schemes) and the "-P" option (for patterns). Both "-S" and "-P" can be specified multiple times:

     $ urifind -S mailto file1
     mailto:webmaster@boston.com
 
     $ urifind -S mailto -S http file1
     mailto:webmaster@boston.com
     http://www.boston.com/index.html
 
 

"-P" takes an arbitrary Perl regex, which may need to be quoted to protect it from the shell:

     $ urifind -P 's?html?' file1
     http://www.boston.com/index.html
 
     $ urifind -P '\.org\b' -S http file4
     http://www.gnu.org/software/wget/wget.html
 
 

Add "-d" to have urifind dump the regexes generated from "-S" and "-P" to "STDERR". "-D" does the same but then exits immediately:

     $ urifind -P '\.org\b' -S http -D 
     $scheme = '^(\bhttp\b):'
     @pats = ('^(\bhttp\b):', '\.org\b')
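The dump above suggests how the two options combine: the schemes become one anchored pattern that is placed in front of the "-P" patterns. A Python sketch of that idea follows. It is a reconstruction from the "-D" output, not urifind's Perl source. Two points are assumptions: several "-S" schemes are joined into a single alternation (which is consistent with "-S mailto -S http" returning both kinds of URI), and a URI is kept only if it matches every pattern in the list. build_patterns and filter_uris are hypothetical names.

```python
import re


def build_patterns(schemes, pats):
    """Combine -S schemes and -P regexes into one list of patterns.

    The scheme pattern mirrors the shape printed by -D, e.g. ^(\\bhttp\\b):
    """
    patterns = list(pats)
    if schemes:
        # One anchored alternation: any listed scheme may match.
        scheme = r"^(\b" + r"\b|\b".join(schemes) + r"\b):"
        patterns.insert(0, scheme)
    return patterns


def filter_uris(uris, patterns):
    """Keep URIs matching every pattern (an assumption of this sketch)."""
    compiled = [re.compile(p) for p in patterns]
    return [u for u in uris if all(rx.search(u) for rx in compiled)]


if __name__ == "__main__":
    file1 = ["mailto:webmaster@boston.com", "http://www.boston.com/index.html"]
    for uri in filter_uris(file1, build_patterns(["mailto", "http"], [])):
        print(uri)  # prints both URIs, in the order found
```

The \b anchors on each scheme are what prevent "-S http" from also accepting https URIs: after "http" in "https:" comes another word character, so the boundary fails.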
 
 

To remove duplicates from the results, use the "-u" ("unique") switch.
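As a rough illustration of "-u", the Python sketch below drops repeated URIs. It keeps the first occurrence of each URI and preserves the order found. Both choices are assumptions of this sketch, because the man page only says that duplicates are removed. unique is a hypothetical name.

```python
def unique(uris):
    """Drop repeated URIs, keeping the first occurrence of each.

    dict keys preserve insertion order (Python 3.7+), so this is an
    order-preserving de-duplication; urifind's own -u may differ.
    """
    return list(dict.fromkeys(uris))


if __name__ == "__main__":
    hits = ["http://use.perl.org/", "mailto:webmaster@boston.com", "http://use.perl.org/"]
    for uri in unique(hits):
        print(uri)
```

Combined with "-s", the order of the two steps does not matter for the final output, since sorted(unique(hits)) and unique(sorted(hits)) yield the same list.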

OPTION SUMMARY

-s
Sort results.
-r
Reverse sort results (implies -s).
-u
Return unique results only.
-n
Don't include filename in output.
-p
Include filename in output (off by default, but on if multiple files are given on the command line).
-P $re
Print only URIs matching regex $re (may be specified multiple times).
-S $scheme
Print only URIs with scheme $scheme (may be specified multiple times).
-h
Display a help summary.
-v
Display version and exit.
-d
Dump compiled regexes for "-S" and "-P" to "STDERR".
-D
Same as "-d", but exit after dumping.

AUTHOR

darren chamberlain <darren@cpan.org> (C) 2003 darren chamberlain

This library is free software; you may distribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

URI::Find