sgmltoken

Language: en

Version: 113496 (mandriva - 01/05/08)

Section: 1 (User commands)

NAME

sgmltoken - a sample LT NSL program for tokenising the text in an

SYNOPSIS

usage: sgmltoken [-d ddb-file] [-u base-url] [input-file]

DESCRIPTION

The material below may be out of date; please consult the LT XML documentation.

The input file to sgmltoken is an nSGML file which contains <TEXT...> and <P> elements. All text inside such <TEXT...> elements will be tokenised into 'words' and punctuation represented by <C> elements.
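The tokenisation described above can be illustrated with a small Python sketch. This is not the sgmltoken implementation: the token pattern, the function names tokenise_to_c and tokenise_text_element, and the decision to drop whitespace between tokens are all assumptions made for illustration. It only shows the general shape of the transformation, in which character data is wrapped in <C> elements while existing markup is copied through.

```python
import re

# Hypothetical token pattern (an assumption, not sgmltoken's actual rule):
# a run of word characters, or a single non-space, non-word character,
# which is treated here as punctuation.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenise_to_c(text):
    """Wrap each word or punctuation mark found in text in a <C> element.

    Whitespace between tokens is dropped, and entity references are not
    handled; both are simplifications of this sketch.
    """
    return "".join(f"<C>{tok}</C>" for tok in TOKEN_RE.findall(text))


def tokenise_text_element(content):
    """Tokenise character data and copy existing tags through unchanged."""
    # Splitting on a capturing group keeps the tags in the result list.
    parts = re.split(r"(<[^>]*>)", content)
    return "".join(p if p.startswith("<") else tokenise_to_c(p) for p in parts)


print(tokenise_text_element("<P>Hello, world!</P>"))
# prints: <P><C>Hello</C><C>,</C><C>world</C><C>!</C></P>
```

The real program may split tokens differently (for example around hyphens or apostrophes) and may add attributes to the <C> elements.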

DESCRIPTION: Input/Output

Description of the input and output files used by this program.
Input  ==> An nSGML file: [<filename> or stdin]
Output ==> An nSGML file containing <C> elements for the 'words',
           in addition to the existing markup: [stdout]

EXPECTED DOCTYPE of INPUT

The <!DOCTYPE> for the input file should contain at least the following:
       <!element text - - (#PCDATA|c|w|s|p)*>

       <!element p - - (#PCDATA|c|w|s)*>

       <!element s - - (w)*>
       <!attlist s
               id      ID      #IMPLIED>

       <!element w - - (c)*>
       <!attlist w
               id      ID      #IMPLIED
               type    CDATA   #IMPLIED
               lemma   CDATA   #IMPLIED>

       <!element c - - (#PCDATA)*>
       <!attlist c
               id      ID      #IMPLIED
               rend    CDATA   #IMPLIED>

OPTIONS

-d <ddbfile>
is the name of a file containing a representation of a DTD. It can be used if the DTD is not specified (in a <?NSL DDB ...> statement) in the input document itself.
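As a sketch of how the command line in the SYNOPSIS might be assembled from a script, the Python helper below builds an argument list using only the flags shown there (-d, -u and the optional input file). The helper name build_sgmltoken_command and the file names corpus.ddb and corpus.sgml are hypothetical.

```python
def build_sgmltoken_command(ddb_file=None, base_url=None, input_file=None):
    """Return an argument list for sgmltoken, following the SYNOPSIS line.

    Every argument is optional; with no input file, sgmltoken reads stdin.
    """
    argv = ["sgmltoken"]
    if ddb_file is not None:
        argv += ["-d", ddb_file]    # file containing a DTD representation
    if base_url is not None:
        argv += ["-u", base_url]    # base URL, as listed in the SYNOPSIS
    if input_file is not None:
        argv.append(input_file)
    return argv


# Hypothetical file names, for illustration only.
print(build_sgmltoken_command(ddb_file="corpus.ddb", input_file="corpus.sgml"))
# prints: ['sgmltoken', '-d', 'corpus.ddb', 'corpus.sgml']
```

On a system where sgmltoken is installed, the resulting list could be passed to subprocess.run with stdout redirected to a file, since the program writes its output to stdout.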

SEE ALSO

ltxml(1), mknsg(1), sgmlsb(1)

AUTHOR

Henry Thompson (ht@cogsci.ed.ac.uk)
David McKelvie (dmck@cogsci.ed.ac.uk)

Language Technology Group, Human Communication Research Centre, Edinburgh University,
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND
Tel: (44) 131 650-4630
Fax: (44) 131 650-4587
Email: dmck@cogsci.ed.ac.uk

Comments, suggestions, and bug reports are always welcome.