sgmltrans

Language: en

Version: 113497 (mandriva - 01/05/08)

Section: 1 (User commands)

NAME

sgmltrans - LT NSL program for converting nSGML files into different formats.

SYNOPSIS

usage: sgmltrans -r rulefile [-p] [filename ...]

DESCRIPTION

The material below may be out of date; please consult the LT XML documentation.

Sgmltrans is a program for translating nSGML files into some other format (which could be nSGML, LaTeX, or something else). It is loosely based on COST and other SGML programs, in that one specifies actions to perform at SGML start tags, end tags and text content. In sgmltrans, these actions are restricted to printing text to the output stream.

DESCRIPTION: Input/Output

Description of the input/output files involved in this program.
Input ==> An nSGML file : [<filename> or stdin]
Output ==> A text file : [stdout]

OPTIONS

-r <rulefile>
<rulefile> is the name of a file which contains a set of rules describing how the input nSGML file will be converted. See below for a description of the format of the rule file.
-p 
If specified, then sgmltrans will only print out what it thinks the rules are, and not process the input file. Used for debugging.
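
The options above can be assembled into a command line programmatically. The following is a minimal Python sketch, not part of LT XML: the helper name sgmltrans_argv and its parameters are hypothetical, and only the -r and -p options documented here are used.

```python
def sgmltrans_argv(rulefile, filenames=(), print_rules_only=False):
    """Build an argument list following the SYNOPSIS line of this page."""
    argv = ["sgmltrans", "-r", rulefile]
    if print_rules_only:
        # -p: only print the rules as parsed, do not process any input.
        argv.append("-p")
    # With no filenames, sgmltrans reads the nSGML input from stdin.
    argv.extend(filenames)
    return argv
```

Assuming sgmltrans is installed and on the PATH, the resulting list could be passed to subprocess.run, with the converted text collected from standard output.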

SGMLTRANS RULE FORMAT

The sgmltrans rule file consists of an ordered list of rules. Each rule consists of an LT NSL query (see ltxml-query(5)), which describes the SGML elements to which the rule applies, and a pair of format strings: the first is printed when a start tag for a matching element is encountered, the second when its end tag is encountered.
The format strings may contain special variables denoting the name of the SGML element and the values of its attributes. These are $gi and $<attributeName>, where <attributeName> is the name of an attribute defined for the element. The strings '\n' and '\\' will be replaced by a newline character and '\' respectively. The lines containing format strings must start with a tab.
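
To make the substitution rules concrete, here is a minimal Python sketch of how a format string might be expanded for one element. It illustrates the behaviour described above and is not the sgmltrans implementation; the function name expand_format is hypothetical, and the handling of an attribute that has no value (empty string) is an assumption.

```python
import re

# Matches the two escapes first, then a variable such as $gi or $TAG.
_TOKEN = re.compile(r"\\n|\\\\|\$[A-Za-z_][A-Za-z0-9_]*")

def expand_format(fmt, gi, attrs):
    """Return fmt with variables and escapes expanded for one element."""
    def replace(match):
        token = match.group(0)
        if token == "\\n":        # backslash followed by n
            return "\n"
        if token == "\\\\":       # two backslashes
            return "\\"
        name = token[1:]          # drop the leading '$'
        if name == "gi":
            return gi
        # Assumption: an attribute with no value expands to the empty string.
        return attrs.get(name, "")
    return _TOKEN.sub(replace, fmt)
```

For an element W with attribute TAG=A, expanding the two-character escape form of "/$TAG" followed by a newline escape yields "/A" followed by a real newline character.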

For example, given the rule:

        .*/W
               ""
               "/$TAG\n"

the input file:

        <W TAG=A>The</W>
        <W TAG=B>cat</W>

will be converted into

        The/A
        cat/B

For each SGML element found in the input file, the rules are tried in their order in the rule file, until one is found whose query matches the element.
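
The first-match behaviour can be sketched in Python as follows. This is only an illustration: real rules use LT NSL queries (see ltxml-query(5)), whereas this sketch assumes, as a loud simplification, that a query can be treated as a regular expression matched against a slash-separated path of element names. The names RULES, matches and select_rule are hypothetical.

```python
import re

# Each rule is (query, start_string, end_string), in rule-file order.
RULES = [
    (r".*/W", r"", r"/$TAG\n"),
    (r".*", r"", r""),
]

def matches(query, path):
    # Assumption: the query is approximated by a regular expression
    # that must match the whole element path, e.g. "/TEXT/W".
    return re.fullmatch(query, path) is not None

def select_rule(rules, path):
    """Return the first rule whose query matches, or None if none does."""
    for rule in rules:
        if matches(rule[0], path):
            return rule
    return None
```

Because selection stops at the first match, order matters: with the catch-all query listed first, the W rule would never be reached.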

Default rules

Every rule file should contain a default rule that matches all elements and is used for any element that matches no earlier rule. The default rule

        .*
               ""
               ""

prints nothing for elements which match it.

Data rules

Finally, rules can also be specified to apply particular transformations to the text bodies of elements. A rule query which ends in '#' (as in the example below) applies to text content. These rules are called data rules. Instead of a pair of start/end format strings, data rules contain a set of text transformations (currently just literal strings, though general regular expressions may be supported in future) of the form

                "<searchString>" --> "<replacementString>"

So for example:

        .*/W/#

               "&#60;" --> "$<$"

could be useful if you were trying to produce LaTeX source from an SGML file.
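
The effect of a data rule on a piece of text content can be sketched in Python as a series of literal replacements. This is an illustrative sketch, not the sgmltrans implementation; the function name apply_data_rule is hypothetical, and applying the transformations one after another in their listed order is an assumption.

```python
def apply_data_rule(text, transformations):
    """Apply literal (search, replacement) pairs to element text content."""
    for search, replacement in transformations:
        # Literal string replacement only; no regular expressions.
        text = text.replace(search, replacement)
    return text
```

With the transformation from the example above, the text "a &#60; b" would become "a $<$ b".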

BUGS AND RESTRICTIONS

This is still an experimental program. Thus it is not particularly efficient and its functionality is limited in a number of ways. We intend to improve it on the basis of experience. For more complex manipulation of SGML files see also sgrpg(1).

SEE ALSO

ltxml(1), mknsg(1), sggrep(1), sgrpg(1), ltxml-query(5)

AUTHOR

Henry Thompson (ht@cogsci.ed.ac.uk)
David McKelvie (dmck@cogsci.ed.ac.uk)

Language Technology Group, Human Communication Research Centre, Edinburgh University,
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND
Tel:(44) 131 650-4630
Fax:(44) 131 650-4587 email: dmck@cogsci.ed.ac.uk

Comments, suggestions, and bug reports are always welcome.