Language: en

Version: 384592 (fedora - 01/12/10)


Section: 5 (File formats)


lgc - the lgs source file format for the lgc compiler


Source files of the Logiweb compiler lgc (lgc(1)) are expressed in the LoGiweb Source language (lgs). The lgs language allows one to express mathematics in a seminatural style. To learn lgs, simply read the Logiweb source of the 'base' page at http://logiweb.eu/1.0/doc/pages/base/source.lgs. The comments in there give many more details than could reasonably be included here. Then read the 'lgc' page found in the same place. It documents the lgc compiler, including lots of details on lgs. An overview is nevertheless given in the following.


The lgc compiler translates lgs into Logiweb vectors, racks, and renderings. The Logiweb standard defines the format of Logiweb vectors and racks, and defines precisely how vectors are translated to racks. The Logiweb standard does not, however, define the lgs format. The lgc compiler is the compiler which happens to come with the Logiweb distribution, and the lgs format happens to be the input format of the lgc compiler. But Logiweb does not consider lgs to be part of the standard. Any compiler which produces vectors, racks, and renderings may be used in connection with Logiweb. The Logiweb standard partially defines what a rendering is: A rendering is a file tree rooted at a 'rendering directory'. The rendering directory is supposed to contain a file named vector.lgw which contains the page in vector format, a file named rack.lgr which contains the page in rack format, and a subdirectory named page which contains the rendering of the page. Compilers for Logiweb are free to produce additional contents of the rendering directory such as an index.html file. Logiweb compilers are only required to (1) produce a vector.lgw file in Logiweb vector format, (2) produce an associated rack.lgr file which is derived from vector.lgw in exactly the same way as lgc does, and (3) produce a 'page' subdirectory which is derived from rack.lgr in exactly the same way as lgc does.


Each lgs file is expressed in Unicode UTF-8. Lines may be terminated by LF (code 10), CR (code 13), CRLF (code 13 followed by code 10), or LFCR (code 10 followed by code 13). Internally, Logiweb uses LF for terminating lines. More specifically, plain text inside Logiweb vectors and Logiweb racks uses LF for terminating lines. The purpose of this is to ensure interoperability between different platforms. lgc translates to LF when reading lgs files and translates to the host newline convention when producing renderings.
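The translation of line terminators to LF can be illustrated with a short Python sketch. This is not how lgc is implemented; the function name normalize_newlines is hypothetical, and the way adjacent terminators are grouped (the leftmost two-character terminator wins) is an assumption.

```python
import re

# Two-character terminators are tried before a lone CR, so CRLF and LFCR
# each count as one line end. A lone LF is already in the target form.
_TERMINATOR = re.compile(r"\r\n|\n\r|\r")

def normalize_newlines(text):
    """Translate CR, CRLF, and LFCR line terminators to LF (hypothetical helper)."""
    return _TERMINATOR.sub("\n", text)

if __name__ == "__main__":
    print(repr(normalize_newlines("a\r\nb\rc\n\rd\ne")))
```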


The only reserved character in lgs is the double quote character. The lgs language uses double quote characters for many different purposes. We shall refer to a sequence of two or more double quote characters as a 'multiquote' and to an isolated double quote character as a 'lone quote'. We shall refer to a multiquote followed by a non-quote as a 'directive'.
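As an illustration of this terminology, the following Python sketch splits a text into runs of double quote characters and labels each run. The function name quote_runs is hypothetical and the sketch is not taken from lgc.

```python
import re

def quote_runs(text):
    """Return (kind, length) for each maximal run of double quotes in text."""
    runs = []
    for match in re.finditer(r'"+', text):
        length = len(match.group())
        # A single quote is a 'lone quote'; two or more form a 'multiquote'.
        kind = "lone quote" if length == 1 else "multiquote"
        runs.append((kind, length))
    return runs

if __name__ == "__main__":
    print(quote_runs('""P my " page'))
```

Since the runs are maximal, a multiquote that is not at the very end of the text is always followed by a non-quote and thus begins a directive.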


Comments start with ""{ or ""; directives (i.e. with two or more double quote characters followed by a left brace or a semicolon). Comments that start with ""; end at the end of the line. Comments that start with ""{ can span any number of lines. They end at the first ""} directive which has exactly the same number of double quote characters as the opening directive. This is an example of a comment:
    """{ A ""} ends a comment starting with ""{ """}
Note that the comment is enclosed in brace directives with three double quotes. The brace directives with two double quotes are part of the comment. Comments may occur anywhere except directly after a double quote, since that double quote would then be considered part of the directive. In particular, comments may occur inside strings and in the middle of keywords.
If the first four characters of a file constitute the magic code "";; then the first line of the file is considered to be a 'header'. The hex characters following the magic code, up to the first non-hex character, suggest what the reference of the page might be. Whenever a source file with a header is translated, the suggested reference is used if it fits the contents. Otherwise, a new reference is generated and the compiler writes the new reference back into the header. To use this facility, let your source file start with a line containing nothing but "";;. At the first translation, a reference will be stored back in the header. After that, whenever you retranslate the source without having changed it, the page will get the same reference as the last time it was translated. Without a header, the page will get a new time stamp at each translation.
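The header rule can be sketched in Python as follows. The function name suggested_reference is hypothetical, and accepting lowercase hex digits is an assumption; the sketch only extracts the suggestion and does not check whether it fits the contents of the page.

```python
import re

# The magic code is two double quotes followed by two semicolons.
_HEADER = re.compile(r'"";;([0-9A-Fa-f]*)')

def suggested_reference(source):
    """Return the hex digits following the magic code, or None if the
    file has no header (hypothetical helper, not part of lgc)."""
    match = _HEADER.match(source)
    return None if match is None else match.group(1)

if __name__ == "__main__":
    print(suggested_reference('"";;01AB\n""P my page\n'))
```

An empty result means that the header is present but does not yet carry a suggestion, as is the case before the first translation.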


The following is a wellformed lgs file:
     ""P my page
     ""R base
     ""D
     " square
     ""B
     "We have that "[[ 2 square ]]" is four."


Each lgs file must contain one ""P directive which defines the name of the page being defined. The page name comprises all characters from the directive until the end of the line. One may use a newline directive (""n) instead of the end of the line to delimit the page name. Lone quotes after the ""P directive have a special meaning described in the section named QUALIFIERS below. Comments in page names are ignored. Note that if the line defining the page name ends with a ""; comment then the end of line is ignored and the page name effectively continues on the next line. A similar remark holds for ""{ comments that span several lines. By convention, the ""P directive of an lgs file should occur at the beginning of the file, possibly after a "";; header and a comment about copyright.


Each lgs file may contain zero, one, or more ""R directives. Each ""R directive names a page being referenced. The name of the referenced page comprises all characters from the ""R directive until the end of the line or until the first ""n directive, whichever comes first. The page named by the first ""R directive is reference number 1, the one named by the second is reference number 2, and so on. Implicitly, the page being defined is considered to be 'reference number 0'. Lone quotes after ""R directives have a special meaning described in the section named QUALIFIERS below. By convention, all ""R directives should come right after the ""P directive. Referenced pages may be pointed at in many different ways. Some examples read:
 ""R file:/usr/share/logiweb/name/base/vector.lgw
 ""R file:~/.logiweb/name/base/vector.lgw
 ""R file:../name/base/vector.lgw
 ""R http://logiweb.eu/1.0/doc/pages/base/vector.lgw
 ""R base
 ""R lgw:017451CF6643931035C71796AC493D382EC8357EE9A390D5D6DBCDAA0806
The first three reference Logiweb vectors in the local file system, relative to the root directory, the home directory, and the current directory, respectively. The fourth one references a particular http url. The fifth makes a reference by name which is resolved by the 'namepath' parameter of the lgc compiler. The last one uses a Logiweb reference which is resolved by the 'path' parameter of the lgc compiler. See the 'lgc' Logiweb page or http://logiweb.eu/ for more details on references.
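The numbering of references can be illustrated by a small Python sketch. It assumes a simplified source in which every ""P and ""R directive stands on a line of its own, and it ignores comments, ""n directives, and qualifiers; the function name reference_numbers is hypothetical.

```python
def reference_numbers(source):
    """Return a list whose index is the reference number: entry 0 is the
    page being defined, entry 1 the first referenced page, and so on."""
    page_name = None
    referenced = []
    for line in source.splitlines():
        text = line.strip()
        if text.startswith('""P'):
            page_name = text[3:].strip()
        elif text.startswith('""R'):
            referenced.append(text[3:].strip())
    return [page_name] + referenced

if __name__ == "__main__":
    example = '""P my page\n""R base\n""R file:../name/check/vector.lgw\n'
    for number, name in enumerate(reference_numbers(example)):
        print(number, name)
```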


Each lgs file may contain zero, one, or more ""D directives. Each ""D directive defines zero, one, or more syntactical constructs. Each line following a ""D directive and until the first ""P, ""R, ""D, or ""B directive defines one syntactical construct (blank lines are ignored, though). In construct definitions, lone quotes serve as placeholders. Three examples of constructs read:
     " square
     " < "
     if " then " else "
The constructs above allow one to write expressions like
     if 2 square < 3 square then 4 else 5
Each page has a Logiweb reference of about 30 bytes and each construct defined on a page has an index. The first construct defined has index 1, the second has index 2, and so on. Implicitly, the page name is also considered to be a construct. The page name has index 0. When a page defines a construct, that page is considered to be the 'home page' of the construct. Each Logiweb page is identified by its worldwide unique Logiweb reference. Each Logiweb construct is uniquely identified by its index together with the reference of its home page. By convention, ""D sections come after the ""R sections.
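The assignment of indexes to constructs can be sketched in Python as follows. The sketch assumes that every directive stands at the start of a line and that the source contains no comments; the function name construct_indexes is hypothetical and charges after ""D are simply skipped.

```python
def construct_indexes(source):
    """Return a list whose position is the construct index: the page name
    has index 0 and the defined constructs follow in order of definition."""
    page_name = None
    constructs = []
    in_definitions = False
    for line in source.splitlines():
        text = line.strip()
        if text.startswith('""P'):
            page_name = text[3:].strip()
            in_definitions = False
        elif text.startswith('""D'):
            in_definitions = True       # a new definition section starts
        elif text.startswith('""R') or text.startswith('""B'):
            in_definitions = False      # these directives end the section
        elif in_definitions and text:   # blank lines are ignored
            constructs.append(text)
    return [page_name] + constructs

if __name__ == "__main__":
    example = '""P my page\n""D\n" square\n" < "\n\nif " then " else "\n""B\n'
    for index, construct in enumerate(construct_indexes(example)):
        print(index, construct)
```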


One may assign a 'charge' to defined constructs. As an example, it is customary to assign a larger charge to addition than to multiplication such that e.g.
     2 * 3 + 4 * 5
means
     ( 2 * 3 ) + ( 4 * 5 )
A charge is the opposite of a priority: constructs with high charge have low priority and vice versa. Charges are expressed as lists of integers separated by dots. As an example, 2.-3.4 is a charge. Charges are sorted lexicographically such that e.g.
     1.2.-1 < 1.2 < 1.2.2 < 2.1
When comparing two charges of different length, the shorter one is padded with zeros at the end. As an example 1.2 and 1.2.0 denote the same charge. One may include a charge between a ""D directive and the first newline character after it. The charge applies to all constructs introduced by the given ""D section. As an example, the following definitions assign charge 1.6 to multiplication and 1.8 to addition and subtraction:
     ""D 1.6
     " * "
     ""D 1.8
     " + "
     " - "
One may also give a charge indirectly. As an example, the following assigns the charge of multiplication to division:
     ""D " * "
     " / "
By convention, all constructs which neither start nor end with a lone quote should have charge zero. The page symbol always has charge zero. If no charge is given after a ""D directive then all constructs defined by the directive get charge zero. A charge is said to be odd/even if its last nonzero element is odd/even. As an example, a charge such as 1.7 is odd. As a special case, charge zero is considered to be even. Constructs with even charge are preassociative. A preassociative construct is left associative in text written left to right, right associative in text written right to left, and counterclockwise associative in text written in clockwise spirals. Constructs with odd charge are postassociative. As an example, if subtraction has charge 1.8 then subtraction is preassociative. Man pages are written left to right, so preassociative means left associative here. Hence,
     6 - 2 - 3
means
     ( 6 - 2 ) - 3
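The rules for ordering charges and for deciding whether a charge is odd or even can be sketched in Python as follows. The function names are hypothetical, and the sketch only follows the description in this manual page; it is not taken from lgc.

```python
from itertools import zip_longest

def parse_charge(text):
    """Turn a dotted charge such as '2.-3.4' into a list of integers."""
    return [int(part) for part in text.split(".")]

def compare_charges(a, b):
    """Return -1, 0, or 1 according to lexicographic order, padding the
    shorter charge with zeros at the end."""
    for x, y in zip_longest(parse_charge(a), parse_charge(b), fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0

def is_even(charge):
    """A charge is even if its last nonzero element is even; charge zero
    counts as even."""
    nonzero = [n for n in parse_charge(charge) if n != 0]
    return not nonzero or nonzero[-1] % 2 == 0

def associativity(charge):
    """Even charges are preassociative, odd charges postassociative."""
    return "preassociative" if is_even(charge) else "postassociative"

if __name__ == "__main__":
    print(compare_charges("1.2.-1", "1.2"), compare_charges("1.2", "1.2.0"))
    print(associativity("1.8"), associativity("1.7"))
```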


The body of a page comprises all of an lgs file except comments, page name, references, and definitions. By convention, the body comes after the ""D sections. The ""B directive may be used to terminate a ""D section. Terminating a ""D section, however, implicitly starts or resumes the body section, so one may think of ""B as a 'body directive'. The body of a page is made up of constructs, strings, and body directives. The constructs may be constructs defined on the page itself or constructs defined on directly referenced pages. Directly referenced pages are those mentioned in ""R directives, as opposed to transitively referenced pages which are the directly referenced pages plus the pages transitively referenced by directly referenced pages.


The lgs language treats all characters almost equally, the exceptions being the characters in the range 0 to 32 (inclusive). Characters with codes 0-8, 11, and 14-31 are ignored. In the body and outside strings, any sequence of spaces (code 32), horizontal tabs (code 9), line feeds (code 10), form feeds (code 12), and carriage returns (code 13) is treated as a single space character. Apart from that, space characters are treated like any other character. As an example, consider addition:
     ""D 1.6
     " + "
The definition allows one to interpret
     2   +   3
as the sum of 2 and 3 whereas
     2+3
is unparseable due to missing spaces around the sum sign.
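The treatment of control and blank characters in the body, outside strings, can be illustrated by the following Python sketch. The function name normalize_body_text is hypothetical, and the sketch does not try to tell strings apart from the rest of the body.

```python
import re

# Codes 0-8, 11, and 14-31 are ignored.
_IGNORED = re.compile(r"[\x00-\x08\x0b\x0e-\x1f]")
# Runs of space, tab (9), LF (10), FF (12), and CR (13) act as one space.
_BLANK_RUN = re.compile(r"[ \t\n\x0c\r]+")

def normalize_body_text(text):
    """Drop ignored characters, then collapse each blank run to one space."""
    return _BLANK_RUN.sub(" ", _IGNORED.sub("", text))

if __name__ == "__main__":
    print(repr(normalize_body_text("2 \t\n +\r\n\x0c3")))
```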


Strings are arbitrary sequences of characters enclosed in string delimiters. A string can start with a lone quote or with a ""- directive. A string can end with a lone quote or with a "". directive. The empty string, however, cannot be enclosed in lone quotes since that would produce two double quotes in a row, which counts as the beginning of a directive. The "". directive, however, may be used both for ending a string and for representing the empty string. One can always tell from context which meaning "". has. The following four lines all represent an empty string.
The lgc compiler applies 'newline translation' to strings: CR, CRLF, LFCR, and FF are translated to LF, TAB characters are translated to space characters, and characters with codes below 32 (Space) other than TAB, LF, FF, and CR are removed. Each TAB character is translated to one and only one space character. To include characters like CR and TAB in strings, one has to use directives. Inside strings, one may use the following directives:
     ""- No character
     ""! Double quote
     ""f Form feed
     ""n Line feed
     ""r Carriage return
     ""t Horizontal tab
     ""x Characters given in hexadecimal (until period)
As an example of use of the ""x directive, "I""x4A4B4C.M" means "IJKLM".
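The directives listed above can be illustrated by a Python sketch which expands them inside the text of a string. The function name expand_string_directives is hypothetical. The sketch assumes that directives are written with exactly two double quotes and that the hex digits after ""x come in pairs, each pair giving one byte of UTF-8; the second assumption agrees with the example above but is not stated by this manual page.

```python
import re

# Directives that stand for a fixed piece of text.
_SIMPLE = {"-": "", "!": '"', "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

# Either ""x followed by hex digits and a period, or a one-letter directive.
_DIRECTIVE = re.compile(r'""(?:x(?P<hex>[0-9A-Fa-f]*)\.|(?P<simple>[-!fnrt]))')

def expand_string_directives(body):
    """Expand string directives in the text between the string delimiters."""
    def replace(match):
        hex_digits = match.group("hex")
        if hex_digits is not None:
            # Assumption: pairs of hex digits are bytes of UTF-8 text.
            return bytes.fromhex(hex_digits).decode("utf-8")
        return _SIMPLE[match.group("simple")]
    return _DIRECTIVE.sub(replace, body)

if __name__ == "__main__":
    print(expand_string_directives('I""x4A4B4C.M'))
```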


The directives that can be used in the body are:
     ""# (until lone quote) include given file verbatim as a string
     ""$ (until lone quote) same, but with newline processing
     ""S include the lgs source text itself as a string
     ""N include name definitions
     ""C include charge definitions
For details on these directives, consult the lgc Logiweb page or http://logiweb.eu/. A short list of examples follows, however:
Include the Logiweb icon as a string of raw bytes. Keep the bytes as they are.
Include the given README as a string and apply newline translation to it.
Include the lgs source file itself as a string. Inclusion is like ""# but with a twist: If the lgs file does not start with a header, a line containing nothing but "";; is prepended. And if the lgs file does start with a header then all hex digits in the header are removed. The latter ensures that an lgs file with a header gives the same result if translated twice. The former ensures that if the source.lgs file generated as part of the rendering is retranslated then the result is identical to the result of the first translation.
A README consists of plain text, so it is reasonable to apply newline processing. A png file contains binary data, so translation of CR to LF could corrupt the file. It is debatable how e.g. an html file should be included. An html file is near-plain without being completely plain. Furthermore, the html standard specifies CRLF to be used as line terminator. One may choose to include it with newline processing, in which case one should remember to translate back to CRLF if writing it back to disk. Or one may choose to include it raw and consider the CRLFs to be part of the html format.
Note that lgs has nothing which resembles #include of the C programming language: The three include directives of Logiweb only allow one to include a file as a single string. Beta-test versions of Logiweb had a #include-like feature, but the feature has been removed.
The ""N directive expands into a list of definitions which records the relationship between construct indexes and construct names. The ""C directive expands into a list of definitions which records the relationship between construct indexes and construct charges. The body of a page should include one ""N and one ""C directive placed in a suitable context. Otherwise, information about construct names and charges is lost in translation. Look at the lgs sources of the pages that come with Logiweb for examples of how to use ""N and ""C.


When referencing pages one may run into the problem that two distinct constructs may have the same name. To cope with that, ""R directives allow constructs to be qualified. Qualifiers modify constructs as they are imported. After the ""R directive, one may list an arbitrary number of qualifiers before the reference, separated by lone quotes. As an example, suppose the base page defines these constructs:
     if " then " else "
     " + "
Furthermore, suppose a page references the base page using the following reference:
     ""R abc " def " base
The reference is to the base page and has qualifiers abc and def. With the reference above, one may refer to the if-then-else and the addition constructs under these names:
     abc if " then " else "
     def if " then " else "
     " abc + "
     " def + "
One may include the empty qualifier in the list of qualifiers. If the empty qualifier is included, it has to appear first. As an example, the reference
    ""R" abc " def " base
allows to reference the if-then-else construct under these names:
     if " then " else "
     abc if " then " else "
     def if " then " else "
As can be seen, each construct may be known under more than one name and distinct constructs may have the same name. If a name belongs to more than one construct, then lgc will protest if that name is used in the body. For more on qualifiers, including handling of spaces, see the lgc Logiweb page or http://logiweb.eu.
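The effect of qualifiers in the examples above can be mimicked by a small Python sketch. It assumes that a qualifier is placed in front of the first word of the construct that is not a lone quote, which matches the examples but is a guess at the general rule. The function name qualified_names is hypothetical, and handling of spaces is simplified to single spaces between tokens.

```python
def qualified_names(construct, qualifiers):
    """Return the names under which a construct is known, one per qualifier.
    The empty qualifier yields the unqualified name."""
    tokens = construct.split()
    # Position of the first token that is a word rather than a placeholder.
    first_word = next(i for i, token in enumerate(tokens) if token != '"')
    names = []
    for qualifier in qualifiers:
        if qualifier == "":
            names.append(" ".join(tokens))
        else:
            names.append(" ".join(tokens[:first_word] + [qualifier] + tokens[first_word:]))
    return names

if __name__ == "__main__":
    for name in qualified_names('if " then " else "', ["", "abc", "def"]):
        print(name)
    for name in qualified_names('" + "', ["abc", "def"]):
        print(name)
```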


The frontend of the lgc compiler translates an lgs source text into a Logiweb vector. The Logiweb vector consists of a bibliography, a dictionary, and a body, cf. logiweb(5). The bibliography consists of the references of all referenced pages, starting with reference zero (the reference of the page itself). The dictionary records the relationship between construct indexes and construct arities. The arity of a construct equals the number of lone quotes in the construct. The body is no more than the parse tree of the body expressed in Polish prefix. The codifier of the lgc compiler translates the vector to a rack. The renderer of the lgc compiler then translates the rack to a rendering. These translations have little to do with the lgs format. See the lgc Logiweb page or http://logiweb.eu/ for more.
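The arity rule can be illustrated with a short Python sketch which counts the lone quotes of a construct. The function name arity is hypothetical; the sketch only counts double quotes that have no other double quote as neighbour.

```python
import re

# A lone quote is a double quote with no double quote right before or after it.
_LONE_QUOTE = re.compile(r'(?<!")"(?!")')

def arity(construct):
    """Return the number of lone quotes (placeholders) in a construct."""
    return len(_LONE_QUOTE.findall(construct))

if __name__ == "__main__":
    for construct in ['if " then " else "', '" square', 'my page']:
        print(arity(construct), construct)
```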


Klaus Grue, http://logiweb.eu/


lgc(1), logiweb(5), http://logiweb.eu/