Boulder.3pm

Langue: en

Version: 2000-06-20 (fedora - 01/12/10)

Section: 3 (Bibliothèques de fonctions)

NAME

Boulder - An API for hierarchical tag/value structures

SYNOPSIS

    # Read a series of People records from STDIN.
    # Add an "Eligibility" attribute to all those whose
    # Age >= 35 and Friends list includes "Fred"
    
    use Boulder::Stream;
    
    my $stream = Boulder::Stream->newFh;
    
    while ( my $record = <$stream> ) {
       next unless $record->Age >= 35;
       my @friends = $record->Friends;
       next unless grep {$_ eq 'Fred'} @friends;
 
       $record->insert(Eligibility => 'yes');
       print $stream $record;
     }
 
 

Related manual pages:

   basics
   ------
   Stone            hierarchical tag/value records
   Stone::Cursor    Traverse a hierarchy
 
   Boulder::Stream  stream-oriented storage for Stones
   Boulder::Store   record-oriented storage for Stones
   Boulder::XML     XML conversion for Stones
   Boulder::String  conversion to strings
 
   genome-related
   ---------------
 
   Boulder::Genbank   parse Genbank (DNA sequence) records
   Boulder::Blast     parse BLAST (basic local alignment search tool) reports
   Boulder::Medline   parse Medline (pubmed) records
   Boulder::Omim      parse OMIM (online Mendelian inheritance in man) records
   Boulder::Swissprot parse Swissprot records
   Boulder::Unigene   parse Unigene records
 
 

DESCRIPTION

Boulder IO

Boulder IO is a simple TAG=VALUE data format designed for sharing data between programs connected via a pipe. It is also simple enough to use as a common data exchange format between databases, Web pages, and other data representations.

The basic data format is very simple. It consists of a series of TAG=VALUE pairs separated by newlines. It is record-oriented. The end of a record is indicated by an empty delimiter alone on a line. The delimiter is ``='' by default, but can be adjusted by the user.

An example boulder stream looks like this:

         Name=Lincoln Stein
         Home=/u/bush202/lds32
         Organization=Cold Spring Harbor Laboratory
         Login=lds32
         Password_age=20
         Password_expires=60
         Alias=lstein
         Alias=steinl
         =
         Name=Leigh Deacon
         Home=/u/bush202/tanager
         Organization=Cold Spring Harbor Laboratory
         Login=tanager
         Password_age=2
         Password_expires=60
         =
 
 

Notes:

(1)
There is no need for all tags to appear in all records, or indeed for all the records to be homogeneous.
(2)
Multiple values are allowed, as with the Alias tag in the second record.
(3)
Lines can be any length, as in a potential 40 Kbp DNA sequence entry.
(4)
Tags can be any alphanumeric character (upper or lower case) and may contain embedded spaces. Conventionally we use the characters A-Z0-9_, because they can be used without single quoting as keys in Perl associative arrays, but this is merely stylistic. Values can be any character at all except for the reserved characters {}=% and newline. You can incorporate binary data into the data stream by escaping these characters in the URL manner, using a % sign followed by the (capitalized) hexadecimal code for the character. The module makes this automatic.

Hierarchical Records

The simple boulder format can be extended to accomodate nested relations and other intresting structures. Nested records can be created in this way:
  Name=Lincoln Stein
  Home=/u/bush202/lds32
  Organization=Cold Spring Harbor Laboratory
  Login=lds32
  Password_age=20
  Password_expires=60
  Privileges={
    ChangePasswd=yes
    CronJobs=yes
    Reboot=yes
    Shutdown=no
  }
  =
  Name=Leigh Deacon
  Home=/u/bush202/tanager
  Organization=Cold Spring Harbor Laboratory
  Login=tanager
  Password_age=2
  Password_expires=60
  Privileges={
    ChangePasswd=yes
    CronJobs=no
    Reboot=no
    Shutdown=no
  }
  =
 
 

As in the original format, tags may be multivalued. For example, there might be several Privilege record assigned to a login account. Each subrecord may contain further subrecords.

Within the program, a hierarchical record is encapsulated within a ``Stone'', an opaque structure that implements methods for fetching and settings its various tags.

Using Boulder for I/O

The Boulder API was designed to make reading and writing of complex hierarchical records almost as easy as reading and writing single lines of text.
Boulder::Stream
The main component of the Boulder modules is Boulder::Stream, which provides a stream-oriented view of the data. You can read and write to Boulder::Streams via tied filehandles, or via method calls. Data records are flattened into a simple format called ``boulderio'' format.
Boulder::XML
Boulder::XML acts like Boulder::Stream, but the serialization format is XML. You need XML::Parser installed to use this module.
Boulder::Store
This is a simple persistent storage class which allows you to store several (thousand) Stone's into a DB_File database. You must have libdb and the Perl DB_File extensions installed in order to take advantage of this class.
Boulder::Genbank
Boulder::Unigene
Boulder::OMIM
Boulder::Blast
Boulder::Medline
Boulder::SwissProt
These are parsers and accessors for various biological data sources. They act like Boulder::Stream, but return a set of Stone objects that have certain prescribed tags and values. Many of these modules were written by Luca I.G. Toldo <luca.toldo@merck.de>.

Stone Objects

The Stone object encapsulates a set of tags and values. Any tag can be single- or multivalued, and tags are allowed to contain subtags to any depth. A simple set of methods named tags(), get(), put(), insert(), replace() and so forth, allows you to examine the tags that are available, get and set their values, and search for particular tags. In addition, an autoload mechanism allows you to use method calls to access tags, for example:
    my @friends = $record->Friends;
 
 

is equivalent to:

    my @friends = $record->get('Friends');
 
 

A Stone::Cursor class allows you to traverse Stones systematically.

A full explanation of the Stone class can be found in its manual page.

AUTHOR

Lincoln D. Stein <lstein@cshl.org>, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. This module can be used and distributed on the same terms as Perl itself.

SEE ALSO

Boulder::Blast, Boulder::Genbank, Boulder::Medline, Boulder::Unigene, Boulder::Omim, Boulder::SwissProt