Bio::SeqIO::table.3pm

Language: en

Version: 2010-05-19 (ubuntu - 24/10/10)

Section: 3 (Library functions)

NAME

Bio::SeqIO::table - sequence input/output stream from a delimited table

SYNOPSIS

   # It is probably best not to use this object directly, but
   # rather go through the SeqIO handler system:
 
   use Bio::SeqIO;
 
   my $stream = Bio::SeqIO->new(-file => $filename, -format => 'table');
 
   while ( my $seq = $stream->next_seq() ) {
         # do something with $seq
   }
 
 

DESCRIPTION

This class transforms records in a table-formatted text file into Bio::Seq objects.

A table-formatted text file of sequence records for the purposes of this module is defined as a text file with each row corresponding to a sequence, and the attributes of the sequence being in different columns. Columns are delimited by a common delimiter, for instance tab or comma.

The module permits specifying which columns hold which type of annotation. The semantics of certain attributes, if present, are pre-defined, e.g., accession number and sequence. Additional attributes may be added to the annotation bundle.
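
As a minimal sketch of the mapping described above, the following reads a small tab-delimited table through the SeqIO handler system. The table contents, the column layout and the use of an in-memory filehandle are assumptions made for illustration; only the named parameters documented for new() are used, and BioPerl must be installed for the code to run.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical three-column table: ID, accession, sequence (tab-delimited).
my $table = join("\n",
    join("\t", qw(id accession sequence)),
    join("\t", qw(seq1 ACC0001 ATGGCGTACGTT)),
    join("\t", qw(seq2 ACC0002 ATGCCCGGGAAA)),
) . "\n";

# Read from an in-memory filehandle so the example needs no file on disk.
open(my $fh, '<', \$table) or die "cannot open in-memory table: $!";

my $stream = Bio::SeqIO->new(
    -fh               => $fh,
    -format           => 'table',
    -delim            => "\t",  # column delimiter (treated as a regular expression)
    -header           => 1,     # one header line to skip
    -display_id       => 1,     # one-based column indexes
    -accession_number => 2,
    -seq              => 3,
);

# Collect the mapped attributes of every record and print them.
my @records;
while (my $seq = $stream->next_seq()) {
    push @records, [$seq->display_id, $seq->accession_number, $seq->seq];
    print join("\t", @{ $records[-1] }), "\n";
}
```

The delimiter is passed explicitly here so that the sketch does not rely on any default the module may apply.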

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.
   bioperl-l@bioperl.org                  - General discussion
   http://bioperl.org/wiki/Mailing_lists  - About the mailing lists
 
 

Support

Please direct usage questions or support issues to the mailing list:

bioperl-l@bioperl.org

rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution.

Bug reports can be submitted via email or the web:

   http://bugzilla.open-bio.org/
 
 

AUTHOR - Hilmar Lapp

Email hlapp at gmx.net

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded by an underscore (_).

new

  Title   : new
  Usage   : $stream = Bio::SeqIO->new(-file => $filename, -format => 'table')
  Function: Returns a new seqstream
  Returns : A Bio::SeqIO stream for a table format
  Args    : Named parameters:
 
              -file    name of file to read
              -fh      filehandle to attach to
              -comment leading character(s) introducing a comment line
              -header  the number of header lines to skip; the first
                       non-comment header line will be used to obtain
                       column names; column names will be used as the
                       default tags for attaching annotation.
              -delim   the delimiter for columns as a regular expression;
                       consecutive occurrences of the delimiter will
                       not be collapsed.
              -display_id the one-based index of the column containing
                       the display ID of the sequence
              -accession_number the one-based index of the column
                       containing the accession number of the sequence
              -seq     the one-based index of the column containing
                       the sequence string of the sequence
              -species the one-based index of the column containing the
                       species for the sequence record; if not a
                       number, will be used as the static species
                       common to all records
              -annotation controls which additional columns are
                       preserved as annotation:
 
                       if a scalar (other than the bracketed strings
                       described below), it is a flag indicating
                       whether or not all additional columns are to
                       be preserved as annotation; the tags used are
                       the column headers if there is a header, and
                       otherwise 'colX', where X is the one-based
                       column index;
 
                       if a reference to an array, or a square
                       bracket-enclosed string of comma-delimited
                       values, only those columns (one-based index)
                       are preserved as annotation, tagged as above;
 
                       if a reference to a hash, or a curly
                       brace-enclosed string of comma-delimited key
                       and value pairs in alternating order, the keys
                       are the one-based indexes of the columns to be
                       preserved, and the values are the tags under
                       which the annotation is to be attached;
 
                       if not provided or supplied as undef, no
                       additional annotation is preserved.
              -colnames a reference to an array of column labels, or a
                       string of comma-delimited labels, denoting the
                       columns to be converted into annotation; this is
                       an alternative to -annotation and will be
                       ignored if -annotation is also supplied with a
                       valid value.
              -trim    flag determining whether or not all values should
                       be trimmed of leading and trailing white space
                       and double quotes
 
            Additional arguments may be used to e.g. set factories and
            builders involved in the sequence object creation (see the
            POD of Bio::SeqIO).
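
The hash form of -annotation can be sketched as follows. The table contents and the tag names 'gene' and 'note' are hypothetical. The sketch also assumes that each preserved value is attached to the sequence's annotation collection as an object with a value() accessor, which this page does not state; treat that part as an assumption to verify against your BioPerl version.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical comma-delimited table without a header line:
# ID, sequence, gene name, free-text note.
my $table = <<'END';
seq1,ATGGCGTACGTT,geneA,first example record
seq2,ATGCCCGGGAAA,geneB,second example record
END

open(my $fh, '<', \$table) or die "cannot open in-memory table: $!";

my $stream = Bio::SeqIO->new(
    -fh         => $fh,
    -format     => 'table',
    -delim      => ',',
    -display_id => 1,
    -seq        => 2,
    # Hash form: one-based column index => annotation tag.
    -annotation => { 3 => 'gene', 4 => 'note' },
);

my %gene_for;
while (my $seq = $stream->next_seq()) {
    # Assumption: values are retrievable by tag from the annotation
    # collection and expose their content through value().
    my ($gene) = $seq->annotation->get_Annotations('gene');
    $gene_for{ $seq->display_id } = defined $gene ? $gene->value : '(none)';
}

print "$_ => $gene_for{$_}\n" for sort keys %gene_for;
```

Passing the string '{3,gene,4,note}' instead of the hash reference should be equivalent according to the parameter description, which is convenient when the value comes from a command line.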
 
 

next_seq

  Title   : next_seq
  Usage   : $seq = $stream->next_seq()
  Function: returns the next sequence in the stream
  Returns : Bio::Seq::RichSeq object
  Args    : none
 
 

comment_char

  Title   : comment_char
  Usage   : $obj->comment_char($newval)
  Function: Get/set the leading character(s) designating a line as
            a comment-line.
  Example :
  Returns : value of comment_char (a scalar)
  Args    : on set, new value (a scalar or undef, optional)
 
 

header

  Title   : header
  Usage   : $obj->header($newval)
  Function: Get/set the number of header lines to skip before the
            rows containing actual sequence records.
 
            A value of zero or undef means that there is no header and
            therefore no column headers either.
 
  Example :
  Returns : value of header (a scalar)
  Args    : on set, new value (a scalar or undef, optional)
 
 

delimiter

  Title   : delimiter
  Usage   : $obj->delimiter($newval)
  Function: Get/set the column delimiter. This will in fact be
            treated as a regular expression. Consecutive occurrences
            will not be collapsed to a single one.
 
  Example :
  Returns : value of delimiter (a scalar)
  Args    : on set, new value (a scalar or undef, optional)
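
The consequences of treating the delimiter as a regular expression can be illustrated with plain Perl. This is a sketch using the built-in split, not the module's own code; the helper name split_row and the negative split limit (which keeps trailing empty fields) are choices made for this example.

```perl
use strict;
use warnings;

# Illustrative helper (not part of Bio::SeqIO::table): split one row on a
# delimiter supplied as a regular expression string.
sub split_row {
    my ($line, $delim) = @_;
    chomp $line;
    # A negative limit keeps trailing empty fields as well.
    return split /$delim/, $line, -1;
}

# Two consecutive commas yield an empty middle column; they are not collapsed.
my @csv  = split_row("seq1,,ATGC\n", ',');

# Regex metacharacters such as '|' must be escaped to be taken literally.
my @pipe = split_row("seq2|ACC2|GGCC\n", '\|');

# Runs are only collapsed if the pattern itself matches a run, as '\s+' does.
my @ws   = split_row("seq3  ACC3 TTAA\n", '\s+');

for my $row (\@csv, \@pipe, \@ws) {
    print scalar(@$row), " columns: ", join(' ', map { "[$_]" } @$row), "\n";
}
```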
 
 

attribute_map

  Title   : attribute_map
  Usage   : $obj->attribute_map($newval)
  Function: Get/set the map of sequence object initialization
            attributes (keys) to one-based column index.
 
            Attributes will usually need to be prefixed by a dash, just
            as if they were passed to the new() method of the sequence
            class.
 
  Example :
  Returns : value of attribute_map (a reference to a hash)
  Args    : on set, new value (a reference to a hash or undef, optional)
 
 

annotation_map

  Title   : annotation_map
  Usage   : $obj->annotation_map($newval)
  Function: Get/set the mapping between one-based column indexes
            (keys) and annotation tags (values).
 
            Note that the map returned by this method may change after
            the first next_seq() call if the file contains a column
            header and no annotation keys have been predefined in the
            map, because upon reading the column header line the tag
            names will be set automatically.
 
            Note also that the map may reference columns that are used
            as well in the sequence attribute map.
 
  Example :
  Returns : value of annotation_map (a reference to a hash)
  Args    : on set, new value (a reference to a hash or undef, optional)
 
 

keep_annotation

  Title   : keep_annotation
  Usage   : $obj->keep_annotation($newval)
  Function: Get/set flag whether or not to keep values from
            additional columns as annotation.
 
            Additional columns are all those columns in the input file
            that aren't referenced in the attribute map.
 
  Example :
  Returns : value of keep_annotation (a scalar)
  Args    : on set, new value (a scalar or undef, optional)
 
 

annotation_columns

  Title   : annotation_columns
  Usage   : $obj->annotation_columns($newval)
  Function: Get/set the names (labels) of the columns to be used for
            annotation.
 
            This is an alternative to using annotation_map. In order to
            have any effect, it must be set before the first call of
            next_seq(), and obviously there must be a header line (or
            row) too giving the column labels.
 
  Example :
  Returns : value of annotation_columns (a reference to an array)
  Args    : on set, new value (a reference to an array or undef, optional)
 
 

trim_values

  Title   : trim_values
  Usage   : $obj->trim_values($newval)
  Function: Get/set whether or not to trim leading and trailing
            whitespace off all column values.
  Example :
  Returns : value of trim_values (a scalar)
  Args    : on set, new value (a scalar or undef, optional)
 
 

Internal methods

All methods with a leading underscore are not meant to be part of the 'official' API. They are for use by this module only; consider them private unless you are a developer trying to modify this module.

_attribute_map

  Title   : _attribute_map
  Usage   : $obj->_attribute_map()
  Function: Get only. Same as attribute_map, but zero-based indexes.
 
            Note that any changes made to the returned map will change
            the map used by this instance. You should know what you are
            doing if you modify the returned value (or if you call this
            method in the first place).
 
  Example :
  Returns : value of _attribute_map (a reference to a hash)
  Args    : none
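
The relationship between the one-based map and its zero-based counterpart can be sketched in plain Perl. The variable names and the conversion shown are illustrative assumptions, not the module's implementation; the point is only that zero-based indexes can be used directly as Perl array subscripts.

```perl
use strict;
use warnings;

# One-based map, in the form a caller would pass to attribute_map().
my %one_based = (
    '-display_id'       => 1,
    '-accession_number' => 2,
    '-seq'              => 3,
);

# Hypothetical conversion to zero-based indexes for direct array access.
my %zero_based = map { $_ => $one_based{$_} - 1 } keys %one_based;

# Column values of one row, in column order.
my @cols = ('seq1', 'ACC0001', 'ATGGCGTACGTT');

# Look up each attribute's value through the zero-based map.
my %params = map { $_ => $cols[ $zero_based{$_} ] } keys %zero_based;

print "$_ => $params{$_}\n" for sort keys %params;
```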
 
 

_annotation_map

  Title   : _annotation_map
  Usage   : $obj->_annotation_map()
  Function: Get only. Same as annotation_map, but with zero-based indexes.
 
            Note that any changes made to the returned map will change
            the map used by this instance. You should know what you are
            doing if you modify the returned value (or if you call this
            method in the first place).
 
  Example :
  Returns : value of _annotation_map (a reference to a hash)
  Args    : none
 
 

_header_skipped

  Title   : _header_skipped
  Usage   : $obj->_header_skipped($newval)
  Function: Get/set the flag whether the header was already
            read (and skipped) or not.
  Example :
  Returns : value of _header_skipped (a scalar)
  Args    : on set, new value (a scalar or undef, optional)
 
 

_next_record

  Title   : _next_record
  Usage   :
  Function: Navigates the underlying file to the next record.
 
            For row-based records in delimited text files, this will
            skip all empty lines and lines with a leading comment
            character.
 
            This method is here to serve as a hook for other formats
            that conceptually also represent tables but aren't
            formatted as row-based text files.
 
  Example :
  Returns : TRUE if the navigation was successful and FALSE
            otherwise. Unsuccessful navigation will usually be treated
            as an end-of-file condition.
  Args    :
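
The skipping behaviour for row-based text files can be sketched in plain Perl. The function next_record_line is a hypothetical stand-in written for this example; it is not the module's code and does not reproduce how the module stores the current line.

```perl
use strict;
use warnings;

# Illustrative stand-in for the navigation step: return the next line that
# is neither empty nor a comment, or undef at end of input.
sub next_record_line {
    my ($fh, $comment_char) = @_;
    while (defined(my $line = <$fh>)) {
        next if $line =~ /^\s*$/;                  # skip empty lines
        next if defined $comment_char
             && length $comment_char
             && index($line, $comment_char) == 0;  # skip leading comment character(s)
        return $line;
    }
    return;  # end of input
}

my $data = "# a comment\n\nseq1\tATGC\n\n# another\nseq2\tGGCC\n";
open(my $fh, '<', \$data) or die "cannot open in-memory data: $!";

my @rows;
while (defined(my $line = next_record_line($fh, '#'))) {
    chomp $line;
    push @rows, $line;
}

print scalar(@rows), " records\n";
```

A format that is not line-based would replace this step with its own notion of moving to the next record, which is the purpose of the hook.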
 
 

_parse_header

  Title   : _parse_header
  Usage   :
  Function: Parse the table header and navigate past it.
 
            This method is called if the number of header rows has been
            specified as one or greater, with the file positioned at
            the first header line (row). By default the first header
            line (row) is used for setting column names, but additional
            lines (rows) may be skipped too. Empty lines and comment
            lines do not count as header lines (rows).
 
            This method will call _next_record() to navigate to the
            next header line (row), if there is more than one header
            line (row). Upon return, the file is presumed to be
            positioned at the first record after the header.
 
            This method is here to serve as a hook for other formats
            that conceptually also represent tables but aren't
            formatted as row-based text files.
 
            Note however that the only methods used to access file
            content or navigate the position are _get_row_values() and
            _next_record(), so it should usually suffice to override
            those.
 
  Example :
  Returns : TRUE if navigation past the header was successful and FALSE
            otherwise. Unsuccessful navigation will usually be treated
            as an end-of-file condition.
  Args    :
 
 

_get_row_values

  Title   : _get_row_values
  Usage   :
  Function: Get the values for the current line (or row) as an array in
            the order of columns.
 
            This method is here to serve as a hook for other formats
            that conceptually also represent tables but aren't
            formatted as row-based text files.
 
  Example :
  Returns : An array of column values for the current row.
  Args    :