Bio::Tools::BPlite.3pm

Version: 2008-06-24 (ubuntu - 07/07/09)

Section: 3 (Library functions)

NAME

Bio::Tools::BPlite - Lightweight BLAST parser

SYNOPSIS

  use Bio::Tools::BPlite;
  my $report = new Bio::Tools::BPlite(-fh=>\*STDIN);
 
   {
     $report->query;
     $report->database;
     while(my $sbjct = $report->nextSbjct) {
         $sbjct->name;
         while (my $hsp = $sbjct->nextHSP) {
             $hsp->score;
             $hsp->bits;
             $hsp->percent;
             $hsp->P;
             $hsp->EXP;
             $hsp->match;
             $hsp->positive;
             $hsp->length;
             $hsp->querySeq;
             $hsp->sbjctSeq;
             $hsp->homologySeq;
             $hsp->query->start;
             $hsp->query->end;
             $hsp->hit->start;
             $hsp->hit->end;
             $hsp->hit->seq_id;
             $hsp->hit->overlaps($exon);
         }
     }
 
     # the following line takes you to the next report in the stream/file
     # it will return 0 if that report is empty,
     # but that is valid for an empty blast report.
     # Returns -1 for EOF.
 
     last if ($report->_parseHeader == -1);
     redo;
   }
 
 

DESCRIPTION

BPlite is a package for parsing BLAST reports. The BLAST programs are a family of widely used algorithms for sequence database searches. The reports are non-trivial to parse, and the formats differ between the various flavors of BLAST. BPlite parses BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX reports from both the high-performance WU-BLAST and the more generic NCBI-BLAST.

Many people have developed BLAST parsers (I myself have made at least three). BPlite is for those people who would rather not have a giant object specification, but rather a simple handle to a BLAST report that works well in pipes.

Object

BPlite has three kinds of objects: the report, the subject, and the HSP. To create a new report, pass a filehandle reference to the BPlite constructor.

  my $report = new Bio::Tools::BPlite(-fh=>\*STDIN); # or any other filehandle
 
 

The report has two attributes (query and database), and one method (nextSbjct).

  $report->query;     # access to the query name
  $report->database;  # access to the database name
  $report->nextSbjct; # gets the next subject
  while(my $sbjct = $report->nextSbjct) {
      # canonical form of use is in a while loop
  }
 
 

A subject is a BLAST hit, which should not be confused with an HSP (below). A BLAST hit may have several alignments associated with it. A useful way of thinking about it is that a subject is a gene and HSPs are the exons. Subjects have one attribute (name) and one method (nextHSP).

  $sbjct->name;    # access to the subject name
  $sbjct->nextHSP; # gets the next HSP from the sbjct
  while(my $hsp = $sbjct->nextHSP) {
      # canonical form is again a while loop
  }
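
The two nested iterators can be combined to summarize a report. The following is a minimal sketch, not part of BPlite itself: hsp_counts is a hypothetical helper name, and it assumes only the nextSbjct, name and nextHSP methods described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BPlite): count the HSPs reported for
# each subject. $report is assumed to be a Bio::Tools::BPlite object, or
# any object offering the same nextSbjct/name/nextHSP methods.
sub hsp_counts {
    my ($report) = @_;
    my %count;
    while (my $sbjct = $report->nextSbjct) {
        my $n = 0;
        $n++ while $sbjct->nextHSP;      # drain this subject's HSPs
        $count{ $sbjct->name } += $n;    # "gene" name => number of "exons"
    }
    return \%count;
}
```

It might be called like this:

  my $counts = hsp_counts(Bio::Tools::BPlite->new(-fh => \*STDIN));
  print "$_\t$counts->{$_}\n" for sort keys %$counts;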
 
 

An HSP is a high-scoring segment pair, or simply an alignment. HSP objects inherit all the useful methods from RangeI/SeqFeatureI/FeaturePair, but provide an additional set of attributes (score, bits, percent, P, match, EXP, positive, length, querySeq, sbjctSeq, homologySeq) that should be familiar to anyone who has seen a BLAST report.

For lazy/efficient coders, two-letter abbreviations are available for the attributes with long names (qs, ss, hs). The ranges of the aligned sequences in the query and the subject, along with other information (such as the sequence name), are stored in SeqFeature objects: $hsp->query and $hsp->hit, which are equivalent to $hsp->feature1 and $hsp->feature2. querySeq, sbjctSeq and homologySeq contain only the alignment sequences from the BLAST report.

  $hsp->score;
  $hsp->bits;
  $hsp->percent;
  $hsp->P;
  $hsp->match;
  $hsp->positive;
  $hsp->length;
  $hsp->querySeq;      $hsp->qs;
  $hsp->sbjctSeq;      $hsp->ss;
  $hsp->homologySeq;   $hsp->hs;
  $hsp->query->start;
  $hsp->query->end;
  $hsp->query->seq_id;
  $hsp->hit->primary_tag; # "similarity"
  $hsp->hit->source_tag;  # "BLAST"
  $hsp->hit->start;
  $hsp->hit->end;
  ...
 
 

So a very simple look into a BLAST report might look like this.

  my $report = new Bio::Tools::BPlite(-fh=>\*STDIN);
  while(my $sbjct = $report->nextSbjct) {
      print ">",$sbjct->name,"\n";
      while(my $hsp = $sbjct->nextHSP) {
          print "\t",$hsp->start,"..",$hsp->end," ",$hsp->bits,"\n";
      }
  }
 
 

The output of such code might look like this:

  >foo
      100..155 29.5
      268..300 20.1
  >bar
      100..153 28.5
      265..290 22.1
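
The same loop can be used to keep only the stronger alignments. This is a hedged sketch rather than anything BPlite provides: strong_hsps and its $min_bits threshold are hypothetical names, and the code assumes only the methods already used in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BPlite): return one
# [subject name, start, end, bits] row for every HSP whose bit score is
# at least $min_bits.
sub strong_hsps {
    my ($report, $min_bits) = @_;
    my @rows;
    while (my $sbjct = $report->nextSbjct) {
        while (my $hsp = $sbjct->nextHSP) {
            next if $hsp->bits < $min_bits;    # skip weak alignments
            push @rows, [ $sbjct->name, $hsp->start, $hsp->end, $hsp->bits ];
        }
    }
    return @rows;
}
```

A possible call, reading a report from standard input:

  my $report = Bio::Tools::BPlite->new(-fh => \*STDIN);
  printf "%s\t%d..%d\t%s\n", @$_ for strong_hsps($report, 25);

For the sample report above, a threshold of 25 would keep only the 29.5 and 28.5 alignments.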
 
 

AUTHORS

Ian Korf (ikorf@sapiens.wustl.edu, http://sapiens.wustl.edu/~ikorf), Lorenz Pollak (lorenz@ist.org, bioperl port)

ACKNOWLEDGEMENTS

This software was developed at the Genome Sequencing Center at Washington University, St. Louis, MO.

CONTRIBUTORS

Jason Stajich, jason@cgt.mc.duke.edu

COPYRIGHT

Copyright (C) 1999 Ian Korf. All Rights Reserved.

DISCLAIMER

This software is provided "as is" without warranty of any kind.

new

  Title   : new
  Function: Create a new Bio::Tools::BPlite object
  Returns : Bio::Tools::BPlite
  Args    : -file     input file (alternative to -fh)
            -fh       input stream (alternative to -file)
 
 

next_feature

  Title   : next_feature
  Usage   : while( my $feat = $res->next_feature ) {
                # do something
            }
  Function: SeqAnalysisParserI implementing function. This implementation
            iterates over all HSPs. If the HSPs of the current subject match
            are exhausted, it will automatically call nextSbjct().
  Example :
  Returns : A Bio::SeqFeatureI compliant object, in this case a
            Bio::Tools::BPlite::HSP object, and FALSE if there are no more
            HSPs.
  Args    : None
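
Because next_feature moves on to the next subject by itself, every HSP in a report can be visited with a single flat loop. The sketch below is illustrative only: feature_ranges is a hypothetical helper name, and it assumes that the returned features offer the start and end methods inherited from RangeI.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BPlite): collect a [start, end] pair
# for every HSP in the report, without an explicit loop over subjects.
sub feature_ranges {
    my ($parser) = @_;
    my @ranges;
    while (my $feat = $parser->next_feature) {
        push @ranges, [ $feat->start, $feat->end ];
    }
    return @ranges;
}
```

A possible call:

  my $parser = Bio::Tools::BPlite->new(-fh => \*STDIN);
  print join('..', @$_), "\n" for feature_ranges($parser);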
 
 

query

  Title    : query
  Usage    : $query = $obj->query();
  Function : returns the query object
  Example  :
  Returns  : query object
  Args     :
 
 

qlength

  Title    : qlength
  Usage    : $len = $obj->qlength();
  Function : returns the length of the query 
  Example  :
  Returns  : length of query
  Args     :
 
 

pattern

  Title    : pattern
  Usage    : $pattern = $obj->pattern();
  Function : returns the pattern used in a PHIBLAST search
 
 

query_pattern_location

  Title    : query_pattern_location
  Usage    : $qpl = $obj->query_pattern_location();
  Function : returns reference to array of locations in the query sequence
             of pattern used in a PHIBLAST search
 
 

database

  Title    : database
  Usage    : $db = $obj->database();
  Function : returns the database used in this search
  Example  :
  Returns  : database used for search
  Args     :
 
 

nextSbjct

  Title    : nextSbjct
  Usage    : $sbjct = $obj->nextSbjct();
  Function : Iterates through the subjects (Sbjct objects) found
             while parsing the report
  Example  : while ( my $sbjct = $obj->nextSbjct ) {}
  Returns  : next Sbjct object or null if finished
  Args     :