Bio::Tools::HMM.3pm

Language: en

Version: 2008-06-24 (ubuntu - 08/07/09)

Section: 3 (Library functions)

NAME

Bio::Tools::HMM - Perl extension to perform Hidden Markov Model calculations

SYNOPSIS

   use Bio::Tools::HMM;
   use Bio::SeqIO;
   use Bio::Matrix::Scoring;
 
   # create a HMM object
   # ACGT are the bases; N and C mean non-coding and coding
   $hmm = Bio::Tools::HMM->new('-symbols' => "ACGT", '-states' => "NC");
 
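   # initialize some training observation sequences
   # (read each FASTA file and take the first sequence as a plain string)
   $seq1 = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta')->next_seq->seq;
   $seq2 = Bio::SeqIO->new(-file => $ARGV[1], -format => 'fasta')->next_seq->seq;
   @seqs = ($seq1, $seq2);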
 
   # train the HMM with the observation sequences
   $hmm->baum_welch_training(\@seqs);
 
   # get parameters
   $init = $hmm->init_prob; # returns an array reference
   $matrix1 = $hmm->transition_prob; # returns Bio::Matrix::Scoring
   $matrix2 = $hmm->emission_prob; # returns Bio::Matrix::Scoring
 
   # initialize training hidden state sequences
   $hs1 = "NCNCNNNNNCCCCNNCCCNNNNC";
   $hs2 = "NCNNCNNNNNNCNCNCNNNCNCN";
   @hss = ($hs1, $hs2);
 
   # train the HMM with both observation sequences and hidden state
   # sequences
   $hmm->statistical_training(\@seqs, \@hss);
 
   # with the newly calibrated HMM, we can use the Viterbi algorithm
   # to obtain the hidden state sequence underlying an observation
   # sequence
   $hss = $hmm->viterbi($seq1); # returns a string of hidden states
 
 

DESCRIPTION

The Hidden Markov Model (HMM) was first introduced by Baum and his colleagues in a series of classic papers in the late 1960s and 1970s. It was first applied to the field of speech recognition, with great success, in the 1970s.

The explosion in the amount of sequencing data in the 1990s opened up the field of biological sequence analysis. Seeing the effectiveness of HMMs in detecting signals in biological sequences, Krogh, Mian and Haussler used an HMM to find genes in E. coli DNA in a classic 1994 paper. Since then, HMMs have been applied extensively to other areas of biology, for example multiple sequence alignment and CpG island detection.

DEPENDENCIES

This package comes with the main bioperl distribution. You also need to install the latest bioperl-ext package, which contains the XS code that implements the algorithms. This package won't work if you haven't compiled the bioperl-ext package.

TO-DO

1. Allow people to set and get the tolerance level in the EM algorithm.
2. Allow people to set and get the maximum number of iterations to run in the EM algorithm.
3. A function to calculate the probability of an observation sequence.
4. A function to do posterior decoding, i.e. to find the probability of being in a certain hidden state at position i given the observation sequence.

FEEDBACK


Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

   bioperl-l@bioperl.org                  - General discussion
   http://bioperl.org/wiki/Mailing_lists  - About the mailing lists
 
 

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

   http://bugzilla.open-bio.org/
 
 

AUTHOR

         This implementation was written by Yee Man Chan (ymc@yahoo.com).
         Copyright (c) 2005 Yee Man Chan. All rights reserved. This program
         is free software; you can redistribute it and/or modify it under
         the same terms as Perl itself. All the code was written by Yee
         Man Chan without borrowing any code from anywhere.
 
 

likelihood

  Title   : likelihood
  Usage   : $prob = $hmm->likelihood($seq)
  Function: Calculate the probability of an observation sequence given an HMM
  Returns : A floating point number that is the logarithm of the probability
            of an observation sequence given an HMM
  Args    : The only argument is a string that is the observation sequence
            you are interested in. Note that the sequence must not contain
            any character that is not in the alphabet of observation symbols.
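
The quantity described above is commonly computed with the forward algorithm. The pure-Perl sketch below only illustrates that idea; it is not the XS code shipped in bioperl-ext. The helper name forward_log_likelihood and the hash-based model layout are assumptions made for this example, and it returns the natural logarithm, since this page does not state which base the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): scaled forward algorithm.
# $states is an array ref of state names; $init, $trans and $emit are hash
# refs holding P(state), P(next state | state) and P(symbol | state).
sub forward_log_likelihood {
    my ($obs, $states, $init, $trans, $emit) = @_;
    my $log_prob = 0;
    my %alpha;          # scaled forward variables for the previous position
    my $first = 1;

    for my $sym (split //, $obs) {
        my %next;
        for my $s (@$states) {
            die "symbol '$sym' is not in the alphabet\n"
                unless exists $emit->{$s}{$sym};
            # probability mass arriving in state $s at this position
            my $arrive = 0;
            if ($first) {
                $arrive = $init->{$s};
            }
            else {
                $arrive += $alpha{$_} * $trans->{$_}{$s} for @$states;
            }
            $next{$s} = $arrive * $emit->{$s}{$sym};
        }
        # rescale so the values sum to 1 and accumulate the log of the scale;
        # this avoids numeric underflow on long sequences
        my $scale = 0;
        $scale += $_ for values %next;
        die "observation sequence has zero probability\n" if $scale == 0;
        $log_prob += log($scale);
        %alpha = map { ($_ => $next{$_} / $scale) } @$states;
        $first = 0;
    }
    return $log_prob;
}

# Illustrative two-state model (made-up numbers, not trained parameters)
my @states = ('N', 'C');
my %init   = (N => 0.6, C => 0.4);
my %trans  = (N => { N => 0.7, C => 0.3 }, C => { N => 0.4, C => 0.6 });
my %emit   = (
    N => { A => 0.4, C => 0.1, G => 0.1, T => 0.4 },
    C => { A => 0.1, C => 0.4, G => 0.4, T => 0.1 },
);
printf "log P(AG) = %.4f\n",
    forward_log_likelihood('AG', \@states, \%init, \%trans, \%emit);
```

Like the method documented above, the sketch rejects any character that is not in the observation alphabet.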
 
 

statistical_training

  Title   : statistical_training
  Usage   : $hmm->statistical_training(\@seqs, \@hss)
  Function: Estimate the parameters of an HMM given an array of observation 
            sequences and an array of the corresponding hidden state 
            sequences
  Returns : Returns nothing. The parameters of the HMM will be set to the 
            estimated values
  Args    : The first argument is a reference to an array of observation 
            sequences. The second argument is a reference to an array of
            hidden state sequences. Note that the lengths of an observation
            sequence and a hidden state sequence must be the same.
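
When the hidden states are known, parameters can be estimated by counting how often each start state, transition and emission occurs and then normalising the counts. The sketch below shows that general technique in pure Perl. It is an illustration under assumptions, not the module's XS implementation: the helper name count_based_estimates and the returned hash layout are invented for this example, and no pseudocounts are added, so events never seen in the training data get no entry at all.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): maximum-likelihood
# estimates from labelled data, obtained by counting and normalising.
sub count_based_estimates {
    my ($seqs, $hss) = @_;
    die "need one hidden state sequence per observation sequence\n"
        unless @$seqs == @$hss;
    my (%init, %trans, %emit);

    for my $k (0 .. $#$seqs) {
        my @obs = split //, $seqs->[$k];
        my @hid = split //, $hss->[$k];
        die "sequence $k: observation and state lengths differ\n"
            unless @obs == @hid;
        next unless @obs;
        $init{ $hid[0] }++;                         # state at position 0
        for my $t (0 .. $#obs) {
            $emit{ $hid[$t] }{ $obs[$t] }++;        # state emits symbol
            $trans{ $hid[$t - 1] }{ $hid[$t] }++ if $t > 0;
        }
    }

    # turn each table of counts into probabilities that sum to 1
    for my $table (\%init, values(%trans), values(%emit)) {
        my $total = 0;
        $total += $_ for values %$table;
        $_ /= $total for values %$table;
    }
    return (\%init, \%trans, \%emit);
}

# Two tiny labelled sequences (made-up data)
my ($init, $trans, $emit) =
    count_based_estimates([ 'AC', 'AA' ], [ 'NC', 'NN' ]);
printf "P(N -> C) = %.2f\n", $trans->{N}{C};
```

The length check mirrors the requirement stated above that each observation sequence and its hidden state sequence have the same length.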
 
 

baum_welch_training

  Title   : baum_welch_training
  Usage   : $hmm->baum_welch_training(\@seqs)
  Function: Estimate the parameters of an HMM given an array of observation 
            sequences
  Returns : Returns nothing. The parameters of the HMM will be set to the 
            estimated values
  Args    : The only argument is a reference to an array of observation 
            sequences.
 
 

viterbi

  Title   : viterbi
  Usage   : $hss = $hmm->viterbi($seq)
  Function: Find the hidden state sequence that maximizes the 
            probability of seeing observation sequence $seq.
  Returns : Returns a string that is the hidden state sequence that maximizes
            the probability of seeing $seq.
  Args    : The only argument is an observation sequence.
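
The search for the most probable hidden state sequence is usually done by dynamic programming. The following pure-Perl sketch shows one way to do this in log space. It is not the XS code from bioperl-ext; the helper name viterbi_path and the hash-based model layout are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::HMM): Viterbi decoding in log
# space. $states is an array ref of one-character state names; $init, $trans
# and $emit hold P(state), P(next state | state) and P(symbol | state).
sub viterbi_path {
    my ($obs, $states, $init, $trans, $emit) = @_;
    my @symbols = split //, $obs;
    return '' unless @symbols;
    my $neg_inf = -9**9**9;     # stands in for log(0)
    my $log = sub { $_[0] > 0 ? log($_[0]) : $neg_inf };

    my (@score, @back);
    for my $t (0 .. $#symbols) {
        for my $s (@$states) {
            my $e = $emit->{$s}{ $symbols[$t] };
            die "symbol '$symbols[$t]' is not in the alphabet\n"
                unless defined $e;
            if ($t == 0) {
                $score[0]{$s} = $log->($init->{$s}) + $log->($e);
                next;
            }
            # best predecessor state for $s at position $t
            my ($best, $best_prev);
            for my $r (@$states) {
                my $cand = $score[$t - 1]{$r} + $log->($trans->{$r}{$s});
                ($best, $best_prev) = ($cand, $r)
                    if !defined $best || $cand > $best;
            }
            $score[$t]{$s} = $best + $log->($e);
            $back[$t]{$s}  = $best_prev;
        }
    }

    # trace back from the best-scoring final state
    my ($last) = sort { $score[-1]{$b} <=> $score[-1]{$a} } @$states;
    my @path = ($last);
    for (my $t = $#symbols; $t > 0; $t--) {
        unshift @path, $back[$t]{ $path[0] };
    }
    return join '', @path;
}

# Illustrative two-state model (made-up numbers, not trained parameters)
my @states = ('N', 'C');
my %init   = (N => 0.6, C => 0.4);
my %trans  = (N => { N => 0.7, C => 0.3 }, C => { N => 0.4, C => 0.6 });
my %emit   = (
    N => { A => 0.4, C => 0.1, G => 0.1, T => 0.4 },
    C => { A => 0.1, C => 0.4, G => 0.4, T => 0.1 },
);
print viterbi_path('AG', \@states, \%init, \%trans, \%emit), "\n";  # prints NC
```

Working with sums of logarithms rather than products of probabilities keeps the scores from underflowing on long sequences.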
 
 

symbols

  Title     : symbols 
  Usage     : $symbols = $hmm->symbols() #get
            : $hmm->symbols($value) #set
  Function  : get/set accessor for the observation symbols
  Example   :
  Returns   : symbols string
  Arguments : new value
 
 

states

  Title     : states
  Usage     : $states = $hmm->states() #get
            : $hmm->states($value) #set
  Function  : get/set accessor for the hidden states
  Example   :
  Returns   : states string
  Arguments : new value
 
 

init_prob

  Title     : init_prob
  Usage     : $init = $hmm->init_prob() #get
            : $hmm->init_prob(\@init) #set
  Function  : get/set accessor for the initial probability array
  Example   :
  Returns   : reference to double array
  Arguments : new value
 
 

transition_prob

  Title     : transition_prob
  Usage     : $transition_matrix = $hmm->transition_prob() #get
            : $hmm->transition_prob($matrix) #set
  Function  : get/set accessor for the transition probability matrix
  Example   :
  Returns   : Bio::Matrix::Scoring 
  Arguments : new value
 
 

emission_prob

  Title     : emission_prob
  Usage     : $emission_matrix = $hmm->emission_prob() #get
            : $hmm->emission_prob($matrix) #set
  Function  : get/set accessor for the emission probability matrix
  Example   :
  Returns   : Bio::Matrix::Scoring 
  Arguments : new value