estim_mtd

Language: en

Version: 111135 (mandriva - 01/05/08)

Section: 1 (User commands)

NAME

estim_mtd - Mixture Transition Distribution Markov model estimation tool.

SYNOPSIS

estim_mtd arguments [options]

DESCRIPTION

estim_mtd estimates a Mixture Transition Distribution (MTD) Markov model from one or more input sequences and computes associated statistics, including the stationary distribution. The resulting model can then be used to simulate sequences with the simul_m program.
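As a rough illustration of what is being estimated, the sketch below evaluates a transition probability under the textbook MTD formulation, in which the probability of the next symbol is a weighted mixture of per-lag transition matrices. It assumes each matrix is a first-order transition matrix (presumably the case selected by -k 1) and uses made-up weights and matrices. The names mtd_transition_prob, lag_weights and lag_matrices are hypothetical and do not reflect the seq++ internals or file formats.

```python
def mtd_transition_prob(history, symbol, lag_weights, lag_matrices):
    """P(symbol | history) as a mixture of per-lag first-order transitions.

    history      -- the last d symbols, oldest first
    lag_weights  -- d non-negative weights summing to 1; lag_weights[g-1] is for lag g
    lag_matrices -- d nested dicts; lag_matrices[g-1][a][b] = P(next = b | lag-g symbol = a)
    """
    d = len(lag_weights)
    if len(history) != d or len(lag_matrices) != d:
        raise ValueError("history, weights and matrices must all have length d")
    prob = 0.0
    for g in range(1, d + 1):
        past = history[-g]  # symbol observed g positions back
        prob += lag_weights[g - 1] * lag_matrices[g - 1][past][symbol]
    return prob


ALPHABET = "AGCT"
# Hypothetical parameters for an order-2 MTD over the DNA alphabet.
uniform = {a: {b: 0.25 for b in ALPHABET} for a in ALPHABET}
sticky = {a: {b: (0.7 if a == b else 0.1) for b in ALPHABET} for a in ALPHABET}
weights = [0.6, 0.4]          # lag 1, lag 2
matrices = [sticky, uniform]  # lag 1, lag 2

p = mtd_transition_prob("CA", "A", weights, matrices)
total = sum(mtd_transition_prob("CA", b, weights, matrices) for b in ALPHABET)
print(round(p, 3), round(total, 3))
```

Because every row of every matrix sums to 1 and the weights sum to 1, the mixture is itself a probability distribution over the next symbol, which is what the second printed value checks.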

ARGUMENTS

sequence_file
Either the name of a file containing a set of sequences in FASTA format, or the name of a file containing a list of filenames, each of which contains a set of sequences in FASTA format.
-d --mtd_order=INTEGER
Order of the MTD model.
-k --mkv_order=INTEGER
Order of the Markov model of the matrices in the MTD.
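The two accepted forms of sequence_file can be told apart by their first character, since a FASTA file begins with a ">" header line. The sketch below shows one way to do so. The one-filename-per-line layout of the list file, the resolution of names relative to the list file, and the helper names read_fasta and load_sequences are all assumptions, not taken from the seq++ sources.

```python
import os
import tempfile


def read_fasta(path):
    """Return the sequences in a FASTA file as a list of strings."""
    sequences, current = [], None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if current is not None:
                    sequences.append("".join(current))
                current = []  # start a new record at each header
            elif current is not None:
                current.append(line)
    if current is not None:
        sequences.append("".join(current))
    return sequences


def load_sequences(path):
    """Load a FASTA file, or every FASTA file named in a list file.

    Assumption: a list file holds one filename per line, resolved relative
    to the directory of the list file.
    """
    with open(path) as handle:
        first = next((ln.strip() for ln in handle if ln.strip()), "")
    if first.startswith(">") or not first:
        return read_fasta(path)
    base = os.path.dirname(os.path.abspath(path))
    sequences = []
    with open(path) as handle:
        for name in (ln.strip() for ln in handle):
            if name:
                sequences.extend(read_fasta(os.path.join(base, name)))
    return sequences


# Small demonstration with temporary files.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.fa"), "w") as f:
        f.write(">s1\nACGT\nAC\n>s2\nGGTT\n")
    with open(os.path.join(tmp, "seq.list"), "w") as f:
        f.write("a.fa\n")
    from_fasta = load_sequences(os.path.join(tmp, "a.fa"))
    from_list = load_sequences(os.path.join(tmp, "seq.list"))
print(from_fasta, from_list)
```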

OPTIONS

-p --phase=INTEGER
Number of phases (default = 1).
-a --alphabet=FILENAME
A file describing the alphabet to use (DNA alphabet, default setting).
-A --Alphabet=EXPRESSION
An expression describing the alphabet to use, written as the number of characters per pattern (less than 10), a colon, and the list of alphabet patterns, for example 1:AGCT (DNA alphabet, default setting).
--dna
Use DNA alphabet (1:AGCT, default setting).
--protein
Use amino acid alphabet (1:IVLFCMAGTWSYPHEQDNKR).
-o --output=FILENAME
Result file containing the parameters of the estimated MTD Markov model.
--identical
Constrain all the matrices to be identical.
--seed=INTEGER
Number of seeds for the EM algorithm (NBSEED, default setting).
--iter=INTEGER
Maximum number of iterations of the EM algorithm (NBITERMAX, default setting).
--eps=FLOAT
Value of the epsilon of the EM algorithm (EPS, default setting).
--log
Log the successive likelihood values and save them in the file "em.log".
-l --likelihood=FILENAME
Compute the likelihood under the selected model on the sequences contained in FILENAME or on the sequences whose filenames are listed in FILENAME.
-L --Likelihood
Compute the likelihood under the selected model on the sequences specified by the sequence_file argument.
-b --bic=FILENAME
Compute the BIC under the selected model on the sequences contained in FILENAME or on the sequences whose filenames are listed in FILENAME.
-B --Bic
Compute the BIC under the selected model on the sequences specified by the sequence_file argument.
--all
Compute the total BIC/likelihood for all the given sequences.
-v --version
Display the version number and exit.
-h --help
Print this help and exit.
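For reference, the likelihood and BIC options are related through the usual definition BIC = -2 log L + p log n, where p is the number of free parameters and n the number of observations. The sketch below uses that standard form. The sign and scaling convention actually printed by estim_mtd, and the way it counts p, are not stated on this page, so the function bic and all the figures here are placeholders.

```python
import math


def bic(log_likelihood, n_free_params, n_observations):
    """Bayesian information criterion in its common 'lower is better' form."""
    if n_observations <= 0:
        raise ValueError("n_observations must be positive")
    return -2.0 * log_likelihood + n_free_params * math.log(n_observations)


# Placeholder numbers: two candidate models fitted to the same 10000 symbols.
small = bic(log_likelihood=-13800.0, n_free_params=14, n_observations=10000)
large = bic(log_likelihood=-13790.0, n_free_params=50, n_observations=10000)
print(small < large)
```

With these placeholder values the larger model gains only a little likelihood, so the parameter penalty leaves the smaller model with the lower BIC.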

EXAMPLES

Estimate an MTD Markov model of order 5 with matrices of order 2 on the list of sequence files contained in file seq.list. The sequences contain tokens of an alphabet described in file sample.alpha. Write the estimated model to file model.desc. Log the successive likelihood values of the EM algorithm.

estim_mtd seq.list -d 5 -k 2 -a sample.alpha -o model.desc --log
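When the tool is driven from a script, the same invocation can be assembled programmatically. The sketch below only builds the argument list from options documented on this page, and runs it only if an estim_mtd executable is on the PATH and the input file exists. build_estim_mtd_command is a hypothetical helper, not part of seq++.

```python
import os
import shutil
import subprocess


def build_estim_mtd_command(sequence_file, mtd_order, mkv_order,
                            alphabet_file=None, output=None, log=False):
    """Assemble an estim_mtd argument list from documented options."""
    cmd = ["estim_mtd", sequence_file, "-d", str(mtd_order), "-k", str(mkv_order)]
    if alphabet_file is not None:
        cmd += ["-a", alphabet_file]
    if output is not None:
        cmd += ["-o", output]
    if log:
        cmd.append("--log")
    return cmd


cmd = build_estim_mtd_command("seq.list", 5, 2,
                              alphabet_file="sample.alpha",
                              output="model.desc", log=True)
print(" ".join(cmd))

# Run only if the program and its input are present; otherwise just show the command.
if shutil.which("estim_mtd") and os.path.exists("seq.list"):
    subprocess.run(cmd, check=False)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with filenames that contain spaces.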

Estimate an MTD Markov model of order 3 with matrices of order 1 on the sequences contained in seq.faa. The sequences contain tokens of the amino acid alphabet. The number of seeds, the maximum number of iterations and the epsilon are set explicitly.

estim_mtd seq.faa -d 3 -k 1 --seed 20 --iter 100 --eps 0.001 --protein
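The three EM settings in this command plausibly interact as follows: several random starting points are tried, each run stops once the log-likelihood gain falls below epsilon or the iteration cap is reached, and the best run is kept. This is an assumption about how --seed, --iter and --eps are used, not a description of the seq++ code. The init and step callables below are hypothetical stand-ins for a real EM initialisation and update, and the toy problem is not an MTD model.

```python
import random


def fit_with_restarts(init, step, n_seeds, max_iter, eps, rng=None):
    """Run an iterative fit from n_seeds random starts and keep the best.

    init(rng)    -> initial parameters
    step(params) -> (new_params, log_likelihood) after one update
    """
    rng = rng or random.Random(0)
    best_params, best_ll = None, float("-inf")
    for _ in range(n_seeds):
        params, previous_ll = init(rng), float("-inf")
        for _ in range(max_iter):
            params, ll = step(params)
            converged = ll - previous_ll < eps  # gain too small to continue
            previous_ll = ll
            if converged:
                break
        if previous_ll > best_ll:
            best_params, best_ll = params, previous_ll
    return best_params, best_ll


# Toy stand-in: each "update" moves x halfway towards 3, and the
# "log-likelihood" -(x - 3)^2 increases as x approaches 3.
def toy_init(rng):
    return rng.uniform(-10.0, 10.0)


def toy_step(x):
    x = x + 0.5 * (3.0 - x)
    return x, -(x - 3.0) ** 2


params, ll = fit_with_restarts(toy_init, toy_step,
                               n_seeds=20, max_iter=100, eps=0.001)
print(round(params, 2), round(ll, 4))
```

Trying several starts matters for EM in general because each run only reaches a local optimum of the likelihood, and keeping the best of several runs reduces the dependence on the starting point.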

AUTHORS

estim_mtd is part of the seq++ package, developed by Vincent Miele <miele@genopole.cnrs.fr>, David Robelin <robelin@genopole.cnrs.fr>, Pierre-Yves Bourguignon <bourguignon@genopole.cnrs.fr>, Gregory Nuel <nuel@genopole.cnrs.fr> and Hugues Richard <richard@genopole.cnrs.fr>. Sophie Lebre <lebre@genopole.cnrs.fr> inspired this work on MTD models.

SEE ALSO

estim_m(1), estim_pm(1), estim_vlm(1), simul_m(1), dist_m(1)

More information on seq++ is available at <http://stat.genopole.cnrs.fr/seqpp>.