
                         ----------------------------
			 A ROUGH GUIDE TO PRC OPTIONS
			 ----------------------------


Library versus pairwise runs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The main PRC binary has two modes, "pairwise" and "library". In the 
pairwise mode, PRC scores a model against another model and simply prints 
out the results. In the library mode, it runs a model against a library of 
models and saves the output to output.scores (and, if requested, the 
alignments to output.aligns). E-values are only reported for local-local 
library runs. (E-values are the most sensitive scores for homology 
recognition.)

PRC assumes that each model file contains exactly one model, and that the 
library file lists model files (using absolute paths), one filename per 
line.


-algo  <> : forward (default), viterbi
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Forward means, roughly, "sum across all possible alignments". Viterbi 
means, roughly, "find the best possible alignment". Forward is considered 
to be better, because it can find similarity even when the exact alignment 
is unclear.

For forward, the alignments (as opposed to scores) are calculated using 
the Maximum Alignment Accuracy algorithm introduced by Holmes and Durbin. 
This is about 3x slower than calculating the single best alignment using 
Viterbi, but there is now empirical evidence (T-COFFEE, PROBCONS) that the 
benefit is worthwhile.


-MMfn  <> : match-match scoring function; dot1, dot2 (default)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dot1 was the function used by PRC up to 1.5.0. Dot2, suggested by Johannes 
Soeding, appears to perform marginally better and has become the default 
starting with PRC 1.5.1.


-mode  <> : local-local (default), local-global, etc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alignment mode. Local is Smith-Waterman ("alignment can start and end 
anywhere"), global is Needleman-Wunsch ("alignment starts at the start of 
the model and ends at the end"). Local-global means "local to HMM1, global 
to HMM2".


-align <> : alignment style; none (default), prc, sam1, sam2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, no alignments are printed out or saved (corresponding to the 
option 'none'). For the remaining three options, the correspondence 
between what's printed out and internal PRC states is as follows:

               MM       MI       IM     DM,DI    MD,ID      DD      II
            -------  -------  -------  -------  -------  ------- -------
     prc  | 'M','M'  'M','I'  'I','M'  'D','~'  '~','D'  'D','D' 'I','I'
     sam1 |   'M'      'm'      'I'      'd'      '-'      'D'     'i'
     sam2 |   'M'      'I'      'm'      '-'      'd'      'D'     'i'

Prc is the easiest to understand, but it requires two lines per alignment 
(one for each model). Sam1 and sam2 are inspired by SAM formats; the 
difference is that for sam1, hmm1 is the "sequence" (and hmm2 the 
"model"), and vice versa for sam2.


-Emax  <> : only report hits with E-value <= Emax (default: 10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If E-values are calculated for a run, only hits better than Emax get 
reported.

Currently, only local-local runs against a library have E-values.


-stop  <> : stop looking for more hits when simple < stop
-hits  <> : stop looking for more hits when hit_no > hits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PRC uses a variation of the Waterman-Eggert procedure to try and find all 
plausible alignments between a pair of models. However, sometimes there 
are hundreds of such alignments, so the question is: when do you stop 
looking for more? The PRC answer is: whenever an alignment is found with a 
simple score less than the stop parameter, or whenever the number of hits 
is exceeded. If E-values are calculated for the run, the search will also 
be terminated when an alignment gets an E-value > 10*Emax.

Beware of setting stop too low for fully automated runs (without resetting 
hits) -- it can easily result in a near-infinite loop.


Model names
~~~~~~~~~~~
The model names are printed out in the hmm1 and hmm2 columns, and in the
alignment file. Where do they come from? For SAM and PSI-BLAST files, they are
taken from the filename. If the filename is

/dir1/dir2/XYZ.1.extension

the model name will be "XYZ.1". For HMMER models, the name is taken from the
ACC line, or the NAME line if there is no ACC, or the filename if there is no
ACC or NAME. For FASTA files, it is taken from the > line. Finally, PRC models
store the name inside the file. For models created using convert_to_prc the
name is taken from the input file.


If you have any questions, please get in touch!

Martin Madera <martin.madera@gmail.com>
