Analysis of Molecular Data
The principle of analyzing molecular data is similar to that of analyzing binary numerical data coded from morphological characters. However, the choice of method for a particular molecular technique depends on the principles and assumptions of that technique. The method employed should not violate those principles and assumptions.
Types of Data
All experimental data gathered by molecular techniques fall into one of two broad categories: discrete characters and similarities (or distances). Discrete data are qualitative: each character takes one of two or more discrete states. Distance (or similarity) data are quantitative: they vary continuously and are measured on an interval scale.
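The relation between the two categories can be illustrated with a minimal sketch in Python. It converts discrete character data (aligned nucleotide sequences) into distance data by taking the proportion of differing sites, often called the p-distance. The function names and the toy alignment are hypothetical and chosen only for illustration; real analyses usually apply a correction for multiple substitutions.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def distance_matrix(seqs):
    """Pairwise p-distances for a dict mapping taxon name -> aligned sequence."""
    names = list(seqs)
    return {(x, y): p_distance(seqs[x], seqs[y]) for x in names for y in names}


# Hypothetical toy alignment: discrete characters, one state per site.
alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACCTACTA"}

# Derived distance data: one continuous value per pair of taxa.
dm = distance_matrix(alignment)
print(dm[("A", "B")], dm[("A", "C")], dm[("B", "C")])
```

Each sequence site is a discrete character with four possible states, while each entry of the resulting matrix is a single quantitative value for a pair of taxa.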
Maximum Parsimony (MP) method
In this method, the DNA (or amino acid) sequences of ancestral species are inferred from those of extant species, considering a particular tree topology, and the minimum number of evolutionary changes that are required to explain all the observed differences among the sequences is computed. This number is obtained for all possible topologies, and the topology which shows the smallest number of evolutionary changes is chosen as the final tree.
This method is used mainly for finding the topology of a tree, but branch lengths can be estimated under certain assumptions. When the MP method is applied to morphological characters, it is customary to assume that the primitive and derived character states are known.
In the case of molecular data, this assumption generally does not hold, and changes between character states are often reversible. It is therefore important to use a form of the MP method that permits reversible mutations. In numerical taxonomy, this type of MP method is sometimes called the Wagner parsimony method.
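The counting step can be sketched in Python as follows. This is a minimal illustration, not the procedure of any particular package: it uses Fitch's counting rule, which treats character states as unordered and freely reversible, and it assumes trees are written as nested pairs of taxon names. The four-taxon alignment and all function names are hypothetical. With four taxa there are only three unrooted topologies, so all of them can be scored directly.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):               # a leaf: its observed state, no change
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch_site(left, states)
    rset, rcost = fitch_site(right, states)
    common = lset & rset
    if common:                              # children can agree: no extra change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1   # children disagree: one change needed


def parsimony_score(tree, alignment):
    """Minimum number of changes summed over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


# Hypothetical toy alignment of four taxa.
alignment = {"A": "AAGT", "B": "AAGC", "C": "CTGT", "D": "CTGC"}

# The three possible unrooted topologies for four taxa, rooted arbitrarily.
topologies = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

scores = {t: parsimony_score(t, alignment) for t in topologies}
best = min(scores, key=scores.get)
print(best, scores[best])
```

The number of topologies grows very quickly with the number of taxa, so an exhaustive loop of this kind is practical only for small data sets.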
Neighbor-Joining method (Saitou & Nei, 1987)
In contrast to cluster analysis, neighbor joining keeps track of nodes on a tree rather than taxa or clusters of taxa. The raw data are provided as a distance matrix, and the initial tree is a star tree. A modified distance matrix is constructed in which the separation between each pair of nodes is adjusted on the basis of their average divergence from all other nodes.
The tree is constructed by linking the least-distant pair of nodes as defined by this modified matrix. When two nodes are linked, their common ancestral node is added to the tree and the terminal nodes with their respective branches are removed from the tree. The procedure is then repeated on the reduced set of nodes until all nodes have been joined.
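The joining procedure can be sketched in Python as below. This is a minimal sketch assuming the standard formulation of the algorithm: the modified matrix entry for nodes a and b is (n - 2) * d(a, b) - r(a) - r(b), where r is the sum of distances from a node to all other current nodes. The function name, the internal node labels ("node1", "node2", ...) and the five-taxon distance matrix are hypothetical.

```python
def neighbor_joining(names, dist):
    """Return (joins, last_branch) for a symmetric dict-of-dicts distance matrix.

    Each join is (node_a, node_b, new_ancestor, branch_len_a, branch_len_b).
    """
    d = {a: dict(dist[a]) for a in names}    # working copy of the matrix
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # r[a]: total distance from node a to every other current node
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]   # modified distance
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = "node%d" % (len(joins) + 1)               # new ancestral node
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2.0 * (n - 2))
        len_b = d[a][b] - len_a
        d[u] = {}
        for c in nodes:
            if c != a and c != b:
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        # the two joined nodes are removed; their ancestor becomes a terminal node
        nodes = [c for c in nodes if c != a and c != b] + [u]
        joins.append((a, b, u, len_a, len_b))
    a, b = nodes
    return joins, (a, b, d[a][b])


# Hypothetical five-taxon distance matrix.
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
names = ["a", "b", "c", "d", "e"]
dist = {x: {} for x in names}
for (x, y), v in pairs.items():
    dist[x][y] = dist[y][x] = float(v)

joins, last = neighbor_joining(names, dist)
for step in joins:
    print(step)
print(last)
```

Note that d and e are the closest pair in the raw matrix, yet a and b are joined first, because the adjustment accounts for how far each node lies from all the others.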
Maximum Likelihood (ML) method (Felsenstein, 1981)
In this method, the nucleotides of all DNA sequences at each nucleotide site are considered separately, and the log-likelihood of observing these nucleotides is computed for a given topology by using a particular probability model.
The log-likelihoods are summed over all nucleotide sites, and the sum is maximized to estimate the branch lengths of the tree. This procedure is repeated for all possible topologies, and the topology that shows the highest likelihood is chosen as the final one.
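The per-site computation can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the probability model is the Jukes-Cantor (JC69) model with equal base frequencies, which is only one possible choice; trees are written as nested tuples of (child, branch length) pairs; and branch lengths are fixed rather than optimized, so the maximization step is left out. All names and the toy alignment are hypothetical.

```python
import math

BASES = "ACGT"


def jc69_prob(same, t):
    """JC69 probability of ending in the same (or one given different) base
    after a branch of length t, in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial(node, states):
    """Likelihood of the data below `node`, for each possible base at `node`."""
    if isinstance(node, str):                # leaf: observed base is certain
        return [1.0 if b == states[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:                    # node is a tuple of (child, length)
        child_l = partial(child, states)
        for i in range(4):
            result[i] *= sum(jc69_prob(i == j, t) * child_l[j] for j in range(4))
    return result


def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods for a tree with fixed branch lengths."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for s in range(n_sites):
        states = {name: seq[s] for name, seq in alignment.items()}
        site_l = sum(0.25 * x for x in partial(tree, states))   # equal base frequencies
        total += math.log(site_l)
    return total


# Hypothetical toy alignment and two candidate topologies, all branches 0.1.
alignment = {"A": "AACC", "B": "AACC", "C": "CCAA", "D": "CCAA"}
t = 0.1
tree1 = (((("A", t), ("B", t)), t), ((("C", t), ("D", t)), t))
tree2 = (((("A", t), ("C", t)), t), ((("B", t), ("D", t)), t))

ll1 = log_likelihood(tree1, alignment)
ll2 = log_likelihood(tree2, alignment)
print(ll1, ll2)
```

In a full analysis the branch lengths of each topology would be adjusted to maximize the summed log-likelihood before the topologies are compared; here the two trees are compared at the same arbitrary branch lengths only to show how the sum over sites is formed.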
Computer Software Packages Available for Data Analysis
PAUP* 4.0 - Phylogenetic Analysis Using Parsimony and other Methods.
PHYLIP - package of programs for inferring phylogenies.
MacClade - A useful software package for phylogenetic analysis.
Hennig86 - A PC-DOS program for phylogenetic analysis.
MEGA - Molecular Evolutionary Genetics Analysis. A DOS program for analyzing molecular data, developed by Sudhir Kumar, Koichiro Tamura and Masatoshi Nei (1993).
- Data Analysis in Molecular Biology and Evolution, developed by X. Xia at the University of Hong Kong.
LVB - Reconstructing evolution with parsimony and simulated annealing.
Other Programs - Maintained by Joe Felsenstein.