Molecular Systematics and Population Genetics
The simplest definition of molecular systematics is the study of systematics using molecular data, that is, systematics at the molecular level. Reconstructing the phylogenetic relationships of organisms on Earth (building the tree of life) from molecular data is one of the major aims of systematic biology today.
Hillis et al. (1996)* defined three main applications of molecular systematics:
* See Hillis, David M., Craig Moritz, and Barbara K. Mable. 1996. Molecular Systematics. 2nd ed. Sinauer Associates, Inc. & Publishers, Sunderland, Massachusetts, USA.
Phylogenetic Analysis of Molecular Data
The principles of analyzing molecular data are similar to those for binary numerical data coded from morphological characters. However, which method is appropriate for a particular molecular technique depends on the principles and assumptions of that technique. The method employed should not violate those principles and assumptions.
Type of Data
All experimental data gathered by molecular techniques fall into one of two broad categories: discrete characters and similarities (or distances). Discrete data are qualitative: each character takes one of two or more discrete states. Distance (or similarity) data are quantitative: they vary continuously and are measured on an interval scale.
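The two categories can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the alignment, the taxon names, and the helper names p_distance and distance_matrix are hypothetical, and the uncorrected p-distance (the proportion of differing sites) is just one of many ways to turn discrete characters into distances.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)

def distance_matrix(alignment):
    """Return {(name1, name2): p-distance} for every pair of taxa."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }

# Hypothetical toy alignment: each column is one discrete character
# whose states are the four nucleotides.
alignment = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGTACGA",
    "taxon_C": "ACGAACTA",
}

# The same data summarized as pairwise distances (continuous values).
distances = distance_matrix(alignment)
for pair, d in distances.items():
    print(pair, d)
```

The alignment itself is the discrete-character form of the data; the dictionary of pairwise values is the distance form, in which the information about individual sites is no longer available.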
Maximum Parsimony (MP) method
In this method, the DNA (or amino acid) sequences of ancestral species are inferred from those of extant species, considering a particular tree topology, and the minimum number of evolutionary changes that are required to explain all the observed differences among the sequences is computed. This number is obtained for all possible topologies, and the topology which shows the smallest number of evolutionary changes is chosen as the final tree.
This method is used mainly for finding the topology of a tree, but branch lengths can be estimated under certain assumptions. When the MP method is applied to morphological characters, it is customary to assume that the primitive and derived character states are known.
In the case of molecular data, this assumption generally does not hold, and character states are often reversible. It is therefore important to use a version of the MP method that permits reversible mutations. In numerical taxonomy, this type of MP method is sometimes called the Wagner parsimony method.
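The counting step for one candidate topology can be sketched in Python. The sketch below uses Fitch's (1971) algorithm for unordered, reversible character states, which is a common choice for nucleotide data and is swapped in here for illustration; it is not necessarily the exact procedure the sources above had in mind. The alignment, the taxon names, and the nested-tuple tree representation are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    A tree is either a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no extra change is needed here.
        return common, left_changes + right_changes
    # No shared state: one change is required on a branch below this node.
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, alignment):
    """Minimum number of changes summed over all sites for one topology."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        states = {name: seq[site] for name, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total

# Hypothetical toy alignment for four taxa.
alignment = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTA"}

# The three possible unrooted topologies for four taxa, rooted arbitrarily
# (the Fitch count does not depend on where the root is placed).
topologies = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

scores = {label: parsimony_score(tree, alignment)
          for label, tree in topologies.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Enumerating every topology is only feasible for a handful of taxa, because the number of possible trees grows very rapidly with the number of sequences; practical programs rely on branch-and-bound or heuristic searches instead.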
Neighbor-Joining method (Saitou & Nei, 1987)
In contrast to cluster analysis, neighbor joining keeps track of nodes on a tree rather than taxa or clusters of taxa. The raw data are provided as a distance matrix, and the initial tree is a star tree. A modified distance matrix is constructed in which the separation between each pair of nodes is adjusted on the basis of their average divergence from all other nodes.
The tree is constructed by linking the least-distant pair of nodes as defined by this modified matrix. When two nodes are linked, their common ancestral node is added to the tree and the terminal nodes with their respective branches are removed from the tree.
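One joining step can be sketched in Python as follows. This is a minimal sketch that assumes the standard formulation of the adjusted matrix, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from node i to all other nodes. The function names, the toy distance matrix, and the label "u" for the new ancestral node are hypothetical.

```python
def q_matrix(dist):
    """Adjusted distances Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j)."""
    nodes = list(dist)
    n = len(nodes)
    row_sum = {i: sum(dist[i].values()) for i in nodes}
    q = {}
    for a in range(n):
        for b in range(a + 1, n):
            i, j = nodes[a], nodes[b]
            q[(i, j)] = (n - 2) * dist[i][j] - row_sum[i] - row_sum[j]
    return q, row_sum

def join_once(dist, new_name):
    """Link the least-distant pair under Q and return the reduced matrix.

    Requires at least three nodes in `dist`.
    """
    q, row_sum = q_matrix(dist)
    i, j = min(q, key=q.get)
    n = len(dist)
    d_ij = dist[i][j]
    # Branch lengths from the joined nodes to their new common ancestor.
    len_i = 0.5 * d_ij + (row_sum[i] - row_sum[j]) / (2 * (n - 2))
    len_j = d_ij - len_i
    # Remove i and j, then add the ancestral node with its distances.
    others = [k for k in dist if k not in (i, j)]
    reduced = {k: {m: dist[k][m] for m in others if m != k} for k in others}
    reduced[new_name] = {}
    for k in others:
        d = 0.5 * (dist[i][k] + dist[j][k] - d_ij)
        reduced[k][new_name] = d
        reduced[new_name][k] = d
    return (i, j), (len_i, len_j), reduced

# Hypothetical toy distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {names[r]: {names[c]: matrix[r][c] for c in range(5) if c != r}
        for r in range(5)}

pair, lengths, reduced = join_once(dist, "u")
print(pair, lengths, reduced["u"])
```

Repeating join_once on the reduced matrix until only two nodes remain would give the complete tree. Note that in this toy matrix the pair with the smallest raw distance (d and e) is not the pair chosen by the adjusted matrix, which is the point of the adjustment.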
Maximum Likelihood (ML) method (Felsenstein, 1981)
In this method, the nucleotides of all DNA sequences at each nucleotide site are considered separately, and the log-likelihood of observing these nucleotides is computed for a given topology by using a particular probability model.
The log-likelihoods are summed over all nucleotide sites, and the sum is maximized to estimate the branch lengths of the tree. This procedure is repeated for all possible topologies, and the topology that shows the highest likelihood is chosen as the final one.
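The per-site calculation can be sketched in Python. The sketch makes several assumptions that are not in the text above: it uses the Jukes-Cantor (1969) model as the probability model, it keeps all branch lengths fixed at a hypothetical value of 0.1 substitutions per site instead of optimizing them, and the alignment and helper names are invented for illustration. The recursion is Felsenstein's pruning approach.

```python
import math

BASES = "ACGT"

def jc69_prob(x, y, t):
    """Jukes-Cantor probability of base x becoming y along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def partial_likelihoods(node, site_states):
    """Likelihood of the data below `node`, for each possible base at `node`.

    A node is a leaf name (str) or a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {b: 1.0 if b == site_states[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_l = partial_likelihoods(child, site_states)
        for b in BASES:
            result[b] *= sum(jc69_prob(b, c, t) * child_l[c] for c in BASES)
    return result

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods for a fixed topology and branch lengths."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for site in range(n_sites):
        states = {name: seq[site] for name, seq in alignment.items()}
        root = partial_likelihoods(tree, states)
        # Each base is equally likely at the root under Jukes-Cantor.
        total += math.log(sum(0.25 * root[b] for b in BASES))
    return total

def cherry(x, y, t=0.1):
    """Join two subtrees with equal (hypothetical) branch lengths."""
    return ((x, t), (y, t))

# Hypothetical toy alignment and two candidate topologies.
alignment = {"A": "AAG", "B": "AAG", "C": "CTG", "D": "CTA"}
tree_1 = cherry(cherry("A", "B"), cherry("C", "D"))
tree_2 = cherry(cherry("A", "C"), cherry("B", "D"))

log_l1 = log_likelihood(tree_1, alignment)
log_l2 = log_likelihood(tree_2, alignment)
print(log_l1, log_l2)
```

A full analysis would also adjust the branch lengths of each topology to maximize its log-likelihood before comparing topologies; that optimization step is omitted here to keep the sketch short.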
Bayesian analysis is a more recently developed method of phylogenetic analysis (Rannala and Yang 1996, Mau and Newton 1997, Mau et al. 1999). It is very similar to the maximum likelihood method but differs in its use of posterior probabilities: probabilities of trees that are estimated from the data under some model, combined with prior expectations. Instead of seeking the tree that maximizes the likelihood of observing the data, it seeks those trees with the greatest probability given the data. Rather than a single optimal tree, Bayesian analysis produces a set of trees of roughly equal likelihood.
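The relationship between likelihood, prior, and posterior probability can be illustrated with a small Python sketch. It is a deliberate simplification: it assumes a short, fixed list of candidate trees whose log-likelihoods are already known, whereas real Bayesian programs sample trees, branch lengths, and model parameters with Markov chain Monte Carlo. The tree labels, the log-likelihood values, and the function name are hypothetical.

```python
import math

def posterior_probabilities(log_likelihoods, priors):
    """Posterior of each tree, proportional to prior times likelihood."""
    log_joint = {tree: log_likelihoods[tree] + math.log(priors[tree])
                 for tree in log_likelihoods}
    # Subtract the maximum before exponentiating to avoid underflow.
    largest = max(log_joint.values())
    log_norm = largest + math.log(
        sum(math.exp(v - largest) for v in log_joint.values()))
    return {tree: math.exp(v - log_norm) for tree, v in log_joint.items()}

# Hypothetical log-likelihoods for three candidate topologies.
log_likelihoods = {"T1": -10.0, "T2": -12.0, "T3": -12.0}

# A uniform prior: no topology is favored before seeing the data.
uniform_prior = {tree: 1.0 / 3.0 for tree in log_likelihoods}

posterior = posterior_probabilities(log_likelihoods, uniform_prior)
print(posterior)
```

With a uniform prior the ranking of trees is the same as under maximum likelihood; what changes is the interpretation, since each tree now carries a probability given the data and several trees can share the support.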
Supertrees are phylogenies (rooted evolutionary trees) assembled from smaller phylogenies that share some but not necessarily all taxa (leaf nodes) in common. Thus, supertrees can make novel statements about relationships of taxa that do not co-occur on any single input tree while still retaining hierarchical information from the input trees. As a method of combining existing phylogenetic information, supertrees potentially solve many of the problems associated with other methods (e.g., absence of homologous characters, incompatible data types, or non-overlapping sets of taxa). In addition to helping synthesize hypotheses of relationships among larger sets of taxa, supertrees can suggest optimal strategies for taxon sampling (either for future supertree construction or for experimental design issues such as choice of outgroups), can reveal emerging patterns in the large knowledge base of phylogenies currently in the literature, and can provide useful tools for comparative biologists who frequently have information about variation across much broader sets of taxa than those found in any one tree. (from: M. J. Sanderson, D. Gusfield, and O. Eulenstein)
1997-present by Q. Q. Fang