Nematode systematics - 

Potential of the database system using the nematode structure

Alexander Y. Ryss & Andrei L. Lobanov
Zoological Institute, Russian Academy of Sciences, St. Petersburg, Russia



The general structure of the databases in diagnostic and phylogenetic computer systems is very similar.

The simplest database used in both diagnostic and phylogenetic computer systems consists of two linked tables:

1) the basic matrix of taxa/characters (its records are subtaxa; its fields are characters, coded by the digits of the character states), and

2) the table of characters and their states (its records are characters and their states; its fields are the character numbers and the digits of their states).
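The two linked tables can be sketched as a pair of Python dictionaries. This is only an illustration: the species names, the character "stylet length" and all state digits are hypothetical and are not taken from the authors' database.

```python
# Table 2: characters and their states (character number -> name and state digits).
# All names and digits here are hypothetical illustration data.
CHARACTERS = {
    1: {"name": "tail shape", "states": {1: "filiform", 2: "conoid"}},
    2: {"name": "stylet length", "states": {1: "long", 2: "short"}},
}

# Table 1: basic matrix of taxa/characters (taxon -> {character number: state digit}).
MATRIX = {
    "species A": {1: 1, 2: 1},
    "species B": {1: 2, 2: 1},
    "species C": {1: 2, 2: 2},
}

def describe(taxon):
    """Decode the digits of one matrix record through the table of characters."""
    return {
        CHARACTERS[ch]["name"]: CHARACTERS[ch]["states"][state]
        for ch, state in MATRIX[taxon].items()
    }

print(describe("species B"))
```

The link between the tables is the character number: the matrix stores only digits, and the second table gives them their meaning.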


Why do the diagnostic and phylogenetic systems have a similar basic structure?

When the matrix is used for identification and phylogenetic analysis, it is necessary to take into account that a TAXONOMIC CHARACTER MAY HAVE AT LEAST THREE KINDS OF SIGNIFICANCE:

1) Diagnostic significance (value): the importance of the character for identification, i.e. its capability to lead to the final identification in the minimum number of steps.

The character whose alternative states split the set of species into the maximum number of subsets (groups) has the largest identification value.
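A minimal sketch of this criterion, assuming the matrix is stored as a dictionary of records: the value of a character is taken simply as the number of groups its states produce. The function names and the sample data are hypothetical.

```python
def count_subsets(matrix, character):
    """Number of groups into which the states of one character split the taxa."""
    return len({record[character] for record in matrix.values()})

def best_character(matrix):
    """Character that splits the taxa into the largest number of groups."""
    characters = next(iter(matrix.values())).keys()
    return max(characters, key=lambda ch: count_subsets(matrix, ch))

# Hypothetical matrix: taxon -> {character: state digit}.
matrix = {
    "sp. A": {"tail": 1, "stylet": 1},
    "sp. B": {"tail": 2, "stylet": 1},
    "sp. C": {"tail": 3, "stylet": 2},
    "sp. D": {"tail": 4, "stylet": 2},
}

print(count_subsets(matrix, "tail"), count_subsets(matrix, "stylet"))
print(best_character(matrix))
```

Here "tail" separates all four species at once, whereas "stylet" only halves the set, so "tail" has the larger identification value.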


2) Phylogenetic significance is the predictive importance of the character (ability to predict the numerous biological features, not included in the analysis - host range, geographical distribution, protein structure, resistance, etc.). 

Complex characters (such as the patterns of the head, tail or lateral sensilla in nematodes, combined with the shape of the sensilla) have the greatest phylogenetic significance.

For phylogenetic analysis it is not recommended to split complex characters into simple ones, as is done in diagnostics.

In a parsimony cladogram the comparative phylogenetic significance of a character is defined by its C.I. (consistency index) and R.I. (retention index).

Both indices are measures of homoplasy, i.e. of the independent origin of the same character state, which is the "noise" in phylogeny construction.
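For reference, the two indices can be computed from three step counts of a character on a given tree. The sketch below uses the standard definitions, C.I. = m/s and R.I. = (g - s)/(g - m), where m is the minimum possible number of steps, s the number of steps observed on the tree and g the maximum possible number of steps. The function names and the numbers in the example are illustrative only.

```python
def consistency_index(min_steps, observed_steps):
    """C.I. = m / s; equals 1.0 when the character shows no homoplasy."""
    return min_steps / observed_steps

def retention_index(min_steps, observed_steps, max_steps):
    """R.I. = (g - s) / (g - m); undefined (None) when g equals m."""
    if max_steps == min_steps:
        return None
    return (max_steps - observed_steps) / (max_steps - min_steps)

# Illustrative binary character: minimum 1 step, 2 steps on the tree, maximum 4.
print(consistency_index(1, 2))
print(retention_index(1, 2, 4))
```

The lower the indices, the more homoplasy the character shows on the tree.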


3) Evolutionary significance, i.e. the importance of the character for the origin and evolution of adaptations within the taxon. Characters are rows of states that have adaptive significance.

Similarities in the advanced states may not reflect relationship; they may arise independently because of their adaptive advantage under selection (e.g. the formation of the gland lobe in the oesophagus of plant-parasitic nematode taxa).

If the characters are represented mainly by unidirectional rows, the most important characters are those with the largest frequencies of the advanced states.

Sometimes a single row of character states cannot be constructed, and several rows arise from the same primitive (plesiomorphic) state. If the alternative advanced states have the same frequency within the supertaxon, we can conclude that the character has the greatest EVOLUTIONARY (DIVERGENCE) SIGNIFICANCE.


Correspondingly, the matrix of taxa & characters may be used:

1. to develop the efficient identification keys based on the characters of the largest diagnostic significance 
2. to analyse the phylogenetic relations on morphological data (using PAUP and MacClade programs) 
3. to analyse the morphological adaptations which were important for the origin and evolution of the taxon under consideration.




A modern identification key has the following stages of identification:


- User-friendly (interactive) choice of character


- Choice of the character state corresponding to the specimen under identification


- Final identification

Each identification includes three stages: 

1) selection of characters and construction of a classification of taxa based on these characters;
2) comparison of the characters of the unidentified object with the selected characters (checking of characters);
3) establishment of the identity of the object with an already known taxon (identification in situ).



Before analysing the programs, it is necessary to establish the terminology and to explain the principal structure of a computerized key.

A taxon is a concrete natural group of organisms (species, genus, family, order, class, phylum).

A character is an element of recognition of organisms (feature, peculiarity), e.g. tail shape, pattern of the anal-vulval plate.

A state of a character is one of the possible variants of the character, e.g. filiform tail, distinct underbridge.

A diagnosis (single identification) is an algorithm of actions of the expert and the computer leading to the identification of an unidentified specimen (or a group of specimens).


A step is an operating element of the diagnosis; it usually includes the choice of a character, the input of the character state of the unidentified specimen into the computer, and the answer of the computer with the set of taxa having the selected character state.

The path of diagnosis is the sequence of steps of one diagnosis, together with the characters used during these steps.

The length of the path is the number of steps made to reach an unambiguous answer (an identification of the specimen or a refusal to identify).

The diagnostic value of a concrete character at an identification step is the quantitative evaluation of the capability of the character to minimize the possible length of the path.


Principal features of the computerized key 

The features of a key can be conventionally divided into two groups: structural and dynamic.

Structural features are the peculiarities of the key database, whereas dynamic features are the specific features of the identification step, which is the interactive, repeating cycle of the diagnosis.


Structural features of the key


1. The most important feature is the number of entries available to start a new diagnosis. There are monoentry keys and multientry (polyentry) ones.

The user of a monoentry key has to use the only character proposed. The user of a multientry key selects the most convenient and reliable characters among several proposed ones at each step of the diagnosis.


2. The second important feature of the key is the number of states of character. There are dichotomous keys and polytomous ones. 

A dichotomous key has characters with only two states.

A polytomous key has three or more states in at least some of the characters used.

In dichotomous keys (since the time of Carolus Linnaeus) one species (the most different from the others) is split off from the group at each step. Consequently, the number of states in a character equals 2: the feature of the species being split off and that of all the others. Diagnosis is a procedure of selecting (splitting off) species "one by one".


3. The third important structural feature of the key is its capability to operate with images of characters and character states. There are image-operating keys and wording-operating ones. Image-operating keys use frames with images as the main tools (screen buttons) of diagnosis, and information in the form of images is the main content of the database fields. Wording-operating keys use traditional text alternatives for diagnosis; the main content of the database fields is symbols.

4. The fourth structural feature of the key is its capability to operate with quantitative characters. Numerical information in the key database makes it possible to filter the initial set of taxa by the range of a character (from minimum to maximum), which can shorten the path significantly.
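The filtering by range described in point 4 can be sketched as follows, assuming the database stores a (minimum, maximum) pair for each quantitative character of each taxon. The species names and the body lengths are hypothetical.

```python
def filter_by_measurement(ranges, character, value):
    """Keep only the taxa whose [minimum, maximum] range contains the measured value."""
    return [
        taxon
        for taxon, by_character in ranges.items()
        if by_character[character][0] <= value <= by_character[character][1]
    ]

# Hypothetical ranges of body length (micrometres) stored in the key database.
ranges = {
    "species A": {"body length": (400, 520)},
    "species B": {"body length": (500, 650)},
    "species C": {"body length": (700, 900)},
}

print(filter_by_measurement(ranges, "body length", 510))
```

A single measurement of 510 leaves two of the three species, so one numeric character already removes part of the initial set.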


Dynamic features of the key


These are the features determined by the peculiarities of the step. The general algorithm of the step is the following (the actor of each stage is named first):
1. Computer: estimates all possible characters of the current set of taxa and proposes them to the user for a choice.
2. User: chooses the most convenient and reliable character and enters the character and its state into the computer.
3. Computer: filters the current set of taxa to reduce it; only the taxa having the chosen character state remain in the current set. After this stage the program returns to stage 1 or, if the identification is finished, proceeds to the final stage:
4. Computer: identification (or refusal to identify the object using the available characters). Presentation of information about the taxon and its image.
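The four stages above can be sketched as a loop. To keep the example runnable without a dialogue, the answers of the user are passed in advance as a dictionary, and the "choice" of the user is simply the first useful character. Both are simplifying assumptions, and all names and data are hypothetical.

```python
def identify(matrix, answers):
    """Run identification steps; `answers` maps character -> state observed by the user."""
    current = set(matrix)   # current set of taxa
    path = []               # characters used, in order (the path of diagnosis)
    while len(current) > 1:
        # Stage 1 (computer): propose characters that still distinguish the current taxa.
        useful = [ch for ch in answers
                  if ch not in path and len({matrix[t][ch] for t in current}) > 1]
        if not useful:
            break
        # Stage 2 (user): choose a character and enter its state.
        ch = useful[0]
        path.append(ch)
        # Stage 3 (computer): keep only the taxa having the chosen state.
        current = {t for t in current if matrix[t][ch] == answers[ch]}
    # Stage 4 (computer): identification, or refusal (None).
    result = next(iter(current)) if len(current) == 1 else None
    return result, path

matrix = {
    "species A": {"tail": 1, "stylet": 1},
    "species B": {"tail": 2, "stylet": 1},
    "species C": {"tail": 2, "stylet": 2},
}

print(identify(matrix, {"tail": 2, "stylet": 2}))
```

The length of the returned path is the length of the path of diagnosis as defined in the terminology above.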


It is clear from the stages mentioned above that a step includes alternating acts (a dialogue) of the computer and the user. This is typical of all modern interactive keys. There was no dialogue with first-generation computers: the user transferred the set of data to an operator and after a fixed time obtained the final result from a non-interactive key.

Besides interactivity, there are other dynamic features of the key. The user can use one character or several at each step; consequently, there are mono-character step keys and multi-character step keys. The latter type sometimes allows identification to be reached in one step; it is preferable for the diagnosis of taxa with numerous quantitative characters, e.g. nematodes (see Ryss, 1997a, 1997b).


Advanced programs use special built-in algorithms to minimize the path. They calculate the diagnostic value of the characters at each step of the identification. The purpose of the algorithm is to split the current set of taxa into the smallest subsets and thus to shorten the average path of identification. At each step the program proposes the characters in the order of their diagnostic values. The user can choose any character that seems more convenient, but in that case the path will be longer. Earlier, the algorithms for calculating diagnostic values were published and actively discussed (Payne & Preece, 1980; Sviridov, 1994), but in modern programs the algorithms used are kept secret by the key developers.
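Since the algorithms of the existing programs are not published, the following is only one possible way to order characters, not the method of any particular key. It assumes that every taxon of the current set is equally likely to be the specimen and ranks the characters by the average number of taxa left after one answer (the smaller, the better). The names and data are hypothetical.

```python
from collections import Counter

def expected_remaining(matrix, taxa, character):
    """Average size of the taxa set left after answering this character,
    assuming every taxon in `taxa` is equally likely to be the specimen."""
    counts = Counter(matrix[t][character] for t in taxa)
    return sum(n * n for n in counts.values()) / len(taxa)

def rank_characters(matrix, taxa, characters):
    """Characters ordered from the most to the least useful at this step."""
    return sorted(characters, key=lambda ch: expected_remaining(matrix, taxa, ch))

matrix = {
    "sp. A": {"a": 1, "b": 1, "c": 1},
    "sp. B": {"a": 1, "b": 2, "c": 1},
    "sp. C": {"a": 2, "b": 3, "c": 1},
    "sp. D": {"a": 2, "b": 4, "c": 2},
}

print(rank_characters(matrix, list(matrix), ["a", "b", "c"]))
```

Character "b" separates every species and is proposed first; character "c" isolates only one species and comes last.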



TABLE. World Identification Systems - Review

Entry | Image-operating | OS | System & Key | Authors | Country | Year | Lowest taxa | Higher taxon
mono | obligatory | WIN | Guide to Palearctic Flea Beetle Genera | A.Konstantinov | USA | 1998 | 57 genera | Insecta, Coleoptera, Chrysomelidae, Alticinae
mono | obligatory | WIN | Interactive Atlas of Gymnamoebae | A.Smirnov, A.Goodkov & D.Goobanov | Russia | 1999 | 35 species | Gymnamoebae
mono | obligatory | WIN | TAXOKEY: Brief Illustrated Key to European Bark Beetles | J.Byers | Sweden | 1996 | 154 species | Insecta, Coleoptera, Scolytidae
mono | auxiliary | WIN | KEYS | D.Remsen | USA | 1995 | - | -
mono | auxiliary | WIN | TAXAKEY: Aphids on the World's Crops | R.Blackman, V.Eastop & G.Kibby | UK | 1998 | species | Insecta, Homoptera, Aphididae
multi | obligatory | DOS | AXEX: Gastropoda of the Black Sea | T E.Butakov & S.Lelekov | Ukraine | 1994 | 67 species | Mollusca, Gastropoda
multi | obligatory | WIN | PICKEY (BIKEY): Common Palaearctic Beetles | M.Dianov & A.Lobanov | Russia | 1999 | 130 species | Insecta, Coleoptera
multi | obligatory | WIN | Interactive Key to Katydids of La Selva, Costa Rica | P.Naskrecki | USA | 1997 | 70 species | Insecta, Orthoptera, Tettigoniidae
multi | auxiliary | DOS | CABIKEY: Common Thysanoptera of Europe | I.White | UK | 1994 | 20 species | Insecta, Thysanoptera
multi | auxiliary | DOS | ONLINE (PANKEY): British Orchids | R.Pankhurst | UK | 1994 | 53 species | Orchidaceae
multi | auxiliary | WIN | Discover Mushrooms | Technology Developments Co. | USA | 1998 | 1000 species | Basidiomycetes
multi | auxiliary | WIN | IdentifyIt (Linnaeus II): Arbuscular Mycorrhizae Fungi | F.MacIntyre & K.Estep | Netherlands | 1996 | 14 species | Zygomycetes, Endogonaceae
multi | auxiliary | WIN | INTKEY (DELTA): Beetle Larvae of the World | M.Dallwitz & R.Payne | Australia | 1996 | 385 tribes & families | Insecta, Coleoptera (larvae)
multi | auxiliary | WIN | LUCID: Key to Insect Orders | K.Thiele & G.Rutter | Australia | 1996 | 31 orders | Insecta
multi | auxiliary | WIN | MEKA: Key to the Families of Angiosperm | C.Meacham | USA | 1996 | 411 families | Angiospermae
multi | auxiliary | WIN | FusKey: Fusarium Interactive Key | K.Sifert | Canada | 1996 | 30 species | Deuteromycetes
multi | auxiliary | WIN | NaviKey: Pezizales species of genus Phillipsia | M.Bartley | USA | 1999 | 19 species | Discomycetes
multi | absent | WIN | Pilz2000: 92 Pilze-Gattungen | U.Lade, H.Thomas & R.Winkler | Germany | 1996 | 92 genera | Basidiomycetes
multi | absent | WIN | SynKey: Synoptic Key of Crepidotus nach Senn-Irlet | R.Senn | Switzerland | 1992 | 22 species & 1 var. | Crepidotus
multi | absent | WIN | Flowering Plant Family Identification | R.Phillips | USA | 1999 | 411 families | Angiospermae
multi | absent | WIN | Key to Genera of the Sarcoscyphineae | D.Pfister & N.Cross | USA | 1999 | 24 genera | Discomycetes


Key to Radopholus generated automatically by diagnostic system


Analysis of phylogeny

The modelling of a phylogeny of taxa cannot be based directly on the e-key matrix. Either the most complex morphological characters or a large number of simple, non-correlated characters have to be selected. It is better to construct separate matrices for the complex and for the simple characters, to use them independently in the parsimony analysis, and then to calculate the final consensus tree. Character states should be arranged into polarized rows, from the primitive state (the 0-state, which is closest to the outgroup) to the most advanced states. The outgroup has to be included in the matrix in order to root the tree and to fix the arrangement of its branches. The transformation of the e-key matrix into a matrix suitable for phylogenetic analysis includes the selection of fields (i.e. characters) and the change of the numbers of the character states within the cells of the matrix, depending on the new polarity of the character state rows. This substitution of character state numbers can be done automatically, using the "Replace" command of a database management system.
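The same transformation (selection of fields and renumbering of states so that the primitive state becomes 0) can be sketched outside a database management system. The recoding tables, the character numbers and the data below are hypothetical.

```python
def recode(matrix, selected, recoding):
    """Select characters and renumber their states.
    recoding[ch] maps the old (diagnostic) digit to the new (polarized) digit."""
    return {
        taxon: {ch: recoding[ch][record[ch]] for ch in selected}
        for taxon, record in matrix.items()
    }

# Hypothetical e-key matrix: taxon -> {character number: state digit}.
ekey_matrix = {
    "species A": {1: 1, 2: 2, 3: 1},
    "outgroup":  {1: 2, 2: 1, 3: 1},
}

# The state found in the outgroup becomes the 0-state of each selected character.
recoding = {
    1: {2: 0, 1: 1},
    2: {1: 0, 2: 1},
}

print(recode(ekey_matrix, selected=[1, 2], recoding=recoding))
```

Character 3 is dropped because it is not selected, and the outgroup receives the 0-state in every remaining character.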


Below is only a demonstration example of a phylogenetic analysis based on morphological characters in the PAUP system.

The rows of character states are given according to the general tendencies of specialization to parasitism inside plant roots: body length decreases, the stylet shortens and the cephalic region becomes more flattened (Paramonov, 1970; Siddiqi, 1980):


Morphological characters with notes on their evolution

Table. Matrix of characters 

The numbers of the characters and their states correspond to those in the index of morphological characters above the matrix; multistate characters are given in brackets.

The most primitive (plesiomorphic) state is marked as the 0-state. This is the difference from the matrix used for the computerized diagnostic system.

The matrix of characters (Table) was imported into a file in the NEXUS format.

The NEXUS file was processed in the MacClade and PAUP packages.

A peculiarity of the matrix is the presence of multistate sets of character states for the majority of species (the multistate sets are placed in brackets). All characters (considered to be of equal weight at the beginning) are coded as ordered, with the exception of a few characters coded as irreversible (marked by ***).

The comparative phylogenetic weight of a character is finally defined by its C.I. (consistency index) and R.I. (retention index).


Phylogenetic tree 

Then 10000 phylogenetic trees (length = 540) were generated using a heuristic search in the PAUP package. The consensus tree was calculated using the 50% majority rule. The results of the phylogenetic analysis (apomorphies and reversions in nodes and characters) are given below.
The phylogenetic tree of the genus includes 4 main nodes, and the species of Indo-Asian origin (not inhabiting Australia and Oceania) form a separate monophyletic branch.
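The principle of the 50% majority rule can be illustrated by a small sketch. This is not the PAUP implementation: each tree is represented here simply as a set of clades (groups of taxa), and a clade enters the consensus if it occurs in more than half of the trees. The taxa A-D and the three trees are hypothetical.

```python
from collections import Counter

def majority_rule_clades(trees, threshold=0.5):
    """Clades present in more than `threshold` of the trees.
    Each tree is a set of clades; each clade is a frozenset of taxon names."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade for clade, n in counts.items() if n / len(trees) > threshold}

AB, ABC, ABD = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
trees = [
    {AB, ABC},
    {AB, ABD},
    {AB, ABC},
]

consensus = majority_rule_clades(trees)
print(sorted("".join(sorted(clade)) for clade in consensus))
```

The clade AB occurs in all three trees and ABC in two of three, so both are kept; ABD occurs in only one tree and is rejected.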


Analysis of morphological adaptations

A frequency diagram of character states allows conclusions to be drawn about the main adaptations of the taxon (see the example of the genus Radopholus below).

The most important characters (for progressive evolution) are those which have a peak (increase) of frequencies in the most advanced character states:

Characters 13, 14, 16, 17, 18, 19, 20

-Shortening of the body

-Increase of the relative length of the oesophageal gland lobe

-Shortening of the tail and its hyaline part

-Reduction of males

-Shortening of spicules
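The frequencies behind such a diagram can be computed directly from the polarized matrix. In the sketch below a character is reported when its highest-numbered (most advanced) state present in the matrix is also the most frequent one; this criterion is a simplifying assumption, and the species and state digits are hypothetical (only the character number 13 is borrowed from the list above).

```python
from collections import Counter

def state_frequencies(matrix, character):
    """Relative frequency of each state of one character among the taxa."""
    counts = Counter(record[character] for record in matrix.values())
    total = sum(counts.values())
    return {state: counts[state] / total for state in sorted(counts)}

def peaks_in_advanced_state(matrix, characters):
    """Characters whose most advanced (highest-numbered) state is the most frequent."""
    result = []
    for ch in characters:
        freq = state_frequencies(matrix, ch)
        if freq[max(freq)] == max(freq.values()):
            result.append(ch)
    return result

# Hypothetical polarized matrix (0 = plesiomorphic state).
matrix = {
    "sp. A": {1: 0, 13: 0},
    "sp. B": {1: 0, 13: 2},
    "sp. C": {1: 0, 13: 2},
    "sp. D": {1: 1, 13: 2},
}

print(state_frequencies(matrix, 13))
print(peaks_in_advanced_state(matrix, [1, 13]))
```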


Input of the data and import of the data into the database

The data can be entered into the database in the Builder part of the identification system.


Alternatively, the matrix can be composed in Excel datasheets and then imported into the system, as shown on the example of the LucID system.


Input of the taxa names

Input of character names

Input of character state names

Input of matrix data


Input of character images and notes


Input of the taxa notes, images and videos



Export of the matrix from the Excel datasheet to the NEXUS format (phylogenetic systems PAUP and MacClade) and to the dbf format (diagnostic system PICKEY)



The matrix was converted into the text format. Then, via a text processor, the matrix was exported into a template NEXUS file prepared for taxa with multistate (polymorphic) characters.
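The conversion can also be scripted. The sketch below writes a minimal NEXUS DATA block; it is not the template used by the authors. It assumes that polymorphic cells are written in parentheses, e.g. (12), which is the usual NEXUS convention for polymorphism (braces denote uncertainty). The taxon names and the data are hypothetical.

```python
def encode_cell(cell):
    """One state is written as a digit; a polymorphic set of states as e.g. (12)."""
    if isinstance(cell, (set, frozenset)):
        return "(" + "".join(str(s) for s in sorted(cell)) + ")"
    return str(cell)

def to_nexus(matrix):
    """matrix: taxon name -> list of cells (a digit, or a set of digits if polymorphic)."""
    taxa = list(matrix)
    nchar = len(matrix[taxa[0]])
    # Collect every state digit used anywhere in the matrix for the SYMBOLS list.
    symbols = sorted({s for row in matrix.values() for cell in row
                      for s in (cell if isinstance(cell, (set, frozenset)) else [cell])})
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(taxa)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="{}" MISSING=?;'.format(
            "".join(str(s) for s in symbols)),
        "  MATRIX",
    ]
    for taxon in taxa:
        lines.append(f"  {taxon} " + "".join(encode_cell(c) for c in matrix[taxon]))
    lines += ["  ;", "END;"]
    return "\n".join(lines)

# Hypothetical matrix; names use underscores because unquoted NEXUS names cannot contain spaces.
matrix = {
    "Outgroup": [0, 0, 0],
    "Radopholus_sp_A": [0, {1, 2}, 1],
}

print(to_nexus(matrix))
```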

Prospects of database systems in nematode taxonomy and ecology: grouping of ecologically similar species ("ecological equivalents"). This can be done by cluster analysis in statistical packages.




Modern database systems of nematode morphological characters may be used:

1. to develop the efficient identification keys based on the characters of the largest diagnostic significance 
2. to analyse the phylogenetic relations on morphological data (using PAUP and MacClade programs) 
3. to analyse the morphological adaptations which were important for the origin and evolution of the taxon under consideration.