Geminivirus Species Demarcation Criteria Study Case


Proper citation of the study case:
Fauquet, C. (2002). Geminivirus Species Demarcation Criteria Study Case. Webpage Geminiviridae, www.danforthcenter.org/iltab/geminiviridae/


C. Fauquet
Director of ILTAB
Donald Danforth Plant Science Center
975 N. Warson Rd.
St Louis, MO 63132
Tel: 314-587-1241
Fax: 314-587-1956
E-mail: iltab@danforthcenter.org
WebPages: www.danforthcenter.org

Summary
As the number of geminivirus sequences grows, it is increasingly difficult to determine accurately which viruses are strains of already described viruses and which represent new virus species. There is a great need for clear and simple species demarcation guidelines to help the community reach consistent decisions. This study evaluates the situation and establishes such guidelines, using the complete genome A component sequences of geminiviruses as the criterion, since sequences are increasingly used as the sole criterion. The conclusion is that a simple percentage of identity between two sequences is the easiest and best criterion, and that 89% is the threshold between species and strains of geminiviruses. Details of the analysis supporting these conclusions are provided here.

Introduction
First, it is important to remember that taxonomy is a creation of scientists to help us deal with large and variable sets of organisms, genomes and molecules, and that there is no virus taxonomy in nature. We therefore need to find the system that is most useful.
Because there is no gap between the strain and species peaks in the distribution of A component sequence comparisons (Fig. 1), it is important to decide where to put the threshold between these two categories. A recent proposal was to use the percentage of recombination between two sequences to decide whether a new sequence represents a strain of an existing virus species or a new species. The difficulty is to find some basis for choosing one particular percentage rather than another. A study of all the cases known today was therefore carried out to see whether the distribution of these percentages is uniform or multimodal, and whether it would permit such a decision.

Multimodal distribution of identity percentages correlates with virus taxonomy.
If the sequences available today for all the geminiviruses (212 in Dec. 2001) are compared with one another, the resulting distribution graph is multimodal and is best fitted by 5 peaks, corresponding to 5 possible taxonomic levels in the Geminiviridae family: subfamily, genus, subgenus, species and strain.

Figure 1: Distribution of identity percentages between A component nucleic acid sequences of geminiviruses.



It is clear that each peak overlaps with the next one. While there is no problem in assigning a virus to one genus or another, the overlap between species and strains can only be resolved by using guidelines set up by the Geminiviridae community. If one looks carefully at the distribution at the strain level (between 80 and 100%), it is itself multimodal, with a peak at 87% that is mostly composed of sequences containing recombinant fragments of different lengths and different origins (Fig. 2). The question therefore really is: when does a strain start and when does a species stop? Should viruses containing recombinant fragments be considered different species on a mathematical basis and/or on a biological basis?

Figure 2: Distribution of the percentage identity of pairwise comparisons of genome A component nucleic acid sequences of geminiviruses between 80 and 100%.



A comparison of 212 sequences, representing 22336 comparisons, led to the diagram in Figure 2 and shows that across all the geminivirus genera there are 3 to 4 peaks at what could be considered the strain level. The peak at 87% represents mostly recombinant viruses as well as a number of viruses currently classified as MSVs. We investigated whether it was possible to include the recombinant viruses in the strain category and, if so, what percentage of shared genome would be necessary to obtain a clear distribution. We also investigated whether using 89% identity as the species/strain threshold across the entire set of geminiviruses would provide a clearer cut.
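The kind of all-against-all comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sequences are taken to be already aligned to the same length, gap columns are skipped (the study does not say how gaps were treated), and the function names `percent_identity` and `identity_histogram` as well as the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored. This gap
    handling is an assumption, not something stated in the study.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def identity_histogram(seqs: dict, bin_width: int = 1) -> Counter:
    """Count pairwise identities per bin; n sequences give n*(n-1)/2 pairs."""
    hist = Counter()
    for (_, a), (_, b) in combinations(seqs.items(), 2):
        pid = percent_identity(a, b)
        hist[int(pid // bin_width) * bin_width] += 1
    return hist


# Toy data, not real geminivirus sequences.
toy = {
    "isolate_1": "ATGCATGCAT",
    "isolate_2": "ATGCATGCAA",
    "isolate_3": "ATGGTTGCAA",
}
hist = identity_histogram(toy, bin_width=10)
for lower_edge in sorted(hist):
    print(f"{lower_edge}-{lower_edge + 10}%: {hist[lower_edge]} pair(s)")
```

Plotting such a histogram over the 80-100% range is what would reveal whether the strain-level peaks are separated or overlapping.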

Importance of genome recombination for taxonomic identification
Because there is no biological information that can currently help us answer that question, the only possibility is to look at the existing data set and work out whether it can be used to define the best possible partition between species and strains. We compared all the sequences in the range of 80 to 90% identity and calculated the percentage of genome shared at the strain level. There appears to be a continuum between 10 and 100% of recombination, and it is consequently difficult or impossible to pick one number that would be more relevant than another. There is indeed a huge peak at 10-15%, corresponding to many small fragments integrated all over the genome, but these do not play a role in the 80-90% identity peak of the distribution. There is another peak at 60% that does play a role in that range, and we could possibly consider 50% as the limit to demarcate strains and species (Fig. 3). However, this is not absolute: we have several cases of viruses sharing 50±2% of their genome, and their number will increase with time. Furthermore, as shown in Table II, if we do so there are several question marks and non-obvious decisions to make.
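The study does not describe how the percentage of genome shared at the strain level was computed. One plausible way to approximate it, shown here purely as a sketch, is to cut two aligned sequences into fixed windows and count the fraction of the genome lying in windows whose identity reaches a strain-level cutoff. The window size, the 90% cutoff and the function name `shared_genome_fraction` are all assumptions made for this illustration.

```python
def shared_genome_fraction(a: str, b: str, window: int = 100,
                           strain_level: float = 90.0) -> float:
    """Fraction of aligned positions lying in windows that are at least
    `strain_level` percent identical.

    Hypothetical approximation: non-overlapping windows and a fixed cutoff
    are assumptions, not the procedure used in the study.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    if window <= 0:
        raise ValueError("window must be positive")
    shared = 0
    for start in range(0, len(a), window):
        wa = a[start:start + window]
        wb = b[start:start + window]
        matches = sum(x == y for x, y in zip(wa, wb))
        if 100.0 * matches / len(wa) >= strain_level:
            shared += len(wa)
    return shared / len(a)


# Toy example: the last quarter of the second sequence is of foreign origin.
seq_a = "ACGT" * 10
seq_b = seq_a[:30] + "T" * 10
fraction = shared_genome_fraction(seq_a, seq_b, window=10)
print(f"shared at strain level: {fraction:.0%}")
```

With real genomes the result depends strongly on the window size and cutoff chosen, which is one more reason why a sharing threshold such as 50% is hard to apply consistently.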

Figure 3: Distribution of genome sharing percentages between A component nucleic acid sequences of recombinant geminiviruses.



Need for a species reference type
If we use the recombination system with a percentage of shared genome, there is another problem: a reference type would be required for each species, so that each new sequence can be compared with it to decide whether a particular isolate is a strain of an existing species or a member of a new species. For example, EACMV-CM and EACMV-TZ are 81-83% identical and share between 45 and 68% of their genome at the strain level (Table I), depending on which strain is used for the comparison. If EACMV-TZ is the type reference for the EACMV species, then EACMV-CM is a new strain of the same species; if EACMV-UG is the reference instead, they are different species. Furthermore, EACMV-MW is 84-86% identical to EACMV strains and shares 46-59% of its genome with these viruses, so again it could be considered a strain of EACMV when using EACMV-TZ as a reference. Finally, if EACMV-CM and EACMV-MW, both strains relative to EACMV-TZ, are compared with each other, they are only 76% identical and share only 29% of their genome, and are therefore definitely different species from each other. Although we could in theory imagine, using different criteria of the polythetic concept of the species definition, that a virus could belong to two different species, this is practically impossible to handle, beginning with the name!


Table I: Comparison of cassava geminivirus component A sequences:

The upper triangle is the proportion of shared genome (1.00 = 100%).
The lower triangle is the percentage of sequence identity.
The yellow color indicates possible strain relations, the green color indicates a possible strain or species relation, and light blue indicates species relations.

           ACMV-KE    EACMV-KE   EACMV-TZ   EACMV-UG2  EACMV-CM   EACMV-MW   SACMV      ICMV       SLCMV
ACMV-KE    -          0.00       0.00       0.19       0.00       0.00       0.00       0.00       0.00
EACMV-KE   63         -          1.00       0.84       0.63       0.59       0.27       0.00       0.00
EACMV-TZ   64         94         -          0.84       0.61       0.46       0.39       0.00       0.00
EACMV-UG2  68         92         91         -          0.45       0.52       0.32       0.00       0.00
EACMV-CM   60         85         84         81         -          0.29       0.00       0.00       0.00
EACMV-MW   66         86         85         85         76         -          0.55       0.00       0.00
SACMV      68         76         76         75         66         83         -          0.00       0.00
ICMV       63         61         61         61         58         60         63         -          0.43
SLCMV      68         63         62         63         60         62         65         79         -
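The dependence on the reference type can be checked directly from the EACMV values in Table I. The sketch below is illustrative only: the rule it encodes (same species at 90% identity or more, or at 80-90% identity with at least 50% of the genome shared) is one reading of the sharing proposal discussed above, and the names `PAIRS`, `same_species` and `strains_of` are hypothetical.

```python
# (percent identity, proportion of shared genome) copied from Table I.
PAIRS = {
    frozenset(("EACMV-KE", "EACMV-TZ")): (94, 1.00),
    frozenset(("EACMV-KE", "EACMV-UG2")): (92, 0.84),
    frozenset(("EACMV-KE", "EACMV-CM")): (85, 0.63),
    frozenset(("EACMV-KE", "EACMV-MW")): (86, 0.59),
    frozenset(("EACMV-TZ", "EACMV-UG2")): (91, 0.84),
    frozenset(("EACMV-TZ", "EACMV-CM")): (84, 0.61),
    frozenset(("EACMV-TZ", "EACMV-MW")): (85, 0.46),
    frozenset(("EACMV-UG2", "EACMV-CM")): (81, 0.45),
    frozenset(("EACMV-UG2", "EACMV-MW")): (85, 0.52),
    frozenset(("EACMV-CM", "EACMV-MW")): (76, 0.29),
}


def same_species(a: str, b: str, min_identity: float = 80,
                 min_shared: float = 0.50,
                 strain_identity: float = 90) -> bool:
    """Assumed rule: >= 90% identity, or 80-90% identity with >= 50% shared."""
    identity, shared = PAIRS[frozenset((a, b))]
    if identity >= strain_identity:
        return True
    return identity >= min_identity and shared >= min_shared


def strains_of(reference: str) -> list:
    """Viruses that would be strains of the species typified by `reference`."""
    names = sorted({name for pair in PAIRS for name in pair})
    return [n for n in names if n != reference and same_species(reference, n)]


for ref in ("EACMV-TZ", "EACMV-UG2", "EACMV-KE"):
    print(ref, "->", strains_of(ref))
print("CM vs MW same species?", same_species("EACMV-CM", "EACMV-MW"))
```

Under this reading, EACMV-CM joins the species when EACMV-TZ is the reference but not when EACMV-UG2 is, while the opposite holds for EACMV-MW, so the outcome changes with the choice of reference type.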

Need for renaming viruses with the new species guidelines
It therefore seems that the most useful and practical guideline to demarcate virus species using genome sequences is to adopt a single percentage of identity (89%), above which viruses belong to the same species and below which they belong to different species. Of course, this would result in the creation of a new species each time a virus recombines and integrates more than 20% of the genome of another species and, as a consequence, in the creation of a new name for that viral entity. Of the 344 pairs of virus comparisons carried out in this study, 228 (66%) had less than 20% of foreign genome integrated and 19 (6%) had more than 80% (and were therefore already above 89% identical), so only 97 cases need to be addressed in this way, with a new name for about 14 viruses (see Table II).
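As a worked illustration of the single-threshold guideline, the sketch below applies an 89% cutoff to the identity values in the lower triangle of Table I. Two points are assumptions rather than part of the study: a pair at exactly 89% is treated as the same species, and viruses are grouped by simple linkage (any pair at or above the cutoff joins two groups). The names `relation` and `group_by_threshold` are hypothetical.

```python
THRESHOLD = 89.0  # percent identity, as proposed in the text

NAMES = ["ACMV-KE", "EACMV-KE", "EACMV-TZ", "EACMV-UG2", "EACMV-CM",
         "EACMV-MW", "SACMV", "ICMV", "SLCMV"]

# Lower triangle of Table I: LOWER[i][j] is the identity of NAMES[i] vs NAMES[j].
LOWER = [
    [],
    [63],
    [64, 94],
    [68, 92, 91],
    [60, 85, 84, 81],
    [66, 86, 85, 85, 76],
    [68, 76, 76, 75, 66, 83],
    [63, 61, 61, 61, 58, 60, 63],
    [68, 63, 62, 63, 60, 62, 65, 79],
]


def relation(identity: float, threshold: float = THRESHOLD) -> str:
    """Classify one pairwise comparison (>= threshold is an assumption)."""
    return "same species" if identity >= threshold else "different species"


def group_by_threshold(names, lower, threshold=THRESHOLD):
    """Group viruses linked by at least one pair at or above the threshold."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, row in enumerate(lower):
        for j, identity in enumerate(row):
            if identity >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values(), key=len, reverse=True)


groups = group_by_threshold(NAMES, LOWER)
for members in groups:
    print(", ".join(members))
```

On these values only EACMV-KE, EACMV-TZ and EACMV-UG2 fall into one group, while EACMV-CM and EACMV-MW stand apart, which agrees with their listing as different species in Table II.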


Table II: List of viruses to be renamed according to the 89% identity rule.

Virus name     Closest virus  % identity  New name         Comments
CLCuV-Raj*     CLCuAV         87%         CLCuRV           Different species
               CLCuMV         83-84%
OYVMV-201*     OYVMV-301      87%         BYVMV-201        Different species
               CLCuMVs        83-85%
AYVV-Tai, Tw   AYVV           87-88%      AYVTWV-Tai, Tw   Different species
TYLCV-IR       TYLCVs         87-90%      TYLCIRV          Different species
TYLCSV-SP1,2   TYLCSV         87%         ?                Different species
TYLCSDV-1,2    TYLCVs         78-83%      TYLCSDV-1,2      Different species
TYLCV-SP27     TYLCVs         80-90%      TYLCSPV          Different species
               TYLCSVs        79-85%
EACMV-CM       EACMV-TZ       81-85%      WACMV            Different species
EACMV-MW       EACMV-TZ       84-86%      SoACMV           Different species
SiGMHNV-YV*    SiGMHNV        88%         SiYVV            Different species
SiGMFloV-A11   SiGMFloV-A1    81%         ?                Different species
PYMV-TT        PYMV-VE        85%         PYMTTV           Different species
PYMV-PA        PYMV-VE        85%         PYMPAV           Different species
SPLCV-Ipo      SPLCV          87%         IYVV             Different species
MSV-B*         MSV-A          87%         ?                Different species
MSV-Set*       MSV-A          73%         SetSV            Different species
MSV-D[Raw]*    SetSV          90%         SetSV-[Raw]      A strain of SetSV
MSV-E[Pat]*    SetSV          77%         DYSV             Different species
PanSV-KE*      PanSV-Kar      83%         PanSKV           Different species



Table III: List of viruses to be renamed according to the 80-90% identity and 50% sharing genome rule.

Virus name     Virus          % identity  % sharing   New name