Geminivirus
Species Demarcation Criteria Study Case
Proper citation of the study
case: Fauquet, C. (2002). Geminivirus Species Demarcation
Criteria Study Case.
Webpage Geminiviridae: www.danforthcenter.org/iltab/geminiviridae/
C. Fauquet Director of ILTAB Donald
Danforth Plant Science Center 975 N. Warson Rd. St Louis, MO
63132 Tel: 314-587-1241 Fax: 314-587-1956 E-mail:
iltab@danforthcenter.org WebPages:
www.danforthcenter.org
Summary
With the increasing number of geminivirus sequences, it
is more and more difficult to decide accurately which
viruses are strains of already described viruses and which
represent new virus species. There is a great need
for clear and simple species demarcation guidelines to help
the community reach consistent decisions. This
study evaluates the situation and establishes such
guidelines, using as the criterion the complete A component
genome sequences of geminiviruses, since sequences are increasingly
used as the sole criterion. The conclusion is that a simple
percentage of identity between two sequences is the easiest
and best criterion, and that 89% is the threshold between
species and strains of geminiviruses. Details of the analysis
supporting these conclusions are provided here.
Introduction
First, it is important
to remember that taxonomy is a creation of scientists to help
us deal with large and variable sets of organisms, genomes and
molecules, and that there is no virus taxonomy in nature; we
therefore need to find the system that is the most
useful. Because there is no gap between the strain and
species peaks in the distribution of A component sequence
comparisons (Fig. 1), it is important to work out where to
put a threshold between these two categories. A recent
proposal was to use the percentage of recombination between
two sequences to decide whether a new sequence
represents a strain of an existing virus species or a new
species. The question is what information justifies choosing one
particular percentage rather than another. A
study of all the cases known today was therefore carried out to see
whether the distribution of these percentages is uniform or multimodal
and would permit such a decision.
Multimodal distribution of identity
percentages correlates with virus taxonomy
If the
sequences available today for all the geminiviruses are
compared (212 in Dec. 2001), one can draw a distribution graph
showing a multimodal distribution that is best fitted by 5
peaks, corresponding to 5 possible taxonomic
levels in the Geminiviridae family:
subfamily, genus, subgenus, species and strain.
Figure 1: Distribution of identity
percentages between A component nucleic acid sequences of
geminiviruses.

It is clear that each peak overlaps with the
next one. While there is no problem in assigning a virus to
one genus or another, there is an overlap between
species and strains that can only be resolved by using
guidelines set up by the Geminiviridae community. If one looks
carefully at the distribution at the strain level (between 80
and 100%), it is itself multimodal, with a
peak at 87% that is mostly composed of sequences
containing recombinant fragments of different lengths and
different origins (Fig. 2). The question therefore really is:
where does a strain start and where does a species stop? Should
viruses containing recombinant fragments be considered
different species on a mathematical basis and/or on a
biological basis?
Figure 2: Distribution of the
percentage identity of pairwise comparisons of genome A
component nucleic acid sequences of geminiviruses between 80
and 100%.

A comparison of 212 sequences, representing 22336
pairwise comparisons, leads to the diagram in Figure 2 and shows that,
across all the geminivirus genera, there are 3 to 4 peaks at what
could be considered the strain level. The peak at 87%
represents mostly recombinant viruses as well as a number of
viruses currently classified as MSVs. We investigated whether it
was possible to include the recombinant viruses in the strain
category and, if so, what percentage of shared
genome would be necessary to obtain a clear distribution. We also
investigated whether using 89% identity as the
species/strain threshold across the entire set of geminiviruses
would provide a clearer cut.
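The pairwise comparison behind Figures 1 and 2 can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes the sequences are already aligned to a common length, counts a gap opposite a nucleotide as a mismatch, and uses made-up toy sequences. The names `percent_identity` and `identity_histogram` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned columns that match (columns that are gaps in both are skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # gap in both sequences: not a comparable column
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_histogram(aligned, bin_width=1):
    """Count pairwise identities per bin; each key is the bin's lower edge in percent."""
    hist = Counter()
    for s1, s2 in combinations(aligned.values(), 2):
        pid = percent_identity(s1, s2)
        hist[int(pid // bin_width) * bin_width] += 1
    return hist


# Toy, made-up aligned sequences (assumption: not real geminivirus data).
toy = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACGTTCGTAT",
    "isolate_4": "TTGTTCGAAT",
}
hist = identity_histogram(toy, bin_width=10)
for lower in sorted(hist):
    print(f"{lower:3d}-{lower + 10:3d}%: {'#' * hist[lower]}")
```

With real data, a bin width of 1% over the 80 to 100% range would give the kind of view shown in Figure 2, where separate peaks inside the strain level become visible.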
Importance of genome
recombination for taxonomic identification
Because no biological
information can currently help us answer that question,
the only possibility is to look into the existing data set and
work out whether we can use it to define the best possible
partition between species and strains. We compared all
the sequences in the range of 80 to 90% identity and
calculated the percentage of genome shared at the
strain level. There appears to be a continuum between 10
and 100% of recombination, and it is consequently difficult or
impossible to pick one number that would be more relevant than
another. There is indeed a huge peak at 10-15%,
corresponding to many small fragments integrated all over the
genome, but these do not play a role in the 80-90% identity
peak of the distribution. There is another peak at 60% that does
play a role in that range, and we could possibly consider 50%
as a limit to demarcate strains and species (Fig. 3). However,
this limit is not absolute: we have several cases of viruses
sharing 50±2% of their genome, and their number will increase with
time. Furthermore, as stated in Table II, doing so leaves
several question marks and non-obvious decisions to
take.
Figure 3: Distribution of genome
sharing percentages between A component nucleic acid sequences
of recombinant geminiviruses.
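The text does not say how the percentage of shared genome was computed. One plausible way to approximate it, shown here purely as an assumption, is to slide a window along two aligned sequences and report the fraction of windows whose identity reaches a strain-level value. The function name, the window size, the step and the 89% window threshold are all hypothetical choices.

```python
def shared_genome_fraction(a, b, window=100, step=100, strain_level=89.0):
    """Fraction of windows whose percent identity is at or above strain_level.

    Assumes a and b are aligned to the same length. A trailing stretch
    shorter than one window is ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if len(a) < window:
        raise ValueError("alignment shorter than one window")
    total = shared = 0
    for start in range(0, len(a) - window + 1, step):
        wa = a[start:start + window]
        wb = b[start:start + window]
        matches = sum(x == y for x, y in zip(wa, wb))
        total += 1
        if 100.0 * matches / window >= strain_level:
            shared += 1
    return shared / total


# Toy example (made-up data): the first half is identical, the second half diverged.
seq_a = "ACGTACGTAC" * 4
seq_b = seq_a[:20] + "T" * 20
print(shared_genome_fraction(seq_a, seq_b, window=10, step=10))
```

In this toy pair the overall identity is 60%, yet half of the windows are identical, which is the kind of mosaic pattern a recombinant genome would produce.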

Need for a species reference
type
If we use the recombination system with a
percentage of shared genome, there is another problem:
a reference type for each species would be required,
against which each new sequence is compared to decide whether
a particular isolate is a strain of an existing species or a member of
a new species. For example, EACMV-CM and EACMV-TZ are 81-83%
identical and share between 45 and 68% of their genome at the
strain level (Table I), depending on which strain is used for
the comparison. If EACMV-TZ is the type reference for the EACMV species,
then EACMV-CM is a new strain of the same species; if
EACMV-UG is taken as the reference instead, they are different species.
Furthermore, EACMV-MW is 84-86% identical to EACMV
strains and shares 46-59% of its genome with these viruses, so
it again could be considered a strain of EACMV when using
EACMV-TZ as a reference. Finally, if EACMV-CM and EACMV-MW,
both strains relative to EACMV-TZ, are compared with each other, they are only
76% identical and share only 29% of their genome, and are therefore
definitely different species from each other. Although we
could theoretically imagine, using different criteria of the polythetic
concept of the species definition, that a virus could belong
to two different species, this is practically impossible to
handle, beginning with the name.
Table I: Comparison of cassava geminivirus
component A sequences. The upper triangle is the
percentage of shared genome (given as a fraction); the lower triangle is the
percentage of sequence identity. Yellow indicates
possible strain relations, green indicates
possible strain or species relations, and light blue
indicates species relations.
| | ACMV-KE | EACMV-KE | EACMV-TZ | EACMV-UG2 | EACMV-CM | EACMV-MW | SACMV | ICMV | SLCMV |
|---|---|---|---|---|---|---|---|---|---|
| ACMV-KE | - | 0.00 | 0.00 | 0.19 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| EACMV-KE | 63 | - | 1.00 | 0.84 | 0.63 | 0.59 | 0.27 | 0.00 | 0.00 |
| EACMV-TZ | 64 | 94 | - | 0.84 | 0.61 | 0.46 | 0.39 | 0.00 | 0.00 |
| EACMV-UG2 | 68 | 92 | 91 | - | 0.45 | 0.52 | 0.32 | 0.00 | 0.00 |
| EACMV-CM | 60 | 85 | 84 | 81 | - | 0.29 | 0.00 | 0.00 | 0.00 |
| EACMV-MW | 66 | 86 | 85 | 85 | 76 | - | 0.55 | 0.00 | 0.00 |
| SACMV | 68 | 76 | 76 | 75 | 66 | 83 | - | 0.00 | 0.00 |
| ICMV | 63 | 61 | 61 | 61 | 58 | 60 | 63 | - | 0.43 |
| SLCMV | 68 | 63 | 62 | 63 | 60 | 62 | 65 | 79 | - |
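The reference-type problem can be illustrated with the shared-genome values that Table I gives for EACMV-CM. The sketch below applies a strict 50% cutoff, which is an assumption made for illustration (the text only suggests 50% as a possible limit), and the function name `verdict` is hypothetical.

```python
# Fraction of genome shared with EACMV-CM, copied from the upper triangle of Table I.
SHARED_WITH_CM = {
    "EACMV-KE": 0.63,
    "EACMV-TZ": 0.61,
    "EACMV-UG2": 0.45,
    "EACMV-MW": 0.29,
}


def verdict(shared_fraction, cutoff=0.50):
    """Hypothetical rule: at or above the cutoff means strain, below means species."""
    return "strain" if shared_fraction >= cutoff else "species"


verdicts = {ref: verdict(frac) for ref, frac in SHARED_WITH_CM.items()}
for ref, v in verdicts.items():
    print(f"EACMV-CM vs {ref}: {v}")
```

Under this cutoff EACMV-CM comes out as a strain when compared with EACMV-KE or EACMV-TZ and as a separate species when compared with EACMV-UG2 or EACMV-MW, so the outcome depends entirely on which isolate is chosen as the reference.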
Need for renaming viruses with the new
species guidelines
It therefore seems that the most
useful and practical guideline for demarcating virus species
using genome sequences is to consider a single percentage of
identity (89%), above which all viruses belong to the same
species and below which they belong to different species. Of
course, this would create a new species each time a
virus recombines and integrates more than 20% of the genome
of another species, and would consequently require a new name
for that viral entity. Considering that, out of the 344 pairs
of virus comparisons carried out in this study, 266 (66%) had less
than 20% of foreign genome integrated and 19 (6%) had more than
80% (and are therefore already more than 89% identical), only 97
cases need to be addressed in this way, with a new name
for about 14 viruses (see Table II).
Table II: List of viruses to be renamed
according to the 89% identity rule.
| Virus Name | Closest Virus | % identity | New name | Comments |
|---|---|---|---|---|
| CLCuV-Raj* | CLCuAV | 87% | CLCuRV | Different species |
| | CLCuMV | 83-84% | | |
| OYVMV-201* | OYVMV-301 | 87% | BYVMV-201 | Different species |
| | CLCuMVs | 83-85% | | |
| AYVV-Tai, Tw | AYVV | 87-88% | AYVTWV-Tai, Tw | Different species |
| TYLCV-IR | TYLCVs | 87-90% | TYLCIRV | Different species |
| TYLCSV-SP1,2 | TYLCSV | 87% | ? | Different species |
| TYLCSDV-1,2 | TYLCVs | 78-83% | TYLCSDV-1,2 | Different species |
| TYLCV-SP27 | TYLCVs | 80-90% | TYLCSPV | Different species |
| | TYLCSVs | 79-85% | | |
| EACMV-CM | EACMV-TZ | 81-85% | WACMV | Different species |
| EACMV-MW | EACMV-TZ | 84-86% | SoACMV | Different species |
| SiGMHNV-YV* | SiGMHNV | 88% | SiYVV | Different species |
| SiGMFloV-A11 | SiGMFloV-A1 | 81% | ? | Different species |
| PYMV-TT | PYMV-VE | 85% | PYMTTV | Different species |
| PYMV-PA | PYMV-VE | 85% | PYMPAV | Different species |
| SPLCV-Ipo | SPLCV | 87% | IYVV | Different species |
| MSV-B* | MSV-A | 87% | ? | Different species |
| MSV-Set* | MSV-A | 73% | SetSV | Different species |
| MSV-D[Raw]* | SetSV | 90% | SetSV-[Raw] | A strain of SetSV |
| MSV-E[Pat]* | SetSV | 77% | DYSV | Different species |
| PanSV-KE* | PanSV-Kar | 83% | PanSKV | Different species |
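As a worked illustration of the 89% identity rule, the sketch below applies it to the identity values in the lower triangle of Table I. Two points are assumptions rather than statements from the text: a value of exactly 89% is treated as "same species", and isolates are grouped by single linkage (any pair at or above the threshold joins two groups). The name `group_by_threshold` is hypothetical.

```python
NAMES = ["ACMV-KE", "EACMV-KE", "EACMV-TZ", "EACMV-UG2", "EACMV-CM",
         "EACMV-MW", "SACMV", "ICMV", "SLCMV"]

# Lower triangle of Table I: percent identity of row i versus columns 0..i-1.
LOWER = [
    [],
    [63],
    [64, 94],
    [68, 92, 91],
    [60, 85, 84, 81],
    [66, 86, 85, 85, 76],
    [68, 76, 76, 75, 66, 83],
    [63, 61, 61, 61, 58, 60, 63],
    [68, 63, 62, 63, 60, 62, 65, 79],
]

THRESHOLD = 89  # percent identity; counting exactly 89 as "same species" is an assumption


def group_by_threshold(names, lower, threshold=THRESHOLD):
    """Union-find grouping: isolates linked by any pair at or above threshold share a group."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, row in enumerate(lower):
        for j, pid in enumerate(row):
            if pid >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Largest group first; equal-sized groups keep their table order.
    return sorted(groups.values(), key=len, reverse=True)


groups = group_by_threshold(NAMES, LOWER)
for members in groups:
    print(", ".join(members))
```

On these values only EACMV-KE, EACMV-TZ and EACMV-UG2 fall into one group; EACMV-CM and EACMV-MW each stand alone, which agrees with their listing as different species in Table II.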
Table III: List
of viruses to be renamed according to the 80-90% identity and
50% shared genome rule.
| Virus Name | Virus | % identity | % sharing | New name |
|---|---|---|---|---|
| |