Ab initio gene finding in maize.
Training gene finders to annotate the maize genome.
Funded by NSF
NSF-PGRP #0501758
Project Objectives:
- Construct a training set of maize gene sequences and
use it to modify the TWINSCAN probability model. TWINSCAN is a dual-genome
de novo prediction program that integrates traditional probability models with
information from the alignments between two genomes.
- Benchmark TWINSCAN and compare its performance to that
of other commonly used gene finders.
- Select a subset of maize hypothetical gene models
predicted by TWINSCAN for wet-bench verification.
- Release the maize-specific version of TWINSCAN to the
scientific community as open source software.
- Use maize TWINSCAN to predict genes in available maize
genomic sequence. Provide public access to the predictions.
- The bioinformatics analysis conducted during the course
of this project is well suited as a training tool for future bioinformatics
students. We have entered into an outreach collaboration with Truman State
University that includes training of student interns, seminars, and invited
lectures integrated into Truman's bioinformatics courses.
Experimental Approaches:
- Computationally identify high quality maize genes from
public and proprietary collections of maize cDNA and genomic sequence. The
identified collection of annotated maize genes will serve as a training set
for modification of TWINSCAN.
- Use TWINSCAN to predict genes in available maize public
genome sequence. The results will be displayed in a web-based genome browser
hosted at the Danforth Center.
- Benchmark TWINSCAN relative to other commonly used ab
initio gene finders, and re-train (when feasible) other open source gene
prediction packages for maize.
- Wet-bench verify a subset of TWINSCAN-predicted maize
hypothetical gene models. Testing hypothetical predictions provides a second
measure of TWINSCAN's specificity. Comparing experimentally verified models to
predictions will guide further re-training.
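The benchmarking and verification steps above depend on accuracy measures such as sensitivity and specificity. The following is a minimal sketch of how exon-level values could be computed, not the project's actual benchmarking code. It assumes a predicted exon counts as correct only when its coordinates and strand exactly match an annotated exon, and it uses the convention common in gene-finder evaluation where specificity is the fraction of predictions that are correct. The function name `exon_level_accuracy` and all coordinates are hypothetical.

```python
from typing import Set, Tuple

# An exon is identified by (sequence id, start, end, strand).
# All coordinates below are made-up illustration values.
Exon = Tuple[str, int, int, str]


def exon_level_accuracy(predicted: Set[Exon], annotated: Set[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumption: a prediction is a true positive only if it matches an
    annotated exon exactly (same sequence, boundaries, and strand).
    """
    true_positives = len(predicted & annotated)
    # Sensitivity: fraction of annotated exons that were predicted.
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    # Specificity (gene-finding convention): fraction of predicted exons
    # that are correct.
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical reference annotation (for example, from a curated training set).
annotated = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 450, "+"),
    ("chr1", 600, 700, "+"),
    ("chr1", 900, 1000, "-"),
}

# Hypothetical gene-finder output; the third exon has a shifted start.
predicted = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 450, "+"),
    ("chr1", 610, 700, "+"),
}

sn, sp = exon_level_accuracy(predicted, annotated)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

Running the same comparison for each gene finder against one shared reference set gives directly comparable numbers. Wet-bench results on hypothetical gene models could be scored the same way by treating the experimentally verified models as the reference set.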
Practical Applications of Research:
The maize genome sequence is the knowledge infrastructure for
the next generation of plant molecular genetics and crop improvement. A broad
understanding of the genes present in maize would provide the identities, and
eventually the map positions, of many of the genes responsible for controlling
agronomically important traits. Therefore, high-throughput computational tools
that can accurately identify genes within maize genomic sequence are
necessary for annotating and understanding the maize genome. This project also
illustrates the power of industry-academia collaborations focused
on solving problems that plague research and development in both communities,
and it serves as a paradigm for future industry-academia cooperative research
activities.
975 N. Warson Rd. · St. Louis, Missouri 63132 · 314-587-1211
Karel Schubert: Project Coordinator · maize@danforthcenter.org
© 2009 Donald Danforth Plant Science Center · All rights reserved