There are two types of databases developed and maintained by the PMN project: a reference database called PlantCyc and single species/taxon databases, such as AraCyc.
PMN databases (created and/or maintained at the PMN)
PlantCyc 7.0
PlantCyc is a metabolic pathway reference database containing more than 800 pathways and their catalytic enzymes and genes, as well as compounds from over 300 plant species (see Database Statistics).
The majority of the pathways have been curated from experimental literature by curators at the PMN and collaborators' sites. In addition, PlantCyc includes hypothetical pathways that are published in peer-reviewed journals based on the educated conjectures of experts, and computationally predicted pathways that have been manually validated by PMN curators.
Similarly, enzymes in PlantCyc may have experimental support or may be based solely on computational predictions.
For both pathways and enzymes, evidence codes are assigned to clearly indicate the type of support associated with these database items.
- PlantCyc Pathways
- PlantCyc Content Statistics
- Taxonomic range: primarily Viridiplantae
- Protein sequence source: Varies according to source: All enzymes have been imported from PMN databases, MetaCyc, LycoCyc, RiceCyc, or MedicCyc.
- Enzyme functional annotation method: Varies according to source: All enzymes have been imported from PMN databases, MetaCyc, LycoCyc, RiceCyc, or MedicCyc.
- Enzyme evidence:
- Substantial manual curation of enzymes
- In addition, large-scale computational predictions of enzyme function not subject to curator review
- Pathway prediction program: All pathways have been manually imported from PMN databases or MetaCyc
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Substantial manual curation of pathways
- Some computational prediction of pathways followed by curator review
- Additional pathways imported from MetaCyc based on curator inference
AraCyc 10.0
AraCyc is the most highly curated species-specific database present at the PMN. It has a large number of experimentally supported enzymes and metabolic pathways, but it also houses a substantial number of computationally predicted enzymes and pathways.
- AraCyc Pathways
- AraCyc Content Statistics
- Taxonomic range: Arabidopsis thaliana col
- Protein sequence source: TAIR: various genome releases including TAIR10_pep_20110103_representative_gene_model
- Enzyme functional annotation method: Various methods over time, including E2P2 1.0 and the process described in Zhang 2010
- Enzyme evidence:
- Substantial manual curation of enzymes
- In addition, large-scale computational predictions of enzyme function not subject to curator review
- Pathway prediction program: several versions of Pathway Tools
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Substantial manual curation of pathways
- Some computational prediction of pathways followed by curator review
- Additional pathways imported from MetaCyc based on curator inference
More information on AraCyc curation is available.
CassavaCyc 2.0
- CassavaCyc Pathways
- CassavaCyc Content Statistics
- Taxonomic range: Manihot esculenta esculenta
- Protein sequence source: Phytozome 7.0: Mesculenta_147_peptide.fa
- Enzyme functional annotation method: E2P2 1.0
- Enzyme evidence:
- Almost exclusively large-scale computational predictions of enzyme function not subject to curator review
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 15.0
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc based on experimental evidence or curator inference
ChlamyCyc 2.0
- ChlamyCyc Pathways
- ChlamyCyc Content Statistics
- Taxonomic range: Chlamydomonas reinhardtii
- Protein sequence source: NCBI (organellar) and Phytozome 7.0 (nuclear) JGI: Creinhardtii_169_peptide.fa
- Enzyme functional annotation method: E2P2 1.0
- Enzyme evidence:
- Primarily large-scale computational predictions of enzyme function not subject to curator review, supplemented with data and references curated by the GoFORSYS project at the Max Planck Institute of Molecular Plant Physiology
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 15.0
- Automated pathway validation lists: NPP 3.0 and CVP 1.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc and ChlamyCyc 1.0 (created by the GoFORSYS project at the Max Planck Institute of Molecular Plant Physiology) based on experimental evidence or curator inference
CornCyc 2.0
- CornCyc Pathways
- CornCyc Content Statistics
- Taxonomic range: Zea mays mays
- Protein sequence source: Maizesequence.org (filtered set): http://ftp.maizesequence.org/current/filtered-set/
- Enzyme functional annotation method: E2P2 1.0
- Enzyme evidence:
- Almost exclusively large-scale computational predictions of enzyme function not subject to curator review
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 15.0
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc and MaizeGDB based on experimental evidence or curator inference
GrapeCyc 2.0
- GrapeCyc Pathways
- GrapeCyc Content Statistics
- Taxonomic range: Vitis vinifera
- Protein sequence source: Phytozome 7.0: Vvinifera_145_peptide.fa
- Enzyme functional annotation method: E2P2 1.0
- Enzyme evidence:
- Almost exclusively large-scale computational predictions of enzyme function not subject to curator review
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 15.0
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc based on experimental evidence or curator inference
MossCyc 1.0
- MossCyc Pathways
- MossCyc Content Statistics
- Taxonomic range: Physcomitrella patens
- Protein sequence source: Phytozome 7.0: Ppatens_152_peptide.fa
- Enzyme functional annotation method: E2P2 1.0
- Enzyme evidence:
- Almost exclusively large-scale computational predictions of enzyme function not subject to curator review
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 15.0
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc based on experimental evidence or curator inference
PapayaCyc 1.0
- PapayaCyc Pathways
- PapayaCyc Content Statistics
- Taxonomic range: Carica papaya
- Protein sequence source: Phytozome 7.0: Cpapaya_113_peptide.fa
- Enzyme functional annotation method: E2P2 1.0
- Enzyme evidence:
- Almost exclusively large-scale computational predictions of enzyme function not subject to curator review
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 15.0
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc based on experimental evidence or curator inference
PoplarCyc 5.0
- PoplarCyc Pathways
- PoplarCyc Content Statistics
- Taxonomic range: Populus trichocarpa and other Populus species and hybrids
- Protein sequence source: Phytozome 7.0: Ptrichocarpa_156_peptide.fa
- Enzyme functional annotation method: E2P2 1.0
- Enzyme evidence:
- Almost exclusively large-scale computational predictions of enzyme function not subject to curator review
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 15.0
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc based on experimental evidence or curator inference
More information on PoplarCyc curation is available.
SelaginellaCyc 1.0
- SelaginellaCyc Pathways
- SelaginellaCyc Content Statistics
- Taxonomic range: Selaginella moellendorffii
- Protein sequence source: Phytozome 7.0: Smoellendorffii_91_peptide.fa
- Enzyme functional annotation method: E2P2 1.0
- Enzyme evidence:
- Almost exclusively large-scale computational predictions of enzyme function not subject to curator review
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 15.0
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc based on experimental evidence or curator inference
SoyCyc 3.0
- SoyCyc Pathways
- SoyCyc Content Statistics
- Taxonomic range: Glycine max
- Protein sequence source: Phytozome 7.0: Gmax_109_peptide.fa and JGI: Glyma1 (10-20-08)
- Enzyme functional annotation method: E2P2 1.0 and the process described in Zhang 2010
- Enzyme evidence:
- Almost exclusively large-scale computational predictions of enzyme function not subject to curator review
- Small number of manually curated enzymes
- Pathway prediction program: Pathway Tools 12.5 and 15.0
- Automated pathway validation lists: NPP 3.0 and UPP 3.0
- Pathway evidence:
- Primarily computational prediction followed by curator review
- Additional pathways imported from MetaCyc based on experimental evidence or curator inference
To see comparable data for past releases, please see our Database Overview Archive.
External Metabolic Pathway Databases
Although several external single species databases are affiliated with the PMN, those databases are developed and maintained exclusively by our collaborators. Some of the data from those databases has been incorporated into PlantCyc, as described in our release notes. We provide collaborator contact information and direct links to their homepages below.
Gramene databases
Noble Foundation database
SGN (Solanaceae Genomics Network) databases
SRI International databases
SRI International plays a special role as a collaborator.
Through the MetaCyc database, SRI provides some of the data content for PlantCyc. Prior to the creation of PlantCyc, any plant metabolic pathways with experimental or literature support were entered into MetaCyc by curators from the PMN or collaborating groups. More information about the inclusion of MetaCyc data in PlantCyc is available in our release notes.
- MetaCyc (one database / many organisms)
SRI also provides access to a number of externally generated species-specific databases from outside the plant kingdom through the BioCyc website. They are not part of the PMN, but they may prove useful for comparative studies.
- BioCyc (multiple databases / many organisms)
The Pathway Tools software, which is used to display existing PMN databases and to predict the pathways in new databases, is also developed and supported by the Bioinformatics Research Group at SRI International.
_______________________________________________________________________________________
Enzyme functional annotation method
E2P2 1.0 - Ensemble Enzyme Prediction Pipeline 1.0:
The functional annotation of protein sequences was performed by the in-house Ensemble Enzyme Prediction Pipeline (E2P2, version 1.0). E2P2 systematically integrates results from three molecular function annotation algorithms using an ensemble classification scheme. For a given genome, all protein sequences are submitted as individual queries against the base-level annotation methods. The individual methods rely on homology transfer to annotate protein sequences, using single sequence (BLAST, E-value cutoff <= 1e-30, subset of SwissProt 15.3) and multiple sequence (Priam, November 2010; CatFam, version 2.0, 1% FDR profile library) models of enzymatic functions. The base-level predictions are then integrated into a final set of annotations using an average weighted integration algorithm, where the weight of each prediction from each individual method was determined via a 0.632 bootstrap process over 1000 rounds of testing.
The training and testing data for E2P2 1.0 and the BLAST reference database were drawn from RPSD 1.0 (Reference Protein Sequence Database 1.0). These highly trusted protein sequences were obtained from SwissProt release 15.3. Attempts were made to limit the dataset to proteins that have experimental support for their existence.
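The weighted integration step described above can be illustrated with a minimal Python sketch. It is not the E2P2 implementation: the per-method weights, the acceptance threshold, and the function names integrate and bootstrap_632 are all assumptions made for illustration, since the actual weights and decision rule are not given here. The second function shows only the standard 0.632 bootstrap error formula for a single resampling round, not how E2P2 turned such estimates into weights.

```python
from collections import defaultdict

# Hypothetical per-method weights (assumption). E2P2 derived its weights
# from a 0.632 bootstrap; the real values are not listed in this document.
WEIGHTS = {"blast": 0.5, "priam": 0.3, "catfam": 0.2}


def integrate(predictions, weights=WEIGHTS, threshold=0.5):
    """Combine base-level EC number predictions for one protein.

    predictions maps a method name to the set of EC numbers that method
    assigned. Each EC number scores the summed weight of the methods that
    predicted it, divided by the total weight. EC numbers scoring at or
    above `threshold` (an assumed cutoff) are returned in sorted order.
    """
    total = sum(weights.values())
    scores = defaultdict(float)
    for method, ec_numbers in predictions.items():
        for ec in ec_numbers:
            scores[ec] += weights[method]
    return sorted(ec for ec, s in scores.items() if s / total >= threshold)


def bootstrap_632(train_error, oob_error):
    """Standard 0.632 bootstrap error estimate for one resampling round."""
    return 0.368 * train_error + 0.632 * oob_error


if __name__ == "__main__":
    calls = {
        "blast": {"1.1.1.1"},
        "priam": {"1.1.1.1", "2.7.1.1"},
        "catfam": set(),
    }
    print(integrate(calls))
```

With these assumed weights, 1.1.1.1 is supported by BLAST and Priam (score 0.8) and is kept, while 2.7.1.1 is supported by Priam alone (score 0.3) and is dropped, so the script prints ['1.1.1.1'].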