E2P2 v3.1 - Ensemble Enzyme Prediction Pipeline version 3.1

Functional annotation of protein sequences was performed with the PMN Ensemble Enzyme Prediction Pipeline (E2P2, version 3.1). E2P2 annotates protein sequences by homology transfer, integrating single-sequence (BLAST, E-value cutoff <= 1e-2) and multiple-sequence (PRIAM) models of enzymatic function. The ensemble algorithm relies on a maximum weighted integration scheme (absolute weight cutoff >= 0.5), in which the weight of each model was determined via a 5-by-3 nested cross-validation routine. Both the training of E2P2 and the reference databases used during annotation are based on the Reference Protein Sequence Dataset (RPSD) 3.1. RPSD was compiled from protein sequences in SwissProt, MetaCyc, and BRENDA that have experimental support of existence.
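
The maximum weighted integration step can be illustrated with a minimal sketch. This is not the E2P2 source code: the function name, the data layout, and the example labels and weights are all hypothetical, and the sketch assumes that each enzyme function label takes the highest weight among the classifiers that predicted it and is kept only if that weight meets the absolute cutoff of 0.5.

```python
from typing import Dict, Set

# Absolute weight cutoff stated in the text above.
WEIGHT_CUTOFF = 0.5


def max_weight_ensemble(
    predictions: Dict[str, Set[str]],
    weights: Dict[str, Dict[str, float]],
    cutoff: float = WEIGHT_CUTOFF,
) -> Dict[str, float]:
    """Combine per-classifier predictions for one protein (illustrative only).

    predictions: classifier name -> set of predicted function labels.
    weights: classifier name -> {label -> weight}; in E2P2 such weights come
        from nested cross-validation, here they are simply given as input.
    Returns the labels whose best weight is at least `cutoff`.
    """
    best: Dict[str, float] = {}
    for classifier, labels in predictions.items():
        classifier_weights = weights.get(classifier, {})
        for label in labels:
            # Assumption: a label with no recorded weight counts as 0.0.
            weight = classifier_weights.get(label, 0.0)
            best[label] = max(best.get(label, 0.0), weight)
    return {label: w for label, w in best.items() if w >= cutoff}


if __name__ == "__main__":
    # Made-up labels and weights, for illustration only.
    example_predictions = {
        "blast": {"EC-1.1.1.1", "EC-2.7.1.1"},
        "priam": {"EC-1.1.1.1"},
    }
    example_weights = {
        "blast": {"EC-1.1.1.1": 0.4, "EC-2.7.1.1": 0.3},
        "priam": {"EC-1.1.1.1": 0.8},
    }
    print(max_weight_ensemble(example_predictions, example_weights))
```

In this made-up example only EC-1.1.1.1 is retained, because its best weight (0.8, from the PRIAM-style classifier) passes the cutoff while EC-2.7.1.1 (0.3) does not.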