PlantGDB's Comprehensive plant Gene Annotation Tool (CpGAT) workflow allows users to annotate any genomic region using any combination of transcript and protein datasets. The pipeline uses EVidenceModeler (EVM) to evaluate transcript-derived and ab initio-derived exons, and incorporates PASA to derive UTRs and alternative splice variants. The pipeline outputs a GFF3-formatted file of gene structures.
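Because the pipeline's output is a GFF3 file, it can be read with any GFF3-aware tool. As a rough illustration only, the following Python sketch parses GFF3 text into dictionaries using the nine tab-separated columns defined by the GFF3 specification. The function name `parse_gff3`, the sample records, and the `example` source label are hypothetical and are not taken from actual CpGAT output.

```python
from urllib.parse import unquote

# Column names defined by the GFF3 specification, in order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3(text):
    """Return a list of feature dicts parsed from GFF3-formatted text."""
    features = []
    for line in text.splitlines():
        # Skip blank lines, directives (##...), and comments (#...).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GFF3_COLUMNS):
            raise ValueError(
                f"expected 9 tab-separated columns, got {len(fields)}"
            )
        feature = dict(zip(GFF3_COLUMNS, fields))
        # Coordinates are 1-based and inclusive in GFF3.
        feature["start"] = int(feature["start"])
        feature["end"] = int(feature["end"])
        # Column 9 holds semicolon-separated, URL-escaped tag=value pairs.
        attributes = {}
        for pair in feature["attributes"].split(";"):
            if "=" in pair:
                tag, value = pair.split("=", 1)
                attributes[unquote(tag)] = unquote(value)
        feature["attributes"] = attributes
        features.append(feature)
    return features


# Hypothetical sample records for illustration; NOT actual CpGAT output.
SAMPLE = "\n".join([
    "##gff-version 3",
    "scaffold_1\texample\tgene\t1000\t2500\t.\t+\t.\tID=gene1",
    "scaffold_1\texample\tmRNA\t1000\t2500\t.\t+\t.\tID=mrna1;Parent=gene1",
    "scaffold_1\texample\texon\t1000\t1400\t.\t+\t.\tID=exon1;Parent=mrna1",
])

features = parse_gff3(SAMPLE)
exons = [f for f in features if f["type"] == "exon"]
```

The `ID` and `Parent` attributes are what link exons to transcripts and transcripts to genes, so a consumer of the output would typically group features by `Parent` to rebuild each gene structure.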
How it works: The user chooses a genome region to annotate, then selects transcript datasets (from the same species and/or related species) and/or protein datasets (from related species) based on taxonomic similarity to the genome of interest. The user also selects the splice-site model that most closely represents the species of interest. The pipeline then does the following:
A new BioExtract Server workflow was developed in parallel with CpGAT, allowing the CpGAT pipeline to be executed and customized from within the BioExtract Server. This page documents the specialized tools created to implement the CpGAT workflow on the BioExtract Server. Here is a summary of the workflow steps. Some of these steps may be repeated at different points throughout the workflow.