dictyBase SOPs: GO Curation

Gene Ontology (GO) Curation

Last updated April 3, 2006
General Info
  • General Guidelines
  • The With/From Column
  • dictyBase Unpublished References
Annotation
  • Use of Evidence Codes
  • The NOT Qualifier
  • Data Not Shown
  • Confusing GO Terms
Notes
  • Similarity-Based GO Annotation
  • GO issues in need of further discussion
  • Gene Ontology Resources


General Guidelines
Always annotate to all three ontologies:
In other words, annotate to biological_process, molecular_function, and cellular_component. If the term is not known, use the "unknown" term:
  • biological_process unknown ; GO:0000004
  • molecular_function unknown ; GO:0005554
  • cellular_component unknown ; GO:0008372
[this practice currently under review by the Gene Ontology Consortium]
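
The rule above can be sketched as a small helper that fills in the "unknown" term for any ontology left without an annotation. This is only an illustration: the function name and the dictionary layout are assumptions, not part of the dictyBase curation tools; the three GO IDs are the ones listed above.

```python
# "Unknown" terms as listed in the guideline above.
UNKNOWN_TERMS = {
    "biological_process": "GO:0000004",
    "molecular_function": "GO:0005554",
    "cellular_component": "GO:0008372",
}


def fill_missing_aspects(annotations):
    """Return a copy of `annotations` with an 'unknown' term for each empty ontology.

    `annotations` maps an ontology name to a list of GO IDs
    (a hypothetical structure chosen for this sketch).
    """
    completed = {aspect: list(annotations.get(aspect, [])) for aspect in UNKNOWN_TERMS}
    for aspect, unknown_id in UNKNOWN_TERMS.items():
        if not completed[aspect]:
            completed[aspect].append(unknown_id)
    return completed
```

For a gene product with only a molecular_function annotation, the helper would add the biological_process and cellular_component "unknown" terms and leave the existing annotation untouched.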

Multiple annotations:
  • In general, dictyBase practices multiple annotations to a term with different references.
  • When a paper shows the same result using two very different assays, and these support the same annotation with two different evidence codes, annotate to that term twice. If the assays support terms of different granularity, annotate to both the parent and the child term.

The With/From Column
Entering information in the with/from column requires certain formatting:
  • When entering multiple database objects in the with field, do not enter the database for the first db object (it is selected in the drop-down); you must, however, enter the database for each subsequent db object. For example:
    DDB:(selected db in drop-down)     DDB0185021|DDB:DDB0232349|DDB:DDB0215364
    
  • When using the IC (Inferred by Curator) evidence code, you must enter a GO ID in the with/from field. When doing so, it is imperative that you enter all seven digits of the GO ID. For example, enter GO:0003700 rather than GO:3700. Failure to do this will result in an improper display.
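
The two formatting rules above can be sketched as follows. The function names are hypothetical and this is not the dictyBase curation tool itself; the sketch also assumes that all objects in the with field come from the same database, as in the DDB example above.

```python
import re


def normalize_go_id(go_id):
    """Zero-pad the numeric part of a GO ID to seven digits (GO:3700 -> GO:0003700)."""
    match = re.fullmatch(r"GO:(\d{1,7})", go_id.strip())
    if match is None:
        raise ValueError(f"not a GO ID: {go_id!r}")
    return f"GO:{int(match.group(1)):07d}"


def format_with_field(db, object_ids):
    """Build the text typed into the with field for several objects from one database.

    The first object omits the database prefix because the database is chosen
    in the drop-down; every later object carries the 'DB:' prefix.
    """
    if not object_ids:
        raise ValueError("at least one database object is required")
    first, *rest = object_ids
    return "|".join([first] + [f"{db}:{obj}" for obj in rest])
```

Called with "DDB" and the three IDs from the example above, format_with_field reproduces the string shown there.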

Entering database objects with the IPI evidence code:
  • If there is only one possible gene product that can be entered in the with column (using a specific protein binding term, e.g., Rho GTPase binding, profilin binding, etc., as opposed to GO:0005515 protein binding), use IDA. [Specific examples?]
  • If there are multiple possible gene products that can be entered in the with column, use IPI with the specific gene product. [Specific examples?]

dictyBase Unpublished References
  • ND: dictyBase 'No biological Data' Unpublished (reference_no=9851)
  • ISS: dictyBase 'Inferred from Sequence or structural Similarity' Unpublished (reference_no=10155)
  • IC: dictyBase 'Inferred by Curator' Unpublished (reference_no=11067)
  • NAS: (unpublished information from authors) dictyBase (2005) 'Personal communication to dictyBase' Unpublished (reference_no=11050; note this reference changes each calendar year)
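
For quick reference, the list above can be held in a lookup table. This is a minimal sketch; the names are hypothetical, the reference_no values are copied from the list, and the NAS value applies only to the 2005 entry because that reference changes each calendar year.

```python
# reference_no values copied from the list above.
UNPUBLISHED_REFERENCE_NO = {
    "ND": 9851,
    "ISS": 10155,
    "IC": 11067,
    "NAS": 11050,  # 2005 entry only; this reference changes each calendar year
}


def unpublished_reference_no(evidence_code):
    """Return the dictyBase unpublished reference_no for an evidence code."""
    try:
        return UNPUBLISHED_REFERENCE_NO[evidence_code.upper()]
    except KeyError:
        raise ValueError(
            f"no dictyBase unpublished reference for {evidence_code!r}"
        ) from None
```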

Use of Evidence Codes
IC (Inferred by Curator):
  • TAS versus IC: A more liberal use of the IC evidence code is a recurring theme from the Stanford GO Annotation Camp. IC is newer than TAS, and some curators are not yet accustomed to it because TAS was in use for so long. TAS is now mostly used for statements in reviews.
  • When an author states that a gene product has a particular function, process, or location (based on orthology or other evidence), use IC rather than TAS (or when ISS is inappropriate). Example genes: nox genes, fpaA/B.

ISS (Inferred from Sequence or Structural Similarity):
  • Replace the dictyBase unpublished reference (reference_no=10155) with a published reference once one is available. If the dictyBase unpublished ISS annotation is more granular than the published ISS annotation, it is okay to keep the dictyBase unpublished annotation.
  • If the first paper for a gene has an ISS annotation and then a second paper shows experimental evidence for that same term, annotate with both the original ISS annotation and the new experimental evidence code.
  • If a single reference has ISS plus experimental evidence for a GO term, use only the experimental evidence code. However, if the ISS annotation is more granular than the experimental annotation, use both evidence codes.
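
The single-reference rule above can be sketched in code. "More granular" is read here as "the ISS term is a descendant of the experimental term"; that reading, the function names, and the toy parent map with placeholder term IDs are all assumptions made for illustration.

```python
EXPERIMENTAL_CODES = {"IDA", "IMP", "IPI", "IEP", "IGI"}


def is_descendant(term, ancestor, parents):
    """True if `term` is a strict descendant of `ancestor` in the `parents` map."""
    stack = list(parents.get(term, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False


def codes_to_keep(iss_term, experimental_term, experimental_code, parents):
    """Annotations to record when one reference supports both ISS and an experiment."""
    if experimental_code not in EXPERIMENTAL_CODES:
        raise ValueError(f"not an experimental evidence code: {experimental_code!r}")
    if iss_term == experimental_term:
        # Same term: the experimental evidence code replaces ISS.
        return [(experimental_code, experimental_term)]
    if is_descendant(iss_term, experimental_term, parents):
        # The ISS term is more granular: keep both annotations.
        return [("ISS", iss_term), (experimental_code, experimental_term)]
    raise ValueError("rule applies only to the same term or a more granular ISS term")


# Toy ontology with placeholder IDs (hypothetical, not real GO terms).
TOY_PARENTS = {"TERM:child": ["TERM:parent"]}
```

With the toy map, an ISS annotation to TERM:child alongside an IMP annotation to TERM:parent keeps both, whereas ISS and IDA on the same term keeps only IDA.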

TAS (Traceable Author Statement):
  • Do not make multiple TAS annotations for a gene to the same term.
  • Once an experimental evidence code (IDA, IMP, IPI, IEP, IGI) has been entered for an annotation (or more granular annotations), annotations using the TAS evidence code can be deleted.

ND (No Biological Data Available):
  • When a function/process/component is 'unknown,' use dictyBase ND (reference_no=9851).
  • When an author explicitly states that a function/process/component is unknown, annotate the gene product with ND using the reference_no of the publication (this is already part of the GO documentation).

The NOT Qualifier
The NOT qualifier is only for truly unexpected results. For example, mlkA is classified by similarity and function as a CaM kinase, but it has been shown not to be regulated by Ca2+/calmodulin; hence the NOT annotations for GO:0005516 (calmodulin binding) and GO:0004685 (calcium- and calmodulin-dependent protein kinase activity). Negative results from general tests should not be annotated.

Data Not Shown
"Data not shown" is acceptable for annotations with experimental evidence codes. Since publications are all peer-reviewed, these statements are presumably reliable.

Confusing GO Terms
  • cytoplasm vs. cytosol:
  • cell-matrix adhesion vs. cell-substrate adhesion:
  • cell motility vs. cell migration vs. chemotaxis:
  • development vs. fruiting body formation (or a more specific development term):
  • actin cytoskeleton vs. actin filament: If a gene product colocalizes with actin, annotate to 'actin cytoskeleton' rather than 'actin filament.' Based on the definition, the only gene product that should use 'actin filament' is actin.
  • microtubule cytoskeleton vs. microtubule: If a gene product colocalizes with microtubules, annotate to 'microtubule cytoskeleton' rather than 'microtubule.' Based on the definition, the only gene products that should use 'microtubule' are alpha and beta tubulin.

Similarity-Based GO Annotation
  • Using top hits from GOst search and BLAST vs. UniProt, nr, InterPro, and Pfam, look at GO annotations of these sequences; ISS with that database record and use reference "dictyBase 'Inferred from Sequence or structural Similarity' Unpublished" (reference_no=10155).
  • If no non-IEA/ISS/NAS annotations exist for these top hits, you may use the sequence record in the with column, but in this case try to find a reference that provides evidence for the process/function/component (need to import PMID first, then make this annotation).
  • If you have a good annotation for a function and you can logically and confidently infer that the gene product participates in a process or localizes to a cellular component based on other annotations, use the IC evidence code (for example, a protein annotated with function "DNA binding" can be annotated with IC component "nucleus").
  • Alternatively, if you have good hits with InterProScan or ProSite, you may ISS with those records that have GO annotations. (See also InterPro2go and EC2go mappings.)
  • ISS may be done with molecular_function terms; biological_process and cellular_component terms, however, must be used carefully. Very general process terms may be used, and component terms should be discussed.
  • See also the full notes on Similarity-Based Curation.
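
The IC inference mentioned in the list above (function "DNA binding" to component "nucleus") can be sketched as a lookup that also fills the with/from field with the full seven-digit GO ID of the supporting term. The record layout and names are hypothetical, and the GO IDs for DNA binding and nucleus are supplied here rather than taken from this page. Only the one pairing given in the text is listed; any further pairings would be a matter of curator judgment.

```python
# Function term -> component term a curator might infer from it.
IC_COMPONENT_FROM_FUNCTION = {
    "GO:0003677": "GO:0005634",  # DNA binding -> nucleus
}


def infer_component(function_go_id):
    """Return a sketch of an IC annotation record, or None if no pairing is listed."""
    component = IC_COMPONENT_FROM_FUNCTION.get(function_go_id)
    if component is None:
        return None
    return {
        "go_id": component,
        "aspect": "cellular_component",
        "evidence": "IC",
        "with": function_go_id,  # IC requires a seven-digit GO ID in with/from
    }
```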

GO issues in need of further discussion
  • Use of 'cytosol' vs. 'cytoplasm.' Cytosol is part of cytoplasm.
    • GO:0005737 cytoplasm: All of the contents of a cell excluding the plasma membrane and nucleus, but including other subcellular structures.
    • GO:0005829 cytosol: That part of the cytoplasm that does not contain membranous or particulate subcellular components.
  • Annotation of controls/markers in experiments [PMID: 15800059]. If a result for the investigated gene(s) rests on an experiment (also shown) with a known gene product, the known gene product serves as a control for interpreting the results for the gene product(s) in question. These ‘control’ genes should not be annotated. As an exception, when the known gene does not have any experimental annotation for that term, the ‘control’ experiment can be used to add that annotation (in order to add high-quality annotations to more genes efficiently).
  • Use of IGI: what constitutes a "genetic interaction?" Obviously a double mutant is a genetic interaction, but what about overexpression of one protein in a mutant background of another protein? This is also an issue with the literature topics: Genetic Interactions vs. Mutant/Phenotypes.
  • 'Colocalizes_with' is still a tricky issue. Our general consensus is that if something is shown truly transiently, we should use that qualifier. If something is present in a particular location throughout the course of the experiments in a paper, do not use the qualifier. What about "fuzzy" annotations?
  • Use of two different evidence codes for the same term in one reference is a common practice. What about parent/child terms using the same reference and the same evidence code?
  • IMP vs. IDA (issue #0054).
  • ND and IEA in same GO aspect (issue #0063). Similar issue: unknown and IEA (issue #0055).

Gene Ontology Resources

AmiGO Gene Ontology Browser

Curate GO using dictyBase curation tools

dictyBase Evidence Codes for the Gene Ontology

dictyBase Gene Ontology help file

Download OBO-Edit or DAG-Edit

Gene Ontology Consortium

Gene Ontology Curator Requests

Gene Ontology Annotation Issues

Gene Ontology Project at SourceForge.net

Supported by NIH (NIGMS and NHGRI)