dictyBase Help: Download BLAST Databases


Description

All dictyBase BLAST databases can be downloaded as FASTA files. The "chromosomal DNA" and "EST" datasets are updated only when new versions become available; all other files are updated weekly. To work with the most up-to-date information, download the files you need frequently.


Databases

For BLAST searches and downloads, dictyBase provides several different databases holding DNA or protein sequences.

Coding Sequences (CDS)

A DNA coding sequence is the region of nucleotides that corresponds to the amino acid sequence of the predicted protein. The DNA coding sequence includes the start and stop codons, and thus begins with an "ATG" and ends with a stop codon. A missing start or stop codon indicates that only a partial coding sequence is available. Note that the DNA coding sequence does not correspond to an actual mRNA. The database holds the coding sequence of the best quality sequence available for each gene. If a gene has a curated gene model, the database contains that sequence. Genes that are not yet curated are represented by the gene prediction of the Sequencing Center. In addition, if a gene that is in GenBank has not been mapped to the genome, the database contains the sequence from GenBank. If a gene has more than one transcript, all transcripts are represented.
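The start and stop codon rule above can be checked on a downloaded CDS file. The following is a minimal sketch, not dictyBase's own method: the function names, the example records, and the stop codon set (standard genetic code, an assumption) are all illustrative.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def cds_status(seq: str) -> str:
    """Report whether a CDS has both its start and stop codon."""
    seq = seq.upper()
    has_start = seq.startswith("ATG")
    has_stop = len(seq) >= 3 and seq[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_start:
        return "partial (no stop codon)"
    if has_stop:
        return "partial (no start codon)"
    return "partial (no start or stop codon)"


# Hypothetical records standing in for a downloaded FASTA file.
example = [">gene_a hypothetical", "ATGAAA", "TAA", ">gene_b hypothetical", "AAATTT"]
report = {header.split()[0]: cds_status(seq) for header, seq in read_fasta(example)}
```

To run this on a real download, pass an open file handle to read_fasta in place of the example list.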

Genomic Sequences

This database contains the genomic sequences of all genes described in the Coding Sequences (CDS) section above. A genomic sequence is defined here as the gene sequence containing all exons and introns plus 1,000 base pairs of flanking sequence at each end (5' and 3'). Because gene density in Dictyostelium is quite high, the genomic sequence of one gene can overlap with that of its neighbor, resulting in two partial hits in a BLAST search. Note also that the 1,000 flanking base pairs are included only when available, which might not be the case at the end of a contig or for a non-mapped GenBank record.
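The flanking rule and the resulting overlap between neighbors can be illustrated with simple coordinate arithmetic. This is a sketch under stated assumptions: coordinates are 1-based and inclusive, the function names are hypothetical, and the gene positions are made up for the example.

```python
from typing import Tuple


def genomic_window(gene_start: int, gene_end: int, contig_length: int,
                   flank: int = 1000) -> Tuple[int, int]:
    """Return the gene span extended by `flank` bp on each side.

    Coordinates are 1-based and inclusive. The window is clipped to the
    contig, so fewer than `flank` bp are added near a contig end.
    """
    if not (1 <= gene_start <= gene_end <= contig_length):
        raise ValueError("gene coordinates must lie within the contig")
    return max(1, gene_start - flank), min(contig_length, gene_end + flank)


def windows_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


# Two hypothetical neighboring genes 500 bp apart on a 100,000 bp contig.
gene_1 = genomic_window(5000, 6000, 100_000)   # (4000, 7000)
gene_2 = genomic_window(6500, 7500, 100_000)   # (5500, 8500)
neighbors_overlap = windows_overlap(gene_1, gene_2)  # True

# A gene close to both ends of a short contig gets truncated flanks.
clipped = genomic_window(300, 900, 1500)       # (1, 1500)
```

Here a query matching the region between the two genes would hit both database entries, which is the "two partial hits" situation described above.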

Protein Sequences

This is the protein translation of the DNA "Coding Sequences (CDS)".
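The relationship between a CDS entry and its protein entry can be sketched as a codon-by-codon translation. The code below assumes the standard genetic code and is only an illustration; it is not the procedure dictyBase uses to build the protein database, and translate_cds is a hypothetical name.

```python
from itertools import product

# Standard genetic code laid out in the conventional T, C, A, G order
# (assumption: the sequences are translated with this code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon.

    Codons containing ambiguous bases are reported as 'X'. Trailing bases
    that do not form a full codon (possible in a partial CDS) are ignored.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


peptide = translate_cds("ATGAAATAA")  # start codon, one lysine codon, stop
```

In this example peptide is "MK"; the stop codon itself is not part of the protein sequence.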

EST Sequences

This database contains EST sequences from the Japanese Sequencing Project as obtained from GenBank, and additional EST sequences contributed by H. Urushihara by direct submission to dictyBase.

Full Chromosomes 1, 2, 3, 4, 5, 6, M

The entries in this database are the full-length chromosomes in dictyBase. In addition to chromosomes 1, 2, 3, 4, 5, 6, and M (mitochondrial), this includes "floating contigs", which are long stretches of DNA that have been sequenced but have not yet been fit into an assembly. These contigs are combined into two large arbitrary concatemers, 2F and 3F, from chromosomes 2 and 3, respectively.


Supported by NIH (NIGMS and NHGRI)