- Translations of coding regions (CDS) annotated on GenBank (INSDC) sequence records archived in the Nucleotide database. These records are designated by accession numbers in the following format:
[three-letter alphabetical prefix][five digits][.][version number]
- NCBI staff curate many of the GenBank (INSDC) Protein records into the Reference Sequence (RefSeq) collection. RefSeq protein accessions are recognizable by a two-letter prefix followed by an underscore (for example, NP_ or XP_).
- NCBI also imports records from the Universal Protein Resource (UniProtKB) consortium. The UniProt help documentation describes the UniProt accession number format.
- Protein Data Bank (PDB) records are the protein sequences that accompany three-dimensional protein structures available in the NCBI Structure database. These records are designated by unique PDB IDs.
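
The accession formats listed above can be told apart with a few regular expressions. The following is a minimal sketch, not an official NCBI validator: the function name `classify_accession` and the pattern table are hypothetical, the RefSeq pattern assumes a small set of common protein prefixes, the UniProtKB pattern follows the one given in the UniProt help documentation, and the PDB pattern assumes a four-character ID with an optional chain suffix such as `_A`.

```python
import re

# Illustrative patterns only; each is an assumption about the accession
# format, not an authoritative definition from NCBI, UniProt, or PDB.
PATTERNS = [
    # Three letters, five digits, optional ".version" (format given in the list above).
    ("GenBank", re.compile(r"[A-Z]{3}\d{5}(?:\.\d+)?")),
    # Two-letter prefix, underscore, digits, optional ".version".
    # Only a few common RefSeq protein prefixes are assumed here.
    ("RefSeq", re.compile(r"(?:AP|NP|XP|YP|WP)_\d+(?:\.\d+)?")),
    # Pattern as described in the UniProt help documentation.
    ("UniProtKB", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    )),
    # Four-character PDB ID (digit first), optional chain suffix (assumed).
    ("PDB", re.compile(r"[0-9][A-Za-z0-9]{3}(?:_[A-Za-z0-9]+)?")),
]


def classify_accession(accession: str) -> str:
    """Return the likely source collection for a protein accession string."""
    accession = accession.strip()
    for source, pattern in PATTERNS:
        # fullmatch requires the whole string to fit the pattern.
        if pattern.fullmatch(accession):
            return source
    return "unknown"


if __name__ == "__main__":
    for acc in ["AAA98665.1", "NP_000537.3", "P04637", "1TUP_A"]:
        print(acc, classify_accession(acc))
```

A match only indicates that a string has the expected shape; it does not confirm that a record with that accession exists in the Protein database.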