Biological databases are stores of biological information. The journal Nucleic Acids Research regularly publishes special issues on biological databases and maintains a list of such databases. The 2018 issue covers about 180 such databases and updates to previously described databases. Cross-database discovery tools can be used to browse and search several biological databases at once; one such portal, developed by the National Institute of Allergy and Infectious Diseases (NIAID), enables searching across databases.

Meta databases

Meta databases are databases of databases that collect data about data to generate new data. They are capable of merging information from different sources and making it available in a new and more convenient form, or with an emphasis on a particular disease or organism. Originally, metadata was only a common term referring simply to data about data such as tags, keywords, and markup headers.
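
As a rough illustration of the merging step described above, the following Python sketch combines records from two hypothetical source databases into one view keyed by gene and organism, then narrows it to a single organism. The source names, field names, and helper functions are assumptions made for this example and do not correspond to any real database schema.

```python
from collections import defaultdict

# Hypothetical records from two source databases (invented for illustration).
SOURCE_A = [
    {"gene": "TP53", "organism": "Homo sapiens", "function": "tumor suppressor"},
    {"gene": "Trp53", "organism": "Mus musculus", "function": "tumor suppressor"},
]
SOURCE_B = [
    {"gene": "TP53", "organism": "Homo sapiens", "disease": "Li-Fraumeni syndrome"},
]


def merge_sources(*sources):
    """Merge records that share a (gene, organism) key into one entry."""
    merged = defaultdict(dict)
    for records in sources:
        for record in records:
            key = (record["gene"], record["organism"])
            merged[key].update(record)
    return list(merged.values())


def focus_on_organism(entries, organism):
    """Keep only the merged entries that belong to one organism."""
    return [entry for entry in entries if entry["organism"] == organism]


merged_entries = merge_sources(SOURCE_A, SOURCE_B)
human_entries = focus_on_organism(merged_entries, "Homo sapiens")
```

Keying on the gene and organism pair is only one possible design choice; a real meta database would more likely reconcile records through stable accession numbers.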

Model organism databases

Model organism databases provide in-depth biological data for intensively studied organisms.

Nucleic acid databases

DNA databases

The primary databases make up the International Nucleotide Sequence Database (INSD). They include:

DDBJ (Japan), GenBank (USA) and the European Nucleotide Archive (Europe) are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions and then exchange new and updated data daily to stay synchronised with one another. These three databases are primary databases, as they house original sequence data. They collaborate with the Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.

Secondary databases are:

  • HapMap
  • OMIM (Online Mendelian Inheritance in Man): inherited diseases
  • RefSeq
  • 1000 Genomes Project: launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
  • a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. It provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation.

Other databases

Gene expression databases

Generic gene expression databases

Microarray gene expression databases

Genome databases

These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold the genomes of many species or the genome of a single model organism.

Phenotype databases

RNA databases

Amino acid and protein databases

(See also: List of proteins in the human body)

Several publicly available data repositories and resources have been developed to support and manage protein-related information, biological knowledge discovery, and data-driven hypothesis generation. The databases in the table below are selected from those listed in the Nucleic Acids Research (NAR) database issues and database collection, and from the databases cross-referenced in UniProtKB. Most of these databases are cross-referenced with UniProt / UniProtKB so that identifiers can be mapped to each other.
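
The cross-referencing described above can be pictured with a small Python sketch in which a UniProtKB accession acts as the hub between identifiers from other databases. The cross-reference table is hand-written for illustration rather than exported from UniProt, and the helper names build_reverse_index and map_identifier are hypothetical.

```python
# Illustrative cross-reference table keyed by UniProtKB accession.
# Hand-written for this sketch; not an export from UniProt.
CROSS_REFERENCES = {
    "P04637": {"HGNC": ["HGNC:11998"], "Ensembl": ["ENSG00000141510"], "PDB": ["1TUP"]},
    "P38398": {"HGNC": ["HGNC:1100"], "Ensembl": ["ENSG00000012048"]},
}


def build_reverse_index(cross_references):
    """Map each (database, identifier) pair back to its UniProtKB accessions."""
    index = {}
    for accession, per_database in cross_references.items():
        for database, identifiers in per_database.items():
            for identifier in identifiers:
                index.setdefault((database, identifier), []).append(accession)
    return index


def map_identifier(identifier, source_db, target_db, cross_references, index):
    """Translate an identifier from source_db to target_db via UniProtKB."""
    results = []
    for accession in index.get((source_db, identifier), []):
        results.extend(cross_references[accession].get(target_db, []))
    return results


REVERSE_INDEX = build_reverse_index(CROSS_REFERENCES)
ensembl_ids = map_identifier(
    "HGNC:11998", "HGNC", "Ensembl", CROSS_REFERENCES, REVERSE_INDEX
)
```

An identifier with no cross-reference in the target database simply maps to an empty list, which mirrors the fact that not every database covers every protein.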

Proteins in humans:

There are about 20,000 protein-coding genes in the standard human genome, roughly 1,200 of which already have Wikipedia articles (the Gene Wiki). If splice variants are included, there could be as many as 500,000 unique human proteins.
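
Taking the approximate figures quoted above at face value, a short Python calculation shows what they imply. All three inputs are rough estimates, so the results are orders of magnitude rather than measurements.

```python
# Approximate figures quoted in the text; all are rough estimates.
PROTEIN_CODING_GENES = 20_000
GENES_WITH_ARTICLES = 1_200
MAX_UNIQUE_PROTEINS = 500_000

# Share of protein-coding genes that already have an article.
article_coverage = GENES_WITH_ARTICLES / PROTEIN_CODING_GENES

# Upper-bound average number of distinct protein forms per gene.
proteins_per_gene = MAX_UNIQUE_PROTEINS / PROTEIN_CODING_GENES

print(f"Share of genes with an article: {article_coverage:.0%}")
print(f"Upper-bound protein forms per gene: {proteins_per_gene:.0f}")
```

On these numbers, about 6% of protein-coding genes have an article, and the upper estimate corresponds to an average of 25 protein forms per gene.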

Different types of protein databases

| DB name | DB website | Provider | Data sources | Revenue/sponsor sources | Integrates | Description | Size | DB type | Actively maintained |
| InterPro | | ELIXIR infrastructure | European Bioinformatics Institute | EMBL, Wellcome Trust, BBSRC | CATH-Gene3D, CDD, HAMAP, MobiDB, PANTHER, Pfam, SMART, SUPERFAMILY, SFLD, TIGRFAMs | classifies proteins into families and predicts the presence of domains and sites | | Protein sequence databases | Yes |
| neXtProt | | CALIPHO (a group at the SIB) | Swiss Institute of Bioinformatics | | UniProt, Cellosaurus, gnomAD, IntAct, SRAA Atlas, UniProt-GOA, Bgee, COSMIC, MassIVE, PeptideAtlas | a human protein-centric knowledge resource | | Protein sequence databases | Yes |
| Wiki-Pi | | Madhavi K. Ganapathiraju | | | | | At present Wiki-Pi contains 48,419 unique interactions among 10,492 proteins. However, it is not clear whether these are unique proteins.[13] | Protein interaction database | ?? |
| Human Protein Reference Database | | Institute of Bioinformatics (IOB), Bangalore, India | | | | | One source claims 15,000 proteins, but it is unclear how many of these are unique. | | |
| Pfam | | Sanger Institute | | | | protein families database of alignments and HMMs | | Protein sequence databases | |
| Human Proteinpedia | | Institute of Bioinformatics (IOB), Bangalore, and Johns Hopkins University | | | | | Human Proteinpedia is based on HPRD (Human Protein Reference Database), a repository hosting over 30,000 human proteins. However, it is unclear how many of these are unique proteins. | | |
| Human Protein Atlas | | | | The Swedish Government | | | It contains roughly 10 million IHC images of a bit fewer than 25,000 antibodies, but it is unclear how many of these are unique. | | |
| PRINTS | | Manchester University | | | | a compendium of protein fingerprints | | Protein sequence databases | |
| PROSITE | | | | | | database of protein domains, families and functional sites | | Protein sequence databases | |
| Protein Information Resource | | Georgetown University Medical Center (GUMC) | | | | | | Protein sequence databases | |
| SUPERFAMILY | | | | | | library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms | | Protein sequence databases | |
| Swiss-Prot | | Swiss Institute of Bioinformatics | | | | protein knowledgebase | | Protein sequence databases | |
| Protein Data Bank (Research Collaboratory for Structural Bioinformatics Protein Data Bank) | | Protein Data Bank in Europe (PDBe), Protein Data Bank in Japan (PDBj), and RCSB PDB | | | | | | Protein structure databases | |
| Structural Classification of Proteins (SCOP) | | | | | | | | Protein structure databases | |
| CATH database | | | | | | | | Protein structure databases | |
| ModBase | | Sali Lab, UCSF | | | | database of comparative protein structure models | | Protein model databases | |
| SIMAP | | | | | | database of protein similarities computed using FASTA | | Protein model databases | |
| SWISS-MODEL | | | | | | server and repository for protein structure models | | Protein model databases | |
| AAindex | | | | | | database of amino acid indices, amino acid mutation matrices, and pair-wise contact potentials | | Protein model databases | |
| BioGRID | | Samuel Lunenfeld Research Institute | | | | general repository for interaction datasets | | Protein-protein and other molecular interactions | |
| RNA-binding protein database | | | | | | | | Protein-protein and other molecular interactions | |
| Database of Interacting Proteins | | Univ. of California | | | | | | Protein-protein and other molecular interactions | |
| IntAct | | EMBL-EBI | | | | open-source database for molecular interactions | | Protein-protein and other molecular interactions | |
| STRING | | | | | | an open source molecular interaction database to study interactions between proteins | | Protein-protein and other molecular interactions | |
| Human Protein Atlas | | Human Protein Atlas | | | | aims at mapping all the human proteins in cells, tissues and organs | | Protein expression databases | |
| ProteinModelPortal | | ?? | ?? | | | | | 3D structure protein databases | |
| SWISS-MODEL Repository | | University of Basel | | The Swiss government | | | | 3D structure protein databases | |
| DisProt | | ELIXIR infrastructure | Indiana University School of Medicine, Temple University, University of Padua | funding from the European Union's Horizon 2020 | Swiss-Prot/UniProt, CATH, Pfam, Europe PMC, BITEM, ECO, Gene Ontology | database of experimental evidence of disorder in proteins | | 3D structure protein databases, Protein sequence databases | |
| MobiDB | | John Moult, Christine Orengo, Predrag Radivojac | University of Padua | Italian Government | | database of intrinsic protein disorder annotation | | 3D structure protein databases, Protein sequence databases | |
| ModBase | | Ursula Pieper, Ben Webb, Narayanan Eswar, Andrej Sali, Roberto Sanchez | UCSF, Sali Lab | | | | | 3D structure protein databases | |
| PDBsum | | European Bioinformatics Institute 2013 | | Wellcome Trust | | | | 3D structure protein databases | |
| CCDS | | NCBI | ?? | | | | | Sequence databases | |
| UniProtKB | | ?? | ?? | | | | | Sequence databases | |
| Swiss-Prot/UniProt | | SIB Swiss Institute of Bioinformatics | European Bioinformatics Institute (EMBL-EBI) | | | | Swiss-Prot has collected over 81,000 variants in roughly 13,000 human protein sequence records from peer-reviewed literature. It is unclear how many unique protein types are present in the database. | | |

Other protein database links

Signal transduction pathway databases

Metabolic pathway and protein function databases

Taxonomic databases

Numerous databases collect information about species and other taxonomic categories. The Catalogue of Life is a special case as it is a meta-database of about 150 specialized "global species databases" (GSDs) that have collected the names and other information on (almost) all described and thus "known" species.

  • BacDive: bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information
  • Catalogue of Life: a meta-database of all species on earth
  • EzTaxon-e: database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences
  • NCBI Taxonomy: a taxonomic database operated by NCBI and concentrating on all taxa for which DNA sequences are available (those sequences are stored by GenBank, another database operated by NCBI).

Image databases

Images play a critical role in biomedicine and related fields, with subjects ranging from anthropological specimens to zoological ones. However, there are relatively few databases dedicated to image collection, although some projects such as iNaturalist collect photos as a main part of their data. A special case is three-dimensional images, such as protein structures or 3D reconstructions of anatomical structures. Image databases include, among others:

Radiologic databases

Additional databases

Exosomal databases

  • ExoCarta
  • Extracellular RNA Atlas: a repository of small RNA-seq and qPCR-derived exRNA profiles from human and mouse biofluids

Mathematical model databases

  • Biomodels Database: published mathematical models describing biological processes
  • : published, community-contributed, and educational multi-scale and multicellular models for systems biology

Databases on antimicrobial resistance rates and antibiotic consumption

Databases on antimicrobial resistance mechanisms

Wiki-style databases

Specialized databases

  • Barcode of Life Data Systems: database of DNA barcodes
  • Bacterial Pesticidal Protein Database
  • The Cancer Genome Atlas (TCGA): provides data from hundreds of cancer samples obtained using high-throughput techniques such as gene expression profiling, copy number variation profiling, SNP genotyping, genome-wide DNA methylation profiling, microRNA profiling, and exon sequencing of at least 1,200 genes
  • Cellosaurus: a knowledge resource on cell lines
  • CTD (Comparative Toxicogenomics Database): describes chemical-gene-disease interactions
  • DiProDB: a database to collect and analyse thermodynamic, structural and other dinucleotide properties
  • HRT Atlas: web-based tool for searching cell-specific candidate reference genes/transcripts suitable for normalizing qPCR experiments. It also provides a complete list of human and mouse housekeeping genes and transcripts
  • Dryad: repository of data underlying scientific publications in the basic and applied biosciences
  • Edinburgh Mouse Atlas
  • EPD Eukaryotic Promoter Database
  • FINDbase (the Frequency of INherited Disorders database)
  • GigaDB: repository of large-scale datasets underlying scientific publications in biological and biomedical research
  • HGNC (HUGO Gene Nomenclature Committee): a resource for approved human gene nomenclature
  • International Human Epigenome Consortium: integrates epigenomic reference data from well-known national endeavors such as the Canadian CEEHRC, European Blueprint, European Genome-phenome Archive (EGA), US ENCODE and NIH Roadmap, German DEEP, Japanese CREST, Korean KNIH, Singapore's GIS and China's EpiHK
  • MethBase: database of DNA methylation data visualized on the UCSC Genome Browser
  • Minimotif Miner: database of short contiguous functional peptide motifs
  • Oncogenomic databases: a compilation of databases that serve for cancer research
  • PubMed: references and abstracts on life sciences and biomedical topics
  • RIKEN integrated database of mammals
  • TDR Targets: a chemogenomics database focused on drug discovery in tropical diseases
  • TRANSFAC: a database about eukaryotic transcription factors, their genomic binding sites and DNA-binding profiles
  • JASPAR: a database of manually curated, non-redundant transcription factor binding profiles.
  • : a database about methionine sulfoxidation sites and its functional roles in proteins
  • Healthcare Cost and Utilization Project (HCUP): the largest collection of hospital care data in the United States, including hundreds of millions of inpatient, outpatient, and emergency records
  • curates descriptions of biological experiments from PMC articles.
  • Bovine Metabolome Database: a free web database that lists known bovine metabolites
  • Mouse Models of Human Cancer database: a curated database of mouse models of human cancer

External links

  • – over 1,600 databases