The World of Biological Databases

The world of biological databases is in a gigantic mess, and the problem gets bigger as we go from genome sequences to transcriptomes and other expression-related datasets. The mess is expected to get worse as a growing volume of NGS sequence data becomes available. Let us explain in more detail.

To understand the larger context, we need to start from the principles of evolution. Life on earth originated and evolved from a common ancestor over billions of years. The common ancestor of vertebrates likely appeared hundreds of millions of years ago, during the Cambrian explosion. Externally, different organisms evolved unique sets of body parts to address different environmental challenges, but those body parts, such as fish fins, bird wings and human legs, were built from the same genetic toolkit. In that sense, a researcher investigating the fish eye should be able to draw on transcriptional studies of the human eye or the mouse eye.

Unfortunately, transcriptome data sets are not easily comparable, and the difficulty becomes more acute as we move further apart on the evolutionary tree. Often the reason is more political than technical. Scientific communities are divided into different groups such as ‘cancer researchers’, ‘entomologists’, ‘fish geneticists’, ‘neurologists’, ‘plant biologists’, etc. Comparing data generated by those groups is barely possible at the level of genome sequences, and when it comes to gene expression, all bets are off.

Here is a large list of the biological databases created to date. We are sure the list is far from complete, because we checked for two or three databases that we are familiar with and they were not included in the wiki page. For example, the Signal database at the Salk Institute is a very useful transcriptome database for plants that is not mentioned in the wiki page. The Rise database from BGI is helpful to researchers working on rice. SpBase hosts data for the sea urchin community. Readers are welcome to mention any other useful database that is not included in the wiki page.

Useful links:

Wiki: list of biological databases

Metadatabase: a database that links to all biological databases

GEO: NCBI's gene expression repository

SRA: NCBI's short read archive, which also hosts transcriptome data

ArrayExpress: gene expression repository at EBI

emouse: gene expression database for mouse
