The invention of various techniques and instruments for analyzing living organisms at the molecular level has led to an explosion of scientific data. Data on this scale cannot practically be stored on paper; it must be stored, organized, and indexed in electronic databases. In addition, we need tools to view, verify, and analyze this data and to link it with other databases.
An electronic biological database is a large, organized body of persistent data that can be queried to add, update, extract, and remove data. Biological databases have to respond to the needs of their various users, and the same piece of biological data often means very different things to different researchers. For example, a physicist, a biochemist, and a biologist sitting in the same room would be interested in different aspects of the same protein, and they might even use different names to refer to it. Even two biologists may look at the same protein from different perspectives.
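The four basic operations in this definition (add, update, extract, remove) can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name `protein`, its columns, and the accession "P00001" are hypothetical placeholders, not the schema or content of any real biological database. The database is held in memory only so that the example is self-contained; passing a file path to `sqlite3.connect` instead would make the data persistent.

```python
import sqlite3

# In-memory database for illustration; a file path would make it persistent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT, organism TEXT)"
)

# Add: insert a record (hypothetical accession and name).
conn.execute(
    "INSERT INTO protein VALUES (?, ?, ?)",
    ("P00001", "example kinase", "Homo sapiens"),
)

# Update: change the recorded name of an existing entry.
conn.execute(
    "UPDATE protein SET name = ? WHERE accession = ?",
    ("example kinase 1", "P00001"),
)

# Extract: query the entry back by its accession.
row = conn.execute(
    "SELECT name, organism FROM protein WHERE accession = ?", ("P00001",)
).fetchone()
print(row)  # ('example kinase 1', 'Homo sapiens')

# Remove: delete the entry and confirm the table is empty.
conn.execute("DELETE FROM protein WHERE accession = ?", ("P00001",))
remaining = conn.execute("SELECT COUNT(*) FROM protein").fetchone()[0]
print(remaining)  # 0
conn.commit()
```

Parameterized queries (the `?` placeholders) are used rather than string formatting so that user-supplied identifiers cannot alter the SQL itself.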
Biological data is highly interconnected, and these connections are essential for comprehension and discovery. A nucleotide sequence is linked to the protein it codes for, and nucleotide sequences are grouped into genes. A gene may code for one protein, several proteins, or none at all. The same protein might have different names in different species. A protein belongs to a protein family and should be linked to its evolutionary relatives. We would also like links to the scientific publications about our protein, to the methods and instruments used to discover it, and even to the settings of those instruments. Such detail matters because researchers frequently repeat experiments conducted by others to verify results and improve their own procedures.
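One common way to represent such links is with related tables. The following is a minimal sketch, assuming a relational design in SQLite through Python's standard sqlite3 module; the table names and the identifiers G1, PA, FAM1, and "publication-1" are hypothetical placeholders, not the schema of any real database. A separate link table (`encodes`) lets one gene code for several proteins, one protein, or none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE family  (family_id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE gene    (gene_id TEXT PRIMARY KEY);
CREATE TABLE protein (protein_id TEXT PRIMARY KEY, family_id TEXT);
-- Link table: a gene may code for zero, one, or several proteins.
CREATE TABLE encodes (gene_id TEXT, protein_id TEXT);
-- Links from a protein to publications about it.
CREATE TABLE cites   (protein_id TEXT, publication TEXT);

-- Hypothetical example records.
INSERT INTO family  VALUES ('FAM1', 'hypothetical family');
INSERT INTO gene    VALUES ('G1'), ('G2'), ('G3');
INSERT INTO protein VALUES ('PA', 'FAM1'), ('PB', 'FAM1'), ('PC', 'FAM1');
-- G1 codes for two proteins, G2 for one, G3 for none.
INSERT INTO encodes VALUES ('G1', 'PA'), ('G1', 'PB'), ('G2', 'PC');
INSERT INTO cites   VALUES ('PA', 'publication-1');
""")

# Count proteins per gene; the LEFT JOIN keeps genes that code for none.
rows = conn.execute("""
    SELECT g.gene_id, COUNT(e.protein_id)
    FROM gene AS g LEFT JOIN encodes AS e ON e.gene_id = g.gene_id
    GROUP BY g.gene_id ORDER BY g.gene_id
""").fetchall()
print(rows)  # [('G1', 2), ('G2', 1), ('G3', 0)]

# Follow links from one gene to its proteins, their family, and publications.
related = conn.execute("""
    SELECT p.protein_id, p.family_id, c.publication
    FROM encodes AS e
    JOIN protein AS p ON p.protein_id = e.protein_id
    LEFT JOIN cites AS c ON c.protein_id = p.protein_id
    WHERE e.gene_id = 'G1'
    ORDER BY p.protein_id
""").fetchall()
print(related)  # [('PA', 'FAM1', 'publication-1'), ('PB', 'FAM1', None)]
```

Keeping each kind of entity in its own table and expressing relationships as links means a query can start from any entity (a gene, a protein, a family) and follow the connections in whichever direction a particular researcher cares about.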
Why do we need biological databases?
Back in the 1970s, researchers referred to the "Atlas of Protein Sequence and Structure" by Margaret Dayhoff to find information on their protein of interest. Since then, biological data has grown to the point that we can no longer imagine publishing it all on paper. One of the earliest electronic databases was PIR (http://pir.georgetown.edu), which was essentially run by a group of researchers. It was a significant improvement because it made adding, updating, deleting and, most importantly, searching the data far more efficient. Today PIR is no longer maintained as an active primary database: the site is still online, but it serves mainly as an archive. It could not keep up with growing demands, whereas databases such as Swiss-Prot were built to meet them.
Today, biology is a data-rich science in which each experiment generates enormous amounts of data. This data can no longer be analyzed by eye; we need powerful analysis tools to help us interpret it and understand its significance. Biological databases provide data storage facilities along with tools that help us understand and analyze the data.