Introduction to Databases

History of Databases

It is important to realize first that the organized, systematic storage of records that we depend on so heavily in databases is not a recent invention. What is recent is the computerization of this practice, beginning in the 1960s. Paper-based records, including ledger-based bookkeeping, are (technically) all forms of database; a database does not have to be computerized. What computerization added was the database management system (DBMS), the software that stores, retrieves and manages the data, and which is several orders of magnitude more powerful and accurate than a humble ledger or a puny human brain. And although the term "database" is often used loosely to mean the DBMS, the two are not the same thing: the database is the organized collection of data, while the DBMS is the software that manages it.

The ancient Egyptians used elaborate record-keeping systems to keep stock of grain harvests. The Library of Alexandria employed a sophisticated method to keep track of huge numbers of books and scrolls. These were all early examples of databases, although of course their capabilities would be laughable compared to the hugely capable computerized DBMSs of the 21st century.

But even back when the entire field of computing was still in its single-celled-organism stage (the 1960s), many people could already see that computers would be truly useful if they could provide a way of reliably storing and retrieving data. The development of databases therefore occurred almost in perfect step with the growth of the computing capabilities of the day. As disk capacity and processor speed grew, so did the storage capacity and feature sets of contemporary database offerings. One important leap that occurred in the mid-1960s was the switch from tape-based storage to direct-access storage, or disks. Because a disk can jump straight to any record instead of reading through everything before it, this change allowed shared, interactive data access, as opposed to the sequential, single-operator, batch-type processing necessitated by tapes.

The earliest database systems were navigational in nature. This means that applications read and processed data by following pointers embedded in the data itself. A pointer led to the next data item, and items could be doubly linked, allowing linkage to both the previous and next data items. This is similar to how hyperlinks on a Web page lead the reader from the current page to a related one. The two main data models at this time were the hierarchical model, epitomized by IBM's IMS system, and the CODASYL, or network, model. But all these were bested and reduced to mere interesting footnotes in history by the emergence of the relational model, proposed by a brilliant computer scientist by the name of E.F. Codd.
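To make the navigational idea concrete, here is a minimal sketch in Python. It is not how IMS or any CODASYL system was actually implemented; the Record class, the helper functions and the sample inventory are all hypothetical, invented purely for illustration. The point it shows is that finding an item means walking from record to record along the embedded pointers.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Record:
    """One data item carrying embedded pointers to its neighbours (hypothetical)."""
    key: str
    value: str
    prev: Optional["Record"] = None
    next: Optional["Record"] = None


def link(records: List[Record]) -> Optional[Record]:
    """Doubly link the records in order and return the first one."""
    for left, right in zip(records, records[1:]):
        left.next = right
        right.prev = left
    return records[0] if records else None


def walk(start: Optional[Record]) -> Iterator[Record]:
    """Follow the embedded 'next' pointers from a starting record."""
    current = start
    while current is not None:
        yield current
        current = current.next


def find(start: Optional[Record], key: str) -> Optional[Record]:
    """Navigate record by record until the key turns up, or the chain ends."""
    for record in walk(start):
        if record.key == key:
            return record
    return None


# Made-up sample data: three inventory records chained together.
head = link([
    Record("grain", "1200 sacks"),
    Record("scrolls", "40000 items"),
    Record("oil", "300 jars"),
])
hit = find(head, "oil")
```

Notice that the application has to know where to start and which pointers to follow. If the chain were reorganized, code like find would have to be rewritten to match.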

E.F. Codd and the Relational Model

The relational model was a radical departure from the reigning navigational models in that it focused on the ability to search a database by content rather than by following a linked navigation system. This offered the significant advantage of allowing databases to grow and store more and more data, all without having to change or rewrite the applications that accessed that data. Essentially, Codd designed a way to divorce the skeleton or structure of the database from the data records held in it. So elegant was this model that it is the de facto standard for database design to this day, with such databases termed relational databases. There are a few very important non-relational databases (especially with the advent of big data and Web 2.0), but the relational model is still used for the overwhelming majority of commercial database offerings.
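A small sketch of what searching by content looks like in practice, using Python's built-in sqlite3 module as a stand-in for any relational DBMS. The book table, its columns, the titles_by function and the rows are all made up for illustration. The sketch assumes nothing beyond the standard library and tries to show that the lookup code keeps working after the table gains a new column and more rows.

```python
import sqlite3

# An in-memory database; the table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO book (title, author) VALUES (?, ?)",
    [
        ("Harvest Ledger", "Ada"),
        ("Scroll Catalogue", "Ben"),
        ("Trade Accounts", "Ada"),
    ],
)


def titles_by(author: str) -> list:
    """Search by content: state what is wanted, not how to reach it."""
    rows = conn.execute(
        "SELECT title FROM book WHERE author = ? ORDER BY title", (author,)
    )
    return [title for (title,) in rows]


before = titles_by("Ada")

# The structure changes and more data arrives...
conn.execute("ALTER TABLE book ADD COLUMN shelf TEXT")
conn.execute(
    "INSERT INTO book (title, author, shelf) VALUES (?, ?, ?)",
    ("Grain Inventory", "Ada", "B2"),
)

# ...yet the unchanged lookup function still works.
after = titles_by("Ada")
```

Here titles_by never mentions pointers, storage order or the new shelf column. It names the content it wants and leaves the navigation to the DBMS, which is the separation of structure from data described above.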

Today, E.F. Codd’s name would evoke a nonchalant "E.F. who?" from most people, even many in the IT industry. However, his work led directly to the huge benefits and efficiency that relational databases provide. His contribution to the world of computing is comparable in scale to Sir Isaac Newton’s contribution to the world of physics.

Codd attended Exeter College, Oxford, studying mathematics and chemistry, then served as a pilot in the Royal Air Force during WWII before moving to the U.S. in 1948 to work as a mathematical programmer for IBM. After spending several years in Canada during the 1950s, he returned to the U.S. and to IBM, and received his Ph.D. from the University of Michigan in 1965.

In 1970, while working at IBM, Codd published a paper on data management titled "A Relational Model of Data for Large Shared Data Banks" in Communications of the ACM. The giant company, however, was heavily invested in the hierarchical model via its Information Management System (IMS), and Big Blue executives were not interested in developing a competitor for one of their own lucrative product lines. Showing guile rarely seen in academic or scientific types, Codd slyly showed his model to select IBM customers, who upon viewing it needed little convincing of its superiority. These influential customers in turn put pressure on the very same IBM executives to develop the model, and they reluctantly (and, one imagines, seething quietly with fury at Codd) placed it under development in IBM’s Future Systems project, with the system itself known as System R.

However, the head honchos were still unwilling to threaten IMS, and sabotaged Codd’s work by placing the System R project in the hands of developers who were unfamiliar with it. The developers thus failed to use Codd’s own Alpha language for development, instead electing to use a much simpler language known as SEQUEL. This turned out to be an accidental masterstroke, however, since SEQUEL is much easier to understand and use. For trademark reasons, the name was later changed to SQL, which is very familiar to database developers and administrators today as the language of choice for writing database queries.

A shrewd young businessman who was developing his own database system read the System R team's published descriptions of SQL in the late 1970s. He recognized the language's superiority and built it into a database product from his own small company, which reached the market in 1979. The businessman had also previously seen Codd’s work on the relational model, and became convinced that it was the way to go for database systems. He based his own product on it, even though IBM refused to share System R’s code with him. Remember, IBM's executives were not interested in the relational model. That small company has grown quite a bit; today it’s known as Oracle Corp. As for the businessman, his name is Larry Ellison, and his conviction helped him become one of the richest people in the world. It just goes to show how badly IBM miscalculated the potential of Codd’s relational model. Oracle's database remains one of the most widely used relational databases in the corporate world today.

Written by Dixon Kimani
Dixon Kimani is an IT professional in Nairobi, Kenya. He specializes in IT project management and using technology to solve real-world business problems. He is also an avid freelance technical writer who specializes in IT and how to use technology to improve organizational efficiency.