For many years, relational databases dominated database management. That is no longer a given. Today, non-relational, or NoSQL, databases have become a prominent alternative model for database managers. Why? For many workloads they can be cheaper, more flexible, easier to manage and more scalable (something that's becoming increasingly important with the growth of big data).
Here we'll take an introductory look at this growing form of database management.
Some Background on Database Management
A database is a collection of data records in an organized form. In order to store, access and manipulate this data, we need a structure. This structure can be anything from a simple file system to a sophisticated database management system (DBMS). Both have their own set of advantages and disadvantages, but a DBMS is usually preferred for reasons like:
- A DBMS can manage extensive amounts of data.
- A DBMS usually provides system backup, recovery and data-restore functionality, which a plain file system often does not support.
- A DBMS keeps data redundancy under control, which avoids wasting storage space.
- A database can be better equipped with security measures to preserve the state and integrity of the data stored.
- A DBMS supports multi-user access and takes care of the concurrency issues.
- A DBMS provides multiple user views along with several layers of abstraction.
- A DBMS maintains the core ACID properties (atomicity, consistency, isolation and durability) when data is retrieved and updated.
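To make the ACID point concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the names and the balances are invented for illustration. The idea is that a transfer either applies both of its updates or neither of them.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return True if the transfer committed."""
    try:
        # The connection as a context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This update violates the CHECK constraint when src is short
            # of funds, which undoes the credit above as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both balances exactly as they were, which is the "all or nothing" behavior that a plain file system does not give you for free.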
Most modern database systems are relational database management systems (RDBMS), in which data resides in tables with minimal duplication. As the name implies, a relational database establishes relationships between data and allows the same data to be extracted and viewed in many different ways. However, this control comes at a cost. (For more background reading, check out An Introduction to Databases.)
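As a small illustration of the relational idea, the sketch below stores each author once and links books to authors by ID, then views the same data in two different ways. It uses Python's built-in sqlite3 module; the tables and rows are made up for the example.

```python
import sqlite3


def build_library():
    """Create two related tables; each author's name is stored only once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE books (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES authors(id)
        );
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO books VALUES
            (1, 'Engines', 1), (2, 'Notes', 1), (3, 'Compilers', 2);
        """
    )
    return conn


def titles_with_authors(conn):
    """View 1: every book next to its author's name, via a JOIN."""
    return conn.execute(
        "SELECT books.title, authors.name "
        "FROM books JOIN authors ON authors.id = books.author_id "
        "ORDER BY books.title"
    ).fetchall()


def books_per_author(conn):
    """View 2: the same data, summarized as a count per author."""
    return conn.execute(
        "SELECT authors.name, COUNT(*) "
        "FROM authors JOIN books ON books.author_id = authors.id "
        "GROUP BY authors.name ORDER BY authors.name"
    ).fetchall()
```

Both functions read the same two tables; only the query changes. That flexibility is what the fixed table structure buys.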
What Is NoSQL?
NoSQL is a type of database that does not adhere to the widely used relational database management model. In other words, NoSQL databases are not primarily built on tables, and unlike an RDBMS, they generally do not use SQL to manipulate data, hence the name. NoSQL was created to complement SQL, not to replace it. It is based on a model that is less stringent and does not necessarily follow a fixed schema. It also may not stick to the ACID properties, and many NoSQL systems have no built-in equivalent of the JOIN found in most RDBMSs.
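To show what "no fixed schema" can look like in practice, here is a toy in-memory document collection in Python. The DocumentStore class and its put, get and find methods are hypothetical names for this sketch, not the API of any real NoSQL product.

```python
import json


class DocumentStore:
    """Toy document collection: each record is a free-form JSON document."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # Round-trip through JSON to store an independent copy and to
        # reject values that are not plain JSON data.
        self._docs[key] = json.loads(json.dumps(document))

    def get(self, key):
        """Return the document stored under key, or None if absent."""
        return self._docs.get(key)

    def find(self, field, value):
        """Return the sorted keys of documents whose field equals value."""
        return sorted(
            key for key, doc in self._docs.items() if doc.get(field) == value
        )


store = DocumentStore()
# The two documents do not share the same set of fields, and no schema
# had to be declared up front.
store.put("u1", {"name": "Ana", "city": "Oslo"})
store.put("u2", {"name": "Ben", "city": "Oslo", "tags": ["admin"], "age": 41})
store.put("u3", {"name": "Cy"})
```

Here `store.find("city", "Oslo")` returns the keys of the first two documents, while the third document is simply skipped because it has no city field. There is no table definition to alter when a new field appears.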
A great definition of NoSQL comes from nosql-database.org, which defines the term as:
Next generation databases mostly addressing some of the following points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. Often more characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more.
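"Horizontally scalable" in this definition means spreading data across many machines rather than buying one bigger machine. One common way to do that is to hash each key to pick a partition. The sketch below assumes plain modulo hashing over Python dictionaries that stand in for servers; shard_for and ShardedStore are hypothetical names, and real systems typically use more elaborate schemes.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a string key to a shard index in a stable, repeatable way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Key-value store whose data is split across several 'servers'."""

    def __init__(self, shard_count):
        # Each dictionary stands in for a separate machine.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        # The same hash sends the read to the shard that took the write.
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Because every key lives on exactly one shard, adding shards adds capacity. The trade-off is that an operation touching many keys may have to talk to many machines, which is one reason such systems often relax ACID guarantees.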
The History and Roots of NoSQL
Just to make things a bit more confusing, there is an RDBMS called NoSQL. It has its roots in the 1990s and was created by Carlo Strozzi. This is actually completely different; it's a relational system, but without an SQL interface. Strozzi has commented that what is currently known as NoSQL should actually have been called "NoREL," or something to that effect.
The modern-day NoSQL movement that we are concerned with revolves around IT giants like Facebook, Google and Amazon, and their need for a database that could scale with the enormous amounts of data they were producing. Of course, the buzzword for this has evolved into what is known as big data, of which NoSQL is a huge part. (To learn more, read The Evolution of Big Data.)
Without getting too specific about the dates, in the 2000s, but especially the last half of the decade, nearly every major web company was involved in NoSQL in one way or another, with data management systems such as BigTable, CouchDB, Amazon Dynamo, MongoDB, Cassandra and Hadoop, among many, many others. What kicked off the NoSQL name was its use by a Rackspace employee by the name of Eric Evans in 2009. He used the name for a meetup regarding "open source, distributed, non-relational databases." After that, the name just stuck.
Why Use NoSQL?
Why do we use NoSQL at all when we have our good, old RDBMSs? The answer is that in some cases an RDBMS does not suffice, while in others it is overkill. Here are a few drawbacks to relational databases that can make NoSQL a better solution:
- An application might need to store data in a hierarchical network or tree structure.
- You might just want to store the elements of the application reliably in persistent storage, and an RDBMS can be pretty expensive for that in terms of cost and resources.
- NoSQL can be a good fit when the application's entities themselves need to be queryable.
- An RDBMS might struggle to provide the availability and durability needed by a distributed database or a cloud-based application.
- NoSQL does not require a rigid schema definition or the storage of metadata to supplement the existing data.
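The first and last points above can be illustrated with a short Python sketch. A discussion thread is naturally a tree, and a document-style store can hold the whole tree as one nested record. The thread contents and the helper functions below are invented for the example.

```python
import json

# One nested document holds the whole tree. Replies sit inside their
# parent, and optional fields such as "likes" appear only where needed.
thread = {
    "author": "ana",
    "text": "Has anyone tried a document store?",
    "replies": [
        {
            "author": "ben",
            "text": "Yes, for user profiles.",
            "replies": [{"author": "ana", "text": "How did it go?"}],
        },
        {"author": "cy", "text": "Not yet.", "likes": 2},
    ],
}


def count_comments(node):
    """Count a comment plus all of its nested replies."""
    return 1 + sum(count_comments(reply) for reply in node.get("replies", []))


def authors(node):
    """Collect every distinct author that appears in the tree."""
    names = {node["author"]}
    for reply in node.get("replies", []):
        names |= authors(reply)
    return names


# The whole tree serializes to a single JSON value, which is the shape
# many document databases store and return in one read.
serialized = json.dumps(thread)
```

In a relational design the same thread would typically be one row per comment with a parent ID column, and rebuilding the tree would take a recursive query or several round trips.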
The key thing to understand is that NoSQL grew out of a need driven by massive amounts of data and by a change in how databases were being used. As the web got more social, it wasn't just about reads, but also writes, and how to scale them. In these instances, NoSQL can be a better fit than a more traditional RDBMS.
The Bottom Line on NoSQL
Following in the footsteps of internet giants, many companies and organizations that deal with massive chunks of data are using NoSQL along with their existing DBMSs for higher performance and efficiency. If you are dealing with high-volume web applications, you'll probably need a strong understanding of NoSQL.
For smaller organizations, the value of NoSQL isn't quite as strong, especially since there are serious challenges in implementing it, including limited support and expertise, and less mature tooling for administration, analytics and business intelligence. The fact of the matter is that the average small business isn't generating petabytes of data every day. That said, NoSQL's popularity is growing, and it is likely to remain an increasingly important tool and skill for database managers, so it doesn't hurt to at least know the basics.