Introduction to Databases


Congrats - you've made it this far!

We've literally just scratched the surface. After all, many people devote an entire professional career to databases; there is a lot to learn! Let's recap what we've covered here:

  • A database, in the most general sense, is an organized collection of data. More specifically, a database is an electronic system that allows data to be easily accessed, manipulated and updated.
  • Any business or organization that needs to keep track of large numbers of customers or products can benefit from a database, but large organizations stand to gain the most.
  • The earliest database systems were navigational in nature. This means that applications processed and read data by using pointers embedded in the data itself.
  • The early hierarchical and network data models were bested by E.F. Codd's relational model.
  • The relational model was a radical departure from the reigning hierarchical model in that it focused on the ability to search a database by content rather than by following a linked navigation system.
  • There are a few very important non-relational databases (especially with the advent of big data and Web 2.0), but the relational model is still used for the overwhelming majority of commercial database offerings.
  • A relational database is essentially a group of tables or, to use the technical name, relations (refer to rules 0 and 1 in Codd’s 12 Rules of Relational Databases). Each table is made up of rows (tuples) and columns (attributes). Relationships between tables are defined by having a column in one table reference a column in another table.
  • Codd's "12 rules" are actually 13 rules, numbered 0 through 12, that determine whether a database can be called "relational".
  • The table is the basic data-storage unit in a relational database. Tables consist of columns and rows.
  • Relationships are THE reason why relational databases work so well.
  • In relational databases, a relationship exists between two tables when one of them has a foreign key that references the primary key of the other table.
  • A row, also called a record, represents a set of data about a specific item. Every record in a table has exactly the same structure, but of course different data.
  • A column is a set of values of the same type in a table. It defines a specific attribute of the items the table describes.
  • A primary key is a special column or combination of columns that uniquely identifies each record (row) in the table. The primary key value must be unique for each row, and must not contain any nulls (non-values).
  • The primary key, together with the closely related foreign key concept, is the main way in which relationships are defined. A primary key uniquely identifies a record, while a foreign key is used to reference that record from another table.
  • Structured Query Language (SQL) is the de facto language used for the management and manipulation of data in relational databases. SQL can be used to query, insert, update and delete data.
  • Spreadsheets and databases have some similar capabilities, but the spreadsheet has a number of limitations that make it unsuitable for managing some data situations.
  • One frequently cited limitation of relational databases is that each field (the intersection of a row and a column) can hold only a single value. Non-relational databases, specifically key-value stores, are radically different from this model. Key-value pairs allow you to store several related items in one "row" of data in the same table.
  • A data warehouse is a special type of database optimized for querying, reporting and analysis. The main benefit of reporting using data warehouses, as opposed to the organization’s transactional databases, is that warehouses allow much better and more fine-grained data analysis for business consumption.
  • An index in an RDBMS is a data structure that works closely with tables and columns to speed up data retrieval operations.
  • A schema is the structure behind data organization. It is often presented as a visual overview of how different tables are related to each other.
  • Normalization is the process of (re)organizing data in a database so that it meets two basic requirements: there is no data redundancy (all data is stored in only one place), and data dependencies are logical (all related data items are stored together).
  • In the RDBMS world, a constraint means much the same as it does in the real world: it is a restriction on the data you can enter into a certain column.
  • In relational databases, saving a transaction is known as a commit, and undoing any unsaved changes is known as a rollback.
  • ACID is an acronym for Atomicity, Consistency, Isolation and Durability, four highly desirable properties of transactions in an RDBMS.
  • Oracle is one of the behemoths of the RDBMS world. Its flagship product is Oracle DB.
  • Microsoft is another big player in the world of RDBMS software with its SQL Server product, although it is better known for its ubiquitous Windows operating system and Office suite of productivity programs.
  • Other widely used RDBMSs include PostgreSQL and MySQL (both open source) and IBM's DB2.
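
To make the points about tables, primary keys, foreign keys and SQL more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, their columns and the sample data are all made up for illustration; they are not taken from this tutorial, and a production RDBMS would differ in its details.

```python
import sqlite3

# In-memory database; all table, column and data names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each row
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customers(customer_id),  -- foreign key to customers
        item        TEXT NOT NULL
    )
""")

conn.executemany(
    "INSERT INTO customers (customer_id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.executemany(
    "INSERT INTO orders (order_id, customer_id, item) VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "monitor"), (12, 2, "mouse")],
)

# Search by content: join the two tables through the key relationship.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE c.name = 'Alice'
    ORDER BY o.order_id
""").fetchall()
print(rows)

# An order that references a customer who does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO orders (order_id, customer_id, item) VALUES (13, 99, 'cable')"
    )
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

The query never follows pointers from one record to the next; it simply states which rows it wants, and the database uses the key relationship to find them.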

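The recap's points about constraints, indexes, commit and rollback can be sketched in the same way. This is only an illustration under assumed names: the accounts table, the transfer helper and the balances are hypothetical, and Python's sqlite3 module stands in for a full RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        owner      TEXT NOT NULL,
        balance    INTEGER NOT NULL CHECK (balance >= 0)  -- constraint: no overdrafts
    )
""")
# An index on owner lets lookups by owner avoid scanning the whole table.
conn.execute("CREATE INDEX idx_accounts_owner ON accounts(owner)")
conn.executemany(
    "INSERT INTO accounts (account_id, owner, balance) VALUES (?, ?, ?)",
    [(1, "Alice", 100), (2, "Bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Hypothetical helper: move amount from src to dst as one transaction."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
            (amount, src),
        )
        conn.commit()      # save both changes together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()    # undo the uncommitted credit as well
        return False


ok = transfer(conn, 1, 2, 30)       # allowed: Alice has enough funds
failed = transfer(conn, 1, 2, 500)  # violates the CHECK constraint
balances = conn.execute(
    "SELECT account_id, balance FROM accounts ORDER BY account_id"
).fetchall()
print(ok, failed, balances)
```

The second transfer credits Bob before the debit from Alice fails, so the rollback is what keeps the credit from surviving on its own. That all-or-nothing behavior is the "A" (atomicity) in ACID.
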
Written by Dixon Kimani
Dixon Kimani is an IT professional in Nairobi, Kenya. He specializes in IT project management and using technology to solve real-world business problems. He is also an avid freelance technical writer who specializes in IT and how to use technology to improve organizational efficiency.