Without any context, it would be hard to guess what people are talking about when they talk about candidate keys. For the uninitiated, exploring the use of these database objects starts with the simple idea that different tables and data fields in database structures are often called “keys,” and that organizing them in certain ways can provide better data-handling solutions. Here we’ll take a look at candidate keys and how they are used. (For more in-depth reading about databases, check out An Introduction to Databases.)
Primary Keys and Candidate Keys
One relatively straightforward way to explain candidate keys is in relation to something called a primary key. Typically, a database table can have only one primary key, which represents a given column in that table. However, there can be several candidate keys, and these are defined as keys that could be used as primary keys.
Identifying potential primary keys is important because the primary key in a table is a way to keep all of the associated records straight, and to have one unique identifier for each row in the table. For instance, in a list of customers, data handlers need to provide a specific column that has the unique identifier for each individual customer. Using a customer’s first name, or even a last name, won’t work because of the potential for duplication. That’s why a primary key often consists of a created number that guarantees uniqueness in somewhat the same way that long numerical strings provide unique identifiers for financial transactions on our credit and debit cards.
Criteria for Candidate Keys
Now that we know what a candidate key is, we can look at specific criteria for “eligibility.” In order to be a candidate key, a database column must be unique to each row (as mentioned above). It also has to be “non-NULL;” that is, the column must not be able to hold a NULL value, which involves some specific attention to input design. Fields that may be empty on certain records can be disqualified as candidate keys. The third criterion is that the candidate key should be “resistant to change” – that’s not to say that a candidate key, or a primary key for that matter, can never change, but that a lot of developers agree that it’s better to use a more static identifier in these kinds of “anchor” roles that so greatly affect data queries.
Using Candidate Keys
Keeping in mind the idea that many different database columns can be candidate keys, it’s important to understand that there is often more than one way to appoint a primary key and that, in many ways, each professional data manager has his or her own unique methods and outlooks, a sort of secret recipe for making data structures work. Thus, newer database pros might ask more experienced seniors for various pieces of advice about how to set up and use candidate keys successfully.
One common idea is the use of multiple columns to create something that’s called a composite key. This is one way to create a durable primary key, but different professionals have their own opinions. Some consider this a lazy way to code, and point out the burden of carrying over multiple fields in relating the primary key to an outside foreign key from another database.
Other debates include the use of time stamps for primary keys, as well as the use of a simple auto-generated identity column, which consists of a uniquely generated number for each record. Ask a pro whether using a simple ad hoc generated I.D. number is too simple, and you’ll get a response based on that person’s philosophy of what makes good data structures tick. Answers include such points as whether a generated number can be used in specific kinds of queries, and whether one of these automatically created integers can save a DB manager’s bacon if, for example, a whole new set of records suddenly need to be integrated. Some experienced data-crunchers invoke Occam’s Razor and warn against getting too elaborate with candidate and primary keys, or any other element of design.
A Database Reference Point
In a very general sense, one great way to think about a candidate key is as a reference point that works in a database in much the same way that a passport works for a traveler. Your passport is a quick way for customs’ officials or others to reference a whole bunch of information, like your name, gender, etc. It’s also supposed to be unique. But, as in a database with multiple candidate keys, your passport is not the only document that can be used by customs. A driver’s license also can serve the same purpose and provide a lot of the same information. That’s why customs’ planners or others need to choose an identifying document in the same way that programmers have to choose just one candidate key to be the primary key.
This analogy also helps illustrate why someone who doesn’t have this responsibility might want to know about how these keys work in a data structure. With the advances in today’s data and analytics industry, and so many managers and middlemen working closely with tech vendors, a better view of database principles can come in handy, even for a non-techie. It sure can’t hurt.