The Hadoop Distributed File System (HDFS) is a distributed file system that runs on standard or low-end hardware. Developed as part of the Apache Hadoop project, HDFS works like a standard distributed file system but is designed for high data throughput, particularly for MapReduce workloads, along with high fault tolerance and native support for large data sets.
In the context of databases, cardinality refers to the uniqueness of the data values contained in a column. High cardinality means that a large percentage of the column's values are unique. Low cardinality means that the column contains many repeated values. Less commonly, cardinality refers to the relationships between tables, which can be one-to-one, one-to-many (or, read in the other direction, many-to-one), or many-to-many.
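As a rough illustration of the column-level definition, cardinality can be measured as the number of distinct values in a column and, relative to the column's size, as the share of values that are distinct. The sketch below is a minimal example in plain Python; the function name `cardinality` and the sample data are assumptions made for illustration, not part of any database product.

```python
def cardinality(values):
    """Return (distinct_count, distinct_ratio) for a column's values.

    distinct_ratio is the distinct count divided by the number of rows;
    an empty column is reported as (0, 0.0).
    """
    values = list(values)
    distinct = len(set(values))
    ratio = distinct / len(values) if values else 0.0
    return distinct, ratio


# Hypothetical sample columns.
account_numbers = ["1001", "1002", "1003", "1004"]  # every value unique
genders = ["M", "F", "F", "M"]                      # many repeats

print(cardinality(account_numbers))
print(cardinality(genders))
```

A ratio close to 1.0 suggests high cardinality, while a ratio close to 0 on a large table suggests low cardinality. Where exactly to draw the line between the two is a judgment call rather than a fixed rule.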
High cardinality columns are those whose values are unique or nearly unique. For example, in a database table that stores bank account numbers, the “Account Number” column should have very high cardinality: by definition, every value in this column should be unique.
Normal cardinality columns are those in which a moderate proportion of the values are unique. For instance, if a table holds customer information, the “Last Name” column will have normal cardinality. Not every last name will be unique (for example, there will be several occurrences of “Smith”), but on the whole the data will be fairly non-repetitive.
Low cardinality columns are those with very few unique values. In a customer table, a typical low cardinality column is “Gender”. If this column only allows “M” and “F”, then each of the thousands or millions of records in the table holds one of just two values.
Cardinality relationships between tables can take the form of one-to-one, one-to-many (whose reverse is many-to-one), or many-to-many. These terms describe how rows in one table relate to rows in another. For example, the relationship between the “Customers” table and the “Bank Accounts” table is one-to-many: one customer can have several accounts, but one account cannot belong to more than one customer. That is, of course, assuming the bank has never heard of joint accounts!
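The customer and bank account example can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are hypothetical, chosen only to mirror the example in the text. The one-to-many relationship is modelled by giving each account exactly one `customer_id`, and column cardinality is measured with `COUNT(DISTINCT ...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL
);
CREATE TABLE bank_accounts (
    account_number TEXT PRIMARY KEY,  -- unique, so high cardinality
    customer_id    INTEGER NOT NULL REFERENCES customers(customer_id)
);
""")

# Hypothetical sample data: two customers share the last name "Smith".
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Smith"), (2, "Smith"), (3, "Nguyen")],
)
conn.executemany(
    "INSERT INTO bank_accounts VALUES (?, ?)",
    [("A-100", 1), ("A-101", 1), ("A-200", 2)],
)


def distinct_ratio(conn, table, column):
    """Share of rows in `column` holding a distinct value.

    The identifiers are interpolated into the SQL text, so pass only
    trusted, hard-coded table and column names.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


# One-to-many: a customer may appear on several account rows.
accounts_per_customer = dict(
    conn.execute(
        "SELECT customer_id, COUNT(*) FROM bank_accounts GROUP BY customer_id"
    )
)

print(distinct_ratio(conn, "bank_accounts", "account_number"))
print(distinct_ratio(conn, "customers", "last_name"))
print(accounts_per_customer)
```

Because `account_number` is the primary key, a second row with the same account number is rejected, which is what keeps one account from belonging to more than one customer in this sketch. Supporting joint accounts would turn the relationship into many-to-many, which is usually modelled with a separate linking table.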