What Does Schema on Read Mean?
Schema on read refers to an innovative data analysis strategy in new data-handling tools like Hadoop and other more involved database technologies. In schema on read, data is applied to a plan or schema as it is pulled out of a stored location, rather than as it goes in.
Techopedia Explains Schema on Read
Older database technologies had an enforcement strategy of schema on write—in other words, the data had to be applied to a plan or schema when it was going into the database. This was done partially to enforce consistency of data, and that is one of the major benefits of schema on write. With schema on read, the persons handling the data may need to do more work to identify each data piece, but there is a lot more versatility.
In a fundamental way, the schema-on-read design complements the major uses of Hadoop and related tools. Companies want to effectively aggregate a lot of data, and store it for particular uses. That said, they may value the collection of unclean or inconsistent data more than they value a strict data enforcement regimen. In other words, Hadoop can accommodate getting a wide scope of different little bits of data that might not be completely organized. Then, as that information is used, it gets organized. Applying the old database schema-on-write system would mean that the less organized data would probably be thrown out.
Another way to put this is that schema on write is better for getting very clean and consistent data sets, but those data sets may be more limited. Schema on read casts a wider net, and allows for more versatile organization of data. Experts also point out that it is easier to create two different views of the same data with schema on read.
This schema-on-read strategy is one essential part of why Hadoop and related technologies are so popular in today’s enterprise technology. Businesses are using large amounts of raw data to power all sorts of business processes by applying fuzzy logic and other sorting and filtering systems involving corporate data warehouses and other large data assets.