Margaret Rouse is an award-winning technical writer and teacher known for her ability to explain complex technical subjects simply to a non-technical, business audience. Over…
Canonicalization is the process of converting data that has more than one possible representation into a standard, approved format. The conversion ensures that the data conforms to canonical rules. Once data is in canonical form, different representations can be compared for equivalence, distinct data structures can be counted, a meaningful sorting order can be imposed, and algorithms can run more efficiently because repeated calculations are eliminated.
Canonicalization is used in numerous Internet and computer applications to generate canonical data from noncanonical information. Canonical representation of data is widely used in search engine optimization (SEO), Web servers, Unicode and XML.
This term is also known as C14N, standardization or normalization.
In SEO, URL canonicalization deals with Web content that can be reached through more than one URL. This may create discrepancies in search results because the search engine may not be aware of which URL should be displayed. Canonicalization picks the best URL from several choices, usually for home pages. Although certain URLs appear to point to the same page, Web servers may return different results for them. Search engines treat only one URL as the canonical form.
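As a rough illustration of picking one form from several equivalent URLs, the following Python sketch applies a few common normalization rules: it lowercases the scheme and host, drops default ports and fragments, and uses "/" for an empty path. The function name canonicalize_url and this particular rule set are assumptions made for the example. Real search engines apply their own, more extensive rules, and this sketch ignores user credentials and IPv6 hosts.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be omitted without changing the target (assumed rule set).
DEFAULT_PORTS = {"http": 80, "https": 443}

def canonicalize_url(url: str) -> str:
    """Return one normalized form for URLs that differ only superficially."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # An empty path and "/" address the same resource on the server root.
    path = parts.path or "/"
    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))

variants = [
    "HTTP://Example.COM:80",
    "http://example.com/",
    "http://example.com/#top",
]
# All three variants collapse to a single canonical URL.
print({canonicalize_url(u) for u in variants})
```

Comparing URLs only after they pass through a function like this lets a crawler or analytics tool treat the three variants as one page.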
File name canonicalization is important for computer security. Some Web servers have a security rule that allows files to be executed only under a particular directory, so a file is executed only if its path contains the specified directory name. Because a path can contain elements such as ".." that point outside that directory, special care has to be taken to convert the file name to its unique canonical representation before the check is made. A vulnerability that exploits a missing check of this kind is called directory traversal.
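The difference between checking the raw path and checking the canonical path can be sketched in Python. The directory /var/www/cgi-bin and the function names below are hypothetical and chosen only for illustration. The example assumes POSIX-style paths and uses os.path.realpath to resolve ".." segments and symbolic links before comparing.

```python
import os

# Hypothetical directory from which files may be executed.
ALLOWED_DIR = "/var/www/cgi-bin"

def naive_is_allowed(requested: str) -> bool:
    """Flawed check: only looks for the directory name inside the raw path."""
    return ALLOWED_DIR in requested

def is_within(base_dir: str, requested: str) -> bool:
    """Canonicalize both paths first, then test real containment."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments and follows symbolic links.
    target = os.path.realpath(os.path.join(base, requested))
    return os.path.commonpath([base, target]) == base

attack = "/var/www/cgi-bin/../../../etc/passwd"
print(naive_is_allowed(attack))        # the raw path contains the directory name
print(is_within(ALLOWED_DIR, attack))  # the canonical path lies outside it
```

The naive check accepts the traversal path because the allowed directory appears in the string, while the canonicalizing check rejects it because the resolved path is outside the allowed directory.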
Most of the characters in the Unicode standard have variable-length encodings, and some characters can be represented in more than one way. This requires each character in a string to be examined and makes string validation more complex. If the software implementation does not account for every possible encoding, bugs can arise. This problem can be eliminated by using a single encoding for every character. The best alternative, which any software can adopt, is to check whether the string is canonicalized and to reject strings that are not.
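A minimal Python sketch of the "reject what is not canonical" approach follows. It assumes that input arrives as UTF-8 bytes and that Unicode Normalization Form C (NFC) is the chosen canonical form. Both choices, and the function name decode_canonical, are assumptions for this example rather than something the article prescribes. It relies on unicodedata.is_normalized, which is available from Python 3.8.

```python
import unicodedata

def decode_canonical(raw: bytes) -> str:
    """Decode UTF-8 input and reject text that is not in canonical (NFC) form."""
    # Strict decoding raises UnicodeDecodeError for malformed byte sequences,
    # including overlong encodings such as b"\xc0\xaf".
    text = raw.decode("utf-8")
    if not unicodedata.is_normalized("NFC", text):
        raise ValueError("input is not in canonical (NFC) form")
    return text

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two strings render identically but compare as different.
print(composed == decomposed)
# Normalizing maps the decomposed form onto the composed one.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

An application could also normalize the input instead of rejecting it. Rejecting is the stricter option and matches the approach described above.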
A canonical XML document is an XML document in XML canonical form, which is defined by the Canonical XML specification. Canonicalization in XML normalizes white space within tags, sorts namespace references and eliminates redundant ones, and uses a particular character encoding. It also removes XML and DOCTYPE declarations and transforms relative URLs into absolute URLs.
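To see how canonical form makes two differently written documents comparable, the short Python sketch below uses the standard library function xml.etree.ElementTree.canonicalize, available from Python 3.8. That function implements version 2.0 of the Canonical XML specification, so it may not reproduce every behavior listed above. The sample documents are invented for the example.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents with the same content but different surface form:
# an XML declaration, extra white space inside the tag, a different attribute
# order, and an empty-element tag versus a start/end tag pair.
doc_a = '<?xml version="1.0"?><item  id="7"   name="widget" />'
doc_b = '<item name="widget" id="7"></item>'

canonical_a = canonicalize(doc_a)
canonical_b = canonicalize(doc_b)

# The raw strings differ, but their canonical forms are identical.
print(doc_a == doc_b)
print(canonical_a == canonical_b)
print(canonical_a)
```

Comparing or digitally signing the canonical output, rather than the raw text, avoids treating cosmetic differences as real changes.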
Margaret is an award-winning technical writer and teacher known for her ability to explain complex technical subjects to a non-technical business audience. Over the past twenty years, her IT definitions have been published by Que in an encyclopedia of technology terms and cited in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine, and Discovery Magazine. She joined Techopedia in 2011. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other’s highly specialized languages.