Data anonymization is the process of removing the tracks, or electronic trail, in data that would lead an eavesdropper back to its origin. An electronic trail is the information left behind when someone sends data over a network, and forensic experts can follow it to work out who sent the data. This is often done in criminal cases, but companies may also track user data in ways that undermine user privacy. For example, companies use data mining and location tracking to compile personal information, such as addresses, that can be used to attract advertisers or for other purposes. This may concern people who value their privacy, and it makes a good case for using data anonymization techniques.
If someone sends a file, the file itself may contain information that leaves a trail back to the sender, and the sender may also be traced through data logged after the file is sent. Once the file is anonymized, however, the data associated with its transmission should no longer be traceable to the sender, at least in theory.
Data anonymization does not alter the original field layout (position, size, and data type) of the data being anonymized, so the anonymized data still looks realistic in test data environments.
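To illustrate what preserving the field layout can mean, the following is a minimal sketch of one simple masking approach, assuming records are dictionaries of string fields. Each character in a chosen field is replaced with a random character of the same class (digit, uppercase letter, lowercase letter), while separators such as hyphens and spaces stay in place, so the length, positions, and apparent data type of each value are unchanged. The function names `mask_value` and `mask_record` and the sample field names are hypothetical, chosen only for this example.

```python
import random
import string


def mask_value(value: str, rng: random.Random) -> str:
    """Replace each character with a random one of the same class.

    Length and separator positions are preserved, so the masked value
    keeps the same layout as the original.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha() and ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators such as '-' or ' ' in place
    return "".join(out)


def mask_record(record: dict[str, str], fields: list[str], seed: int = 0) -> dict[str, str]:
    """Mask only the listed fields; other fields are copied unchanged.

    A fixed seed makes the masked test data reproducible.
    """
    rng = random.Random(seed)
    return {k: (mask_value(v, rng) if k in fields else v) for k, v in record.items()}


if __name__ == "__main__":
    original = {"name": "Alice Smith", "ssn": "123-45-6789", "city": "Springfield"}
    print(mask_record(original, ["name", "ssn"], seed=42))
```

A seeded `random.Random` is used here so that test data can be regenerated identically; it is not cryptographically secure, and random per-character masking on its own does not guarantee that a masked value differs from some real person's value.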
One aspect of anonymization that may worry people who value their privacy is that the process can be reversed. Many current anonymization techniques can be bypassed, because there are ways to recover personally identifiable information (PII) that has been stripped from a dataset. One way to reveal this information is to cross-reference the fields that remain visible in the anonymized records against other sets of records. This is called de-anonymization.
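The cross-referencing idea can be sketched as a simple linkage: join the anonymized records to a second, identified dataset on the fields they still share. The example below assumes those shared fields are ZIP code, birth date, and sex; all records and names are made up for illustration, and `link` and `QUASI_IDENTIFIERS` are hypothetical names. A record is treated as re-identified only when exactly one person in the identified dataset matches it.

```python
from collections import defaultdict

# Hypothetical "anonymized" records: names removed, other fields kept.
ANONYMIZED = [
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical identified dataset that shares some of the same fields.
IDENTIFIED = [
    {"name": "Jane Roe", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "John Doe", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"name": "Jim Poe", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
]

# Fields present in both datasets (an assumption for this example).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(anonymized, identified, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for records with a unique match."""
    # Index the identified dataset by its shared field values.
    index = defaultdict(list)
    for person in identified:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for record in anonymized:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0], record["diagnosis"]))
    return matches


if __name__ == "__main__":
    print(link(ANONYMIZED, IDENTIFIED))
```

In this toy data the first record matches exactly one person and is re-identified, while the other two records each match two people and remain ambiguous. This is why removing names alone may not be enough when the remaining fields are unusual in combination.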
This definition was written in the context of Data