What Does Post-Processing Deduplication Mean?
Post-process deduplication (PPD) is an approach in which software removes redundant data from a data set after it has been transferred to its storage location. It is also called asynchronous deduplication, and is often used where managers consider it inefficient or infeasible to remove redundant data before or during transfer.
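The two phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the data is treated as a list of in-memory blocks, duplicates are detected by SHA-256 digest, and the function names (ingest, post_process_dedupe, restore) are hypothetical.

```python
import hashlib

def ingest(blocks):
    """Write phase: store every block as received, duplicates included."""
    return list(blocks)

def post_process_dedupe(stored):
    """Later pass over data already at rest.

    Keeps one copy of each unique block and an ordered list of digests
    (a "recipe") from which the original sequence can be rebuilt.
    """
    unique = {}   # digest -> block contents
    recipe = []   # digests in original order
    for block in stored:
        digest = hashlib.sha256(block).hexdigest()
        unique.setdefault(digest, block)  # first copy wins
        recipe.append(digest)
    return unique, recipe

def restore(unique, recipe):
    """Rebuild the original block sequence from the deduplicated form."""
    return [unique[digest] for digest in recipe]

# Hypothetical sample data: five blocks, three of them identical.
stored = ingest([b"alpha", b"beta", b"alpha", b"alpha", b"gamma"])
unique, recipe = post_process_dedupe(stored)
print(len(stored), "blocks written,", len(unique), "kept after deduplication")
```

The write phase does no comparison work at all, which is the point of the approach: the cost of finding duplicates is deferred to a separate pass that can run when the system is otherwise idle.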
Techopedia Explains Post-Processing Deduplication
Post-process deduplication can be contrasted with inline deduplication, in which redundant data is removed as the data is transferred for storage. Administrators may choose a post-process approach because inline deduplication can slow down the transfer and make it harder to archive data efficiently.
While managers or administrators may find a post-process method easier to use, this type of data optimization has drawbacks. One is that the storage destination needs enough space to hold the larger, unfiltered data set until deduplication runs. Provided that data managers have ample storage and that parsing data in storage poses no technical difficulties, post-process deduplication can often be a desirable way to clean up a data set for future use after it has already been written to "cold storage."
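The capacity drawback can be made concrete with a rough comparison of peak space used under each approach. The sketch below is a simplification built on assumptions: it counts only block contents (no index or metadata overhead), it assumes the post-process pass starts only after the whole transfer has landed, and the function names and sample data are hypothetical.

```python
import hashlib

def peak_bytes_inline(blocks):
    """Inline: a duplicate block is dropped before it is ever written."""
    seen = set()
    used = 0
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            used += len(block)
    return used

def peak_bytes_post_process(blocks):
    """Post-process: every block is written first, so peak is the raw size."""
    return sum(len(block) for block in blocks)

# Hypothetical transfer: four 4 KiB blocks, three of them identical.
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]

print("inline peak:      ", peak_bytes_inline(blocks))        # 8192
print("post-process peak:", peak_bytes_post_process(blocks))  # 16384
```

Both approaches end at the same footprint once the post-process pass completes; the difference is the temporary headroom the destination must provide in the meantime.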