ALERT

[FREE DEMO] Deploy Your Enterprise Cloud in Minutes

Data Preprocessing

Definition - What does Data Preprocessing mean?

Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing prepares raw data for further processing.

Data preprocessing is used database-driven applications such as customer relationship management and rule-based applications (like neural networks).

Techopedia explains Data Preprocessing

Data goes through a series of steps during preprocessing:

  • Data Cleaning: Data is cleansed through processes such as filling in missing values, smoothing the noisy data, or resolving the inconsistencies in the data.
  • Data Integration: Data with different representations are put together and conflicts within the data are resolved.
  • Data Transformation: Data is normalized, aggregated and generalized.
  • Data Reduction: This step aims to present a reduced representation of the data in a data warehouse.
  • Data Discretization: Involves the reduction of a number of values of a continuous attribute by dividing the range of attribute intervals.

Techopedia Deals

Connect with us

Techopedia on Linkedin
Techopedia on Linkedin
Tweat cdn.techopedia.com
"Techopedia" on Twitter


'@Techopedia'
Sign up for Techopedia's Free Newsletter!

Email Newsletter

Join thousands of others with our weekly newsletter

Resources
The 4th Era of IT Infrastructure: Superconverged Systems
The 4th Era of IT Infrastructure: Superconverged Systems:
Learn the benefits and limitations of the 3 generations of IT infrastructure – siloed, converged and hyperconverged – and discover how the 4th...
Approaches and Benefits of Network Virtualization
Approaches and Benefits of Network Virtualization:
Businesses today aspire to achieve a software-defined datacenter (SDDC) to enhance business agility and reduce operational complexity. However, the...
Free E-Book: Public Cloud Guide
Free E-Book: Public Cloud Guide:
This white paper is for leaders of Operations, Engineering, or Infrastructure teams who are creating or executing an IT roadmap.
Free Tool: Virtual Health Monitor
Free Tool: Virtual Health Monitor:
Virtual Health Monitor is a free virtualization monitoring and reporting tool for VMware, Hyper-V, RHEV, and XenServer environments.
Free 30 Day Trial – Turbonomic
Free 30 Day Trial – Turbonomic:
Turbonomic delivers an autonomic platform where virtual and cloud environments self-manage in real-time to assure application performance.