Data Discretization

Data discretization is the process of converting continuous data into a finite set of intervals. Continuous attribute values are replaced by a small number of interval labels, which makes the data easier to study and analyze. If a data mining task handles a continuous attribute, its values can be replaced by discrete interval or concept labels, which can improve the efficiency of the task.

Discretization is also regarded as a data reduction mechanism, because it reduces a large number of distinct continuous values to a small set of categories. Decision tree-based algorithms, for example, tend to produce shorter, more compact, and more accurate results when working with discrete values.

Data discretization can be classified in two ways. By its use of class information, it is either supervised discretization, where the class information is used, or unsupervised discretization, where it is not. By the direction in which the process proceeds, it follows either a ‘top-down splitting strategy’ or a ‘bottom-up merging strategy’.
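The difference between the unsupervised and supervised approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: equal-width binning stands in for an unsupervised method, and a single entropy-minimising cut stands in for one step of a supervised top-down split. The function names and the small ages/buys dataset are made up for this example and do not come from the referenced article.

```python
from collections import Counter
from math import log2


def equal_width_edges(values, k):
    """Unsupervised: k equal-width bin edges; class labels are never consulted."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Supervised: one top-down step, the cut that minimises weighted class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can be placed between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            # Place the cut halfway between the two neighbouring values.
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


# Made-up data: ages and a hypothetical class label.
ages = [5, 8, 15, 22, 37, 45, 61, 70]
buys = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

print(equal_width_edges(ages, 4))  # edges depend only on the value range
print(best_split(ages, buys))      # cut depends on where the classes separate
```

Applying best_split recursively to each resulting interval would give a top-down splitting procedure. A bottom-up method would instead start with many small intervals and merge neighbouring ones.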

For example, the values of the age attribute can be replaced by interval labels such as (0-10, 11-20, …) or by conceptual labels such as (kid, youth, adult, senior).
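A minimal sketch of the age example, using only the Python standard library. The interval labels follow the 0-10, 11-20 pattern from the text. The cut points for kid, youth, adult, and senior are not given in the source, so the bounds below are assumptions chosen purely for illustration.

```python
from bisect import bisect_left

# Assumed inclusive upper bounds for the first three concept labels.
# These thresholds are hypothetical; the source does not define them.
UPPER_BOUNDS = [12, 24, 64]
LABELS = ["kid", "youth", "adult", "senior"]


def age_label(age):
    """Map a non-negative integer age to a concept label."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns the index of the first bound that is >= age.
    return LABELS[bisect_left(UPPER_BOUNDS, age)]


def age_interval(age):
    """Map a non-negative integer age to an interval label: 0-10, 11-20, 21-30, ..."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 10:
        return "0-10"
    lower = ((age - 1) // 10) * 10 + 1
    return f"{lower}-{lower + 9}"


for age in [7, 11, 20, 35, 70]:
    print(age, age_interval(age), age_label(age))
```

Both functions replace many distinct numeric values with a handful of labels, which is the reduction effect that discretization aims for.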

Reference List

https://www.javatpoint.com/data-transformation-in-data-mining#:~:text=Data%20transformation%20is%20an%20essential,them%20into%20clean%2C%20usable%20data.

Boyang Yan
