School of Information Systems

Data Mining – The Process

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

According to Larose, data mining tasks fall into six groups: description, estimation, prediction, classification, clustering, and association.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases to answer questions such as, “Which clients are most likely to respond to my next promotional mailing, and why?”

The data mining process consists of several steps. The first step is Data cleaning: the source data is cleansed so that it has no missing values, contains no noisy data, and is consistent.
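A minimal Python sketch of the cleaning step follows. It assumes a small hypothetical customer table in which `None` marks a missing value and an age outside a plausible range counts as noise. The choices of median imputation, dropping records with an unrecoverable city, and normalizing spelling are illustrative assumptions, not rules from the source.

```python
from statistics import median

# Hypothetical raw records: None marks a missing value and the age of 420
# is an obviously noisy entry; city spellings are inconsistent.
RAW = [
    {"id": 1, "city": " Jakarta", "age": 34},
    {"id": 2, "city": "jakarta", "age": None},
    {"id": 3, "city": "Bandung ", "age": 420},
    {"id": 4, "city": "BANDUNG", "age": 29},
    {"id": 5, "city": None, "age": 41},
]


def clean(records, age_range=(0, 120)):
    """Return cleaned copies of the records (the input list is not modified)."""
    lo, hi = age_range

    def is_valid(age):
        # Ages outside the plausible range are treated as noise, i.e. missing.
        return age is not None and lo <= age <= hi

    # 1. Drop records whose city is missing (assumed unrecoverable).
    kept = [r for r in records if r["city"] is not None]
    # 2. Median of the valid ages, used to fill missing or noisy ages.
    fill_age = median(r["age"] for r in kept if is_valid(r["age"]))
    # 3. Impute ages and normalize city spelling so the data is consistent.
    return [
        {
            "id": r["id"],
            "city": r["city"].strip().title(),
            "age": r["age"] if is_valid(r["age"]) else fill_age,
        }
        for r in kept
    ]


if __name__ == "__main__":
    for row in clean(RAW):
        print(row)
```

Record 5 is dropped, records 2 and 3 receive the median of the remaining valid ages (31.5), and all city names end up as either "Jakarta" or "Bandung".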

The second step is Data integration: data from other or different information sources is combined into the single database required for the analysis.
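Integration can be sketched as a join of two sources on a shared key. In this sketch the two sources (`CRM`, `SALES`) and their field names are hypothetical. The sales source uses a different name for the customer key, which is a typical schema mismatch that has to be resolved during integration.

```python
# Two hypothetical sources describing the same customers.
CRM = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Budi"},
    {"customer_id": 3, "name": "Citra"},
]
SALES = [
    {"cust_no": 1, "total_spent": 250.0},
    {"cust_no": 3, "total_spent": 90.0},
    {"cust_no": 3, "total_spent": 60.0},
]


def integrate(crm, sales):
    """Combine both sources into one table with one row per CRM customer."""
    # Schema integration: the sales system calls the key "cust_no", so it is
    # mapped onto the CRM's "customer_id" while aggregating the transactions.
    spent = {}
    for row in sales:
        key = row["cust_no"]
        spent[key] = spent.get(key, 0.0) + row["total_spent"]
    # Left join: customers without any sales get 0.0 instead of a missing value.
    return [{**c, "total_spent": spent.get(c["customer_id"], 0.0)} for c in crm]


if __name__ == "__main__":
    for row in integrate(CRM, SALES):
        print(row)
```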

The third step is Data selection: only the data relevant to the analysis task is selected from the database.
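Selection amounts to keeping the relevant rows and the relevant attributes. The sketch below assumes a hypothetical table and a hypothetical analysis that only concerns active customers. It keeps the attributes useful for modeling and drops identifying fields such as `name`.

```python
# Hypothetical integrated table.
TABLE = [
    {"id": 1, "name": "Ana", "age": 34, "total_spent": 250.0, "active": True},
    {"id": 2, "name": "Budi", "age": 45, "total_spent": 0.0, "active": False},
    {"id": 3, "name": "Citra", "age": 29, "total_spent": 150.0, "active": True},
]


def select(table, attributes, condition):
    """Keep rows satisfying `condition`, projected onto `attributes`."""
    return [{a: row[a] for a in attributes} for row in table if condition(row)]


# Relevant records: active customers only. Relevant attributes: id, age, spend.
subset = select(TABLE, ["id", "age", "total_spent"], lambda r: r["active"])

if __name__ == "__main__":
    print(subset)
```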

The fourth step is Data transformation: the selected data is transformed into a form suitable for the modeling carried out in the data mining stage, so that the analysis can reveal the hidden information in the data. For example, if the project uses a classification method, we define the “Predictor Attribute” and the “Class Label”.
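For a classification project, transformation can be sketched as splitting each record into predictor attributes and a class label and putting the predictors on a common scale. Min-max scaling to [0, 1] is an assumption chosen for illustration, since the source does not prescribe a specific transformation. The rows and the `responded` label are hypothetical.

```python
# Hypothetical selected data: did a customer respond to a promotional mailing?
ROWS = [
    {"age": 34, "total_spent": 250.0, "responded": "yes"},
    {"age": 45, "total_spent": 0.0, "responded": "no"},
    {"age": 29, "total_spent": 150.0, "responded": "yes"},
    {"age": 52, "total_spent": 50.0, "responded": "no"},
]
PREDICTORS = ["age", "total_spent"]  # predictor attributes
CLASS_LABEL = "responded"            # class label


def min_max(values):
    """Scale values linearly to [0, 1]; a constant column becomes all 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def transform(rows, predictors, label):
    """Return a scaled feature matrix X and the list of class labels y."""
    columns = [min_max([r[p] for r in rows]) for p in predictors]
    X = [list(features) for features in zip(*columns)]
    y = [r[label] for r in rows]
    return X, y


X, y = transform(ROWS, PREDICTORS, CLASS_LABEL)

if __name__ == "__main__":
    for features, label in zip(X, y):
        print(features, label)
```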

The fifth step is Data mining: at this stage, data mining techniques are applied to find patterns or other interesting information in the data.
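As one illustration of applying a technique at this stage, the sketch below classifies new records with k-nearest neighbours. The source does not name a specific algorithm, so k-NN is only a stand-in for "a classification technique". The training data are hypothetical and assumed to be already scaled to [0, 1].

```python
from collections import Counter
from math import dist

# Hypothetical transformed training data: [age, total_spent] scaled to [0, 1].
X_TRAIN = [[0.22, 1.0], [0.70, 0.0], [0.0, 0.6], [1.0, 0.2]]
Y_TRAIN = ["yes", "no", "yes", "no"]


def knn_predict(x, X, y, k=3):
    """Predict the class of x by majority vote among its k nearest neighbours."""
    # Indices of training points ordered by Euclidean distance to x.
    nearest = sorted(range(len(X)), key=lambda i: dist(x, X[i]))[:k]
    votes = Counter(y[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict([0.1, 0.9], X_TRAIN, Y_TRAIN))
    print(knn_predict([0.9, 0.1], X_TRAIN, Y_TRAIN))
```

With two classes and an odd `k`, a majority vote cannot tie. `math.dist` requires Python 3.8 or later.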

The major data mining techniques include association, classification, clustering, prediction, sequential patterns, and decision trees.
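To make association concrete, the sketch below scores rules of the form "customers who buy A also buy B" by support (the fraction of baskets containing both items) and confidence (support of A and B together divided by support of A). It is a brute-force enumeration over item pairs on hypothetical baskets. A real project would typically use an algorithm such as Apriori or FP-growth, and the thresholds here are arbitrary.

```python
from itertools import combinations

# Hypothetical market baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """support(antecedent | consequent) / support(antecedent)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """All one-item -> one-item rules meeting both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, c in combinations(items, 2):
        if support({a, c}, baskets) < min_support:
            continue  # the pair is too rare to yield a useful rule
        for lhs, rhs in ((a, c), (c, a)):
            conf = confidence({lhs}, {rhs}, baskets)
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, conf in pair_rules(BASKETS):
        print(f"{lhs} -> {rhs} (confidence {conf:.2f})")
```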

Reference:

Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, 3rd Edition (The Morgan Kaufmann Series in Data Management Systems), 2012

Daniel T. Larose, Chantal D. Larose, Data Mining and Predictive Analytics, 2nd Edition, March 2015

Eka Miranda, S.Kom., MMSI.