Data Mining Methods & Association Rule
In data mining, there are several methods that can be implemented:
Classification is the most common data mining method. Business problems such as churn analysis and risk management usually involve classification. Classification assigns each case to a class. Each case contains a set of attributes, one of which is the class attribute. The method needs to find a model that describes the class attribute as a function of the input attributes.
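The sketch below shows, roughly, what it means to learn the class attribute from input attributes. It uses a 1-nearest-neighbor rule only because it is short; it is not the specific algorithm the text has in mind. The customer attributes (`monthly_minutes`, `support_calls`) and the churn data are hypothetical.

```python
import math

# Hypothetical training cases: input attributes plus the class attribute "churn".
training_cases = [
    {"monthly_minutes": 120, "support_calls": 5, "churn": "yes"},
    {"monthly_minutes": 900, "support_calls": 0, "churn": "no"},
    {"monthly_minutes": 150, "support_calls": 4, "churn": "yes"},
    {"monthly_minutes": 800, "support_calls": 1, "churn": "no"},
]
INPUT_ATTRIBUTES = ["monthly_minutes", "support_calls"]
CLASS_ATTRIBUTE = "churn"

def distance(a, b):
    """Euclidean distance between two cases, using only the input attributes."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in INPUT_ATTRIBUTES))

def classify(case, training=training_cases):
    """Predict the class attribute by copying the class of the nearest training case."""
    nearest = min(training, key=lambda t: distance(case, t))
    return nearest[CLASS_ATTRIBUTE]

print(classify({"monthly_minutes": 140, "support_calls": 3}))  # yes
```

The attributes are not rescaled, so `monthly_minutes` dominates the distance. A real model would normally normalize its inputs first.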
Clustering is also referred to as segmentation. It is used to identify natural groupings of cases based on a set of attributes, so that cases with similar attribute values end up in the same group. Clustering is an unsupervised data mining method: no single attribute guides the learning process, so all input attributes are treated equally. Most clustering algorithms build a model through a series of iterations and stop when the model converges (that is, when the segment boundaries have stabilized).
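A minimal k-means sketch shows the iterate-until-converged loop described above. Each point is assigned to its nearest center, and each center then moves to the mean of its members. The loop stops once the assignments no longer change. The points and the choice of k = 2 are hypothetical, and k-means is only one of many clustering algorithms.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """k-means clustering; stops when assignments stop changing (the model has converged)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    assignment = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # segment boundaries have stabilized
        assignment = new_assignment
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignment

# Hypothetical cases, each with two numeric attributes.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, labels = kmeans(points, k=2)
print(labels)
```

Note that no attribute plays the role of a target here. Every coordinate contributes equally to the distance, which matches the unsupervised setting.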
Association is also known as Market Basket Analysis. A typical business problem is analyzing sales transaction tables to identify products that customers often buy together; for example, people who buy chili sauce usually also buy soy sauce. In association terms, each product is treated as an item.
The association method has two purposes:
– To find out which products are usually sold together
– To find the rules that describe these co-occurrences.
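The first purpose, finding products that are sold together, can be sketched by counting how often each pair of products appears in the same transaction. The baskets below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales transactions: each basket is the set of products in one purchase.
baskets = [
    {"chili sauce", "soy sauce", "rice"},
    {"chili sauce", "soy sauce"},
    {"rice", "eggs"},
    {"chili sauce", "soy sauce", "eggs"},
]

# Count every pair of products that occurs in the same basket.
# Sorting gives each pair a single canonical order.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)
print(pair_counts.most_common(1))  # [(('chili sauce', 'soy sauce'), 3)]
```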
The Regression method is similar to the Classification method. The difference is that regression does not look for patterns described as classes; instead, it looks for patterns that determine a numeric value. Simple linear line fitting is an example of regression: the result is a function that computes the output from the input values. More sophisticated forms of regression also support categorical inputs, not just numeric ones. The most popular regression techniques are linear regression and logistic regression. Other techniques supported by SQL Server Data Mining are Regression Trees (part of the Microsoft Decision Trees algorithm) and Neural Networks. Regression is used to solve many business problems. Examples include estimating distribution methods, distribution capacity, and seasonal patterns, or estimating wind speed from temperature, air pressure, and humidity.
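The simple line-fitting case can be sketched with ordinary least squares for a single numeric input. The function name `fit_line` and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one numeric input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the fitted line passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0
```

The fitted function is exactly the "result" the paragraph describes. Plugging in a new input value gives the predicted numeric output.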
Forecasting is also a very important data mining method. It can be used to answer questions such as:
– What will the stock price of JPMorgan Chase & Co. (traded on the NYSE under the symbol JPM) be tomorrow?
– What will the sales of a given product be next month?
Forecasting techniques can help answer the questions above. A forecasting technique takes as input a series of values recorded over time (a time series). It then predicts future values using various machine-learning and statistical techniques that account for seasonality, trends, and noise in the data.
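One concrete forecasting technique is Holt's linear (double) exponential smoothing. It tracks a level and a trend and projects them forward. It does not model seasonality. The smoothing parameters `alpha` and `beta` and the sales series below are arbitrary assumptions.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=1):
    """Holt's linear exponential smoothing: smooth a level and a trend, then extrapolate.

    Seasonality is not modelled in this sketch.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        prev_level = level
        # Blend the new observation with the previous level-plus-trend estimate.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# Hypothetical monthly sales with a steady upward trend.
monthly_sales = [10, 12, 14, 16]
print(holt_forecast(monthly_sales, horizon=2))  # [18.0, 20.0]
```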
- Sequence Analysis
Sequence Analysis is used to look for patterns in a series of events called sequences. Sequence data and time-series data are similar: both consist of adjacent observations whose order matters. The difference is that a time series contains numeric data, whereas a sequence contains discrete states.
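One simple way to find patterns in sequences of discrete states is to count transitions between consecutive states. This amounts to estimating a first-order Markov chain. The page names and click sequences are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical click sequences: each is an ordered list of discrete states (pages).
sequences = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "products", "cart"],
    ["home", "search", "products", "cart", "checkout"],
]

def transition_probabilities(seqs):
    """Estimate P(next state | current state) by counting adjacent pairs in each sequence."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }

probs = transition_probabilities(sequences)
print(probs["cart"])  # {'checkout': 1.0}
```

Unlike the numeric time series in forecasting, the states here are categories. What is learned is how likely one state is to follow another.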
- Deviation Analysis
Deviation Analysis is used to look for cases that behave very differently from the norm. It has a very broad range of uses; the most common is detecting credit card fraud. Identifying abnormal cases among millions of transactions is a very challenging job. Other uses include detecting computer network intrusions, analyzing manufacturing errors, and so on.
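A very simple deviation-analysis sketch flags values whose z-score exceeds a threshold. The z-score is the distance from the mean measured in standard deviations. The transaction amounts and the threshold of 2.0 are hypothetical; real fraud detection uses far richer models.

```python
import statistics

def flag_deviations(amounts, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [i for i, a in enumerate(amounts) if abs(a - mean) / stdev > threshold]

# Hypothetical card transaction amounts; the last one is unusual.
amounts = [20, 25, 22, 19, 24, 21, 500]
print(flag_deviations(amounts))  # [6]
```

A large outlier also inflates the standard deviation it is measured against. For that reason, robust statistics such as the median and the median absolute deviation are often preferred in practice.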
- Association analysis, also called association rule mining, is a data mining technique for finding associative rules between combinations of items. An example of associative rules from purchase analysis at a supermarket:
We can find out how likely a customer is to buy bread together with jam, and how likely they are to buy shampoo together with conditioner.
With this knowledge, supermarket owners can arrange the placement of their goods or design marketing campaigns that offer discount coupons for certain combinations of goods.
Association analysis became famous for its application to analyzing the contents of shopping carts in supermarkets, which is why it is often referred to as market basket analysis. It is also regarded as a foundation for various other data mining techniques. In particular, one of its stages, high-frequency pattern analysis (frequent itemset mining), has attracted many researchers seeking to produce sufficiently efficient algorithms.
The importance of an associative rule can be determined by two parameters:
– Support value: the percentage of transactions in the database that contain the combination of items
– Confidence (certainty) value: the strength of the relationship between the items in the associative rule, i.e., how often the consequent B appears in transactions that contain the antecedent A.
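The usual definitions of support and confidence (certainty) for a rule A → B are written out below. They are followed by a worked example on hypothetical counts: 10 transactions, 5 of which contain bread and 3 of which contain both bread and jam.

```latex
\mathrm{support}(A \rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}} \times 100\%

\mathrm{confidence}(A \rightarrow B) = \frac{\text{number of transactions containing both } A \text{ and } B}{\text{number of transactions containing } A} \times 100\%

\mathrm{support}(\text{bread} \rightarrow \text{jam}) = \frac{3}{10} \times 100\% = 30\%, \qquad
\mathrm{confidence}(\text{bread} \rightarrow \text{jam}) = \frac{3}{5} \times 100\% = 60\%
```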
The basic methodology of association analysis is divided into two stages:
– High-frequency pattern analysis
This stage looks for combinations of items that meet the minimum support threshold in the database.
– Establishing associative rules
After all high-frequency patterns are found, we look for the associative rules that meet the minimum confidence threshold by calculating the confidence of each associative rule A → B.
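The two stages can be sketched in the spirit of the Apriori algorithm, a well-known approach that the text does not name explicitly. Stage 1 grows frequent itemsets level by level, pruning candidates that have an infrequent subset. Stage 2 splits each frequent itemset into rules A → B and keeps those whose confidence meets the threshold. The function names, the thresholds (40% support, 70% confidence), and the transactions are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Stage 1: level-wise search for itemsets with support >= min_support (a fraction).

    Returns a dict mapping each frequent itemset (frozenset) to its transaction count.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def count(itemset):
        return sum(1 for b in baskets if itemset <= b)

    items = {item for b in baskets for item in b}
    level = {frozenset([i]) for i in items if count(frozenset([i])) / n >= min_support}
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ...and prune any candidate that has an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) / n >= min_support}
    return frequent

def association_rules(frequent, n_transactions, min_confidence):
    """Stage 2: split each frequent itemset into A -> B and keep confident rules."""
    rules = []
    for itemset, itemset_count in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so its count is known.
                confidence = itemset_count / frequent[antecedent]
                if confidence >= min_confidence:
                    support = itemset_count / n_transactions
                    rules.append((set(antecedent), set(itemset - antecedent), support, confidence))
    return rules

# Hypothetical supermarket transactions.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "milk"},
    {"shampoo", "conditioner"},
    {"bread", "jam", "shampoo", "conditioner"},
]
frequent = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(frequent, len(transactions), min_confidence=0.7)
for a, b, sup, conf in rules:
    print(a, "->", b, f"support={sup:.0%} confidence={conf:.0%}")
```

Confidence is computed from stored counts rather than re-scanning the data. The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent.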