Data mining techniques - Selecting important attributes
Real-world data sets usually contain many attributes, but not all of them are important, so we have to find the ones that matter for the analysis.
You might now be wondering whether data mining offers any methods for attribute selection.
Yes, there are several techniques, and the right one depends on the type of modeling you are doing.
In general, you can follow the steps below in the order given.
1) Drop variables with a high percentage of missing values.
2) Drop variables with a high VIF (variance inflation factor, a multicollinearity check).
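3) If some Xs are very highly correlated with Y, drop them at the beginning of the modeling. A near-perfect correlation with the target often means the variable is a proxy for the outcome rather than a genuine predictor.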
4) Drop variables with a low information value.
5) Reduce variables by significance: retain only those with a p-value < 0.0001.
6) Among the significant variables, drop those with a lower |t| or a low chi-square statistic.
7) Drop variables whose signs are "lurking", i.e., the sign of the coefficient when the variable is introduced into the model on its own differs from its sign when all variables are put into the model together. In both scenarios the sign of the parameter estimate should be the same, though the estimated value will differ.
8) If you still have too many variables, you may use factor analysis to reduce their number and, in place of the original variables, use linear combinations of them, i.e., the factors.
9) Last but not least, drop variables that don't make business sense.
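To make step 1 concrete, here is a minimal sketch in plain Python. The 30% cutoff, the function names and the sample rows are all assumptions made for illustration; the steps above do not prescribe a threshold, so pick one that suits your data.

```python
def missing_fraction(rows, column):
    """Fraction of rows in which `column` is absent or None."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def drop_sparse_columns(rows, max_missing=0.3):
    """Return the column names whose missing fraction is at most max_missing.

    max_missing=0.3 is an assumed cutoff, not a value from the post.
    """
    columns = sorted({key for row in rows for key in row})
    return [c for c in columns if missing_fraction(rows, c) <= max_missing]


# Hypothetical sample data: "fax" is missing in 3 of 4 rows.
rows = [
    {"age": 34, "income": 52000, "fax": None},
    {"age": 41, "income": None, "fax": None},
    {"age": 29, "income": 48000, "fax": None},
    {"age": 55, "income": 61000, "fax": "555-0100"},
]

kept = drop_sparse_columns(rows, max_missing=0.3)
print(kept)  # ['age', 'income']
```

Here "income" survives because only 25% of its values are missing, while "fax" is dropped at 75%.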
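For step 2, the VIF of each variable can be read off the diagonal of the inverse of the predictors' correlation matrix. The sketch below does this in plain Python so it has no dependencies; the helper names, the sample columns and the cutoff of 10 are assumptions for illustration. In practice you would more likely use a statistics package than hand-rolled matrix code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear variables: VIF is infinite")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per variable: the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical predictors: x2 is roughly 2 * x1, x3 is unrelated to both.
columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9],
    "x3": [5, 1, 4, 2, 6, 3],
}

scores = vif(columns)
# The cutoff of 10 is an assumed threshold, not one given in the post.
high = [name for name, value in scores.items() if value > 10]
print(high)
```

When two variables inflate each other like x1 and x2 here, you would normally drop only one of them and recompute the VIFs, rather than dropping both at once.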
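Step 4 relies on information value (IV), which for a binary target sums (share of non-events - share of events) * ln(share of non-events / share of events) over the categories or bins of a variable. The following is a rough sketch for a categorical predictor; the function name, the sample data and the 0.5 smoothing constant (added so that empty cells do not produce a log of zero) are assumptions for illustration.

```python
from collections import Counter
from math import log


def information_value(values, target, smoothing=0.5):
    """IV of a categorical predictor against a binary target.

    target uses 1 for an event and 0 for a non-event. `smoothing` is an
    assumed constant added to every cell count to avoid division by zero.
    """
    events = Counter(v for v, t in zip(values, target) if t == 1)
    non_events = Counter(v for v, t in zip(values, target) if t == 0)
    categories = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(categories)
    total_non_events = sum(non_events.values()) + smoothing * len(categories)

    iv = 0.0
    for category in categories:
        share_events = (events[category] + smoothing) / total_events
        share_non_events = (non_events[category] + smoothing) / total_non_events
        iv += (share_non_events - share_events) * log(share_non_events / share_events)
    return iv


# Hypothetical data: `strong` separates the target perfectly, `weak` barely at all.
target = [1, 1, 1, 0, 0, 0, 0, 0]
strong = ["a", "a", "a", "b", "b", "b", "b", "b"]
weak = ["a", "b", "a", "b", "a", "b", "a", "b"]

iv_strong = information_value(strong, target)
iv_weak = information_value(weak, target)
print(iv_strong > iv_weak)  # True
```

You would compute this for every candidate variable, rank them, and drop the ones at the bottom. A continuous variable has to be binned first.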
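The sign check in step 7 can be sketched by fitting each variable on its own and then all variables together, and comparing the signs of the slopes. The version below uses ordinary least squares solved through the normal equations in plain Python; the helper names and the sample data are assumptions for illustration, and a real project would normally take the coefficients from whatever modeling tool is already in use.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def ols_slopes(columns, y):
    """Least-squares slopes of y on the given columns (intercept fitted, not returned)."""
    names = list(columns)
    design = [[1.0] + [columns[name][i] for name in names] for i in range(len(y))]
    k = len(names) + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y[r] for r, row in enumerate(design)) for i in range(k)]
    beta = solve(xtx, xty)
    return dict(zip(names, beta[1:]))


def sign_flips(columns, y):
    """Names of variables whose slope sign differs between the solo and joint fits."""
    joint = ols_slopes(columns, y)
    flipped = []
    for name in columns:
        alone = ols_slopes({name: columns[name]}, y)[name]
        if alone * joint[name] < 0:
            flipped.append(name)
    return flipped


# Hypothetical data built so that y = 2 * x1 - x2, with x2 rising along with x1.
columns = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 1, 2, 3, 3, 4],
}
y = [1, 3, 4, 5, 7, 8]

print(sign_flips(columns, y))  # ['x2']
```

On its own, x2 appears to push y up because it moves together with x1; once x1 is in the model, its slope turns negative. That is the kind of variable step 7 asks you to drop.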
Data mining is important for your business, and you should know your attributes well.