
The important processes that have to be clearly delineated for Data Mining and Data Modelling are:
Most companies want to know essential information about customers at every point of contact, for example:
Much of the data they hold changes, is refreshed or occurs at different frequencies, and is kept for different periods. In some cases, aggregated data may be kept rather than source data. All of these factors affect the data modelling exercise and the eventual modelling software requirements.
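To illustrate why keeping aggregated data rather than source data matters for modelling, here is a minimal Python sketch. The record layout, the customer identifiers and the `aggregate_monthly` helper are all hypothetical and chosen only for illustration. Once the daily records are rolled up into monthly totals, the number and timing of the individual purchases can no longer be recovered from the aggregate.

```python
from collections import defaultdict

# Hypothetical source records: (customer_id, ISO date, amount spent).
transactions = [
    ("C001", "2024-01-05", 20.0),
    ("C001", "2024-01-19", 35.0),
    ("C001", "2024-02-02", 15.0),
    ("C002", "2024-01-11", 50.0),
]


def aggregate_monthly(records):
    """Roll individual transactions up to one total per customer per month."""
    totals = defaultdict(float)
    for customer_id, date, amount in records:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[(customer_id, month)] += amount
    return dict(totals)


monthly = aggregate_monthly(transactions)
for key in sorted(monthly):
    print(key, monthly[key])
```

A model built on `monthly` can still use spend per month, but not purchase frequency within the month, so the retention decision constrains which models are possible later.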
Turning the data into useful information requires:
Thereafter, modelling tools and techniques have to be used. These can be divided into two groups: theory driven and data driven.
Theory driven modelling (hypothesis testing) attempts to substantiate or disprove preconceived ideas. Theory driven modelling tools require the user to specify most of the model based on prior knowledge and then test whether the model is valid.
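As a rough sketch of the theory driven approach, the Python example below starts from a preconceived idea (that contacted customers spend more than customers who were not contacted) and then tests it against data. The permutation test is just one possible test, chosen here because it needs only the standard library. The spend figures, the function name `permutation_test` and the 0.05 threshold are assumptions made for illustration, not taken from any particular tool.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided p-value for the hypothesis mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign the group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up monthly spend figures, for illustration only.
contacted = [52.0, 61.5, 58.0, 66.0, 59.5, 63.0, 57.5, 64.0]
not_contacted = [41.0, 47.5, 39.0, 50.0, 44.5, 46.0, 42.5, 48.0]

p_value = permutation_test(contacted, not_contacted)
hypothesis_supported = p_value < 0.05  # conventional threshold, assumed here
print(p_value, hypothesis_supported)
```

The user supplies the hypothesis and the grouping; the tool only reports whether the data are consistent with it.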
Data driven modelling tools automatically create the model based on patterns they find in the data. The resulting model also needs to be tested before it can be accepted as valid. Modelling is an iterative process, with the final model usually being a combination of prior knowledge and newly discovered information. The engines, tools and techniques include:
Written by Richard Hill