Anyone who works with data will recognize this: you receive a question and want to dive straight into the data and focus on the content. Unfortunately, we see this very often in practice, with the result that a lot of work goes into a solution that is ultimately never used. A shame! The CRISP-DM methodology ensures that you establish in advance what the project is meant to achieve, at what point it delivers value and how best to approach the solution.
CRISP-DM, the Cross-Industry Standard Process for Data Mining, is a method that gives structure to data projects in six phases. The phases are cyclical and often repeat several times during a project. The method applies to a relatively simple analysis as well as to more complicated modeling problems.
We take you through the different phases of the process and the roles within the organization that play a part in each.

Phase 1 | Business understanding
A good understanding of the business is essential in this first phase. What exactly is being done or delivered, and what input is needed to do this? By the end of this phase, you should have a clear picture of the problem, the goal and how you can make a difference with your analysis, model or data application such as AI. You can only contribute to a company's goals if you understand what it takes to create value with data and data applications. You document the findings in a plan of action. As an analyst or data scientist, you must take on the role of business analyst here if your organization has no permanent position for it. It is very important at this stage to avoid making assumptions and to dare to be critical. Probing and asking the right questions are the key to success. In addition, make sure that the right people on the business side are your points of contact, for example product owners, business analysts or middle managers.
Phase 2 | Data understanding
The next phase focuses on understanding the data. What data do you need to achieve the goals, what sources are there and how useful is the data? Are there unambiguous data definitions, and what is the quality of the data? It is important to realize that in this phase you mainly have to expose your data issues; you do not have to solve them right away. At the end of this phase, you will know whether, and with what data, you can work on the problem and what data, if any, is still missing. It is essential to spend sufficient time exploring the data, because a prediction or analysis can never be sound if it is based on contaminated, incorrect or unreliable data. At this stage, as an analyst or data scientist, you will therefore have to put on the hat of an information, process or systems analyst, or make sure that you can question colleagues in those roles thoroughly.
Of course, despite the time and energy you put into exploration, you may find that you do not have the data needed to perform the analysis agreed upon during business understanding. Your project then stops here for the moment. That is not a problem: you have found out in time that the question you had in mind cannot be answered, or is not the right question. You then go back to phase 1 and determine again with the business how you can still contribute to solving the problem. In this way you arrive at the best possible question, knowing that you have the means to reach a solution at your disposal.
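The kind of data exploration described for this phase can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed CRISP-DM step: the function name profile_records, the field names and the sample customer records are all hypothetical. In line with the advice above, the sketch only exposes issues (missing values and duplicate keys) and does not try to fix them.

```python
from collections import Counter


def profile_records(records, key_field):
    """Report missing values per field and duplicated keys (issues are reported, not fixed)."""
    # Collect every field name that occurs in at least one record.
    fields = sorted({name for row in records for name in row})
    # A value counts as missing when the field is absent, None or an empty string.
    missing = {
        name: sum(1 for row in records if row.get(name) in (None, ""))
        for name in fields
    }
    key_counts = Counter(row.get(key_field) for row in records)
    duplicate_keys = sorted(k for k, n in key_counts.items() if k is not None and n > 1)
    return {"rows": len(records), "missing": missing, "duplicate_keys": duplicate_keys}


# Hypothetical sample data, for illustration only.
customers = [
    {"customer_id": 1, "age": 34, "city": "Utrecht"},
    {"customer_id": 2, "age": None, "city": "Amsterdam"},
    {"customer_id": 2, "age": 51, "city": ""},
    {"customer_id": 3, "age": 29},
]

report = profile_records(customers, "customer_id")
print(report)
```

A report like this is a starting point for the conversation with the information or systems analyst: it shows where definitions or quality are doubtful before any cleaning decisions are made.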
Phase 3 | Data preparation
Based on the results of the preceding phase, you now generate the input for your data analysis. This involves selecting, cleaning, enriching and aggregating the data. At the end of this phase, you will have a dataset that contains unique records and in which outliers and missing values have been detected and resolved. It is important that enough history is available and that you have relevant characteristics (features) with which to start the analysis. In short, the input contains all the characteristics needed to answer the question and model the answer. This phase takes a lot of time, sometimes up to 80% of the entire project. It is therefore an important part of the work of a data analyst or data scientist, possibly with the help of a data steward, who is responsible for the quality and fitness for purpose of the organization's data assets.
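As an illustration of the cleaning steps named above (unique records, outliers, missing values), here is a minimal Python sketch. The choices in it are assumptions made for the example, not part of CRISP-DM: keeping the first record per key, flagging outliers by their distance to the median (measured in median absolute deviations) and replacing flagged or missing values with the median of the remaining values. The function name prepare, the field names and the order data are hypothetical.

```python
from statistics import median


def prepare(records, key_field, value_field, mad_multiplier=5.0):
    """Deduplicate on key_field, then flag outliers and fill gaps in value_field."""
    # 1. Keep the first record per key so that every record is unique.
    unique = {}
    for row in records:
        unique.setdefault(row[key_field], dict(row))
    rows = list(unique.values())

    # 2. Flag outliers: values far from the median, measured in
    #    median absolute deviations (MAD). The multiplier is an assumption.
    known = [r[value_field] for r in rows if r[value_field] is not None]
    centre = median(known)
    mad = median(abs(v - centre) for v in known)
    for r in rows:
        v = r[value_field]
        r["is_outlier"] = v is not None and mad > 0 and abs(v - centre) > mad_multiplier * mad

    # 3. Replace missing values and outliers with the median of the remaining
    #    values, and record that the value was imputed.
    typical = median(
        r[value_field] for r in rows if r[value_field] is not None and not r["is_outlier"]
    )
    for r in rows:
        r["was_imputed"] = r[value_field] is None or r["is_outlier"]
        if r["was_imputed"]:
            r[value_field] = typical
    return rows


# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": "A1", "amount": 30.0},
    {"order_id": "A2", "amount": 32.0},
    {"order_id": "A2", "amount": 32.0},   # duplicate record
    {"order_id": "A3", "amount": 35.0},
    {"order_id": "A4", "amount": 31.0},
    {"order_id": "A5", "amount": 400.0},  # suspiciously large
    {"order_id": "A6", "amount": None},   # missing value
]

clean = prepare(orders, "order_id", "amount")
for row in clean:
    print(row)
```

Keeping the is_outlier and was_imputed flags in the dataset makes the preparation traceable, which helps when you later have to explain how the analysis was built. Whether an outlier should be replaced, removed or kept is a decision to make together with someone who knows the data, such as the data steward.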
Phase 4 | Modeling
You can finally get to work performing your analysis or building your model. You test pre-established hypotheses, determine the significance of the characteristics and/or test the predictive power of your model. At the end of this phase, you have an analysis or model with which you can contribute to the issue. In this phase it is not enough to be able to perform the analysis, build the model or use an AI application. You must also be able to explain how your analysis is built or how your model or AI application works. In addition to being a data analyst or data scientist, you should therefore also be a good consultant. Of course, despite all good preparation, the analysis or model may fail to discriminate or to provide the desired insights. In that case, you can first go back to data preparation to create additional features.
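One simple way to test the predictive power mentioned above is to compare a model with a naive baseline on data the model has not seen. The sketch below is an illustration under assumptions: the "model" is a deliberately trivial cut-off on a single feature, the baseline always predicts the most common class, and the data (number of support calls versus churn) is hypothetical.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def fit_threshold(xs, ys):
    """Pick the cut-off on a single feature that classifies the training data best."""
    candidates = sorted(set(xs))
    return max(candidates, key=lambda t: accuracy([x >= t for x in xs], ys))


def majority_class(ys):
    """Baseline: the most common label in the training data."""
    return sum(ys) * 2 > len(ys)


# Hypothetical data: number of support calls per customer and whether they churned.
train_x = [0, 1, 1, 2, 3, 4, 5, 6]
train_y = [False, False, False, False, True, False, True, True]
test_x = [0, 2, 3, 5, 1, 6]
test_y = [False, False, True, True, False, False]

# Fit on the training set only, then evaluate on the held-out test set.
threshold = fit_threshold(train_x, train_y)
model_accuracy = accuracy([x >= threshold for x in test_x], test_y)

baseline_label = majority_class(train_y)
baseline_accuracy = accuracy([baseline_label] * len(test_y), test_y)

print(f"threshold: {threshold}")
print(f"model accuracy:    {model_accuracy:.2f}")
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```

If the model does not clearly beat the baseline on held-out data, that is a signal to go back to data preparation for additional features rather than to tune the model further. A model this simple also has the advantage that it is easy to explain to stakeholders.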
Phase 5 | Evaluation
When you start implementing actions based on your analysis or model, always run a small-scale test first to substantiate the impact, over a period long enough for significant differences to be measured. Whenever possible, use a control group. It is wise to also look at (negative) side effects. At the end of this phase, you know well how your analysis or model performs. This is all part of the work of a data analyst or data scientist. You should also be able to advise on how your analysis or model can be put into practice and on possible follow-up actions, so you once again step into the role of consultant. Of course, the results of your test may be disappointing or the test may show no improvement. You can then opt for a new test, adjustments in the modeling phase, or even start all over again at phase 1, the business understanding, if all other adjustments do not yield sufficient results.
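To judge whether a difference between a test group and a control group is significant, one common option is a two-proportion z-test. The sketch below is one possible approach, assuming a binary outcome (for example, converted or not) and reasonably large groups; the function name and the numbers are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two conversion rates; returns (difference, z statistic, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return p_a - p_b, z, p_value


# Hypothetical test: 1000 customers approached using the model (test group)
# versus 1000 customers approached in the usual way (control group).
uplift, z, p_value = two_proportion_z_test(120, 1000, 90, 1000)

print(f"uplift:  {uplift:.3f}")
print(f"z:       {z:.2f}")
print(f"p-value: {p_value:.3f}")
```

A small p-value suggests that the difference is unlikely to be due to chance alone. It does not say anything about side effects or about whether the uplift is large enough to be worth the effort, so those questions still need a separate look.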
Phase 6 | Deployment
If the results of the evaluation are positive, you can proceed to phase 6 and put your analysis or model into production. In fact, this is the goal: only when your stakeholders actually start using the result have you added value. Make sure that your analysis or model is updated regularly and that the results are easy to access. Because it is important to keep monitoring and optimizing, arrange for regular evaluation, for example in the form of a dashboard. This keeps CRISP-DM a continuous process, in which you continue to learn from the data.
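The regular evaluation mentioned above can start very small, for example with a check that compares recent performance against the level measured during evaluation. The sketch below is an assumption-laden illustration: the function name needs_review, the tolerance, the three-week window and the weekly accuracy figures are all hypothetical.

```python
from statistics import mean


def needs_review(metric_history, baseline, tolerance=0.05, window=3):
    """Return True when the recent average falls more than `tolerance` below the baseline."""
    recent = metric_history[-window:]
    return mean(recent) < baseline - tolerance


# Hypothetical weekly accuracy of a model in production.
weekly_accuracy = [0.84, 0.83, 0.85, 0.80, 0.77, 0.74]
baseline_accuracy = 0.83  # level measured in the evaluation phase (assumed)

early_alert = needs_review(weekly_accuracy[:3], baseline_accuracy)
current_alert = needs_review(weekly_accuracy, baseline_accuracy)

print(f"alert after week 3: {early_alert}")
print(f"alert after week 6: {current_alert}")
```

A signal like this is a natural trigger to walk through the cycle again: first check whether the data has changed, then whether the model needs to be updated.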
Make every data project a success!
You have now read how much is involved in a data project and which roles and areas of expertise you need. The structure offered by the CRISP-DM method, together with a clear division of roles and tasks, contributes continuously to optimizing with data. This allows you to achieve the goals you set together and to make every data project, from dashboard to AI solution, a success!



