How to Promote Awareness and Integrate Ethics into Your Data-Driven Organization
Businesses have long been unable to do without data: concepts such as data literacy and data-driven work have become commonplace, not to mention machine learning and AI. Organizations rely more and more on data when making daily decisions. With this comes an increased focus on making fair and responsible data-driven decisions. Rightly so, as unethical decisions can cause great harm to those involved, and therefore to your own organization.
A well-known challenge is implementing data-driven processes in an ethical manner, which extends far beyond analyzing the results of predictive models. Ethical practices should be incorporated into every step of the process, making it the responsibility of every employee. A deep understanding of ethical data-driven practices is crucial for all members of an organization.
Ethics as a concept is closely related to regulations such as the General Data Protection Regulation (GDPR; AVG in Dutch). These regulations cover, for example, access to the data in the first place (who or which role may view which data?), but also which data may be used and for how long. In this article, we intentionally do not focus on these specific regulations, but instead emphasize ethics as an ongoing process.
We will guide you through a step-by-step approach to embedding ethics as a design principle in your data-driven processes, using the widely recognized CRISP-DM framework. This article offers an overview of CRISP-DM, focusing on the ethical considerations within each step, with subsequent articles diving deeper into each phase.
CRISP-DM: A Foundation for Ethical Data-Driven Processes
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a framework for data-driven processes. It provides a planned approach and focuses on collaboration among the various roles involved in such a process. CRISP-DM is widely used but does not offer out-of-the-box guidance on ethics. If you want to know more about CRISP-DM, read this article, which discusses it further.
Rather than creating lengthy, seldom-used documents, we advocate for concise, actionable guidelines that facilitate ethical discussions and address key concerns for future projects.

Business understanding
The first step involves defining the project's objectives and expected outcomes, ensuring that a broad range of stakeholders from various departments and roles are involved. A key ethical question to consider here is:
Who is the end user, how will they be affected by the analysis/model, and what unintended consequences might arise?
When answering this question, it will become clear whether ethical objections can be raised against the intended goal. Sub-questions that can help with this include, for example:
- What is the goal of the project, and could it raise any ethical concerns?
- What data is needed, and has it been used in previous projects with any identified issues?
- Was the data collected for a purpose aligned with this project's objectives?
- How will the results be applied in practice, and what impact could this have on the end user?
- How do we mitigate potential biases in the data, and how will we evaluate our results?
Ethical oversight must be a continuous process, not limited to a single assessment at the project's outset.
Data understanding
This phase involves exploring and comprehending the data's relevance to the project, with the data analyst playing a key role in evaluating data quality. Central to this is the principle of data minimization: using the least sensitive data necessary to achieve the desired outcome. Ethical considerations at this stage include:
What biases might already exist in the data, and is it still suitable for use? Is the data fairly distributed and representative of the target group?
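One way to make the representativeness question concrete is to compare the share of each group in the dataset with its share in the target population. The sketch below is a minimal illustration using only the Python standard library; the function name `representation_gaps`, the age bands, and all numbers are made up for this example and are not taken from a real project.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Return (share in sample - share in reference) for each group.

    sample_groups: iterable with one group label per record in the dataset.
    reference_shares: dict mapping group label -> expected share in the
        target population (values should sum to 1).
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


# Hypothetical example: age bands in the dataset versus the target population.
sample = ["18-34"] * 60 + ["35-64"] * 30 + ["65+"] * 10
reference = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

gaps = representation_gaps(sample, reference)
for group, gap in gaps.items():
    # A positive gap means the group is over-represented in the dataset.
    print(f"{group}: {gap:+.2f}")
```

With these made-up numbers, the youngest group is over-represented and the two older groups are under-represented. How large a gap is acceptable is a judgment call to discuss with the stakeholders, not something the code can decide.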
Other critical questions include:
- Are there known biases, and what measures can be taken to minimize them?
- Are there tests we can run to assess the influence of bias on the final results?
- Could the data contain hidden information that might lead to unintended biases? (It is well known, for example, that a postal code area can also represent income levels.)
- Does the data reflect the intended target group accurately?
As with all stages, ethical concerns related to business goals should be revisited.
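The postal code example above can be turned into a rough check: how much better can you guess a sensitive attribute once you know the candidate feature? The sketch below compares a naive baseline (always guessing the most common value) with a guess per feature category. It is only one simple heuristic among many; the function name `proxy_strength` and the data are hypothetical.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, sensitive_values):
    """Return (baseline_accuracy, accuracy_with_feature).

    A large difference between the two suggests that the feature may act
    as a proxy for the sensitive attribute.
    """
    n = len(sensitive_values)
    # Baseline: always guess the most common sensitive value overall.
    baseline = Counter(sensitive_values).most_common(1)[0][1] / n

    # With the feature: guess the most common sensitive value per category.
    by_feature = defaultdict(Counter)
    for feature, sensitive in zip(feature_values, sensitive_values):
        by_feature[feature][sensitive] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_feature.values())
    return baseline, correct / n


# Hypothetical data: postal area per person and an income band we do not
# want the model to infer indirectly.
postal = ["A"] * 4 + ["B"] * 4 + ["C"] * 2
income = (["low", "low", "low", "high"]
          + ["high", "high", "high", "low"]
          + ["low", "high"])

baseline, with_feature = proxy_strength(postal, income)
print(f"baseline: {baseline:.2f}, with postal area: {with_feature:.2f}")
```

In this toy data, knowing the postal area raises the share of correct guesses from 0.50 to 0.70, which would be a reason to discuss whether the feature should be used, coarsened, or left out.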
Data preparation
In this stage, the data analyst prepares the data for the next phases. The challenges identified in the previous step are resolved as far as possible, or measures are taken to minimize them. The goal is to create a dataset that is reliable and as free from bias as possible for the next phase. The analyst must consider how bias can be removed from the data. For example, she focuses on the question:
Are you aware of unavoidable biases, and what steps are being taken to mitigate them?
It is important to document the potential risks arising from the known bias and how the results from this step will be applied in the workflow. It is also essential to record the data preparation process itself to ensure transparency regarding the decisions that were made. Finally, the analyst must ensure the preparation process itself does not introduce new biases.
Other important questions in this phase include:
- Which data needs to be pseudonymized before it can be used?
- How do you ensure that data cannot be traced back to an individual after anonymization?
- Does the preparation process unintentionally create bias?
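For the pseudonymization question, a common approach is to replace direct identifiers with a keyed hash, so that records can still be linked without exposing the original value. The sketch below uses HMAC-SHA256 from the Python standard library; the field names, the records, and the key are hypothetical, and the approach is one option rather than a prescribed method.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key: in practice, load it from a secure store and keep it
# separate from the prepared dataset. Never hard-code it in source code.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Made-up records for illustration only.
records = [
    {"customer_id": "C-1001", "postal_code": "1234AB", "churned": True},
    {"customer_id": "C-1002", "postal_code": "5678CD", "churned": False},
]

# Copy each record, replacing only the direct identifier.
prepared = [
    {**row, "customer_id": pseudonymize(row["customer_id"], SECRET_KEY)}
    for row in records
]
```

Note that this is pseudonymization, not anonymization: anyone holding the key can recompute the mapping, and remaining fields such as a full postal code may still allow a person to be traced in combination with other data.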
Modeling
In the Modeling step, it is again primarily the data analyst who takes the lead. Don't be misled by the name "Modeling": this step is not only about developing predictive models but also involves creating reports and descriptive analyses, or leveraging generative AI (such as GPT or other large language models). The central question here is:
Does the product produce undesirable, unfair or biased results, and do I understand how my model arrives at its prediction?
At this stage, it is important to continually verify whether the analyst has validated all her assumptions. Some in-depth questions that can assist in this process include:
- What principles can I use to test my output for bias and ethics? Read more about how to do this using the Python package Fairlearn here.
- Can I develop unit tests that provide insight into specific examples and the outcomes they would lead to (e.g., testing a facial recognition model by showing both men and women)?
- What metric should I use to select the best model? What are the implications of choosing this metric? (e.g., if you are predicting cancer, you may accept more 'false positives' than if you are predicting customer churn).
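To illustrate what testing output for bias can look like, the sketch below computes the share of positive predictions per group and the largest difference between groups (often called the demographic parity difference). Libraries such as Fairlearn provide comparable metrics; this version uses only the standard library so that the calculation stays visible. The predictions, group labels, and the tolerance of 0.2 are made-up assumptions.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Share of positive predictions (1) per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model output (1 = selected) and the group of each person.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]

rates = selection_rates(predictions, groups)
difference = demographic_parity_difference(predictions, groups)

# Hypothetical tolerance: the acceptable gap is a decision for the team.
TOLERANCE = 0.2
needs_review = difference > TOLERANCE
print(rates, difference, needs_review)
```

In this toy example the selection rates are 0.75 and 0.25, so the difference of 0.5 exceeds the tolerance and the model would be flagged for review. A check like this can be run as a unit test on every new model version, but it covers only one definition of fairness; which definition fits depends on the use case.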
Evaluation
In this step, you evaluate the performance and outcomes of the model or analysis, and determine to what extent they meet the objectives set in step one, the business understanding. Here, you should challenge yourself and the team with the question: do I want to publish this, and what would the reactions be? This also brings back the earlier question:
What is the impact of the outcome of an analysis or model on the user?
Further questions include:
- Has unintended bias been introduced in the data or model?
- Are the results transparent and explainable?
- What are the consequences of publishing it?
Deployment
This step involves the implementation and launch of a product. Be transparent about any limitations the model may have that could introduce bias. These should have been identified and documented in the previous steps. It is also important to closely monitor how the product evolves over time and whether any unintended side effects or outcomes emerge later on. If such issues arise, is it clear and transparent where they stem from and how they can be addressed? One of the key monitoring questions that must continue to be asked after the product goes live is:
Are adjustments needed in earlier steps?
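Monitoring after go-live can be as simple as comparing a recent measurement with the values documented at launch. The sketch below flags groups whose current rate of positive outcomes has moved more than a chosen tolerance away from the baseline. The function name `rate_drift`, the rates, and the tolerance are hypothetical and only meant to show the idea.

```python
def rate_drift(baseline_rates, current_rates, tolerance=0.05):
    """Return the groups whose rate moved more than `tolerance` from baseline.

    The result maps group -> (current - baseline), or None when the group
    is missing from the recent data altogether.
    """
    flagged = {}
    for group, base in baseline_rates.items():
        current = current_rates.get(group)
        if current is None:
            flagged[group] = None  # no recent data for this group
        elif abs(current - base) > tolerance:
            flagged[group] = current - base
    return flagged


# Hypothetical rates of positive outcomes per group, at launch and last month.
baseline = {"m": 0.40, "f": 0.38}
last_month = {"m": 0.41, "f": 0.29}

flagged = rate_drift(baseline, last_month)
print(flagged)
```

Here only group "f" is flagged, because its rate dropped by roughly nine percentage points. A flag is a prompt to go back to the earlier steps and find out where the change stems from, not a verdict in itself.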
Be aware that new factors can influence the outcome of the application and may still cause unintended impacts, such as affecting the privacy of the end user. Therefore, consider questions such as:
- What are the long-term consequences of the choices made and the use of the data?
- How is the data stored, and what impact will this have on its usage in the future?
Conclusion
This first article provides practical guidance for fostering awareness and incorporating ethics into your organization's data-driven processes. Our key message is that ethical behavior should permeate every aspect of the organization, rather than resting on one individual or department.
In our upcoming articles, we will delve deeper into the various roles within an organization and further elaborate on the model outlined above. For instance, how do you initiate and navigate the conversation around ethics from different roles?



