Why is CRISP-DM dead?
Twenty-five years ago, five major companies, namely ISL, Teradata, Daimler AG, NCR Corporation and OHRA, conceived CRISP-DM as a standard process model describing the step-by-step approach used by data mining experts. The model has served data science practitioners and industry experts well for a long time. Twenty-five years on, the analytical process has undergone a sea change, with deep learning and artificial intelligence the biggest disruptors in the field. Is CRISP-DM still the gold standard in this complex, dynamic and ever-changing analytics landscape? Why are companies abandoning CRISP-DM and adopting newer frameworks?
What is CRISP-DM?
CRISP-DM breaks down the process of data analysis into six major phases:
- Business Understanding
- Data Understanding
- Data Preparation
- Modelling
- Evaluation
- Deployment
Existing drawbacks in CRISP-DM:
- Lack of human context: Stakeholder identification is an important first step in any analytics process model, and it covers both internal and external stakeholders. Internal stakeholders include the project manager, solution architect, data scientist, data engineer and subject matter experts (SMEs). External stakeholders include the sales team, pricing team, marketing team, database team and third-party vendors for data collection. The next step is to understand the pain points of every stakeholder. Real-world projects typically have multiple stakeholders and multiple end users, yet CRISP-DM fails to take the “human context” of these stakeholders into account. Defining that human context is critical to the successful execution of any project. The CRISP-DM methodology is essentially a process-driven framework, not a human-centric one.
Further, CRISP-DM does not suggest practical methods or ideas for eliciting the end user’s pain points, and so shows a lack of empathy with the end user.
Stakeholder identification, pain point identification and stakeholder mapping are critical elements that CRISP-DM overlooks in the Business Understanding stage. CRISP-DM can therefore be used for a single input single output (SISO) engagement. However, for a nonlinear engagement involving single input multiple output (SIMO), multiple input single output (MISO) or multiple input multiple output (MIMO), CRISP-DM falls apart.
This is followed by understanding the existing workflow.
Often the client does not give the data science team an objective and instead expects the team to frame one. This is a tedious process that involves SMEs and engagement with multiple stakeholders, and it requires a thorough knowledge of the market. It becomes the obligation of the data science team to identify the pain points of multiple stakeholders and frame a business objective. In this no-objective scenario, SMEs and market experts need to be involved to come up with business objectives that drive business KPIs. CRISP-DM does not address these complex scenarios, which is why companies, freelance consultants and even academics are abandoning it.
- Abrupt jump: There is an abrupt jump from the data preparation stage to modelling. An important stage after data preparation is hypothesis framing. The client might propose important business ideas to evaluate, so model building should be preceded by hypothesis framing. These business ideas need to be translated into statistical hypotheses before a machine learning model is built. Modelling does not necessarily solve a business problem.
- Hypothesis testing: This serves as a useful preliminary step before model building, yet it finds no mention in the CRISP-DM methodology. Further, not all business problems need a model building approach. Some business requirements can be met with hypothesis testing and dashboard creation. Building a model may be overkill given the nature of the business requirement.
- Dashboard creation: Visual analytics is a critical element of analytics. It is a strong indicator of past trends and gives early warning signals about future trends and expected outcomes. If you adhere to CRISP-DM, you will miss this step and fail to benefit from the key insights a dashboard provides.
Ideally, a CRISP-DM project should pass through these two important steps, hypothesis testing and dashboard creation, before taking a leap into the world of modelling. As the adage goes, “If all you have is a hammer, every problem looks like a nail.” Not every business challenge needs to be addressed through model building. Many real-world problems can be solved by hypothesis testing or dashboard creation alone.
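As an illustration of translating a business idea into a statistical hypothesis, consider a client who believes a discount lifts weekly sales. The sketch below is only one possible approach, not a prescribed method: it uses a two-sided permutation test (chosen here because it needs nothing beyond the Python standard library), and the function name, the sales figures and the choice of test are all assumptions made for the example.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: both groups come from the same distribution, so the group
    labels are interchangeable. Returns (observed difference, p-value).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical weekly sales per store, without and with the discount.
control = [102, 98, 110, 105, 99, 101, 97, 104]
discount = [111, 108, 115, 109, 117, 112, 106, 113]

diff, p = permutation_test(control, discount)
print(f"mean uplift = {diff:.2f}, p-value = {p:.4f}")
```

If the p-value falls below the significance level agreed with the client, the business idea is supported by the data and may be answerable without building a model at all.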
- Model Deployment: Once a model is built, it must be deployed to a production environment. Every model has an expiry date, after which it becomes defunct. Monitoring model performance by tracking data drift and model drift is therefore essential. A detailed and comprehensive framework that checks model fitness on a regular cadence is imperative.
In summary:
- The CRISP-DM methodology is process driven.
- Stakeholder identification, Stakeholder mapping and Pain point identification are not addressed adequately.
- A detailed and effective strategy for model deployment and model monitoring is absent.
- Abrupt transition from data preparation to model building.
- Hypotheses testing and Dashboard creation can solve many business problems.
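The data drift monitoring mentioned under Model Deployment can be made concrete with a small check that runs on a regular cadence. The sketch below uses the Population Stability Index (PSI), one common drift measure among several. The function name, the equal-width binning, the sample values and the 0.25 alert threshold (a frequently quoted rule of thumb, not a standard) are all assumptions for illustration.

```python
import math


def population_stability_index(expected, actual, n_bins=5, eps=1e-6):
    """PSI between a reference sample and a recent sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the reference range fall in edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical customer ages at training time and in recent production data.
training = [20, 22, 25, 27, 30, 32, 35, 37, 40, 42]
live_shifted = [35, 37, 38, 40, 41, 42, 43, 44, 45, 46]

psi = population_stability_index(training, live_shifted)
ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb cut-off
print(f"PSI = {psi:.2f}, drift alert = {psi > ALERT_THRESHOLD}")
```

The same function could plausibly be applied to the model's prediction scores to watch for model drift, with the alert feeding a retraining decision.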
A new approach that addresses the drawbacks of CRISP-DM and embraces the dynamics of current and future data science processes is the need of the hour. Many companies are abandoning CRISP-DM and embracing newer frameworks. Are you?
Watch this space for further discussion on a process model.
Mithun D J
Senior Manager – Data Science