Machine Learning & Deep Learning Deployment Strategies
Machine learning and deep learning models add value to organizations by providing insights to end-users. Building reliable, agile, and sustainable ML/DL models that streamline and strengthen business planning and operations requires considerable preparation and diligence. Deployment is a crucial step in the ML/DL model lifecycle, starting right from the selection of tools, platforms, and algorithms.
Consumption is the key consideration when building or developing ML/DL models. An ML pipeline automates the ML workflow, carrying data from its raw form to valuable insights. The main goal of creating an ML pipeline is to gain control over the machine learning model as it moves through that workflow.
The deployment of an ML/DL model starts with the model lifecycle.
ML/DL Model Lifecycle
Machine learning/deep learning model building should follow a lifecycle, as it keeps the focus on the consumption of the model, its outcome, and its evaluation for refining the dataset. A high-quality dataset is mandatory for training a high-quality model for deployment.
How to train a model?
The ML/DL model lifecycle is a three-phase process that starts with a business problem: pipeline development, the training phase, and the inference phase. After the business problem is identified, data sources are identified and raw data is gathered from them. The dataset obtained from these sources is then transformed into a clean dataset.
The next step is data validation, which ensures that the data is valid and useful. Then comes feature engineering, a process that transforms raw data into features that support the working of machine learning algorithms. The final step of the ML model lifecycle is model generation, the initial stage of training ML models.
Hyperparameter optimization is performed once model generation is completed, as certain models require fine-tuning. Hyperparameters are tuning parameters that bring the best out of an ML model. Once the hyperparameters are optimized, the model is validated to determine whether it is good or bad.
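As an illustration, here is a minimal hyperparameter-tuning sketch using scikit-learn's GridSearchCV; the library, model, and parameter grid are assumptions for this example, not tools prescribed by the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=42)

# Candidate hyperparameter values to search over (hypothetical grid).
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

# Cross-validated grid search picks the combination with the best score.
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)  # tuned hyperparameters
print(search.best_score_)   # validation score of the best model
```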
Versioning of the Model
After model validation, the model is stored so that versioning can be performed. Why is model versioning important? As more and more data of different frequencies is added on a daily or weekly basis, the model may change, creating variation in its predictions or accuracy. Hence, it is critical to store the various versions of the model.
The best model version is identified from all the versions and added to a model registry. Even if the model is built for a project need or a standalone analysis without any application integration, the various model versions should still be compared, and the best version selected is the one utilized for deployment. Now, the model is ready to be deployed.
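As a hedged sketch of storing and registering model versions, here is one way it could look with MLflow (a tool mentioned later in this article); the model, tracking URI, and registry name are hypothetical.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# The model registry needs a database-backed tracking store.
mlflow.set_tracking_uri("sqlite:///mlflow.db")

X, y = make_classification(random_state=0)
model = LogisticRegression().fit(X, y)

with mlflow.start_run():
    # Each call logs a new version under the same registered name,
    # so competing versions can be compared before one is deployed.
    mlflow.sklearn.log_model(
        model, "model", registered_model_name="churn-classifier"
    )
```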
Model Deployment
Model deployment can be performed using a variety of strategies and tools, such as Python, Java, Django, or Scala. In model deployment, the challenge is not the technicalities but identifying the nuances of the model: the model size, hyperparameter size, the prediction inferences, the time taken to generate inferences, how soon the application requires the inference, how the model is consumed, and the frequency of generating insights.
The key element in model deployment is to understand where the model is going to be consumed: in an application, on a device, or offline. Once the method of deployment is identified, model serving is used. Model serving is a turnkey solution for hosting ML models as REST APIs, i.e. serving the inferences. Once model serving starts, it is important to keep monitoring the results to identify whether they are good or bad, require fine-tuning, or need further validation.
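As a minimal sketch of model serving, the example below hosts a pickled model behind a REST endpoint using Flask; Flask, the route, and the model.pkl file name are assumptions for illustration, not the article's prescribed stack.

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the serialized model once at startup (model.pkl is hypothetical).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[1.0, 2.0, 3.0]]}.
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=5000)
```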
Next comes the model probing or debugging phase. If the predictions are going off, it is important to understand why the model is drifting away from the expected accuracy level or inference-generation level. In this phase, it should be analyzed whether the model requires maintenance or has become outdated, in which case the model is archived.
Different Types of Model Deployment
Model consumption can be divided into two segments: projects and products. A project is an offline analysis in which an ML or DL model generates predictions that can be showcased to clients in the form of a presentation.
Project Pipeline
The model is consumed based on the needs of the end-user, and there are different ways of deploying it. A model can be deployed using PMML (Predictive Model Markup Language), an XML-based language: the model is built or developed in such a way that the entire modeling script can be transferred or converted into PMML. Such a model can be integrated with any platform.
E.g., if a PMML model is made with SAS or R software, it can be imported into Python, and inferences or predictions can still be generated. Since PMML is an XML language, it can be integrated with front-end applications like Java-based applications, and the data can be received from HTTP requests to generate and serve the predictions. PMML is the most popular way of deploying models, but the challenge is that not all kinds of models are supported, because the language is quite heavy.
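In Python, one way to produce a PMML file is the sklearn2pmml package (an assumption for illustration; the article does not prescribe an exporter, and this package also requires a Java runtime at conversion time):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = make_classification(random_state=0)

# Wrap the estimator in a PMMLPipeline so it can be exported.
pipeline = PMMLPipeline([("classifier", LogisticRegression())])
pipeline.fit(X, y)

# Writes an XML (PMML) file that other platforms can import.
sklearn2pmml(pipeline, "model.pmml")
```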
Pickle and Joblib are two other formats used for model deployment, and the ML model can be stored in either of them. Both are serialization formats that convert the ML model into a serialized object, which enables transferring the model as a package. The serialized object is then read back to generate inferences.
Joblib is a better approach than Pickle for large-scale modeling, where a large set of NumPy arrays has to be serialized and deserialized. The Pickle format handles large volumes of arrays poorly because its serialization and deserialization process takes more time.
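A minimal sketch of both serialization formats, assuming a fitted scikit-learn model (the file names are illustrative):

```python
import pickle

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
model = LogisticRegression().fit(X, y)

# Pickle: general-purpose Python serialization.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

# Joblib: same idea, but optimized for objects carrying large NumPy arrays.
joblib.dump(model, "model.joblib")
restored = joblib.load("model.joblib")
```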
Another method for model deployment is the checkpoint, which applies mostly to deep learning models, such as Keras- or TensorFlow-based models. These models can be saved as a checkpoint, and one objective of the checkpoint is that it can be used to distribute the training process. A checkpoint is an object that allows pausing, loading, and resuming training of the model from it. Hence, model storage can also be done using checkpoints.
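A hedged TensorFlow sketch of the checkpoint idea, pausing and later resuming training state (the model architecture here is arbitrary):

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam()

# A Checkpoint object tracks model weights and optimizer state together.
ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer)
manager = tf.train.CheckpointManager(ckpt, directory="./ckpts", max_to_keep=3)

# ... inside a training loop, periodically persist progress ...
save_path = manager.save()

# Later (or on another worker), restore and continue training from it.
ckpt.restore(manager.latest_checkpoint)
```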
The above-mentioned methods or mediums are used in the project pipeline to transfer the ML or DL models and also in the inference generation process.
Product Pipeline
In the case of a product, the ML/DL model has to be integrated into applications, and the product pipeline is different from the project pipeline. In a product pipeline, there are two kinds of predictions: real-time prediction and batch prediction.
Real-time Prediction
When an inference request is raised, it is mandatory to honor that request; fraud predictions, anomaly detection, and other such use cases require on-demand predictions. To generate on-demand or real-time predictions, RESTful APIs have to be used, and REST-based APIs can serve the inferences and predictions.
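Calling such a REST endpoint for an on-demand prediction could look like the following client sketch (the URL and payload follow the hypothetical Flask service sketched earlier):

```python
import requests

# Send one observation and block until the inference comes back.
response = requests.post(
    "http://localhost:5000/predict",
    json={"features": [[0.5, 1.2, -0.3]]},
    timeout=2,  # real-time callers should bound how long they wait
)
print(response.json())  # e.g. {"prediction": [1]}
```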
Batch Prediction
Batch prediction is used to generate predictions for a whole dataset using a single request, then take action on a specific number or percentage of observations. For batch prediction, there are tools such as MLflow, Apache Airflow, and Prefect; MLflow can be used for both real-time and batch predictions.
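A minimal batch-scoring sketch, assuming a joblib-serialized model and a CSV whose columns are exactly the model's features (the file names and the 5% threshold are hypothetical); an orchestrator like Airflow or Prefect would schedule a job like this:

```python
import joblib
import pandas as pd

# Load the model and the full batch of observations in one request.
model = joblib.load("model.joblib")
batch = pd.read_csv("observations.csv")

# Score every row at once, then act on e.g. the top 5% by probability.
batch["score"] = model.predict_proba(batch)[:, 1]
top = batch.nlargest(int(len(batch) * 0.05), "score")

# Persist results to a store (a database or data lake in production).
top.to_csv("predictions.csv", index=False)
```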
Different Deployment Needs
The deployment needs include:
- On-demand (real-time) prediction versus batch prediction. In batch prediction, the data arrives in bulk and generates bulk inferences.
- In batch prediction, any kind of model can be used, no matter whether it takes more or less time to score. In on-demand prediction, heavy models should not be used because they take more time to respond; lighter models should be used instead.
- For on-demand prediction, algorithms have to be classified as highly complex, medium, or fast. Highly complex algorithms take more time to generate inferences, and fast algorithms take less.
Challenges of Runtime Deployment of Models
- Security and authentication layer that verifies the user’s call
- Data validation layer that validates the input data for model prediction (see the validation sketch after this list)
- Latency in scoring and prediction
- Ensuring ease of recovery
- Supporting concurrent usage
- Managing the complexity of the model
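As an example of the data validation layer above, here is a sketch using pydantic (an assumption; any schema validator works) that rejects malformed inputs before they reach the model. The feature schema is hypothetical.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, confloat


class PredictionRequest(BaseModel):
    # Hypothetical schema: two bounded numeric features.
    age: confloat(ge=0, le=120)
    income: confloat(ge=0)


def validate(payload: dict) -> Optional[PredictionRequest]:
    try:
        return PredictionRequest(**payload)
    except ValidationError as err:
        print(f"rejected: {err}")  # refuse to score malformed input
        return None


validate({"age": 34, "income": 52000})   # passes validation
validate({"age": -5, "income": "lots"})  # rejected before model scoring
```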
Requirements of a Batch Mode Model Deployment
- It is preferred when a prediction is required over large quantities of data
- At a given time, the prediction results do not vary a lot
- Batch scoring is a need
- Output is required to be stored in databases or data lakes
Batch mode is used when a large number of inferences have to be generated, as it can be debugged and handled easily, whereas online (real-time) inference is a challenge to debug.
Model Deployment Options
Before deploying a model into production, it is necessary to identify how it has to be deployed: in a static way or a dynamic way. In static deployment, there is no issue. But in dynamic deployment on a server, concurrency and the speed of generating inferences have to be considered. When the model is deployed on a device, however, neither the static nor the dynamic way suits heavy models.
Dynamic Deployment of ML Models
- Deploy model parameters/equations like a static equation (see the sketch after this list)
- Deploy a model as a serialized object using a pickle or joblib object
- Deploy models to the browser as a curl command and pass parameters to generate inferences
- Deploy model to other readable systems using common standards like PMML
- Deploy models as a Web Service (Python REST API)
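The first option, shipping the fitted parameters as a static equation, can be as simple as the following sketch (a linear model is assumed, and the feature count is illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_features=2, random_state=0)
model = LinearRegression().fit(X, y)

# Extract the fitted parameters once...
coef, intercept = model.coef_, model.intercept_

# ...then the "deployed" model is just the equation, so no ML
# runtime is needed wherever the prediction is computed.
def predict(x1: float, x2: float) -> float:
    return intercept + coef[0] * x1 + coef[1] * x2

print(predict(0.5, -1.2))
```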
Deployment Basics
Checklist to Be Managed for the Model Deployment
- A working input-output definition
- Confidence score that defines the model standard
- Performance metric for the model
- Model monitoring limits
- Versioning should be decided
- Defined version metadata
- Caching needs for heavy models (see the caching sketch below)
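On the caching point, a simple sketch using Python's functools.lru_cache, so that repeated identical requests to a heavy model skip recomputation (slow_predict is a hypothetical stand-in for a real model call):

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def slow_predict(features: tuple) -> float:
    # Stand-in for an expensive model inference; inputs must be
    # hashable (tuples, not lists) to serve as cache keys.
    return sum(features) * 0.42  # hypothetical scoring logic

slow_predict((1.0, 2.0, 3.0))  # computed
slow_predict((1.0, 2.0, 3.0))  # served from the cache
```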
DL Model Deployment Challenges
- Longer training cycle
- Large number of parameters
- Large model object size
- Latency in inference generation
- Difficult to deploy on devices
- Collaborative training required
Summing Up
Machine learning and deep learning model deployment is the integration of an ML/DL model into an existing production environment, enabling data-driven business decisions. Effective ML/DL model deployment is the most cumbersome process in the machine learning lifecycle.