machine learning operations (MLOps)
What is machine learning operations (MLOps)?
Machine learning operations (MLOps) is the development and use of machine learning models by development operations (DevOps) teams. MLOps adds discipline to the development and deployment of machine learning models, making the development process more reliable and productive.
MLOps encompasses a set of processes that machine learning developers use to build, deploy, and continuously monitor and train their models. It's at the heart of machine learning engineering, and it blends artificial intelligence (AI) and machine learning techniques with DevOps and data engineering practices.
There are many steps needed before an ML model is ready for production, and several players are involved. The MLOps development philosophy is relevant to IT pros who develop ML models, deploy the models and manage the infrastructure that supports them. Producing iterations of ML models requires collaboration and skill sets from multiple IT groups, such as data science teams, software engineers and ML engineers.
Development of deep learning and other ML models is considered inherently experimental, and failures are often part of the process in real-world use cases. The discipline is still evolving, and it's understood that sometimes even a successful ML model might not function the same way from one day to the next.
How MLOps works
MLOps implements the machine learning lifecycle. These are the stages that an ML model must undergo to become production-ready. The following are the four cycles that make up the ML lifecycle:
- Data cycle. The data cycle entails gathering and preparing data for training. First, raw data is culled from appropriate sources, then techniques such as feature engineering are used to transform, manipulate and organize raw data into labeled data that's ready for model training.
- Model cycle. This cycle is where the model is trained with this data. Once a model is trained, tracking future versions of it as it moves through the rest of the lifecycle is important. Certain tools, such as the open source tool MLflow, can be used to simplify this.
- Development cycle. Here the model is further developed, tested and validated so that it can be deployed to a production environment. Deployment can be automated using continuous integration/continuous delivery pipelines that reduce the number of manual tasks.
- Operations cycle. The operations cycle is a monitoring process that ensures the production model continues working and is retrained to improve performance over time. MLOps can automatically retrain an ML model either on a set schedule or when triggered by an event, such as a model performance metric falling below a certain threshold.
Why is MLOps necessary?
Machine learning models aren't built once and then forgotten; they require continuous training so that they improve over time. That's where MLOps comes in. It provides the ongoing training and constant monitoring needed to ensure ML models operate successfully.
MLOps documents reliable processes and creates safeguards to consistently mitigate failures and reduce development time, creating better models. MLOps uses repeatable processes in the same way businesses use workflows for organization and consistency. In addition, MLOps automation ensures time isn't wasted on tasks that are repeated each time new models are built.
What are the benefits of MLOps?
MLOps provides a range of benefits, such as the following:
- Speed and efficiency. MLOps automates many of the repetitive tasks in ML development and within the ML pipeline, such as the initial data preparation procedures. This approach reduces development time and cuts down on human-induced errors in the models.
- Scalability. ML models often must be scaled to handle increased workloads, larger data sets and new features. To provide scalability, MLOps uses technology such as containerized software and data pipelines that can handle large amounts of data efficiently.
- Reliability. MLOps model testing and validation fix problems in the development phase, increasing reliability early on. Operations processes also ensure models comply with policies that an organization has in place. This reduces risks such as data drift, in which the accuracy of a model deteriorates over time because the data it was trained on has changed significantly.
MLOps challenges
MLOps might be far more streamlined and efficient than traditional approaches, but it's not without its challenges. They include the following:
- Staffing. The same data scientists responsible for developing ML algorithms might not be the most effective at deploying them. They also might not be best equipped to explain how to use the algorithms to software developers. Some of the best MLOps teams embrace the idea of cognitive diversity -- the inclusion of people who have different approaches to problem-solving and offer unique perspectives because they think differently.
- Costliness. MLOps can be costly, given the need to build an infrastructure that encompasses many new tools and the resources required for data analysis as well as model and employee training. This is especially true of large-scale machine learning projects with lots of dependencies and feedback loops. It is important for an organization interested in these projects to assess whether MLOps is the best approach.
- Imperfect processes. While MLOps processes are designed to reduce errors, some mistakes still occur and require human intervention.
- Cyber attacks. Malicious actors are a threat given the large amount of data that MLOps infrastructures store and process. Cybersecurity is required to minimize the risk of data breaches or leaks.
MLOps vs. DevOps
The most obvious similarity between DevOps and MLOps is the emphasis on streamlining design and production processes. However, the clearest difference between the two is that DevOps is focused on meeting software vendors' business goals by producing the most up-to-date versions of software applications for customers as quickly as possible. MLOps is instead focused on surmounting the challenges that are unique to machine learning to produce, optimize and sustain a model.
DevOps typically involves development teams that program, test and deploy software apps into production. MLOps means to do the same with ML models, but with a handful of additional phases. These include extracting raw data for analysis, data preparation, model training, evaluating model performance, and finally, continuous monitoring and training.
Standard practices for MLOps
There are many useful practices to which MLOps teams adhere. The following strategies can help guide a successful ML project to completion and reduce its likelihood of failure:
- An application programming interface from an existing AI service can be used to simplify or expedite MLOps in various ways. For example, this approach is used to retrieve data from external data sources or for automated testing of ML models.
- MLOps professionals often run parallel model development processes so that if one model fails, they still have others in progress.
- Pre-trained models are used to show proof of concept.
- Generalized algorithms showing some success are further trained for a specific task. For example, a logistic regression algorithm can be trained to predict the likelihood of future events.
- Publicly available data sources are used to bridge gaps in model training data, provide new data and prevent model drift.
There are four different types of ML training approaches. Supervised machine learning is the most common, but there is also unsupervised learning, semi-supervised learning and reinforcement learning. Learn about the seven steps involved in machine learning training.