This may not be the first time you have heard about MLOps. The ideas behind this agile approach go back to the famous 2015 paper by Google researchers, “Hidden Technical Debt in Machine Learning Systems”, and it has since been at the center of interest in Machine Learning engineering. At ML Architects Basel, we conceptualized and developed a simplified and industrialized MLOps approach that enables its adoption even in non-high-tech companies and environments. We call this “Effective MLOps”.

In this article, we analyze the background and the fundamental elements required to understand, adopt and master the MLOps approach, and we introduce our definition of an Effective MLOps framework. This framework addresses the key principles, workflows, activities and artefacts of MLOps and identifies its best practices and techniques.

Machine Learning (ML) projects are complex and require knowledge that is both deep and cross-functional. They are usually hard to maintain and update. On top of the complexity of traditional software, ML projects deal with two extra layers: data and model. The fast emergence of AI and ML in mainstream businesses exposes how hard it is to build and maintain an ML-driven application. In the real world, many companies compete to deliver the most reliable and efficient product quickly. In such an environment, we should “act as fast as possible, but as slowly as necessary”.

Machine Learning Challenges

Cross-functional teams composed of data engineers, data scientists and software engineers, amongst others, work on different aspects of an ML project to design, build, deploy and maintain an ML application. ML software production processes require different tools and workflows, and they are complex and hard to predict, test, explain and improve. Moreover, data is often spread across multiple systems and architectures and is far from ready for AI/ML processing. On-premises infrastructures also usually lack the scalability and flexibility that such a complex project needs. Developing ML is therefore hard, but operationalizing ML is even harder.

In recent years, the DevOps approach has become vital in software engineering. It aims to extend agile principles and improve collaboration between development and operations teams. MLOps, which is becoming increasingly popular as shown in figure 1, is a similar approach that brings the DevOps principles to AI and ML projects.

Like DevOps, MLOps is a Machine Learning (ML) engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops).

Google, 2020

While MLOps inherits the challenges of DevOps for traditional software, the complexity of Machine Learning algorithms and the high dimensionality of data add a further layer of complexity. This layer is difficult to handle and requires a unified approach to establish practices and address cultural challenges. The trade-off between coordinating a cross-functional team and building, improving and delivering ML-based software is daily business for many organizations pursuing a digital transformation and ML-driven software.

All of the above leads us to believe that organizations need an effective MLOps framework that is accessible and applicable to "normal" environments, i.e. companies that are not leading high-tech firms like Google.

Figure 1 - Evolution of interest in MLOps

In other words, what is lacking is an agile ML ecosystem that provides an end-to-end, unified process to design, implement, deploy and monitor ML applications. MLOps therefore needs to be «industrialized» and «democratized», i.e. made accessible to average organizations as an engineering discipline. That’s what Effective MLOps is meant to do.

Effective MLOps by MLAB: Democratizing MLOps

At ML Architects Basel (MLAB), we strongly believe that a Digital Transformation and/or an Agile transformation requires a holistic approach that covers Technology, Operating Model and Culture, as presented in figure 2, and leverages new-generation IT capabilities. We consider the MLOps approach a key pillar of a Digital Highway for continuous ML delivery. We have therefore defined our own approach and developed a best-practices model to «industrialize» the MLOps concept, named “The Effective MLOps Framework”.

Figure 2 - 3 Pillars of Effective MLOps

For us, democratizing MLOps is about making it accessible to average organizations by providing:

  • A clear and structured set of activities supported by customizable learning modules and an approach leveraging market best practices.
  • A systematic approach to reduce complexity and increase efficiency by leveraging DataOps & AI/ML, culture & skills and Continuous Delivery & Site Reliability Engineering (SRE) capabilities.

Effective MLOps is an engineering approach that unifies operating model, technologies and culture. It aims to facilitate adoption, provide smooth interaction between the different (new and existing) roles involved in ML projects, and automate safe increments to continuously deliver reliable, high-quality ML systems.

By Effective MLOps Engineering we mean the application of a systematic, disciplined and holistic approach to the cost-effective development and operation of ML Systems in the context of changing business and data landscapes.

Based on that, the Effective MLOps approach we propose covers the three key pillars of Technology, Operating model and Culture. Technically, this translates to:

  • Designing and building the ML Pipeline to manage Code, Model and Data Changes by offering objective and up-to-date benchmarks of both established market solutions (e.g., Gartner Magic Quadrant for Data Science and Machine Learning) and new, innovative (next generation) tools.
  • Designing and deploying the MLOps culture, as well as the operating model and skills required to build and run ML Systems efficiently.
  • Maintaining and operating the continuous delivery pipeline and the SRE cockpit to enable continuous monitoring and release-management of the ML Systems by providing maturity assessments and roadmaps to manage governance, processes and tools.

Figure 3 - Effective MLOps Scope
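
To make the first point more concrete, here is a minimal Python sketch of how a pipeline could detect which of the three artefacts (code, data, model) changed between two runs. It is an illustration under our own assumptions, not part of the framework itself: the names `fingerprint`, `PipelineState` and `changed_artefacts` are hypothetical, and the byte strings stand in for a real repository, dataset and serialized model.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: bytes) -> str:
    """Return a short, stable SHA-256 fingerprint of raw bytes."""
    return hashlib.sha256(content).hexdigest()[:12]


@dataclass(frozen=True)
class PipelineState:
    """Fingerprints of the three artefacts an ML pipeline has to track."""
    code: str
    data: str
    model: str


def changed_artefacts(previous: PipelineState, current: PipelineState) -> list[str]:
    """List which of code, data and model differ between two pipeline runs."""
    return [
        name
        for name in ("code", "data", "model")
        if getattr(previous, name) != getattr(current, name)
    ]


# Hypothetical inputs: in practice these bytes would come from the repository,
# the training dataset and the serialized model.
last_run = PipelineState(
    code=fingerprint(b"def train(): ..."),
    data=fingerprint(b"age,income\n42,5000\n"),
    model=fingerprint(b"weights-v1"),
)
this_run = PipelineState(
    code=last_run.code,
    data=fingerprint(b"age,income\n42,5000\n37,6100\n"),
    model=last_run.model,
)

print(changed_artefacts(last_run, this_run))  # ['data']
```

In this sketch only the data fingerprint differs, so a pipeline could decide to retrain and re-validate the model without rebuilding the code.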

The Effective MLOps framework helps organizations to design, build and enable their ML systems and operating models to continuously deliver reliable Machine Learning systems. We think it is important to take best practices into account for the architecture, the implementation and the operations.

In other words, we believe that the key principles for Effective MLOps are:

  • Data, model and code pipelines driven by reliability
  • Continuous learning and, if needed, online, real-time predictions
  • Error budgeting and service level objective (SLO) engineering
  • Cross-functional collaboration between teams
  • Adoption and extension of the DevOps culture and values to the ML domain
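
As an illustration of the error budgeting principle, the following Python sketch computes the error budget implied by an SLO and how much of it is left. The function names and the figures (a 99% SLO over 100,000 prediction requests) are assumptions chosen for the example, and what counts as a "failed" request (an error, a timeout, a prediction outside a latency target) has to be defined per service.

```python
def error_budget(slo_target: float, total_requests: int) -> int:
    """Number of failed requests the SLO tolerates within the window."""
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        raise ValueError("this SLO leaves no error budget for the given window")
    return 1.0 - failed_requests / budget


# Assumed example: a 99% SLO over 100,000 prediction requests.
print(error_budget(0.99, 100_000))           # 1000
print(budget_remaining(0.99, 100_000, 250))  # 0.75
```

A team could, for instance, pause risky model releases once the remaining budget approaches zero and resume them when the next window starts.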

The Effective MLOps approach we propose at MLAB contributes significantly to enabling what are called the 4 Cs of MLOps: Continuous Integration (CI), Continuous Delivery (CD), Continuous Monitoring (CM) and Continuous Training (CT).
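
To show how two of these Cs can interact, here is a small Python sketch in which Continuous Monitoring feeds a Continuous Training decision. It is a simplified illustration under our own assumptions: the class `AccuracyMonitor`, the window size and the accuracy threshold are hypothetical, and a real system would typically also track data drift and latency.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions (Continuous Monitoring)."""

    def __init__(self, window_size: int, min_accuracy: float) -> None:
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, predicted: int, actual: int) -> None:
        """Store whether one prediction matched its ground-truth label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self) -> float:
        """Share of correct predictions in the current window."""
        if not self.outcomes:
            raise ValueError("no predictions recorded yet")
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """Signal Continuous Training once a full window falls below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Assumed example: four labelled predictions, two of them wrong.
monitor = AccuracyMonitor(window_size=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

print(monitor.accuracy())          # 0.5
print(monitor.needs_retraining())  # True
```

Waiting for a full window before signalling is a deliberate choice in this sketch: it avoids triggering a retraining run on the first few noisy observations.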

Now that you are aware of the Effective MLOps scope and framework, the principles behind it and how it can be implemented in your organization, we would also like to share some of the best practices that we strongly believe are essential to adopting MLOps:

  • Establish unified model development and data exploration
  • Adopt continuous delivery for ML code, model and data pipelines
  • Leverage unified monitoring, observability and AIOps for ML model and system
  • Define data engineering roles and workflows
  • Define model development roles and workflows
  • Define DevOps/SRE roles and workflows
  • Work on SRE, ML and data science skills development
  • Adopt technical and operating model retrospectives
  • Establish continuous culture sessions

Figure 4 - Effective MLOps Big Picture

The figure above summarizes the preceding sections and gives a big-picture perspective on Effective MLOps, including its mission, its key principles and the best practices (in terms of Technologies, Operating Model and Culture) needed for a successful adoption.

In the next blog post, we will dive deeper into the technical details of our Effective MLOps approach. We hope you enjoyed this piece. Stay tuned for the next one!

Footnotes and References

  • MLOps: Continuous delivery and automation pipelines in machine learning
  • AWS MLOps Framework Implementation Guide
  • Delivering on the Vision of MLOps: A maturity-based approach
  • Machine Learning Ops (MLOps) in 2021: In-depth Guide
  • Machine Learning Lens, AWS Well-Architected Framework
  • GigaOm: Delivering on the Vision of MLOps
  • Continuous Delivery for Machine Learning: Automating the end-to-end lifecycle of Machine Learning applications