Building an effective machine learning pipeline is an essential part of the end-to-end MLOps lifecycle. It builds on the concept of reliable data pipelines created by applying DataOps, as explained in our previous blog post. A machine learning pipeline is a set of instructions that manages the flow of data into and out of ML models. It encompasses data inputs, features, model parameters, the machine learning model, and prediction outputs. Furthermore, it requires close collaboration between data scientists, data engineers, DevOps teams, and business stakeholders. The pipeline should be scalable, reliable, and reproducible to ensure that the machine learning models can be deployed and updated easily. A well-designed pipeline also improves the efficiency and productivity of the machine learning development process and enables teams to iterate quickly and experiment with new models. To build an effective machine learning pipeline, it is important to select appropriate tools, establish clear processes, and continuously monitor the pipeline's performance following the concepts of Effective SRE and Observability. For an in-depth description of these concepts, you can check our previous blog post here.

In this blog post, we will explore the crucial role of engineering and operating models in developing successful machine learning pipelines. We will also go through the important steps in the model creation workflow and address the topic of assessing model pipeline quality. Afterward, we will define the concept of model serving and explain why it is important in a model pipeline. Moreover, this blog post includes a real-world example of a video streaming service that uses machine learning to personalize content recommendations for its users. Most importantly, we emphasize the importance of integrating SRE considerations into the pipeline to ensure reliability, scalability, and efficiency.

Striking a Balance: The Crucial Role of Engineering and Operating Model

Although engineering is an essential component of building effective machine learning pipelines, it is important to emphasize that a robust operating model is equally critical. Engineering lays the groundwork for building a functional and efficient pipeline, but without a well-designed operating model, it can be challenging to execute, maintain, and scale the pipeline effectively.

Having an appropriate operating model is crucial for building ML model pipelines because it assists in establishing a framework for organizing and executing the various stages of the pipeline. The operating model defines the roles and responsibilities of the team members involved in the pipeline, such as data scientists, data engineers, DevOps engineers, and business stakeholders. It also outlines the processes and workflows for data preparation, model training, testing, validation, deployment, and monitoring.

The absence of a well-defined operating model often leads to siloed teams that work in isolation and do not benefit from the expertise of other teams. A typical symptom is that siloed teams struggle to address intricate challenges that require input from several teams or departments, because they lack a comprehensive understanding of the problem or access to all pertinent information. Such a situation impedes creativity and innovation by limiting individuals' exposure to fresh ideas and approaches. Another consequence of an ill-defined operating model is the concentration of knowledge and competencies within a few individuals or a single role. This creates a single point of failure for the organization, which puts sustainable processes at risk as dependencies and complexity grow: if those individuals or that role become unavailable, the organization may no longer be able to perform the tasks or processes that rely on their specialized knowledge. Overall, both scenarios result in missed opportunities, redundancy, and a lack of innovation.

To break down silos and improve collaboration, organizations should implement an operating model that includes:

  • Cross-functional teams:
    Cross-functional teams include members from all roles along the value stream. The members of the team work together seamlessly, and this helps to break down the silos and promotes collaboration across departments.
  • Shared goals and transparency:
    The organization establishes shared goals and metrics that align with its overall strategy. This helps to ensure that all teams are working towards a common objective.
  • Standardized processes and tools:
    The organization standardizes processes and tools across teams to improve efficiency and facilitate communication. For example, all teams use the same project management software, collaboration tools, and reporting templates.
  • Regular feedback:
    The organization establishes communication channels and feedback mechanisms to ensure that teams stay aligned. This includes regular team meetings, status updates, and performance reviews.

Both engineering and operating models are essential to building effective machine learning pipelines. Engineering provides the technical foundation for the pipeline, while the operating model defines the processes and workflows for executing the pipeline efficiently and collaboratively. Together, they help to ensure that the pipeline is accurate, reliable, and scalable, and that it improves over time.

Model Creation Workflow

Assuming that we have established a data pipeline workflow, for example as described in the previous blog post, we have all the relevant data stored in an appropriate format in a Data Warehouse, Data Lake, or some other data product format, ready to be consumed by our model. The next steps of the ML pipeline workflow are the following:

  • Data Collection:
    The first step in the ML pipeline workflow is to collect the relevant data needed for training the model. This can involve sourcing data from different sources and consolidating it into a single and more coherent dataset.
  • Data Preprocessing and/or Feature Engineering:
    Once we have loaded all the relevant data for the model, the data is subjected to further transformations that are specific to the model. Moreover, features are selected and standardized as required.
  • Model Training:
    Before training an ML model, it is important to define its architecture and hyperparameters, because they can significantly impact the model’s performance. Afterward, we can proceed with training the ML model using the relevant pre-processed data from the previous step.

    At this point, we need to clarify that the term "trained model" refers to a specific instance of the configured model, which has been trained on specific data at a specific time. In other words, a trained model is not the same as the definition of the model’s architecture, which determines the choice of the algorithm that we use to answer the given question or problem (e.g., Neural Networks, Isolation Forest, XGBoost, etc.).

    Due to the stochastic nature of ML models, training involves a certain degree of randomness, e.g., the random initialization of model parameters, the random sampling of training data, and the use of stochastic optimization algorithms. Hence, training the same model with the same configuration on the same data does not necessarily result in identical trained models, and this non-reproducibility is something that we want to avoid as much as possible. For example, it is common to set random seeds for the random number generators used in various parts of the training procedure, so that the results can be reproduced across different runs.
  • Model Tuning:
    If the model's performance is not satisfactory, we can try to tune the hyperparameters or modify the architecture to improve it. As we explained above, a poorly designed model architecture or a poor combination of hyperparameters can lead to overfitting or underfitting, which prevents the model from performing well on new data. Therefore, we should repeat the preceding steps (data collection, preprocessing, and training) until we achieve the required results.
  • Model Evaluation:
    Before deployment, it's essential to evaluate the model's performance on a test set to ensure that it's accurate and generalizes well to new data. We will elaborate more on this step in the following section.
  • Model Deployment:
    Once the data has been preprocessed, and the model has been trained and validated, the next step is to deploy the model into a production environment where it can be used by the end users.
Figure 1: The workflow of a manual process for building and deploying ML models. The final model is served as a prediction service. [Image inspired by: MLOps Continuous Delivery and Automation Pipelines in ML.]
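
To make the role of random seeds in the training step more concrete, here is a minimal sketch that uses only the Python standard library. The toy dataset, the linear model, and the function names (make_dataset, train, mse) are our own illustrative assumptions and not part of any specific framework. The sketch trains the same configuration several times and evaluates one of the results on a held-out test set:

```python
import random


def make_dataset(n, seed):
    """Generate a toy dataset y = 3x + 1 + noise (purely illustrative)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def train(data, seed, epochs=50, lr=0.1):
    """Fit y = w*x + b with stochastic gradient descent.

    Randomness enters in two places, both driven by `seed`:
    the initial parameters and the order in which samples are visited.
    """
    rng = random.Random(seed)
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mse(model, data):
    """Mean squared error of a (w, b) model on a dataset."""
    w, b = model
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


data = make_dataset(200, seed=0)
train_set, test_set = data[:160], data[160:]

model_a = train(train_set, seed=42)
model_b = train(train_set, seed=42)  # same seed as model_a
model_c = train(train_set, seed=7)   # different seed

print("same seed, identical model:", model_a == model_b)
print("different seed, identical model:", model_a == model_c)
print("test MSE:", round(mse(model_a, test_set), 4))
```

In real projects, each library that consumes randomness (for example the data loader and the ML framework) typically needs its own seed, so a single seed is rarely sufficient on its own.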

It's worth noting that these steps are often iterative, and we need to revisit some of them multiple times to optimize the model's performance, as you can see in Figure 1. However, to advance the model pipeline and make it reliable, we should apply the concepts of DevOps and SRE by implementing the following steps (see also Figure 2):

  • Continuous Integration:
    Continuous ML model integration is the practice of regularly integrating new or updated models into a development system. By continuously integrating new models into a system, organizations can keep their models up to date with the latest data and insights, and improve the accuracy and effectiveness of their predictions. During this process, we also run unit and integration tests to ensure that the model code is working as expected. The output is typically an artifact, i.e., a packaged model, that contains the code, test results, and other metadata. Once this Continuous Integration step is completed successfully, the output can be used in the Continuous Delivery process, which involves deploying the model to production and making it available to the end users, as we will explain in the next step.
  • Continuous Delivery and Deployment:
    In the Continuous Delivery step, the packaged model is delivered to a pre-production environment, where it undergoes further testing. Then, in the Continuous Deployment phase, the model gets deployed to the production environment, where it becomes available to the end users. The most important point here is that by automating the deployment process for new model versions, we reduce the risk of human error while also ensuring consistency. For example, a production ML pipeline provides a continuous stream of prediction services using newly trained models on fresh data. The step for deploying the trained and validated model as an online prediction service is automated.
  • Continuous Monitoring and Observability:
    After the model is deployed to the production environment, we need to apply Continuous Monitoring and alerting for critical components of the model pipeline. This way, we can detect and respond to crucial failures or threats in real-time, as well as ensure that the model is performing well by providing accurate predictions. For the concept of Observability, please have a look at our previous blog post here.
  • Continuous Training:
    Continuous model training refers to continuously training machine learning models using the latest data available. By continuously training models, organizations can ensure that their models are always up-to-date with the latest information and can provide accurate and effective predictions in real time. As we mentioned above, in Continuous Monitoring we track the model’s outputs and overall performance in the production environment. Hence, when performance degrades, the model can be retrained to improve accuracy and reliability.
  • Automation and Testing:
    Set up automated testing for the entire pipeline, including unit testing, integration testing, and performance testing. This will help ensure that the model pipeline is functioning as expected. Here we should highlight that we need to test not only the model or the neural network that we built but also the whole model pipeline.
  • Version Control:
    Use version control to track changes made over time to your code, data, metadata, ML model, hyperparameters, or other artifacts, making it easier to reproduce results and troubleshoot errors. In addition, by versioning the artifacts, we can easily roll back to previous versions if needed, which provides a safety net in case of failures or mistakes. Versioning also allows multiple team members to work on the same project without overwriting each other’s work and causing inconsistencies.
  • Other SRE Best Practices:
    Follow best practices for SRE, such as designing for failure with rollbacks, setting up scalable infrastructure, implementing security tests/measures, and regularly conducting post-mortems to learn from incidents and improve the pipeline.
Figure 2: The workflow of a model pipeline automation for continuously building, training, and delivering ML models. [Image taken from: MLOps Continuous Delivery and Automation Pipelines in ML.]
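
The link between Continuous Monitoring and Continuous Training described above can be sketched as a simple retraining trigger. The class name RetrainingTrigger, the window size, and the accuracy threshold are hypothetical choices for illustration; a production setup would usually rely on the metrics and alerting stack that is already in place:

```python
from collections import deque


class RetrainingTrigger:
    """Signal retraining when rolling accuracy drops below a threshold."""

    def __init__(self, window_size=100, min_accuracy=0.9):
        # Only the most recent `window_size` outcomes are kept.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, prediction, actual):
        """Store whether a production prediction matched the observed outcome."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Accuracy over the current window, or None if nothing was recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self):
        """Decide only once the window is full, to avoid reacting to noise."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.min_accuracy


trigger = RetrainingTrigger(window_size=5, min_accuracy=0.8)
for prediction, actual in [(1, 1), (0, 0), (1, 1), (1, 0), (0, 1)]:
    trigger.record(prediction, actual)

print("rolling accuracy:", trigger.rolling_accuracy())
print("retrain:", trigger.should_retrain())
```

In this sketch, three of the five recorded predictions are correct, so the rolling accuracy falls below the assumed threshold of 0.8 and the trigger fires. In practice, ground-truth labels often arrive with a delay, which is one reason why such triggers are frequently combined with checks on input data drift.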

While releasing the model, you might find that you struggle to reproduce your results and that other team members struggle to keep up with what exactly you have done. This topic of reproducibility and traceability will be covered extensively in an upcoming blog post.
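
As a small preview of the versioning idea from the Version Control step, the following sketch records what went into a training run in a manifest and derives a deterministic identifier from it. The function names (fingerprint, build_manifest) and the manifest fields are assumptions made for this example; dedicated experiment-tracking and data-versioning tools provide much richer functionality:

```python
import hashlib
import json


def fingerprint(obj):
    """Return a short, deterministic SHA-256 hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(code_version, hyperparameters, training_rows, seed):
    """Describe a training run so that it can be identified and reproduced."""
    manifest = {
        "code_version": code_version,
        "hyperparameters": hyperparameters,
        "data_fingerprint": fingerprint(training_rows),
        "seed": seed,
    }
    # The run id is derived from everything above, so any change yields a new id.
    manifest["run_id"] = fingerprint(manifest)
    return manifest


rows = [[0.1, 1], [0.4, 0], [0.9, 1]]  # hypothetical training data
run_1 = build_manifest("abc123", {"lr": 0.1, "depth": 3}, rows, seed=42)
run_2 = build_manifest("abc123", {"lr": 0.2, "depth": 3}, rows, seed=42)

print(run_1["run_id"], run_2["run_id"])
print("same run:", run_1["run_id"] == run_2["run_id"])
```

Because the identifier depends on the code version, hyperparameters, data, and seed, two team members can quickly check whether they are talking about the same run.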

Assessing Model Pipeline Quality: A Comparison of Model Evaluation and Automated Testing Approaches

Offline evaluation is a critical step in assessing the performance of a machine learning model before it is launched into production. First, establish a baseline as a reference point. Then run the required sanity checks on your model. In particular:

  • perform perturbation tests by introducing small changes to your input to evaluate the model’s sensitivity,
  • perform invariance tests to check if your model generalizes well,
  • perform slice-based analysis by checking how your model performs on different subsets of the input data (e.g., critical subsets for which we have defined an expected model behavior; depending on the business requirements, we might or might not want the model to perform differently across these subsets),
  • and finally check the model’s bias and calibration.
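
The first three checks in the list above can be sketched in a few lines of Python. The scoring rule inside predict, the feature names, and the example rows are hypothetical stand-ins for a real trained model and dataset:

```python
def predict(features):
    """Hypothetical stand-in model: flags a transaction as risky (1) or not (0)."""
    score = 0.02 * features["amount"] + 0.5 * features["failed_logins"]
    return 1 if score >= 2.0 else 0


def perturbation_stability(model, rows, feature, delta):
    """Fraction of rows whose prediction is unchanged after a small nudge."""
    unchanged = 0
    for row in rows:
        nudged = dict(row, **{feature: row[feature] + delta})
        unchanged += model(row) == model(nudged)
    return unchanged / len(rows)


def is_invariant(model, rows, feature, replacement):
    """True if replacing a feature that should not matter never changes the output."""
    return all(
        model(row) == model(dict(row, **{feature: replacement})) for row in rows
    )


def accuracy_by_slice(model, rows, labels, slice_key):
    """Accuracy computed separately for each value of `slice_key`."""
    totals, correct = {}, {}
    for row, label in zip(rows, labels):
        key = row[slice_key]
        totals[key] = totals.get(key, 0) + 1
        correct[key] = correct.get(key, 0) + (model(row) == label)
    return {key: correct[key] / totals[key] for key in totals}


rows = [
    {"amount": 20.0, "failed_logins": 0, "country": "CH", "session_id": "a1"},
    {"amount": 150.0, "failed_logins": 1, "country": "CH", "session_id": "b2"},
    {"amount": 60.0, "failed_logins": 2, "country": "DE", "session_id": "c3"},
    {"amount": 10.0, "failed_logins": 1, "country": "DE", "session_id": "d4"},
]
labels = [0, 1, 0, 0]

print("stability:", perturbation_stability(predict, rows, "amount", delta=1.0))
print("invariant to session_id:", is_invariant(predict, rows, "session_id", "zz"))
print("accuracy by country:", accuracy_by_slice(predict, rows, labels, "country"))
```

In this toy data, the overall accuracy hides the fact that the model performs worse on one country than on the other, which is exactly the kind of finding that slice-based analysis is meant to surface.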

Automated testing in the end-to-end model pipeline:

  • Data validation tests: Ensure that the input data is valid, complete, and consistent. For example, check for missing or duplicate values, outliers, or data format issues.
  • Unit tests: Test the individual components of the machine learning pipeline.
  • Integration tests: Test the interactions between the different components of the pipeline to ensure that they are integrated correctly.
  • Performance tests: Test the performance of the model under different conditions, such as different batch sizes, data volumes, and processing times. With these tests, we can identify potential performance bottlenecks and optimize the pipeline for speed.
  • Model evaluation tests: Test the accuracy and robustness of the trained model by evaluating its performance on a validation dataset or through cross-validation.
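
As an illustration of the data validation tests listed above, the following sketch checks a batch of input rows for missing values, duplicate keys, non-numeric entries, and out-of-range values. The function validate_rows, the field names, and the allowed ranges are assumptions made for this example:

```python
def validate_rows(rows, required_fields, key_field, numeric_ranges):
    """Return a list of human-readable issues; an empty list means the data passed."""
    issues = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # Completeness: every required field must be present and not None.
        for field in required_fields:
            if row.get(field) is None:
                issues.append(f"row {index}: missing value for '{field}'")
        # Uniqueness: the key field must not repeat.
        key = row.get(key_field)
        if key is not None:
            if key in seen_keys:
                issues.append(f"row {index}: duplicate {key_field} '{key}'")
            seen_keys.add(key)
        # Format and range: numeric fields must be numbers within bounds.
        for field, (low, high) in numeric_ranges.items():
            value = row.get(field)
            if value is None:
                continue
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                issues.append(f"row {index}: '{field}' is not numeric")
            elif not low <= value <= high:
                issues.append(f"row {index}: '{field}'={value} outside [{low}, {high}]")
    return issues


rows = [
    {"user_id": "u1", "age": 34, "watch_minutes": 120.0},
    {"user_id": "u2", "age": None, "watch_minutes": 45.5},
    {"user_id": "u1", "age": 29, "watch_minutes": 30.0},
    {"user_id": "u3", "age": 41, "watch_minutes": -5.0},
    {"user_id": "u4", "age": "unknown", "watch_minutes": 10.0},
]

issues = validate_rows(
    rows,
    required_fields=["user_id", "age", "watch_minutes"],
    key_field="user_id",
    numeric_ranges={"age": (0, 120), "watch_minutes": (0, 1440)},
)
for issue in issues:
    print(issue)
```

A check like this can run as an automated gate at the start of the pipeline, so that invalid batches are rejected before they reach training or inference.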

Model Serving

ML model serving refers to the process of deploying trained machine learning models into production for real-world use. Once a machine learning model is trained on a training dataset and evaluated on a test dataset, it needs to be made available to the end users or applications that interact with it to make predictions or decisions based on the model's outputs. In the context of ML model serving, we utilize the concept of containerization, because it simplifies the deployment process by packaging the models and their dependencies into a container that can be easily deployed to a production environment. This makes it easier not only to ensure consistency and reproducibility across different environments but also to reduce the risk of dependency conflicts.

First, it's important to ensure that your ML model serving infrastructure is scalable, reliable, and secure. Scalability means that your infrastructure can handle a growing number of requests as more and more users interact with the deployed model. Reliability means that your infrastructure should be available and responsive even under high loads, while security means that you must protect your models and data from unauthorized access, as well as from the abuse of the system.

Second, you need to consider the latency, throughput, and other performance requirements of your ML model serving system. Latency is the round-trip time between initiating a single request and receiving a response, while throughput refers to the total number of requests that your infrastructure handles per unit of time. Balancing these factors is crucial to ensure that your system can handle a high volume of requests without compromising performance.
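
To illustrate how latency and throughput differ, here is a minimal in-process benchmark sketch. The fake_predict function is a hypothetical stand-in for a model call, and the nearest-rank percentile is only one of several common percentile definitions. Note that this sketch sends requests sequentially and does not include any network round trip, so it only approximates what a client of a deployed service would observe:

```python
import math
import time


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def benchmark(handler, requests):
    """Measure per-request latency and overall throughput for `handler`."""
    latencies = []
    started = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        handler(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    latencies.sort()
    return {
        "requests": len(latencies),
        "p50_latency_s": percentile(latencies, 0.50),
        "p95_latency_s": percentile(latencies, 0.95),
        "throughput_rps": len(latencies) / elapsed,
    }


def fake_predict(request):
    """Hypothetical stand-in for a model call; sleeps to mimic inference time."""
    time.sleep(0.001)
    return {"score": 0.5}


report = benchmark(fake_predict, requests=range(50))
for name, value in report.items():
    print(name, value)
```

Reporting percentiles such as p95 rather than only the average is a common choice, because a small share of slow requests can noticeably affect the user experience even when the average latency looks healthy.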

Third, it's important to ensure that your ML models are versioned and that you have a mechanism to deploy new versions of your models seamlessly without impacting end users. You should have a way to monitor the performance of your models and analyze the logs to identify and resolve any issues that may arise. Finally, you need to consider the cost of your ML model serving infrastructure. Running ML models in production can be expensive, so you need to optimize your infrastructure to minimize costs while maintaining high performance and reliability as expected from the business stakeholders.

Overall, ML model serving is a critical component of reliable machine learning pipelines and requires careful consideration of the aforementioned factors, i.e., scalability, reliability, security, latency, throughput, versioning, monitoring, and cost. In an upcoming blog post, we will take a deep dive into this concept.

Real-world Example

Let's take the example of a video streaming service that uses machine learning to personalize content recommendations for its users. The service has a sophisticated ML pipeline that incorporates SRE principles to ensure reliability and scalability. The pipeline starts with data collection, which involves gathering data from multiple sources, including user behavior, viewing history, and preferences. This data is then cleaned and preprocessed to remove any noise and inconsistencies.

Figure 3: The illustration shows the aforementioned concepts applied to the video streaming service Netflix, which uses machine learning to personalize content recommendations for its users.

Next, feature engineering is used to create meaningful features that capture important information about the user and the content. For example, they might use features such as the user's age, gender, and location, as well as the language and actors of the content. Once the features have been created, Netflix trains an ML model to predict which content the user is most likely to enjoy. The model is evaluated on a separate set of data to ensure that it is accurate and effective. The model is then deployed to production, where it is used to generate personalized recommendations. Continuous monitoring is also applied to track the performance of the model and to make updates when needed.

SRE considerations are integrated into the pipeline to ensure that the system remains reliable and scalable. For example, automated testing, version control, continuous integration, continuous delivery, and deployment are used to ensure that the pipeline can handle large volumes of data and can fulfill the customer’s needs. Netflix also has several backup systems and recovery plans to guarantee that the platform can recover quickly from failures and outages.

Other important SRE concepts are SLIs (Service Level Indicators) and SLOs (Service Level Objectives). For a recommendation system like that of Netflix, we can apply these concepts by defining the SLI as the percentage of user interactions with recommended content that result in a click, i.e., the click-through rate (CTR), and by specifying the SLO as a CTR of at least 15% for recommended content shown on the Netflix homepage. If the CTR falls below this threshold, the system should be evaluated and potentially adjusted to improve its performance. By measuring and monitoring the SLI, Netflix can ensure that the recommendation system is meeting its performance objectives and providing value to its users. For a more detailed description of these concepts, please check this blog post by our colleagues from Digital architects Zurich.
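
The SLI and SLO from this example can be expressed in a few lines of code. In this sketch we assume that each recommendation shown on the homepage counts as one impression and that clicks are counted per impression; the function names, the daily windows, and the numbers are hypothetical and only serve to illustrate the check against the 15% target:

```python
def click_through_rate(impressions, clicks):
    """SLI: share of shown recommendations that were clicked (None if no traffic)."""
    if impressions == 0:
        return None
    return clicks / impressions


def evaluate_slo(impressions, clicks, target=0.15):
    """Compare the measured SLI against the SLO target."""
    sli = click_through_rate(impressions, clicks)
    if sli is None:
        return {"sli": None, "target": target, "slo_met": None}
    return {"sli": sli, "target": target, "slo_met": sli >= target}


# Hypothetical daily measurement windows.
windows = [
    {"day": "day-1", "impressions": 12000, "clicks": 2160},
    {"day": "day-2", "impressions": 15000, "clicks": 1950},
]

for window in windows:
    result = evaluate_slo(window["impressions"], window["clicks"])
    print(window["day"], f"CTR={result['sli']:.2%}", "SLO met:", result["slo_met"])
```

With these assumed numbers, the first window meets the target with a CTR of 18%, while the second window falls short with 13% and would be a signal to investigate the recommendation system.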

Driving Business Value with MLOps: Conclusion of a Successful ML Model Pipeline Implementation

The ML model pipeline described in this blog post demonstrates the benefits of following SRE and MLOps principles for building and deploying machine learning models in production environments. By following the proposed workflow for the model pipeline, you can ensure that the models are reliable, scalable, and maintainable, and that they continuously deliver accurate predictions to users in a timely manner.

The use of containerization helps to simplify the deployment process, while also ensuring that the infrastructure is scalable and resilient to failures. The incorporation of continuous integration and continuous deployment (CI/CD) pipelines enables the team to quickly iterate on the models and deploy new versions seamlessly without impacting the user experience. Furthermore, the implementation of monitoring and alerting mechanisms ensures that any issues are detected and addressed promptly. By leveraging SRE and MLOps principles, the ML model pipeline described in this blog post enables organizations to deliver high-quality machine learning applications that can meet the needs of their users and drive business value.

Machine Learning Architects Basel

Adopting data- and ML-driven approaches can be challenging and time-consuming if you are not familiar with the required data and software architectures, tools, and best practices. The same holds for managing end-to-end data and machine learning initiatives and operations, which includes assessing and implementing the required technologies as well as effective DataOps and MLOps workflows and practices. Collaborating with an experienced and independent partner could be a valuable option to explore.

Machine Learning Architects Basel (MLAB) is a member of the Swiss Digital Network (SDN). We have created our effective MLOps framework that combines our expertise in DataOps, Machine Learning, MLOps, and our extensive knowledge and experience in DevOps, SRE, and agile transformations.

If you want to learn more about how MLAB can aid your organization in creating long-lasting benefits by developing and maintaining reliable data and machine learning solutions, don't hesitate to contact us.

We hope you find this blog post informative and engaging. It is part of our Next Generation Data & AI Journey powered by MLOps.

References and Acknowledgements

  1. The Digital Highway for End-to-End Machine Learning & Effective MLOps
  2. Introduction to Reliability & Collaboration for Data & ML Lifecycles
  3. Observability for MLOps
  4. Effective MLOps: Maturity Model
  5. Designing Machine Learning Systems, by C. Huyen, O'Reilly Media, Inc.
  6. Reliable Machine Learning: Applying SRE Principles to ML in Production, by Cathy Chen, O'Reilly Media, Inc.
  7. Practical MLOps, by Gift, Noah and Deza, Alfredo, O'Reilly Media, Inc.
  8. What are Azure Machine Learning pipelines?
  9. Overview of ML Pipelines
  10. Gartner Glossary
  11. Software development icons created by Witdhawaty - Flaticon