The main function of MLOps is to automate the more repetitive steps in the ML workflows of data scientists and ML engineers, from model development and training to model deployment and operation (model serving). Automating these steps creates business agility and a better experience for users and end customers, increasing the speed, power, and reliability of ML. Automation can also reduce risk and free developers from rote work, allowing them to spend more time innovating. All of this contributes to the bottom line: a 2021 McKinsey global survey found that companies that successfully scale AI can add as much as 20% to their earnings before interest and taxes (EBIT).
"It is not uncommon for companies with complex ML capabilities to create disparate ML tools in individual pockets of the business," said Vincent David, senior director of machine learning at Capital One. "But often you start to see similarities: ML systems doing similar things, but with a slight difference. Companies looking to get the most out of their ML investments are consolidating and enhancing their best ML capabilities into standardized tools and platforms that everyone can use, and ultimately creating differentiated value in the marketplace."
In practice, MLOps requires close collaboration among data scientists, ML engineers, and site reliability engineers (SREs) to ensure consistent model reproducibility, monitoring, and maintainability. Over the past few years, Capital One has developed MLOps best practices that apply across industries: balancing the needs of different users, adopting a common cloud-based technology stack, leveraging open source tools, and ensuring the right level of accessibility and governance for both data and models.
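Model reproducibility, the first practice named above, can be checked automatically: retrain under the same seed and confirm the artifacts match. The sketch below is a minimal illustration of that idea, assuming scikit-learn and a synthetic dataset; it is not Capital One's actual pipeline.

```python
# Minimal reproducibility check: two training runs with the same seed
# should produce identical model parameters. Data and model here are
# illustrative placeholders, not a production workflow.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train(seed: int) -> LogisticRegression:
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    # Pinning random_state fixes any stochastic behavior in the solver.
    return LogisticRegression(random_state=seed).fit(X, y)

model_a = train(seed=42)
model_b = train(seed=42)
# Identical seeds must yield identical learned coefficients.
assert np.allclose(model_a.coef_, model_b.coef_)
```

A check like this can run in CI so that a silently non-deterministic training step is caught before a model ships.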
Understand the different needs of different users
ML applications typically have two main types of users, technical experts (data scientists and ML engineers) and non-technical experts (business analysts), and it is important to strike a balance between their different needs. Technical experts often want complete freedom to use all available tools to build models for their intended use cases. Non-technical experts, on the other hand, need user-friendly tools that give them access to the data they need to create value in their own workflows.
To build consistent processes and workflows while satisfying both teams, David recommends meeting with the application design team and subject matter experts across a variety of use cases. “We look at specific cases to understand issues, so users get what they need to benefit their work, but also the company as a whole,” he said. “The key is figuring out how to create the right capabilities while balancing the different stakeholders and business needs in the enterprise.”
Apply a common technology stack
Collaboration between development teams, which is critical to successful MLOps, can be difficult and time-consuming if those teams are not using the same technology. A unified technology stack allows developers to standardize, reusing components, features, and tools across models like Lego bricks. "That makes it easier to combine related capabilities so that developers don't lose time moving from one model or system to another," says David.
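The "Lego brick" idea can be made concrete with a shared, standardized component that multiple teams snap into their own models. The sketch below assumes scikit-learn pipelines; the component name, feature setup, and the two models are illustrative, not Capital One's actual stack.

```python
# One standardized preprocessing "brick" reused by two different teams'
# models, so neither team rebuilds (or subtly diverges on) the same step.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def standard_preprocessing() -> list:
    """Shared, versioned preprocessing steps every team can reuse."""
    return [("scale", StandardScaler())]

# Two hypothetical teams build different models on the same shared brick.
churn_model = Pipeline(standard_preprocessing() + [("clf", LogisticRegression())])
fraud_model = Pipeline(standard_preprocessing() + [("clf", RandomForestClassifier(random_state=0))])

# Both pipelines train and predict with the identical preprocessing step.
X = np.random.default_rng(0).normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int)
churn_model.fit(X, y)
fraud_model.fit(X, y)
```

Because the preprocessing lives in one function, a fix or upgrade to it propagates to every model that composes it, which is the time-saving David describes.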
A cloud-native stack, built to leverage the distributed computing model of the cloud, enables developers to self-serve infrastructure on demand and to continuously take advantage of new capabilities as cloud providers introduce new services. Capital One's decision to go all in on the public cloud has had a significant impact on developer efficiency and speed. Releasing code to production is now much faster, and ML platforms and models can be reused across the enterprise.
Save time with open source ML tools
Open source ML tools, whose code is freely available for anyone to use and adapt, are at the core of creating a powerful cloud platform and unified technology stack. Using existing open source tools means you don't have to spend precious engineering resources reinventing the wheel, accelerating the speed at which teams can build and deploy models.
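As a small example of that leverage, the open source sketch below trains a model with scikit-learn and hands it to the serving layer via joblib serialization, rather than writing custom training or persistence code. The dataset is synthetic and the file path is illustrative.

```python
# Train with scikit-learn, persist with joblib: two open source tools
# standing in for code a team would otherwise have to write and maintain.
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X.sum(axis=1) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Serialize the fitted model so a separate serving process can load it.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)
restored = joblib.load(path)

# The restored model behaves identically to the original.
assert (restored.predict(X) == model.predict(X)).all()
```

The whole train-persist-reload loop is a few lines precisely because the heavy lifting lives in community-maintained libraries.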