{"id":129602,"date":"2021-07-08T08:00:08","date_gmt":"2021-07-08T12:00:08","guid":{"rendered":"https:\/\/www.kdnuggets.com\/?p=129602"},"modified":"2021-07-08T07:25:14","modified_gmt":"2021-07-08T11:25:14","slug":"mlops-is-an-engineering-discipline","status":"publish","type":"post","link":"https:\/\/www.kdnuggets.com\/2021\/07\/mlops-engineering-discipline.html","title":{"rendered":"MLOps is an Engineering Discipline: A Beginner&#8217;s Overview"},"content":{"rendered":"<p><b>By <a href=\"\" target=\"_blank\" rel=\"noopener\">Angad Gupta<\/a>, Data Science Student<\/b><\/p>\n<h3><b>Introduction<\/b><\/h3>\n<p>&nbsp;<br \/>\nMLOps is a combination of ML + DEV + OPS. 
MLOps helps increase the scalability and quality of production models through greater automation.<\/p>\n<p>MLOps is the idea of combining the\u00a0long-established practice of DevOps with the emerging field of Machine Learning. 
It brings model development, model retraining, drift monitoring, pipeline automation, quality control, and\u00a0model governance together in a single automated platform.<\/p>\n<p><center><img decoding=\"async\" src=\"\/wp-content\/uploads\/mlops_engineering_discipline.jpg\" alt=\"Figure\" width=\"100%\" \/><br \/>\n<font size=\"-1\">Image source: techinnocens<\/font><\/center><br \/>\n&nbsp;<\/p>\n<p>An MLOps team includes the data scientists who curate datasets and design AI models, and the ML engineers who run those models and datasets in automated ways.<\/p>\n<p>&nbsp;<\/p>\n<h3><b>Why MLOps is important<\/b><\/h3>\n<p>&nbsp;<br \/>\nAn MLOps team can help you with the following issues:<\/p>\n<p><b>Deployment issues:<\/b><\/p>\n<ol>\n<li>Machine learning builds that span multiple languages\n<li>Model deployment on development &amp; production environments\n<li>Troubleshooting issues raised during model deployments\n<li>Preparation of deployment packages for different languages\n<\/ol>\n<p><b>Monitoring issues:<\/b><\/p>\n<ol>\n<li>Model performance monitoring\n<li>A consistent way to monitor models deployed across the organization\n<\/ol>\n<p><b>Model lifecycle management issues:<\/b><\/p>\n<ol>\n<li>Reliance on data scientists to update production models and carry out maintenance activities\n<li>Keeping track of model decay after initial deployments\n<\/ol>\n<p><b>Model governance:<\/b><\/p>\n<ol>\n<li>Production access control\n<li>Traceable model results\n<li>Model audit trails\n<li>Model upgrade approval workflows\n<\/ol>\n<p>&nbsp;<\/p>\n<h3><b>Goals of MLOps<\/b><\/h3>\n<p>&nbsp;<br \/>\nThe goals of MLOps include:<\/p>\n<ul>\n<li>Deployment and automation\n<li>Model training and upgrading\n<li>Operation diagnostics &amp; fixes\n<li>Data governance and business regulatory compliance\n<li>Production scalability\n<li>Team collaboration\n<li>Monitoring and management\n<\/ul>\n<p>&nbsp;<\/p>\n<h3><b>Major 
Benefits<\/b><\/h3>\n<p>&nbsp;<br \/>\n<b>Creation of reproducible workflows, pipelines, and ML models: <\/b>Pipelines are the backbone of machine learning workflow infrastructure. They pull data from the source systems, then process and validate it. They also keep track of activities such as the model version and the dataset used to train each model.<\/p>\n<ul>\n<li>Create machine learning pipelines to design, deploy and reproduce model deployment\n<li>Provide a mechanism to trace the code version, data and various metrics as well as execution logs\n<\/ul>\n<p><b>Easy model deployment in any production environment: <\/b>Machine learning models are complex, and each deployment needs enough resources to run the model efficiently. Deploying machine learning models therefore requires an automated system that provisions and manages those resources and executes the model properly.<\/p>\n<ul>\n<li>Fast, reliable deployment of machine learning models\n<li>Automated control of the usage of cloud resources\n<li>Running model validation and various tests before deployment\n<li>Predefined dedicated system to migrate models from development to production systems\n<\/ul>\n<p><b>Management of the machine learning life cycle: <\/b>A final machine learning model can have many associated micro and ancillary services embedded within it. 
All the associated resources used in a machine learning model need to be tracked for later enhancement and verification.<\/p>\n<ul>\n<li>Use dedicated integration tools to track model development and to integrate all of its components\n<li>Use advanced data bias analysis to cross-verify model performance over a period of time\n<\/ul>\n<p><b>Machine learning resource control and management: <\/b>Machine learning models are retrained continually on different datasets, so it is essential to keep track of the model version, code version, dataset version, and the associated resources.<\/p>\n<ul>\n<li>Keep track of model version history for audit purposes\n<li>Evaluate the importance of features and create more advanced models with minimal bias using uniform distribution metrics\n<li>Set a resource quota and establish proper policies for increasing\/decreasing these resources as required to run the model efficiently\n<li>Create audit trails to meet regulatory requirements as you mark machine learning resources and automatically trace experiments\n<\/ul>\n<p>&nbsp;<\/p>\n<h3><b>Best Practices<\/b><\/h3>\n<p>&nbsp;<br \/>\n<b>ML pipelines: <\/b>Set up the various ML pipelines, such as a data pipeline, to define their dependencies and execution order and to produce metrics for monitoring each pipeline's resources<\/p>\n<p><b>Hybrid teams: <\/b>MLOps draws on the work of data scientists, machine learning engineers, DevOps engineers and data engineers; such a hybrid team is designed to handle issues quickly and efficiently<\/p>\n<p><b>Model and data versioning: <\/b>In addition to maintaining the code version, we also need to version the machine learning model, the data used to train it, its hyperparameters, and its metadata; there is more to model versioning than just the resultant model itself<\/p>\n<p><b>Model 
validation: <\/b>Statistical tests need to be set up for model validation because validation can\u2019t be reduced to pass\/fail or true\/false; it is much more nuanced, and there are lessons that can be learned from detailed statistical tests<\/p>\n<p><b>Data validation: <\/b>Before a model is trained on the provided data, the input data has to be validated to avoid introducing uncertainty and bias into the model<\/p>\n<p><b>Monitoring: <\/b>As training and deploying models takes up more and more resources, it has become more important to monitor model performance in the environment by visualizing the various metrics for the resources the model uses<\/p>\n<p>&nbsp;<\/p>\n<h3><b>Platforms and tools to assist with MLOps<\/b><\/h3>\n<p>&nbsp;<br \/>\nAs alluded to above, the following types of platforms and tools can assist with MLOps:<\/p>\n<ul>\n<li>Tools built specifically for model tracking, model history, and model registry information\n<li>Tools designed for versioning models and their individual components (code, datasets, etc.)\n<li>Cloud service platforms to execute the model experiments as well as the deployment of models and ML pipelines\n<\/ul>\n<p>&nbsp;<\/p>\n<h3><b>Conclusion<\/b><\/h3>\n<p>&nbsp;<br \/>\nMLOps is a new engineering discipline. It brings together a hybrid team of machine learning engineers, DevOps engineers and data scientists to retrieve data, validate it, train machine learning models on the proper datasets, and deploy them. MLOps also helps monitor model output so that the model can be optimized, runs smoothly, and produces the desired output. 
MLOps is very helpful for deploying and training models and keeping track of the models and associated datasets.<\/p>\n<p>&nbsp;<br \/>\n<b>Bio: <a href=\"https:\/\/www.linkedin.com\/in\/angad-gupta-37007a37\/\" target=\"_blank\" rel=\"noopener\">Angad Gupta<\/a><\/b> works as a customer delivery engineer at AutoGrid India Pvt Ltd and is pursuing an M.Tech in data science at BITS Pilani. You may follow him on <a href=\"https:\/\/www.linkedin.com\/in\/angad-gupta-37007a37\/\" rel=\"noopener\" target=\"_blank\">LinkedIn<\/a>.<\/p>\n<p><b>Related:<\/b><\/p>\n<ul class=\"three_ul\">\n<li><a href=\"\/2021\/06\/power-mlops-dataops-data-science.html\">Unleashing the Power of MLOps and DataOps in Data Science<\/a>\n<li><a href=\"\/05\/easy-mlops-pycaret-mlflow.html\">Easy MLOps with PyCaret + MLflow<\/a>\n<li><a href=\"\/2021\/05\/deploy-dockerized-fastapi-app-google-cloud-platform.html\">Deploy a Dockerized FastAPI App to Google Cloud Platform<\/a>\n<\/ul>\n<p><a name=\"comments\"><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"MLOps = ML + DEV + OPS. 
MLOps is the idea of combining the long-established practice of DevOps with the emerging field of Machine Learning.\n","protected":false},"author":89,"featured_media":129609,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","_seopress_robots_follow":"","_seopress_robots_imageindex":"","_seopress_robots_snippet":"","_seopress_robots_primary_cat":"none","_seopress_robots_breadcrumbs":"","_seopress_robots_freeze_modified_date":"","_seopress_robots_custom_modified_date":"","_seopress_robots_canonical":"","_seopress_social_fb_title":"","_seopress_social_fb_desc":"","_seopress_social_fb_img":"","_seopress_social_fb_img_attachment_id":0,"_seopress_social_fb_img_width":0,"_seopress_social_fb_img_height":0,"_seopress_social_twitter_title":"","_seopress_social_twitter_desc":"","_seopress_social_twitter_img":"","_seopress_social_twitter_img_attachment_id":0,"_seopress_social_twitter_img_width":0,"_seopress_social_twitter_img_height":0,"_seopress_redirections_value":"","_seopress_redirections_enabled":"","_seopress_redirections_enabled_regex":"","_seopress_redirections_logged_status":"","_seopress_redirections_param":"","_seopress_redirections_type":301,"_seopress_analysis_target_kw":"","inline_featured_image":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"mc4wp_mailchimp_campaign":[],"footnotes":"","_links_to":"","_links_to_target":""},"categories":[5286],"tags":[1924,485,197,4999,2211],"class_list":["post-129602","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kdnuggets-originals","tag-data-engineering","tag-deployment","tag-machine-learning","tag-mlops","tag-modeling"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/posts\/12960
2","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/users\/89"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/comments?post=129602"}],"version-history":[{"count":0,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/posts\/129602\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/media\/129609"}],"wp:attachment":[{"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/media?parent=129602"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/categories?post=129602"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kdnuggets.com\/wp-json\/wp\/v2\/tags?post=129602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}