The Startling Truth About Traditional Machine Learning Training vs. Real-World Production — What They’re Not Telling You!

Ricard Santiago Raigada García
Published in AI Mind
3 min read · Jan 22, 2024


Image generated by OpenAI’s DALL·E, used with permission for The Startling Truth About Traditional Machine Learning Training vs. Real-World Production — What They’re Not Telling You!

In this article, I highlight some differences I have found between how machine learning is traditionally taught and how it works in production.

First, when you study ML, you are usually taught to solve a problem linearly. Courses explain the data lifecycle and how it is applied, but they rarely mention that many of its steps loop back on themselves or on earlier steps. Often, while you are developing an ML system toward a goal, the company’s objectives change partway through the lifecycle and no longer align with the model’s initial objectives. When that happens, you have to rework the ML system practically from its earliest phase.

A related point that is rarely explained is that ML problems involve many stakeholders, such as the sales team, the product team, and managers. These stakeholders often have opposing objectives, and it is the ML team’s job to reconcile the different perspectives so that each team is served. For example, imagine a recommendation system where the owners want to show the recommendations that yield the most profit, while the customer team wants to show the ones that best match the customer’s taste. These are two different objectives. One way to handle this is to decouple the objectives: train a separate model for each and then combine their outputs into a single ranking that satisfies both teams. However, this is a highly complex issue that is not usually covered in traditional ML studies.
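As a rough illustration of decoupled objectives, the sketch below assumes two already-trained models, one scoring relevance to the customer and one scoring expected profit, and blends their outputs with a single weight. The names `Candidate`, `rank`, and `alpha`, and the assumption that both scores are scaled to [0, 1], are hypothetical and only serve the example; a real system would likely need something more elaborate than a linear blend.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    relevance: float  # hypothetical score from a taste/relevance model, assumed in [0, 1]
    profit: float     # hypothetical score from a profit model, assumed in [0, 1]


def rank(candidates, alpha=0.5):
    """Order candidates by a weighted blend of the two scores.

    alpha=1.0 ranks purely by relevance, alpha=0.0 purely by profit.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def blended(c):
        return alpha * c.relevance + (1.0 - alpha) * c.profit

    return sorted(candidates, key=blended, reverse=True)


# Made-up scores, for illustration only.
items = [
    Candidate("a", relevance=0.9, profit=0.2),
    Candidate("b", relevance=0.4, profit=0.9),
    Candidate("c", relevance=0.7, profit=0.7),
]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, [c.item_id for c in rank(items, alpha)])
```

Keeping the two models separate means the trade-off between the teams can be adjusted by changing `alpha`, without retraining either model.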

Next, coursework is usually carried out with static data: datasets that never change. Even then, certain practices are often not carried out correctly. For example, normalization is often applied before splitting the data into train, validation, and test sets, when the normalization statistics should be computed on the training set only and then applied unchanged to the other splits; otherwise information from the held-out data leaks into training. In real environments, data distributions change constantly, but this is rarely addressed in traditional ML courses. You are not taught how to deal with distributions that shift over time, for example whether recent data should be given more priority than older data.
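The two points above can be sketched in a few lines of plain Python. The first part fits the normalization statistics on the training split only and reuses them for the held-out split. The second part shows one possible way to prioritize recent data, using exponentially decaying sample weights. The function names and the 30-day half-life are assumptions made for this example, not a recommendation.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Compute mean and standard deviation from the training split only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    # Guard against a constant feature to avoid dividing by zero.
    return mu, (sigma if sigma > 0 else 1.0)


def transform(values, scaler):
    """Apply previously fitted statistics to any split."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


def recency_weights(ages_in_days, half_life_days=30.0):
    """Weight each sample by 0.5 ** (age / half_life): newer data counts more."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


# Toy data: split first, then fit on train and reuse the same statistics on test.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 7.0]

scaler = fit_scaler(train)
train_scaled = transform(train, scaler)
test_scaled = transform(test, scaler)  # uses the train mean/std, not the test's own

# Samples that are 0, 30, and 60 days old get weights 1.0, 0.5, and 0.25.
weights = recency_weights([0, 30, 60])
```

Weights like these could be passed to a training routine that accepts per-sample weights. Whether a half-life, a sliding window, or periodic retraining is the right choice depends on how fast the data actually drifts.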

This gap is not exclusive to traditional ML studies; it also applies to data science programs. Topics like the above are not addressed, not even which types of data distribution shift exist. Nor does traditional academic ML teaching show you how to manage platforms like AWS or perform DevOps tasks. There are many more aspects that we are not taught. What other important aspects do you think I am missing? Please let me know in the comments.


Data Scientist & AWS Architect. Skilled in data mining, ML, and cloud solutions. Loves teamwork and innovative challenges. Open to global opportunities.