Nestlé Quality Management and Forecasting Analysis
---Management Science
Established over 150 years ago, Nestlé is the world’s largest, most diversified food and
beverages company. We have a unique global footprint and sell our products in 188 countries
worldwide. Through enhancing quality of life and contributing to a healthier future, we aim to
deliver sustainable, industry-leading financial performance and earn trust.
Background of Nestlé
The Nestlé Waters business has continued to grow in the UK, supported by the success of its
strong local brands, Buxton Natural Mineral Water (the number one British branded bottled water)
and Nestlé Pure Life spring water. The UK bottling factory at that time had no space to expand to
meet demand. The site also had inefficiencies in the production process, which meant time and
resources could not be optimised. These factors led Nestlé Waters UK (NWUK) to decide
to look for a new site and invest £35 million to build a brand new state-of-the-art combined plant
and warehouse facility to bottle the two local waters. The new plant is one of Europe’s most
innovative and efficient bottling facilities.
B Part 02
Problem Statement
Sales/Marketing/Promotion (via Forecasting)
Sales/Marketing/Promotion in ways that are appropriate to consumer audiences, and
shaping consumer behaviour to promote healthful choices and better environmental outcomes.
Sub-issues:
• Foster environmentally friendly behaviours
• Foster healthy behaviours
• Infant formula marketing
• Product labelling
• Responsible marketing to children
Problem Statement
[Link]
looking-back-at-the-maggi-noodles-crisis-in-india-1810003-2021-06-02
[Link]
coli-outbreak/#more-226676
Situation and Facts Gathered
Marketing/Promotion
[Link]
marketing-claims-on-baby-milk-formulas
Situation and Facts Gathered
Resource(Water)
[Link]
E Part 5
Methodology to Use / QM to Be Applied
Total Quality Management for Nestlé
Total Quality Management(TQM) refers to a quality emphasis that encompasses the entire
organization from supplier to customer.
Hazard Analysis and Critical Control Points
Nestlé applies the internationally recognized HACCP (Hazard Analysis and Critical Control Points) system
to ensure food safety. This preventive, science-based system identifies, evaluates, and controls
hazards that are significant for food safety. It covers the entire food production process, from raw
materials to distribution and consumption. Nestlé HACCP plans and systems are verified by external
certification bodies against the international ISO 22000:2005 / ISO 22002-1 standards.
Benefits
Quality control in the food industry delivers numerous benefits, including consumer safety,
regulatory compliance, consistent product quality, cost reduction, brand protection, and
opportunities for continuous improvement. Implementing robust quality control measures is
essential for food businesses to thrive in a competitive marketplace and maintain customer
trust and satisfaction.
Sell Out Sell In Forecasting (Machine Learning for sales forecasting at Nestlé)
1. Business Understanding
---Sell In Forecasting
Accurate forecasting is important from a business perspective. If our forecast is lower than the
actual demand, we lose profit. On the other hand, if our forecast is too high and we actually sell fewer
products, we also lose: we incur the cost of holding stock or, worse, our products expire. One bad forecast has
business, production and logistical consequences.
Sell In is the number of products the manufacturer (in our case Nestlé) sells to the retailer, while Sell Out is the
number of products sold from the retailer to the end customers. We as a manufacturer are interested in Sell In. Here
you can already see the relationship between Sell In and Sell Out. Retailers will not sell products if we do not supply
them. Moreover, if they want to sell more because they are planning a promotion for our product, they also have to
order more products from us.
Business and production conditions require a weekly forecast of Sell In for up to 3 months ahead, that is,
roughly the next 12 weeks. The forecast is weekly because production and logistics are planned weekly.
Selling products over time is nothing but a time series. Selling multiple products for different retailers is
multiple time series.
Let’s look at an example of selling one product for one of the retailers.
Sell Out Forecasting
---Why Sell In Needs a Future Sell Out
On the first chart, you can see the number of ordered products. The moving average is roughly constant
over time, but in some weeks orders depend on other factors, hence the significant increases.
On the next chart we add Sell Out and we can see that it lags behind Sell In. If we see a significant
increase in the number of products ordered by our customers, then we can expect that, on average over
the next 3–4 weeks, this customer’s stores will sell more products than usual. The length of this lag
between Sell In and Sell Out can vary from retailer to retailer.
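The lag described above can be estimated rather than read off a chart. The sketch below is an illustration only, not the method used in the source: it correlates Sell In at week t with Sell Out at week t + lag and picks the lag with the highest correlation. The function names, the `max_lag` default and the sample numbers are assumptions made for this example.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lag(sell_in, sell_out, max_lag=8):
    """Return the lag (in weeks) at which Sell Out best follows earlier Sell In."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair Sell In at week t with Sell Out at week t + lag.
        xs = sell_in[:len(sell_in) - lag]
        ys = sell_out[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get)


# Hypothetical weekly data: Sell Out repeats the Sell In spikes 3 weeks later.
sell_in = [10, 10, 50, 10, 10, 10, 60, 10, 10, 10, 10, 10]
sell_out = [10, 10, 10, 10, 10, 50, 10, 10, 10, 60, 10, 10]
estimated_lag = best_lag(sell_in, sell_out, max_lag=5)
```

Because the lag can differ between retailers, such an estimate would be computed per retailer.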
On the third chart, we add the number of delivered products, which, as I said before, is not always equal to
the number of ordered products. Here we can see other dependencies: for example, if we don’t
deliver as many products as were ordered, the client may repeat the order in the following weeks.
In the last graph there is only Sell Out and promotional data on which this Sell Out is significantly
dependent.
Hence our conclusion at Nestlé: first we should forecast Sell Out, and then forecast Sell In
based on it.
Feature engineering
---Time series transformation to the supervised learning problem
Our methodology is based on machine learning, so we use one model to forecast
multiple products.
Before we can use machine learning algorithms, our time series must be transformed
into a supervised learning problem. A time series has no built-in concept of X and y
variables, so we need to choose what we will predict (y) and use feature engineering to
create all the X variables that will be used to make predictions.
In the first step, we create our targets, the values that the model will forecast. These are how much
the customer will sell in 1 week, in 2 weeks, and so on, as well as how much the customer
will order from us in 1 week, in 2 weeks, and so on.
For the product we analyzed, we could not see a trend, but this is not always the case. Therefore, we do not
have to forecast only the raw value; we can instead forecast, for example, the differenced value, a popular and
widely used transformation that helps make time series data stationary. We can also forecast the logarithm of the
ratio of the future value to the current value. By using any of these transformations, we forecast
dynamics and can forecast raw values outside the range of the training data.
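The three target variants can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and sample values are hypothetical, the series is assumed to be strictly positive (required for the logarithm), and weeks whose future value is not yet known get `None` targets.

```python
import math


def make_targets(series, horizon):
    """For each week t, build raw, difference and log-ratio targets `horizon` weeks ahead."""
    rows = []
    for t, current in enumerate(series):
        if t + horizon >= len(series):
            # The future value is unknown for the last `horizon` weeks.
            rows.append({"raw": None, "diff": None, "log_ratio": None})
            continue
        future = series[t + horizon]
        rows.append({
            "raw": future,
            "diff": future - current,
            "log_ratio": math.log(future / current),
        })
    return rows


def invert_log_ratio(current, predicted_log_ratio):
    """Turn a predicted log-ratio back into a raw value."""
    return current * math.exp(predicted_log_ratio)


weekly_sales = [100, 110, 121]  # hypothetical values
targets_1w = make_targets(weekly_sales, horizon=1)
```

Since the model predicts a change relative to the current value, the recovered raw forecast (`invert_log_ratio`) is not bounded by the largest value seen in training.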
For the last dates, we cannot create future values because we do not know them yet. However, we do not remove the
missing values created this way. That is, for the penultimate date,
December 22, we can only make a forecast one week ahead and compare it with the true value; we
cannot evaluate a forecast 2 weeks ahead. Depending on the forecast horizon, we move the window for the test data so
that we only forecast values from the same period, for example, 2019.
Next, we create the X variables, the features on which the model is fitted and which it then uses to make a
forecast. These can include features from the future that are known at the time the forecast is made. In our case,
these are just promotional variables: what the promotion will be next week, in 2 weeks, etc.
➞ It will be our X.
These are the only features from the future that we use and are known at the time the forecast is
performed. There is no data leakage. This is important and known to practitioners, but often overlooked
by researchers or stakeholders. Data leakage occurs when a model has access to information that in
practice would not be available.
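A small sketch of such known-in-advance features, assuming a promotion calendar stored as one flag per week (the function name, column names and data are hypothetical). Only the planned promotion is looked up in the future; no future sales value is touched, which is what keeps the features free of leakage.

```python
def add_future_promo(promo, horizons):
    """promo[t] is the planned promotion flag for week t, known ahead of time."""
    rows = []
    for t in range(len(promo)):
        row = {}
        for h in horizons:
            idx = t + h
            # None marks weeks beyond the end of the available promotion plan.
            row[f"promo_plus_{h}w"] = promo[idx] if idx < len(promo) else None
        rows.append(row)
    return rows


promo_plan = [0, 0, 1, 0]  # hypothetical: a promotion is planned for week 2
promo_features = add_future_promo(promo_plan, horizons=(1, 2))
```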
The next features are lagged values: how much we sold a week ago, 2 weeks ago, etc. This is a simple way to turn
a time series into a supervised learning problem. Here, however, we do not have these features for
the first rows because there is no earlier data to get them from.
This is similar to the missing targets, except that these missing features appear at the first
time indexes. We need all the X variables, and we are not going to replace them with another value.
Therefore, we remove the observations with missing values created in this way.
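As a minimal sketch (the function name and data are illustrative), lag features can be built by looking back k weeks and skipping the first rows that lack enough history. With pandas, the same result is commonly obtained with `shift` followed by `dropna`.

```python
def lag_features(series, lags):
    """Build one row per week with the values observed `k` weeks earlier."""
    rows = []
    longest = max(lags)
    for t in range(len(series)):
        if t < longest:
            continue  # not enough history: drop the row instead of imputing
        row = {"t": t}
        for k in lags:
            row[f"lag_{k}"] = series[t - k]
        rows.append(row)
    return rows


weekly_sales = [5, 6, 7, 8]  # hypothetical values
lagged = lag_features(weekly_sales, lags=(1, 2))
```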
The next step after adding the raw lagged values is to add rolling statistics, that is, various statistics
calculated over a sliding window. These can be the mean, median, standard deviation, sum, or even the difference
between two values, such as the difference between the sums of ordered and delivered
products. Such a variable tells the model that the retailer may be reordering because it has not received as
much as it ordered in the past. We compute statistics for windows of different widths, for example, the
moving average of the previous 2 values, the previous 3 values, and so on.
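The rolling statistics and the ordered-versus-delivered gap can be sketched like this. The helper names and data are assumptions for illustration; each window covers only weeks strictly before week t, so the current value is never used as its own feature.

```python
from statistics import mean, median, pstdev


def rolling_stats(series, window):
    """Statistics over the `window` values strictly before each week t."""
    rows = []
    for t in range(len(series)):
        if t < window:
            rows.append(None)  # window not yet full
            continue
        past = series[t - window:t]
        rows.append({
            "mean": mean(past),
            "median": median(past),
            "std": pstdev(past),
            "sum": sum(past),
        })
    return rows


def undelivered_gap(ordered, delivered, window):
    """Sum of ordered minus sum of delivered products over the previous `window` weeks."""
    gaps = []
    for t in range(len(ordered)):
        if t < window:
            gaps.append(None)
        else:
            gaps.append(sum(ordered[t - window:t]) - sum(delivered[t - window:t]))
    return gaps


stats_2w = rolling_stats([10, 20, 30, 40], window=2)
gap_2w = undelivered_gap([10, 10, 10], [10, 8, 10], window=2)
```

Calling `rolling_stats` with several `window` values gives the family of moving averages of different widths mentioned above.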
In a time series, in addition to a trend, there may also be seasonality, meaning that there are more sales
in a particular period of the year than in another. To use the week number as information, I convert the
week into a cyclical variable, represented by its sine and cosine. Thus, I transform one variable into two variables.
Here I am transforming the week number, but we can use the same approach for other cyclic variables
like the hour or the day of the week. Using the day of the week as an example, I will explain why this
transformation may be better than others.
If we transformed the day of the week into a numeric variable, Monday could have a value of 1, while
Sunday could have a value of 7. Thus, for the model, the two days would be very far apart, but in reality,
Sunday is followed by Monday.
Another transformation could be one-hot encoding (for example, OneHotEncoder), but then we would create multiple binary variables.
These are basic examples of data transformations; we would find others depending
on the problem. We should also transform the categorical variables, which are customer and product.
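A short sketch of the sine/cosine encoding, using the day-of-week example from the text. The function names are illustrative; for week numbers a period of 52 would be a simplifying assumption, since some years have 53 weeks.

```python
import math


def cyclical(value, period):
    """Map a cyclic value (e.g. day 1..7, week 1..52) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Euclidean distance between two values after sine/cosine encoding."""
    (s1, c1), (s2, c2) = cyclical(a, period), cyclical(b, period)
    return math.hypot(s1 - s2, c1 - c2)


# Sunday (7) and Monday (1) are now as close as Monday (1) and Tuesday (2).
sunday_monday = encoded_distance(7, 1, period=7)
monday_tuesday = encoded_distance(1, 2, period=7)
monday_thursday = encoded_distance(1, 4, period=7)
```

With a plain numeric encoding the Sunday to Monday distance would be 6, the largest possible; after the transformation it equals the distance between any two neighbouring days.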
Chained Multioutput Regression
---Methodology of using ML models for multioutput regression
We have already created a new set of features for X and y, so the question becomes: how do we fit models
at Nestlé? Standard machine learning algorithms (I’m not talking about neural networks) are designed to
predict a single numerical value, so we need a separate model for each horizon. We use Random Forest
as the model. In addition, as I mentioned earlier, we want to make a Sell Out prediction first and then a Sell In
prediction. As a reminder, Sell Out is the number of products sold from the retailer to end customers, and Sell
In is the number of products ordered from Nestlé by the retailer. We first forecast the next 20 weeks for
Sell Out, then the next 15 weeks for Sell In. The horizon is longer for Sell Out because Sell In needs a forecast of
future Sell Out values, and we know from exploratory analysis that Sell Out is at most 3–4 weeks behind Sell In.
We added one more week to be on the safe side.
As a model learning methodology, we use Chained Multioutput Regression, which is a linear sequence of
models. The first model, for Sell Out 1 week ahead, uses the variables we created. The next
model, for Sell Out 2 weeks ahead, has the same features plus the forecast of the previous model. Once we
have fitted all the models for Sell Out, we fit the models for Sell In. Each model has access to the same
features and the forecasts of the previous models. In addition, each model has feature selection, which I will
describe in a moment.
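The chaining idea can be sketched in a few lines. This is a structural illustration only: to stay dependency-free it uses a tiny 1-nearest-neighbour learner as a stand-in for Random Forest, and the class names and data are hypothetical. Each model in the chain sees the original features plus the forecasts of all earlier models.

```python
class NearestNeighbourRegressor:
    """Stand-in learner (1-nearest neighbour); the source uses Random Forest."""

    def fit(self, X, y):
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        def closest(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self.X_]
            return self.y_[dists.index(min(dists))]
        return [closest(row) for row in X]


class RegressorChainSketch:
    """One model per horizon; model h also receives the forecasts of models 0..h-1."""

    def __init__(self, make_model):
        self.make_model = make_model
        self.models_ = []

    def fit(self, X, Y):
        # Y[i][h] is the target of observation i for horizon h, in chain order.
        X_aug = [list(row) for row in X]
        self.models_ = []
        for h in range(len(Y[0])):
            y_h = [row[h] for row in Y]
            model = self.make_model().fit(X_aug, y_h)
            self.models_.append(model)
            # Append this model's forecast as an extra feature for the next one.
            preds = model.predict(X_aug)
            X_aug = [row + [p] for row, p in zip(X_aug, preds)]
        return self

    def predict(self, X):
        X_aug = [list(row) for row in X]
        out = [[] for _ in X]
        for model in self.models_:
            preds = model.predict(X_aug)
            for row_out, p in zip(out, preds):
                row_out.append(p)
            X_aug = [row + [p] for row, p in zip(X_aug, preds)]
        return out


X_train = [[1.0], [2.0], [3.0]]
Y_train = [[10.0, 20.0], [20.0, 40.0], [30.0, 60.0]]  # columns: 1 week ahead, 2 weeks ahead
chain = RegressorChainSketch(NearestNeighbourRegressor).fit(X_train, Y_train)
forecast = chain.predict([[2.1]])
```

In the setup described above, the chain order would be all Sell Out horizons first, followed by the Sell In horizons. scikit-learn offers a library implementation of the same idea as `sklearn.multioutput.RegressorChain`.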
[Link] a single model
---Feature selection / Hyperparameters selection
Now I am going to show how a single model is trained, using the model for Sell In 3 weeks ahead as an example.
I start with feature selection. Feature engineering creates many features, so we face the curse of dimensionality. Features are also often
correlated with each other, so many of them carry the same information, and the model may treat a piece of information as more important
than it really is. Another reason is that we do not want to teach the model noise.
Hence the need for feature selection. We can divide the candidates into variables created during feature engineering and forecasts of previous models.
I am going to start with the predictions, which is what we see at the bottom. Our model uses the predictions of the previous models for Sell In, but does
not use all the predictions for Sell Out. We select only the next 5 Sell Out forecasts, which are Sell Out 4W, Sell Out 5W, Sell Out 6W, Sell Out 7W, and
Sell Out 8W. This follows from our exploratory data analysis and business relationships: retailers order more products from us now because they
intend to sell more products themselves in 3 weeks by promoting them.
Among the features we created, we do not select those related to Sell Out, such as lags for Sell Out, rolling statistics for Sell
Out, and promotional data, because these variables are already condensed into the Sell Out forecasts. In a way, we are reducing the dimensionality of our
data by replacing it with a forecast based on it.
We use only the features associated with Sell In and the cyclical and categorical variables. However, we do not take all of them, only the
relevant ones. For example, moving averages for several window widths may all be considered relevant; we should not take all of them, only
the most relevant or most correlated one.
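Two of the selection rules above can be sketched as follows. Both helpers are hypothetical: generalising the "next 5 Sell Out forecasts" rule from the 3-week example to any horizon is an assumption, and ranking candidates by absolute Pearson correlation with the target is one possible reading of "most correlated".

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def sell_out_inputs(sell_in_horizon, n_forecasts=5):
    """Names of the Sell Out forecasts fed to the Sell In model for a given horizon."""
    first = sell_in_horizon + 1
    return [f"Sell Out {w}W" for w in range(first, first + n_forecasts)]


def pick_most_correlated(candidates, target):
    """Keep only the candidate feature with the highest absolute correlation to the target."""
    return max(candidates, key=lambda name: abs(pearson(candidates[name], target)))


selected_forecasts = sell_out_inputs(3)
moving_averages = {  # hypothetical feature values
    "sell_in_ma_2": [1, 2, 3, 4, 6],
    "sell_in_ma_4": [3, 1, 4, 1, 5],
}
kept = pick_most_correlated(moving_averages, target=[1, 2, 3, 4, 5])
```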
With the features selected, what does the training itself look like? This process is as standard as possible. However, I would like to
briefly address the choice of hyperparameters. The data we use are time series, so the observations depend on each other. We cannot
randomize when splitting into training and test sets, and we cannot use standard shuffled cross-validation, because that would cause data leakage. Therefore, in
each split, the test or validation indexes must be higher than the training indexes.
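A minimal sketch of such a split, assuming an expanding training window and fixed-size validation blocks (the function name and sizes are illustrative). scikit-learn provides a comparable ready-made splitter, `TimeSeriesSplit`.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx) pairs where every test index follows every train index."""
    for i in range(n_splits):
        test_end = n_samples - (n_splits - 1 - i) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        yield list(range(test_start)), list(range(test_start, test_end))


splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
```

Each candidate set of hyperparameters would be scored on the validation block of every split, and the training window grows as the validation block moves forward in time.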
Benefits
Sell-out and sell-in forecasting are essential for effective sales and inventory management,
supply chain optimization, and meeting customer demand. By accurately predicting sales at
both the consumer and retail levels, businesses can make informed decisions, reduce costs,
improve customer satisfaction, and drive overall business growth.
Inventory Control(JIT)
Just-in-time (JIT) is a Japanese lean production technique. It focuses on timing during the production process.
Both storing and waiting for materials can increase costs. Waiting for materials will waste employees’ time
and could also delay production. JIT involves ensuring materials arrive just as they are needed. Similarly for
outputs, transport must arrive to take finished products away just-in-time, without any waiting or storage
costs.
JIT focuses on continuous improvement but only works as part of an overall lean strategy. It can improve the
efficiency of processes. It can lead to a better return on investment through improving productivity. JIT also
allows for fewer materials to be held at any one point which can reduce working capital needs as less finance
is needed for stock, leading to better financial performance. This can lead to better returns to stakeholders
such as investors, as any finance invested is yielding a direct return.
Through JIT Nestlé Waters was able to make the most efficient use of storage and time at the new factory.
Whilst the old site had to use limited storage and outsourced warehouse space off-site, the new factory
eliminated these additional transport needs. At the old site stock had to be requested and then took time to
arrive. Enough inputs had to be stored on-site to provide for production over the weekend, adding to storage
costs and wasting space. Finished pallets of bottled water had to be held until trucks arrived to transport
them. However, at the new site, transport and waiting times have been significantly reduced through raw
materials being stored adjacent to the finished goods warehouse.
This greatly improved stock control. Shorter flows for raw material and the collection of waste from the
production line also help make sure materials are in the right place at the right time, thus improving efficiency.
JIT helps make big efficiency gains for Nestlé Waters. This requires excellent relationships with suppliers and
distributors. Suppliers must deliver quality resources on time and distributors must ensure bottles are picked
up immediately when they are ready. This aspect required a lot of planning but has delivered great benefits.
Benefits
Nestlé Waters uses lean production techniques to bring benefits beyond gains in
efficiency and quality. They also help to create social and environmental benefits.
Social benefits are those shared by the communities in which Nestlé operates. For
example, the Waterswallows site in Buxton was designed to include a butterfly
planting scheme. Working with Derbyshire Wildlife Trust and the local Butterfly
Conservation Group, Nestlé Waters planted wild flowers within the factory grounds
to attract butterflies back to the area. As part of Nestlé Waters’ Creating Shared
Value it has worked with the local community on projects including its on-the-go
recycling programme and Project WET. Project WET is an educational school
initiative which helps teachers and children learn about the vital role water plays in
our lives.
A further benefit is employment, not just directly at the site, but as a result of
building the new site. When building the new factory Nestlé Waters sourced the
majority of its materials and labour from within a 50 mile radius of the site. This had
a positive impact on the local economy whilst decreasing the amount of transport
required for materials, reducing the site’s carbon footprint. The new factory has also
increased the number of apprenticeships and graduate roles that Nestlé can offer.
Whilst the old site used agency and temporary workers, the new site employs over
100 full-time staff, drawn from the local workforce.
F. Part 6
Learnings and next steps
Learnings
Focus on Quality(QC): Nestlé places a strong emphasis on the quality of their food
products. Quality control measures, adherence to regulations, and rigorous testing
processes are vital to ensure that their products meet the highest standards. Prioritizing
quality helps build trust with consumers and establishes a positive reputation for the
brand.
Digital Transformation: Like many companies, Nestlé may continue to embrace digital
technologies to enhance their operations, supply chain, and customer engagement. This
could involve leveraging data analytics, e-commerce, personalized marketing, and using
emerging technologies like artificial intelligence and blockchain for traceability and
transparency.
Expansion in Emerging Markets: Nestlé has a strong presence in both developed and
emerging markets. They may explore further expansion opportunities in regions with
growing middle-class populations and increasing consumer purchasing power, such as
Asia, Africa, and Latin America.
Thank you for watching