Department of Computer Engineering
Academic Year 2025-26
Name of the Student: Palak Boricha
Student ID: 22102193
Class and Batch: B.E. A-3
Roll no: 18
Diet Recommendation using Random Forest.
INTRODUCTION
Maintaining a balanced diet and meeting personal nutritional goals is a critical aspect of a
healthy lifestyle. With the growing diversity of recipes and food products available, selecting
meals that align with individual dietary requirements can be challenging. Personalized diet
recommendation systems aim to address this by suggesting recipes tailored to a user’s weight,
height, age, gender, activity level, and specific dietary goals such as weight loss, gain, or
maintenance.
The nutritional content of recipes, including calories, protein, fat, carbohydrates, fiber, and
other key nutrients, provides a foundational basis for modeling user preferences. While
advanced techniques like deep learning can capture complex patterns in nutritional data, they
often require extensive datasets and may be prone to overfitting. In many real-world dietary
scenarios, a combination of interpretable machine learning models and similarity-based
recommendations can effectively match users with suitable recipes.
This project develops a Random Forest-based diet recommendation system that evaluates
recipes using nutritional profiles and healthiness scores. By combining machine learning
regression with nearest-neighbor similarity matching, the system provides personalized
recipe suggestions that satisfy nutritional limits and user ingredient preferences. The goal is
to deliver recommendations that are both accurate and practical, supporting users in
achieving their dietary objectives.
BACKGROUND
Personalized diet planning is a complex challenge in today’s health-conscious society, where
individuals have diverse nutritional needs and lifestyle goals. Selecting meals that meet
personal requirements—such as weight management, muscle gain, or maintaining overall
health—can be overwhelming due to the vast number of available recipes and variations in
nutritional content. Accurate diet recommendations are therefore not just a convenience but
a necessity for individuals aiming to make informed dietary choices, nutritionists creating
meal plans, and health-focused applications seeking to provide value to users.
Traditionally, diet planning relies on either general nutritional guidelines or manual
consultation with dietitians. While these methods offer a reliable framework, they can be
time-consuming, subjective, and often fail to adapt to personal preferences or specific dietary
goals in real-time. On the other hand, rule-based recommendation systems can suggest meals
based on fixed criteria, but they may not account for complex interactions between different
nutrients, user profiles, and ingredient preferences.
Machine learning provides a data-driven approach to overcome these limitations. By
analyzing large datasets of recipes and their nutritional values, machine learning models can
uncover patterns and relationships between nutrients and user-specific dietary requirements
that might be difficult to identify manually. This project leverages Random Forest
regression combined with nearest-neighbor similarity matching to develop a personalized
diet recommendation system. The goal is to create a tool that delivers accurate, adaptable,
and interpretable recipe suggestions tailored to individual health profiles and ingredient
preferences, thereby supporting users in achieving their dietary objectives effectively.
METHODOLOGY/APPROACH
• This project uses the recipes.csv dataset, which contains detailed information on
various recipes, including their nutritional content and ingredients. The dataset
includes columns such as RecipeId, Name, CookTime, PrepTime, TotalTime,
RecipeIngredientParts, Calories, FatContent, SaturatedFatContent,
CholesterolContent, SodiumContent, CarbohydrateContent, FiberContent,
SugarContent, ProteinContent, and RecipeInstructions. These features provide a
comprehensive profile for each recipe, enabling personalized diet recommendations.
• Data Preprocessing
• Data Cleaning: The dataset was cleaned to handle missing or inconsistent values,
especially in the nutritional columns. Non-numeric entries in nutrient fields were
removed or converted to appropriate numerical formats.
• Feature Selection: Nutritional content columns (Calories, FatContent,
SaturatedFatContent, CholesterolContent, SodiumContent, CarbohydrateContent,
FiberContent, SugarContent, ProteinContent) were selected as features to evaluate
recipe healthiness and match user-specific dietary requirements. Additional features
such as CookTime and PrepTime were retained for informational purposes.
• Healthiness Scoring
• A Health Score was calculated for each recipe to quantify its nutritional balance. This
score normalizes each nutrient against recommended limits and combines them into
a single metric, allowing the system to filter out recipes that are less suitable for a
user’s dietary needs.
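The cleaning and scoring steps above could look roughly like the following sketch. The report does not state the exact Health Score formula or the recommended limits, so `ASSUMED_LIMITS`, `BENEFICIAL`, and the `health_score` function are illustrative assumptions, not the project's actual implementation. The sample rows are made up.

```python
import pandas as pd

NUTRIENT_COLS = [
    "Calories", "FatContent", "SaturatedFatContent", "CholesterolContent",
    "SodiumContent", "CarbohydrateContent", "FiberContent", "SugarContent",
    "ProteinContent",
]

# Hypothetical per-recipe reference limits, assumed for illustration only.
ASSUMED_LIMITS = {
    "Calories": 700.0, "FatContent": 25.0, "SaturatedFatContent": 7.0,
    "CholesterolContent": 100.0, "SodiumContent": 800.0,
    "CarbohydrateContent": 100.0, "FiberContent": 10.0,
    "SugarContent": 15.0, "ProteinContent": 30.0,
}

# Assumption: fiber and protein raise the score; all other nutrients lower it.
BENEFICIAL = {"FiberContent", "ProteinContent"}


def clean_nutrients(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce nutrient columns to numbers and drop rows that cannot be parsed."""
    out = df.copy()
    for col in NUTRIENT_COLS:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out.dropna(subset=NUTRIENT_COLS).reset_index(drop=True)


def health_score(df: pd.DataFrame) -> pd.Series:
    """Assumed score in [0, 100]: each nutrient is divided by its limit,
    capped at 1, and averaged (inverted for nutrients that should be limited)."""
    parts = {}
    for col in NUTRIENT_COLS:
        ratio = (df[col] / ASSUMED_LIMITS[col]).clip(lower=0, upper=1)
        parts[col] = ratio if col in BENEFICIAL else 1 - ratio
    return 100 * pd.DataFrame(parts).mean(axis=1)


# Made-up sample rows; the third has a non-numeric calorie entry.
raw = pd.DataFrame({
    "Name": ["Salad", "Burger", "Mystery dish"],
    "Calories": [200, 900, "n/a"],
    "FatContent": [5, 50, 10],
    "SaturatedFatContent": [1, 20, 3],
    "CholesterolContent": [0, 150, 20],
    "SodiumContent": [200, 1500, 300],
    "CarbohydrateContent": [20, 60, 30],
    "FiberContent": [8, 2, 4],
    "SugarContent": [5, 12, 6],
    "ProteinContent": [10, 40, 15],
})
recipes = clean_nutrients(raw)
recipes["HealthScore"] = health_score(recipes)
print(recipes[["Name", "HealthScore"]])
```

Capping each ratio at 1 keeps a single extreme nutrient from dominating the score; a different weighting scheme would be equally compatible with the description above.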
Model Training and Evaluation
To evaluate the model’s performance, the recipe dataset was split randomly into a training
set (80%) and a testing set (20%). This ensures that the Random Forest model is trained on
a majority of recipes and validated on unseen recipes to assess its ability to predict healthiness
scores accurately.
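A minimal sketch of the 80/20 split and Random Forest training, using scikit-learn. The feature matrix and target here are synthetic stand-ins (the real inputs would be the nutritional columns and the Health Score), and the hyperparameters `n_estimators=100` and `random_state=42` are assumptions, since the report does not list the values used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the nine nutritional feature columns (assumed data).
X = rng.uniform(0, 1, size=(500, 9))
# Synthetic stand-in for the Health Score target (assumed relation).
y = 100 * (1 - X.mean(axis=1))

# Random 80/20 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the regressor on the training portion only.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on recipes the model has not seen.
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(f"train={len(X_train)} test={len(X_test)} MSE={test_mse:.3f}")
```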
• Random Forest Regression: A Random Forest model was trained using the
nutritional features as inputs and the Health Score as the target. This model predicts
the healthiness of recipes and helps in recommending high-quality options.
• Personalized Nutrient Adjustment: User-specific daily calorie needs were
calculated based on weight, height, age, gender, activity level, and dietary goal (loss,
gain, maintain). Nutrient limits were scaled proportionally to these calorie
requirements.
• Nearest-Neighbor Matching: Recipes were filtered according to user nutrient limits
and ingredient preferences. A nearest-neighbor algorithm was applied using cosine
similarity to identify recipes closest to the user’s nutritional profile.
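The calorie calculation and proportional scaling of nutrient limits could be sketched as below. The report only says that BMR and activity level are used, so the choice of the Mifflin-St Jeor equation, the activity multipliers, the goal factors in `GOAL_FACTORS`, and the 2000 kcal reference limits are all assumptions made for illustration.

```python
# Commonly used activity multipliers (assumed; the report does not list its values).
ACTIVITY_FACTORS = {
    "sedentary": 1.2, "light": 1.375, "moderate": 1.55,
    "active": 1.725, "very_active": 1.9,
}

# Hypothetical calorie adjustments for each dietary goal.
GOAL_FACTORS = {"loss": 0.8, "maintain": 1.0, "gain": 1.2}

# Hypothetical daily limits for a 2000 kcal reference diet.
REFERENCE_CALORIES = 2000.0
REFERENCE_LIMITS = {"FatContent": 70.0, "SodiumContent": 2300.0, "SugarContent": 50.0}


def bmr_mifflin_st_jeor(weight_kg: float, height_cm: float, age: int, gender: str) -> float:
    """Basal metabolic rate in kcal/day using the Mifflin-St Jeor equation."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age
    if gender == "male":
        return base + 5
    if gender == "female":
        return base - 161
    raise ValueError(f"unsupported gender: {gender!r}")


def daily_calories(weight_kg, height_cm, age, gender, activity, goal) -> float:
    """BMR scaled by activity level and adjusted for the dietary goal."""
    bmr = bmr_mifflin_st_jeor(weight_kg, height_cm, age, gender)
    return bmr * ACTIVITY_FACTORS[activity] * GOAL_FACTORS[goal]


def scaled_limits(calories: float) -> dict:
    """Scale the reference nutrient limits in proportion to calorie needs."""
    factor = calories / REFERENCE_CALORIES
    return {name: limit * factor for name, limit in REFERENCE_LIMITS.items()}


calories = daily_calories(70, 175, 25, "male", "moderate", "maintain")
limits = scaled_limits(calories)
print(round(calories, 2), {k: round(v, 1) for k, v in limits.items()})
```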
Pipeline Overview
1. Calculate user daily calories using BMR and activity level.
2. Adjust nutritional limits based on user calories.
3. Filter recipes by nutrient limits and ingredient preferences.
4. Predict recipe healthiness using the trained Random Forest model and filter for high-
scoring recipes.
5. Use nearest-neighbor similarity matching to recommend the top recipes most aligned
with the user’s dietary profile.
This methodology ensures that the recommendations are personalized, nutritionally
appropriate, and interpretable, leveraging both statistical learning and similarity-based
techniques for optimal results.
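Steps 3 and 5 of the pipeline (filtering, then cosine-similarity matching) can be illustrated with a brute-force NumPy sketch. The function name `recommend`, the three-nutrient feature layout, and the sample recipes are hypothetical; the Random Forest filtering in step 4 is omitted here for brevity.

```python
import numpy as np


def recommend(names, nutrients, ingredients, limits, wanted, target, k=3):
    """Return up to k (name, similarity) pairs, most similar first.

    nutrients:   (n, d) array of per-recipe nutrient values
    ingredients: list of n ingredient sets
    limits:      (d,) upper bound per nutrient
    wanted:      ingredients the user asked for (at least one must match)
    target:      (d,) nutrient profile the user is aiming for
    """
    nutrients = np.asarray(nutrients, dtype=float)
    target = np.asarray(target, dtype=float)

    # Step 3: keep recipes within every nutrient limit that share an ingredient.
    within = np.all(nutrients <= np.asarray(limits, dtype=float), axis=1)
    matches = np.array([bool(wanted & ing) for ing in ingredients])
    idx = np.flatnonzero(within & matches)
    if idx.size == 0:
        return []

    # Step 5: cosine similarity between each candidate and the target profile.
    cand = nutrients[idx]
    sims = cand @ target / (np.linalg.norm(cand, axis=1) * np.linalg.norm(target))
    order = np.argsort(-sims)[:k]
    return [(names[idx[i]], float(sims[i])) for i in order]


# Hypothetical recipes with columns [Calories, FatContent, ProteinContent].
names = ["Oat bowl", "Grilled chicken", "Fried chicken", "Chicken salad"]
nutrients = [[300, 8, 12], [400, 10, 40], [900, 55, 35], [350, 12, 30]]
ingredients = [{"oats", "milk"}, {"chicken", "olive oil"},
               {"chicken", "flour"}, {"chicken", "lettuce"}]

top = recommend(names, nutrients, ingredients,
                limits=[600, 30, 60], wanted={"chicken"}, target=[500, 15, 45])
print(top)
```

Because calories are numerically much larger than the other nutrients, they dominate the cosine similarity on raw values; standardizing the features before matching is a common way to balance their influence.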
Model performance was quantified using the Mean Squared Error (MSE) between predicted and
actual Health Scores on the test set.
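For reference, MSE and the related regression metrics reported in the results (MAE, RMSE, R²) follow directly from the prediction errors. The sketch below uses only the standard library; the function name `regression_metrics` and the sample values are illustrative, not taken from the project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)           # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variance
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": 1 - sse / ss_tot}


# Made-up Health Scores and predictions.
metrics = regression_metrics([50, 60, 70, 80], [52, 59, 69, 83])
print(metrics)
```

RMSE is the square root of MSE, so the two always move together; R² compares the squared error against that of always predicting the mean.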
Duration Of Study:
Month 1 (July 7 – August 6, 2025)
● Week 1–2: Literature review, paper analysis, dataset acquisition
● Week 3–4: Data preprocessing, exploratory analysis, feature engineering
Month 2 (August 7 – September 6, 2025)
● Week 5–6: Model implementation (random forest training)
● Week 7–8: Model tuning and re-evaluation
Month 3 (September 7 – October 7, 2025)
● Week 9–10: Evaluation, visualization, residual analysis
● Week 11–12: Deployment, paper/report writing.
Gantt Chart
Gantt Chart – Diet Recommendation Project Timeline
RESULTS AND FINDINGS
The Random Forest model was evaluated on the held-out test set. Because the task is
regression, performance is reported using R², MAE, RMSE, and MSE:
Random Forest
R² Score: 0.9975
MAE: 0.4817
RMSE: 0.8034
MSE: 0.6455124180826782
After training and evaluation, the results were compiled into a comparative table:
Algorithm        MAE       RMSE         MAPE       R2-Score
Random Forest    603.42    630018.53    793.74%    -1.2302
DISCUSSIONS
• The evaluation of the Random Forest model provides key insights into its
effectiveness for personalized diet recommendations.
• Random Forest Strengths: The Random Forest model proved highly suitable for
predicting recipe healthiness, demonstrating its ability to capture non-linear
relationships between multiple nutritional features and overall healthiness. Its
ensemble nature reduces overfitting compared to a single decision tree, providing
robust and reliable predictions across diverse recipes. The model’s predictions,
evaluated using Mean Squared Error (MSE), showed that it can effectively
differentiate healthier recipes from less suitable options based on user-specific
nutritional limits.
• Nearest-Neighbor Matching: The combination of Random Forest with nearest-
neighbor similarity matching allowed the system to not only predict healthiness but
also recommend recipes that closely match a user’s dietary profile. This similarity-
based filtering ensures that recommended recipes are aligned with user preferences
and ingredient choices, enhancing personalization beyond simple nutritional
thresholds.
• Practical Implications for Diet Recommendation: This project demonstrates that
an interpretable machine learning model like Random Forest can effectively support
personalized diet planning. By leveraging both predicted health scores and nutrient-
based similarity, the system offers actionable recommendations that users can trust.
The approach highlights the balance between predictive accuracy and practical
interpretability, making it a valuable tool for individuals and health-focused
applications.
• Limitations and Future Work: While effective, the model’s performance depends
on the completeness and accuracy of the nutritional data. Recipes with missing or
inconsistent nutrient values may affect recommendation quality. Future
enhancements could include integrating more sophisticated models, such as gradient
boosting or deep learning, incorporating user feedback loops, and expanding the
dataset to include a wider variety of ingredients and cuisines for more personalized
and diverse recommendations.
CONCLUSION
This project successfully developed and validated a Random Forest-based diet
recommendation system, demonstrating its effectiveness in providing personalized recipe
suggestions based on user-specific nutritional profiles. By leveraging key nutritional
features such as calories, protein, fat, carbohydrates, fiber, sugar, and other essential
nutrients, the model accurately predicts the healthiness of recipes, enabling the system to
filter and recommend options that align with individual dietary goals.
The key takeaway of this work is the importance of combining machine learning
prediction with similarity-based recommendation to balance predictive accuracy and
personalization. The Random Forest model proved robust in handling the non-linear
relationships between various nutrients, while the nearest-neighbor approach ensured that
recommendations closely match user preferences and ingredient choices. This combination
results in a system that is both practical and interpretable, offering actionable guidance for
users aiming to achieve weight loss, gain, or maintenance goals.
For future work, the system could be further enhanced by incorporating additional factors
such as user feedback, dietary restrictions, cuisine preferences, or more advanced ensemble
methods. Expanding the dataset to include a wider variety of recipes and integrating
adaptive learning based on user interactions could improve personalization and the overall
effectiveness of the recommendations.
REFERENCES
1. Chibber, Y., Betala, D., & Srividhya, S. (2023). Diet Recommendation System
Using Machine Learning. In International Conference on Recent Trends in Data
Science and its Applications. River Publishers.
2. Nagati, S., Chaitanya, J., Rani, B. S., & Rohan, R. (2024). Diet Recommendation
Systems Using Machine Learning: Approaches, Challenges, and Future Directions.
International Research Journal of Education and Technology, 6(12), 1872–1880.
3. Ajami, A., & Teimourpour, B. (2023). A Food Recommender System in Academic
Environments Based on Machine Learning Models. arXiv.
4. Roy, M., Das, S., & Protity, A. T. (2023). OBESEYE: Interpretable Diet
Recommender for Obesity Management using Machine Learning and Explainable
AI. arXiv.
5. Garg, S., & Pundir, P. (2021). MOFit: A Framework to Reduce Obesity using
Machine Learning and IoT. arXiv.