This repository contains the code and resources to reproduce the sustainability-aware food recommendation system presented in our research. The system combines large language models (LLMs) for ingredient labeling with knowledge graph-enhanced recommendation algorithms to provide environmentally conscious food recommendations.
The complete dataset and pre-processed files are available on Zenodo:
The Zenodo release includes:
- pp_recipes_with_cf_wf.csv: HUMMUS dataset augmented with Carbon Footprint (CF) and Water Footprint (WF) values aggregated at the recipe level
- greenfoodlens_mturk_labels.csv: Ground truth and LLM-generated labels for ingredient taxonomy classification
- labeled_ingredients_Llama-3.1-Nemotron-70B-Instruct-HF-Q4_K_M.csv: LLM-generated ingredient labels using the Llama 3.1 Nemotron 70B model
- labeled_ingredients_Athene-V2-Chat-Q4_K_M.csv: LLM-generated ingredient labels using the Athene V2 Chat model
- revised_su-eatable-life_cf_wf.csv: Revised SU-EATABLE-LIFE food taxonomy with CF and WF values for each taxonomy path (not only food items)
To streamline your workflow, we recommend downloading the pre-processed data from Zenodo to avoid lengthy preprocessing steps.
Some files referenced in the pipeline are not included in this repository or Zenodo and need to be downloaded separately:
- pp_recipes.csv: Original HUMMUS recipe dataset
  - Download from: HUMMUS GitLab Repository
The SU-EATABLE-LIFE database is provided as an Excel file containing the food taxonomy and the associated Carbon Footprint (CF) and Water Footprint (WF) values. Two of its sheets, "CF for users" and "WF for users", must be exported as tab-separated (TSV) files for use in the pipeline. Despite the tab separator, the exported files keep the .csv extension and should be renamed as follows:
- SuEatableLife_Food_Fooprint_database_CF.csv: Tab-separated export of the "CF for users" sheet
- SuEatableLife_Food_Fooprint_database_WF.csv: Tab-separated export of the "WF for users" sheet
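The export can also be scripted rather than done by hand in a spreadsheet application. The following is a minimal sketch, assuming pandas and openpyxl are installed (neither is stated as a dependency of this repository) and that each sheet has its header in the first row. The function name `export_sheets` and the workbook path passed to it are hypothetical.

```python
from pathlib import Path

# Sheet names and target file names as given in this README.
SHEET_TO_FILE = {
    "CF for users": "SuEatableLife_Food_Fooprint_database_CF.csv",
    "WF for users": "SuEatableLife_Food_Fooprint_database_WF.csv",
}


def export_sheets(xlsx_path, out_dir="."):
    """Write each required sheet of the workbook as a tab-separated file."""
    # Imported lazily: pandas (with openpyxl) is an assumption, not a listed dependency.
    import pandas as pd

    out_dir = Path(out_dir)
    written = []
    for sheet, filename in SHEET_TO_FILE.items():
        frame = pd.read_excel(xlsx_path, sheet_name=sheet)
        target = out_dir / filename
        frame.to_csv(target, sep="\t", index=False)
        written.append(target)
    return written
```

If the sheets contain title rows above the header, the `skiprows` argument of `pd.read_excel` would need to be set accordingly.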
- Llama 3.1 Nemotron 70B Q4_K_M GGUF model: Q4_K_M quantized version provided by bartowski, available at https://huggingface.co/bartowski/Llama-3.1-Nemotron-70B-Instruct-HF-GGUF
- Athene V2 Chat Q4_K_M GGUF model: Q4_K_M quantized version provided by bartowski, available at https://huggingface.co/bartowski/Athene-V2-Chat-GGUF
With the downloaded files from Zenodo and required external files, the repository and data should be structured as follows:
    PHASEIngredientLabeling/
    ├── zenodo_data/
    │   ├── greenfoodlens_mturk_labels.csv
    │   ├── labeled_ingredients_Llama-3.1-Nemotron-70B-Instruct-HF-Q4_K_M.csv
    │   ├── labeled_ingredients_Athene-V2-Chat-Q4_K_M.csv
    │   ├── pp_recipes_with_cf_wf.csv
    │   └── revised_su-eatable-life_cf_wf.csv
    ├── src/
    │   ├── llama_cpp_grammar_ingredient_labeling.py  # LLM labeling script
    │   ├── evaluate_llm_labeling.py                  # Label evaluation
    │   ├── labeling_analysis.ipynb                   # Analysis notebook
    │   ├── semantic_matching_eda.py                  # Semantic baseline
    │   ├── prompt_templates_guidance.py              # LLM prompts
    │   └── utils.py                                  # Utility functions
    ├── test_model_sustainability.py                  # Sustainability analysis
    ├── experiment_config.yaml                        # RecBole configuration
    ├── revised_su-eatable-life_taxonomy.json         # Food taxonomy
    ├── revised_su_eatable_life.pdf                   # Taxonomy visualization
    ├── ingredient_food_kg_names.csv                  # Unique Food KG ingredients
    ├── CSV_cfp_wfp_ingredients_2.0.csv               # CF and WF for each taxonomy food item (not path)
    ├── SuEatableLife_Food_Fooprint_database_CF.csv   # CF values from SU-EATABLE-LIFE
    ├── SuEatableLife_Food_Fooprint_database_WF.csv   # WF values from SU-EATABLE-LIFE
    └── pyproject.toml                                # Project dependencies
GGUF files should be placed in a directory of your choice, and their paths should be specified in the scripts when running the LLM inference.
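Before running the pipeline, it can help to confirm that the expected files are in place. The following is a minimal sketch using only the standard library. The list covers a subset of the layout described in this README, and the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the layout described in this README; extend as needed.
EXPECTED_FILES = [
    "zenodo_data/greenfoodlens_mturk_labels.csv",
    "zenodo_data/pp_recipes_with_cf_wf.csv",
    "zenodo_data/revised_su-eatable-life_cf_wf.csv",
    "src/llama_cpp_grammar_ingredient_labeling.py",
    "experiment_config.yaml",
    "revised_su-eatable-life_taxonomy.json",
    "SuEatableLife_Food_Fooprint_database_CF.csv",
    "SuEatableLife_Food_Fooprint_database_WF.csv",
]


def missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root to list anything that still needs downloading.
    for rel in missing_files("."):
        print(f"missing: {rel}")
```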
- Python 3.8 or higher
- uv package manager (recommended) or pip
- For LLM Inference: GPU with at least 40GB VRAM recommended
- Install uv if you haven't already:

    curl -LsSf https://astral.sh/uv/install.sh | sh

- Clone the repository:

    git clone https://github.com/yourusername/PHASEIngredientLabeling.git
    cd PHASEIngredientLabeling

- Create a virtual environment and install dependencies:

    uv venv
    source .venv/bin/activate # On Windows: .venv\Scripts\activate
    uv sync

LLM Inference Issues:
- Out of memory errors: Reduce --context_len or use a smaller model
- Slow inference: Ensure GPU support is properly configured for llama-cpp-python
Missing Files:
- Check that all required external files are downloaded and placed in the correct directories
- Verify file paths in configuration files match your actual file locations
- Use the --eval_batch_size parameter to balance memory usage and speed
- Consider using more aggressively quantized models (e.g., Q4_K_S) for faster inference with minimal quality loss
Follow these steps to reproduce the complete pipeline from ingredient labeling to sustainability-aware recommendation:
Generate taxonomy labels for ingredients using constrained LLM generation:
    python src/llama_cpp_grammar_ingredient_labeling.py \
        /path/to/your/model.gguf \
        v1 \
        --truth_labels_file zenodo_data/greenfoodlens_mturk_labels.csv \
        --context_len 12000 \
        --temperature 0.0 \
        --validation_split 0.5

Arguments:
- gguf_path: Path to the LLM GGUF model file
- version_tag: Version identifier (format: vX, where X is an integer)
- --truth_labels_file: Path to ground truth labels file (default: zenodo_data/greenfoodlens_mturk_labels.csv)
- --context_len: Model context length (default: 0 for auto)
- --temperature: Sampling temperature (default: 0.0 for deterministic output)
- --top-p: Top-p sampling parameter (default: 1.0)
- --top-k: Top-k sampling parameter (default: 1.0)
- --split_grammar_chars: Split grammar choices into individual characters (default: False)
- --use_all_ingredients: Label all ingredients instead of just validation/test splits
- --validation_split: Fraction for validation split (default: 0.5)
- --gpu_id: GPU ID to use for inference
This script uses the revised_su-eatable-life_taxonomy.json to generate constrained grammars that ensure LLM outputs conform to valid taxonomy paths.
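To illustrate the idea of grammar-constrained generation, the sketch below turns a taxonomy into a GBNF grammar (the grammar format used by llama.cpp) whose only valid outputs are taxonomy paths. It is not the script's actual implementation. It assumes the taxonomy is a nested dictionary mapping each node name to its children, and that a path is rendered by joining node names with " > ". Both are assumptions about the file format, and the function names are hypothetical.

```python
def taxonomy_paths(node, prefix=()):
    """Yield every root-to-node path of a nested-dict taxonomy, inner nodes included."""
    for name, children in node.items():
        path = prefix + (name,)
        yield path
        if children:
            yield from taxonomy_paths(children, path)


def gbnf_literal(text):
    """Quote a string as a GBNF literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_grammar(taxonomy, sep=" > "):
    """Build a grammar whose root rule is an alternation over all valid paths."""
    alternatives = [gbnf_literal(sep.join(path)) for path in taxonomy_paths(taxonomy)]
    return "root ::= " + " | ".join(alternatives)


if __name__ == "__main__":
    # Toy taxonomy for illustration only, not taken from the revised taxonomy file.
    toy = {"Plant-based": {"Legumes": {}}, "Animal-based": {}}
    print(build_grammar(toy))
```

With llama-cpp-python, a grammar string of this kind can be compiled with `LlamaGrammar.from_string` and passed to the model at generation time, so that sampling is restricted to the listed alternatives.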
Evaluate the quality of generated labels against ground truth:
    python src/evaluate_llm_labeling.py \
        labeled_ingredients_model1.csv labeled_ingredients_model2.csv \
        --truth_labels_file zenodo_data/greenfoodlens_mturk_labels.csv

Arguments:
- llm_labeled_ingredients_files: One or more paths to LLM-generated label files (e.g., zenodo_data/labeled_ingredients_Athene-V2-Chat-Q4_K_M.csv)
- --truth_labels_file: Path to ground truth labels (default: zenodo_data/greenfoodlens_mturk_labels.csv)
This script computes accuracy metrics including:
- Perfect matches
- Hierarchical matches with different levels of granularity
- Head-level and tail-cut matching strategies
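The sketch below shows one plausible reading of these matching strategies on taxonomy paths represented as lists of node names. The exact definitions used by evaluate_llm_labeling.py are not stated here, so the interpretations of "head-level" (compare the first levels only) and "tail-cut" (drop the last levels before comparing) are assumptions, as are all function names and the toy labels.

```python
def perfect_match(pred, truth):
    """True when the predicted path equals the ground-truth path exactly."""
    return pred == truth


def head_match(pred, truth, depth):
    """True when the first `depth` levels of both paths agree (assumed definition)."""
    return pred[:depth] == truth[:depth]


def tail_cut_match(pred, truth, cut):
    """True when the paths agree after dropping their last `cut` levels (assumed definition)."""
    return pred[: max(len(pred) - cut, 0)] == truth[: max(len(truth) - cut, 0)]


def accuracy(pairs, matcher):
    """Fraction of (prediction, truth) pairs accepted by `matcher`."""
    return sum(matcher(pred, truth) for pred, truth in pairs) / len(pairs)


# Toy labels for illustration only.
truth = ["Plant", "Legumes", "Chickpeas"]
pairs = [
    (["Plant", "Legumes", "Chickpeas"], truth),
    (["Plant", "Legumes", "Lentils"], truth),
    (["Animal", "Meat", "Beef"], truth),
]

if __name__ == "__main__":
    print(accuracy(pairs, perfect_match))
    print(accuracy(pairs, lambda p, t: head_match(p, t, depth=2)))
    print(accuracy(pairs, lambda p, t: tail_cut_match(p, t, cut=1)))
```

Relaxed matchers of this kind give partial credit to predictions that land in the right branch of the taxonomy but miss the exact leaf.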
Open and run the Jupyter notebook to perform the full analysis, replicate the paper figures, and generate the final files (e.g., recipes_with_cf_wf.csv):

    jupyter notebook src/labeling_analysis.ipynb

This notebook relies on several files:
- ground truth labels (greenfoodlens_mturk_labels.csv)
- LLM-generated labels (e.g., labeled_ingredients_Athene-V2-Chat-Q4_K_M.csv)
- HUMMUS recipes (pp_recipes.csv), which can be downloaded from the HUMMUS repository
- other files derived from light transformations of our revised taxonomy and the SU-EATABLE-LIFE Excel database:
  - CSV_cfp_wfp_ingredients_2.0.csv (included in this repo): CF and WF values for each food item (last level) of the revised taxonomy
  - SuEatableLife_Food_Fooprint_database_CF.csv: tab-separated export of the "CF for users" sheet of the SU-EATABLE-LIFE Excel database
  - SuEatableLife_Food_Fooprint_database_WF.csv: tab-separated export of the "WF for users" sheet of the SU-EATABLE-LIFE Excel database
Before running sustainability analysis, train recommendation models using RecBole.
The HUMMUS dataset with KG prepared for RecBole is available as a zip archive on Google Drive. Extract it to your working directory, which will create a recbole_data folder containing the hummus folder with the dataset files.
The models must be trained with the configuration specified in experiment_config.yaml. Its data_path option points to the recbole_data/ directory, which RecBole joins with the dataset name to locate the dataset files.
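The path resolution described above can be checked with a short script. This is a sketch under stated assumptions: RecBole looks for atomic files named after the dataset inside `data_path/dataset`, and a knowledge-aware model needs at least the `.inter`, `.kg`, and `.link` files. Whether the archive ships additional files is not stated here, and the helper names are hypothetical.

```python
from pathlib import Path

# Atomic file suffixes that RecBole's knowledge-aware models read.
ATOMIC_SUFFIXES = (".inter", ".kg", ".link")


def dataset_files(data_path, dataset):
    """Map each atomic suffix to the file RecBole would look for."""
    dataset_dir = Path(data_path) / dataset
    return {suffix: dataset_dir / f"{dataset}{suffix}" for suffix in ATOMIC_SUFFIXES}


def report(data_path="recbole_data", dataset="hummus"):
    """Print whether each expected dataset file is present."""
    for path in dataset_files(data_path, dataset).values():
        status = "found" if path.is_file() else "missing"
        print(f"{path}: {status}")


if __name__ == "__main__":
    report()
```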
Ensure you have RecBole installed and configured. It can be installed via pip; the package is published on PyPI as recbole.
    # Example training command (adjust based on your RecBole setup)
    uv run run_recbole.py --model=KGAT --dataset=hummus --config_files=experiment_config.yaml
    uv run run_recbole.py --model=MultiVAE --dataset=hummus --config_files=experiment_config.yaml

For more information on training RecBole models, refer to the RecBole documentation.
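To train several models in sequence, the command can be generated in a loop. The sketch below only assembles and optionally runs the same command line shown above. The model list is taken from the hyperparameter table in this README, while the helper names and the `dry_run` switch are hypothetical.

```python
import subprocess

# Models listed in the hyperparameter table of this README.
MODELS = ["Pop", "BPR", "DiffRec", "LightGCN", "KGAT", "MultiVAE"]


def training_command(model, dataset="hummus", config="experiment_config.yaml"):
    """Build the run_recbole.py command line for one model."""
    return [
        "uv", "run", "run_recbole.py",
        f"--model={model}",
        f"--dataset={dataset}",
        f"--config_files={config}",
    ]


def train_all(models=MODELS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for model in models:
        command = training_command(model)
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    train_all(dry_run=True)
```

Keeping `dry_run=True` first is a cheap way to confirm the commands look right before starting long training runs.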
Analyze the sustainability performance of trained recommendation models:
    python test_model_sustainability.py \
        /path/to/model1.pth /path/to/model2.pth \
        --recipes_with_cf_wf zenodo_data/recipes_with_cf_wf.csv \
        --plots_path plots \
        --eval_batch_size 50000 \
        --CF_WF_per_serving_size

Arguments:
- model_files: Paths to pre-trained RecBole model files (.pth)
- --recipes_with_cf_wf: Path to recipes with CF/WF data (default: zenodo_data/recipes_with_cf_wf.csv)
- --plots_path: Directory for saving plots (default: plots)
- --gpu_id: GPU ID for evaluation (default: "0")
- --eval_batch_size: Batch size for evaluation (default: 50,000)
- --skip_eval: Skip model evaluation if results exist
- --CF_WF_per_serving_size: Calculate CF/WF per serving size instead of per kg (default: False)
This script generates:
- Sustainability heatmaps showing CF/WF across recommendation positions
- Joint plots comparing different models' sustainability profiles
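As a rough illustration of what a position-wise view of sustainability involves, the sketch below averages a footprint value at each rank of the recommendation lists. It is not the logic of test_model_sustainability.py. The input shapes (ranked lists of recipe identifiers and a dictionary of footprint values), the function name, and the toy data are assumptions made for the example.

```python
def mean_footprint_by_position(recommendations, footprint):
    """Average the footprint of the recipes recommended at each rank position.

    recommendations: list of ranked recipe-id lists, one per user.
    footprint: dict mapping recipe id to a CF or WF value.
    Positions with no known footprint yield None.
    """
    sums, counts = [], []
    for ranked_items in recommendations:
        for position, item in enumerate(ranked_items):
            if item not in footprint:
                continue  # skip recipes without a footprint value
            if position >= len(sums):
                grow = position + 1 - len(sums)
                sums.extend([0.0] * grow)
                counts.extend([0] * grow)
            sums[position] += footprint[item]
            counts[position] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_recs = [["r1", "r2"], ["r2", "r3"]]
    toy_cf = {"r1": 1.0, "r2": 3.0, "r3": 5.0}
    print(mean_footprint_by_position(toy_recs, toy_cf))
```

A list of such per-position averages, computed once per model, is the kind of data a heatmap over recommendation positions could display.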
Reproduce the semantic matching baseline analysis (referenced in paper):
    python src/semantic_matching_eda.py

This script demonstrates the limitations of semantic similarity approaches for ingredient taxonomy matching, showing why structured LLM-based labeling is superior. Requires revised_food_taxonomy.json and ingredient_food_kg_names.csv for unique food KG ingredient names.
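The core of a semantic matching baseline is a nearest-neighbour lookup by cosine similarity between an ingredient embedding and the embeddings of the taxonomy labels. The sketch below shows that step with hand-written toy vectors. In practice the vectors would come from a sentence embedding model such as those in sentence-transformers. The function names and the vectors are hypothetical and do not reproduce the script.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_label(ingredient_vec, label_vecs):
    """Return the label whose embedding is most similar to the ingredient embedding."""
    return max(label_vecs, key=lambda label: cosine(ingredient_vec, label_vecs[label]))


if __name__ == "__main__":
    # Toy two-dimensional "embeddings" for illustration only.
    labels = {"Legumes": [1.0, 0.0], "Dairy": [0.0, 1.0]}
    print(nearest_label([0.9, 0.1], labels))
```

Because the lookup always returns some label, even for ingredients that fit none of them well, a baseline of this kind has no notion of taxonomy structure.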
- experiment_config.yaml: RecBole configuration for training and evaluating recommendation models. Includes custom metrics (Novelty) that extend the standard RecBole framework.
- revised_su-eatable-life_taxonomy.json: Hierarchical food taxonomy used for ingredient labeling, revised and validated for the sustainability domain.
- revised_su_eatable_life.pdf: Human-readable visualization of the taxonomy hierarchy.
Core dependencies include:
- polars: Fast DataFrame operations
- llama-cpp-python: LLM inference with grammar constraints
- sentence-transformers: Semantic similarity baseline
- recbole: Recommendation system framework
- torch: Deep learning backend
- matplotlib/seaborn: Visualization
To install llama-cpp-python with GPU support, please follow the instructions in the llama-cpp-python documentation.
See pyproject.toml for complete dependency list.
Citation
If you use this code or dataset in your research, please cite our paper:
    @inproceedings{greenfoodlens_recsys2025,
      title={GreenFoodLens: Sustainability-Aware Food Recommendation with LLM-Based Ingredient Labeling},
      author={Giacomo Balloccu and Ludovico Boratto and Gianni Fenu and Mirko Marras and Giacomo Medda and Giovanni Murgia},
      booktitle={Proceedings of the 19th {ACM} Conference on Recommender Systems, RecSys 2025, Prague, Czech Republic, September 22-26, 2025},
      year={2025}
    }

All models are trained for up to 100 epochs, with early stopping on validation NDCG@10 and a patience of 10 epochs. We optimized the hyperparameters based on the grid search tables suggested by RecBole for the models we employed, which are reported in RecBole Hyper-parameters Search Results. Specifically, we used the grids reported for MovieLens-1M. These do not cover DiffRec, for which we adopted a reduced set of the hyper-parameters proposed in the DiffRec paper. The full grid is reported here for reference:
| Model | Hyperparameter | Values |
|---|---|---|
| Pop | - | - |
| BPR | learning_rate | [5e-5,1e-4,5e-4,7e-4,1e-3,5e-3,7e-3] |
| DiffRec | embedding_size | [10] |
| | dims_dnn | ['[300]','[200,600]','[1000]'] |
| | learning_rate | [1e-5,1e-4,1e-3,1e-2] |
| | steps | [2,5,10,40,50,100] |
| LightGCN | n_layers | [1,2,3,4] |
| | learning_rate | [5e-4,1e-3,2e-3] |
| | reg_weight | [1e-5,1e-4,1e-3,1e-2] |
| KGAT | layers | ['[64,32,16]','[64,64,64]','[128,64,32]'] |
| | mess_dropout | [0.1,0.2,0.3,0.4,0.5] |
| | learning_rate | [1e-2,5e-3,1e-3,5e-4,1e-4] |
| | reg_weight | [1e-4,5e-5,1e-5,5e-6,1e-6] |
| MultiVAE | learning_rate | [5e-5,1e-4,5e-4,7e-4,1e-3,5e-3,7e-3] |
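To give a sense of the search space size, the sketch below expands the KGAT grid from the table into individual configurations with a Cartesian product. The values are copied from the table. The helper name `expand_grid` is hypothetical, and this is not the tuning code used for the experiments.

```python
from itertools import product

# KGAT grid copied from the table above.
KGAT_GRID = {
    "layers": ["[64,32,16]", "[64,64,64]", "[128,64,32]"],
    "mess_dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "learning_rate": [1e-2, 5e-3, 1e-3, 5e-4, 1e-4],
    "reg_weight": [1e-4, 5e-5, 1e-5, 5e-6, 1e-6],
}


def expand_grid(grid):
    """Return one dict per combination of hyperparameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = expand_grid(KGAT_GRID)
print(len(configs))  # 3 * 5 * 5 * 5 = 375 combinations
```

An exhaustive search over this grid therefore means 375 training runs for KGAT alone, which is why early stopping matters for keeping the total cost manageable.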
Figure: CF and WF scatterplot of test-set interactions and top-10 recommendations, with marginal density distributions on the sides.
This project is licensed under the MIT License - see the LICENSE file for details.
This work was supported by the project PHaSE - Promoting Healthy and Sustainable Eating through Interactive and Explainable AI Methods, funded by MUR under the PRIN 2022 program (CUP H53D23003530006).




