This repository is the official implementation of our paper: CARE: Aligning Language Models for Regional Cultural Awareness.
You can download the CARE resource at https://huggingface.co/datasets/geyang627/CARE, which includes multilingual responses with human preferences on culture-specific questions.
Specifically, the `question` field is the culture-specific question; the `response` field contains responses generated by LLMs (e.g., gpt-4o) or written by humans; the `culture_type` field gives the cultural context category; the `associated_culture` field gives the associated culture; and the `rating` field contains the human rating on a 10-point scale. For a detailed description of how CARE was constructed, please refer to our paper.
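As an illustration, a single record can be represented and filtered as below. The field names follow the description above, but the example values and the `is_high_quality` helper are hypothetical, not taken from the dataset:

```python
# A hypothetical CARE record illustrating the fields described above
# (example values for illustration only).
record = {
    "question": "Why is the number 4 considered unlucky in China?",
    "response": "In Chinese, 'four' sounds similar to the word for 'death'...",
    "culture_type": "social norms",
    "associated_culture": "Chinese",
    "rating": 8,  # human rating on a 10-point scale
}

def is_high_quality(rec, threshold=7):
    """Keep records whose human rating meets the threshold."""
    return rec["rating"] >= threshold

print(is_high_quality(record))  # True, since 8 >= 7
```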
We have released the culturally aligned models trained with CARE in CARE_collection. You can use them directly as follows.
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import torch

model = LLM(
    model="geyang627/care-chinese-qwen2.5-7b",
    tensor_parallel_size=torch.cuda.device_count(),
    dtype="auto",
    trust_remote_code=True,
    max_model_len=2048,
)
tokenizer = AutoTokenizer.from_pretrained(
    "geyang627/care-chinese-qwen2.5-7b", use_fast=False, trust_remote_code=True
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

sampling_params = SamplingParams(temperature=0.7, top_p=1.0, max_tokens=256)
# Prompt: "Why do Chinese people dislike the number 4?"
outputs = model.generate(["为什么中国人不喜欢数字4?"], sampling_params)
print(outputs[0].outputs[0].text)
```
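The snippet above passes a raw string to `generate`. Since Qwen2.5 chat models expect ChatML-style turns, you may get better results by formatting the prompt first; a minimal sketch of that formatting (the `build_chatml_prompt` helper is illustrative, not part of this repository):

```python
def build_chatml_prompt(user_message: str) -> str:
    """Wrap a user message in ChatML-style turns as used by Qwen2.5 chat models.
    In practice, prefer tokenizer.apply_chat_template, which reads the template
    shipped with the model checkpoint."""
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("为什么中国人不喜欢数字4?")
# With a live model: outputs = model.generate([prompt], sampling_params)
```

With a loaded tokenizer, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` produces the model's template more robustly than hand-built strings.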
To evaluate a model's cultural awareness with CARE, you can use our test set at geyang627/CARE-eval together with the prompts in the `prompts` directory.
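Since CARE ratings are on a 10-point scale, one simple way to compare models on the test set is to average the ratings assigned to each model's responses. A minimal sketch, with made-up scores for illustration:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (model, rating) pairs, e.g. collected by scoring test-set
# responses with the prompts in the `prompts` directory.
scored = [
    ("care-chinese-qwen2.5-7b", 8),
    ("care-chinese-qwen2.5-7b", 7),
    ("baseline-qwen2.5-7b", 5),
    ("baseline-qwen2.5-7b", 6),
]

by_model = defaultdict(list)
for model_name, rating in scored:
    by_model[model_name].append(rating)

# Average rating per model on the 10-point scale
averages = {m: mean(rs) for m, rs in by_model.items()}
print(averages)
```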
The Sony team was not involved in the collection of the Chinese and Arabic data. Please cite the following paper if you find our code or data helpful.
```bibtex
@article{guo2025care,
  title={CARE: Aligning Language Models for Regional Cultural Awareness},
  author={Guo, Geyang and Naous, Tarek and Wakaki, Hiromi and Nishimura, Yukiko and Mitsufuji, Yuki and Ritter, Alan and Xu, Wei},
  journal={arXiv preprint arXiv:2504.05154},
  year={2025}
}
```
