CARE

This repository is the official implementation of our paper: CARE: Aligning Language Models for Regional Cultural Awareness.

CARE Resource

You can download the CARE resource from https://huggingface.co/datasets/geyang627/CARE, which includes multilingual responses with human preference ratings on culture-specific questions.

Specifically, the question field holds the culture-specific question; the response field contains responses generated by an LLM (e.g., GPT-4o) or written by a human; the culture_type field gives the cultural context category; the associated_culture field lists the associated culture; and the rating field records the human rating on a 10-point scale. For a detailed description of how CARE was constructed, please refer to our paper.
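As a minimal sketch of inspecting these fields with the Hugging Face datasets library (assuming the dataset loads with its default configuration and a train split; check the dataset card for the exact names):

from datasets import load_dataset

# Load CARE from the Hugging Face Hub.
# NOTE: the split name is an assumption; see the dataset card for the actual splits.
care = load_dataset("geyang627/CARE", split="train")

# Print the annotated fields of one example: the culture-specific question,
# the candidate response, the cultural context category, the associated
# culture, and the 10-point human rating.
example = care[0]
for field in ["question", "response", "culture_type", "associated_culture", "rating"]:
    print(field, "->", example.get(field))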

Culturally Aligned Models

We have released the culturally aligned models trained with CARE in the CARE_collection. You can use them directly as shown below.

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import torch

# Load the culturally aligned model with vLLM, sharding across all available GPUs.
model = LLM(model="geyang627/care-chinese-qwen2.5-7b", tensor_parallel_size=torch.cuda.device_count(), dtype="auto", trust_remote_code=True, max_model_len=2048)

# Load the matching tokenizer and make sure a pad token is defined.
tokenizer = AutoTokenizer.from_pretrained("geyang627/care-chinese-qwen2.5-7b", use_fast=False, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Sample a response to a culture-specific question
# ("Why do Chinese people dislike the number 4?").
sampling_params = SamplingParams(temperature=0.7, top_p=1.0, max_tokens=256)
outputs = model.generate(["为什么中国人不喜欢数字4?"], sampling_params)
print(outputs[0].outputs[0].text)
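The snippet above passes the question as a raw string. If the aligned checkpoint expects its base model's chat template (an assumption; the released models are Qwen2.5-7B derivatives), wrapping the question as a chat message may produce cleaner outputs, for example:

# Optional: format the question with the tokenizer's chat template.
# NOTE: whether the aligned checkpoint expects this template is an assumption.
messages = [{"role": "user", "content": "为什么中国人不喜欢数字4?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = model.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)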

Evaluation

To evaluate a model's cultural awareness with CARE, you can use our test set at geyang627/CARE-eval together with the prompts in the prompts directory.
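For example, a minimal sketch of preparing evaluation prompts (the split name, field name, and prompt file name below are assumptions; check the CARE-eval dataset card and the prompts directory for the actual ones):

from datasets import load_dataset

# Load the CARE evaluation set from the Hugging Face Hub.
# NOTE: the split name "test" is an assumption; see the dataset card.
eval_set = load_dataset("geyang627/CARE-eval", split="test")

# Fill an evaluation prompt template with each culture-specific question.
# NOTE: "prompts/eval_prompt.txt" and the {question} placeholder are hypothetical.
with open("prompts/eval_prompt.txt", encoding="utf-8") as f:
    template = f.read()

prompts = [template.format(question=row["question"]) for row in eval_set]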

Acknowledgment

The Sony team was not involved in the Chinese and Arabic data collection process. Please cite the following paper if you find our code or data helpful.

@article{guo2025care,
  title={CARE: Aligning Language Models for Regional Cultural Awareness},
  author={Guo, Geyang and Naous, Tarek and Wakaki, Hiromi and Nishimura, Yukiko and Mitsufuji, Yuki and Ritter, Alan and Xu, Wei},
  journal={arXiv preprint arXiv:2504.05154},
  year={2025}
}
