Paper: Visual Programmability: A Guide for Code-as-Thought in Chart Understanding
Chart understanding presents a critical test of the reasoning capabilities of Vision-Language Models (VLMs). Prior approaches face key limitations: some rely on external tools, making them brittle and constrained by a predefined toolkit, while others fine-tune specialist models that often adopt a single reasoning strategy, such as text-based chain-of-thought (CoT). The intermediate steps of text-based reasoning are difficult to verify, which complicates the use of reinforcement-learning signals that reward factual accuracy. To address this, we propose a Code-as-Thought (CaT) approach that represents the visual information of a chart in a verifiable, symbolic format. Our key insight is that this strategy must be adaptive: a fixed, code-only implementation consistently fails on complex charts where symbolic representation is unsuitable. This finding leads us to introduce Visual Programmability: a learnable property that determines whether a chart-question pair is better solved with code or with direct visual analysis. We implement this concept in an adaptive framework where a VLM learns to choose between the CaT pathway and a direct visual reasoning pathway. The model's selection policy is trained with reinforcement learning using a novel dual-reward system. This system combines a data-accuracy reward, which grounds the model in facts and prevents numerical hallucination, with a decision reward that teaches the model when to use each strategy, preventing it from defaulting to a single reasoning mode. Experiments demonstrate strong and robust performance across diverse chart-understanding benchmarks. Our work shows that VLMs can be taught not only to reason but also how to reason, dynamically selecting the optimal reasoning pathway for each task.
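As a rough illustration of the dual-reward idea described above (this is not the released implementation; the weighting, the exact-match comparison, and the function signature are all assumptions for clarity), the two signals might be combined like this:

```python
def dual_reward(pred_answer: str, true_answer: str,
                chose_code: bool, code_is_suitable: bool,
                w_acc: float = 1.0, w_dec: float = 0.5) -> float:
    """Toy sketch of a dual reward: accuracy term plus decision term.

    The real system parses tagged model outputs and verifies extracted
    chart data; here both signals are reduced to booleans for clarity.
    """
    acc = 1.0 if pred_answer.strip() == true_answer.strip() else 0.0  # data-accuracy reward
    dec = 1.0 if chose_code == code_is_suitable else 0.0              # decision reward
    return w_acc * acc + w_dec * dec
```

The decision term is what keeps the policy from collapsing to a single reasoning mode: picking CaT on a chart where code is unsuitable (or vice versa) forfeits that part of the reward even when the final answer happens to be correct.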
Create and activate a clean conda environment, then install the required dependencies:
```
conda create -n cat python=3.10 -y
conda activate cat
pip install -r requirements.txt
```

Datasets should be in Hugging Face Parquet format with the following required fields:

- `images`: list of images as bytes dictionaries, e.g. `[{"bytes": ...}]`
- `prompt`: text prompt (include the `<image>` token when an image is present)
- `ground_truth`: target answer string (some reward functions expect specific tags like `<answer>...</answer>`, `<csv>...</csv>`, `<programability>yes|no</programability>`)
We provide conversion scripts in `my_dataset/` for popular chart-understanding datasets (ChartBench/ChartQA/CharXiv). Simply edit the script constants to point to your local raw data directory and run the script to generate `benchmark_*.parquet` files.
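For custom data, a single row can be assembled as shown below. This is a minimal sketch: only the three field names come from the format above, while the `make_record` helper and the pandas write shown in the trailing comment are illustrative.

```python
def make_record(image_bytes: bytes, question: str, answer: str) -> dict:
    """Assemble one dataset row with the three required fields."""
    return {
        "images": [{"bytes": image_bytes}],            # list of bytes dictionaries
        "prompt": f"<image>{question}",                # <image> token marks the image slot
        "ground_truth": f"<answer>{answer}</answer>",  # tag expected by some reward functions
    }

record = make_record(b"\x89PNG...", "Which bar is tallest?", "2021")

# Writing a list of records to Parquet (requires pandas with a parquet
# engine such as pyarrow):
# import pandas as pd
# pd.DataFrame([record]).to_parquet("benchmark_custom.parquet")
```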
To train the model, configure and run the provided training script:
```
bash examples/qwen2_5vl_7b.sh
```

Important Configuration:
- Configure these variables in the script according to your setup: `MODEL_PATH`, `TRAIN_DATA`, `VAL_DATA`, `EXPERIMENT_NAME`, `FORMAT_PROMPT`, `REWARD_FUNCTION`, `NUM_GPUS`, and optionally `TENSORBOARD_DIR`
- The script uses `python -m verl.trainer.main` with the decision prompt and decision reward by default. Modify parameters as needed for your specific requirements.
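For reference, the variables at the top of the training script might be set as follows. All values are placeholders, and the specific filenames under `examples/` are assumptions; check the repository for the actual names.

```shell
# Placeholder values -- adjust for your environment.
MODEL_PATH=/models/Qwen2.5-VL-7B-Instruct              # local model checkpoint
TRAIN_DATA=my_dataset/benchmark_train.parquet          # converted training data
VAL_DATA=my_dataset/benchmark_val.parquet              # converted validation data
EXPERIMENT_NAME=qwen2_5vl_7b_decision_cat
FORMAT_PROMPT=examples/format_prompt/decision.jinja    # assumed filename
REWARD_FUNCTION=examples/reward_function/decision.py   # assumed filename
NUM_GPUS=8
TENSORBOARD_DIR=./tensorboard_logs                     # optional
```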
To evaluate the trained model, configure and run the validation script:
```
bash examples/val_sh/val_chartbench.sh
```

Configuration Requirements:
- Set the following variables: `MODEL_PATH`, `TRAIN_DATA`, `VAL_DATA`, `FORMAT_PROMPT`, `REWARD_FUNCTION`, `NUM_GPUS`, and `VAL_OUTPUT_FILE`
- This script runs in validation-only mode (`trainer.val_only=true`) and outputs detailed generations and evaluation metrics.
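A validation-only configuration might look like the following. Again, every value is a placeholder and the filenames under `examples/` are assumptions.

```shell
# Placeholder values for a validation-only run.
MODEL_PATH=checkpoints/qwen2_5vl_7b_decision_cat       # trained model to evaluate
VAL_DATA=my_dataset/benchmark_chartbench.parquet       # ChartBench eval data
FORMAT_PROMPT=examples/format_prompt/decision.jinja    # assumed filename
REWARD_FUNCTION=examples/reward_function/decision.py   # assumed filename
NUM_GPUS=8
VAL_OUTPUT_FILE=outputs/val_chartbench.jsonl           # generations and metrics
```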
- Model: `Qwen2_5vl_7b_decision_CaT`
- Dataset: `Decision_CaT`
- `examples/format_prompt/`: Jinja2 template prompts for code generation, chain-of-thought, and decision making
- `examples/reward_function/`: reward functions corresponding to the different prompt templates
- `examples/config.yaml`: default training configuration
- `examples/qwen2_5vl_7b.sh`: example training script for the Qwen2.5-VL-7B model
- `examples/val_sh/val_chartbench.sh`: example validation script for ChartBench evaluation
- `my_dataset/`: data conversion scripts to transform raw datasets into Parquet format
- `scripts/model_merger.py`: utility to merge FSDP model shards and export Hugging Face-compatible weights
- `verl/`: core training framework integrating Ray, FSDP, and vLLM
- `requirements.txt`: Python package dependencies
If you find this work useful for your research, please cite our paper:
```
@misc{tang2025visualprogrammabilityguidecodeasthought,
      title={Visual Programmability: A Guide for Code-as-Thought in Chart Understanding},
      author={Bohao Tang and Yan Ma and Fei Zhang and Jiadi Su and Ethan Chern and Zhulin Hu and Zhixin Wang and Pengfei Liu and Ya Zhang},
      year={2025},
      eprint={2509.09286},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.09286},
}
```

- This work is built upon the EasyR1 training framework, which provides an efficient and scalable RL training infrastructure.
- We gratefully acknowledge the open-source communities and contributors of HuggingFace Transformers, vLLM, Ray, FlashAttention, and Qwen2.5-VL for making this research possible.
This project is licensed under the Apache-2.0 License. See individual file headers for details.
