Aphelios-Tang/Code-as-Thought

Visual Programmability: Code-as-Thought for Chart Understanding

Paper: Visual Programmability: A Guide for Code-as-Thought in Chart Understanding

Code-as-Thought Framework

Chart understanding presents a critical test of the reasoning capabilities of Vision-Language Models (VLMs). Prior approaches face critical limitations: some rely on external tools, making them brittle and constrained by a predefined toolkit, while others fine-tune specialist models that often adopt a single reasoning strategy, such as text-based chain-of-thought (CoT). The intermediate steps of text-based reasoning are difficult to verify, which complicates the use of reinforcement-learning signals that reward factual accuracy. To address this, we propose a Code-as-Thought (CaT) approach that represents the visual information of a chart in a verifiable, symbolic format. Our key insight is that this strategy must be adaptive: a fixed, code-only implementation consistently fails on complex charts where symbolic representation is unsuitable. This finding leads us to introduce Visual Programmability: a learnable property that determines whether a chart-question pair is better solved with code or with direct visual analysis. We implement this concept in an adaptive framework where a VLM learns to choose between the CaT pathway and a direct visual reasoning pathway. The model's selection policy is trained with reinforcement learning using a novel dual-reward system. This system combines a data-accuracy reward, which grounds the model in facts and prevents numerical hallucination, with a decision reward that teaches the model when to use each strategy, preventing it from defaulting to a single reasoning mode. Experiments demonstrate strong and robust performance across diverse chart-understanding benchmarks. Our work shows that VLMs can be taught not only to reason but also how to reason, dynamically selecting the optimal reasoning pathway for each task.


Environment Setup

Create and activate a clean conda environment, then install the required dependencies:

conda create -n cat python=3.10 -y
conda activate cat
pip install -r requirements.txt

Dataset Preparation

Datasets should be in Hugging Face Parquet format with the following required fields:

  • images: list of images as bytes dictionaries, e.g. [{"bytes": ...}]
  • prompt: text prompt (include <image> token when an image is present)
  • ground_truth: target answer string (some reward functions expect specific tags like <answer>...</answer>, <csv>...</csv>, <programability>yes|no</programability>)

We provide conversion scripts in my_dataset/ for popular chart-understanding datasets (ChartBench, ChartQA, CharXiv). Edit the script constants to point to your local raw-data directory, then run the script to generate benchmark_*.parquet files.


Training

To train the model, configure and run the provided training script:

bash examples/qwen2_5vl_7b.sh

Important Configuration:

  • Configure these variables in the script according to your setup: MODEL_PATH, TRAIN_DATA, VAL_DATA, EXPERIMENT_NAME, FORMAT_PROMPT, REWARD_FUNCTION, NUM_GPUS, and optionally TENSORBOARD_DIR
  • The script uses python -m verl.trainer.main with the decision prompt and decision reward by default; modify parameters as needed for your setup.

Evaluation

To evaluate the trained model, configure and run the validation script:

bash examples/val_sh/val_chartbench.sh

Configuration Requirements:

  • Set the following variables: MODEL_PATH, TRAIN_DATA, VAL_DATA, FORMAT_PROMPT, REWARD_FUNCTION, NUM_GPUS, and VAL_OUTPUT_FILE
  • This script runs in validation-only mode (trainer.val_only=true) and outputs detailed generations and evaluation metrics.

Repository Structure

  • examples/format_prompt/: Jinja2 template prompts for code generation, chain-of-thought, and decision making
  • examples/reward_function/: reward functions corresponding to different prompt templates
  • examples/config.yaml: default training configuration
  • examples/qwen2_5vl_7b.sh: training script example for Qwen2.5-VL-7B model
  • examples/val_sh/val_chartbench.sh: validation script example for ChartBench evaluation
  • my_dataset/: data conversion scripts to transform raw datasets into Parquet format
  • scripts/model_merger.py: utility to merge FSDP model shards and export Hugging Face compatible weights
  • verl/: core training framework integrating Ray, FSDP, and vLLM
  • requirements.txt: Python package dependencies

Citation

If you find this work useful for your research, please cite our paper:

@misc{tang2025visualprogrammabilityguidecodeasthought,
      title={Visual Programmability: A Guide for Code-as-Thought in Chart Understanding}, 
      author={Bohao Tang and Yan Ma and Fei Zhang and Jiadi Su and Ethan Chern and Zhulin Hu and Zhixin Wang and Pengfei Liu and Ya Zhang},
      year={2025},
      eprint={2509.09286},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.09286}, 
}

License

This project is licensed under the Apache-2.0 License. See individual file headers for details.
