See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning

Paper | Checkpoint

Introduction

This is the official repository for the paper "See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning".

Code and models will be released soon.

Motivation

To mitigate the perceptual bottleneck in VLMs, recent approaches often rely on external tools or explicit intermediate visual cues (e.g., generated masks, bounding boxes, or latent tokens) during inference. However, these paradigms face three critical limitations:

  • Shape Rigidity: Coarse boxes or masks fail to capture irregular, fine-grained evidence (e.g., thin polylines or specific intersections in charts).

  • Limited Generalization: Task-specific tools generalize poorly across diverse domains.

  • Inference Overhead: Multi-step visual reasoning increases computation costs and latency.

BiPS takes a different route. Instead of using visual cues as inference-time crutches, we transform them into training signals to internalize perception.

Method: Bi-directional Perceptual Shaping

BiPS shapes the model's internal policy through a two-stage curriculum using programmatically generated views via chart code editing:

  • Consistency Stage: Minimizes divergence between the original image and an Evidence-Preserving View, teaching the model to focus on complete, supporting visual details.

  • Separation Stage: Maximizes divergence from an Evidence-Ablated View, penalizing the model for relying on text-only shortcuts when visual evidence is missing.

By strictly enforcing these constraints during training, BiPS achieves fine-grained visual grounding without any additional inference cost. Across 8 benchmarks, it boosts Qwen2.5-VL-7B by an average of 8.2%, demonstrating strong cross-domain generalization.
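The two stages can be read as a pair of divergence constraints on the model's answer distribution under different views of the same image. Since the code has not been released yet, the snippet below is only a minimal PyTorch sketch of how such a bi-directional objective could look, assuming KL divergence as the consistency/separation measure and a hinge-style margin for the separation term; the function name, tensor shapes, and separation_margin are illustrative assumptions, not the official implementation.

import torch
import torch.nn.functional as F


def bips_shaping_losses(
    logits_original: torch.Tensor,    # [batch, seq, vocab] answer logits given the original image
    logits_preserving: torch.Tensor,  # [batch, seq, vocab] given the Evidence-Preserving View
    logits_ablated: torch.Tensor,     # [batch, seq, vocab] given the Evidence-Ablated View
    separation_margin: float = 1.0,   # hypothetical margin for the separation term
):
    """Sketch of the two BiPS shaping terms (assumed formulation).

    Consistency: pull the answer distribution on the Evidence-Preserving View
    toward the distribution on the original image (minimize KL divergence).
    Separation: push the distribution on the Evidence-Ablated View away from
    the original one (maximize divergence, here via a hinge on the KL).
    """
    log_p_orig = F.log_softmax(logits_original, dim=-1)
    log_p_pres = F.log_softmax(logits_preserving, dim=-1)
    log_p_abl = F.log_softmax(logits_ablated, dim=-1)

    def per_token_kl(target_log: torch.Tensor, input_log: torch.Tensor) -> torch.Tensor:
        # KL(target || input), averaged over all tokens in the batch.
        return F.kl_div(
            input_log.reshape(-1, input_log.size(-1)),
            target_log.reshape(-1, target_log.size(-1)),
            reduction="batchmean",
            log_target=True,
        )

    # Consistency stage: minimize divergence between original and preserving views.
    consistency = per_token_kl(log_p_orig, log_p_pres)

    # Separation stage: encourage the divergence from the ablated view to exceed
    # a margin, penalizing text-only shortcuts that survive evidence removal.
    separation = F.relu(separation_margin - per_token_kl(log_p_orig, log_p_abl))

    return consistency, separation


if __name__ == "__main__":
    b, t, v = 2, 4, 32
    lo, lp, la = (torch.randn(b, t, v) for _ in range(3))
    c, s = bips_shaping_losses(lo, lp, la)
    print(f"consistency={c.item():.3f}  separation={s.item():.3f}")

In an actual training setup these terms would be added to the usual task objective, with the consistency term emphasized in the first stage and the separation term in the second, per the curriculum described above.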

πŸ“ Citation

If you find this work helpful in your research, please cite our paper:

@article{zhang2025bips,
  title={See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning},
  author={Zhang, Shuoshuo and Zhang, Yizhen and Fu, Jingjing and Song, Lei and Bian, Jiang and Yang, Yujiu and Wang, Rui},
  journal={arXiv preprint arXiv:2512.22120},
  year={2025}
}
