🔥ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation

Minghua He¹*, Fangkai Yang², Pu Zhao², Wenjie Yin³, Yu Kang², Qingwei Lin², Saravan Rajmohan², Dongmei Zhang², Qi Zhang²

¹Peking University, ²Microsoft, ³KTH Royal Institute of Technology

*Work done during an internship at Microsoft.

📝 Project Structure

├─checkpoint                # Saved models
├─data                      # IFT data
├─evaluation                # Code Translation Evaluation
├─exe_repr_generation
|  ├─lang_processors        # Programming Language Processors
|  ├─parser                 # Programming Language Parsers
|  ├─ast_tools.py           # Processing Syntactic-structure Representation
|  ├─dataflow_tools.py      # Processing Variable-dependency Representation
|  ├─deduplication.py       # Data deduplication
|  └─XLCoST_preprocess.py   # Processing XLCoST
├─src                       # Run SFT
├─tools                     # JDK for Evaluation
└─TransCoder-test-X.zip     # Enhanced Benchmark

⚙️ Environment

Key Packages:

datasets==2.18.0
fire==0.6.0
gradio==4.39.0
numpy==1.26.4
openai==0.8.0
pandas==2.2.2
torch==2.2.1
tqdm==4.64.1
transformers==4.42.4
tree_sitter==0.21.0
tree_sitter_go==0.21.0
tree_sitter_c_sharp==0.21.0
tree_sitter_java==0.21.0
tree_sitter_javascript==0.21.0
tree_sitter_php==0.22.4
tree_sitter_python==0.21.0
vllm==0.4.1
openpyxl==3.1.5
deepspeed==0.14.2
accelerate==1.0.1
tensorboardX
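
To confirm that an environment matches the pins above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the names `PINS`, `parse_pins`, and `check_pins` are hypothetical, and only a few of the pins are copied in for brevity.

```python
from importlib.metadata import PackageNotFoundError, version

# A few pins copied from the list above; extend with the rest as needed.
PINS = """
datasets==2.18.0
torch==2.2.1
transformers==4.42.4
tree_sitter==0.21.0
vllm==0.4.1
tensorboardX
"""


def parse_pins(text):
    """Map package name -> pinned version (None when no version is pinned)."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip() if sep else None
    return pins


def check_pins(pins):
    """Return (name, pinned, installed) for each missing or mismatched package."""
    problems = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed is None or (pinned is not None and installed != pinned):
            problems.append((name, pinned, installed))
    return problems


if __name__ == "__main__":
    for name, pinned, installed in check_pins(parse_pins(PINS)):
        print(f"{name}: pinned {pinned or 'any'}, installed {installed or 'missing'}")
```

An empty report suggests the checked packages are installed at the pinned versions; it says nothing about packages left out of `PINS`.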

📜 Preparation

Complete the following steps before running ExeCoder.

Step 1: Download XLCoST and put it under the data folder.
Step 2: Download deepseek-coder-6.7b-instruct and put it under the checkpoint folder.
Step 3: Download jdk-10.0.2 and put it under the tools folder.
Step 4: Install the dependencies listed in Environment.
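
Before launching anything, it can be useful to verify that the downloads from the steps above are in place. The sketch below is an illustration only: the function `missing_paths` is hypothetical, and the exact subfolder names under checkpoint and tools are assumptions, since the steps only name the parent folders.

```python
from pathlib import Path

# Folders expected after the preparation steps.
# The subfolder names are assumptions; adjust them to match your downloads.
EXPECTED = [
    "data",
    "checkpoint/deepseek-coder-6.7b-instruct",  # assumed subfolder name
    "tools/jdk-10.0.2",                         # assumed subfolder name
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that are not directories under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Run from the repository root.
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```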

🚀 Quick Start

You can run ExeCoder with the following commands:

Preprocess XLCoST dataset to XLCoST-Instruct.

python exe_repr_generation/XLCoST_preprocess.py

Instruction Tuning for Learning Executability Representation.

sh train.sh

Inference.

sh inference.sh

Evaluation.

sh evaluation.sh
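
The four commands above can also be chained so that a failing stage stops the run. The following is a minimal sketch, assuming it is called from the repository root; `STEPS` and `run_pipeline` are hypothetical names, not something the repository provides.

```python
import subprocess

# The four Quick Start commands, in order.
STEPS = [
    ("preprocess", ["python", "exe_repr_generation/XLCoST_preprocess.py"]),
    ("train", ["sh", "train.sh"]),
    ("inference", ["sh", "inference.sh"]),
    ("evaluation", ["sh", "evaluation.sh"]),
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order; return the names of the steps that completed.

    Stops at the first step whose exit code is non-zero.
    """
    done = []
    for name, cmd in steps:
        print(f"[{name}] {' '.join(cmd)}")
        result = runner(cmd)
        if result.returncode != 0:
            print(f"[{name}] failed with exit code {result.returncode}")
            break
        done.append(name)
    return done
```

Calling `run_pipeline()` runs all four stages; passing a shorter `steps` list, for example `STEPS[2:]`, resumes from inference when a trained checkpoint already exists.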

📝 Citation and Reference

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

@misc{he2025execoderempoweringlargelanguage,
      title={ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation}, 
      author={Minghua He and Fangkai Yang and Pu Zhao and Wenjie Yin and Yu Kang and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang and Qi Zhang},
      year={2025},
      eprint={2501.18460},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2501.18460}, 
}
