This repository contains the code for the paper "One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization".
We fine-tune the pre-trained code models UniXcoder and CodeT5 with adapter tuning.
Please refer to the subdirectory of each task (code summarization and code search) for more details.
The statistics of the original CodeSearchNet dataset are shown in the table below. The dataset contains about 2 million function-documentation pairs and about 4 million more functions without associated documentation.
| Programming Language | With documentation | All functions |
|---|---|---|
| Python | 503,502 | 1,156,085 |
| PHP | 717,313 | 977,821 |
| Go | 347,789 | 726,768 |
| Java | 542,991 | 1,569,889 |
| JavaScript | 157,988 | 1,857,835 |
| Ruby | 57,393 | 164,048 |
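
As a quick sanity check on the rounded figures quoted above, the table can be summed directly. This is a minimal sketch: the counts are copied from the table, while the names `CODESEARCHNET` and `summarize` are hypothetical and not part of this repository.

```python
# Counts copied from the table above: (with documentation, all functions).
# The dictionary and function names are illustrative, not part of this repo.
CODESEARCHNET = {
    "Python": (503_502, 1_156_085),
    "PHP": (717_313, 977_821),
    "Go": (347_789, 726_768),
    "Java": (542_991, 1_569_889),
    "JavaScript": (157_988, 1_857_835),
    "Ruby": (57_393, 164_048),
}


def summarize(stats):
    """Return (documented, total, undocumented) function counts."""
    documented = sum(with_doc for with_doc, _ in stats.values())
    total = sum(all_funcs for _, all_funcs in stats.values())
    return documented, total, total - documented


documented, total, undocumented = summarize(CODESEARCHNET)
print(f"with documentation:    {documented:,}")    # 2,326,976
print(f"all functions:         {total:,}")         # 6,452,446
print(f"without documentation: {undocumented:,}")  # 4,125,470
```

The totals come to roughly 2.3 million documented functions and 4.1 million undocumented ones, consistent with the approximate figures above.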
The dataset we use is the version further cleaned by CodeXGLUE.
| Programming Language | Training | Dev | Test |
|---|---|---|---|
| Python | 251,820 | 13,914 | 14,918 |
| PHP | 241,241 | 12,982 | 14,014 |
| Go | 167,288 | 7,325 | 8,122 |
| Java | 164,923 | 5,183 | 10,955 |
| JavaScript | 58,025 | 3,885 | 3,291 |
| Ruby | 24,927 | 1,400 | 1,261 |
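
The split sizes differ widely across languages. The sketch below, using only the numbers from the table above, sums each split and prints each language's share of the training data. The names `SPLITS` and `split_totals` are hypothetical helpers for illustration.

```python
# Counts copied from the table above: (training, dev, test) per language.
# The dictionary and function names are illustrative, not part of this repo.
SPLITS = {
    "Python": (251_820, 13_914, 14_918),
    "PHP": (241_241, 12_982, 14_014),
    "Go": (167_288, 7_325, 8_122),
    "Java": (164_923, 5_183, 10_955),
    "JavaScript": (58_025, 3_885, 3_291),
    "Ruby": (24_927, 1_400, 1_261),
}


def split_totals(splits):
    """Sum each column (training, dev, test) across all languages."""
    return tuple(sum(column) for column in zip(*splits.values()))


n_train, n_dev, n_test = split_totals(SPLITS)
print(f"training: {n_train:,}  dev: {n_dev:,}  test: {n_test:,}")

# Share of the combined training set contributed by each language.
train_share = {lang: counts[0] / n_train for lang, counts in SPLITS.items()}
for lang, share in train_share.items():
    print(f"{lang:<10} {share:.1%} of training examples")
```

Summed over the six languages, this gives 908,224 training, 44,689 dev, and 52,561 test examples. Ruby contributes under 3% of the training examples, while Python and PHP together account for more than half.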
We also provide the models fine-tuned with our approach so that the results can be verified.
Our implementation builds on the code of CodeXGLUE, UniXcoder, and CodeT5 for the pre-trained models.