
# code_transformed: The Influence of Large Language Models on Code


## Contents

- [Data Collection](#data-collection)
- [Naming Patterns](#naming-patterns)
- [Complexity and Maintainability](#complexity-and-maintainability)
- [Code Similarity](#code-similarity)
- [Labels in the Reasoning Process](#labels-in-the-reasoning-process)
- [Citation](#citation)

## Data Collection

### GitHub Data

We collect a total of 19,898 GitHub repositories and 926,935 source code files, corresponding to arXiv papers from the first quarter of 2020 to the first quarter of 2025. Our arXiv dataset is organized across two GitHub repositories: Python files are in LLM_code/arxiv_dataset, and C/C++ code is in LLM_code/arxiv_dataset_cpp.

```
├── 2020                   // Year
│   ├── Q1                 // Quarter
│   │   ├── repo_name      // Repository name
│   │   │   ├── xxx.py     // Project Python file
│   │   │   ├── ...
│   │   │   └── time_info.txt  // File creation/modification time information
```
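The layout above can be traversed with a few lines of Python. The sketch below is illustrative only: the root directory name and the `iter_python_files` helper are our assumptions, not part of the repository.

```python
from pathlib import Path

def iter_python_files(root: str = "arxiv_dataset"):
    """Yield (year, quarter, repo_name, path) for every .py file in the tree."""
    for year_dir in sorted(Path(root).iterdir()):        # 2020, 2021, ...
        if not year_dir.is_dir():
            continue
        for quarter_dir in sorted(year_dir.iterdir()):   # Q1 .. Q4
            for repo_dir in sorted(quarter_dir.iterdir()):
                if not repo_dir.is_dir():
                    continue
                for py_file in repo_dir.rglob("*.py"):
                    yield year_dir.name, quarter_dir.name, repo_dir.name, py_file

if __name__ == "__main__":
    for year, quarter, repo, path in iter_python_files():
        print(year, quarter, repo, path.name)
```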

### Human-Written Code

We utilize Code4Bench, a multidimensional benchmark built on Codeforces data. This dataset contains user submissions to Codeforces made before 2020, which predate widespread LLM adoption and were therefore barely impacted by LLMs. For comparison, we generate code using LLMs under several prompting strategies, sketched below.
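As a rough illustration of the two generation settings referenced later (direct generation, "ANS", and reference-guided generation, "REF"), the prompts might be built as follows. The wording and the `build_prompts` helper are hypothetical, not the exact prompts used in the study.

```python
def build_prompts(problem_description: str, human_solution: str) -> dict:
    """Hypothetical prompt templates for the two generation settings."""
    direct = (  # description only -> "ANS" in the similarity analysis
        "Solve the following Codeforces problem in C++.\n\n"
        + problem_description
    )
    reference_guided = (  # description + human code -> "REF"
        "Solve the following Codeforces problem in C++, "
        "using the reference solution below as guidance.\n\n"
        "Problem:\n" + problem_description + "\n\n"
        "Reference solution:\n" + human_solution
    )
    return {"ANS": direct, "REF": reference_guided}
```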

## Naming Patterns

We categorize variable, function, and file names into several distinct formats (e.g., snake_case), and also take name length into account.
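As an illustration, naming formats can be detected with simple regular expressions. The category set and patterns below are our assumptions (first match wins, so order matters), not necessarily the exact definitions used in the paper.

```python
import re

NAME_PATTERNS = {
    "snake_case": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$"),
    "camelCase":  re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$"),
    "UPPER_CASE": re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$"),
    "PascalCase": re.compile(r"^([A-Z][a-z0-9]*){2,}$"),
    "lowercase":  re.compile(r"^[a-z][a-z0-9]*$"),
}

def classify_name(name: str) -> str:
    """Return the first matching naming format, or 'other'."""
    for fmt, pattern in NAME_PATTERNS.items():
        if pattern.match(name):
            return fmt
    return "other"

# Format and length are tracked together:
print(classify_name("max_value"), len("max_value"))  # snake_case 9
```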

> [!IMPORTANT]
> Finding 1: The coding style of human-written code may be influenced by LLMs: they may not only mirror existing norms but also subtly reshape them, gradually pushing human developers toward greater stylistic alignment with LLM-preferred conventions.

## Complexity and Maintainability

Cyclomatic complexity is a metric that measures the number of linearly independent paths through a program's source code.
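For Python files, such metrics can be computed with off-the-shelf tooling. The sketch below uses the `radon` package as one possible choice; it is not necessarily the toolchain used in the paper.

```python
# pip install radon
from radon.complexity import cc_visit
from radon.metrics import mi_visit

source = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
'''

# Cyclomatic complexity per function: 1 + number of decision points.
for block in cc_visit(source):
    print(block.name, block.complexity)  # classify 3

# Maintainability Index for the whole module (higher is more maintainable).
print(mi_visit(source, multi=True))
```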

> [!IMPORTANT]
> Finding 2: For I/O algorithm problems, LLM-generated code tends to exhibit higher maintainability, lower difficulty, and fewer bugs than human-written solutions, which aligns with the evolution of GitHub code after 2023 Q1. Moreover, the quality of reference-guided code is generally inferior to that of directly generated code.

## Code Similarity

We compare three versions of each problem’s code: the original human-authored solution (AC), the LLM’s output given only the problem description (ANS), and the LLM’s output when additionally conditioned on the human solution (REF). We compute pairwise cosine and Jaccard similarities among AC, ANS, and REF.
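A minimal sketch of the two measures is given below, using token-level Jaccard similarity and bag-of-tokens cosine similarity; the exact tokenization and featurization used in the paper may differ.

```python
import math
import re
from collections import Counter

def tokenize(code: str) -> list[str]:
    # Crude lexer: identifiers/keywords, numbers, single operator characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def jaccard(a: str, b: str) -> float:
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def cosine(a: str, b: str) -> float:
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) \
         * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 1.0

# Toy example standing in for an (AC, ANS) pair:
ac  = "int main(){int n;cin>>n;cout<<n*n;}"
ans = "int main(){int x;cin>>x;cout<<x*x;}"
print(round(jaccard(ac, ans), 3), round(cosine(ac, ans), 3))
```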

> [!IMPORTANT]
> Finding 3: LLMs can effectively mimic human coding style when given reference code, but without such guidance, their generated solutions diverge significantly from human-written code, especially in I/O algorithm tasks.

## Labels in the Reasoning Process

To further refine our analysis, we examine, for each question, whether the labels mentioned in the reasoning process match the question's true labels.

Let $T$ denote the set of all labels. For each question $q$, let $A_q \subseteq T$ be the set of true labels in the question description, and let $R_q \subseteq T$ be the set of labels in the reasoning process.

We then define the $\mathrm{match}$ and $\mathrm{error}$ metrics as follows:

$$
\begin{align}
\mathrm{match}(q) &= \mathbf{1}\left( A_q \cap R_q \ne \varnothing \right), \\
\mathrm{error}(q) &= \mathbf{1}\left( \left( T \setminus A_q \right) \cap R_q \ne \varnothing \right),
\end{align}
$$

where $\mathbf{1}(\cdot)$ is the indicator function: it returns 1 if the condition is met, and 0 otherwise.
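In code, the two metrics reduce to simple set operations. A direct translation (the label vocabulary below is made up for illustration):

```python
ALL_LABELS = {"dp", "greedy", "graphs", "math", "strings"}  # T (illustrative)

def match(true_labels: set, reasoning_labels: set) -> int:
    """1 if the reasoning mentions at least one true label (A_q ∩ R_q ≠ ∅)."""
    return int(bool(true_labels & reasoning_labels))

def error(true_labels: set, reasoning_labels: set,
          all_labels: set = ALL_LABELS) -> int:
    """1 if the reasoning mentions a label absent from the question
    ((T \\ A_q) ∩ R_q ≠ ∅)."""
    return int(bool((all_labels - true_labels) & reasoning_labels))

# A_q = {dp, math}; the reasoning mentions R_q = {dp, greedy}:
print(match({"dp", "math"}, {"dp", "greedy"}))  # 1 (dp is a true label)
print(error({"dp", "math"}, {"dp", "greedy"}))  # 1 (greedy is spurious)
```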

> [!IMPORTANT]
> Finding 4: LLMs show limited algorithm-analysis capability, are more inclined to approach C/C++ code from an algorithmic perspective, and harder problems may better activate their algorithmic reasoning capabilities.

## Citation

```bibtex
@article{xu2025code_transformed,
  title={code\_transformed: The Influence of Large Language Models on Code},
  author={Xu, Yuliang and Huang, Siming and Geng, Mingmeng and Wan, Yao and Shi, Xuanhua and Chen, Dongping},
  journal={arXiv preprint arXiv:2506.12014},
  year={2025}
}
```
