This branch provides code for experiments with OpenAI models for the commit message generation task.
This project provides dependencies for two Python dependency managers:
- Poetry: `poetry.lock`, `pyproject.toml`
- pip: `requirements.txt` (obtained through `poetry export --with dev --output requirements.txt`)
At some point, we plan to publish our dataset of commits. Until then, if you wish to use this project with other data, refer to this section.
The data for this project was obtained via the commits_dataset repo. The required data format is described below.
This project expects input to be stored in JSON Lines format:
```
├── ...  # data directory
│   ├── <input_file>.jsonl
└── ...
```
In our case, each input example is a commit. Specifically, the following keys are expected in each row:
- `message`: Commit message.
- `mods`: A list of modifications made in the commit. Each modification should contain the following keys:
  - `change_type`: Type of modification (string, one of `MODIFY`, `ADD`, `DELETE`, `RENAME`, `COPY`, `UNKNOWN`).
  - `old_path`: Path to the file before the commit (`None` when `change_type` is `ADD`).
  - `new_path`: Path to the file after the commit (`None` when `change_type` is `DELETE`).
  - `diff`: Output of the `git diff` command for this specific file.
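For illustration, here is a minimal sketch of serializing one commit in the expected JSON Lines format. The helper function and the sample commit are hypothetical, not part of this repo:

```python
import json

# Allowed values for change_type, as listed above.
CHANGE_TYPES = {"MODIFY", "ADD", "DELETE", "RENAME", "COPY", "UNKNOWN"}


def make_commit_row(message, mods):
    """Validate one commit example and serialize it to a JSON Lines row."""
    for mod in mods:
        assert mod["change_type"] in CHANGE_TYPES
        # old_path is None for added files; new_path is None for deleted files.
        if mod["change_type"] == "ADD":
            assert mod["old_path"] is None
        if mod["change_type"] == "DELETE":
            assert mod["new_path"] is None
    return json.dumps({"message": message, "mods": mods})


row = make_commit_row(
    "Fix off-by-one error in pagination",
    [
        {
            "change_type": "MODIFY",
            "old_path": "src/pager.py",
            "new_path": "src/pager.py",
            "diff": "@@ -10,1 +10,1 @@\n-    limit = n\n+    limit = n - 1\n",
        }
    ],
)
# Append `row` to <input_file>.jsonl, one JSON object per line.
```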
Define the configuration for evaluation in `conf/openai_config.py`.

Note that you have to define all the MISSING parameters. You can do so via the CLI or by editing the file directly. Below are examples of how to define parameters via the CLI.
To launch evaluation, run the following command:
```
python eval_openai.py ++model_id=XXX ++dataset.prompt_configuration=XXX
```
Define the configuration for metrics computation in `conf/metrics_config.py`.

Note that you have to either provide a local path to model predictions in `preds_path` or use a W&B artifact and define the following parameters from `ArtifactMetricsConfig`: `name`, `artifact_path`. You can do so via the CLI or by editing the file directly. Below are examples of how to define parameters via the CLI.
To launch metrics computation for local predictions:
```
python compute_metrics.py ++preds_path=XXX
```
To launch metrics computation for W&B artifact with predictions:
```
python compute_metrics.py ++logger.artifact_config.name=XXX ++logger.artifact_config.artifact_path=XXX
```