This branch provides code for experiments with OpenAI models for the commit message generation task.
This project provides dependencies for two Python dependency managers:
- Poetry: `poetry.lock`, `pyproject.toml`
- pip: `requirements.txt` (obtained through `poetry export --with dev --output requirements.txt`)
At some point, we plan to publish our dataset of commits. Until then, if you wish to use this project with other data, refer to this section.
The data for this project was obtained via the commits_dataset repo. The required data format is described below.
This project expects input to be stored in JSON Lines format:
```
├── ...  # data directory
│   ├── <input_file>.jsonl
└── ...
```
In our case, each input example is a commit. Specifically, the following keys are expected in each row:
- `message`: Commit message.
- `mods`: A list of modifications made in the commit. Each modification should contain the following keys:
  - `change_type`: Type of modification (string, one of `MODIFY`, `ADD`, `DELETE`, `RENAME`, `COPY`, `UNKNOWN`).
  - `old_path`: Path to the file before the commit (`None` when `change_type` is `ADD`).
  - `new_path`: Path to the file after the commit (`None` when `change_type` is `DELETE`).
  - `diff`: Output of the `git diff` command for this specific file.
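For illustration, here is a minimal sketch of serializing one commit in the expected JSON Lines format. The helper function and the sample commit are hypothetical, not part of this repo:

```python
import json

# Allowed values for change_type, as listed above.
CHANGE_TYPES = {"MODIFY", "ADD", "DELETE", "RENAME", "COPY", "UNKNOWN"}


def make_commit_row(message, mods):
    """Validate one commit example and serialize it to a JSON Lines row."""
    for mod in mods:
        assert mod["change_type"] in CHANGE_TYPES
        # old_path is None for added files; new_path is None for deleted files.
        if mod["change_type"] == "ADD":
            assert mod["old_path"] is None
        if mod["change_type"] == "DELETE":
            assert mod["new_path"] is None
    return json.dumps({"message": message, "mods": mods})


row = make_commit_row(
    "Fix off-by-one error in pagination",
    [
        {
            "change_type": "MODIFY",
            "old_path": "src/pager.py",
            "new_path": "src/pager.py",
            "diff": "@@ -10,1 +10,1 @@\n-    limit = n\n+    limit = n - 1\n",
        }
    ],
)
# Append `row` to <input_file>.jsonl, one JSON object per line.
```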
Define the configuration for evaluation in `conf/openai_config.py`.

Note that you have to define all the MISSING parameters. You can do so via the CLI or by editing the file directly. Below are examples of how to define parameters via the CLI.
To launch evaluation, run the following command:
```
python eval_openai.py ++model_id=XXX ++dataset.prompt_configuration=XXX
```
Define the configuration for metrics computation in `conf/metrics_config.py`.

Note that you have to either provide a local path to model predictions in `preds_path` or use a W&B artifact and define the following parameters from `ArtifactMetricsConfig`: `name`, `artifact_path`. You can do so via the CLI or by editing the file directly. Below are examples of how to define parameters via the CLI.
To launch metrics computation for local predictions:
```
python compute_metrics.py ++preds_path=XXX
```
To launch metrics computation for W&B artifact with predictions:
```
python compute_metrics.py ++logger.artifact_config.name=XXX ++logger.artifact_config.artifact_path=XXX
```