
feat(commits): semantic commit analysis with risk scoring (#32) #64

Merged: Siddhant-K-code merged 1 commit into main from feat/32-semantic-commit-analysis on May 2, 2026

@Siddhant-K-code (Owner)

Closes #32

Summary

Adds pkg/commits, a package for heuristic semantic analysis of git commit history. No LLM or external tools are required.

Features

  • Classify — parses Conventional Commits format (type(scope)!: description), sets Type, Scope, Breaking
  • ScoreRisk — assigns RiskLow / RiskMedium / RiskHigh based on:
    • Breaking changes → always High (score +3)
    • Reverts → always High (score +3)
    • Large diffs (>500 lines) → +2, medium diffs (>200) → +1
    • Many files changed (>20) → +2, a moderate number of files (>10) → +1
    • Broad fix (fix type + >5 files) → +1
    • Risk keywords in message (hotfix, security, cve, regression…) → +1
  • FindSimilar — cosine similarity search over pre-computed embeddings, returns top-K above threshold
  • DetectPatterns — identifies repeated commit types and high-churn files (≥3 changes)
  • Summarize — aggregates risk signals across a set of similar commits into a RiskSummary
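
As an illustration of the Classify step above, here is a minimal sketch of parsing a Conventional Commits header of the form type(scope)!: description. It is not the code in pkg/commits/commits.go: the Commit struct, the classify function and the regular expression are assumptions made for this example; only the field names Type, Scope and Breaking come from the description above. The sketch looks at the header line only and ignores commit footers.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Commit is a hypothetical struct for this sketch. Only the field names
// Type, Scope and Breaking are taken from the PR description.
type Commit struct {
	Message     string
	Type        string
	Scope       string
	Breaking    bool
	Description string
}

// headerRe matches "type(scope)!: description". The scope and the "!"
// marker are optional.
var headerRe = regexp.MustCompile(`^([A-Za-z]+)(?:\(([^)]*)\))?(!)?:\s+(.+)$`)

// classify fills Type, Scope and Breaking from the first line of the
// message. Messages that do not follow the format keep empty fields.
func classify(message string) Commit {
	c := Commit{Message: message}

	// Only the first line of a commit message is the header.
	header := message
	if i := strings.IndexByte(message, '\n'); i >= 0 {
		header = message[:i]
	}

	m := headerRe.FindStringSubmatch(strings.TrimSpace(header))
	if m == nil {
		return c
	}
	c.Type = strings.ToLower(m[1])
	c.Scope = m[2]
	c.Breaking = m[3] == "!"
	c.Description = m[4]
	return c
}

func main() {
	fmt.Printf("%+v\n", classify("feat(commits)!: add risk scoring"))
	fmt.Printf("%+v\n", classify("update readme"))
}
```

A message that does not match the header pattern is returned with empty Type and Scope, which lets a caller treat it as unclassified rather than failing.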

Risk thresholds

| Score | Level  |
|-------|--------|
| ≥ 3   | High   |
| ≥ 1   | Medium |
| 0     | Low    |
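
The scoring rules and thresholds listed in this description can be sketched roughly as follows. This is an assumed reconstruction, not the merged implementation: the Change struct, the scoreRisk and levelFor helpers, and the string values of the risk levels are hypothetical. The keyword list contains only the four keywords named above (the description elides the rest), and counting a keyword hit at most once is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// RiskLevel names come from the PR; the underlying string values are assumed.
type RiskLevel string

const (
	RiskLow    RiskLevel = "low"
	RiskMedium RiskLevel = "medium"
	RiskHigh   RiskLevel = "high"
)

// Change is a hypothetical input struct for this sketch.
type Change struct {
	Message      string
	Type         string
	Breaking     bool
	Revert       bool
	LinesChanged int
	FilesChanged int
}

// Only the keywords named in the PR description; the full list is not shown there.
var riskKeywords = []string{"hotfix", "security", "cve", "regression"}

// scoreRisk adds up the signals described in the PR.
func scoreRisk(c Change) int {
	score := 0
	if c.Breaking {
		score += 3 // breaking changes reach the High threshold on their own
	}
	if c.Revert {
		score += 3 // reverts reach the High threshold on their own
	}
	switch {
	case c.LinesChanged > 500:
		score += 2
	case c.LinesChanged > 200:
		score++
	}
	switch {
	case c.FilesChanged > 20:
		score += 2
	case c.FilesChanged > 10:
		score++
	}
	if c.Type == "fix" && c.FilesChanged > 5 {
		score++ // broad fix
	}
	msg := strings.ToLower(c.Message)
	for _, kw := range riskKeywords {
		if strings.Contains(msg, kw) {
			score++ // assumption: counted once, however many keywords match
			break
		}
	}
	return score
}

// levelFor maps a score to a level: >= 3 High, >= 1 Medium, otherwise Low.
func levelFor(score int) RiskLevel {
	switch {
	case score >= 3:
		return RiskHigh
	case score >= 1:
		return RiskMedium
	default:
		return RiskLow
	}
}

func main() {
	c := Change{Message: "fix: handle nil pointer", Type: "fix", FilesChanged: 6, LinesChanged: 50}
	fmt.Println(scoreRisk(c), levelFor(scoreRisk(c)))
}
```

Because both breaking changes and reverts contribute +3, either one alone reaches the High threshold, which matches the "always High" wording.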

Files

  • pkg/commits/commits.go — all types and Analyzer implementation
  • pkg/commits/commits_test.go — 11 tests

@Siddhant-K-code added the enhancement and priority: high labels on May 2, 2026
Implements issue #32. Provides heuristic analysis of git commit history
without requiring an LLM or external tools:

- Classify: parses Conventional Commits (type, scope, breaking flag)
- ScoreRisk: assigns RiskLow/Medium/High based on diff size, file count,
  breaking changes, reverts, and risk keywords in the message
- FindSimilar: cosine similarity search over pre-computed embeddings
- DetectPatterns: identifies repeated commit types and high-churn files
- Summarize: aggregates risk signals across a set of similar commits
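
For readers unfamiliar with the FindSimilar step, the following is a small sketch of a cosine similarity search that returns the top-K matches above a threshold. The Match type, the function names and the use of []float64 for embeddings are assumptions for illustration, and treating the threshold as inclusive is also an assumption; the package's actual signatures are not shown in this PR.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Match is a hypothetical result type: the index of a stored embedding
// and its similarity to the query.
type Match struct {
	Index int
	Score float64
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// differ in length, are empty, or have zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// findSimilar scores every stored embedding against the query, keeps those
// at or above the threshold, and returns at most k of them, best first.
func findSimilar(query []float64, corpus [][]float64, k int, threshold float64) []Match {
	var out []Match
	for i, e := range corpus {
		if s := cosine(query, e); s >= threshold {
			out = append(out, Match{Index: i, Score: s})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	if k >= 0 && len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	corpus := [][]float64{{1, 0}, {0, 1}, {1, 1}}
	for _, m := range findSimilar([]float64{1, 0}, corpus, 2, 0.5) {
		fmt.Printf("index=%d score=%.3f\n", m.Index, m.Score)
	}
}
```

This linear scan is adequate for a modest commit history; a larger corpus would likely call for an approximate nearest-neighbour index instead.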

Risk thresholds: score>=3 → High, score>=1 → Medium, else Low.
Breaking changes and reverts always score High.

Co-authored-by: Ona <[email protected]>
@Siddhant-K-code force-pushed the feat/32-semantic-commit-analysis branch from dc5e4ed to 08a0b93 on May 2, 2026 14:30
@Siddhant-K-code merged commit c4455b8 into main on May 2, 2026
@Siddhant-K-code deleted the feat/32-semantic-commit-analysis branch on May 2, 2026 14:31


Development

Linked issue: [Product] Semantic commit analysis - find similar past changes and predict incidents
