librarian: generation takes ages looking for locally-changed files #2557
Description
I've created a PR to migrate another 50 modules to the Go repo, giving a total of just over 100 libraries for Librarian to manage.
I then ran "librarian generate" to check that no changes were generated.
The actual generation was quick, but generating the PR body then took ages (about 25 minutes). Interestingly, the most recent Go generation on GCB doesn't show this: it still takes a few minutes to find the relevant commit messages, but not nearly as long. I'm not sure why yet, but it could be because GCB commits the changes, and asking "which files changed in the most recent commit?" may be a lot faster than asking "which files are dirty?".
I believe the problem is that, for each library, we ask Git which files have changed across the whole repo (without reference to that library). So I've got loads of log entries like this:
time=2025-10-14T07:03:25.054Z level=INFO msg="Getting changed files"
time=2025-10-14T07:03:38.913Z level=INFO msg="Getting changed files"
time=2025-10-14T07:03:52.803Z level=INFO msg="Getting changed files"
time=2025-10-14T07:04:06.937Z level=INFO msg="Getting changed files"
Additionally:
- we look through commits in the googleapis repo even if there are no changes in the generated source.
- we perform the same check for "are there any changes in this library" in shouldIncludeForGeneration for every commit in googleapis... because we don't know which changes were associated with which googleapis commit.
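Because the "are there any changes in this library" answer is the same for every googleapis commit, it is loop-invariant and could be evaluated once per library. The Go sketch below illustrates that hoisting. `Commit`, `perCommit`, `hoisted`, and the `libraryChanged` / `touchesLibrary` callbacks are hypothetical stand-ins; the real signature of shouldIncludeForGeneration isn't shown in this issue, and `touchesLibrary` is assumed to represent whatever commit-specific filtering remains.

```go
package main

import "fmt"

// Commit is a hypothetical stand-in for a googleapis commit.
type Commit struct {
	Hash    string
	Message string
}

// perCommit mirrors the current shape: the library-level check runs inside
// the loop, so it is repeated for every googleapis commit.
func perCommit(commits []Commit, libraryChanged func() bool, touchesLibrary func(Commit) bool) []Commit {
	var out []Commit
	for _, c := range commits {
		if touchesLibrary(c) && libraryChanged() {
			out = append(out, c)
		}
	}
	return out
}

// hoisted runs the same check once, up front. If the library has no
// changes, the googleapis commits are never examined.
func hoisted(commits []Commit, libraryChanged func() bool, touchesLibrary func(Commit) bool) []Commit {
	if !libraryChanged() {
		return nil
	}
	var out []Commit
	for _, c := range commits {
		if touchesLibrary(c) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	commits := []Commit{
		{Hash: "a1", Message: "feat: add API"},
		{Hash: "b2", Message: "docs: fix typo"},
		{Hash: "c3", Message: "fix: change field type"},
	}
	calls := 0
	changed := func() bool { calls++; return true } // counts how often the check runs
	all := func(Commit) bool { return true }

	perCommit(commits, changed, all)
	fmt.Println("per-commit checks:", calls) // per-commit checks: 3
	calls = 0
	hoisted(commits, changed, all)
	fmt.Println("hoisted checks:", calls) // hoisted checks: 1
}
```

The two functions return the same commits as long as the library-level check is deterministic, which it should be here since it doesn't depend on the commit.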
I suspect this could be refactored quite a lot:
- Step 1: only fetch the list of changed files in the language repo once (in formatPrBody) and then pass the same result into each call to getConventionalCommitsSinceLastGeneration.
- Step 2: move the filtering by "only libraries that have changes" to before we do anything with googleapis.
I'll take a brief stab at this, but may need to hand it off to someone else. I'm marking this as a bug because I think it will become unusably slow as the number of libraries grows.