Incremental updates for OsvDownloadTask by jonbally · Pull Request #5537 · DependencyTrack/dependency-track

jonbally · 2025-11-14T12:45:55Z

Description

In its current implementation, the OsvDownloadTask downloads and processes all existing advisory files for each selected ecosystem on every run. Depending on the selected ecosystems, this can create a large daily resource demand (e.g. mirroring the entire collection of OSV Ubuntu advisories takes more than one hour on our instance).

This PR adds changes to the OsvDownloadTask to support incremental updates for the selected OSV ecosystems.
I also adapted existing tests and included new ones for full and incremental updates.

A full mirror for the selected ecosystems is still performed every 5 days (similar to the NistMirrorTask) to stay in sync with the OSV database. This matters because an incremental update can miss changes: when the modified_id.csv file lists a large number of new or modified IDs (sometimes more than 40k), the incremental update cuts off at 10k.

Addressed Issue

Enhancement for OsvDownloadTask

Added support for incremental updates
Added retry mechanism for downloads and requests
Added notifications
Added time tracking

Additional Details

The incremental updates use the storage.googleapis.com/osv-vulnerabilities/<ecosystem>/modified_id.csv file.
It contains the IDs of recently added and modified advisories, together with the timestamp of each addition or modification.
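
As a rough illustration of how such a file can drive an incremental update, the sketch below selects the IDs that changed after a given timestamp. It assumes each row has the form `<ISO-8601 timestamp>,<advisory id>`; that row layout, the class name `ModifiedIdCsv`, and the method `idsModifiedAfter` are assumptions made for this example and are not taken from the PR.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the code from this PR.
final class ModifiedIdCsv {

    private ModifiedIdCsv() {
    }

    /**
     * Returns the IDs of all rows whose timestamp is strictly after lastSync.
     * Assumed row layout: "2025-11-14T12:45:55Z,EXAMPLE-2025-0001".
     * Malformed rows are skipped instead of aborting the whole update.
     */
    static List<String> idsModifiedAfter(List<String> csvLines, Instant lastSync) {
        final List<String> ids = new ArrayList<>();
        for (final String line : csvLines) {
            final int comma = line.indexOf(',');
            if (comma <= 0) {
                continue; // no separator or empty timestamp column
            }
            final String id = line.substring(comma + 1).trim();
            if (id.isEmpty()) {
                continue; // empty ID column
            }
            final Instant modified;
            try {
                modified = Instant.parse(line.substring(0, comma).trim());
            } catch (DateTimeParseException e) {
                continue; // unparseable timestamp
            }
            if (modified.isAfter(lastSync)) {
                ids.add(id);
            }
        }
        return ids;
    }
}
```

The result keeps the order of the input rows, so a caller could apply a size limit to it before requesting the individual advisories.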

The implementation is conceptually similar to the NistMirrorTask:

When the task runs, it first checks whether the following files exist in the osv directory for the respective ecosystem:
- an ecosystem all.zip (renamed),
- a .zip.ts timestamp file for the all.zip file, and
- a .csv.ts timestamp file for the modified_id.csv (renamed) file.

If these files exist, are non-empty, and the "full update" timestamp is within 5 days before now, an incremental update is performed, using the timestamp in the .csv.ts file to select the relevant entries of the .csv.
If any condition fails, a full update is performed.
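
The check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the class `MirrorModeDecision`, the `Mode` enum, and the method signature are hypothetical, and the caller is assumed to have already read the two timestamp files (passing `null` when a file is missing or empty).

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the full-vs-incremental decision; names are illustrative.
final class MirrorModeDecision {

    enum Mode { FULL, INCREMENTAL }

    // The 5-day cadence is the value stated in the PR description.
    static final Duration FULL_MIRROR_INTERVAL = Duration.ofDays(5);

    private MirrorModeDecision() {
    }

    /**
     * @param lastFullMirror  timestamp from the .zip.ts file, or null if missing or empty
     * @param lastIncremental timestamp from the .csv.ts file, or null if missing or empty
     * @param archivePresent  whether the renamed all.zip exists and is non-empty
     * @param clock           source of "now", injectable so tests can pin the time
     */
    static Mode decide(Instant lastFullMirror, Instant lastIncremental,
                       boolean archivePresent, Clock clock) {
        if (!archivePresent || lastFullMirror == null || lastIncremental == null) {
            return Mode.FULL; // any missing precondition forces a full mirror
        }
        final Instant now = clock.instant();
        final Instant oldestAcceptable = now.minus(FULL_MIRROR_INTERVAL);
        // "Within 5 days before now": not in the future and not older than the interval.
        final boolean fresh = !lastFullMirror.isAfter(now)
                && !lastFullMirror.isBefore(oldestAcceptable);
        return fresh ? Mode.INCREMENTAL : Mode.FULL;
    }
}
```

Taking a `java.time.Clock` as a parameter means a test can pass `Clock.fixed(...)` and exercise both sides of the 5-day boundary without waiting or manipulating file timestamps.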

I did not modify the parts of the OsvDownloadTask that perform the database update.
At the moment, each advisory still goes through these steps individually, causing many DB reads and writes:

Full mirror:

Unzip entry -> Parse JSON -> Parse OSV -> map to DT vulnerability model -> Check if new or modified -> Update DB

Incremental mirror:

Request JSON -> Parse JSON -> Parse OSV -> map to DT vulnerability model -> Check if new or modified -> Update DB

I think this could be optimized a lot by processing the advisories in batches.
An alternative to the current implementation, which I saw discussed somewhere recently, would be to implement an OSV analyzer using the OSV API instead of mirroring all of the advisories and performing the matching in DT.
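
To make the batching idea a bit more concrete, the sketch below splits a list of parsed advisories into fixed-size chunks so that each chunk could be checked and written in one database round trip. The class `Batches` and the method `partition` are hypothetical names, and how a batch would actually be persisted in Dependency-Track is deliberately left out, since the PR does not change that part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical utility for illustration; not part of this PR.
final class Batches {

    private Batches() {
    }

    /**
     * Splits items into consecutive batches of at most batchSize elements.
     * The last batch may be smaller. Each batch is an independent copy.
     */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        final List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            final int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

A caller would then loop over the batches and perform the "check if new or modified" lookup once per batch instead of once per advisory.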

Some points about this PR which might be worth discussing:

The size limit of 1 GiB which I added for the acquired all.zip files
The hard-coded cadence of 5 days for the full mirror
The cutoff after 10k modified advisories for incremental updates

Checklist

I have read and understand the contributing guidelines
This PR fixes a defect, and I have provided tests to verify that the fix is effective
This PR implements an enhancement, and I have provided tests to verify that it works as intended
This PR introduces changes to the database model, and I have added corresponding update logic
This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

Implemented incremental updates for the OsvDownloadTask
Adapted and added tests for the OsvDownloadTask
Signed-off-by: jonbally <[email protected]>

owasp-dt-bot · 2025-11-14T12:46:11Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

codacy-production · 2025-11-14T12:54:50Z

Coverage summary from Codacy

Coverage variation	Diff coverage
✅ +0.12% (target: -1.00%)	✅ 77.65% (target: 70.00%)

Coverage variation details

	Coverable lines	Covered lines	Coverage
Common ancestor commit (`fa1eb0b`)	24093	19492	80.90%
Head commit (`4dce392`)	24374 (+281)	19748 (+256)	81.02% (+0.12%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details

	Coverable lines	Covered lines	Diff coverage
Pull request (#5537)	340	264	77.65%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

Fixed the Codacy issue 'Avoid instantiating FileInputStream, FileOutputStream, FileReader, or FileWriter' in writeTimestampFile(). Removed the 404 stub for any unspecified endpoint from the test, as it is the default behaviour of WireMock. Removed ecosystems.txt and replaced the local file in the stub with a proxied response from the OSV cloud storage (previous test behavior).
Signed-off-by: jonbally <[email protected]>

nscuro

Overall great work on this, thanks!

Especially appreciate you injecting a Clock for being able to test the time-sensitive logic.

src/main/java/org/dependencytrack/tasks/OsvDownloadTask.java

Added an escape hatch: if an incremental update would contain more than 1000 new or modified advisories, a full mirror is initiated instead. Removed the outer retry (download retry). Added consumeResultBeforeRetryAttempt() to the request retry. Added a small helper method for URL encoding. Implemented smaller requested changes.
Signed-off-by: jonbally <[email protected]>
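
A minimal sketch of two of the changes named in this commit message, the escape hatch and a URL-encoding helper, is shown below. The class `IncrementalUpdatePlan`, its method names, and the constant name are hypothetical; only the threshold of 1000 comes from the commit message. The encoding helper assumes the value is used as a URL path segment, for example an ecosystem name containing a space.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch; the PR's actual helper names and structure may differ.
final class IncrementalUpdatePlan {

    // Threshold taken from the commit message; the constant name is made up.
    static final int MAX_INCREMENTAL_ADVISORIES = 1000;

    private IncrementalUpdatePlan() {
    }

    /** Returns true when the change set is too large and a full mirror should run instead. */
    static boolean shouldFallBackToFullMirror(List<String> modifiedIds) {
        return modifiedIds.size() > MAX_INCREMENTAL_ADVISORIES;
    }

    /**
     * Encodes a value for use as a URL path segment.
     * URLEncoder targets form encoding, where a space becomes '+'. A literal '+'
     * in the input is already emitted as "%2B", so every remaining '+' stands
     * for a space and can safely be rewritten to "%20".
     */
    static String encodePathSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

Falling back to a full mirror avoids issuing thousands of single-advisory requests when one archive download would be cheaper, and it sidesteps the risk of silently truncating a large change set.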

jonbally · 2025-12-12T16:40:36Z

I have implemented the requested changes. The failing tests seem to be unrelated; these GitHubMetaAnalyzerTest tests are usually flaky for me...

nscuro

Thanks!

Partial refactor of OsvDownloadTask to allow for incremental updates

8593bbe

jonbally force-pushed the OsvDownloadTask-incrementalUpdate branch from 9485073 to 8593bbe November 14, 2025 15:38

jonbally changed the title ~~Incemental updates for OsvDownloadTask~~ Incremental updates for OsvDownloadTask Nov 20, 2025

nscuro added this to the 4.14.0 milestone Dec 4, 2025

nscuro added the enhancement (New feature or request) and integration/osv (Related to the OSV integration) labels Dec 4, 2025

nscuro requested changes Dec 4, 2025

jonbally requested a review from nscuro December 12, 2025 16:39

nscuro approved these changes Jan 1, 2026

nscuro merged commit b29db12 into DependencyTrack:master Jan 1, 2026
11 of 12 checks passed

github-actions bot locked as resolved and limited conversation to collaborators Feb 1, 2026

nscuro merged 3 commits into DependencyTrack:master from jonbally:OsvDownloadTask-incrementalUpdate

3 participants
