Incremental updates for OsvDownloadTask #5537

Merged
nscuro merged 3 commits into DependencyTrack:master from jonbally:OsvDownloadTask-incrementalUpdate
Jan 1, 2026

@jonbally jonbally commented Nov 14, 2025

Description

In its current implementation, the OsvDownloadTask downloads and processes all existing advisory files for each selected ecosystem on every run. Depending on the selected ecosystems, this can lead to a large daily resource demand (e.g. mirroring the entire collection of OSV Ubuntu advisories takes more than one hour on our instance).

This PR adds changes to the OsvDownloadTask to support incremental updates for the selected OSV ecosystems.
I also adapted existing tests and included new ones for full and incremental updates.

A full mirror for the selected ecosystems will still be performed every 5 days (similar to the NistMirrorTask) to stay in sync with the OSV database. Incremental updates can miss changes in certain cases: for example, when the modified_id.csv file lists a very large number of new or modified IDs (sometimes more than 40k), the incremental update cuts off at 10k.

Addressed Issue

Enhancement for OsvDownloadTask

  • Added support for incremental updates
  • Added retry mechanism for downloads and requests
  • Added notifications
  • Added time tracking

Additional Details

The incremental updates use the storage.googleapis.com/osv-vulnerabilities/<ecosystem>/modified_id.csv file.
It contains the IDs of recently added and modified advisories as well as the timestamp of the addition/modification.
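
For illustration, selecting the relevant rows of that file could look roughly like the sketch below. It is not the code from this PR. It assumes each non-blank line has the form `<ISO-8601 timestamp>,<advisory id>`; the class and method names (`ModifiedIdCsv`, `modifiedSince`) are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the PR.
final class ModifiedIdCsv {

    /** One CSV row: when the advisory was added or modified, and its ID. */
    record Entry(Instant modified, String id) {}

    /**
     * Returns the entries modified strictly after {@code since}.
     * ASSUMPTION: every non-blank line looks like "<ISO-8601 timestamp>,<advisory id>".
     * A line with an unparsable timestamp makes Instant.parse throw DateTimeParseException.
     */
    static List<Entry> modifiedSince(List<String> lines, Instant since) {
        List<Entry> result = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip rows without a separator
            }
            Instant modified = Instant.parse(line.substring(0, comma).trim());
            String id = line.substring(comma + 1).trim();
            if (modified.isAfter(since)) {
                result.add(new Entry(modified, id));
            }
        }
        return result;
    }
}
```

The `since` argument would be the timestamp remembered from the previous run, so only advisories touched after that point are requested again.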

The implementation is conceptually similar to the NistMirrorTask:

  • When running the task, it first checks if certain files exist in the osv directory for the respective ecosystem
    • an ecosystem all.zip (renamed),
    • a .zip.ts timestamp file for the all.zip file, and
    • a .csv.ts timestamp file for the modified_id.csv (renamed) file
  • If these files exist, are non-empty, and the "full update" timestamp is no more than 5 days old, then an incremental update is performed, using the timestamp in the .csv.ts file to select the relevant entries of the .csv
  • If any condition fails, then a full update is performed
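
The decision described in the list above can be sketched as follows. This is a simplified illustration, not the PR's implementation: it assumes the .zip.ts file stores an ISO-8601 instant as text, and the names `UpdateModeSketch`, `Mode` and `decide` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the full-vs-incremental decision.
final class UpdateModeSketch {

    enum Mode { FULL, INCREMENTAL }

    /** Cadence of the full mirror, as stated in the description. */
    static final Duration FULL_MIRROR_INTERVAL = Duration.ofDays(5);

    /**
     * ASSUMPTION: zipTs contains the time of the last full mirror as an ISO-8601 instant.
     * Any missing, empty, unreadable or malformed file results in a full update.
     */
    static Mode decide(Path zip, Path zipTs, Path csvTs, Clock clock) {
        try {
            if (!isNonEmptyFile(zip) || !isNonEmptyFile(zipTs) || !isNonEmptyFile(csvTs)) {
                return Mode.FULL;
            }
            Instant lastFull = Instant.parse(Files.readString(zipTs).trim());
            Instant cutoff = clock.instant().minus(FULL_MIRROR_INTERVAL);
            return lastFull.isAfter(cutoff) ? Mode.INCREMENTAL : Mode.FULL;
        } catch (IOException | RuntimeException e) {
            return Mode.FULL; // fall back to a full mirror when state cannot be trusted
        }
    }

    private static boolean isNonEmptyFile(Path path) throws IOException {
        return Files.isRegularFile(path) && Files.size(path) > 0;
    }
}
```

Passing a `Clock` rather than calling `Instant.now()` keeps the 5-day check testable with a fixed point in time.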


I did not modify the parts of the OsvDownloadTask that perform the database update.
At the moment each advisory is still individually going through these steps, causing many DB reads and writes:

Full mirror:

  • Unzip entry -> Parse JSON -> Parse OSV -> map to DT vulnerability model -> Check if new or modified -> Update DB

Incremental mirror:

  • Request JSON -> Parse JSON -> Parse OSV -> map to DT vulnerability model -> Check if new or modified -> Update DB

I think this could be optimized a lot by processing the advisories in batches.
An alternative to the current implementation, which I saw discussed somewhere recently, would be to implement an OSV analyzer using the OSV API instead of mirroring all of the advisories and performing the matching in DT.
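
To make the batching idea concrete, here is a minimal sketch of grouping items before handing them to a single handler call. It only shows the grouping; how a batch would be written to the database is out of scope, and `BatchSketch` and `forEachBatch` are hypothetical names that do not exist in the code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper illustrating batch-wise processing.
final class BatchSketch {

    /**
     * Calls {@code handler} once per group of at most {@code batchSize} items,
     * preserving the input order. Each batch is passed as an immutable copy.
     */
    static <T> void forEachBatch(Iterable<T> items, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                handler.accept(List.copyOf(batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(List.copyOf(batch)); // final, possibly smaller batch
        }
    }
}
```

With such a helper, the "Check if new or modified -> Update DB" steps could in principle run once per batch instead of once per advisory.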


Some points about this PR which might be worth discussing:

  • The size limit of 1 GiB which I added for the acquired all.zip files
  • The hard-coded cadence of 5 days for the full mirror
  • The cutoff after 10k modified advisories for incremental updates

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have added corresponding update logic
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

Implemented incremental updates for the OsvDownloadTask
Adapted and added tests for the OsvDownloadTask

Signed-off-by: jonbally <[email protected]>

owasp-dt-bot commented Nov 14, 2025

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

codacy-production bot commented Nov 14, 2025

Coverage summary from Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| +0.12% (target: -1.00%) | 77.65% (target: 70.00%) |

Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (fa1eb0b) | 24093 | 19492 | 80.90% |
| Head commit (4dce392) | 24374 (+281) | 19748 (+256) | 81.02% (+0.12%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#5537) | 340 | 264 | 77.65% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@jonbally jonbally force-pushed the OsvDownloadTask-incrementalUpdate branch from 9485073 to 8593bbe Compare November 14, 2025 15:38
@jonbally jonbally changed the title from "Incemental updates for OsvDownloadTask" to "Incremental updates for OsvDownloadTask" Nov 20, 2025
Fixed the Codacy issue 'Avoid instantiating FileInputStream,
FileOutputStream, FileReader, or FileWriter' in writeTimestampFile().
Removed 404 stub for any unspecified endpoint from test, as it is
the default behaviour of wiremock.
Removed ecosystems.txt and replaced the local file in the stub with
a proxied response from the OSV cloud storage (previous test behavior).

Signed-off-by: jonbally <[email protected]>
@nscuro nscuro added this to the 4.14.0 milestone Dec 4, 2025
@nscuro nscuro added the enhancement and integration/osv labels Dec 4, 2025

@nscuro nscuro left a comment

Overall great work on this, thanks!

Especially appreciate you injecting a Clock for being able to test the time-sensitive logic.
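
As a general illustration of why an injected Clock helps here (this is not the PR's code, and `MirrorSchedule` and `isFullMirrorDue` are hypothetical names): time-dependent logic reads the current time from a `java.time.Clock` passed in through the constructor, so a test can supply `Clock.fixed(...)` while production code would typically pass `Clock.systemUTC()`.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical example of clock injection for time-sensitive logic.
final class MirrorSchedule {

    private final Clock clock;
    private final Duration interval;

    MirrorSchedule(Clock clock, Duration interval) {
        this.clock = clock;
        this.interval = interval;
    }

    /** True once at least {@code interval} has elapsed since the last full mirror. */
    boolean isFullMirrorDue(Instant lastFullMirror) {
        return !lastFullMirror.plus(interval).isAfter(clock.instant());
    }
}
```

A test can then pin "now" to a known instant and check both sides of the 5-day boundary without sleeping or relying on the system time.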

Added an escape hatch: if an incremental update would contain
more than 1000 new or modified advisories, a full mirror is initiated.
Removed the outer retry (download retry).
Added consumeResultBeforeRetryAttempt() to the request retry.
Added small helper method for URL encoding.
Implemented smaller requested changes.

Signed-off-by: jonbally <[email protected]>
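
Two of the changes in this commit, the escape hatch and the URL encoding helper, can be sketched as below. This is an assumption-laden illustration rather than the committed code: the names `IncrementalUpdateHelpers`, `shouldFallBackToFullMirror` and `encodePathSegment` are hypothetical, and only the threshold of 1000 comes from the commit message.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers, not the methods added in the PR.
final class IncrementalUpdateHelpers {

    /** Threshold from the commit message: above this, run a full mirror instead. */
    static final int MAX_INCREMENTAL_ADVISORIES = 1000;

    static boolean shouldFallBackToFullMirror(int modifiedCount) {
        return modifiedCount > MAX_INCREMENTAL_ADVISORIES;
    }

    /**
     * Encodes a single URL path segment. URLEncoder targets form encoding and
     * turns spaces into '+', so those are rewritten to "%20" afterwards.
     * A literal '+' in the input is already encoded as "%2B" and is unaffected.
     */
    static String encodePathSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

Encoding matters for values that are placed into the request path, for example a name that contains a space.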
@jonbally jonbally requested a review from nscuro December 12, 2025 16:39
@jonbally

I have implemented the requested changes; the failing tests seem to be unrelated. These GitHubMetaAnalyzerTest tests are usually flaky for me...

@nscuro nscuro left a comment

Thanks!

@nscuro nscuro merged commit b29db12 into DependencyTrack:master Jan 1, 2026
11 of 12 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 1, 2026

Labels

enhancement, integration/osv

3 participants