Design: Change vulnerability feedback identification
Problem to solve
The vulnerability feedback model currently relies on a fingerprint to match it with vulnerability findings. Though, this fingerprint is actually a SHA1 of the vulnerabilities[].cve
property (AKA compare key
) of the JSON Report format, and we want to get rid of this legacy property, so we need to find another approach for the identification of the vulnerability feedback.
Intended users
- Delaney (Development Team Lead)
- Sasha (Software Developer)
- Devon (DevOps Engineer)
- Sam (Security Analyst)
Further details
The vulnerability feedback model relies on the project_fingerprint
property to match it with vulnerability findings. This approach allows:
- to match feedback with both vulnerability records stored in the database for the
default_branch
and vulnerability objects reported in JSON artifacts for other branches (and MR). - to have a stable identifier that is similar between different executions of the analyzer, to keep matching the feedback with the vulnerability (e.g. between pipelines)
Though, due to how we generate this fingerprint, #2
is not as stable as expected and is also not consistent between the different analyzers. It also sometimes relies on proprietary properties that aren't part of the common format and thus, not sent to the rails backend.
Proposal
The goal of the feedback (dismissal, issue, MR) is to attach information to a vulnerability finding and make sure this information follows it over time. E.g. when we dismiss a vulnerability, we want it to stay dismissed as long as it is reported by the scans.
This goal is actually very similar to being able to track a vulnerability finding over time, in a stable way. So the proposal is to stick to the exact same identification scheme:
- report type
- primary identifier fingerprint
- location fingerprint
By having the same identification between vulnerability findings and feedback, we ensure that we'll have the same stability, and thus that we'll keep the feedback tied to the findings at least as much as we're capable of tracking the finding. This also means that any improvement we can do on the vulnerability finding tracking can also be applied to the feedback.
Migration of existing data
The feedback feature is still in the alpha stage, so data loss is acceptable but we should try to migrate existing data if possible.
We could loop through existing feedback records that are linked to a vulnerability occurrence record (so feedback created from the default_branch
) and from it, extract the report type, primary identifier fingerprint, and location fingerprint to generate the new fingerprint used to do the matching.
We then could loop through the remaining feedback records that are not linked to occurrence record (so feedback created from MR or pipeline view for feature branches), load the corresponding JSON artifact (we know the pipeline id), find the corresponding vulnerability and extract the report type, primary identifier fingerprint, and location fingerprint to generate the new fingerprint used to do the matching. Some artifacts may be expired and there would then be no way to migrate the feedback.
We can also discuss the relevance of trying to implement a backward-compatible finder that would rely on the old fingerprint.
Permissions and Security
Documentation
The feedback is not well documented today but we could add such information to the 3rd party integration doc: https://docs.gitlab.com/ee/development/integrations/secure/
Availability & Testing
The migration should be well tested.
What does success look like, and how can we measure that?
The vulnerability feedback is identified the same way as the vulnerability findings.
What is the type of buyer?
Links / references
First approach (deprecated)
-
Add two columns to vulnerability_feedback (primary identifier fingerprint, location fingerprint) (weight: 2) -
Expose primary identifier fingerprint, location fingerprint when vulnerabilities displayed (we currently don't expose this in our api end points) (weight: 3) -
Make rails populate this two columns whenever new feedback is added (this could involve front endbecause we don't know how to make relationship between vulnerability_feedback and vulnerability_occurrence only way to do that is when we create feedback FE needs to send primary identifier and location fingerprint ) (weight: 3) -
Change application logic in MR widget, security dashboards, first class vulnerability and vulnerability api end points, to use new information to match between vulnerability_occurrence and vulnerability_feedback (weight: 5) -
Write background migration to populate added columns (weight: 5) -
MR widget still hits Projects::VulnerabilityFeedbackController#index
to get vulnerability feedback information this needs to be cleaned up since we get this information from backend (frontend work), in addition, we can disable reactive cache whenever we add vulnerability feedback otherwise user won't see change immediately see this for additional part (!18960 (merged)) (for invalidating cache part its weight: 2) -
Deprecate project_fingerprint column from vulnerability_feedback
andvulnerability_occurrence
table (weight: 3)
Second approach (favoured)
-
Add one column called feedback_fingerprint
to vulnerability_feedback and vulnerability_occurrence (weight: 2) -
calculate feedback_print (calculation is in below code snipped) (weight: 2) -
Make rails populate new feedback_fingerprint column whenever new occurrence is added to db and reported in artifact (weight: 3) -
Make rails populate new feedback_fingerprint column whenever new feedback is added. (this could involve front endbecause we don't know how to make relationship between vulnerability_feedback and vulnerability_occurrence only way to do that is when we create feedback FE needs to send primary identifier and location feedback_fingerprint ) (weight: 3) -
Write background migration to populate added columns for existing vulnerability_feedback
andvulnerability_occurrence
(weight: 5) -
Change application logic in MR widget, security dashboards, first class vulnerability and vulnerability api end points, to use new information to match between vulnerability_occurrence and vulnerability_feedback (weight: 5) -
MR widget still hits Projects::VulnerabilityFeedbackController#index
to get vulnerability feedback information this needs to be cleaned up since we get this information from backend (frontend work), in addition, we can disable reactive cache whenever we add vulnerability feedback otherwise user won't see change immediately see this for additional part (!18960 (merged)) (for invalidating cache part its weight: 2) -
Deprecate project_fingerprint column from vulnerability_feedback
andvulnerability_occurrence
table (weight: 3)
We could add project_id
to feedback_fingerprint
calculation for both.
Calculating feedback_fingerprint
def generate_feedback_fingerprint
res = [primary_identifier&.fingerprint, location&.fingerprint, report_type.to_s].compact.join("")
Digest::SHA256.hexdigest(res)
end
Alternative for calculating feedback_fingerprint
This method raised some concerns for stability of identifiers. Identifiers might change frequently therefore if we consider all of them there is higher chance we might lose feedback.
def generate_feedback_fingerprint
res = [joined_identifiers, location&.fingerprint, report_type.to_s].compact.join("")
Digest::SHA256.hexdigest(res)
end
# converts and combines all identifers with xor
# in that way order won't matter
def joined_identifiers
convert_to_binary = lambda { |input| input.unpack1("B*").to_i(2) }
identifiers.map do |identifier|
convert_to_binary.call("#{identifier.external_type}:#{identifier.external_id}")
end.compact.reduce(:^).to_s
end
Additional information
https://youtu.be/sgrQFF_JuqQ video explains how project_fingerprint is used currently
Current records in database
explain SELECT count(id) FROM vulnerability_feedback
Currently we have 5590 vulnerability_feedback in gitlab.com
explain select distinct vf.id from vulnerability_feedback vf inner join vulnerability_occurrences vo on ENCODE(vo.project_fingerprint, 'HEX') = vf.project_fingerprint and vf.project_id = vo.project_id AND vo.report_type = vf.category
4697 vulnerability_feedback coming from occurrences which are saved in db
explain SELECT DISTINCT a.project_fingerprint FROM vulnerability_feedback a INNER JOIN ci_pipelines b ON a.pipeline_id = b.id AND b.ref <> 'master' AND a.project_id = b.project_id
~1695 vulnerability_feedback coming from occurrences which are coming from db
Payload that is send from front end when we create vulnerability feedback
category: "dependency_scanning"
feedback_type: "dismissal"
pipeline_id: 115877433
project_fingerprint: "90034c121a4fda23e2ab87911b6692ca2a30f413"
vulnerability_data: {id: null, report_type: "dependency_scanning",…}
id: null
report_type: "dependency_scanning"
name: "A prototype pollution vulnerability in handlebars may lead to remote code execution if an attacker can control the template in handlebars"
severity: "high"
confidence: "unknown"
scanner: {external_id: "retire.js", name: "Retire.js"}
identifiers: [{external_type: "retire.js", external_id: "baf1b2b5f9a7c1dc0fb152365126e6c3",…}]
project_fingerprint: "90034c121a4fda23e2ab87911b6692ca2a30f413"
create_vulnerability_feedback_issue_path: "/caneldem/security-reports2/-/vulnerability_feedback"
create_vulnerability_feedback_merge_request_path: "/caneldem/security-reports2/-/vulnerability_feedback"
create_vulnerability_feedback_dismissal_path: "/caneldem/security-reports2/-/vulnerability_feedback"
project: {id: 16583999, name: "Security Reports2", full_path: "/caneldem/security-reports2",…}
dismissal_feedback: null
issue_feedback: null
merge_request_feedback: null
description: null
links: [{url: "https://snyk.io/vuln/SNYK-JS-HANDLEBARS-174183"},…]
location: {file: "package.json", dependency: {package: {name: "handlebars"}, version: "4.0.11"}}
remediations: [null]
solution: null
state: "detected"
blob_path: "/caneldem/security-reports2/-/blob/1875d04db72f89e37fa7f6b854b06569265ae5cd/package.json"
category: "dependency_scanning"
title: "A prototype pollution vulnerability in handlebars may lead to remote code execution if an attacker can control the template in handlebars"