Conversation
|
🎯 Code Coverage (details) 🔗 Commit SHA: b0468e8 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback! |
| /// any lock to avoid blocking concurrent readers. | ||
| pub fn get_or_build( | ||
| &self, | ||
| raw_rules: &[serde_json::Value], |
There was a problem hiding this comment.
Instead of this, let's change what we need to to store these as serde_json::RawValue and only deserialize them into a serde_json::Value when needed. Since it'll essentially always be a cache hit, by hashing the raw bytes, we can avoid two unnecessary allocations (RawValue -> Value and then the roundtrip Value -> String)
There was a problem hiding this comment.
Alternatively, not sure if it's easier (or possible with Rocket's API), but may we could just change the type to avoid deserializing the rules here?
So instead, deserialize as Vec<u8> representing the JSON string and then hash those?
There was a problem hiding this comment.
secret rules are now stored as Box<serde_json::value::RawValue> instead of Value, wdyt?
There was a problem hiding this comment.
Great, thanks for the review and feedback!
|
Tested successfully with a |
What problem are you trying to solve?
The
/scan-secretsendpoint rebuilds the SDSScannerfrom scratch on every request. This involves deserializing rules from JSON, compiling regex patterns, building automata, and configuring validators - all expensive operations. In contrast, the/analyzeendpoint already caches compiled rules viaRuleCachewithDashMap.In the IDE use-case, the same set of rules is sent with every file scan request (rules come from the Datadog API and rarely change), meaning the scanner is unnecessarily rebuilt hundreds of times with identical rules.
What is your solution?
Introduced a
SecretScannerCache- a single-entry,RwLock-based cache that stores the most recently builtScanneralong with its parsedSecretRules.Arc-cloned scanner + rules. Multiple concurrent scans proceed without blocking each other.u64hash of the serialized JSON rules. Compared on every request to detect rule changes.On cache hit, three expensive operations are skipped entirely:
serde_json::Value->SecretRule)convert_to_sds_ruleconfigwith regex compilation)Scanner::builder().build()with automaton construction)Alternatives considered
DashMapmulti-entry cache (likeRuleCachefor/analyze): considered but unnecessary since the IDE client sends the same rule set every time. A single-entry cache is simpler and has the same hit rate.--use-rules-cachefor/analyze): considered but there's no downside to always caching - memory is minimal (one entry) and the cache self-invalidates on rule changes. So it's always enabled.Scanneris built atomically from all rules at once - you can't incrementally add/remove rules from a built scanner.What the reviewer should know
dd_sds::Scanneris confirmedSend + Sync(the dd-sds benchmarks useArc<Scanner>across 32 threads). Bothscan()andvalidate_matches()take&self.use_debugis deliberately excluded from the cache key - it only controlseprintlnlogging inconvert_to_sds_ruleconfig/try_to_secondary_validator, not the scanner's behavior or output.Scanneris re-exported from thesecretscrate (pub use dd_sds::Scanner) so thebinscrate can reference the type without adding a directdd_sdsdependency.process_secret_scan_requestacceptsOption<&SecretScannerCache>so the non-cache path is preserved for testing.