[improve][broker] PIP-464: Strict Avro schema validation for SchemaType.JSON #25362

Merged
codelipenghui merged 7 commits into apache:master from codelipenghui:pip-464/strict-json-schema-avro-validation
Mar 23, 2026

Conversation

@codelipenghui
Contributor

@codelipenghui codelipenghui commented Mar 19, 2026

Motivation

The broker-side fallback logic for SchemaType.JSON schema validation is too lenient — it accepts any valid JSON as a schema definition, not just the legacy Jackson format from the Pulsar 2.0 era. This has caused real issues for non-Java clients (e.g., Rust) where users accidentally register JSON Schema Draft 2020-12 definitions:

  1. StructSchemaDataValidator accepts it (Avro parse fails → Jackson fallback succeeds)
  2. JsonSchemaCompatibilityCheck allows it (permissive mixed-format handling)
  3. But Java consumers fail with SchemaParseException: Type not supported: object because AvroBaseStructSchema requires Avro format with no fallback

The result is an asymmetry: broker accepts any JSON, consumer requires Avro. Schemas get stored that no Java consumer can read.

Changes

New broker configuration

  • schemaJsonAllowLegacyJacksonFormat (boolean, default false)

Modified components (6 source files)

  • ServiceConfiguration — new config field
  • StructSchemaDataValidator — gates Jackson JsonSchema fallback on config flag; when false, Avro SchemaParseException propagates directly
  • SchemaDataValidator — new validateSchemaData(data, allowLegacy) overload
  • SchemaRegistryServiceWithSchemaDataValidator — carries and passes config flag
  • JsonSchemaCompatibilityCheck — gates mixed-format compatibility on config flag; defense-in-depth rejection when existing schema is not valid Avro
  • SchemaRegistryService — wires config from PulsarService to validator and compatibility checker
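
The gating described for StructSchemaDataValidator can be pictured roughly as follows. This is a minimal sketch, not the broker code: the class name `GatedFallbackSketch`, the method `validateJson`, and the two `Consumer<String>` parsers are hypothetical stand-ins for the Avro parser and the legacy Jackson JsonSchema reader, each assumed to throw on rejection.

```java
import java.util.function.Consumer;

/** Illustrative only; the real logic lives in the broker's StructSchemaDataValidator. */
final class GatedFallbackSketch {

    /**
     * Validates a SchemaType.JSON definition.
     *
     * @param avroParser   stand-in for Avro parsing; throws if the definition is not valid Avro
     * @param legacyParser stand-in for legacy Jackson JsonSchema parsing; throws on rejection
     */
    static void validateJson(String definition,
                             boolean allowLegacy,
                             Consumer<String> avroParser,
                             Consumer<String> legacyParser) {
        try {
            avroParser.accept(definition);
        } catch (RuntimeException avroFailure) {
            if (!allowLegacy) {
                // Strict mode (the new default): the Avro failure surfaces as-is.
                throw avroFailure;
            }
            // Legacy mode: give the pre-2.1 Jackson format a chance; its failure propagates.
            legacyParser.accept(definition);
        }
    }
}
```

With the flag off, a definition that only the legacy parser understands is rejected at registration time; with the flag on, it is accepted as before.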

Client-side (1 file)

  • ProducerImpl — deprecation comment on backward-compat code path (no behavioral change)

Tests (3 test files, +171 lines)

  • SchemaDataValidatorTest — 8 new tests: Avro accepted in both modes, Jackson rejected by default / accepted when enabled, JSON Schema Draft rejected / accepted, arbitrary JSON always rejected, AVRO type unaffected
  • JsonSchemaCompatibilityCheckTest — 4 new tests: legacy enabled allows mixed formats, default rejects mixed, Avro↔Avro unaffected, JSON Schema Draft rejected
  • SchemaRegistryServiceWithSchemaDataValidatorTest — 3 new tests: Jackson rejected by default, accepted when enabled, JSON Schema Draft rejected
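
The compatibility cases listed for JsonSchemaCompatibilityCheckTest can be read as a small decision table. The sketch below is a simplification under stated assumptions: `MixedFormatCompatibilitySketch`, `Format`, and `isAllowed` are hypothetical names, all non-Avro definitions are lumped into one bucket, and the result of the real Avro-to-Avro check is passed in as a boolean.

```java
/** Illustrative decision table; not the actual JsonSchemaCompatibilityCheck code. */
final class MixedFormatCompatibilitySketch {

    /** Hypothetical classification of a stored or incoming SchemaType.JSON definition. */
    enum Format { AVRO, NOT_AVRO }

    /**
     * @param avroCompatible outcome of the regular Avro compatibility check,
     *                       consulted only when both definitions are Avro
     */
    static boolean isAllowed(Format existing, Format incoming,
                             boolean allowLegacy, boolean avroCompatible) {
        if (existing == Format.AVRO && incoming == Format.AVRO) {
            // Avro to Avro is unaffected by the new flag.
            return avroCompatible;
        }
        // Any non-Avro side is a mixed-format case: tolerated only in legacy mode.
        return allowLegacy;
    }
}
```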

Compatibility

This is a breaking change in default behavior. Users with legacy pre-2.1 Jackson-format schemas can restore the old behavior by setting schemaJsonAllowLegacyJacksonFormat=true in broker.conf.
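
For reference, the opt-in would look like this in broker.conf. Only the setting name and its default come from this PR; the comments are explanatory.

```properties
# Accept legacy (pre-2.1) Jackson-format definitions for SchemaType.JSON.
# Default is false, which requires a valid Avro schema definition.
schemaJsonAllowLegacyJacksonFormat=true
```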

Java producers are unaffected (JSONSchema.of() generates Avro format since 2.1). Non-Java clients that were incorrectly registering JSON Schema Draft definitions will get a clear error at registration time instead of a confusing consumer-side failure.

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

🤖 Generated with Claude Code

…pe.JSON

Add `schemaJsonAllowLegacyJacksonFormat` broker config (default false) to
control whether the legacy Jackson JsonSchema format is accepted for
SchemaType.JSON schema definitions.

When disabled (default), StructSchemaDataValidator and
JsonSchemaCompatibilityCheck strictly require valid Avro schema format,
consistent with what the consumer side (AvroBaseStructSchema) already
requires. This fixes the asymmetry where the broker accepted any valid
JSON as a schema definition, but consumers failed with
SchemaParseException at read time.

When enabled, the pre-2.1 backward-compatible behavior is preserved.

Also deprecates (but does not remove) the ProducerImpl client-side code
that sends old Jackson format to brokers below protocol v13.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@github-actions

@codelipenghui Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

…ation

Add AdminApiSchemaJsonValidationTest that tests the full server-side
flow using a real broker instance (MockedPulsarServiceBaseTest):

- Avro format JSON schema accepted (via Admin API and Producer API)
- JSON Schema Draft 2020-12 rejected by default
- Jackson JsonSchema format rejected by default
- Jackson and JSON Schema Draft accepted when legacy flag enabled
- SchemaType.AVRO unaffected by the JSON legacy config
- Schema compatibility rejects non-Avro after valid Avro schema exists

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@codelipenghui codelipenghui self-assigned this Mar 19, 2026
@codelipenghui codelipenghui modified the milestones: 5.0.0, 4.2.0 Mar 19, 2026
@github-actions github-actions bot added the doc-not-needed label and removed the doc-label-missing label Mar 19, 2026
codelipenghui and others added 5 commits March 19, 2026 10:33
Avro's Schema.Parser throws AvroTypeException (not SchemaParseException)
for unresolvable type references like "type":"object". These two exception
types are siblings under AvroRuntimeException, so the catch block must
handle both to reach the Jackson fallback path when legacy mode is enabled.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…changes

Avro 1.12.0 throws NullPointerException (not SchemaParseException)
when parsing non-Avro schemas like Jackson JsonSchema format. The
previous catch block only handled SchemaParseException and
AvroTypeException, so the legacy fallback was never reached.

Move the legacy Jackson fallback into the general catch(Exception)
block so it handles all exception types from the Avro parser.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
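
Why the two follow-up commits widened the catch can be shown with a small self-contained sketch. The exception classes below are local stand-ins that mirror the hierarchy described in the commit messages (the real ones live in org.apache.avro); `CatchBreadthSketch`, `narrow`, and `broad` are hypothetical names.

```java
/** Illustrative only: compares a narrow catch with catch(Exception). */
final class CatchBreadthSketch {

    // Local stand-ins mirroring the sibling relationship described above.
    static class AvroRuntimeException extends RuntimeException { }
    static class SchemaParseException extends AvroRuntimeException { }
    static class AvroTypeException extends AvroRuntimeException { }

    /** Original shape: only SchemaParseException reaches the fallback; anything else escapes. */
    static String narrow(Runnable parse) {
        try {
            parse.run();
            return "parsed";
        } catch (SchemaParseException e) {
            return "fallback";
        }
    }

    /** Final shape: every failure from the parser reaches the fallback decision. */
    static String broad(Runnable parse) {
        try {
            parse.run();
            return "parsed";
        } catch (Exception e) {
            return "fallback";
        }
    }
}
```

Passing a parser that throws the stand-in AvroTypeException or a NullPointerException to `narrow` lets the exception escape, while `broad` routes both to the fallback branch.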
@codecov-commenter

codecov-commenter commented Mar 21, 2026

Codecov Report

❌ Patch coverage is 84.21053% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.74%. Comparing base (765c46e) to head (64a5a1f).
⚠️ Report is 64 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...r/service/schema/JsonSchemaCompatibilityCheck.java | 55.55% | 1 Missing and 3 partials ⚠️ |
| ...r/broker/service/schema/SchemaRegistryService.java | 83.33% | 0 Missing and 1 partial ⚠️ |
| .../SchemaRegistryServiceWithSchemaDataValidator.java | 85.71% | 1 Missing ⚠️ |
Additional details and impacted files

@@             Coverage Diff              @@
##             master   #25362      +/-   ##
============================================
+ Coverage     72.72%   72.74%   +0.01%     
- Complexity    34277    34284       +7     
============================================
  Files          1954     1954              
  Lines        154857   154876      +19     
  Branches      17739    17742       +3     
============================================
+ Hits         112627   112661      +34     
+ Misses        33197    33169      -28     
- Partials       9033     9046      +13     
| Flag | Coverage Δ |
|---|---|
| inttests | 25.76% <50.00%> (-0.17%) ⬇️ |
| systests | 22.47% <55.26%> (-0.09%) ⬇️ |
| unittests | 73.71% <84.21%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...org/apache/pulsar/broker/ServiceConfiguration.java | 98.24% <100.00%> (+<0.01%) ⬆️ |
| .../service/schema/validator/SchemaDataValidator.java | 81.81% <100.00%> (+1.81%) ⬆️ |
| ...ce/schema/validator/StructSchemaDataValidator.java | 95.83% <100.00%> (+11.38%) ⬆️ |
| ...va/org/apache/pulsar/client/impl/ProducerImpl.java | 83.45% <ø> (+0.28%) ⬆️ |
| ...r/broker/service/schema/SchemaRegistryService.java | 76.19% <83.33%> (+2.85%) ⬆️ |
| .../SchemaRegistryServiceWithSchemaDataValidator.java | 85.29% <85.71%> (+0.91%) ⬆️ |
| ...r/service/schema/JsonSchemaCompatibilityCheck.java | 53.84% <55.55%> (+6.78%) ⬆️ |

... and 79 files with indirect coverage changes

Member

@lhotari lhotari left a comment

LGTM

@codelipenghui codelipenghui merged commit 08d89a2 into apache:master Mar 23, 2026
54 checks passed
@codelipenghui codelipenghui deleted the pip-464/strict-json-schema-avro-validation branch March 23, 2026 16:10
Labels

doc-not-needed, ready-to-test

3 participants