feat(codecs): Add syslog encoder #23777

Merged
thomasqueirozb merged 45 commits into vectordotdev:master from vparfonov:syslog-codec on Jan 5, 2026
@vparfonov
Contributor

@vparfonov vparfonov commented Sep 15, 2025

Summary

This pull request introduces a new syslog encoder. This work is a continuation of the feature originally started in PR #21307.

The encoder is designed to be lean and performant, expecting users to perform complex data shaping with an upstream remap transform. It correctly handles both RFC 5424 and RFC 3164 standards, including specific field length limitations, character sanitization, and security escaping.

Key Features

Simple Configuration: The configuration uses standard Option<ConfigTargetPath> for all fields.

Flexible Parsing: facility and severity values read from the event are parsed intelligently, accepting either a string name (e.g., "user") or a number (e.g., 16), with case-insensitive matching for names.

Strict RFC Compliance:
Added logic to truncate app_name, proc_id, and msg_id to their specified maximum lengths for RFC 5424.
Implemented robust truncation for the RFC 3164 TAG field to ensure it never exceeds 32 characters.
Added a sanitization step for RFC 3164 messages to remove non-printable ASCII characters.
Implemented correct character escaping (\, ", ]) for structured data parameter values to prevent log injection.

Unit tests: including parsing, truncation, sanitization, and escaping.
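The escaping and truncation rules listed above can be illustrated with a minimal sketch. This is not the encoder's actual code: the function and constant names are hypothetical, and the length limits are the ones RFC 5424 gives for APP-NAME, PROCID, and MSGID.

```rust
// Hypothetical constants; the values are the RFC 5424 field length limits.
const MAX_APP_NAME_LEN: usize = 48;
const MAX_PROC_ID_LEN: usize = 128;
const MAX_MSG_ID_LEN: usize = 32;

/// Escape the three characters RFC 5424 requires to be escaped inside a
/// structured data parameter value: backslash, double quote, closing bracket.
fn escape_sd_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        if matches!(ch, '\\' | '"' | ']') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Truncate to at most `max` characters without splitting a UTF-8 code point.
fn truncate_chars(value: &str, max: usize) -> &str {
    match value.char_indices().nth(max) {
        Some((idx, _)) => &value[..idx],
        None => value,
    }
}
```

For example, `truncate_chars(app_name, MAX_APP_NAME_LEN)` would cap an application name before it is written into the header, and `escape_sd_value` would be applied to each parameter value before it is wrapped in quotes.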

Vector configuration

[sinks.example]
type = "socket"
inputs = ["example_parse_encoding"]
address = "logserver:514"
mode = "tcp"

[sinks.example.encoding]
codec = "syslog"
rfc = "rfc5424"
app_name = ".app"
proc_id = ".pid"
msg_id = ".mid"
facility = ".fac"
severity = ".sev"
payload_key = ".message"

How did you test this PR?

This plan covers the basic functionality of the syslog encoder for both RFC5424 and RFC3164, focusing on dynamic field resolution from a JSON source.

Note: All tests assume the stdin source is configured with decoding.codec = "json" to parse the input. Expected timestamps and hosts are illustrative.

data_dir = "./data"

[sources.input]
type = "stdin"
[sources.input.decoding]
codec = "json"

[sinks.console]
type = "console"
inputs = [ "input" ]
target = "stdout"

[sinks.console.buffer]
type = "disk"
max_size = 268_435_488
when_full = "block"

Test Case 1: RFC 5424 - field references

Verify that all configured fields are correctly read from a JSON event.
Config:

[sinks.console.encoding]
codec = "syslog"
[sinks.console.encoding.syslog]
rfc = "rfc5424"
app_name = ".app"
proc_id = ".pid"
msg_id = ".mid"
facility = ".fac"
severity = ".sev"

Input:
{"host": "my-host", "@timestamp": "2025-10-23T19:00:00.123456Z", "message": "hello world", "app": "my_app", "pid": "987", "mid": "REQ-1", "fac": "daemon", "sev": 3}
Expected Output:
<27>1 2025-10-23T17:37:08.711556Z my-host my_app 987 REQ-1 - hello world
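The `<27>` prefix in the expected output is the syslog priority value, which RFC 5424 defines as facility * 8 + severity. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the numeric codes used in the comment (daemon = 3) come from the RFC rather than from this PR's code.

```rust
/// Compute PRIVAL = facility * 8 + severity, rejecting out-of-range codes.
/// Facilities are 0..=23 and severities are 0..=7 per RFC 5424.
fn prival(facility: u8, severity: u8) -> Option<u8> {
    if facility > 23 || severity > 7 {
        return None;
    }
    // "daemon" is facility 3, so with severity 3 this yields 3 * 8 + 3 = 27.
    Some(facility * 8 + severity)
}
```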

Test Case 2: RFC 3164 - field references

Verify that all configured fields are correctly read for the legacy format.
Config:

[sinks.console.encoding]
codec = "syslog"
[sinks.console.encoding.syslog]
rfc = "rfc3164"
app_name = ".app"
proc_id = ".pid"
facility = ".fac"
severity = ".sev"

Input:
{"host": "my-host", "@timestamp": "2025-10-23T19:00:00Z", "message": "hello legacy", "app": "legacy_app", "pid": "456", "fac": "user", "sev": 5}
Expected Output:
<13>Oct 23 19:00:00 my-host legacy_app[456]: hello legacy
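The `legacy_app[456]:` portion of this output is the RFC 3164 TAG. The sketch below shows one way sanitization and the 32-character limit could interact. It assumes the limit applies to the whole `app[pid]:` string and that the application name is shortened first; the encoder itself may count differently, and all names here are hypothetical.

```rust
const MAX_TAG_LEN: usize = 32;

/// Keep only printable ASCII (0x20 through 0x7E).
fn sanitize(value: &str) -> String {
    value.chars().filter(|c| (' '..='~').contains(c)).collect()
}

/// Build `app[pid]:` (or `app:` without a process ID), capped at 32 characters.
fn rfc3164_tag(app_name: &str, proc_id: Option<&str>) -> String {
    let suffix = match proc_id {
        Some(pid) => format!("[{}]:", sanitize(pid)),
        None => ":".to_string(),
    };
    // Shorten the application name first so the suffix survives when possible.
    let budget = MAX_TAG_LEN.saturating_sub(suffix.len());
    let app: String = sanitize(app_name).chars().take(budget).collect();
    let mut tag = format!("{app}{suffix}");
    // Byte-based truncation is safe here: only ASCII remains after sanitizing.
    tag.truncate(MAX_TAG_LEN);
    tag
}
```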

Test Case 3: Field Parsing

Verify facility and severity are parsed from names (case-insensitive) and numbers.
Config:

[sinks.console.encoding.syslog]
rfc = "rfc5424"
facility = ".fac"
severity = ".sev"

Input 1 (Name): {"fac": "local1", "sev": "warning"}
Output 1: <140>1 ...

Input 2 (Number): {"fac": 17, "sev": 4}
Output 2: <140>1 ... (same PRI)

Input 3 (Uppercase): {"fac": "LOCAL1", "sev": "WARNING"}
Output 3: <140>1 ... (same PRI)

Input 4 (Mix): {"fac": "LOCAL1", "sev": "WARNING"}
Output 4: <140>1 ... (same PRI)
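All four inputs resolve to facility 17 and severity 4, hence 17 * 8 + 4 = 140. The following sketch shows name-or-number parsing with case-insensitive matching. It is not the PR's implementation (the commit messages describe enums parsed via `strum`). The tables are a partial, assumed set of conventional syslog keywords, and the sketch assumes the event value has already been rendered to a string, so the number 17 arrives as "17".

```rust
// Partial, assumed name tables; the real encoder may accept other spellings.
const FACILITIES: &[(&str, u8)] = &[
    ("kern", 0), ("user", 1), ("mail", 2), ("daemon", 3), ("auth", 4),
    ("syslog", 5), ("lpr", 6), ("news", 7), ("uucp", 8), ("cron", 9),
    ("authpriv", 10), ("ftp", 11),
    ("local0", 16), ("local1", 17), ("local2", 18), ("local3", 19),
    ("local4", 20), ("local5", 21), ("local6", 22), ("local7", 23),
];
const SEVERITIES: &[(&str, u8)] = &[
    ("emerg", 0), ("alert", 1), ("crit", 2), ("err", 3),
    ("warning", 4), ("notice", 5), ("info", 6), ("debug", 7),
];

/// Accept either a decimal code within range or a case-insensitive name.
fn lookup(table: &[(&str, u8)], max: u8, raw: &str) -> Option<u8> {
    let raw = raw.trim();
    if let Ok(code) = raw.parse::<u8>() {
        return (code <= max).then_some(code);
    }
    table
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case(raw))
        .map(|&(_, code)| code)
}

fn parse_facility(raw: &str) -> Option<u8> {
    lookup(FACILITIES, 23, raw)
}

fn parse_severity(raw: &str) -> Option<u8> {
    lookup(SEVERITIES, 7, raw)
}
```

Returning `Option` leaves the choice of fallback to the caller, for example substituting a default facility and severity when a value cannot be parsed.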

Test Case 4: Default Fallbacks

Verify the encoder uses defaults when no syslog fields are configured.
Config:

[sinks.console.encoding]
codec = "syslog"

Input: {"host": "my-host", "@timestamp": "2025-10-23T19:00:00Z", "message": "hello default"}
Expected Output: <14>1 2025-10-23T19:00:00.000000Z my-host vector - - - hello default
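The expected output shows the defaults at work: app_name becomes `vector`, while the missing proc_id, msg_id, and structured data are written as the RFC 5424 nil value `-`. A minimal sketch of that assembly follows. The struct and function are hypothetical, not the PR's `SyslogMessage` type, and structured data is always emitted as nil here for brevity.

```rust
/// Hypothetical container for already-resolved header values.
struct Rfc5424Fields<'a> {
    pri: u8,
    timestamp: &'a str,
    hostname: Option<&'a str>,
    app_name: Option<&'a str>,
    proc_id: Option<&'a str>,
    msg_id: Option<&'a str>,
    message: &'a str,
}

const NIL: &str = "-";
const DEFAULT_APP_NAME: &str = "vector";

/// Join the header fields with single spaces, substituting defaults for `None`.
fn encode_rfc5424(f: &Rfc5424Fields<'_>) -> String {
    format!(
        "<{}>1 {} {} {} {} {} {} {}",
        f.pri,
        f.timestamp,
        f.hostname.unwrap_or(NIL),
        f.app_name.unwrap_or(DEFAULT_APP_NAME),
        f.proc_id.unwrap_or(NIL),
        f.msg_id.unwrap_or(NIL),
        NIL, // structured data omitted in this sketch
        f.message
    )
}
```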

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

syedriko and others added 22 commits March 15, 2024 17:23
Original commit from syedriko
This is only a temporary change to make the diffs for future commits easier to follow.
- Introduce a `Pri` struct with fields for severity and facility as enum values.
  - `Pri` uses `strum` crate to parse string values into their appropriate enum variant.
  - Handles encoding the ordinal values of the two enums into the `PRIVAL` value for the encoder.
- As `Facility` and `Severity` enums better represent their ordinal mapping directly
  - The `Fixed` + `Field` subtyping with custom deserializer isn't necessary. Parsing a string that represents the enum by name or its ordinal representation is much simpler.
  - Likewise this removes the need for the get methods as the enum can provide both the `String` or `u8` representation as needed.
`SyslogSerializer::encode()` has been simplified.
- Only matching `Event::Log` is relevant; an `if let` binding instead of `match` removes a redundant level of nesting.
- This method only focuses on boilerplate now, delegating the rest to `ConfigDecanter` (_adapt `LogEvent` + encoder config_) and `SyslogMessage` (_encode into syslog message string_).
- This removes some complexity from the actual encoding logic, which should only be concerned with directly encoding from one representation to another, not complementary features related to Vector config or its type system.

The new `ConfigDecanter` is where many of the original helper methods that were used by `SyslogSerializer::encode()` now reside. This change better communicates the scope of their usage.
- Any interaction with `LogEvent` is now contained within the methods of this new struct. Likewise for the consumption of the encoder configuration (instead of queries to config throughout encoding).
- The `decant_config()` method better illustrates an overview of the data we're encoding and where that's being sourced from via the new `SyslogMessage` struct, which splits off the actual encoding responsibility (see next commit).
`SyslogSerializerConfig` has been simplified.
- Facility / Severity deserializer methods aren't needed, as per their prior refactor with `strum`.
- The `app_name` default is set via `decant_config()` when not configured explicitly.
- The other two fields calling a `default_nil_value()` method instead use an option value which encodes `None` into the expected `-` value.
- Everything else does not need a serde attribute to apply a default, the `Default` trait on the struct is sufficient.
- `trim_prefix` was removed as it didn't seem relevant. `tag` was also removed as it's represented by several subfields in RFC 5424 which RFC 3164 can also use.

`SyslogMessage::encode()` refactors the original PR encoding logic:
- Focused on the syslog header fields; the PRI and final message value have already been prepared beforehand. They are only referenced at the end of `encode()` to combine into the final string output.
- While less efficient than `push_str()`, each match variant has a clear structure returned via the array `join(" ")` which minimizes the noise of `SP` from the original PR. Value preparation prior to this is clear and better documented.
- `Tag` is a child struct to keep the main logic easy to grok. `StructuredData` is a similar case.
No changes beyond relocating the code into a single file.
- Drop notes referring to original PR differences + StructuredData adaption references. None of it should be relevant going forward.
- Revise some other notes.
- Drop `add_log_source` method (introduced from the original PR author) in favor of using `StructuredData` support instead.
This should be simple and lightweight enough to justify for the DRY benefit?

This way the method doesn't need to be duplicated redundantly. That was required because there is no trait for `FromRepr` provided via `strum`. That would require a similar amount of lines for the small duplication here.

The `akin` macro duplicates the `impl` block for each value in the `&enums` array.
- `ConfigDecanter::get_message()` replaces the fallback method in favor of `to_string_lossy()` (a dedicated equivalent for converting `Value` type to a String type (_technically it is a CoW str, hence the follow-up with `to_string()`_)).
  - This also encodes the value better, especially for the default `log_namespace: false` as the message value (when `String`) is not quote wrapped, which matches the behaviour of the `text` encoder output.
  - Additionally uses the `LogEvent` method `get_message()` directly from `lib/vector-core/src/event/log_event.rs`. This can better retrieve the log message regardless of the `log_namespace` setting.
- Encoding of RFC 5424 fields has changed to inline the `version` constant directly, instead of via a redundant variable. If there's ever multiple versions that need to be supported, it could be addressed then.
- The RFC 5424 timestamp has a max precision of microseconds, thus this should be rounded and `AutoSi` can be used (_or `Micros` if it should have fixed padding instead of truncating trailing `000`_).
- The original PR author appears to have relied on a hard-coded timestamp key here.
- `DateTime<Local>` would render the timestamp field with the local timezone offset, but other than that `DateTime<Utc>` would seem more consistent with usage in Vector, especially since any original TZ context is lost by this point?
- Notes adjusted accordingly, with added TODO query for each encoding mode to potentially support configurable timezone.
- Move encoder config settings under a single `syslog` config field. This better mirrors configuration options for existing encoders like Avro and CSV.
- `ConfigDecanter::value_by_key()` appears to accomplish roughly the same as the existing helper method `to_string_lossy()`. Prefer that instead. This also makes the `StructuredData` helper `value_to_string()` redundant too at a glance?
- Added some reference for the priority value `PRIVAL`.
- `Pri::from_str_variants()` uses the existing defaults for fallback, communicate that more clearly. Contextual note is no longer useful, removed.
To better communicate the allowed values, these two config fields can change from the `String` type to their appropriate enum type.
- This relies on serde to deserialize the config value to the enum which adds a bit more noise to grok.
- It does make `Pri::from_str_variants()` redundant, while the `into_variant()` methods are refactored to `deserialize()` with a proper error message emitted to match what serde would normally emit for failed enum variant deserialization.
- A drawback of this change is that these two config fields lost the ability to reference a different value path in the `LogEvent`. That'll be addressed in a future commit.
In a YAML config a string can optionally be wrapped with quotes, while a number that isn't quote wrapped will be treated as a number type.

The current support was only for string numbers, this change now supports flexibility for config using ordinal values in YAML regardless of quote usage.

The previous `Self::into_variant(&s)` logic could have been used instead of bringing in `serde-aux`, but the external helper attribute approach seems easier to grok/follow as the intermediary container still seems required for a terse implementation.

The match statement uses a reference (_which requires a deref for `from_repr`_) to appease the borrow checker for the later borrow needed by `value` in the error message.
This seems redundant given the context? Mostly adds unnecessary noise.

Could probably `impl Configurable` or similar to try to work around the requirement. The metadata description could generate the variant list similar to how it's been handled for error message handling?
Not sure if this is worthwhile, but it adopts error message convention elsewhere I've seen by managing them via Snafu.
Signed-off-by: Vitalii Parfonov <[email protected]>
…rity dynamic, payload_key optional

Signed-off-by: Vitalii Parfonov <[email protected]>
@vparfonov vparfonov requested a review from a team as a code owner September 15, 2025 08:40
@vparfonov vparfonov requested a review from a team September 15, 2025 08:40
@vparfonov vparfonov requested review from a team as code owners September 15, 2025 08:40
@github-actions github-actions Bot added the domain: releasing Anything related to releasing Vector label Sep 15, 2025
@vparfonov vparfonov marked this pull request as draft September 15, 2025 08:56
Member

@pront pront left a comment

This looks pretty good!

  • Please add edge case tests (missing fields, empty values)
  • Validate that the documentation improvements I committed are correct

Contributor

@polarathene polarathene left a comment

Just a quick glance over. I recalled some review feedback I received previously that is presumably still relevant for your iteration of the PR.

Comment thread lib/codecs/src/encoding/format/syslog.rs Outdated
Comment thread lib/codecs/src/encoding/format/syslog.rs
Comment thread lib/codecs/src/encoding/format/syslog.rs Outdated
Comment thread lib/codecs/Cargo.toml Outdated
 - Removed the obsolete `payload_key` field from `SyslogSerializerOptions` and simplified the payload retrieval logic.
 - Applied `#[serde(deny_unknown_fields)]` to the `SyslogSerializerOptions` struct so that unknown or mistyped configuration fields cause an error.
@vparfonov
Contributor Author

Hello @polarathene and @pront, thank you for the review!
I have addressed all mentioned comments and incorporated the suggested changes.

  • applied #[serde(deny_unknown_fields)] to SyslogSerializerOptions to prevent silent configuration errors caused by obsolete or mistyped fields.
  • removed the obsolete payload_key field. The encoder now exclusively uses the standard event .message field, simplifying the configuration interface as requested.
  • implemented a semantic application name fallback. The app_name lookup now prioritizes the explicit configuration, then falls back to log.get_by_meaning("service"), and finally uses the default value.
  • fixed the RFC 3164 TAG truncation logic so the 32-character limit is maintained
  • added edge case tests
  • bumped derive_more to v2.0.1
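The app_name fallback order described above (explicit configuration, then the field carrying the "service" meaning, then the default) can be sketched as a plain `Option` chain. The function below is hypothetical and takes already-resolved values as arguments instead of calling `log.get_by_meaning("service")`; the `vector` default matches the output shown in Test Case 4.

```rust
const DEFAULT_APP_NAME: &str = "vector";

/// Pick the first available source: configured path, "service" meaning, default.
fn resolve_app_name<'a>(
    configured: Option<&'a str>,
    service_meaning: Option<&'a str>,
) -> &'a str {
    configured.or(service_meaning).unwrap_or(DEFAULT_APP_NAME)
}
```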

@tot19
Contributor

tot19 commented Nov 20, 2025

Hey @vparfonov and @pront , does this open up the door to a dedicated syslog sink?

@vparfonov
Contributor Author

vparfonov commented Nov 20, 2025

Hey @vparfonov and @pront , does this open up the door to a dedicated syslog sink?

Not yet, but after merging it will be possible to pair it with the socket sink, something like:

[sinks.example]
type = "socket"
inputs = ["example_parse_encoding"]
address = "logserver:514"
mode = "tcp"
keepalive.time_secs = 60

[sinks.example.encoding]
codec = "syslog"
rfc = "rfc5424"

@tot19
Contributor

tot19 commented Nov 20, 2025

Sweet! Thank you

Contributor

@thomasqueirozb thomasqueirozb left a comment

Hi @vparfonov. I tried to fix the failing checks, but it seems I don't have push permissions to this branch. I suggest running git fetch && git merge origin/master, and when you hit a conflict, resolving it with git checkout origin/master -- Cargo.lock && cargo check. Once master is merged, you'll also need to run make generate-component-docs and cargo vdev build licenses.

Comment thread lib/codecs/Cargo.toml Outdated
Comment thread lib/codecs/src/encoding/format/syslog.rs Outdated
Comment thread lib/codecs/src/encoding/format/syslog.rs Outdated
Comment thread lib/codecs/src/encoding/format/syslog.rs
@thomasqueirozb thomasqueirozb added the meta: awaiting author Pull requests that are awaiting their author. label Dec 22, 2025
@github-actions github-actions Bot removed the meta: awaiting author Pull requests that are awaiting their author. label Dec 23, 2025
@github-actions github-actions Bot added the domain: external docs Anything related to Vector's external, public documentation label Dec 23, 2025
@vparfonov
Contributor Author

Hi @vparfonov. I tried to fix the failing checks, but it seems I don't have push permissions to this branch. I suggest running git fetch && git merge origin/master, and when you hit a conflict, resolving it with git checkout origin/master -- Cargo.lock && cargo check. Once master is merged, you'll also need to run make generate-component-docs and cargo vdev build licenses.

@thomasqueirozb, thanks for pointing this out. I've attempted to run the generation commands, but I am hitting environment issues that I can't resolve quickly.

cargo vdev build licenses: Failed. I am getting:

> cargo vdev build licenses
Error: No such file or directory (os error 2)

@thomasqueirozb
Contributor

thomasqueirozb commented Dec 23, 2025

cargo vdev build licenses: Failed. I am getting:

> cargo vdev build licenses
Error: No such file or directory (os error 2)

This error message has been on my todo list to fix since forever. You're missing dd-rust-license-tool. You can install it by running cargo install dd-rust-license-tool --version 1.0.4 and then running cargo vdev build licenses again.

It also looks like changes to website/cue/reference/components/sinks/generated/greptimedb_logs.cue need to be reverted

@vparfonov
Contributor Author

This error message has been on my todo list to fix since forever. You're missing dd-rust-license-tool. You can install it by running cargo install dd-rust-license-tool --version 1.0.4 and then running cargo vdev build licenses again.

Got it, it works now, thanks.

It also looks like changes to website/cue/reference/components/sinks/generated/greptimedb_logs.cue need to be reverted

Reverted, but it's strange that it failed; this was the only change observed:

- examples: [{}]
+ examples: [{},
+ ]

@thomasqueirozb
Contributor

thomasqueirozb commented Dec 23, 2025

Reverted, but it's strange that it failed; this was the only change observed:

I have run into that before with this same file. Not sure what is going on there; it might be a difference between how formatting occurs in CI and how make generate-component-docs works locally.

@thomasqueirozb thomasqueirozb added this pull request to the merge queue Jan 5, 2026
Merged via the queue into vectordotdev:master with commit 5f8ab31 Jan 5, 2026
50 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jan 5, 2026
@vparfonov vparfonov deleted the syslog-codec branch January 12, 2026 13:16
Labels

domain: codecs - Anything related to Vector's codecs (encoding/decoding)
domain: external docs - Anything related to Vector's external, public documentation

8 participants