Add Pulsar transport #2297

@schrepfler

I think it would be useful to try to add an additional transport to Zipkin which would be Apache Pulsar.

Rationale
Pulsar has several features which I think make it interesting for us:

  1. Integrated SQL query capability.
    Recent Pulsar releases include integrated SQL query capabilities based on the Facebook Presto engine. This makes it possible to query data directly from a topic, reducing the need for a separate DB tier.
    Benefit: Potentially simplified deployments.
    Unknown: It's not clear how capable the query engine is, or whether it would suit the kind of queries Zipkin makes.

  2. Tiered storage.
    Pulsar can offload topics onto long-term, cheap storage (e.g. S3) without having to estimate size- or time-based expiration policies ahead of time.
    Benefit: Less DB management overhead and a simple way to enable cheap, durable persistence.

  3. Decoupled storage from brokers.
    By decoupling brokers from storage, Pulsar (at least on paper) should be easier to scale out.
    Benefit: Lower operational overhead, fewer chances for error and simpler operations.

  4. Scales to more topics

  5. Multitenancy and geo-replication capabilities.
    Benefit: Potentially diverse business uses

  6. Pulsar functions.
    It's possible to implement very low-latency triggers that react to messages and run actions directly in the middleware.

  7. Kafka compatibility mode.
    Might be possible to use it in Kafka compatibility mode to simplify development (but at that point I'm not sure if it's possible to leverage the other features).

  8. WebSocket capability
    It's possible to expose the data stream over WebSockets, which might be interesting if we want to do fancy stuff in the UI (like real-time tracing on a giant ring; I think this view was lost from the old days at Twitter?)

  9. Lower latency than Kafka (on paper)

  10. Schema registry
    If the data is persisted in one of the formats that support a schema, it might be possible to evolve the format using that format's schema tools and the registry. I think using a schema is a requirement for querying the data with Presto. JSON, Protobuf and Avro are supported.
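
As a rough illustration of item 6, a trigger could be a small Pulsar Function that flags slow spans. This is only a sketch: it assumes spans arrive on the input topic as Zipkin v2 JSON (where `duration` is in microseconds) and uses Pulsar's native Python function style, where a plain `process(input)` function is the function body. The 500 ms threshold and the shape of the alert are hypothetical, and as far as I understand, returning `None` means nothing is published to the output topic.

```python
import json

# Hypothetical threshold: flag spans slower than 500 ms.
# Zipkin v2 JSON reports `duration` in microseconds.
SLOW_SPAN_MICROS = 500_000


def process(input):
    """Native-style Pulsar function body: takes one message payload and
    returns an alert payload, or None when there is nothing to flag."""
    decoded = json.loads(input)
    # Reporters usually send a JSON list of spans; accept a single span too.
    spans = decoded if isinstance(decoded, list) else [decoded]
    slow = [
        {
            "traceId": s.get("traceId"),
            "spanId": s.get("id"),
            "service": (s.get("localEndpoint") or {}).get("serviceName"),
            "durationMicros": s["duration"],
        }
        for s in spans
        if s.get("duration", 0) > SLOW_SPAN_MICROS
    ]
    return json.dumps(slow) if slow else None
```

Whether this is fast enough to count as "very low latency" in practice would need measuring against a real broker.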

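For item 8, a UI-side consumer would read frames from Pulsar's WebSocket consumer endpoint. The sketch below covers only the framing, assuming the `/ws/v2/consumer/...` path and JSON frames carrying a base64-encoded `payload` plus a `messageId` that is echoed back as the acknowledgement. The host, tenant, namespace, topic and subscription names are placeholders, and no socket is actually opened here.

```python
import base64
import json
from urllib.parse import quote


def consumer_url(host, tenant, namespace, topic, subscription, port=8080):
    """Build the WebSocket consumer URL for a persistent topic."""
    parts = [quote(p, safe="") for p in (tenant, namespace, topic, subscription)]
    return f"ws://{host}:{port}/ws/v2/consumer/persistent/" + "/".join(parts)


def decode_frame(frame):
    """Decode one consumer frame into (payload_bytes, ack_json)."""
    msg = json.loads(frame)
    payload = base64.b64decode(msg["payload"])
    # Acknowledge by sending the messageId back to the broker.
    ack = json.dumps({"messageId": msg["messageId"]})
    return payload, ack
```

A UI would pass each received text frame to `decode_frame`, render the decoded spans, and send the returned ack string back over the same socket.
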