A streaming data lake for real-time log processing

All the properties of a traditional data lake with specific optimizations for real-time log streams.

SIMPLIFIED OPERATIONS | MULTI-YEAR HOT DATA | HIGH-VOLUME INGEST AND ETL | SUB-SECOND AD HOC QUERY PERFORMANCE

HYDROLIX

What is a streaming data lake?

Large global brands use Hydrolix Enterprise for their real-time, log-intensive use cases. It combines the ease of a managed offering with the power, control, and customizability of data lake software running on your infrastructure.

Combine logs into a single, simplified view

Hydrolix is designed so you can process massive volumes of log data from multiple sources, combine them all into a single table, and query or generate alerts within seconds. No other system provides this ETL capability at the scale Hydrolix can achieve.
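As a rough illustration of what combining sources into a single table involves, the sketch below maps two differently shaped log records onto one shared set of columns. It is not Hydrolix's transform mechanism: the source formats, field names (`ts`, `status_code`, `http_status`, `route`, and so on), and the unified column list are all assumptions made up for this example.

```python
from datetime import datetime, timezone

# Hypothetical unified schema: every source is mapped onto these columns.
UNIFIED_COLUMNS = ("timestamp", "source", "status", "path", "bytes_sent")


def from_cdn(record: dict) -> dict:
    """Map a hypothetical CDN record (epoch-second timestamps) onto the unified schema."""
    return {
        "timestamp": datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        "source": "cdn",
        "status": int(record["status_code"]),
        "path": record["url_path"],
        "bytes_sent": int(record["bytes"]),
    }


def from_app(record: dict) -> dict:
    """Map a hypothetical application record (ISO 8601 timestamps) onto the unified schema."""
    return {
        "timestamp": datetime.fromisoformat(record["time"]),
        "source": "app",
        "status": int(record["http_status"]),
        "path": record["route"],
        # Assumed optional field: default to 0 when the app does not report it.
        "bytes_sent": int(record.get("response_size", 0)),
    }


def combine(cdn_records, app_records) -> list:
    """Return rows from both sources in one list, ordered by timestamp."""
    rows = [from_cdn(r) for r in cdn_records] + [from_app(r) for r in app_records]
    return sorted(rows, key=lambda row: row["timestamp"])
```

Once every source lands in the same columns, a single query can filter or aggregate across all of them, for example counting 5xx responses by `source`.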

Scale up for peaks, scale to zero for lulls

Hydrolix runs on Kubernetes and is cloud-native, so it can scale compute resources for ingest and query up or down dynamically with demand. Because ingest and query resources scale independently of each other and can be assigned to discrete, task-oriented pools, you can fine-tune resources to meet budget and performance targets.

All the properties of a traditional data lake

Flexible schema

Raw storage

Decoupled storage

Independently scalable query and ingest compute

With specific optimizations for real-time log streams

Streaming ingest that scales to hundreds of TB per day

ETL for real-time log stream processing

Combine multiple log sources into a single table

SQL and Spark Query on ingest

Multi-year hot storage

Especially for high-cardinality, high-dimensionality data

HYDROLIX ARCHITECTURE

Key architectural details

» Decoupled storage (S3-compatible)
» Stateless ingest and query

» Kubernetes—scale up/scale to zero
» Multi-cloud support

» Runs on your infrastructure, whether your VPC on a hyperscaler or in a private data center

The Hydrolix streaming data lake

WHAT IT IS

A streaming data lake combines the decoupled storage and flexible schemas of a traditional data lake with the highly scalable ingest and ETL offered by event stream processing systems like Flink or ksqlDB.

WHY IT MATTERS

A streaming data lake allows you to process massive volumes of log data from multiple sources, combine them all into a single table, and query or generate alerts within seconds. No other system provides this ETL capability at the scale Hydrolix can achieve.
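To make "generate alerts within seconds" concrete, here is a minimal sketch of a sliding-window error-rate check over rows that have already been combined into one table. It is an illustration only, not Hydrolix's alerting feature: the class name `ErrorRateAlert`, its parameters, and the assumption that rows arrive in timestamp order are all hypothetical.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when the share of 5xx rows in the last `window_seconds` exceeds `threshold`.

    Assumes rows are observed in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float, threshold: float, min_rows: int = 1):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_rows = min_rows
        self._rows = deque()  # (epoch_seconds, is_error) pairs, oldest first
        self._errors = 0

    def observe(self, epoch_seconds: float, status: int) -> bool:
        """Record one row and return True if the alert condition currently holds."""
        is_error = status >= 500
        self._rows.append((epoch_seconds, is_error))
        self._errors += is_error

        # Evict rows that have fallen out of the window.
        cutoff = epoch_seconds - self.window_seconds
        while self._rows and self._rows[0][0] <= cutoff:
            _, was_error = self._rows.popleft()
            self._errors -= was_error

        # Avoid alerting on too little data.
        if len(self._rows) < self.min_rows:
            return False
        return self._errors / len(self._rows) > self.threshold
```

Because the check keeps only a running count and the rows still inside the window, each new row is processed in amortized constant time, which is what lets a check like this keep pace with a high-volume stream.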

HOW IT WORKS

Hydrolix runs on Kubernetes and is cloud-native, allowing it to scale ingest and query compute resources up or down dynamically with demand.

Because ingest and query resources scale independently of each other and can be assigned to discrete, task-oriented pools, you can fine-tune resources to meet budget and performance targets.
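The idea of independently sized pools can be sketched in a few lines. This is a simplified model under stated assumptions, not Hydrolix's or Kubernetes' actual scaling logic: the `Pool` type, the capacity figures, and the `desired_replicas` rule (ceiling of load over per-replica capacity, capped at a maximum, zero when idle) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    capacity_per_replica: float  # work units one replica can handle (assumed figure)
    max_replicas: int            # budget cap for this pool


def desired_replicas(pool: Pool, load: float) -> int:
    """Size one pool from its own load, independently of any other pool."""
    if load <= 0:
        return 0  # scale to zero when there is nothing to do
    needed = math.ceil(load / pool.capacity_per_replica)
    return min(pool.max_replicas, needed)


# Hypothetical pools: each is sized only from its own demand signal.
ingest_pool = Pool(name="ingest", capacity_per_replica=50.0, max_replicas=20)
query_pool = Pool(name="query", capacity_per_replica=10.0, max_replicas=8)

ingest_replicas = desired_replicas(ingest_pool, load=420.0)  # heavy ingest traffic
query_replicas = desired_replicas(query_pool, load=0.0)      # no queries running
```

Here a spike in ingest load raises only the ingest pool's replica count, while the idle query pool drops to zero, and the per-pool `max_replicas` cap is where a budget target would be expressed.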