This document provides an overview of Apache Amoro, a Lakehouse management system built on open data lake formats. It introduces the system architecture, core components, module structure, and key capabilities. For detailed information about specific subsystems, refer to:
Apache Amoro is a Lakehouse management system that provides self-optimizing and management capabilities for data lake tables. It works with multiple compute engines (Apache Flink, Apache Spark, Trino) and supports various table formats (Iceberg, Mixed-Iceberg, Mixed-Hive, Paimon, Hudi).
Key Features:
Primary Use Cases:
Sources: README.md40-98 pom.xml34-36
The following diagram shows the high-level architecture of Apache Amoro and how its major components interact:
Sources: README.md46-61 docs/admin-guides/deployment.md71-97 amoro-ams/pom.xml1-641
Architecture Components:
AMS (Amoro Management Service): Central management service providing three main interfaces:
Compute Engines: External processing engines that read and write data through the AMS table service
Optimizer Containers: Execution environments for self-optimizing tasks (Local JVM, Flink, Spark, Kubernetes)
Storage Layer: Underlying storage systems for data files, metadata, and message queues
The AmoroServiceContainer is the main entry point that initializes and manages all AMS services:
Sources: CONTRIBUTING.md134-141 amoro-ams/pom.xml29-31
HTTP Server (Port 1630):
- DashboardServer: Javalin-based HTTP server for web UI and REST API
  - LoginController, CatalogController, TableController, OptimizerController, TerminalController

Thrift Services:
- TableManagementServer (Port 1260): Table lifecycle operations for compute engines
- OptimizingServiceServer (Port 1261): Task polling and optimizer coordination

Sources: docs/admin-guides/deployment.md71-97
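
As a quick way to confirm that the three AMS endpoints listed above are listening, the following is a minimal sketch that probes each port over TCP. The port numbers come from this overview; the host `127.0.0.1`, the service labels, and the helper names `is_port_open` and `check_ams` are assumptions made for illustration and are not part of Amoro. A successful TCP connection only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports as listed in this overview. The labels are illustrative only.
AMS_SERVICES = {
    "DashboardServer (HTTP)": 1630,
    "TableManagementServer (Thrift)": 1260,
    "OptimizingServiceServer (Thrift)": 1261,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ams(host: str = "127.0.0.1") -> dict:
    """Map each service label to whether its port accepts connections.

    The default host is an assumption for a local standalone deployment.
    """
    return {name: is_port_open(host, port) for name, port in AMS_SERVICES.items()}


if __name__ == "__main__":
    for name, reachable in check_ams().items():
        print(f"{name}: {'reachable' if reachable else 'unreachable'}")
```

If a deployment overrides the default ports in its configuration, the values in `AMS_SERVICES` would need to be adjusted to match.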
Sources: README.md103-114 docs/admin-guides/deployment.md141-175
Apache Amoro is organized into the following Maven modules:
Sources: pom.xml47-58 README.md99-114
Module Descriptions:
| Module | Purpose | Key Components |
|---|---|---|
| amoro-common | Core abstractions and common utilities | Thrift API definitions, configuration, shared utilities |
| amoro-ams | Management service implementation | Service container, REST API, Thrift servers |
| amoro-web | Web dashboard frontend | Vue.js application for AMS management |
| amoro-metrics | Metrics reporting | Prometheus metric reporter |
| amoro-format-iceberg | Iceberg format integration | Iceberg table support, catalog implementations |
| amoro-format-mixed | Mixed format implementation | Mixed-Iceberg and Mixed-Hive table support |
| amoro-format-paimon | Paimon format integration | Paimon table metadata display |
| amoro-format-hudi | Hudi format integration | Hudi table metadata display |
| amoro-optimizer | Optimizer implementations | Local, Flink, Spark, and Kubernetes optimizers |
| dist | Binary distribution packaging | Maven assembly for release artifacts |
Sources: README.md99-114 pom.xml47-58
Apache Amoro supports multiple table formats, each serving different use cases:
| Format | Description | Use Case |
|---|---|---|
| Iceberg | Native Apache Iceberg tables | Standard Iceberg functionality with AMS management |
| Mixed-Iceberg | Enhanced Iceberg with LogStore and ChangeStore | Streaming updates with high performance |
| Mixed-Hive | Hive-compatible mixed format | Upgrade Hive tables while maintaining compatibility |
| Paimon | Apache Paimon integration | Display Paimon table metadata and statistics |
| Hudi | Apache Hudi integration | Display Hudi table metadata and statistics |
Sources: README.md62-70
| Engine | Version | Batch Read | Batch Write | Streaming Read | Streaming Write | Create Table | Alter Table |
|---|---|---|---|---|---|---|---|
| Flink | 1.16.x, 1.17.x, 1.18.x | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Spark | 3.3, 3.5 | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
| Hive | 2.x, 3.x | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
| Trino | 406 | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ |
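
The capability matrix above can also be read as a small lookup structure. The following is a minimal sketch that encodes the table rows and answers questions such as "which engines can write in streaming mode". The names `OPERATIONS`, `SUPPORT`, and `engines_supporting` are hypothetical and exist only for this illustration; the values are copied from the table and are not queried from Amoro.

```python
# Column order follows the table above.
OPERATIONS = (
    "batch_read",
    "batch_write",
    "streaming_read",
    "streaming_write",
    "create_table",
    "alter_table",
)

# One tuple of flags per engine, transcribed from the table rows.
SUPPORT = {
    "Flink": (True, True, True, True, True, False),
    "Spark": (True, True, False, False, True, True),
    "Hive": (True, False, False, False, False, True),
    "Trino": (True, False, False, False, False, True),
}


def engines_supporting(operation: str) -> list:
    """Return the engines whose row marks the given operation as supported."""
    idx = OPERATIONS.index(operation)  # raises ValueError for unknown names
    return [engine for engine, flags in SUPPORT.items() if flags[idx]]


if __name__ == "__main__":
    for op in OPERATIONS:
        print(f"{op}: {', '.join(engines_supporting(op))}")
```

Keeping the flags in column order makes it straightforward to compare the structure against the table when either one changes.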
Sources: README.md79-88
Apache Amoro can be deployed in multiple ways to suit different infrastructure requirements:
Sources: docs/admin-guides/deployment.md27-339 docker/build.sh1-243 .github/workflows/docker-images.yml1-267
Deployment Methods:
Standalone Binary: Unpack tarball, configure via YAML files, start with shell scripts
- bin/ams.sh, bin/optimizer.sh
- conf/config.yaml, conf/jvm.properties

Docker Containers: Pre-built images for AMS and optimizers
- apache/amoro: Main AMS service
- apache/amoro-flink-optimizer: Flink-based optimizer (versions 1.14.6, 1.20.0)
- apache/amoro-spark-optimizer: Spark-based optimizer (version 3.5.7)

Kubernetes with Helm: Production-grade deployment with scaling and HA
Sources: docs/admin-guides/deployment.md36-176 .github/workflows/docker-images.yml37-265 docker/README.md1-48
Apache Amoro uses Maven for building and dependency management:
Maven wrapper script: ./mvnw

Key Build Properties:
| Property | Default Value | Description |
|---|---|---|
| iceberg.version | 1.6.1 | Apache Iceberg version |
| spark.version | 3.5.7 | Apache Spark version |
| flink.version | 1.20.3 | Apache Flink version |
| hadoop.version | 3.4.0 | Hadoop version (use -Phadoop2 for 2.x) |
| paimon.version | 1.2.0 | Apache Paimon version |
| hudi.version | 0.14.1 | Apache Hudi version |
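
Maven properties such as those in the table above can generally be overridden on the command line with `-D<name>=<value>`, and profiles are activated with `-P<profile>`. The following is a minimal sketch that assembles such a command. The helper name `build_command` and the `clean package` goals are assumptions for illustration; whether a particular version override actually builds against this codebase is not verified here.

```python
import shlex


def build_command(overrides=None, hadoop2=False, skip_tests=False):
    """Assemble a Maven wrapper invocation as a list of arguments.

    overrides: mapping of property name to value, e.g. {"spark.version": "3.5.7"}
    hadoop2: activate the hadoop2 profile mentioned in the table above
    skip_tests: add Maven's standard -DskipTests flag
    """
    cmd = ["./mvnw", "clean", "package"]  # goals are an assumption
    if hadoop2:
        cmd.append("-Phadoop2")
    # Sort for a stable, reproducible command line.
    for name, value in sorted((overrides or {}).items()):
        cmd.append(f"-D{name}={value}")
    if skip_tests:
        cmd.append("-DskipTests")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    args = build_command({"spark.version": "3.5.7"}, hadoop2=True, skip_tests=True)
    print(shlex.join(args))
```

Returning a list rather than a string keeps the result safe to pass to `subprocess.run` without shell quoting concerns.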
Common Build Commands:
Sources: README.md117-134 pom.xml72-171
Minimum Requirements:
Supported Operating Systems:
Sources: docs/admin-guides/deployment.md31-35
For detailed information about specific aspects of Apache Amoro:
Sources: README.md1-186 CONTRIBUTING.md1-249