Documentation is undergoing a significant revamp. The new documentation will be finalized as part of the v0.3 release in late spring or early summer of 2025.
dft is a batteries-included suite of DataFusion applications that provides:
- Data Source Integration: Query files from S3, local filesystems, or HuggingFace datasets
- Table Format Support: Native support for Delta Lake
- Extensibility: UDFs defined in WASM (and soon Python)
- Helper Functions: Built-in functions for JSON and Parquet data processing
The project offers four complementary interfaces:
- Text User Interface (TUI): An interactive SQL IDE with real-time query analysis, benchmarking, and catalog exploration (requires `tui` feature)
- Command Line Interface (CLI): A scriptable engine for executing queries from files or the command line
- FlightSQL Server: A standards-compliant SQL interface for programmatic access (requires `flightsql` feature)
- HTTP Server: A REST API for SQL queries and catalog exploration (requires `http` feature)
All interfaces share the same execution engine, allowing you to develop locally with the TUI and then seamlessly deploy with the server implementations.
dft builds upon datafusion-cli with enhanced interactivity, additional integrations, and ready-to-use server implementations.
# Core CLI and server interfaces
cargo install datafusion-dft
# With TUI interface
cargo install datafusion-dft --features=tui
# For full functionality with all features (including TUI)
cargo install datafusion-dft --all-features
If you don't have Rust installed, follow the installation instructions.
Common feature combinations:
# Core with S3 support
cargo install datafusion-dft --features=s3
# With TUI interface
cargo install datafusion-dft --features=tui
# TUI with S3 and data lake formats
cargo install datafusion-dft --features=tui,s3,deltalake
# Data lake formats
cargo install datafusion-dft --features=deltalake
# With JSON and Parquet functions
cargo install datafusion-dft --features=functions-json,functions-parquet
See the Features documentation for all available features.
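Since `--features` takes a comma-separated list, an install command can be assembled from whichever features you need. The helper below is only a sketch: `dft_install_cmd` is a hypothetical name, it prints the command rather than running it, and it passes feature names through without checking them.

```shell
# Hypothetical helper: print the cargo install command for a set of features.
# It does not validate feature names and does not run cargo itself.
dft_install_cmd() {
  if [ "$#" -eq 0 ]; then
    # No features requested: core install only.
    echo "cargo install datafusion-dft"
    return 0
  fi
  # Join the arguments with commas, e.g. "tui s3" -> "tui,s3".
  features=$(IFS=,; echo "$*")
  echo "cargo install datafusion-dft --features=$features"
}

# Prints: cargo install datafusion-dft --features=tui,s3,deltalake
dft_install_cmd tui s3 deltalake
```

Printing instead of executing makes it easy to review the command before running it.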
Note: The TUI (Text User Interface) is optional and requires the tui feature flag. The CLI, FlightSQL server, and HTTP server are always available.
# Interactive TUI (requires `tui` feature)
dft
# CLI with direct query execution
dft -c "SELECT 1 + 2"
# CLI with file-based query
dft -f query.sql
# Benchmark a query (with stats)
dft -c "SELECT * FROM my_table" --bench
# Concurrent benchmark (measures throughput under load)
dft -c "SELECT * FROM my_table" --bench --concurrent
# Save benchmark results to CSV
dft -c "SELECT * FROM my_table" --bench --save results.csv
# Start FlightSQL Server (requires `flightsql` feature)
dft serve-flightsql
# Start HTTP Server (requires `http` feature)
dft serve-http
# Generate TPC-H data in the configured DB path
dft generate-tpch
dft includes built-in benchmarking to measure query performance with detailed timing breakdowns:
# Serial benchmark (default) - measures query performance in isolation
dft -c "SELECT * FROM my_table" --bench
# Concurrent benchmark - measures throughput under load
dft -c "SELECT * FROM my_table" --bench --concurrent
# Custom iteration count
dft -c "SELECT * FROM my_table" --bench -n 100
# Save results to CSV for analysis
dft -c "SELECT * FROM my_table" --bench --save results.csv
# Compare serial vs concurrent performance
dft -c "SELECT * FROM my_table" --bench --save results.csv
dft -c "SELECT * FROM my_table" --bench --concurrent --save results.csv --append
Benchmark Output:
- Timing breakdown by phase: logical planning, physical planning, execution
- Statistics: min, max, mean, median for each phase
- Row counts validation across all runs
- CSV export with `concurrency_mode` column for result comparison
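To make the reported statistics concrete, here is a minimal sketch of how min, max, mean, and median can be computed from a list of per-run timings. It illustrates the statistics themselves, not dft's implementation; `summarize_timings` is a hypothetical helper and the sample values are made up.

```shell
# Hypothetical helper: summarize one duration per line (e.g. milliseconds) read from stdin.
summarize_timings() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no samples"; exit 1 }
      # Median: the middle value, or the average of the two middle values.
      if (NR % 2) { median = v[(NR + 1) / 2] }
      else        { median = (v[NR / 2] + v[NR / 2 + 1]) / 2 }
      printf "min=%s max=%s mean=%s median=%s\n", v[1], v[NR], sum / NR, median
    }'
}

# Made-up sample timings, not real dft output.
# Prints: min=10 max=15 mean=12 median=11.5
printf '12\n10\n15\n11\n' | summarize_timings
```

The median is less sensitive than the mean to a single slow run, which is why it is useful to compare both.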
Serial vs Concurrent:
- Serial: Pure query execution time without contention (baseline performance)
- Concurrent: Throughput measurement with parallel execution (reveals bottlenecks and contention)
- Concurrent mode uses adaptive concurrency: `min(iterations, CPU cores)`
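The adaptive concurrency rule can be previewed before running a benchmark. The sketch below only mirrors the stated formula; `effective_concurrency` is a hypothetical helper, and detecting cores with `getconf _NPROCESSORS_ONLN` is an assumption about the host (it works on Linux and macOS), not how dft counts them.

```shell
# Hypothetical helper mirroring the rule min(iterations, CPU cores).
# The optional second argument overrides the detected core count.
effective_concurrency() {
  iterations="$1"
  cores="${2:-$(getconf _NPROCESSORS_ONLN)}"
  if [ "$iterations" -lt "$cores" ]; then
    echo "$iterations"
  else
    echo "$cores"
  fi
}

effective_concurrency 4 16    # prints 4: fewer iterations than cores
effective_concurrency 100 8   # prints 8: capped at the core count
```

In other words, a small iteration count limits parallelism, so a concurrent run with only a few iterations may not saturate the machine.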
dft can automatically load table definitions at startup, giving you a persistent "database-like" experience.
- Create a DDL file (default: `~/.config/dft/ddl.sql`)
- Add your table and view definitions:
-- S3 data source (requires s3 feature)
CREATE EXTERNAL TABLE users
STORED AS NDJSON
LOCATION 's3://bucket/users';
-- Parquet files
CREATE EXTERNAL TABLE transactions
STORED AS PARQUET
LOCATION 's3://bucket/transactions';
-- Local files
CREATE EXTERNAL TABLE listings
STORED AS PARQUET
LOCATION 'file:///folder/listings';
-- Create views from tables
CREATE VIEW users_listings AS
SELECT * FROM users
LEFT JOIN listings USING (user_id);
-- Delta Lake table (requires deltalake feature)
CREATE EXTERNAL TABLE delta_table
STORED AS DELTATABLE
LOCATION 's3://bucket/delta_table';
- TUI (requires `tui` feature): DDL is automatically loaded at startup
- CLI: Add `--run-ddl` flag to execute DDL before your query
- Custom Path: Configure a custom DDL path in your config file
[execution]
ddl_path = "/path/to/my/ddl.sql"
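As a sketch of the workflow, the snippet below writes a starter DDL file that `ddl_path` could point at. The helper name `write_starter_ddl`, the temporary output directory, and the table location are all placeholders chosen for illustration.

```shell
# Hypothetical helper: write a starter DDL file at the given path,
# creating parent directories as needed.
write_starter_ddl() {
  ddl_path="$1"
  mkdir -p "$(dirname "$ddl_path")"
  cat > "$ddl_path" <<'SQL'
-- Local Parquet files (the location is a placeholder)
CREATE EXTERNAL TABLE listings
STORED AS PARQUET
LOCATION '/path/to/listings';
SQL
}

# Write to a scratch location so an existing DDL file is not overwritten.
write_starter_ddl "${TMPDIR:-/tmp}/dft-example/ddl.sql"
```

With `ddl_path` set to that file, a CLI query such as `dft -c "SELECT COUNT(*) FROM listings" --run-ddl` would run the DDL first, assuming the placeholder location is replaced with real data.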
| Feature | Documentation |
|---|---|
| Core Features | Features Guide |
| Database | Database Guide |
| TUI Interface | TUI Guide |
| CLI Usage | CLI Guide |
| FlightSQL Server | FlightSQL Guide |
| HTTP Server | HTTP Guide |
| Configuration Options | Config Reference |