数据库系统
Database System
DuckDB Tutorial
DuckDB 教程
9/22/2025 Fall 2025 1
Getting Started with DuckDB CLI
• Lightweight, fast, and easy-to-use analytical SQL engine
• Focus: Practical CLI usage
What is DuckDB?
• In-process SQL OLAP database
• Designed for analytical workloads
• Runs without a server, embedded in apps or CLI
DuckDB
• Download:
https://duckdb.org/docs/installation/?version=stable&environ
ment=cli&platform=win&download_method=direct&architect
ure=x86_64
• x86_64
9/22/2025 Fall 2025 4
Installing DuckDB
• Download binaries from https://duckdb.org
• Linux/macOS: brew install duckdb or package manager
• Windows: use precompiled .exe
• Python/R: pip install duckdb, install.packages('duckdb')
Starting the CLI
• Open terminal (in Linxu, Ubuntu)
• Run:
duckdb
• Starts an interactive SQL shell
• By default, creates an in-memory DB
Creating a Database File
• Persistent database stored in file
• Start with file:
duckdb mydb.duckdb
• Reopen anytime with same command
Basic CLI Commands
• .help → list all commands
• .open file → open another database
• .quit → exit CLI
• .schema → show schema
Running SQL Queries
• Example:
CREATE TABLE users(id INTEGER, name TEXT);
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
SELECT * FROM users;
• CLI prints tabular results
Importing CSV Data
• Use built-in read_csv_auto:
CREATE TABLE trips AS SELECT * FROM
read_csv_auto('trips.csv’);
• Automatically detects schema
Inspecting Data
• .tables → list all tables
• .schema tablename → show table schema
• PRAGMA table_info('trips'); → query metadata
Querying Data
• Standard SQL supported:
SELECT COUNT(*) FROM trips;
SELECT city, AVG(duration) FROM trips GROUP BY city;
• Aggregate & group functions
Filtering and Joins
• Filtering with WHERE:
SELECT * FROM trips WHERE duration > 60;
• Joining:
SELECT u.name, t.city FROM users u JOIN trips t ON u.id =
t.user_id;
Using Parquet Files
• DuckDB reads/writes Parquet natively
• Example:
SELECT * FROM 'data.parquet’;
• Write query results: COPY (SELECT * FROM trips) TO
'out.parquet';
Exporting Results
• CSV export:
COPY trips TO 'trips.csv' (HEADER, DELIMITER ',’);
• Parquet export (efficient for large data)
CLI Output Options
• Change output format:
• .mode csv → output in CSV format
• .mode markdown → for docs
• .mode box → nice table boxes
Views and Temporary Tables
• Create view:
• CREATE VIEW active_trips AS SELECT * FROM trips WHERE
duration > 60;
• Temporary tables auto-cleaned after session
Transactions
• Start: BEGIN TRANSACTION;
• Multiple inserts/updates
• Commit: COMMIT;
• Rollback: ROLLBACK;
Extensions
• Load extensions dynamically:
INSTALL httpfs;
LOAD httpfs;
Enables S3/HTTP access
Querying Remote Data
• Example with HTTPFS:
SELECT * FROM read_csv_auto('https://.../data.csv’);
• Works with Parquet/CSV over HTTP or S3
Performance Features
• Vectorized execution engine
• Columnar storage
• Parallel query execution
• Efficient for analytics on laptops
Wrap Up
• DuckDB CLI is lightweight yet powerful
• Ideal for:
• - Ad-hoc SQL queries
• - Data science exploration
• - Fast analytics without servers