Superseded by https://github.com/hackclub/orpheus-engine. This repo is no longer used.
AI generated
A tool for migrating data from one Postgres database to another with schema remapping. This is particularly useful for copying data from Rails applications where you want to preserve the data structure but place it into a specific schema in the target database.
- Migrate tables and data from one Postgres database to another
- Remap the source schema (typically `public`) to a target schema of your choice (see the sketch after this list)
- Support for incremental updates by tracking the most recent timestamp
- Automatically creates target schema and tables if they don't exist
- Preserves primary key constraints
- Handles batched inserts for better performance
- Memory-efficient processing of large tables
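Conceptually, the remapping and batching amount to qualifying every table name with the target schema and writing rows in multi-row INSERTs. A minimal sketch using psycopg2 (the real pg_migrate.py may structure this differently, and the helper below assumes the target table already exists with matching columns):

```python
import psycopg2
from psycopg2 import sql
from psycopg2.extras import execute_values

def remap_and_insert(target_db_url, table, columns, rows, target_schema="db1"):
    """Write rows that came from public.<table> into <target_schema>.<table>.

    Illustrative only: assumes the target table already exists with matching columns.
    """
    with psycopg2.connect(target_db_url) as conn:
        cur = conn.cursor()
        cur.execute(sql.SQL("CREATE SCHEMA IF NOT EXISTS {}").format(
            sql.Identifier(target_schema)))
        insert = sql.SQL("INSERT INTO {}.{} ({}) VALUES %s").format(
            sql.Identifier(target_schema),
            sql.Identifier(table),
            sql.SQL(", ").join(sql.Identifier(c) for c in columns),
        ).as_string(cur)
        # One multi-row INSERT per page instead of one round trip per row
        execute_values(cur, insert, rows, page_size=1000)
```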
- Python 3.6+
- Bash shell
- Access to source and target Postgres databases
- Clone this repository:

  ```
  git clone <repository-url>
  cd <repository-dir>
  ```

- Set up your environment variables in a `.env` file (a sketch for reading these from Python follows these steps):

  ```
  HACKATIME_DB_URL="postgres://username:password@hostname:port/dbname"
  WAREHOUSE_DB_URL="postgres://username:password@hostname:port/dbname"
  ```

- Make the scripts executable:

  ```
  chmod +x pg_migrate.py pg_migrate.sh
  ```
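The shell wrapper reads these two URLs from `.env`. If you also want to pick them up from Python, one common approach is python-dotenv; this is an assumption for illustration, not a stated dependency of pg_migrate.py:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv (assumed, not required by this repo)

load_dotenv()  # loads .env from the current directory into the process environment

source_db_url = os.environ["HACKATIME_DB_URL"]
target_db_url = os.environ["WAREHOUSE_DB_URL"]
```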
Run the migration using the shell script:
```
./pg_migrate.sh [options] [source_db_url] [target_db_url] target_schema [incremental]
```

Arguments:

- `source_db_url` (optional): URL for the source database. Defaults to `HACKATIME_DB_URL` from `.env`
- `target_db_url` (optional): URL for the target database. Defaults to `WAREHOUSE_DB_URL` from `.env`
- `target_schema` (required): Schema name to create in the target database
- `incremental` (optional): "true" or "false", defaults to "true"

Options:

- `--batch-size N`: Number of rows to process in a batch (default: 100000)
- `--debug`: Enable debug logging
```
# Migrate data to 'db1' schema with incremental updates
./pg_migrate.sh postgres://user:pass@host:port/source_db postgres://user:pass@host:port/target_db db1 true

# Using environment variables from .env
./pg_migrate.sh "" "" db1

# Process with smaller batches to reduce memory usage
./pg_migrate.sh --batch-size=10000 "" "" db1
```

You can also run the Python script directly:
```
python3 pg_migrate.py --source-db-url="..." --target-db-url="..." --target-schema="db1" --incremental
```

Additional options:

- `--source-schema`: Source schema name (default: `public`)
- `--state-file`: File to store migration state (default: `.migration_state.json`; a sketch of a possible state shape follows this list)
- `--batch-size`: Number of rows to process in a batch (default: 100000)
- `--debug`: Enable debug logging
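The exact contents of `.migration_state.json` are an implementation detail of the script; conceptually it only has to remember the newest timestamp copied for each table. A hypothetical shape and helper, purely illustrative (the real file format and table names may differ):

```python
import json
from pathlib import Path

STATE_FILE = Path(".migration_state.json")

def load_state():
    # Hypothetical shape: {"users": "2024-01-01T12:00:00", "heartbeats": "2024-01-02T08:30:00"}
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

def save_state(state):
    STATE_FILE.write_text(json.dumps(state, indent=2, sort_keys=True))

state = load_state()
state["users"] = "2024-01-03T00:00:00"  # record the newest timestamp copied for this table
save_state(state)
```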
When migrating very large tables (millions of rows), you may need to adjust the batch size to optimize for your system's memory:
- Use incremental updates: Tables with timestamp columns will automatically use incremental updates if the `--incremental` flag is set (the default).
- Adjust batch size: Use `--batch-size=N` to change the number of rows processed in each batch. Smaller batches use less memory but may be slower (see the streaming sketch after this list).
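The reason batch size maps directly to memory is that only one batch of rows is held in Python at a time; a server-side (named) cursor keeps the rest of the result set on the Postgres server. A simplified sketch of that fetch loop, not necessarily how pg_migrate.py implements it:

```python
import psycopg2

def fetch_in_batches(source_db_url, query, params=None, batch_size=100_000):
    """Yield query results in fixed-size chunks so peak memory stays roughly
    batch_size x average row width, regardless of total table size."""
    with psycopg2.connect(source_db_url) as conn:
        # A named cursor makes Postgres hold the result set server-side
        # instead of sending it all to the client at once.
        cur = conn.cursor(name="batch_reader")
        cur.itersize = batch_size
        cur.execute(query, params)
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            yield rows
```

Smaller values trade extra round trips for a lower memory ceiling, which is usually the right trade-off on tables with wide rows.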
To run the migration on a regular schedule, you can set up a cron job manually:
```
crontab -e
```

Then add a line like this to run the migration every 15 minutes:

```
*/15 * * * * /path/to/pg_migrate.sh "source_db_url" "target_db_url" "target_schema" "true" >> /path/to/pg_migrate.log 2>&1
```
- The script connects to both source and target databases
- It gets a list of tables from the source schema
- For each table:
  - Creates an equivalent table in the target schema if it doesn't exist
  - If incremental mode is enabled, looks for timestamp columns to track changes
  - Copies data from source to target, using timestamps for filtering in incremental mode
- For incremental updates, it tracks the latest timestamp for each table in a state file (a sketch of the timestamp filter follows this list)
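A hedged sketch of that incremental filter: build a SELECT that only returns rows newer than the last timestamp recorded in the state file. The column name `updated_at` and the `heartbeats` table are illustrative assumptions, not guaranteed to match the script's detection logic:

```python
from psycopg2 import sql

def incremental_query(table, ts_column="updated_at", source_schema="public"):
    """Build a SELECT that only picks up rows changed since the last run."""
    return sql.SQL("SELECT * FROM {}.{} WHERE {} > %s ORDER BY {}").format(
        sql.Identifier(source_schema),
        sql.Identifier(table),
        sql.Identifier(ts_column),
        sql.Identifier(ts_column),
    )

# Usage (illustrative): cur.execute(incremental_query("heartbeats"), (last_seen,))
# On the first run there is no recorded timestamp, so the whole table is copied;
# after a successful copy, the newest timestamp seen replaces last_seen in the state file.
```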
- Check the log file (`pg_migrate.log`) for error messages
- Run with the `--debug` flag for more detailed logs
- For "out of memory" errors, reduce the batch size with `--batch-size=10000` or lower
- Make sure database credentials are correct and the user has sufficient privileges
- Verify that network access is available to both databases