To migrate data from Teradata to Google BigQuery (BQ), you can follow these general
steps:
1. Extract data from Teradata: Connect to the Teradata database and fetch the data you want to
migrate. You can use SQL queries to retrieve the data from Teradata tables.
2. Load data into BigQuery: Create a dataset and table in BigQuery to store the migrated data, then
load the data using one of the available methods: the BigQuery web UI, the bq command-line tool,
the client libraries, or the REST API (a short batch-load sketch follows this list).
3. Validate and verify the migration: After the data is loaded into BigQuery, perform data validation
to confirm the migration succeeded. Compare row counts and a sample of rows between Teradata and
BigQuery to verify the integrity and accuracy of the migrated data (a row-count sketch appears after
the explanation below).
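The migration script below uses streaming inserts for simplicity. For one-off migrations of larger
tables, a batch load job is usually cheaper and avoids streaming quotas. Here is a minimal sketch of
that alternative using the google-cloud-bigquery client's load_table_from_json method; the project,
dataset, table, and sample rows are placeholders, not part of the script below:

from google.cloud import bigquery

# Hypothetical batch-load alternative: load already-extracted rows
# (a list of dicts) into BigQuery as a load job instead of streaming.
client = bigquery.Client(project='tdtobq')
job_config = bigquery.LoadJobConfig(
    autodetect=True,  # let BigQuery infer the schema from the JSON rows
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
sample_rows = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
load_job = client.load_table_from_json(
    sample_rows, 'tdtobq.bqtarget.convertedsql', job_config=job_config)
load_job.result()  # block until the load job completes
print(f'Loaded {load_job.output_rows} rows')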
Code:
from google.cloud import bigquery
from google.api_core.exceptions import NotFound
import teradatasql
import datetime
import decimal

# Teradata connection settings
td_host = 'your-teradata-host'  # replace with your Teradata hostname or IP
td_port = '1025'                # Teradata DBS port (1025 is the default)
td_user = 'admin'
td_password = 'password'
td_database = 'info'
td_table = 'customers'

# BigQuery settings
bq_project = 'tdtobq'
bq_dataset = 'bqtarget'
bq_table = 'convertedsql'

# Teradata connection (teradatasql calls the port parameter dbs_port)
td_conn = teradatasql.connect(host=td_host, dbs_port=td_port, user=td_user,
                              password=td_password, database=td_database)
td_cursor = td_conn.cursor()

# Fetch data from Teradata
td_cursor.execute(f'SELECT * FROM {td_table}')
rows = td_cursor.fetchall()

# BigQuery client
bq_client = bigquery.Client(project=bq_project)

# Create the BigQuery dataset if it doesn't exist
dataset_ref = bigquery.DatasetReference(bq_project, bq_dataset)
try:
    bq_client.get_dataset(dataset_ref)
except NotFound:
    bq_client.create_dataset(bigquery.Dataset(dataset_ref))

# Build the BigQuery table schema from the Teradata cursor description.
# The cursor reports Python types, so map them to BigQuery column types.
TYPE_MAP = {
    str: 'STRING',
    int: 'INT64',
    float: 'FLOAT64',
    decimal.Decimal: 'NUMERIC',
    datetime.date: 'DATE',
    datetime.datetime: 'DATETIME',
    datetime.time: 'TIME',
    bytes: 'BYTES',
}
schema = []
for field in td_cursor.description:
    field_name = field[0]
    field_type = TYPE_MAP.get(field[1], 'STRING')  # fall back to STRING
    schema.append(bigquery.SchemaField(field_name, field_type))

# Create the BigQuery table if it doesn't exist
table_ref = dataset_ref.table(bq_table)
try:
    bq_client.get_table(table_ref)
except NotFound:
    bq_client.create_table(bigquery.Table(table_ref, schema=schema))

# Prepare data for insertion into BigQuery; values must be JSON-serializable,
# so dates, times, and decimals are converted to strings
column_names = [field[0] for field in td_cursor.description]
json_types = (decimal.Decimal, datetime.date, datetime.datetime, datetime.time)
rows_to_insert = []
for row in rows:
    rows_to_insert.append({
        name: str(value) if isinstance(value, json_types) else value
        for name, value in zip(column_names, row)
    })

# Insert data into BigQuery
errors = bq_client.insert_rows_json(table_ref, rows_to_insert)
if errors:
    print(f'Error inserting rows into BigQuery table: {errors}')
else:
    print('Data migration successful!')

# Close Teradata connection
td_conn.close()
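Before running the script, install both client libraries (pip install teradatasql
google-cloud-bigquery) and authenticate to Google Cloud, for example by pointing the
GOOGLE_APPLICATION_CREDENTIALS environment variable at a service-account key file. Keep in mind that
rows written with insert_rows_json() pass through BigQuery's streaming buffer, so they can take a
short while to become visible to copy and extract jobs.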
Explanation
1. The code imports the necessary libraries: bigquery from the google.cloud package, NotFound from
google.api_core.exceptions, and teradatasql for Teradata connectivity, along with datetime and
decimal for type handling.
2. The Teradata connection settings are specified, including the host, port, username, password,
database, and table.
3. The BigQuery settings are specified, including the project, dataset, and table where the data will
be migrated.
4. The code establishes a connection to Teradata using the teradatasql library and creates a cursor for
executing SQL queries.
5. The SQL query SELECT * FROM {td_table} is executed to fetch all the rows from the specified
Teradata table. The fetched rows are stored in the rows variable.
6. The code creates a BigQuery client using the google-cloud-bigquery library.
7. It checks whether the specified BigQuery dataset exists by calling bq_client.get_dataset(); if that
raises NotFound, it creates the dataset with bq_client.create_dataset().
8. Next, the code builds the schema for the BigQuery table by iterating over the Teradata cursor
description, mapping each column's reported Python type to a BigQuery column type, and creating a
bigquery.SchemaField object for each field.
9. It checks whether the specified BigQuery table exists within the dataset; if bq_client.get_table()
raises NotFound, it creates the table using bq_client.create_table(), passing the previously defined schema.
10. The code prepares the data fetched from Teradata for insertion into BigQuery by converting each
row into a dictionary keyed by column name, stringifying dates, times, and decimals so the rows are
JSON-serializable.
11. The prepared rows are then inserted into the BigQuery table using the
bq_client.insert_rows_json() method.
12. If any errors occur during the insertion, they are captured and displayed.
13. Finally, the Teradata connection is closed using td_conn.close().
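As a concrete starting point for step 3, you can compare row counts between the source and the
destination. The sketch below reuses the connection objects from the script above, so it must run
before td_conn.close(); for a real migration you would also compare checksums or sampled rows, not
just counts:

# Coarse sanity check: row counts in Teradata vs. BigQuery
td_cursor.execute(f'SELECT COUNT(*) FROM {td_table}')
td_count = td_cursor.fetchone()[0]

query = f'SELECT COUNT(*) AS n FROM `{bq_project}.{bq_dataset}.{bq_table}`'
bq_count = next(iter(bq_client.query(query).result())).n

print(f'Teradata: {td_count} rows, BigQuery: {bq_count} rows')
assert td_count == bq_count, 'Row counts differ; investigate before cutover'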