
BigQuery Migration Automation Tool

The BigQuery Migration Automation Tool works with the BigQuery Migration Service to batch-transpile all of the Teradata SQLs into BigQuery SQLs and then validate those queries with a BigQuery dry run. The validated BigQuery SQLs are pushed to GitHub, and the results are recorded in a transpilation logs table in BigQuery.

Architecture

(Architecture diagram: bq-migration-tool-flow.drawio)

Tools Used

  • BQ Migration CLI - CLI tooling built by the BigQuery Migration Service team that allows you to interact with the BigQuery Migration Service.
  • Makefile - A Makefile runs the Python scripts, the pip installs, and everything else required to run the BigQuery Migration Automation Tool.

Required repositories

UC4 SQL Repo
This is the repository of all of the validated SQLs generated by the BigQuery Migration Service tool. The tool uploads the verified SQL to this repository.

Folder Structure:

|-- Teradata SQLs
    |-- radd_master_directory
        |-- my_sql.sql
        …
|-- BigQuery SQLs
    |-- simba_directory
        |-- my_sql.sql

The Teradata SQLs folder contains all of the Teradata SQLs for the UC4 Jobs. The BigQuery SQLs folder contains all of the converted BigQuery SQLs for the UC4 Jobs.

DWH Migration Tools Repo
This is the GitHub repository containing the dwh-migration-tools required for the transpilation of the Teradata SQL.

Setup

The first part of the Makefile runs setup.py. This script clones the required repos into the local file system; if a given GitHub repo already exists in the local file system, it performs a git pull instead.
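
A minimal sketch of that clone-or-pull logic is below. The repo URLs and workspace path are illustrative placeholders, not the values setup.py actually uses:

```python
import subprocess
from pathlib import Path

# Illustrative placeholders -- the real URLs and paths come from config.py.
REPOS = {
    "dwh-migration-tools": "https://github.com/google/dwh-migration-tools.git",
    "uc4_sql": "https://github.com/RealistAI/uc4_sql.git",  # hypothetical
}
WORKSPACE = Path.home() / "workspace"

def clone_or_pull(name: str, url: str) -> None:
    """Clone the repo if it is not on disk yet, otherwise pull."""
    target = WORKSPACE / name
    if target.exists():
        # Repo already cloned: refresh it instead.
        subprocess.run(["git", "-C", str(target), "pull"], check=True)
    else:
        subprocess.run(["git", "clone", url, str(target)], check=True)

for name, url in REPOS.items():
    clone_or_pull(name, url)
```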

Dataset Mapping

The Dataset Mapping parses through the given SQL for each job, extracts the table names attached to each job, and creates a mapping configuration file that maps those tables into the correct dataset for the associated business unit.
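The sketch below illustrates the idea, assuming a naive regex for table extraction and a simplified mapping shape; the real script's parsing and the exact format expected by the BigQuery Migration Service may differ:

```python
import json
import re
from pathlib import Path

# Naive table extraction -- the real script is presumably more robust.
TABLE_RE = re.compile(r"(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

def build_mapping(sql_dir: Path, project: str, target_dataset: str) -> dict:
    """Map every Teradata table found in sql_dir to a BigQuery dataset."""
    mapping = {"name_map": []}
    for sql_file in sql_dir.glob("**/*.sql"):
        for table in set(TABLE_RE.findall(sql_file.read_text())):
            database, _, relation = table.rpartition(".")
            mapping["name_map"].append({
                "source": {"database": database, "relation": relation},
                "target": {"database": project,
                           "schema": target_dataset,
                           "relation": relation},
            })
    return mapping

# Hypothetical business-unit dataset for the radd_master jobs.
config = build_mapping(Path("Teradata SQLs"), "my-gcp-project", "radd_master")
Path("object_name_mapping.json").write_text(json.dumps(config, indent=2))
```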

Transpilation

The transpilation is the process that takes the Teradata SQL and transpiles it into BigQuery SQL. This process is completed using bqms-run. The script sets the environment variables required by the BQMS tool and then runs the bqms-run command to initialize the transpilation process.
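
A hedged sketch of that step, assuming the script shells out to bqms-run with BQMS_* environment variables drawn from the configuration options below; the exact variable names are assumptions, so consult the dwh-migration-tools documentation for the authoritative list:

```python
import os
import subprocess

import config  # the tool's config.py

# Assumed BQMS_* variable names -- check the dwh-migration-tools docs.
env = os.environ.copy()
env.update({
    "BQMS_PROJECT": config.PROJECT,
    "BQMS_INPUT_PATH": config.SOURCE_SQL_PATH,
    "BQMS_PREPROCESSED_PATH": config.PREPROCESED_BUCKET,
    "BQMS_TRANSLATED_PATH": config.TRANSLATED_BUCKET,
    "BQMS_CONFIG_PATH": config.CONFIG_YAML,
    "BQMS_OBJECT_NAME_MAPPING_PATH": config.OBJECT_MAPPING,
})

# Kick off the batch translation.
subprocess.run(["bqms-run"], env=env, check=True)
```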

Dry Run

We then iterate through the files in the BQMS output and submit a dry run for every query for each specific job. If the query succeeds, the file is moved into the UC4_SQL_REPO under the bigquery_sql/ directory. If the query fails, it stays in the teradata_sql/ directory.
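
A minimal version of that check using the google-cloud-bigquery client; the project id and the move logic are simplified placeholders:

```python
from pathlib import Path

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # assumed project id
dry_run = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

def validate(sql_file: Path, validated_dir: Path) -> bool:
    """Dry-run the query; move the file to the validated directory on success."""
    try:
        client.query(sql_file.read_text(), job_config=dry_run)
    except Exception as err:  # e.g. BadRequest on invalid SQL
        print(f"{sql_file.name} failed dry run: {err}")
        return False
    sql_file.rename(validated_dir / sql_file.name)
    return True

for sql_file in Path("teradata_sql").glob("*.sql"):
    validate(sql_file, Path("bigquery_sql"))
```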

Transpilation Logs

At the end of the dry-run validation, whether a dry run is successful or not, the query data is uploaded to the transpilation_logs table in BigQuery, where it provides an accurate record of each dry run's success or failure. A successful dry run is logged with a status of SUCCEEDED, the time the dry run ran, and the specific query that succeeded. A failed dry run is logged with a status of FAILED, the time the dry run ran, the specific query that failed, and the error message explaining why the dry-run validation wasn't successful.
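
A sketch of the logging step. The column names are assumptions inferred from the description above, not the actual transpilation_logs schema:

```python
from datetime import datetime, timezone
from typing import Optional

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # assumed project id
LOGS_TABLE = "my-gcp-project.my_dataset.transpilation_logs"

def log_dry_run(query: str, succeeded: bool, error: Optional[str] = None) -> None:
    """Append one dry-run result to the transpilation logs table."""
    row = {
        "status": "SUCCEEDED" if succeeded else "FAILED",
        "run_time": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "error_message": error,  # None when the dry run succeeded
    }
    errors = client.insert_rows_json(LOGS_TABLE, [row])
    if errors:
        raise RuntimeError(f"Failed to write log row: {errors}")
```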

Github Integration

Upon completion of the validation process, the script creates a new branch in the repository and pushes the new branch with its changes to the repository stored in GitHub.
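
This amounts to a handful of git commands; the branch naming scheme below is hypothetical:

```python
import subprocess
from datetime import date

REPO_DIR = "uc4_sql"  # local clone of the UC4 SQL repo
branch = f"transpiled-sql-{date.today().isoformat()}"  # hypothetical scheme

for args in (
    ["checkout", "-b", branch],
    ["add", "bigquery_sql/"],
    ["commit", "-m", "Add transpiled and validated BigQuery SQL"],
    ["push", "--set-upstream", "origin", branch],
):
    subprocess.run(["git", "-C", REPO_DIR, *args], check=True)
```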

Usage

To use this tool, first clone the project into the directory of your choice with git clone https://github.com/RealistAI/bq_migration_automation_tool.git, navigate into the newly cloned directory with cd bq_migration_automation_tool, and alter config.py to your own specification. Then create a Pip virtual environment using pipenv shell, install the required libraries with pipenv install, and run the Makefile with make run.

Configuration Options

DWH_MIGRATION_TOOL_REPO

The repository URL and branch containing the dwh-migration-tools

UC4_SQL_REPO

The repository URL and branch containing the SQLs to transpile & validate

UC4_SQL_REPO_NAME

The name of the repository containing the SQLs to transpile & validate

BASE_PATH

The base path from which the dataset mapping grabs the SQLs that it parses through and adjusts to work for BigQuery

PROJECT

The name of the Google Cloud Platform project that will perform the bulk transpilation & validation

DATASET

The name of the Google Cloud Platform dataset used for the bulk transpilation & validation.

PREPROCESED_BUCKET

A Google Cloud Storage bucket that will be used by bqms-run as a staging area for the translation process

SOURCE_SQL_PATH

The directory in your Github repository containing .sql files for translation and validation

TRANSLATED_BUCKET

A Google Cloud Storage bucket that will be used by bqms-run to store translated files before dumping them back into the local file system.

SQL_TO_VALIDATE

The local directory that bqms-run will use to store the results of the run.

TARGET_SQL_PATH

The directory within the origin Github repository to contain the translated and validated .sql files

CONFIG_BASE

The path to the base config directory, which hosts the config.yaml file and the object name mapping file.

CONFIG_YAML

The path to the dwh-migration-tools config.yaml file.

OBJECT_MAPPING

The path to the object name mapping configuration file.
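
Putting these together, a filled-in config.py might look like the sketch below. Every value is a placeholder, and the shapes of the repo settings are guesses:

```python
# config.py -- every value below is a placeholder; set your own before make run.
DWH_MIGRATION_TOOL_REPO = ("https://github.com/google/dwh-migration-tools.git", "main")
UC4_SQL_REPO = ("https://github.com/RealistAI/uc4_sql.git", "main")  # hypothetical
UC4_SQL_REPO_NAME = "uc4_sql"

BASE_PATH = "Teradata SQLs"

PROJECT = "my-gcp-project"
DATASET = "my_dataset"

PREPROCESED_BUCKET = "gs://my-bqms-preprocessed"  # bqms-run staging area
TRANSLATED_BUCKET = "gs://my-bqms-translated"     # bqms-run translated output

SOURCE_SQL_PATH = "Teradata SQLs/radd_master_directory"
TARGET_SQL_PATH = "BigQuery SQLs/simba_directory"
SQL_TO_VALIDATE = "sql_to_validate/"

CONFIG_BASE = "config/"
CONFIG_YAML = "config/config.yaml"
OBJECT_MAPPING = "config/object_name_mapping.json"
```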