Improve traceability of row validations when large number of partitions are validated #1276

sundar-mudupalli-work · 2024-09-20T17:03:59Z

Hi,

When generate-table-partitions generates YAML files with validations, it is very hard to trace validation output back to a specific YAML file and to a specific validation within that file, since one YAML file can contain validations for multiple partitions. We recommend using BigQuery for validation output and Cloud Run to run the validations. Within BigQuery, we can trace a validation result to a run id. However, we cannot go from a run id to a specific YAML file or Cloud Run task without either a) looking into the logs of each Cloud Run task or b) working out the YAML file from the primary keys reported in the validations.

I am suggesting two changes. The first is for generate-table-partitions: by default, add two labels to each generated validation, yaml-file (the YAML file name, e.g. 0004.yaml) and source-filter (the filter used on the source). The source-filter label is needed because one YAML file can contain multiple validations and each validation has its own run id. generate-table-partitions could take a --no-labels or -nl option for users who do not want any labels.
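To make the first suggestion concrete, here is a rough Python sketch of the label pairs each generated validation could carry. The function name `partition_labels`, the sample filters, and the file grouping are all hypothetical and only illustrate the idea; this is not how generate-table-partitions is implemented today.

```python
def partition_labels(yaml_file_name, source_filter):
    """Return the two proposed default labels as (key, value) pairs."""
    return [("yaml-file", yaml_file_name), ("source-filter", source_filter)]


# Hypothetical layout: one YAML file holding validations for two partitions.
partitions_by_file = {
    "0004.yaml": [
        "id >= 4000 AND id < 4500",
        "id >= 4500 AND id < 5000",
    ],
}

for yaml_file_name, source_filters in partitions_by_file.items():
    for source_filter in source_filters:
        # Each validation gets its own pair of labels, so a run id in the
        # results can be mapped back to both the file and the partition.
        print(partition_labels(yaml_file_name, source_filter))
```

With both labels stored alongside the run id, two validations from the same file remain distinguishable by their source-filter value.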

I am also suggesting a change to configs run: accept a --labels or -l parameter so the user can inject labels when the YAML file is run in Cloud Run, e.g. data-validation configs run -l task-exec-id="$CLOUD_RUN_EXECUTION",task-index="$CLOUD_RUN_TASK_INDEX" -cdir ...
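As a minimal sketch of how the proposed -l value could be turned into label pairs, assuming a comma-separated key=value format as in the example command: the helper `parse_labels` is hypothetical and not an existing DVT function. CLOUD_RUN_EXECUTION and CLOUD_RUN_TASK_INDEX are the environment variables Cloud Run sets for job tasks; the fallback values below are only there so the sketch runs outside Cloud Run.

```python
import os


def parse_labels(raw):
    """Parse 'k1=v1,k2=v2' into a list of (key, value) tuples."""
    labels = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Labels must be key=value pairs, got: {item!r}")
        labels.append((key.strip(), value.strip()))
    return labels


# Build the same string the shell would expand in the example command.
execution = os.environ.get("CLOUD_RUN_EXECUTION", "local-execution")
task_index = os.environ.get("CLOUD_RUN_TASK_INDEX", "0")
print(parse_labels(f"task-exec-id={execution},task-index={task_index}"))
```

One limitation of splitting on commas is that a label value containing a comma would be broken apart, so injected labels should stick to simple identifiers like the execution name and task index.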

Sundar Mudupalli

sundar-mudupalli-work added the "good first issue" label Sep 20, 2024

helensilva14 added the "type: feature request" and "priority: p2" labels Sep 20, 2024

luispavaogoogle self-assigned this Oct 4, 2024
