This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

[AP-591] Use SHOW SCHEMAS|TABLES|COLUMNS instead of INFORMATION_SCHEMA #67

Merged
merged 1 commit into from
Mar 12, 2020

@koszti (Contributor) commented Mar 11, 2020

Summary
This PR refactors the code to use SHOW SCHEMAS|TABLES|COLUMNS queries instead of querying INFORMATION_SCHEMA. INFORMATION_SCHEMA queries are slow in general and can make taps running in parallel fail with snowflake.connector.errors.ProgrammingError: 090030 (22000): Information schema query returned too much data. Please repeat query with more selective predicates.
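The sketch below is illustrative and not the PR's actual code. It shows which SHOW statements can stand in for INFORMATION_SCHEMA lookups: one statement per database for schemas, and one statement per schema for tables or columns. The helper name `build_show_query` is hypothetical, and identifiers are assumed to be trusted, unquoted names.

```python
from typing import Optional


def build_show_query(kind: str, database: str, schema: Optional[str] = None) -> str:
    """Hypothetical helper: build the Snowflake SHOW statement for a metadata lookup.

    Identifiers are assumed to be trusted, unquoted names (no quoting or escaping).
    """
    kind = kind.upper()
    if kind == "SCHEMAS":
        # Schemas are listed for a whole database in one call
        return f"SHOW SCHEMAS IN DATABASE {database}"
    if kind in ("TABLES", "COLUMNS"):
        if schema is None:
            raise ValueError(f"SHOW {kind} needs a schema")
        # Tables and columns are listed for a whole schema in one call
        return f"SHOW {kind} IN SCHEMA {database}.{schema}"
    raise ValueError(f"unsupported SHOW kind: {kind}")
```

For example, `build_show_query("columns", "ANALYTICS", "PUBLIC")` yields `SHOW COLUMNS IN SCHEMA ANALYTICS.PUBLIC`. This single statement covers every table in the schema.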

This PR also removes the PIPELINEWISE.COLUMNS table, because the SHOW queries provide essentially the same information with good performance.
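The next sketch shows how SHOW COLUMNS output can replace a lookup table like PIPELINEWISE.COLUMNS: the rows are grouped into an in-memory map of table to columns. It assumes each row is a dict keyed by the lowercase column names that SHOW COLUMNS returns (`table_name`, `column_name`, `data_type`), with `data_type` being a JSON string. The function name `columns_by_table` is hypothetical and does not come from the PR.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def columns_by_table(rows: Iterable[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Hypothetical helper: group SHOW COLUMNS rows into {table: [(column, type), ...]}."""
    tables: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for row in rows:
        # data_type is assumed to be JSON text such as '{"type":"TEXT","length":255}'
        col_type = json.loads(row["data_type"])["type"]
        tables[row["table_name"]].append((row["column_name"], col_type))
    return dict(tables)


rows = [
    {"table_name": "ORDERS", "column_name": "ID", "data_type": '{"type":"FIXED","precision":38,"scale":0}'},
    {"table_name": "ORDERS", "column_name": "NOTE", "data_type": '{"type":"TEXT","length":16777216}'},
    {"table_name": "USERS", "column_name": "EMAIL", "data_type": '{"type":"TEXT","length":255}'},
]
print(columns_by_table(rows))
```

Because one SHOW COLUMNS call per schema returns every column, the map can be rebuilt cheaply on each run rather than stored in a separate table.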

Solution
After some research, it turns out this is a general problem with Snowflake's INFORMATION_SCHEMA, and other projects have run into similar issues, for example DBT:

Pros:

  • SHOW SCHEMAS|TABLES|COLUMNS will not queue, whereas the SELECT statement against INFORMATION_SCHEMA can queue
  • SHOW SCHEMAS|TABLES|COLUMNS does not require a running warehouse
  • SHOW SCHEMAS|TABLES|COLUMNS appears to be strictly faster than the information_schema alternative

Cons:

  • SHOW SCHEMAS|TABLES|COLUMNS returns at most 10k rows.

If a SHOW query returns more than 9999 records, an exception is raised. This is a limitation of SHOW COLUMNS, but 10k rows should be enough for hundreds of tables with an average number of columns. If 10k columns are not enough, we can still query INFORMATION_SCHEMA as a fallback, but that is out of scope for this PR.
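The guard described above can be sketched as follows. This is a minimal sketch that assumes a DB-API style cursor with `execute()` and `fetchall()`, such as one from snowflake.connector. `run_show_query` and `ShowQueryLimitError` are hypothetical names and are not the PR's actual code.

```python
from typing import Any, List, Sequence

# A SHOW result that reaches Snowflake's 10k row cap may be truncated,
# so anything above 9999 rows is treated as an error.
SHOW_ROW_LIMIT = 9999


class ShowQueryLimitError(Exception):
    """Hypothetical error raised when a SHOW result may be incomplete."""


def run_show_query(cursor: Any, query: str) -> List[Sequence[Any]]:
    """Run a SHOW statement and refuse results that may have hit the row cap."""
    cursor.execute(query)
    rows = cursor.fetchall()
    if len(rows) > SHOW_ROW_LIMIT:
        raise ShowQueryLimitError(
            f"{query!r} returned {len(rows)} rows; SHOW results are capped at 10k, "
            "so the output may be incomplete"
        )
    return rows
```

Failing loudly is safer than silently working with a truncated column list. A later fallback to INFORMATION_SCHEMA could catch `ShowQueryLimitError` at this single point.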
