Fixing Database tests and Snowflake Dialect - part 3 out of ... #10458

radeusgd · 2024-07-05T13:29:21Z

Pull Request Description

Related to Snowflake Dialect #9486
Fixes types in literal tables that are used throughout the tests
Tries to make testing faster by disabling some edge cases, batching some queries, re-using the main connection, and re-using tables more
Implements date/time type mapping and operations for Snowflake
Updates type mapping to correctly reflect what Snowflake does
- Disables warnings for Integer->Decimal coercion, as they are too noisy and the coercion is implicitly understood in Snowflake
- Allows selecting a Decimal column with ..By_Type ..Integer (only in the Snowflake backend), because the Decimal column there is the 'de-facto' replacement for an Integer column.

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

The documentation has been updated, if necessary.
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the Scala, Java, TypeScript, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
Unit tests have been written where possible.

radeusgd · 2024-07-05T13:39:22Z

std-bits/snowflake/src/main/java/org/enso/snowflake/SnowflakeJDBCUtils.java

+
+public class SnowflakeJDBCUtils {
+ private static final DateTimeFormatter dateTimeWithOffsetFormatter =
+ DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS XXX");


TODO: this is probably why I have the 'remove T' logic later. I should probably update this to:

Suggested change

- DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS XXX");
+ DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS XX");

or something similar.
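To make the difference between the two patterns concrete, here is a minimal sketch that formats one arbitrary sample timestamp with each of them. The class name `OffsetPatternDemo` and the sample value are assumptions made for illustration only. The sketch shows what `java.time` prints and does not claim which form Snowflake expects.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class OffsetPatternDemo {
    // Pattern currently in the diff: literal 'T' separator, offset printed with a colon.
    static final DateTimeFormatter WITH_T =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS XXX");

    // Pattern from the suggested change: space separator, offset printed without a colon.
    static final DateTimeFormatter WITHOUT_T =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS XX");

    // Arbitrary sample value, chosen only for illustration.
    static OffsetDateTime sample() {
        return OffsetDateTime.of(2024, 7, 5, 13, 29, 21, 123_456_789, ZoneOffset.ofHours(2));
    }

    public static void main(String[] args) {
        System.out.println(WITH_T.format(sample()));    // 2024-07-05T13:29:21.123456789 +02:00
        System.out.println(WITHOUT_T.format(sample())); // 2024-07-05 13:29:21.123456789 +0200
    }
}
```

With the second pattern there is no 'T' to strip afterwards, which is presumably the point of the TODO.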

jdunkerley

Can we check that we read in integers as Long columns?

distribution/lib/Standard/Database/0.0.0-dev/src/Internal/SQL_Type_Reference.enso

distribution/lib/Standard/Database/0.0.0-dev/src/SQL_Statement.enso

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Date_Time_Helpers.enso

jdunkerley · 2024-07-05T14:36:25Z

test/Table_Tests/src/Common_Table_Operations/Conversion_Spec.enso

@@ -408,11 +366,11 @@ add_specs suite_builder setup =
 t = table_builder [["X", [Nothing, 1, 2, 3000]], ["Y", [Nothing, True, False, True]]]

 c1 = t.at "X" . cast Value_Type.Char
- c1.value_type.is_text . should_be_true
+ c1.value_type . should_be_a (Value_Type.Char ...)


Why is this better than .is_text?

Because if the test fails, is_text.should_be_true gives us "expected False to be True" which tells me nothing.

The new approach tells me e.g. "expected ... built with constructor Char but got ... built with constructor Integer" - which is much more informative as to what went wrong.
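The same argument can be illustrated outside Enso. The following is a rough Java sketch, not the Enso test library: `Kind`, `checkTrue` and `checkKind` are hypothetical names, and the message wording only imitates the messages quoted above. It shows that a boolean check can only report the boolean, while a check on the value itself can report what was actually found.

```java
final class AssertionMessages {
    // Hypothetical stand-in for the constructor a value type was built with.
    enum Kind { CHAR, INTEGER, BOOLEAN }

    // Boolean-style check: by the time it fails, only True/False is known.
    // Returns null on success, otherwise the failure message.
    static String checkTrue(boolean actual) {
        return actual ? null : "expected False to be True";
    }

    // Value-style check: the failure message can name both expected and actual kinds.
    static String checkKind(Kind expected, Kind actual) {
        if (expected == actual) {
            return null;
        }
        return "expected a value built with constructor " + expected
            + " but got one built with constructor " + actual;
    }

    public static void main(String[] args) {
        Kind actual = Kind.INTEGER; // pretend the cast produced the wrong type
        System.out.println(checkTrue(actual == Kind.CHAR));
        System.out.println(checkKind(Kind.CHAR, actual));
    }
}
```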

distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Dialect.enso

GregoryTravis · 2024-07-05T17:47:11Z

test/Base_Tests/src/Data/Round_Spec.enso

+ values_vec.zip dps_vec v-> dp-> f v dp use_bankers
+ Batch_Runner.Value batch f
+
+ run self (use_bankers : Boolean) (action : Batch_Builder -> Nothing) =


Suggested change

- run self (use_bankers : Boolean) (action : Batch_Builder -> Nothing) =
+ run self (use_bankers : Boolean) (action : (Number -> Integer -> Check_Instance) -> Nothing) =

What is the advantage of batching the operations like this? How does it make it more efficient?

Snowflake runs in the cloud, and the minimum latency from my computer to the Frankfurt data center is ~15-20ms. The total Snowflake query overhead is usually about 70ms. Actually executing the computation takes much less than that, which is why e.g. the Postgres tests were much faster.

With batching we pay the roundtrip and overall query overhead only once per batch, so the total execution time is shorter.

You made me check the efficiency and ooops - there's basically no significant difference!

| Test | Without batching | With batching |
| --- | --- | --- |
| Can round positive decimals correctly | 1409ms | 1107ms |
| Can round negative decimals correctly | 1013ms | 2261ms |
| Explicit and implicit 0 decimal places work the same | 606ms | 395ms |
| Can round zero and small decimals correctly | 509ms | 1233ms |
| Can round positive decimals to a specified number of decimal places | 2011ms | 3930ms |
| Can round negative decimals to a specified number of decimal places | 1808ms | 3504ms |
| Can round positive decimals to a specified negative number of decimal places | 1656ms | 1368ms |
| Can round negative decimals to a specified negative number of decimal places | 1743ms | 949ms |

So the problem is - when I call round on a DB_Column, it checks its value_type, triggering a small no-results query that asks the DB for the type.

Thus, regardless of whether I run N rounds as separate queries or batch them in a single query, the type checks remain: I get N+1 queries with batching (N type checks and 1 more complex execute) vs 2*N queries without batching (type check + execute for each operation). Apparently, timing-wise the result is similar.
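As a back-of-the-envelope illustration of those query counts, here is a small sketch. It assumes a flat per-query overhead of 70ms (the figure quoted earlier in this thread) and ignores the actual computation time, so it is a simplification and not a measurement. The class and method names are made up for the example.

```java
final class QueryCountModel {
    // Approximate per-query overhead quoted in the thread; treated as a constant here.
    static final int OVERHEAD_MS = 70;

    // Without batching: one type check plus one execute per operation.
    static int queriesWithoutBatching(int n) {
        return 2 * n;
    }

    // With the original batching: still one type check per operation, plus one combined execute.
    static int queriesWithBatching(int n) {
        return n + 1;
    }

    // Overhead only; the cost of the computation itself is deliberately ignored.
    static int overheadMs(int queries) {
        return queries * OVERHEAD_MS;
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println(overheadMs(queriesWithoutBatching(n))); // 1400
        System.out.println(overheadMs(queriesWithBatching(n)));    // 770
    }
}
```

In both cases the overhead still grows linearly with N, because every operation keeps its own type-check roundtrip.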

I want to fix the batching and see if it gets better.

So in f9bbbe0 I have changed how we work with the SQL_Type_Reference - now if a statement containing a column with that type is read in, its type is cached into the reference - so that a separate query for reading the reference is avoided. We only perform a separate types-only query if the type is checked before querying.
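The idea can be sketched roughly as follows. This is not the Enso `SQL_Type_Reference` implementation: `CachedTypeReference`, `cacheFromResult`, and the `Supplier`-based fallback are hypothetical names used only to show the shape of "cache the type when a result is read, fall back to a types-only query otherwise".

```java
import java.util.function.Supplier;

final class CachedTypeReference {
    // Hypothetical fallback: stands in for the separate types-only query.
    private final Supplier<String> typeOnlyQuery;
    private String cached;
    // Counts how often the fallback query was actually needed.
    int fallbackQueries = 0;

    CachedTypeReference(Supplier<String> typeOnlyQuery) {
        this.typeOnlyQuery = typeOnlyQuery;
    }

    // Called when a statement containing this column has been read in,
    // so the type is already known from the result.
    void cacheFromResult(String type) {
        if (cached == null) {
            cached = type;
        }
    }

    // Returns the type, running the fallback query only if nothing is cached yet.
    String get() {
        if (cached == null) {
            fallbackQueries++;
            cached = typeOnlyQuery.get();
        }
        return cached;
    }
}
```

If the type is requested before any result has been read, the fallback runs once and later calls reuse the cached value.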

With that fix the results are much better:

| Test | Without batching (old measurement) | With fixed batching |
| --- | --- | --- |
| Can round positive decimals correctly | 1409ms | 673ms |
| Can round negative decimals correctly | 1013ms | 253ms |
| Explicit and implicit 0 decimal places work the same | 606ms | 207ms |
| Can round zero and small decimals correctly | 509ms | 165ms |
| Can round positive decimals to a specified number of decimal places | 2011ms | 291ms |
| Can round negative decimals to a specified number of decimal places | 1808ms | 371ms |
| Can round positive decimals to a specified negative number of decimal places | 1656ms | 419ms |
| Can round negative decimals to a specified negative number of decimal places | 1743ms | 423ms |

Now the speed-up is roughly 2-7x, so the batching is worth it. Combined with disabling some edge-case tests, the overall Rounding part is down from ~60s to ~4s.
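The per-test ratios behind that estimate can be recomputed from the "Without batching (old measurement)" and "With fixed batching" columns. The sketch below simply divides one column by the other; the class name is made up, and the numbers are copied from the measurements above.

```java
final class SpeedupTable {
    // Timings in milliseconds, in the same row order as the table above.
    static final int[] WITHOUT_BATCHING = {1409, 1013, 606, 509, 2011, 1808, 1656, 1743};
    static final int[] WITH_FIXED_BATCHING = {673, 253, 207, 165, 291, 371, 419, 423};

    // Speed-up per test: old time divided by new time.
    static double[] ratios() {
        double[] result = new double[WITHOUT_BATCHING.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = (double) WITHOUT_BATCHING[i] / WITH_FIXED_BATCHING[i];
        }
        return result;
    }

    static double min(double[] values) {
        double m = values[0];
        for (double v : values) m = Math.min(m, v);
        return m;
    }

    static double max(double[] values) {
        double m = values[0];
        for (double v : values) m = Math.max(m, v);
        return m;
    }

    public static void main(String[] args) {
        double[] r = ratios();
        System.out.printf("speed-up between %.1fx and %.1fx%n", min(r), max(r));
    }
}
```

The smallest ratio comes from the first row (1409ms vs 673ms) and the largest from the fifth (2011ms vs 291ms).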

# Conflicts:
#   distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Type_Mapping.enso
#   distribution/lib/Standard/Test/0.0.0-dev/src/Suite.enso
#   test/Table_Tests/src/Common_Table_Operations/Column_Operations_Spec.enso
#   test/Table_Tests/src/Common_Table_Operations/Map_To_Table_Spec.enso
#   test/Table_Tests/src/Common_Table_Operations/Util.enso

radeusgd added 30 commits July 5, 2024 13:18

notes

32ab547

fixing select columns tests

56638c8

update conflict after #10372 - use sort now instead of order_by in tests

7dd93c3

fix iif test

76d977d

test for #10402

036c012

update how we check value type for clearer errors

9a7e80f

fail hard if cannot connect to Snowflake instead of silently running 0 tests

7db1e29

better message in test suite

bb1c4bc

fix test

c9f6fcc

fixing COUNT DISTINCT and trying to fix FIRST

d4d14ce

disable first / last

722c2b2

fix COUNT DISTINCT ignoring NULLs

64d1c06

checkpoint

a13437e

optimize aggregate by sharing tables/connection - create a 2.5k rows table once instead of ~20 times!

4a45ab9

fixing shortest/longest

17e45dd

unused var

ec4050f

workaround for #10412

63b04d5

fix empty COUNT aggs

8c4e8a6

naming tests

ca11ce3

re-using same connection in tests

d271f3b

almost 50% faster Core_Spec by more tables sharing

be9aa99

sort tables in Core_Spec

b708e76

must be same connection

67bc315

more sharing

45e65ee

sharing in Date_Time_Spec

493897c

cross tab share connection

4ab02fd

support setting Date column

ed16ea9

wip

d2d03da

WIP

cf30d26

correctly escape regex characters in test name patterns - otherwise test names with parentheses did not work

3abd986

radeusgd added the "CI: No changelog needed" label (Do not require a changelog entry for this PR.) Jul 5, 2024

radeusgd self-assigned this Jul 5, 2024

radeusgd requested review from jdunkerley, GregoryTravis, AdRiley and marthasharkey as code owners July 5, 2024 13:29

radeusgd added 2 commits July 5, 2024 15:31

fix a few types

66be4ba

remove commented out code

edf512f

radeusgd commented Jul 5, 2024

jdunkerley approved these changes Jul 5, 2024

GregoryTravis reviewed Jul 5, 2024

radeusgd added 8 commits July 6, 2024 10:19

one more workaround for #10438

9ba1e41

Merge branch 'refs/heads/develop' into wip/radeusgd/snowflake-dialect-3

63c5111

fix

8dab4b8

workaround for #10465

81b2c82

Merge branch 'refs/heads/develop' into wip/radeusgd/snowflake-dialect-3

9f4fcad

cache types from the query to avoid re-fetching queries for each separate column

f9bbbe0

Merge branch 'refs/heads/develop' into wip/radeusgd/snowflake-dialect-3

f6cbe2e

fix secret test

cb679b8

GregoryTravis approved these changes Jul 8, 2024

radeusgd added 4 commits July 8, 2024 18:50

update spec to treat integer roundtrip accordingly

ea71e27

Merge branch 'refs/heads/develop' into wip/radeusgd/snowflake-dialect-3

49484f4

addressing CR comments

19c458f

radeusgd added the "CI: Ready to merge" label (This PR is eligible for automatic merge) Jul 10, 2024

radeusgd added 3 commits July 10, 2024 13:15

adapt after #10483

5f4f102

adapt after #10474

dd682e0

fix typo

f2cabe6

mergify bot merged commit 48c1784 into develop Jul 10, 2024
36 checks passed

mergify bot deleted the wip/radeusgd/snowflake-dialect-3 branch July 10, 2024 13:21
