-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing Database tests and Snowflake Dialect - part 3 out of ... #10458
Conversation
…table once instead of ~20 times!
…est names with parentheses did not work
|
||
public class SnowflakeJDBCUtils { | ||
private static final DateTimeFormatter dateTimeWithOffsetFormatter = | ||
DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS XXX"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: this is probably why I have the 'remove T' logic later. I probably should update this to
DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS XXX"); | |
DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS XX"); |
or sth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we check we read in integers as Long columns.
distribution/lib/Standard/Database/0.0.0-dev/src/Internal/SQL_Type_Reference.enso
Outdated
Show resolved
Hide resolved
distribution/lib/Standard/Database/0.0.0-dev/src/SQL_Statement.enso
Outdated
Show resolved
Hide resolved
distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Date_Time_Helpers.enso
Outdated
Show resolved
Hide resolved
@@ -408,11 +366,11 @@ add_specs suite_builder setup = | |||
t = table_builder [["X", [Nothing, 1, 2, 3000]], ["Y", [Nothing, True, False, True]]] | |||
|
|||
c1 = t.at "X" . cast Value_Type.Char | |||
c1.value_type.is_text . should_be_true | |||
c1.value_type . should_be_a (Value_Type.Char ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this better than .is_text
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if the test fails, is_text.should_be_true
gives us "expected False to be True" which tells me nothing.
The new approach tells me e.g. "expected ... built with constructor Char but got ... built with constructor Integer" - which is much more informative as to what went wrong.
distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Dialect.enso
Outdated
Show resolved
Hide resolved
values_vec.zip dps_vec v-> dp-> f v dp use_bankers | ||
Batch_Runner.Value batch f | ||
|
||
run self (use_bankers : Boolean) (action : Batch_Builder -> Nothing) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run self (use_bankers : Boolean) (action : Batch_Builder -> Nothing) = | |
run self (use_bankers : Boolean) (action : (Number -> Integer -> Check_Instance) -> Nothing) = |
What is the advantage of batching the operations like this? How does it make it more efficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Snowflake that is running in the cloud, the minimum latency from my computer to the Frankfurt data center is ~15-20ms. Usually the Snowflake query overhead totals about 70ms. Merely executing the computation is much less - that's why e.g. Postgres tests were much faster.
With batching we pay the roundtrip/overall query overhead only once per batch, so the execution time is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You made me check the efficiency and ooops - there's basically no significant difference!
Test | Without batching | With batching |
---|---|---|
Can round positive decimals correctly | 1409ms | 1107ms |
Can round negative decimals correctly | 1013ms | 2261ms |
Explicit and implicit 0 decimal places work the same | 606ms | 395ms |
Can round zero and small decimals correctly | 509ms | 1233ms |
Can round positive decimals to a specified number of decimal places | 2011ms | 3930ms |
Can round negative decimals to a specified number of decimal places | 1808ms | 3504ms |
Can round positive decimals to a specified negative number of decimal places | 1656ms | 1368ms |
Can round negative decimals to a specified negative number of decimal places | 1743ms | 949ms |
So the problem is - when I call round
on a DB_Column
, it checks its value_type
, triggering a small no-results query that asks the DB for the type.
Thus regardless if I batch run N rounds on separate queries or batch them in a single query, I still get: N+1 queries with batching (N type checks and 1 more complex execute) vs 2*N queries without batching (type check + execute for each operation). Apparently, timing wise the result is similar.
I want to fix the batching and see if it gets better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in f9bbbe0 I have changed how we work with the SQL_Type_Reference
- now if a statement containing a column with that type is read in, its type is cached into the reference - so that a separate query for reading the reference is avoided. We only perform a separate types-only query if the type is checked before querying.
With that fix the results are much better:
Test | Without batching (old measurement) | With fixed batching |
---|---|---|
Can round positive decimals correctly | 1409ms | 673ms |
Can round negative decimals correctly | 1013ms | 253ms |
Explicit and implicit 0 decimal places work the same | 606ms | 207ms |
Can round zero and small decimals correctly | 509ms | 165ms |
Can round positive decimals to a specified number of decimal places | 2011ms | 291ms |
Can round negative decimals to a specified number of decimal places | 1808ms | 371ms |
Can round positive decimals to a specified negative number of decimal places | 1656ms | 419ms |
Can round negative decimals to a specified negative number of decimal places | 1743ms | 423ms |
Now the speed-up is 2-6x so the batching is worth it. With also disabling of some edge case tests, the overall Rounding part is down from ~60s to 4s.
# Conflicts: # distribution/lib/Standard/Snowflake/0.0.0-dev/src/Internal/Snowflake_Type_Mapping.enso # distribution/lib/Standard/Test/0.0.0-dev/src/Suite.enso # test/Table_Tests/src/Common_Table_Operations/Column_Operations_Spec.enso # test/Table_Tests/src/Common_Table_Operations/Map_To_Table_Spec.enso # test/Table_Tests/src/Common_Table_Operations/Util.enso
Pull Request Description
..By_Type ..Integer
(only in Snowflake backend) because the Decimal column there is its 'de-facto' Integer column replacement.Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.