-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Decimal support for Postgres backend #10216
Conversation
…ed builder, and do not use biginteger builder for integral bigdecimal values
@@ -29,7 +29,7 @@ make_aggregate_column : DB_Table -> Aggregate_Column -> Text -> Dialect -> (Any | |||
make_aggregate_column table aggregate as dialect infer_return_type problem_builder = | |||
is_non_empty_selector v = v.is_nothing.not && v.not_empty | |||
simple_aggregate op_kind columns = | |||
expression = SQL_Expression.Operation op_kind (columns.map c->c.expression) | |||
expression = dialect.cast_op_type op_kind columns (SQL_Expression.Operation op_kind (columns.map c->c.expression)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adding the cast here, at the SQL level, instead of using DB_Column.cast
. This seems like the right thing to me, because we really just with that Postgres understood that a sum of integers is also an integer, and since it doesn't, we're helping it out here.
And by doing it at this point, the regular type inference code will take care of the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the right place to call into this hook.
I'm wondering if this is the right thing to do with the type change though.
While I understand we want to work with long
and not BigInteger
in the in-memory backend, in a remote Postgres Database I'm not sure if the big-integer arithmetic will be the bottleneck. It may be in some cases I guess...
The problem is - what if the result overflows? If the type of the Sum is the same as the type of the input, it is very easy to overflow it - just take two elements close to the maximum value representable in the input type and sum them.
I think that is exactly the reason why the Postgres DB has these rules for 'promoting' the aggregate type to a larger type - to be able to hold a sum. I imagine it still has its limitations - a sum of int4
column is promoted to int8
so if we sum 2^32 values of max int4 we could get it to overflow - but I assume it is rather rare to sum that many values, so it's probably 'good enough'.
However as noted above - if we do not enlarge the type, we can get an overflow very easily with just 2 rows.
Probably the problem is that our columns are mostly int8
and so the only 'larger' type that we can promote to is the numeric
(big-integer) type.
Maybe we should use the int4
type more if our data fits only 32-bits? Then the aggregate would become int8
, still being a 'fast' number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we still want to have this cast, we should at least ensure that we will not corrupt the data. We must add a test case where we have a column containing e.g. 2 rows with max int8
value, and add them together. Let's see what happens :)
I imagine if we get some kind of 'overflow' error - that will be OK. Ideally we should intercept this error and give the user a clear indication what happened and suggest that they can cast to a larger type (e.g. numeric
) to get the result (paying the performance price).
I think if we'd get a value that is modulo-truncated - that would be unacceptable as it would be a silent data corruption. But looking at a simple example:
radeusgd=> SELECT pg_typeof(9223372036854775808);
pg_typeof
-----------
numeric
(1 row)
radeusgd=> SELECT CAST(9223372036854775808 AS int4);
ERROR: integer out of range
I imagine we will likely just get an error, so it would just be good to intercept it and handle in a nice way for the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooops, I have wrote this comment before reading Postgres_Dialect.enso
and I think I misunderstood the idea here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so do I understand correctly that the cast is needed to set the scale of the NUMERIC
type to 0
so that it is interpreted as BigInteger
and not BigDecimal
?
I thought that we are casting to int8
but apparently we are not, sorry for my confusing comments above...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, I guess it may be good to add such a test (Summing 2 rows that contain MAX INT64 values). It will be a good test verifying that our implementation is correct and does not cause data corruption in all kinds of DB backends (as I imagine how well the backends handle this may vary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this test.
@@ -130,6 +130,12 @@ type Redshift_Dialect | |||
_ = [approximate_result_type, infer_result_type_from_database_callback] | |||
column | |||
|
|||
## PRIVATE | |||
Add an extra cast to adjust the output type of certain operations with certain arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very useful, I imagine I may want to use a hook like this in the Snowflake dialect as well to handle some weird edge cases.
I'd appreciate if we can make this comment a bit more detailed to understand how it is supposed to be used.
I'd like to know:
- when is this method called: is it when returning a result from any operation? or just some operations?
- I imagine it's good to say that 'In most cases this method will just return the expression unchanged, it is used only to override the type in cases where the default one that the database uses is not what we want.'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
## PRIVATE | ||
Add an extra cast to adjust the output type of certain operations with certain arguments. | ||
cast_op_type self (op_kind:Text) (args:(Vector Internal_Column)) (expression:SQL_Expression) = | ||
is_bigint ic = ic.sql_type_reference.get.typeid == Types.BIGINT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally I prefer to call it int8
even though BIGINT
is indeed Postgres's 'primary' name for this. It is just easy to confuse it with Java's BigInteger
(Postgres's NUMERIC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of cast_op_type
hook. I'd just appreciate some more info about it - e.g. when it is called - as currently it only happens on aggregates, so if I want to use it in any other place, I'd have to remember add the call to the hook - which is totally OK (no need to call the hook in yet unused places) - but let's make that clear.
I was a bit confused about how the type-change logic works, but if I understand correctly it seems all good.
Not ready to merge this yet -- I want to add many more tests checking for accidental Decimals. |
Ready to merge; returns types verified. |
Pull Request Description
Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
TypeScript,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.