
Implement Decimal support for Postgres backend #10216

Merged: 47 commits into develop from wip/gmt/10213-read-pg, Jul 2, 2024

Conversation

GregoryTravis
Contributor

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the Scala, Java, TypeScript, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

@@ -29,7 +29,7 @@ make_aggregate_column : DB_Table -> Aggregate_Column -> Text -> Dialect -> (Any
 make_aggregate_column table aggregate as dialect infer_return_type problem_builder =
     is_non_empty_selector v = v.is_nothing.not && v.not_empty
     simple_aggregate op_kind columns =
-        expression = SQL_Expression.Operation op_kind (columns.map c->c.expression)
+        expression = dialect.cast_op_type op_kind columns (SQL_Expression.Operation op_kind (columns.map c->c.expression))
Contributor Author

I am adding the cast here, at the SQL level, instead of using DB_Column.cast. This seems like the right thing to me, because we really just wish that Postgres understood that a sum of integers is also an integer; since it doesn't, we're helping it out here.

And by doing it at this point, the regular type inference code will take care of the rest.
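
To illustrate what the cast works around (my own example, not from the PR) - Postgres reports the type of a SUM over int8 as numeric, even when every input is integral:

SELECT pg_typeof(SUM(x)) FROM (VALUES (1::int8), (2::int8)) AS t(x);
 pg_typeof
-----------
 numeric
(1 row)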

Member
@radeusgd radeusgd Jun 27, 2024

It seems like the right place to call into this hook.

I'm wondering if this is the right thing to do with the type change though.
While I understand we want to work with long and not BigInteger in the in-memory backend, in a remote Postgres Database I'm not sure if the big-integer arithmetic will be the bottleneck. It may be in some cases I guess...

The problem is - what if the result overflows? If the type of the Sum is the same as the type of the input, it is very easy to overflow it - just take two elements close to the maximum value representable in the input type and sum them.

I think that is exactly the reason why the Postgres DB has these rules for 'promoting' the aggregate type to a larger type - to be able to hold a sum. I imagine it still has its limitations - a sum of an int4 column is promoted to int8, so if we sum 2^32 values of max int4 we could get it to overflow - but I assume it is rather rare to sum that many values, so it's probably 'good enough'.

However as noted above - if we do not enlarge the type, we can get an overflow very easily with just 2 rows.

Probably the problem is that our columns are mostly int8 and so the only 'larger' type that we can promote to is the numeric (big-integer) type.

Maybe we should use the int4 type more if our data fits only 32-bits? Then the aggregate would become int8, still being a 'fast' number.
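
For reference, a quick check of those promotion rules in plain Postgres (my illustration, not part of the PR):

SELECT pg_typeof(SUM(a)) AS sum_of_int4, pg_typeof(SUM(b)) AS sum_of_int8
  FROM (VALUES (1::int4, 1::int8)) AS t(a, b);
 sum_of_int4 | sum_of_int8
-------------+-------------
 bigint      | numeric
(1 row)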

Member

If we still want to have this cast, we should at least ensure that we will not corrupt the data. We must add a test case where we have a column containing e.g. 2 rows with max int8 value, and add them together. Let's see what happens :)

I imagine if we get some kind of 'overflow' error - that will be OK. Ideally we should intercept this error, give the user a clear indication of what happened, and suggest that they can cast to a larger type (e.g. numeric) to get the result (paying the performance price).

I think if we got a value that is modulo-truncated - that would be unacceptable, as it would be silent data corruption. But looking at a simple example:

radeusgd=> SELECT pg_typeof(9223372036854775808);
 pg_typeof
-----------
 numeric
(1 row)

radeusgd=> SELECT CAST(9223372036854775808 AS int4);
ERROR:  integer out of range

I imagine we will likely just get an error, so it would be good to intercept it and handle it in a nice way for the user.
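
A quick sketch of what such a test would exercise in a plain Postgres session (my illustration, not the PR's test code):

-- SUM(int8) is promoted to numeric, so two max-int8 values sum without overflow:
SELECT SUM(x) FROM (VALUES (9223372036854775807::int8), (9223372036854775807::int8)) AS t(x);
-- returns 18446744073709551614

-- Forcing the result back down to int8 fails loudly rather than truncating silently:
SELECT CAST(SUM(x) AS int8) FROM (VALUES (9223372036854775807::int8), (9223372036854775807::int8)) AS t(x);
-- ERROR:  bigint out of range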

Member

Oops, I wrote this comment before reading Postgres_Dialect.enso, and I think I misunderstood the idea here.

Member

Okay, so do I understand correctly that the cast is needed to set the scale of the NUMERIC type to 0 so that it is interpreted as BigInteger and not BigDecimal?

I thought we were casting to int8, but apparently we are not; sorry for my confusing comments above...
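
To illustrate the distinction (my own example): a NUMERIC with an explicit scale of 0 can only hold integral values, so it can be mapped to BigInteger rather than BigDecimal:

SELECT CAST(12345.6 AS numeric);        -- keeps the fraction: 12345.6
SELECT CAST(12345.6 AS numeric(20, 0)); -- rounds to an integral value: 12346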

Member

Still, I guess it may be good to add such a test (summing 2 rows that contain MAX INT64 values). It will be a good test to verify that our implementation is correct and does not cause data corruption across the various DB backends (as I imagine how well the backends handle this may vary).

Contributor Author

Added this test.

@@ -130,6 +130,12 @@ type Redshift_Dialect
         _ = [approximate_result_type, infer_result_type_from_database_callback]
         column

+    ## PRIVATE
+       Add an extra cast to adjust the output type of certain operations with certain arguments.
Member

This seems very useful, I imagine I may want to use a hook like this in the Snowflake dialect as well to handle some weird edge cases.

I'd appreciate it if we could make this comment a bit more detailed, so it is clear how the hook is supposed to be used.

I'd like to know:

  • When is this method called? Is it when returning a result from any operation, or just some operations?
  • I imagine it's good to say that 'In most cases this method will just return the expression unchanged, it is used only to override the type in cases where the default one that the database uses is not what we want.'.

Contributor Author

Done

## PRIVATE
   Add an extra cast to adjust the output type of certain operations with certain arguments.
cast_op_type self (op_kind:Text) (args:(Vector Internal_Column)) (expression:SQL_Expression) =
    is_bigint ic = ic.sql_type_reference.get.typeid == Types.BIGINT
Member

Personally I prefer to call it int8, even though BIGINT is indeed Postgres's 'primary' name for this. It is just too easy to confuse it with Java's BigInteger (Postgres's NUMERIC).

Contributor Author

Done

Member
@radeusgd radeusgd left a comment

I like the idea of the cast_op_type hook. I'd just appreciate some more info about it, e.g. when it is called. Currently it only happens on aggregates, so if I want to use it in any other place, I'd have to remember to add the call to the hook. That is totally OK (no need to call the hook in yet-unused places), but let's make that clear.

I was a bit confused about how the type-change logic works, but if I understand correctly it seems all good.

@GregoryTravis
Contributor Author

Not ready to merge this yet -- I want to add many more tests checking for accidental Decimals.

@GregoryTravis GregoryTravis changed the title Read Decimal column from Postgres into in-memory table Implement Decimal type for Postgres backend Jun 28, 2024
@GregoryTravis GregoryTravis changed the title Implement Decimal type for Postgres backend Implement Decimal support for Postgres backend Jun 28, 2024
@GregoryTravis GregoryTravis marked this pull request as ready for review June 28, 2024 22:33
@GregoryTravis
Contributor Author

Ready to merge; return types verified.

@GregoryTravis GregoryTravis merged commit 48fb999 into develop Jul 2, 2024
36 checks passed
@GregoryTravis GregoryTravis deleted the wip/gmt/10213-read-pg branch July 2, 2024 19:01