You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When creating a new Continuous Aggregate with data (having the creation actually materialize the data straight away) the final watermark is incorrect. Meaning, the currently open time bucket is materialized which leads to weird, unexpected results when adding additional data into the currently open time bucket, and forces users to refresh explicitly (or wait for the next materialization task to come around). It is probably ok for materialize_only CAGGs, but is a bad user experience for real-time ones. We already had users wondering about that, but we finally found the reason.
A full reproducer (with data and with no data) is below to see the different behavior when creating with or without data and explicitly do the initial refresh. The latter one behaves correctly btw.
TimescaleDB version affected
2.9.3
PostgreSQL version used
14.7
What operating system did you use?
Cloud
What installation method did you use?
Not applicable
What platform did you run on?
Timescale Cloud
Relevant log output and stack trace
CAGG creation with data materialization:
tsdb=> create table test (
ts timestamptz primary key,
value int not null
);
CREATE TABLE
tsdb=>selectcreate_hypertable('test','ts', chunk_time_interval := interval '1 day');
create_hypertable
--------------------
(33,public,test,t)
(1 row)
tsdb=> insert into testselectt.ts, random_between(10, 50) as value
from generate_series(
now() - interval '14 days',
now(),
interval '1 minute'
) t(ts);
INSERT 0 20161
tsdb=> create materialized view test2
with (timescaledb.continuous) as
select
time_bucket('1 hour', ts) as bucket,
max(value) as value
from test
group by 1;
NOTICE: refreshing continuous aggregate "test2"
HINT: Use WITH NO DATA if you do not want to refresh the continuous aggregate on creation.
CREATE MATERIALIZED VIEW
tsdb=>select
_timescaledb_internal.to_timestamp(
_timescaledb_internal.cagg_watermark(mat_hypertable_id)
)
from _timescaledb_catalog.continuous_agg
where user_view_name = 'test2';
to_timestamp
------------------------
2023-03-02 10:00:00+00
(1 row)
tsdb=>selecttime_bucket('1 hour', now()); time_bucket------------------------ 2023-03-02 09:00:00+00(1 row)tsdb=> select bucket, max(value)from test2group by 1order by 1 desclimit 5; bucket | max------------------------+----- 2023-03-02 09:00:00+00 | 50 2023-03-02 08:00:00+00 | 50 2023-03-02 07:00:00+00 | 51 2023-03-02 06:00:00+00 | 51 2023-03-02 05:00:00+00 | 50(5 rows)tsdb=> insert into test (ts, value)values (now(), 1000);INSERT 0 1tsdb=> select bucket, max(value)from test2group by 1order by 1 desclimit 5; bucket | max------------------------+----- 2023-03-02 09:00:00+00 | 50 2023-03-02 08:00:00+00 | 50 2023-03-02 07:00:00+00 | 51 2023-03-02 06:00:00+00 | 51 2023-03-02 05:00:00+00 | 50(5 rows)
And CAGG creation without immediate data materialization:
tsdb=> create table test (
ts timestamptz primary key,
value int not null
);
CREATE TABLE
tsdb=> select create_hypertable('test', 'ts', chunk_time_interval := interval '1 day');
create_hypertable
--------------------
(35,public,test,t)
(1 row)
tsdb=> insert into test
select t.ts, random_between(10, 50) as value
from generate_series(
now() - interval '14 days',
now(),
interval '1 minute'
) t(ts);
INSERT 0 20161
tsdb=> create materialized view test2
with (timescaledb.continuous) as
select
time_bucket('1 hour', ts) as bucket,
max(value) as value
from test
group by 1
with no data;
CREATE MATERIALIZED VIEW
tsdb=> select
_timescaledb_internal.to_timestamp(
_timescaledb_internal.cagg_watermark(mat_hypertable_id)
)
from _timescaledb_catalog.continuous_agg
where user_view_name = 'test2';
to_timestamp
---------------------------
4714-11-24 00:00:00+00 BC
(1 row)
tsdb=> select time_bucket('1 hour', now());
time_bucket
------------------------
2023-03-02 09:00:00+00
(1 row)
tsdb=> call refresh_continuous_aggregate('test2', '-infinity', now());
CALL
tsdb=> select
_timescaledb_internal.to_timestamp(
_timescaledb_internal.cagg_watermark(mat_hypertable_id)
)
from _timescaledb_catalog.continuous_agg
where user_view_name = 'test2';
to_timestamp
------------------------
2023-03-02 09:00:00+00
(1 row)
tsdb=> select time_bucket('1 hour', now());
time_bucket
------------------------
2023-03-02 09:00:00+00
(1 row)
tsdb=> select
bucket,
max(value)
from test2
group by 1
order by 1 desc
limit 5;
bucket | max
------------------------+-----
2023-03-02 09:00:00+00 | 50
2023-03-02 08:00:00+00 | 50
2023-03-02 07:00:00+00 | 51
2023-03-02 06:00:00+00 | 51
2023-03-02 05:00:00+00 | 50
(5 rows)
tsdb=> insert into test (ts, value)
values (now(), 1000);
INSERT 0 1
tsdb=> select
bucket,
max(value)
from test2
group by 1
order by 1 desc
limit 5;
bucket | max
------------------------+------
2023-03-02 09:00:00+00 | 1000
2023-03-02 08:00:00+00 | 50
2023-03-02 07:00:00+00 | 51
2023-03-02 06:00:00+00 | 51
2023-03-02 05:00:00+00 | 50
(5 rows)
### How can we reproduce the bug?
```bash
Here's a full example of the statements that cause the bug to happen:
drop table test cascade;
create table test (
ts timestamptz primary key,
value int not null
);
select create_hypertable('test', 'ts', chunk_time_interval := interval '1 day');
insert into test
select t.ts, random_between(10, 50) as value
from generate_series(
now() - interval '14 days',
now(),
interval '1 minute'
) t(ts);
create materialized view test2
with (timescaledb.continuous) as
select
time_bucket('1 hour', ts) as bucket,
max(value) as value
from test
group by 1;
select
_timescaledb_internal.to_timestamp(
_timescaledb_internal.cagg_watermark(mat_hypertable_id)
)
from _timescaledb_catalog.continuous_agg
where user_view_name = 'test2';
select time_bucket('1 hour', now());
select
bucket,
max(value)
from test2
group by 1
order by 1 desc
limit 5;
insert into test (ts, value)
values (now(), 1000);
select
bucket,
max(value)
from test2
group by 1
order by 1 desc
limit 5;
The text was updated successfully, but these errors were encountered:
Thank you very much for reporting the behavior and the detailed steps to reproduce the problem. I was able to reproduce it in my local installation with PostgreSQL 14 and TimescaleDB 2.10.0. The watermark was wrong and a manual refresh_continuous_aggregate was needed to update the CAGG after a new value was inserted.
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Continuous aggregate
What happened?
When creating a new Continuous Aggregate with data (having the creation actually materialize the data straight away) the final watermark is incorrect. Meaning, the currently open time bucket is materialized which leads to weird, unexpected results when adding additional data into the currently open time bucket, and forces users to refresh explicitly (or wait for the next materialization task to come around). It is probably ok for materialize_only CAGGs, but is a bad user experience for real-time ones. We already had users wondering about that, but we finally found the reason.
A full reproducer (
with data
andwith no data
) is below to see the different behavior when creating with or without data and explicitly do the initial refresh. The latter one behaves correctly btw.TimescaleDB version affected
2.9.3
PostgreSQL version used
14.7
What operating system did you use?
Cloud
What installation method did you use?
Not applicable
What platform did you run on?
Timescale Cloud
Relevant log output and stack trace
And CAGG creation without immediate data materialization:
The text was updated successfully, but these errors were encountered: