Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Initial materialization (with data) materializes the currently open time bucket #5379

Open
noctarius opened this issue Mar 2, 2023 · 1 comment

Comments

@noctarius
Copy link
Contributor

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Continuous aggregate

What happened?

When creating a new Continuous Aggregate with data (having the creation actually materialize the data straight away) the final watermark is incorrect. Meaning, the currently open time bucket is materialized which leads to weird, unexpected results when adding additional data into the currently open time bucket, and forces users to refresh explicitly (or wait for the next materialization task to come around). It is probably ok for materialize_only CAGGs, but is a bad user experience for real-time ones. We already had users wondering about that, but we finally found the reason.

A full reproducer (with data and with no data) is below to see the different behavior when creating with or without data and explicitly do the initial refresh. The latter one behaves correctly btw.

TimescaleDB version affected

2.9.3

PostgreSQL version used

14.7

What operating system did you use?

Cloud

What installation method did you use?

Not applicable

What platform did you run on?

Timescale Cloud

Relevant log output and stack trace

CAGG creation with data materialization:

tsdb=> create table test (
    ts timestamptz primary key,
    value int not null
);
CREATE TABLE

tsdb=> select create_hypertable('test', 'ts', chunk_time_interval := interval '1 day');
 create_hypertable
--------------------
 (33,public,test,t)
(1 row)

tsdb=> insert into test
select t.ts, random_between(10, 50) as value
from generate_series(
    now() - interval '14 days',
    now(),
    interval '1 minute'
) t(ts);
INSERT 0 20161

tsdb=> create materialized view test2
    with (timescaledb.continuous) as
    select
        time_bucket('1 hour', ts) as bucket,
        max(value) as value
    from test
    group by 1;
NOTICE:  refreshing continuous aggregate "test2"
HINT:  Use WITH NO DATA if you do not want to refresh the continuous aggregate on creation.
CREATE MATERIALIZED VIEW
tsdb=> select
    _timescaledb_internal.to_timestamp(
        _timescaledb_internal.cagg_watermark(mat_hypertable_id)
    )
from _timescaledb_catalog.continuous_agg
where user_view_name = 'test2';
      to_timestamp
------------------------
 2023-03-02 10:00:00+00
(1 row)

tsdb=> select time_bucket('1 hour', now());
      time_bucket
------------------------
 2023-03-02 09:00:00+00
(1 row)

tsdb=> select
    bucket,
    max(value)
from test2
group by 1
order by 1 desc
limit 5;
         bucket         | max
------------------------+-----
 2023-03-02 09:00:00+00 |  50
 2023-03-02 08:00:00+00 |  50
 2023-03-02 07:00:00+00 |  51
 2023-03-02 06:00:00+00 |  51
 2023-03-02 05:00:00+00 |  50
(5 rows)

tsdb=> insert into test (ts, value)
values (now(), 1000);
INSERT 0 1

tsdb=> select
    bucket,
    max(value)
from test2
group by 1
order by 1 desc
limit 5;
         bucket         | max
------------------------+-----
 2023-03-02 09:00:00+00 |  50
 2023-03-02 08:00:00+00 |  50
 2023-03-02 07:00:00+00 |  51
 2023-03-02 06:00:00+00 |  51
 2023-03-02 05:00:00+00 |  50
(5 rows)

And CAGG creation without immediate data materialization:

tsdb=> create table test (
    ts timestamptz primary key,
    value int not null
);
CREATE TABLE

tsdb=> select create_hypertable('test', 'ts', chunk_time_interval := interval '1 day');
 create_hypertable
--------------------
 (35,public,test,t)
(1 row)

tsdb=> insert into test
select t.ts, random_between(10, 50) as value
from generate_series(
    now() - interval '14 days',
    now(),
    interval '1 minute'
) t(ts);
INSERT 0 20161

tsdb=> create materialized view test2
    with (timescaledb.continuous) as
    select
        time_bucket('1 hour', ts) as bucket,
        max(value) as value
    from test
    group by 1
with no data;
CREATE MATERIALIZED VIEW

tsdb=> select
    _timescaledb_internal.to_timestamp(
        _timescaledb_internal.cagg_watermark(mat_hypertable_id)
    )
from _timescaledb_catalog.continuous_agg
where user_view_name = 'test2';
       to_timestamp
---------------------------
 4714-11-24 00:00:00+00 BC
(1 row)

tsdb=> select time_bucket('1 hour', now());
      time_bucket
------------------------
 2023-03-02 09:00:00+00
(1 row)

tsdb=> call refresh_continuous_aggregate('test2', '-infinity', now());
CALL

tsdb=> select
    _timescaledb_internal.to_timestamp(
        _timescaledb_internal.cagg_watermark(mat_hypertable_id)
    )
from _timescaledb_catalog.continuous_agg
where user_view_name = 'test2';
      to_timestamp
------------------------
 2023-03-02 09:00:00+00
(1 row)

tsdb=> select time_bucket('1 hour', now());
      time_bucket
------------------------
 2023-03-02 09:00:00+00
(1 row)

tsdb=> select
    bucket,
    max(value)
from test2
group by 1
order by 1 desc
limit 5;
         bucket         | max
------------------------+-----
 2023-03-02 09:00:00+00 |  50
 2023-03-02 08:00:00+00 |  50
 2023-03-02 07:00:00+00 |  51
 2023-03-02 06:00:00+00 |  51
 2023-03-02 05:00:00+00 |  50
(5 rows)

tsdb=> insert into test (ts, value)
values (now(), 1000);
INSERT 0 1

tsdb=> select
    bucket,
    max(value)
from test2
group by 1
order by 1 desc
limit 5;
         bucket         | max
------------------------+------
 2023-03-02 09:00:00+00 | 1000
 2023-03-02 08:00:00+00 |   50
 2023-03-02 07:00:00+00 |   51
 2023-03-02 06:00:00+00 |   51
 2023-03-02 05:00:00+00 |   50
(5 rows)


### How can we reproduce the bug?

```bash
Here's a full example of the statements that cause the bug to happen:


drop table test cascade;

create table test (
    ts timestamptz primary key,
    value int not null
);

select create_hypertable('test', 'ts', chunk_time_interval := interval '1 day');

insert into test
select t.ts, random_between(10, 50) as value
from generate_series(
    now() - interval '14 days',
    now(),
    interval '1 minute'
) t(ts);

create materialized view test2
    with (timescaledb.continuous) as
    select
        time_bucket('1 hour', ts) as bucket,
        max(value) as value
    from test
    group by 1;

select
    _timescaledb_internal.to_timestamp(
        _timescaledb_internal.cagg_watermark(mat_hypertable_id)
    )
from _timescaledb_catalog.continuous_agg
where user_view_name = 'test2';

select time_bucket('1 hour', now());

select
    bucket,
    max(value)
from test2
group by 1
order by 1 desc
limit 5;

insert into test (ts, value)
values (now(), 1000);

select
    bucket,
    max(value)
from test2
group by 1
order by 1 desc
limit 5;
@noctarius noctarius added the bug label Mar 2, 2023
@jnidzwetzki
Copy link
Contributor

Hello @noctarius,

Thank you very much for reporting the behavior and the detailed steps to reproduce the problem. I was able to reproduce it in my local installation with PostgreSQL 14 and TimescaleDB 2.10.0. The watermark was wrong and a manual refresh_continuous_aggregate was needed to update the CAGG after a new value was inserted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants