Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Real Time aggregation breaks on the month level #7326

Open
PS1TD opened this issue Oct 7, 2024 · 4 comments
Open

[Bug]: Real Time aggregation breaks on the month level #7326

PS1TD opened this issue Oct 7, 2024 · 4 comments
Labels

Comments

@PS1TD
Copy link

PS1TD commented Oct 7, 2024

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Continuous aggregate

What happened?

I am trying to set up a layered real time aggregation structure similar to how it is described in the docs.
Link
I take readings from multiple sensors every minute.
I need to aggregate them into time buckets of hour, day, week, month, and year.
Minute hypertable -> real time hourly
real time hourly -> real time daily
real time daily -> real time weekly
real time daily -> real time monthly
real time monthly -> real time yearly

On the initial creation all aggregations seem to do what they are supposed to.
I get all views properly populated with aggregated data.
However as soon as data changes only the hour, day and week aggregations get updated.
Month and Year aggregates do not behave like real time aggregates in this case.
Trying to manually refresh them does not help and creating an update policy does not help either.

TimescaleDB version affected

2.16.1

PostgreSQL version used

16

What operating system did you use?

Docker Container on Linux (Ubuntu)

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

I have provided all the steps need to reproduce such a situation on a fresh timescaledb-ha:pg16 container
Step 1:
Create Table with sensor readings

CREATE TABLE
    "SensorReadings" (
        "id" VARCHAR NOT NULL,
        "time" timestamptz NOT NULL,
        "temp" NUMERIC NOT NULL
    );

CREATE UNIQUE INDEX "SensorReadings_id_time_key" ON "SensorReadings" ("id", "time");

Step 2:
Fill it with data

INSERT INTO
    public."SensorReadings" ("id", "time", "temp")
VALUES
    (
        GENERATE_SERIES(1, 10),
        (NOW()::date - INTERVAL '3 minutes'),
        TRUNC(RANDOM() * 100)
    );

INSERT INTO
    public."SensorReadings" ("id", "time", "temp")
VALUES
    (
        GENERATE_SERIES(1, 10),
        (NOW()::date - INTERVAL '2 minutes'),
        TRUNC(RANDOM() * 100)
    );

Step 3:
Convert to hypertable

SELECT
    public.create_hypertable (
        '"SensorReadings"',
        public.by_range ('time', INTERVAL '7 days'),
        migrate_data => TRUE
    );

Step 4:
Create hourly real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsHourly"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 hour', "time") AS "time1h",
    "id",
    AVG("temp") AS "tempAvg",
    public.stats_agg ("temp") AS "tempAgg"
FROM
    "SensorReadings"
GROUP BY
    "time1h",
    "id"
ORDER BY
    "time1h" DESC;

Step 5:
Create daily real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsDaily"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 day', "time1h", 'Europe/Moscow') AS "time1d",
    "id",
    public.average (public.rollup ("tempAgg")) AS "tempAvg",
    public.rollup ("tempAgg") AS "tempAgg"
FROM
    "SensorReadingsHourly"
GROUP BY
    "time1d",
    "id"
ORDER BY
    "time1d" DESC;

Step 6:
Create weekly real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsWeekly"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 week', "time1d", 'Europe/Moscow') AS "time1w",
    "id",
    public.average (public.rollup ("tempAgg")) AS "tempAvg",
    public.rollup ("tempAgg") AS "tempAgg"
FROM
    "SensorReadingsDaily"
GROUP BY
    "time1w",
    "id"
ORDER BY
    "time1w" DESC;

Step 7:
Create monthly real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsMonthly"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 month', "time1d", 'Europe/Moscow') AS "time1m",
    "id",
    public.average (public.rollup ("tempAgg")) AS "tempAvg",
    public.rollup ("tempAgg") AS "tempAgg"
FROM
    "SensorReadingsDaily"
GROUP BY
    "time1m",
    "id"
ORDER BY
    "time1m" DESC;

Step 8:
Create yearly real time aggregate

CREATE MATERIALIZED VIEW
    "SensorReadingsYearly"
WITH
    (
        timescaledb.continuous,
        timescaledb.materialized_only = FALSE
    ) AS
SELECT
    public.time_bucket ('1 year', "time1m", 'Europe/Moscow') AS "time1y",
    "id",
    public.average (public.rollup ("tempAgg")) AS "tempAvg",
    public.rollup ("tempAgg") AS "tempAgg"
FROM
    "SensorReadingsMonthly"
GROUP BY
    "time1y",
    "id"
ORDER BY
    "time1y" DESC;

Step 9:
Add new data (Notice in my case 20 new sensors were added and the previous 10 recieved new data)

INSERT INTO
    public."SensorReadings" (id, TIME, TEMP)
VALUES
    (
        GENERATE_SERIES(1, 30),
        (NOW()::date),
        TRUNC(RANDOM() * 100)
    );

After all these steps SensorReadingsHourly, Daily & Weekly will have new sensor, Monthly and Yearly won't

Things I have tried to refresh the not working aggregates

Manually refresh aggregate. Returns that aggregate is already up to date.

CALL public.refresh_continuous_aggregate (
    '"SensorReadingsMonthly"',
    NULL,
    LOCALTIMESTAMP - INTERVAL '1 week'
);

Create short schedule refresh policies. They run but nothing changes.

SELECT
    add_continuous_aggregate_policy (
        '"SensorReadingsHourly"',
        start_offset => INTERVAL '1 week',
        end_offset => INTERVAL '1 hour',
        schedule_interval => INTERVAL '1 second'
    );

SELECT
    add_continuous_aggregate_policy (
        '"SensorReadingsMonthly"',
        start_offset => INTERVAL '3 months',
        end_offset => INTERVAL '1 month',
        schedule_interval => INTERVAL '1 minute'
    );

Re-enabling real time aggregation in case it did not apply first time

ALTER MATERIALIZED VIEW "SensorReadingsMonthly"
SET
    (timescaledb.materialized_only = FALSE);

Help Wanted

It might totally be a user error and I maybe have misunderstood something in the documentation but the setup seems reasonable. Any help with this case will be much appreciated <3

@PS1TD PS1TD added the bug label Oct 7, 2024
@PS1TD
Copy link
Author

PS1TD commented Oct 7, 2024

Just tried reproducing again but this time doing (NOW()::date - INTERVAL '1 minute') on time in Step 9 to keep all readings in 1 day and got even weirder results where none of the aggregates updated. Creating readings 1 month ahead however forces the Monthly aggregate to update.

@PS1TD
Copy link
Author

PS1TD commented Oct 7, 2024

Also would like to point out the the underlying SELECT queries do in fact return correct and up-to-date data, its just the view that is not updated

@PS1TD
Copy link
Author

PS1TD commented Oct 7, 2024

I think I have figured out what the problem was.
It has to do with how the watermark behaves when you create continuous aggregations WITH DATA.
Appending to WITH NO DATA has returned the behavior to expected.
I feel like this should be mentioned somewhere more visible than the troubleshooting section.
Linking similar issues
#5775
#5379

@PS1TD PS1TD closed this as completed Oct 7, 2024
@PS1TD
Copy link
Author

PS1TD commented Oct 7, 2024

Ill keep the issue open so that it is noticed, feel free to close it

@PS1TD PS1TD reopened this Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant