This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse does enormous expensive queries #11184

Closed
Heidistein opened this issue Oct 26, 2021 · 17 comments

@Heidistein

Description

I have upgraded my OS from Fedora 33 to 34; Synapse is always on the latest version via auto-updates. Since then, the complete server hangs after one or two minutes (postgres doing IO). 8 cores, 8 GB RAM. After four days it has not recovered: requests time out and the server is unusable.

Steps to reproduce

  • run a server for years
  • upgrade to Fedora 34 (I don't see how this is related)
  • no profit

Version information

matrix-synapse-1.45.1-1.fc34.5.noarch from obs.infoserver.lv

Running on a VM, 8 cores, 8GB ram.
When I ask postgres what the F it's doing, it's running the same query 10 times, with a hideously expensive sort:

synapse=# explain WITH RECURSIVE state(state_group) AS ( VALUES(285458::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Unique  (cost=37421005.02..38175030.12 rows=40000 width=104)
   CTE state
     ->  Recursive Union  (cost=0.00..59933.02 rows=126231 width=8)
           ->  Result  (cost=0.00..0.01 rows=1 width=8)
           ->  Hash Join  (cost=0.33..5740.84 rows=12623 width=8)
                 Hash Cond: (e.state_group = s.state_group)
                 ->  Seq Scan on state_group_edges e  (cost=0.00..4667.57 rows=252457 width=16)
                 ->  Hash  (cost=0.20..0.20 rows=10 width=8)
                       ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8)
   ->  Sort  (cost=37361072.00..37612413.70 rows=100536680 width=104)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         ->  Hash Join  (cost=2844.70..7503862.43 rows=100536680 width=104)
               Hash Cond: (state_groups_state.state_group = state.state_group)
               ->  Seq Scan on state_groups_state  (cost=0.00..5854729.60 rows=201073360 width=104)
               ->  Hash  (cost=2842.20..2842.20 rows=200 width=8)
                     ->  HashAggregate  (cost=2840.20..2842.20 rows=200 width=8)
                           Group Key: state.state_group
                           ->  CTE Scan on state  (cost=0.00..2524.62 rows=126231 width=8)
(18 rows)
@DMRobertson
Contributor

upgrade to Fedora 34 (I don't see how this is related)

Perhaps f34 ships a newer postgres version and there's some change we don't respond well to. Are you able to roll back to f33 to see if that remedies the problem?

@squahtx
Contributor

squahtx commented Oct 26, 2021

On my (admittedly very small) homeserver, I get a different query plan:

synapse=# EXPLAIN ANALYZE WITH RECURSIVE state(state_group) AS ( VALUES(285458::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=10747.39..10785.27 rows=5050 width=91) (actual time=0.675..0.679 rows=0 loops=1)
   CTE state
     ->  Recursive Union  (cost=0.00..795.03 rows=101 width=8) (actual time=0.005..0.520 rows=1 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=1)
           ->  Nested Loop  (cost=0.28..79.30 rows=10 width=8) (actual time=0.511..0.513 rows=0 loops=1)
                 ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.003 rows=1 loops=1)
                 ->  Index Scan using state_group_edges_idx on state_group_edges e  (cost=0.28..7.90 rows=1 width=16) (actual time=0.503..0.503 rows=0 loops=1)
                       Index Cond: (state_group = s.state_group)
   ->  Sort  (cost=9952.36..9964.99 rows=5050 width=91) (actual time=0.674..0.675 rows=0 loops=1)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=2.84..9641.73 rows=5050 width=91) (actual time=0.605..0.606 rows=0 loops=1)
               ->  HashAggregate  (cost=2.27..3.28 rows=101 width=8) (actual time=0.531..0.532 rows=1 loops=1)
                     Group Key: state.state_group
                     ->  CTE Scan on state  (cost=0.00..2.02 rows=101 width=8) (actual time=0.007..0.520 rows=1 loops=1)
               ->  Index Scan using state_groups_state_type_idx on state_groups_state  (cost=0.56..94.93 rows=50 width=91) (actual time=0.069..0.069 rows=0 loops=1)
                     Index Cond: (state_group = state.state_group)
 Planning Time: 0.431 ms
 Execution Time: 1.120 ms
(19 rows)

Could you verify that the state_groups_state_type_idx index exists in your database?
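One quick way to check (a sketch; `\d state_groups_state` in psql shows the same information):

```sql
-- List all indexes defined on state_groups_state
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'state_groups_state';
```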

@Heidistein
Author

Yes, that index exists. I must say, looking at your query, the cost of the sort is still significant; it's just not as high because there are fewer rows.
I am waiting for the tool rust-synapse-compress-state to complete. However, it has also already been running for 3 full days.

Indexes:
    "state_group_idx" btree (state_group)
    "state_groups_state_type_idx" btree (state_group, type, state_key)

@Heidistein
Author

Heidistein commented Oct 26, 2021

upgrade to Fedora 34 (I don't see how this is related)

Perhaps f34 ships a newer postgres version and there's some change we don't respond well to. Are you able to roll back to f33 to see if that remedies the problem?

Sorry, I cannot downgrade; I made no backup. The database has also been 'upgraded' to the new format.
It went from 12.7 to 13.4. However, I cannot imagine I am the only one running v13...

Oh, and I have reindexed twice by now.

edit: Yes, yes! I will import a database dump (SQL export) into an older version. Please allow me some time.

@squahtx
Contributor

squahtx commented Oct 26, 2021

Strange. Are you able to run an EXPLAIN ANALYZE to get the actual vs estimated number of rows going through the sort? 100,000,000 seems excessive.

@Heidistein
Author

Strange. Are you able to run an EXPLAIN ANALYZE to get the actual vs estimated number of rows going through the sort? 100,000,000 seems excessive.

It's running; expect it to take hours.

@Heidistein
Author

I exaggerated; it was 3 minutes:

synapse=# EXPLAIN ANALYZE WITH RECURSIVE state(state_group) AS ( VALUES(285458::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=37421005.02..38175030.12 rows=40000 width=104) (actual time=216192.973..216238.654 rows=53874 loops=1)
   CTE state
     ->  Recursive Union  (cost=0.00..59933.02 rows=126231 width=8) (actual time=0.011..502.764 rows=9 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.007..0.013 rows=1 loops=1)
           ->  Hash Join  (cost=0.33..5740.84 rows=12623 width=8) (actual time=55.618..55.839 rows=1 loops=9)
                 Hash Cond: (e.state_group = s.state_group)
                 ->  Seq Scan on state_group_edges e  (cost=0.00..4667.57 rows=252457 width=16) (actual time=0.011..28.668 rows=252457 loops=9)
                 ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.005..0.007 rows=1 loops=9)
                       Buckets: 1024  Batches: 1  Memory Usage: 9kB
                       ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.001 rows=1 loops=9)
   ->  Sort  (cost=37361072.00..37612413.70 rows=100536680 width=104) (actual time=216192.963..216223.953 rows=53876 loops=1)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         Sort Method: external merge  Disk: 5536kB
         ->  Hash Join  (cost=2844.70..7503862.43 rows=100536680 width=104) (actual time=91925.222..216053.961 rows=53876 loops=1)
               Hash Cond: (state_groups_state.state_group = state.state_group)
               ->  Seq Scan on state_groups_state  (cost=0.00..5854729.60 rows=201073360 width=104) (actual time=0.064..183731.189 rows=201073361 loops=1)
               ->  Hash  (cost=2842.20..2842.20 rows=200 width=8) (actual time=502.828..502.846 rows=9 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  HashAggregate  (cost=2840.20..2842.20 rows=200 width=8) (actual time=502.791..502.807 rows=9 loops=1)
                           Group Key: state.state_group
                           Batches: 1  Memory Usage: 40kB
                           ->  CTE Scan on state  (cost=0.00..2524.62 rows=126231 width=8) (actual time=0.019..502.723 rows=9 loops=1)
 Planning Time: 0.836 ms
 Execution Time: 216248.673 ms
(24 rows)

@richvdh
Member

richvdh commented Oct 26, 2021

see also #5064 with some notes from people who have previously had problems here. (See in particular the final comment which notes that someone had this problem due to having duplicate data in state_group_edges.)

Running this smaller query can help provide enlightenment:

WITH RECURSIVE state(state_group) AS ( VALUES(285458::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT count(*) FROM state;

(it should return a number below about 200).
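To check for the duplicate-edges failure mode mentioned above (based on the final comment of #5064), a query along these lines should return no rows on a healthy database (a sketch):

```sql
-- Any rows returned indicate duplicate edges in state_group_edges,
-- which multiply the rows produced by the recursive CTE
SELECT state_group, prev_state_group, count(*)
FROM state_group_edges
GROUP BY state_group, prev_state_group
HAVING count(*) > 1;
```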

@squahtx
Contributor

squahtx commented Oct 26, 2021

I suspect this may be a postgres issue rather than a Synapse issue. I'd recommend bringing it to #synapse:matrix.org if you're able to use Matrix.

For some reason your postgres instance is avoiding index scans, both in the recursive part and in the main query.
Could you check the following config options?

SHOW seq_page_cost;
SHOW random_page_cost;
SHOW enable_indexscan;

The statistics for state_groups_state also seem wildly off, namely the estimated number of rows:

Hash Join  (cost=2844.70..7503862.43 rows=100536680 width=104) (actual time=91925.222..216053.961 rows=53876 loops=1)

#5064, which rich mentioned, has a few comments about fixing table statistics that may be worth trying.
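If the statistics are stale, refreshing them manually is cheap to try (ANALYZE only samples the table; autovacuum normally keeps this up to date on its own):

```sql
-- Recompute planner statistics for the tables involved in this query
ANALYZE state_group_edges;
ANALYZE state_groups_state;
```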

@Heidistein
Author

 seq_page_cost
---------------
 1
(1 row)

 random_page_cost
------------------
 4
(1 row)

 enable_indexscan
------------------
 on
(1 row)

Doing only the ANALYZE (on state_groups_state) already gives a massive improvement. Analyzing all tables yields even better results:

----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2523.27..2561.15 rows=5050 width=90) (actual time=206.298..304.561 rows=53874 loops=1)
   CTE state
     ->  Recursive Union  (cost=0.00..848.78 rows=101 width=8) (actual time=0.019..34.414 rows=9 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=1)
           ->  Nested Loop  (cost=0.42..84.67 rows=10 width=8) (actual time=3.817..3.818 rows=1 loops=9)
                 ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.001 rows=1 loops=9)
                 ->  Index Scan using state_group_edges_idx on state_group_edges e  (cost=0.42..8.44 rows=1 width=16) (actual time=3.814..3.815 rows=1 loops=9)
                       Index Cond: (state_group = s.state_group)
   ->  Sort  (cost=1674.49..1687.12 rows=5050 width=90) (actual time=206.290..291.612 rows=53876 loops=1)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         Sort Method: external merge  Disk: 5536kB
         ->  Nested Loop  (cost=2.84..1363.86 rows=5050 width=90) (actual time=34.515..68.025 rows=53876 loops=1)
               ->  HashAggregate  (cost=2.27..3.28 rows=101 width=8) (actual time=34.474..34.486 rows=9 loops=1)
                     Group Key: state.state_group
                     Batches: 1  Memory Usage: 24kB
                     ->  CTE Scan on state  (cost=0.00..2.02 rows=101 width=8) (actual time=0.028..34.429 rows=9 loops=1)
               ->  Index Scan using state_group_idx on state_groups_state  (cost=0.57..12.97 rows=50 width=90) (actual time=0.014..2.266 rows=5986 loops=9)
                     Index Cond: (state_group = state.state_group)
 Planning Time: 3.208 ms
 Execution Time: 312.548 ms
(20 rows)

I will try to start matrix now, and if that works I can join the channel.

@squahtx
Contributor

squahtx commented Oct 26, 2021

Those config options look normal. It sounds like the issue was the table statistics?

I'll close the issue, since it sounds like the problem has been resolved and it's more to do with postgres than synapse.
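For anyone hitting this later: stale planner statistics can be confirmed by checking when (auto)analyze last ran on the large tables, a sketch using the standard pg_stat_user_tables view:

```sql
-- When were statistics last refreshed, and how many rows changed since?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('state_groups_state', 'state_group_edges');
```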

@squahtx squahtx closed this as completed Oct 26, 2021
@w3bb

w3bb commented Oct 31, 2021

Dealing with a similar problem: after upgrading today, there have been a ton of high-CPU SELECT queries from Synapse visible in htop.

@Heidistein
Author

Heidistein commented Oct 31, 2021 via email

@w3bb

w3bb commented Oct 31, 2021

I've done that already and it didn't help.

@w3bb

w3bb commented Oct 31, 2021

Okay, so the problem I'm seeing is during a sync. I realised that when my phone attempts to sync, that's when the CPU skyrockets. After doing eight million different combinations of reindexing, vacuuming, and analyzing it seems to be better, but now it just isn't syncing.

@w3bb

w3bb commented Oct 31, 2021

It was because I forgot I had turned the maximum workers down to a tiny number; at its normal setting it's at 100% CPU.

@Heidistein
Author

Heidistein commented Oct 31, 2021 via email
