This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse does enormous expensive queries #11184

Closed
Heidistein opened this issue Oct 26, 2021 · 17 comments

@Heidistein

Description

I have upgraded my OS from Fedora 33 to 34; Synapse is always on the latest version via auto-updates. Since then, the complete server hangs after one or two minutes (postgres doing IO). 8 cores, 8 GB RAM. After four days it has not recovered: requests time out and the server is unusable.

Steps to reproduce

  • run a server for years
  • upgrade to Fedora 34 (I don't see how this is related)
  • no profit

Version information

matrix-synapse-1.45.1-1.fc34.5.noarch from obs.infoserver.lv

Running on a VM, 8 cores, 8GB ram.
When I ask postgres what the F it's doing, it's running the same query 10 times, with a hideously expensive sort:

synapse=# explain WITH RECURSIVE state(state_group) AS ( VALUES(285458::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Unique  (cost=37421005.02..38175030.12 rows=40000 width=104)
   CTE state
     ->  Recursive Union  (cost=0.00..59933.02 rows=126231 width=8)
           ->  Result  (cost=0.00..0.01 rows=1 width=8)
           ->  Hash Join  (cost=0.33..5740.84 rows=12623 width=8)
                 Hash Cond: (e.state_group = s.state_group)
                 ->  Seq Scan on state_group_edges e  (cost=0.00..4667.57 rows=252457 width=16)
                 ->  Hash  (cost=0.20..0.20 rows=10 width=8)
                       ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8)
   ->  Sort  (cost=37361072.00..37612413.70 rows=100536680 width=104)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         ->  Hash Join  (cost=2844.70..7503862.43 rows=100536680 width=104)
               Hash Cond: (state_groups_state.state_group = state.state_group)
               ->  Seq Scan on state_groups_state  (cost=0.00..5854729.60 rows=201073360 width=104)
               ->  Hash  (cost=2842.20..2842.20 rows=200 width=8)
                     ->  HashAggregate  (cost=2840.20..2842.20 rows=200 width=8)
                           Group Key: state.state_group
                           ->  CTE Scan on state  (cost=0.00..2524.62 rows=126231 width=8)
(18 rows)
@DMRobertson
Contributor

upgrade to Fedora 34 (I don't see how this is related)

Perhaps f34 ships a newer postgres version and there's some change we don't respond well to. Are you able to roll back to f33 to see if that remedies the problem?

@squahtx
Contributor

squahtx commented Oct 26, 2021

On my (admittedly very small) homeserver, I get a different query plan:

synapse=# EXPLAIN ANALYZE WITH RECURSIVE state(state_group) AS ( VALUES(285458::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=10747.39..10785.27 rows=5050 width=91) (actual time=0.675..0.679 rows=0 loops=1)
   CTE state
     ->  Recursive Union  (cost=0.00..795.03 rows=101 width=8) (actual time=0.005..0.520 rows=1 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.002..0.003 rows=1 loops=1)
           ->  Nested Loop  (cost=0.28..79.30 rows=10 width=8) (actual time=0.511..0.513 rows=0 loops=1)
                 ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.003 rows=1 loops=1)
                 ->  Index Scan using state_group_edges_idx on state_group_edges e  (cost=0.28..7.90 rows=1 width=16) (actual time=0.503..0.503 rows=0 loops=1)
                       Index Cond: (state_group = s.state_group)
   ->  Sort  (cost=9952.36..9964.99 rows=5050 width=91) (actual time=0.674..0.675 rows=0 loops=1)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=2.84..9641.73 rows=5050 width=91) (actual time=0.605..0.606 rows=0 loops=1)
               ->  HashAggregate  (cost=2.27..3.28 rows=101 width=8) (actual time=0.531..0.532 rows=1 loops=1)
                     Group Key: state.state_group
                     ->  CTE Scan on state  (cost=0.00..2.02 rows=101 width=8) (actual time=0.007..0.520 rows=1 loops=1)
               ->  Index Scan using state_groups_state_type_idx on state_groups_state  (cost=0.56..94.93 rows=50 width=91) (actual time=0.069..0.069 rows=0 loops=1)
                     Index Cond: (state_group = state.state_group)
 Planning Time: 0.431 ms
 Execution Time: 1.120 ms
(19 rows)

Could you verify that the state_groups_state_type_idx index exists in your database?
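One quick way to check (a sketch; `\d state_groups_state` in psql shows the same information):

```sql
-- List all indexes defined on state_groups_state
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'state_groups_state';
```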

@Heidistein
Author

Yes, that index exists. I must say, looking at your query, the cost of the sort is still significant; it's just not as high because there are fewer rows.
I am waiting for the tool rust-synapse-compress-state to complete. However, it has also already been running for 3 full days.

Indexes:
    "state_group_idx" btree (state_group)
    "state_groups_state_type_idx" btree (state_group, type, state_key)

@Heidistein
Author

Heidistein commented Oct 26, 2021

upgrade to Fedora 34 (I don't see how this is related)

Perhaps f34 ships a newer postgres version and there's some change we don't respond well to. Are you able to roll back to f33 to see if that remedies the problem?

Sorry, I cannot downgrade; I made no backup. The database has also been 'upgraded' to the new format.
It went from 12.7 to 13.4. However, I cannot imagine I am the only one running v13...

Oh, and I have reindexed twice by now.

edit: Yes, yes! I will import a database dump (SQL export) into an older version. Please allow me some time.

@squahtx
Contributor

squahtx commented Oct 26, 2021

Strange. Are you able to run an EXPLAIN ANALYZE to get the actual vs estimated number of rows going through the sort? 100,000,000 seems excessive.

@Heidistein
Author

Strange. Are you able to run an EXPLAIN ANALYZE to get the actual vs estimated number of rows going through the sort? 100,000,000 seems excessive.

It's running; expect it to take hours.

@Heidistein
Author

I exaggerated; it was 3 minutes:

synapse=# EXPLAIN ANALYZE WITH RECURSIVE state(state_group) AS ( VALUES(285458::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT DISTINCT ON (type, state_key) type, state_key, event_id FROM state_groups_state WHERE state_group IN ( SELECT state_group FROM state ) ORDER BY type, state_key, state_group DESC;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=37421005.02..38175030.12 rows=40000 width=104) (actual time=216192.973..216238.654 rows=53874 loops=1)
   CTE state
     ->  Recursive Union  (cost=0.00..59933.02 rows=126231 width=8) (actual time=0.011..502.764 rows=9 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.007..0.013 rows=1 loops=1)
           ->  Hash Join  (cost=0.33..5740.84 rows=12623 width=8) (actual time=55.618..55.839 rows=1 loops=9)
                 Hash Cond: (e.state_group = s.state_group)
                 ->  Seq Scan on state_group_edges e  (cost=0.00..4667.57 rows=252457 width=16) (actual time=0.011..28.668 rows=252457 loops=9)
                 ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.005..0.007 rows=1 loops=9)
                       Buckets: 1024  Batches: 1  Memory Usage: 9kB
                       ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.001 rows=1 loops=9)
   ->  Sort  (cost=37361072.00..37612413.70 rows=100536680 width=104) (actual time=216192.963..216223.953 rows=53876 loops=1)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         Sort Method: external merge  Disk: 5536kB
         ->  Hash Join  (cost=2844.70..7503862.43 rows=100536680 width=104) (actual time=91925.222..216053.961 rows=53876 loops=1)
               Hash Cond: (state_groups_state.state_group = state.state_group)
               ->  Seq Scan on state_groups_state  (cost=0.00..5854729.60 rows=201073360 width=104) (actual time=0.064..183731.189 rows=201073361 loops=1)
               ->  Hash  (cost=2842.20..2842.20 rows=200 width=8) (actual time=502.828..502.846 rows=9 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  HashAggregate  (cost=2840.20..2842.20 rows=200 width=8) (actual time=502.791..502.807 rows=9 loops=1)
                           Group Key: state.state_group
                           Batches: 1  Memory Usage: 40kB
                           ->  CTE Scan on state  (cost=0.00..2524.62 rows=126231 width=8) (actual time=0.019..502.723 rows=9 loops=1)
 Planning Time: 0.836 ms
 Execution Time: 216248.673 ms
(24 rows)

@richvdh
Member

richvdh commented Oct 26, 2021

see also #5064 with some notes from people who have previously had problems here. (See in particular the final comment which notes that someone had this problem due to having duplicate data in state_group_edges.)

Running this smaller query can help provide enlightenment:

WITH RECURSIVE state(state_group) AS ( VALUES(285458::bigint) UNION ALL SELECT prev_state_group FROM state_group_edges e, state s WHERE s.state_group = e.state_group ) SELECT count(*) FROM state;

(it should return a number below about 200).
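To check for the duplicate-edges failure mode mentioned above (based on the final comment of #5064), a query along these lines should return no rows on a healthy database (a sketch):

```sql
-- Any rows returned indicate duplicate edges in state_group_edges,
-- which multiply the rows produced by the recursive CTE
SELECT state_group, prev_state_group, count(*)
FROM state_group_edges
GROUP BY state_group, prev_state_group
HAVING count(*) > 1;
```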

@squahtx
Contributor

squahtx commented Oct 26, 2021

I suspect this may be a postgres issue rather than a Synapse issue. I'd recommend bringing it to #synapse:matrix.org if you're able to use Matrix.

For some reason your postgres instance is avoiding index scans, both in the recursive part and in the main query.
Could you check the following config options?

SHOW seq_page_cost;
SHOW random_page_cost;
SHOW enable_indexscan;

The statistics for state_groups_state also seem wildly off, namely the estimated number of rows:

Hash Join  (cost=2844.70..7503862.43 rows=100536680 width=104) (actual time=91925.222..216053.961 rows=53876 loops=1)

#5064, which rich mentioned, has a few comments about fixing table statistics that may be worth trying.
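If the statistics are stale, refreshing them manually is cheap to try (ANALYZE only samples the table; autovacuum normally keeps this up to date on its own):

```sql
-- Recompute planner statistics for the tables involved in this query
ANALYZE state_group_edges;
ANALYZE state_groups_state;
```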

@Heidistein
Author

 seq_page_cost
---------------
 1
(1 row)

 random_page_cost
------------------
 4
(1 row)

 enable_indexscan
------------------
 on
(1 row)

Doing only the ANALYZE (on state_groups_state) already gives a massive improvement. Analyzing all tables yields even better results:

----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2523.27..2561.15 rows=5050 width=90) (actual time=206.298..304.561 rows=53874 loops=1)
   CTE state
     ->  Recursive Union  (cost=0.00..848.78 rows=101 width=8) (actual time=0.019..34.414 rows=9 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=1)
           ->  Nested Loop  (cost=0.42..84.67 rows=10 width=8) (actual time=3.817..3.818 rows=1 loops=9)
                 ->  WorkTable Scan on state s  (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.001 rows=1 loops=9)
                 ->  Index Scan using state_group_edges_idx on state_group_edges e  (cost=0.42..8.44 rows=1 width=16) (actual time=3.814..3.815 rows=1 loops=9)
                       Index Cond: (state_group = s.state_group)
   ->  Sort  (cost=1674.49..1687.12 rows=5050 width=90) (actual time=206.290..291.612 rows=53876 loops=1)
         Sort Key: state_groups_state.type, state_groups_state.state_key, state_groups_state.state_group DESC
         Sort Method: external merge  Disk: 5536kB
         ->  Nested Loop  (cost=2.84..1363.86 rows=5050 width=90) (actual time=34.515..68.025 rows=53876 loops=1)
               ->  HashAggregate  (cost=2.27..3.28 rows=101 width=8) (actual time=34.474..34.486 rows=9 loops=1)
                     Group Key: state.state_group
                     Batches: 1  Memory Usage: 24kB
                     ->  CTE Scan on state  (cost=0.00..2.02 rows=101 width=8) (actual time=0.028..34.429 rows=9 loops=1)
               ->  Index Scan using state_group_idx on state_groups_state  (cost=0.57..12.97 rows=50 width=90) (actual time=0.014..2.266 rows=5986 loops=9)
                     Index Cond: (state_group = state.state_group)
 Planning Time: 3.208 ms
 Execution Time: 312.548 ms
(20 rows)

I will try to start matrix now, and if that works I can join the channel.

@squahtx
Contributor

squahtx commented Oct 26, 2021

Those config options look normal. It sounds like the issue was the table statistics?

I'll close the issue, since it sounds like the problem has been resolved and it's more to do with postgres than synapse.
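For anyone hitting this later: stale planner statistics can be confirmed by checking when (auto)analyze last ran on the large tables, a sketch using the standard pg_stat_user_tables view:

```sql
-- When were statistics last refreshed, and how many rows changed since?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('state_groups_state', 'state_group_edges');
```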

@squahtx squahtx closed this as completed Oct 26, 2021
@w3bb

w3bb commented Oct 31, 2021

Dealing with a similar problem: after upgrading today, there have been a ton of high-CPU SELECT queries from Synapse visible in htop.

@Heidistein
Author

Heidistein commented Oct 31, 2021 via email

@w3bb

w3bb commented Oct 31, 2021

I've done that already and it didn't help.

@w3bb

w3bb commented Oct 31, 2021

Okay, so the problem I'm seeing is during a sync. I realised that when my phone attempts to sync, that's when the CPU skyrockets. After doing eight million different combinations of reindexing, vacuuming, and analyzing it seems to be better, but now it just isn't syncing.

@w3bb

w3bb commented Oct 31, 2021

It was because I forgot I had turned the maximum workers down to a tiny number; at its normal setting it's at 100% CPU.

@Heidistein
Author

Heidistein commented Oct 31, 2021 via email
