This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Performance problems with bridges with ephemeral events enabled #8903

Open
Half-Shot opened this issue Dec 9, 2020 · 6 comments
Labels
A-Application-Service Related to AS support S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution

Comments

@Half-Shot
Collaborator

A lot of these problems seem to happen at scale. This is what we saw when we restarted matrix.org with ephemeral events enabled for the OFTC bridge:

  • The memory usage of the appservice worker increased ~3.5x, presumably to handle caching of members in rooms ahead of time.
  • CPU churns during startup while handling notify_interested_services_ephemeral.
  • The database is hammered with calls to get_users_in_room and set_type_stream_id_for_appservice.
@Half-Shot
Collaborator Author

Half-Shot commented Dec 9, 2020

The logs are also full of this error:

2020-12-09 10:55:13,448 - synapse.metrics.background_process_metrics - 217 - ERROR - notify_interested_services_ephemeral-19 - Background process 'notify_interested_services_ephemeral' threw an exception

as well as this warning:

2020-12-09 11:45:32,310 - synapse.storage.txn - 517 - WARNING - notify_interested_services_ephemeral-135952 - [TXN OPERROR] {set_type_stream_id_for_appservice-61d89} could not serialize access due to concurrent update

@Half-Shot
Collaborator Author

Thought about this some more and we could save a lot of CPU by just adding a linearizer to notify_interested_services_ephemeral. I think I'll have a go at that.
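
To illustrate the idea, here is a minimal, self-contained sketch of what putting a linearizer in front of notify_interested_services_ephemeral could look like. It uses plain asyncio rather than Synapse's own helpers, and KeyedLinearizer, EphemeralNotifier, the stream_key/new_token parameters, and the "typing_key" value are all hypothetical names chosen for the example, not Synapse's actual code. The point is only that concurrent calls for the same key run one at a time, so they no longer race to update the same stream position.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, List


class KeyedLinearizer:
    """Hypothetical helper: at most one coroutine runs per key at a time."""

    def __init__(self) -> None:
        # One lock per key, created lazily on first use.
        self._locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    def queue(self, key: str) -> asyncio.Lock:
        # Callers wait their turn with `async with linearizer.queue(key):`.
        return self._locks[key]


class EphemeralNotifier:
    """Hypothetical stand-in for the handler that notifies appservices."""

    def __init__(self) -> None:
        self._linearizer = KeyedLinearizer()
        self.active = 0          # calls currently inside the critical section
        self.max_active = 0      # highest concurrency observed
        self.processed: List[int] = []

    async def notify_interested_services_ephemeral(
        self, stream_key: str, new_token: int
    ) -> None:
        # Serialize all work for the same stream key.
        async with self._linearizer.queue(stream_key):
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            # Placeholder for the real work (room membership lookups,
            # updating the appservice's stream position, and so on).
            await asyncio.sleep(0)
            self.processed.append(new_token)
            self.active -= 1


async def demo() -> EphemeralNotifier:
    notifier = EphemeralNotifier()
    # Fire five overlapping notifications for the same stream key.
    await asyncio.gather(
        *(
            notifier.notify_interested_services_ephemeral("typing_key", token)
            for token in range(1, 6)
        )
    )
    return notifier


if __name__ == "__main__":
    result = asyncio.run(demo())
    print("max concurrent calls:", result.max_active)
```

Keying the lock per stream (or per appservice) rather than using one global lock would keep unrelated streams from blocking each other; which key is right for Synapse is left open here.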

@Half-Shot Half-Shot added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. S-Minor Blocks non-critical functionality, workarounds exist. and removed A-Application-Service Related to AS support A-Performance Performance, both client-facing and admin-facing labels Feb 15, 2021
@Half-Shot
Collaborator Author

I've stuck some tags on this that I think are appropriate.

@kevinrademan

When I have this feature enabled, my app service worker logs get spammed with the following error:
"could not serialize access due to concurrent update"

It also results in a delay of several minutes before events are delivered to my app services.

@richvdh richvdh changed the title Ephemeral event enabled bridge performances problems Performance problems with bridges with ephemeral events enabled May 5, 2021
@anoadragon453
Member

anoadragon453 commented Aug 25, 2021

@kevinrademan Note that this error indicates that Synapse is spending more time waiting for database access. In its current state, this feature can therefore have a high impact on database resources.

@anoadragon453 anoadragon453 added the Z-Help-Wanted We know exactly how to fix this issue, and would be grateful for any contribution label Sep 16, 2021
@anoadragon453
Member

See also #10836.
