There is no retry logic for sending EDUs to application services #11150
Comments
As a slight clarification, we - perhaps purposefully - don't wait until the transaction has been sent to the application service before updating the stream token for a given service's stream_key (e.g Instead, we should just keep track of failed requests and try them again in the background, backing off if necessary. I'm not sure what we want to do in the case of a certain range of stream tokens failing for >24 hours. Something equivalent to federation catchup I suspect.
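The "try them again in the background, backing off if necessary" idea can be illustrated with a minimal sketch. This is not Synapse code: `send_with_backoff`, its parameters, and the injected `sleep` are hypothetical names chosen for illustration, and a real implementation would run asynchronously rather than block.

```python
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `send` until it reports success or attempts run out.

    `send` is a hypothetical callable that returns True when the
    appservice accepted the transaction. The delay doubles after each
    failure and is capped at `max_delay`.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if send():
            return True
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    # The caller decides what to do with a request that never succeeded,
    # e.g. fall back to some form of catch-up.
    return False
```

What to do once `send_with_backoff` returns False (the ">24 hours" case) is left open here, as it is in the comment above.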
The current implementation of #11215 will only send 100 to-device messages at a time. Sending to-device messages to an AS is triggered by a new to-device message coming in. Thus, if there is a large backlog of to-device messages to send to the AS, the AS will only receive them gradually, as new to-device messages come in. We can speed this process up by triggering a new transaction to the AS whenever we hit this limit, though it is unclear whether this would overload the AS. This needs to be tried in practice, I think.
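One way to picture "trigger a new transaction if we hit the limit" is a drain loop like the sketch below. Everything here is an assumption made for illustration: `drain_backlog`, `fetch`, `send`, and the `max_batches` cap (added as a crude guard against overloading the AS) are hypothetical and do not correspond to actual Synapse functions.

```python
from typing import Callable, List, Tuple

Message = Tuple[int, str]  # (stream_id, payload), a simplified stand-in


def drain_backlog(
    fetch: Callable[[int, int], List[Message]],
    send: Callable[[List[Message]], None],
    from_token: int,
    limit: int = 100,
    max_batches: int = 10,
) -> int:
    """Send backlog messages in batches of at most `limit`.

    `fetch(token, limit)` is assumed to return messages with a stream id
    greater than `token`, in order. A full batch suggests more messages
    are waiting, so another transaction is sent straight away instead of
    waiting for the next incoming message. Returns the last token sent.
    """
    token = from_token
    for _ in range(max_batches):
        batch = fetch(token, limit)
        if not batch:
            break
        send(batch)
        token = batch[-1][0]
        if len(batch) < limit:
            # A short batch means the backlog has been drained.
            break
    return token
```

Whether a cap like `max_batches` is enough to avoid overloading the AS is exactly the open question raised above, and would need to be tested in practice.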
When we send ephemeral events to appservices, we keep track of a stream token per appservice per EDU type. We do this for read receipts and presence updates (but not for typing; those are a little too ephemeral to care about).
When a new read receipt comes through the server, we consider whether that should be sent to any connected application services. If so, we send those read receipts off. We then record the stream token of the read receipt for this appservice, even if we ended up determining that the appservice wasn't interested in the event (so that we wouldn't have to check that again later on).
This system works (although it has scalability concerns), but there is no retry logic. We record the updated stream token even if sending to the application service fails:
synapse/synapse/handlers/appservice.py
Lines 239 to 245 in e584534
submit_ephemeral_events_for_as kicks off a background task which logs and then ignores exceptions:
synapse/synapse/appservice/scheduler.py
Lines 156 to 159 in 4b965c8
The stream token is stored and updated for this appservice immediately after the background task kicks off. It makes sense to do this so that we do not end up with duplicated work while processing subsequent calls, even if sending to the appservice is slow. However, we still need a way of holding off on updating the stored stream id until a 200 is returned from the appservice. Only then should we mark the appservice as having successfully processed up to that stream token.
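A rough sketch of that split, assuming at most one transaction is in flight per appservice and EDU type: an in-memory "claimed" position avoids duplicated work, while the "acked" position (the one that would be persisted) only moves after a 200. The class and method names are hypothetical and are not part of Synapse.

```python
from typing import Optional, Tuple


class AppserviceStreamPosition:
    """Tracks progress for one appservice and one EDU type (illustrative)."""

    def __init__(self, acked: int = 0) -> None:
        self.acked = acked    # would be persisted; moves only on success
        self.claimed = acked  # in-memory only; moves when a send starts

    def claim(self, upto: int) -> Optional[Tuple[int, int]]:
        """Reserve the range (claimed, upto] for sending.

        Returns None if that range is already being handled, so that
        subsequent calls do not repeat the same work.
        """
        if upto <= self.claimed:
            return None
        start = self.claimed
        self.claimed = upto
        return (start, upto)

    def ack(self, upto: int) -> None:
        """Record that the appservice returned a 200 for everything up to `upto`."""
        self.acked = max(self.acked, upto)

    def fail(self) -> None:
        """Roll the claim back so the unacknowledged range is sent again."""
        self.claimed = self.acked
```

With this shape, a failed send leaves `acked` untouched, so the same range is claimed again on the next attempt. Handling several concurrent in-flight ranges would need more bookkeeping than shown here.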
This doesn't matter so much with read receipts or presence updates, but will become much more important when we start passing things like device lists over this channel. A blip in the network will lead to decryption errors down the line.
Note that this behaviour for the current set of supported EDUs is intentional:
synapse/synapse/storage/databases/main/appservice.py
Lines 192 to 200 in e584534
The stream tokens are stored in the application_services_state table, which has the schema: