Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document rendezvous server error handling #15

Open
piegamesde opened this issue Aug 5, 2021 · 8 comments
Open

Document rendezvous server error handling #15

piegamesde opened this issue Aug 5, 2021 · 8 comments

Comments

@piegamesde
Copy link
Member

piegamesde commented Aug 5, 2021

At the moment, this part is heavily under-documented in the specification, only giving the following information:

  • * S->C error {error: str, orig:}
  • The error field in the welcome message

I suggest having a look at our reference server implementation (https:/magic-wormhole/magic-wormhole-mailbox-server) and then write down some details that are compatible with it. Open questions at the moment:

  • When does the server send an error message? What error strings are common?
  • Is the orig field always present?
  • Is the error message always sent in direct response to a client request?
  • Does the server close the WebSocket connection after an error occurred, or does the show go on?
  • When does the server use the error field on the welcome message, and which values are common?
@meejah
Copy link
Member

meejah commented Aug 5, 2021

When does the server use the error field on the welcome message, and which values are common?

Whenever the program is started with --signal-error and the value will be "whatever the operator typed there".

@piegamesde
Copy link
Member Author

I did a quick code dive and found some answers too:

  • All error messages I could find are some kind of protocol errors. I don't know how IO errors etc. are handled.
  • At the moment, an orig field is always present, and contains the full original message.
  • Error messages are always raised during the processing and in direct response to a client message
  • The input handling loop always goes on. But since all error messages I could find are caused by implementation issues, I'd suggest handling them as non-recoverable on the client side. But the server won't do anything, so it's up to the client to close things
  • The error field on welcome should make clients terminate, but there's nothing that actually prevents them from doing so in the code (of course if the server was actually down for maintenance, then another server would take its place to send the error welcome and that one wouldn't include any of the server functionality).

@piegamesde
Copy link
Member Author

piegamesde commented Aug 5, 2021

So the most important question is: "should we restrict error messages to be in direct reply to client messages?".

If yes:

If no:

  • The orig field becomes optional
  • The error field in the welcome message becomes redundant, and may be deprecated

@piegamesde
Copy link
Member Author

I just noticed that this has implications for our new permissions system too: what does it actually mean for the server to "error out"?

  • We could say that there would be an error message response to the submit-permissions message. But in that case, how long do we wait for that to happen or not before we decide we're good?
    • An option would be to say that the ack message must not come if there was an error. This would help distinguishing, but I fear that it would interfere with some of the Python tooling.
      • Enforcing the order error before ack would work too, but eeh
  • We could say that trying to send any further message will result in getting an error response. (I think this should probably be done regardless, because at the moment the server never closes the WebSockets connection and always continues to process messages.)
  • Making the server close the WebSockets connection on an error would solve this, and quite a few other issues around error handling

@meejah
Copy link
Member

meejah commented Aug 5, 2021

It makes sense to at least close on "permissions failed" I think, since those permissions are scoped the "the connection" so what else could you possibly do .. besides maybe try something else, i.e. a new permissions message, but also re-connecting doesn't seem like much of a burden there.

@piegamesde
Copy link
Member Author

piegamesde commented Aug 9, 2021

I propose the following changes and clarifications to the spec:

  1. Error messages that contain the orig field are in direct response to some other message, those that don't happened due to a different cause
  2. Error messages with an orig field SHOULD initiate a proper unhappy shutdown within the client. Communication can still continue normally, as if the error-triggering message had never been sent.
  3. Error messages without an orig field result in a direct tear-down of the communication on both sides (the server releases the nameplate and mailbox if necessary in behalf of the client). No further ordinary* communication may occur (* except WS pings etc.).
  4. Ack messages signify "the message got successfully processed". In accordance with point 2., no ack message is sent on an error
  5. A failure in submit-permissions results in an error message
  6. The error field in the welcome message gets deprecated. Instead, the server uses an error message as described in point 3.
  7. In case of 3., 5., or 6., all further attempts of communication are met with an error response.

Backwards compatibility:

  • On the client side, all changes are made to things that are under-spec'ed and that clients shouldn't rely on. My biggest fear is that error message handling on their side always expects an orig message. Also, clients may rely on the current server implementation always sending an ack first.
  • On the server side, this requires a few changes. New clients meeting an old server implementation might error out on a protocol error. This only applies to error cases, so it will at most affect some error messages but not the functioning on the "happy path".
  • The client-server protocol is versioned. This may warrant a /v2 WebSockets endpoint

What do you think?

@meejah
Copy link
Member

meejah commented Aug 9, 2021

I'm not sure why we should mess with the ack messages. In case you meant to describe current behavior in point 4 above, that's not what they currently mean. What they currently mean is, "the message arrived at the server" .. and are only used by the timing tool.

So, I don't know that we need to assign any semantic significance to them; indeed, it may collide with the goal of measuring timings (because we'd delay -- slightly -- when that message gets sent if the semantics become "processed correctly" rather than "received + parsed").

Several message types already allow for "error" to be sent as a response. They also have explicit "reply" messages for success (e.g. ->allocate versus <-allocated or ->claim vs. <-claimed). Given this, it might make more sense to spec particular responses to any message-types that currently lack an explicit response (where the client needs to make some sort of decision). Are there any messages like this?

The best way to analyze this is likely by tying it back to the state-machine(s); are there any places we can get "stuck" in the machine?


Regardless of that, part of this seems to be small enough to be its own proposal: that orig-less error messages be allowed; that Welcome[error] be deprecated; that an orig-less error message become the "new way" of doing a more-general error (like "server under maintenance"). Are there any other cases where an orig-less error might be required? (Partly I'm thinking maybe this points to making a special message for this case? like Unwelcome or something? .. might be cleaner than having to specially-handle every error depending on orig or not...)

@piegamesde
Copy link
Member Author

Given this, it might make more sense to spec particular responses to any message-types that currently lack an explicit response (where the client needs to make some sort of decision).

I thought of this too recently, and it is indeed a viable solution if messing with acks is not acceptable.

Are there any messages like this?

Yes, the recently added submit-permissions message. That was an oversight on our part. And add and open kind of too (their response is a message, which is a bit annoying to correlate with the request. I'm not sure about open, since it returns a series of messages but AFAICT that series could also be empty and thus no response message).

The best way to analyze this is likely by tying it back to the state-machine(s); are there any places we can get "stuck" in the machine?

Not sure what you mean exactly, and which machine of which implementation you are referring to, but in either case the answer is a pretty confident yes :)

Are there any other cases where an orig-less error might be required?

The only one that I can think of is magic-wormhole/magic-wormhole-mailbox-server#19

might be cleaner than having to specially-handle every error depending on orig or not

We can also make this a separate error type, FWIW


What are your thoughts of bumping the WebSockets end point version instead of bothering with backwards compatibility? I wrote down some ideas earlier today inspired by all of this that simplify a lot of other things. I still need to check how easy they would be to add to the current relay server implementation (gut feeling says it should be okay).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants