Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connectd: add own err codes instead of generic -1 #3397

Merged
merged 2 commits into from
Jan 21, 2020

Conversation

vasild
Copy link
Contributor

@vasild vasild commented Jan 5, 2020

Make it possible for connectd to send an error code to lightningd in
addition to the error message. Introduce two new error codes, replacing
the catch-all -1.

This change, together with
#3395
will implement #3366

Changelog-Changed: The connect command now returns its own error codes instead of a generic -1.

@ZmnSCPxj
Copy link
Collaborator

ZmnSCPxj commented Jan 6, 2020

Ugly typecast is ugly. @rustyrussell opinion?

@cdecker
Copy link
Member

cdecker commented Jan 9, 2020

Ugly typecast is ugly. @rustyrussell opinion?

We could add towire_int and fromwire_int to wire/{from,to}wire.{c,h} to pass the types correctly? That'd be the cleanest possible solution.

@ZmnSCPxj
Copy link
Collaborator

ZmnSCPxj commented Jan 9, 2020

We could add towire_int and fromwire_int to wire/{from,to}wire.{c,h} to pass the types correctly? That'd be the cleanest possible solution.

Largely agree as well. A bit more work but probably worth it for cleanliness.

@vasild
Copy link
Contributor Author

vasild commented Jan 9, 2020

Agree, especially given that towire_int would be useful in the future in other places too.

@vasild
Copy link
Contributor Author

vasild commented Jan 12, 2020

I looked into this and realized there are two problems with towire_int():

  1. The size of int is not defined by the standard. The standard states just that int must be able to hold integers in the range [-32768, 32767], which means int can be either 2, 4, or 8 bytes. This, of course, makes it meaningless to have a portable to/from function that operates on int. This can be overcome by introducing towire_s32() since we know s32 is 4 bytes. But still we want to pass the outcome from fromwire_s32() to command_fail() which takes an int argument and we may run into size mismatch on platforms where int is not 4 bytes.

  2. Representing a signed integer in a platform independent way. I used a typecast to u32 inside towire_s32(). This will work as long as both sending and receiving platforms represent signed integers in the same way in memory. I think "all" platforms today use two's complement but I have not done extensive research and this is not a way to write software that lasts. To be done properly this needs our own "wire" representation and some witchcraft in towire_s32() to convert from host to our wire format and in fromwire_s32() to convert from our wire format to the host's (likely all 3 of "source", "wire" and "destination" being two's complement).

That said, I added a new commit e5fa911 that introduces towire_s32() but given that we wanted this to avoid a typecast, we avoided it, but introduced another one inside towire_s32() and given the above issues, I would suggest that we ditch e5fa911 and leave the patch as is without it.

Thoughts?

@ZmnSCPxj
Copy link
Collaborator

ZmnSCPxj commented Jan 13, 2020

While the wire system is designed to be used for cross-system transport, all communications between daemons are currently on the same machine, thus same endianness and size and encoding. We could put a big warning around towire_int/fromwire_int that it should only be used between subdaemons and not for the actual Lightning protocol. That way size and endianness is a nonissue, it is the same machine that all subdaemons run on anyway.

Xref. fromwire_double and towire_double, which just sends the raw representation of double, including endianness of the machine.

(Though xref #3372 (comment))

2s complement won and nobody in their right mind used sign-magnitude or 1s complement in the past two decades (IEEE 754 floats tho... but that was > three decades ago). Endianness is a bit iffier, though ultimately little-endian seems to be winning in the processor side (ARM is switching to it, and nobody uses anything but ARM or x86-derived anyway) while big-endian won on the network side (i.e. "network byte order" is big-endian).

Make it possible for connectd to send an error code to lightningd in
addition to the error message. Introduce two new error codes, replacing
the catch-all -1.

This change, together with
ElementsProject#3395
will implement ElementsProject#3366

Changelog-Changed: The `connect` command now returns its own error codes instead of a generic -1.
Add towire_int() and fromwire_int() functions to "(de)serialize"
"int". This will only work as long as both the caller of towire_int()
and the caller of fromwire_int() use the same in-memory representation
of signed integers and have the same sizeof(int).

Changelog-None
@vasild
Copy link
Contributor Author

vasild commented Jan 13, 2020

@ZmnSCPxj I amended the last commit to add towire_int() instead of towire_s32(), now in 732e790.

But I would really advise against that - it is a future trap for little or no justification (to avoid a typecast). Yes, all daemons currently run on the same machine, but thanks to the wire system it would be easy to move a given one on another machine and change stdout/stdin communication with a socket. Just that. But with such unportable towire_* functions one would also have to search for all of them and unroot them before moving a given daemon to a remote machine. towire_int() and towire_double() are undermining the excellent idea behind #3372.

towire_double() is used in just one place, a few lines after a contradictory comment "We don't pass doubles". I would rather remove towire_double() than add another unportable function like it.

@ZmnSCPxj
Copy link
Collaborator

The original comment regarding double not being passable through wire was mine, originally there was a complex mapping from 0.0 -> 1.0 to 0 -> UINT32_MAX to allow passing it through the wire, but this was changed to just pass around doubles directly. In any case, it seems plausible to use s32 and revert the switch to passing around doubles, but maybe @rustyrussell can chime in; my memory suggests it was him who suggested instead implementing and using towire_double and fromwire_double.

@vasild
Copy link
Contributor Author

vasild commented Jan 14, 2020

Ok, so we have a few options:

  • Introduce towire_int() and use it for error codes as in 732e790.
  • Introduce towire_s32() and use it for error codes as in e5fa911.
  • Use the existent towire_u32() for error codes (take just the first commit from this PR). We pass 400 and 401. Typecast to int when calling command_fail() because it takes an int argument.

Comment on lines +172 to +174
if (!fromwire(cursor, max, &ret, sizeof(ret)))
return 0;
return ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be missing an endianness conversion, be32_to_cpu should do the trick.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be32 assumes 32-bit ints though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZmnSCPxj would you prefer towire_s32() over towire_int()?

I am still most inclined to "Use the existent towire_u32() for error codes (take just the first commit from this PR). We pass 400 and 401. Typecast to int when calling command_fail() because it takes an int argument."

@@ -87,6 +87,11 @@ void towire_bool(u8 **pptr, bool v)
towire(pptr, &val, sizeof(val));
}

void towire_int(u8 **pptr, int v)
{
towire(pptr, &v, sizeof(v));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, cpu_to_be32.

@ZmnSCPxj
Copy link
Collaborator

Since this PR is getting stuck with no good resolution, I'll just jump in and give what I think should be done:

  • Implement fromwire_int / towire_int and ignore all cross-machine int endianness and size and encoding issues.
    • Cross-machine implementations of the daemons will probably require a local proxy for the remote daemon anyway, the local proxy being the one that lightningd launches and which knows how to trigger the execution of the remote daemon, and can be the one that knows how to translate the local int and double to network-compatible versions.
    • A vast majority of users will compile and run the daemons on the same machine with the exact same compiler and etc etc details anyway, so the endianness carefulness is not really that useful for types that do not get transmitted over the BOLT protocol anyway (i.e. double, int).

If @vasild is willing to do that I can ACK that change so we can get some progress here.

@ZmnSCPxj
Copy link
Collaborator

ACK 732e790

@cdecker
Copy link
Member

cdecker commented Jan 21, 2020

SGTM, I'll add a cleanup that just does the endiannes conversion, so we don't have this cornercase to keep in mind :-)

ACK 732e790

@cdecker cdecker merged commit fb7c006 into ElementsProject:master Jan 21, 2020
@vasild vasild deleted the connect_error_codes branch January 26, 2020 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants