Fix the multi-part payments #3857
Conversation
Cherry-picked the contents of #3856 as suggested by @ZmnSCPxj, but maintained authorship, since credit goes to @vincenzopalazzo and @ZmnSCPxj for finding this and fixing it before I could even look at it 👍
Rebased and squashed after merging #3860
OK, this passed the Twitter test! Minor issue: it splits 2x too aggressively; proposed fix here. Actually, I've just pushed it on the end, along with another fix I lost somehow :(
Ack d6bb191
Perhaps late to bring this up, but: do we need to actually arrange payments in a tree, as opposed to a flat array of sub-payments? The reason I bring this up is that we might want to merge sub-payments. For example, consider the case where the only viable paths for two sub-payments each cost too much (i.e. the fee budget is violated), but if we merge them (and their fee budgets) we could succeed. But if the sub-payments we want to merge are cousins instead of siblings in the tree of payments, how do we viably merge them? For example, consider the case below:
Suppose 4 and 6 are currently ongoing (possibly because they reached the destination, so the destination is waiting on the other parts), and 3 and 5 are failing. As each sub-payment needs a unique numeric id, we could use a monotonically-increasing counter and add an explicit field. Merged payments can get any of the ids of the merged payments. And of course modules have to handle merges as well, if their data structures are not
Trivial rebase for typo fix. |
Discussions are always welcome, though I don't think that this should be discussed in a PR where it is likely to get lost, so I opened a new issue for this. The TL;DR is that I think the merging (or not) of payments is independent of how we link the payments internally. We can represent the tree in an array just as well, and as you point out the
I'm wondering whether we'd want to merge at all, or whether other techniques, such as fee stealing from other sub-payments that we consider to "have worked" (TM), are more promising.
With the presplitter in particular we would have n attempts but the array contains n+1 entries, which is kinda weird.
Mainly to help my future self remember
This was highlighted in #3851, so I added an assertion. After the rewrite in the next commit we would simply skip if any of the constraints were not maintained, but this serves as the canary in the coal mine, so we don't paper over the problem.
We now check against both constraints on the modifier and the payment before applying either. This "fixes" the assert that was causing the crash in #3851, but we are still looking for the source of the inconsistency where the modifier constraints, initialized to 1/4th of the payment, suddenly get more permissive than the payment itself.
Mainly used for testing so we make sure we exclude or constrain the correct channels. Test to follow.
We skip most payment steps and all sub-payments, so consolidate the skip conditions into one if-statement. We also now use `payment_set_step` so that any modifiers after us are skipped after the step change.
When using MPP we always need partid > 0. Since we bumped the partid for the root but not next_id, we'd end up with partid=1 being duplicated. Not a big problem, since we never ended up sending the root to lightningd, instead skipping it, but it was confusing me while trying to trace a sub-payment's ancestry.
We were using the current constraints, including any shadow route and other modifications, when computing the remainder that the second child should use. Instead we should use the `start_constraints` on the parent payment, which is a copy of `constraints` created in `payment_start` exactly for this purpose. Also added an assert for the invariant on the multiplier.
Because listchannels responses are async, with MPP we can end up fighting over use of the parent's data, giving results ranging from incorrect final CLTVs to assertion failures like the one below:

```
pay: plugins/libplugin-pay.c:1964: shadow_route_listchannels: Assertion `amount_msat_greater_eq(p->constraints.fee_budget, d->constraints.fee_budget)' failed.
pay: FATAL SIGNAL 6 (version v0.9.0rc2-11-g29d32e0-modded)
0x563129eb6a15 send_backtrace common/daemon.c:38
0x563129eb6abf crashdump common/daemon.c:51
0x7fe37c8b920f ??? ???:0
0x7fe37c8b918b ??? ???:0
0x7fe37c898858 ??? ???:0
0x7fe37c898728 ??? ???:0
0x7fe37c8a9f35 ??? ???:0
0x563129ea5c63 shadow_route_listchannels plugins/libplugin-pay.c:1964
0x563129e9c56a handle_rpc_reply plugins/libplugin.c:547
0x563129e9cbfa rpc_read_response_one plugins/libplugin.c:662
0x563129e9ccf0 rpc_conn_read_response plugins/libplugin.c:681
0x563129ece660 next_plan ccan/ccan/io/io.c:59
0x563129ecf245 do_plan ccan/ccan/io/io.c:407
0x563129ecf287 io_ready ccan/ccan/io/io.c:417
0x563129ed151f io_loop ccan/ccan/io/poll.c:445
0x563129e9ef03 plugin_main plugins/libplugin.c:1284
0x563129e9b099 main plugins/pay.c:2010
0x7fe37c89a0b2 ??? ???:0
0x563129e9452d ??? ???:0
0xffffffffffffffff ??? ???:0
```
Seems like we get *4* failures, since failing to find a route counts now? Signed-off-by: Rusty Russell <[email protected]>
The code had incorrect assertions, partially because it didn't clearly distinguish errors from the final node (which, barring blockheight issues, mean a complete failure) from errors from intermediate nodes. In particular, we can't trust the *values*, so we need to distinguish these by the *sender*. If a route is of length 2 (A, B):

- erring_index == 0 means us complaining about channel A.
- erring_index == 1 means A.node complaining about channel B.
- erring_index == 2 means the final destination node B.node.

This is particularly of note because Travis does NOT run test_pay_routeboost!

Signed-off-by: Rusty Russell <[email protected]>
Somehow, we occasionally set the wrong amount field? Doesn't happen all the time, but when it does:

```
2020-07-20T05:40:15.510Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10839569msat < 500000000msat
2020-07-20T05:40:15.510Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.510Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 5 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.513Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: peer_in WIRE_UPDATE_ADD_HTLC
2020-07-20T05:40:15.514Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9710103msat < 500000000msat
2020-07-20T05:40:15.514Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.514Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 10 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.518Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10915092msat < 500000000msat
2020-07-20T05:40:15.518Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.518Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 0 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.521Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9143652msat < 500000000msat
2020-07-20T05:40:15.521Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.521Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 3 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.524Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9840417msat < 500000000msat
2020-07-20T05:40:15.524Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.524Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 6 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.527Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10524535msat < 500000000msat
2020-07-20T05:40:15.527Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.527Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 8 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.536Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9579583msat < 500000000msat
2020-07-20T05:40:15.536Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.536Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 1 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.541Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 9048144msat < 500000000msat
2020-07-20T05:40:15.541Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.541Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 7 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.544Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10858167msat < 500000000msat
2020-07-20T05:40:15.544Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.544Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 9 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.548Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10137155msat < 500000000msat
2020-07-20T05:40:15.548Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.548Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 2 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.551Z DEBUG lightningd: Attept to pay 528691fdf7baf0043ef2305e504988a5ae510dcb6127b463b3103b40c2b82a87 with amount 10002298msat < 500000000msat
2020-07-20T05:40:15.551Z DEBUG lightningd: WIRE_INCORRECT_OR_UNKNOWN_PAYMENT_DETAILS: lightningd/htlc_set.c:113
2020-07-20T05:40:15.551Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-chan#1: HTLC in 4 RCVD_ADD_ACK_REVOCATION->SENT_REMOVE_HTLC
2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: NEW:: HTLC REMOTE 14 = RCVD_ADD_HTLC/SENT_ADD_HTLC
2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: peer_in WIRE_UPDATE_ADD_HTLC
2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: NEW:: HTLC REMOTE 15 = RCVD_ADD_HTLC/SENT_ADD_HTLC
2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: peer_in WIRE_UPDATE_ADD_HTLC
2020-07-20T05:40:15.554Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-channeld-chan#1: NEW:: HTLC REMOTE 16 = RCVD_ADD_HTLC/SENT_ADD_HTLC
```
Changelog-Fixed: plugin: `bcli` no longer logs a harmless warning about being unable to connect to the JSON-RPC interface.
Changelog-Added: plugin: Plugins can opt out of having an RPC connection automatically initialized on startup.
With MPP we end up with more than 1 attempt almost always.
It turns out that by aggressively splitting payments we may quickly exhaust the number of HTLCs we can add to a channel. By tracking the number of HTLCs we can still add, and excluding channels to which we cannot add any more, we increase route diversity and avoid quickly exhausting the HTLC budget. In the next commit we'll also implement an early abort if we've exhausted all channels, so we don't end up splitting indefinitely, and we can also optimize the initial split so it doesn't run afoul of that limit.
The presplit modifier could end up exceeding the maximum number of HTLCs we can add to a channel right out of the gate, so we switch to a dynamic presplit in that case. The presplit will now use at most 1/3rd of the available HTLCs on the channels if the normal split would exceed the number of available HTLCs. We also abort early if we don't have sufficient HTLCs available.
There is little point in trying to split if the resulting HTLCs exceed the maximum number of HTLCs we can add to our channels. So abort if a split would result in more HTLCs than our channels can support.
This may be related to issue #3862; however, the water was muddied by it being the wrong error to return, and the node should not expect this courtesy feature to be present at all...
We were removing the current hint from the list and not inheriting the current routehint, so we'd be forgetting a hint at each retry. Now we keep the array unchanged in the root, and simply skip the ones that are not usable given the current information we have about the channels (in the form of channel_hints). Fixes #3861
As the hints get new fields added, it is easy to forget to amend one of the places where we create them. Since we already have an update method, let's use that to handle all additions to the array of known channel hints.
The adaptive MPP test was showing an issue with always using a routehint, even when it wasn't necessary: we would insist on routing to the entrypoint of the routehint, even through the actual destination. If a channel on that loop ended up being over capacity we'd slam below 0, and then increase again by unapplying the route. The solution really is not to insist on routing through a routehint, so we implement random skipping of routehints, and we rotate them if we have multiple.
We're using it in a couple of places to see if we even performed the attempt, so we need to make sure it's initialized.
Ack aad62fc
Fixes #3852
Fixes #3851
Fixes #3855
Removes the scary warning about `bcli` not being able to auto-connect to the JSON-RPC. Fixes #3553
Fixes #3859