lotus daemon network traffic is over 25 Mbps, which is much higher than before #12381

Closed
5 of 11 tasks
solopine opened this issue Aug 13, 2024 · 26 comments
Labels
kind/bug Kind: Bug P1 P1: Must be resolved

Comments

@solopine

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus fvm/fevm - Lotus FVM and FEVM interactions
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt/WinningPoSt)
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Daemon: lotus version 1.28.1+mainnet+git.6bdbbc024

Repro Steps

  1. Run 'lotus daemon'
  2. Do 'iftop' to see network traffic
  3. Observe over 25 Mbps of network traffic on the lotus daemon libp2p port. Before the current network version 23 upgrade, the traffic was very low (less than 1 Mbps).

Describe the Bug

I noticed that the lotus daemon node now has abnormally high network traffic compared with before the upgrade (now > 25 Mbps, before < 1 Mbps).
Is this daemon traffic expected? If so, why does the daemon need to exchange so much data now, and how can I configure it to minimize the data exchange?

By the way, the daemon otherwise functions well.
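
For a sense of scale, the two rates reported above can be converted into daily transfer volume. This is plain arithmetic on the 25 Mbps and 1 Mbps figures from this report, assuming each rate is sustained for a full day and that Mbps means 10^6 bits per second; the function name is hypothetical.

```python
def daily_volume_gb(rate_mbps: float) -> float:
    """Convert a sustained rate in Mbps (10^6 bits/s) to decimal GB per day."""
    seconds_per_day = 86_400
    bits_per_day = rate_mbps * 1_000_000 * seconds_per_day
    # 8 bits per byte, 10^9 bytes per decimal GB.
    return bits_per_day / 8 / 1_000_000_000


if __name__ == "__main__":
    # Rates taken from the report: under 1 Mbps before, over 25 Mbps after.
    for label, rate in (("before upgrade", 1.0), ("after upgrade", 25.0)):
        print(f"{label}: {rate} Mbps is about {daily_volume_gb(rate):.1f} GB/day")
```

Under these assumptions the jump is from roughly 11 GB per day to roughly 270 GB per day per node.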

Logging Information

The daemon log shows no errors.
@solopine solopine added the kind/bug Kind: Bug label Aug 13, 2024
@MikeH1999

follow

@baseonejj

baseonejj commented Aug 14, 2024

I am encountering the same issue, with continuous output of log lines I had not seen before.
It appears to have started after about 2024-08-13 06:28:00 Beijing Time.

2024-08-14T16:33:36.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWBfHbJ1wqsceTVrTciWRdsUsQD97Kqnv96csJftCapDZZ didn't follow up in 1 IWANT requests; adding penalty
2024-08-14T16:33:36.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWGvebEYdG87SV4FdJRVz5Qi9Q6wShjp2SPyeGPep6cjWu didn't follow up in 4 IWANT requests; adding penalty   
2024-08-14T16:33:36.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWFby9K8XfVzx3QaZfjh2hZJ4XqYVrxxUAKgT546QGBXHT didn't follow up in 1 IWANT requests; adding penalty   
2024-08-14T16:33:37.158+0800    WARN    hello   hello/hello.go:117      other peer hasnt completed libp2p identify, waiting a bit
2024-08-14T16:33:37.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWHnKVudkSBBxTSAww4uAgt1aNnCvNQFMnBNbAiFVfRcbA didn't follow up in 1 IWANT requests; adding penalty   
2024-08-14T16:33:37.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWBFofVxbtBR28EY8Q81LxmFghcxXrprJ3Mx6QNX5SiBnB didn't follow up in 1 IWANT requests; adding penalty
2024-08-14T16:33:37.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWHJyVtcYZdKfcbCssETensu8JkbfCdQM3LwSrZZNwAAih didn't follow up in 1 IWANT requests; adding penalty   
2024-08-14T16:33:37.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWMxgwDgLPgkGBN8oafAstz7UpnwmEEBUf5bf7mySvyiuZ didn't follow up in 1 IWANT requests; adding penalty   
2024-08-14T16:33:37.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWFVwGibiCzbwLPVUfq2ecRmtZjGLSLSYmAd4KYvHboeRV didn't follow up in 1 IWANT requests; adding penalty   
2024-08-14T16:33:37.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWPgGRQksUqiNfCwq5BrbDi3qpb3wLC6rF3uGRrJq99een didn't follow up in 3 IWANT requests; adding penalty   
2024-08-14T16:33:37.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWHnfopCmep1tVm5LCmaPK814Q1MPNU3bNTN1qXymusotq didn't follow up in 1 IWANT requests; adding penalty   
2024-08-14T16:33:37.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWKwXCjeVmR2nJP1KzAFGN49ArK1nG1xrtVaBUupEmEvKZ didn't follow up in 1 IWANT requests; adding penalty   
2024-08-14T16:33:38.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWM3VnQDYRxAARP4G2mRkJHRNcZmwtiJ9nLsZrjiKUuzWq didn't follow up in 1 IWANT requests; adding penalty
2024-08-14T16:33:38.582+0800    INFO    pubsub  [email protected]/gossipsub.go:1684      peer 12D3KooWS2Wqk1U3JmvLAAuQcSMYqEBmWRM3hnNYWZucY1CcsRpf didn't follow up in 1 IWANT requests; adding penalty
2024-08-14T16:33:39.040+0800    INFO    connmgr connmgr/connmgr.go:595  tried to remove tag from untracked peer: 12D3KooWAc1FGBh7fExFGkMToFcVTZifV4rBqFU4MbRcFs8r2pXo
2024-08-14T16:33:39.287+0800    INFO    connmgr connmgr/connmgr.go:595  tried to remove tag from untracked peer: 12D3KooWLu7iCqdFVauMmbNPDNM4yKqKgEwUTyechNgyRZMYkZW6
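
To see which peers account for most of these penalty messages, the lines can be tallied per peer. The following is a minimal sketch and not part of lotus: the regular expression is inferred only from the log excerpt above, the function name `count_unanswered_iwants` is hypothetical, and the sample lines are abbreviated copies of lines from that excerpt.

```python
import re
from collections import Counter

# Pattern inferred from the excerpt in this thread only (an assumption,
# not a documented log format).
PENALTY_RE = re.compile(
    r"peer (?P<peer>\S+) didn't follow up in (?P<count>\d+) IWANT requests"
)


def count_unanswered_iwants(lines):
    """Return a Counter mapping peer ID to total unanswered IWANT requests."""
    totals = Counter()
    for line in lines:
        match = PENALTY_RE.search(line)
        if match:
            totals[match.group("peer")] += int(match.group("count"))
    return totals


if __name__ == "__main__":
    # Abbreviated lines from the excerpt above; the connmgr line is included
    # to show that unrelated messages are ignored.
    sample = [
        "INFO pubsub gossipsub.go:1684 peer 12D3KooWGvebEYdG87SV4FdJRVz5Qi9Q6wShjp2SPyeGPep6cjWu didn't follow up in 4 IWANT requests; adding penalty",
        "INFO pubsub gossipsub.go:1684 peer 12D3KooWPgGRQksUqiNfCwq5BrbDi3qpb3wLC6rF3uGRrJq99een didn't follow up in 3 IWANT requests; adding penalty",
        "INFO connmgr connmgr/connmgr.go:595 tried to remove tag from untracked peer: 12D3KooWAc1FGBh7fExFGkMToFcVTZifV4rBqFU4MbRcFs8r2pXo",
    ]
    for peer, total in count_unanswered_iwants(sample).most_common():
        print(peer, total)
```

To run it against a real log, the same function could be given an open file object instead of the sample list, since it only iterates over lines.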

@solopine
Author

solopine commented Aug 14, 2024

Below is my iftop snapshot. My p2p port is 30901; you can see that some p2p connections have 7 Mbps of traffic.
image

@solopine
Author

Here is my latest log snapshot:
image

@beck-8
Contributor

beck-8 commented Aug 14, 2024

I'm going to throw out a possible solution.

lotus net bandwidth
lotus net bandwidth --by-peer

# then
lotus net block xx

@solopine
Author

Here is my "lotus net bandwidth --by-peer" result. Some peers have 2 GB in and 1 GB out. What data are they exchanging? Market retrieval data? But I am not even running boost now.

image

@beck-8
Contributor

beck-8 commented Aug 14, 2024

If you find a peer with a high current rate or a high historical total, you can try blocking it. They are communicating about when the bull market will come.

@solopine
Author

Oh my god, the total count of peers is 24360 on my side when I run "lotus net bandwidth --by-peer"... Let me see which peers want to communicate with me so happily.

@zelin44913

follow

@rjan90 rjan90 added the P1 P1: Must be resolved label Aug 14, 2024
@solopine
Author

> I'm going to throw out a possible solution.
>
> lotus net bandwidth
> lotus net bandwidth --by-peer
>
> # then
> lotus net block xx

It does not work for me. I blocked many peers and IPs, but other peers took their place, and the bandwidth remains high.

@zelin44913

This upgrade was a complete disaster for us who run a large number of nodes. Each node had a large number of traffic requests, and the public network and the intranet were on the verge of collapse.

@solopine
Author

> This upgrade was a complete disaster for us who run a large number of nodes. Each node had a large number of traffic requests, and the public network and the intranet were on the verge of collapse.

Yes, it is a disaster for me too. I failed many WindowPoSt proofs because of the heavy traffic. Some of my storage is off-site and connected to my PoSt miner node over the Internet; now that the Internet bandwidth is fully occupied by the lotus daemon, WindowPoSt fails.

Hope the issue can be resolved quickly.

@Kubuxu
Contributor

Kubuxu commented Aug 14, 2024

We are working on finding a resolution for this issue. Could you post the output of lotus net scores and lotus net scores -x, ideally as a gist at https://gist.github.com/?

@hdusten

hdusten commented Aug 14, 2024

This is a problem; I can't even imagine what it is like for people with multiple nodes. Here is our network traffic over the last week. We were two days late to upgrade, so the two big spikes in traffic are us downloading a snapshot:

image

From left to right:
Arrow 1: Traffic Consumption
Arrow 2: Lotus Node Upgraded and brought online
Arrow 3: The start of the bandwidth hog

On our network, we have isolated it directly to the lotus daemon instance. Traffic has gone up several-fold, and it started yesterday, not when we upgraded.

@beck-8
Contributor

beck-8 commented Aug 14, 2024

> We are working on finding a resolution for this issue. Could you post the output of lotus net scores and lotus net scores -x, ideally as a gist at https://gist.github.com/?

https://gist.github.com/beck-8/6d87ee70a7c4a54adad52ff75f55146d

@beck-8
Contributor

beck-8 commented Aug 14, 2024

/meshsub/1.1.0
It was this protocol that caused the traffic surge

@solopine
Author

> We are working on finding a resolution for this issue. Could you post the output of lotus net scores and lotus net scores -x, ideally as a gist at https://gist.github.com/?

Here is mine: https://gist.github.com/solopine/b9d9f8f67638ac2668e3bdf3bb743655

@hdusten

hdusten commented Aug 14, 2024

> We are working on finding a resolution for this issue. Could you post the output of lotus net scores and lotus net scores -x, ideally as a gist at https://gist.github.com/?

https://gist.github.com/hdusten/e3d3a485632d5391aa914221c7a12344

@Kubuxu
Contributor

Kubuxu commented Aug 14, 2024

Thank you for sending the logs.
We have identified the issue and are working on a patch, which should be available later today. For up-to-date information, see #fil-lotus-announcements in Filecoin Slack.

@jennijuju
Member

#12390 will fix this, please read this if you are curious about the cause.

Thank you to everyone who reported issues and provided logs to help us with troubleshooting!

@MikeH1999

@jennijuju Can I merge the patch now?

@jennijuju
Member

@MikeH1999 We did basic testing before the devs 👋 for the day, and ✅ it fixes the issue. That said, we plan to add an integration test and do more testing in butterflynet today/tomorrow (depending on which timezone you are in 😄) before we merge the PR. So unless this is impacting your node synchronization (which we are not expecting, but would like to know if so!), I'd recommend waiting for the official patch (we aim to ship it by EOD Thursday).

@jennijuju
Member

image

@jennijuju
Member

> This upgrade was a complete disaster for us who run a large number of nodes. Each node had a large number of traffic requests, and the public network and the intranet were on the verge of collapse.

> This upgrade was a complete disaster for us who run a large number of nodes. Each node had a large number of traffic requests, and the public network and the intranet were on the verge of collapse.
>
> Yes, it is a disaster for me too. I failed many WindowPoSt proofs because of the heavy traffic. Some of my storage is off-site and connected to my PoSt miner node over the Internet; now that the Internet bandwidth is fully occupied by the lotus daemon, WindowPoSt fails.
>
> Hope the issue can be resolved quickly.

@solopine @zelin44913 Sorry to hear that, and we apologize for the damage this may have caused you. Thank you for bringing the issue to us so that we could capture and fix this edge case before the fast finality consensus network upgrade.

@solopine
Author

Thank you, team. I applied #12390, and it worked for me.

@rjan90
Contributor

rjan90 commented Aug 16, 2024

Hey everyone!

Thank you all for reporting this issue and engaging in this thread. A patch release with a fix for this has been shipped in https://github.com/filecoin-project/lotus/releases/tag/v1.28.2.

If you want to have a deeper dive into what the issue was, you can read up on it here: #12287 (comment).

@rjan90 rjan90 closed this as completed Aug 16, 2024