Websocket flow control is broken (deadlock+memory leak on websocket.receive) #1845

Closed
alexmnazarenko opened this issue Apr 26, 2017 · 16 comments

Comments

@alexmnazarenko

alexmnazarenko commented Apr 26, 2017

Long story short

Hello! I've run into a serious problem while implementing a custom RPC over websockets (aiohttp on both sides). On a dense stream of larger messages the consumer just freezes (while eating gigabytes of memory). The sender is "fine" (await ws.send_bytes() proceeds).

It seems to be correlated with message size, but I'm not sure; it surfaces randomly, but quite often with larger (64k+) messages. Which side is the client and which is the server doesn't matter, so I'll stick to the sender/consumer terms.

The problem manifested itself only in an actual distributed test; if sender and consumer are both on localhost, everything works fine.

Why do I think the issue is somewhere in flow control: the use case was fine on older versions of aiohttp, before flow control was implemented.

Any feedback is much appreciated )

Expected behaviour

  • Consumer doesn't hang
  • Sender blocks if needed

Actual behaviour

  • On the consumer's side: await ws.receive() (or async for msg in ws) hangs at some point, and memory grows to infinity and beyond
  • On the sender's side: everything looks fine (but you'll never get any response, as the consumer is dead)
  • Even more interesting: very rarely the consumer shows an error and disconnects
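
For reference, this is the kind of flow control I expected from the "sender blocks if needed" point above. A minimal sketch with plain asyncio streams on localhost (not aiohttp internals; the port and buffer sizes are just illustrative numbers I picked): the sender's outgoing buffer is bounded, so drain() suspends it while the slow consumer catches up, and the consumer's memory stays flat.

import asyncio

BLOCK = b"x" * (256 * 1024)  # 256 KiB per block, like the repro below

async def slow_consumer(reader, writer):
    # Read in small chunks and sleep, simulating a consumer slower than the sender.
    while await reader.read(65536):
        await asyncio.sleep(0.01)

async def fast_sender():
    reader, writer = await asyncio.open_connection("127.0.0.1", 9091)
    # Cap the outgoing buffer; once it is full, drain() suspends the sender.
    writer.transport.set_write_buffer_limits(high=1024 * 1024)
    for i in range(100):
        writer.write(BLOCK)
        await writer.drain()  # this is where "sender blocks if needed" should happen
        print("Block", i + 1, "sent")
    writer.close()

async def main():
    server = await asyncio.start_server(slow_consumer, "127.0.0.1", 9091)
    try:
        await fast_sender()
    finally:
        server.close()

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())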

Steps to reproduce

The original program was quite convoluted, so I've made a simple example that reproduces the issue.

Consumer script (implemented as server):

import aiohttp.web

async def consumer(request):
    ws = aiohttp.web.WebSocketResponse()
    await ws.prepare(request)

    msg_index = 0
    async for msg in ws:
        msg_index += 1
        print("Block", msg.type, msg_index)
    return ws


def main():
    app = aiohttp.web.Application()
    app.router.add_get("/ws", consumer)
    aiohttp.web.run_app(app, port=9090)

if __name__ == '__main__':
    main()

Sender script (implemented as client):

import asyncio
import aiohttp
import uuid

async def main():
    # A larger number increases the chance of a hang, but beware of the consumer's memory use.
    # With the provided constants it peaks at ~1.3 GB.
    BLOCK_COUNT = 5000
    BLOCK_SIZE_MULTIPLIER = 2**14
    # uuid4 is 16 bytes, so each block is 256 KiB in total
    BLOCK_SIZE = 16 * BLOCK_SIZE_MULTIPLIER
    async with aiohttp.ClientSession() as session:
        # TODO: use your own IP :)
        async with session.ws_connect("http://xxx.xxx.xxx.xxx:9090/ws") as ws:
            for block_idx in range(BLOCK_COUNT):
                await ws.send_bytes(uuid.uuid4().bytes * BLOCK_SIZE_MULTIPLIER)
                print("Block", block_idx + 1, "sent")
    
if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())

The sender prints everything up to "Block 5000 sent". The consumer often, but not always, stops somewhere much earlier (I've seen 293 and 3097; it's completely random).
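
In the meantime, a possible stopgap (a hypothetical sketch derived from the repro scripts above; I have not verified it against this bug) is to bound in-flight data at the application level: the consumer replies with a small acknowledgement for every block (e.g. await ws.send_str("ack") inside its async for loop), and the sender waits for that reply before sending the next block.

import asyncio
import aiohttp
import uuid

async def main():
    BLOCK_COUNT = 5000
    BLOCK_SIZE_MULTIPLIER = 2**14
    async with aiohttp.ClientSession() as session:
        # TODO: use your own IP :)
        async with session.ws_connect("http://xxx.xxx.xxx.xxx:9090/ws") as ws:
            for block_idx in range(BLOCK_COUNT):
                await ws.send_bytes(uuid.uuid4().bytes * BLOCK_SIZE_MULTIPLIER)
                # Wait for the consumer's per-block acknowledgement so that at most
                # one block is in flight at any time (application-level flow control).
                await ws.receive()
                print("Block", block_idx + 1, "acknowledged")

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())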

Your environment

  • 2 machines, Ubuntu 16.04
  • Python 3.6
  • aiohttp 2.0.7
  • Gbit LAN
@alexmnazarenko
Author

Has anyone been able to reproduce it yet? I hope I'm not crazy )

@fafhrd91
Member

I will check in a couple of days.

@alexmnazarenko
Author

Update: the error may happen even on a localhost connection, but the probability is really low.

@alexmnazarenko
Author

Update 2: using uvloop doesn't help either (but it's faster, so the failure rate is lower).

@fafhrd91
Member

I am planning to look into this problem before the 2.1 release.

Btw did you try aiohttp from master?

@alexmnazarenko
Author

Will try it right now, ETA ~15min )

@alexmnazarenko
Author

Update: with aiohttp master it still hangs and leaks, pretty reliably.

@alexmnazarenko
Author

Any news? 2.1 behaves the same. I really hope to look into it myself at the end of the month, but I'm kind of caught in a crossfire of deadlines for now.

@fafhrd91
Member

fafhrd91 commented Jun 5, 2017

Sorry, I didn't have enough time. Will try to look into this problem this week.

@pjknkda
Contributor

pjknkda commented Jun 7, 2017

@alexmnazarenko Could you test this issue with #1962? I've tested it, and the PR seems to fix the issue on my machine.

@alexmnazarenko
Author

@pjknkda Works like a charm, tested it on multiple machines ) Thank you!

@alexmnazarenko
Author

@fafhrd91 Can we expect a release with this fix anytime soon?

@fafhrd91
Member

fafhrd91 commented Jun 7, 2017

yes, but I'd like to check #1955 as well

@kummahiih

kummahiih commented Jan 2, 2018

This seems to be security critical. Any idea how long it will take to fix this?

@asvetlov asvetlov added this to the 3.0 milestone Jan 2, 2018
@asvetlov asvetlov modified the milestones: 3.0, 3.1 Feb 12, 2018
@asvetlov asvetlov modified the milestones: 3.1, 3.2 Mar 22, 2018
@asvetlov asvetlov modified the milestones: 3.2, 3.3 May 7, 2018
@asvetlov
Member

Everything is fixed; at least 3.2 has no such problems.

@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019