
Enforced data size in channels 2.1.7 prevents large file uploads #1242

Closed
wants to merge 3 commits

Conversation


@agateblue agateblue commented Feb 7, 2019

Fix #1240

Todo:

  • Test case to reproduce the issue
  • Fix :D

I think the actual fix will be a bit tricky because right now, we rely on the Content-Length header value to accept or refuse the request. Mimicking Django's implementation would probably be a better option:

  • If the request CONTENT_TYPE is multipart, then we need to enforce the limit when we lazily read the body
  • Else, we can use CONTENT_LENGTH to accept or refuse the request upfront
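A minimal sketch of that branching, with hypothetical helper names (this is not the actual channels code; Django's default limit of 2.5 MB is used for illustration):

```python
# Hypothetical sketch of the proposed branching; not the actual channels code.
DATA_UPLOAD_MAX_MEMORY_SIZE = 2_621_440  # Django's default limit: 2.5 MB


class RequestDataTooBig(Exception):
    """Raised when the declared body size exceeds the configured limit."""


def is_multipart(headers):
    """Multipart bodies are mostly file data, which Django excludes from
    the limit, so their check must wait until the body is parsed."""
    return headers.get("content-type", "").startswith("multipart/form-data")


def check_upfront(headers):
    """Refuse non-multipart requests whose declared size exceeds the limit."""
    if is_multipart(headers):
        return  # deferred: the multipart parser enforces the limit lazily
    try:
        content_length = int(headers.get("content-length", 0))
    except ValueError:
        content_length = 0
    if content_length > DATA_UPLOAD_MAX_MEMORY_SIZE:
        raise RequestDataTooBig("body exceeds DATA_UPLOAD_MAX_MEMORY_SIZE")
```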

If that looks good to you, then I can start working on a fix :)

cc @Zarathustra2 @carltongibson

@carltongibson
Member

Okay, awesome. I’m not going to get to this until next week now, BUT your breakdown is exactly what I was half-thinking, so if you’d like to sketch that out too, that would be great. (First draft doesn’t have to be perfect.)

@agateblue
Author

@carltongibson I think I got a working fix, it took less time than expected :D

I've added another test case (multipart + file data + too large POST data) just to ensure we've got that right too. Let me know what you think. Basically:

  • if we're in a multipart request, we skip the channels check and let Django perform it lazily via the multipart parser when accessing POST data
  • otherwise, we fail early and loudly when instantiating the request

@agateblue
Author

I tested on my project and it does indeed fix the denied file upload issue, by the way!

channels/http.py (review thread, resolved)
@jpic
Contributor

jpic commented Feb 7, 2019

Why didn't you set DATA_UPLOAD_MAX_MEMORY_SIZE=None in your project, which should disable this check entirely? Or why did you decide not to increase your DATA_UPLOAD_MAX_MEMORY_SIZE?

@agateblue
Author

> Why didn't you set DATA_UPLOAD_MAX_MEMORY_SIZE=None in your project, which should disable this check entirely? Or why did you decide not to increase your DATA_UPLOAD_MAX_MEMORY_SIZE?

@jpic as explained in #1240, there is a bug in the current implementation. As stated in Django's documentation about DATA_UPLOAD_MAX_MEMORY_SIZE:

> The check is done when accessing request.body or request.POST and is calculated against the total request size excluding any file upload data.

I do want to benefit from this protection in my project, I just don't want to have it applied to file uploads, which is the behaviour implemented in Django.
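For reference, the documented Django behaviour being described: DATA_UPLOAD_MAX_MEMORY_SIZE caps non-file form data only, while in-memory file handling is governed by a separate setting. A plain settings sketch, not code from the PR (values shown are Django's defaults):

```python
# settings.py sketch — Django's documented defaults, for illustration only
DATA_UPLOAD_MAX_MEMORY_SIZE = 2_621_440  # 2.5 MB; request data excluding file uploads
FILE_UPLOAD_MAX_MEMORY_SIZE = 2_621_440  # 2.5 MB; threshold before uploads spill to a temp file
# DATA_UPLOAD_MAX_MEMORY_SIZE = None would disable the size check entirely
```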

@jpic
Contributor

jpic commented Feb 7, 2019

Nice, thanks @EliotBerriot. So, your implementation works for me in the sense that it does what it's meant to do. However, I'm wondering whether this isn't the occasion to move that check "upon access", as it is in Django, rather than keeping it in the constructor.

I mean, having a too-big request should prevent me from reading it with RequestDataTooBig, but should it really prevent me from instantiating a request object at all? Sorry if this is out of the scope of your PR, which does fix a bug.

@agateblue
Author

> I mean, having a too-big request should prevent me from reading it with RequestDataTooBig, but should it really prevent me from instantiating a request object at all? Sorry if this is out of the scope of your PR, which does fix a bug.

@jpic oh, I understand what you mean. Well, that's really an API design decision. I don't see anything inherently wrong with the current approach. In terms of security/resource usage, failing upon instantiation is probably the best choice.

Now, if we ever needed to instantiate a big request, failing in the constructor could indeed be impractical. I'm trying to think of a few situations/use cases where we'd need to do that, but I cannot find any right now :)

@agateblue
Author

@jpic I've pushed a commit that factors out the boundary / multipart request handling, let me know what you think about it :)
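A factored-out multipart/boundary helper could look roughly like this (hypothetical names, sketched from the Content-Type header format; not the code from the actual commit):

```python
def parse_multipart_boundary(content_type):
    """Return (is_multipart, boundary) for a Content-Type header value.

    Hypothetical helper illustrating the kind of factoring discussed;
    not the code from the actual commit.
    """
    if not content_type.startswith("multipart/form-data"):
        return False, None
    # Parameters follow the media type, separated by semicolons,
    # e.g. 'multipart/form-data; boundary=abc123'
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "boundary":
            return True, value.strip('"')
    return True, None
```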

@agateblue agateblue changed the title WIP: Enforced data size in channels 2.1.7 prevent large file uploads Enforced data size in channels 2.1.7 prevent large file uploads Feb 8, 2019
An excerpt of the test body from the review thread:

    + b'Content-Disposition: form-data; name="pdf"; filename="book.pdf"\r\n\r\n'
    + b"FAKEPDFBYTESGOHERETHISISREALLYLONGBUTNOTUSEDTOCOMPUTETHESIZEOFTHEREQUEST"
    + END_BOUNDARY
    )
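The excerpt above starts mid-expression; for context, a self-contained version of that kind of multipart test body might be assembled like this (BOUNDARY and payload values are illustrative, not the PR's exact constants):

```python
# Illustrative multipart/form-data body, in the spirit of the PR's tests.
BOUNDARY = b"BOUNDARY"
START_BOUNDARY = b"--" + BOUNDARY + b"\r\n"
END_BOUNDARY = b"\r\n--" + BOUNDARY + b"--\r\n"

body = (
    START_BOUNDARY
    + b'Content-Disposition: form-data; name="pdf"; filename="book.pdf"\r\n\r\n'
    + b"FAKEPDFBYTESGOHERE"  # file payload: excluded from Django's size check
    + END_BOUNDARY
)
content_type = b"multipart/form-data; boundary=" + BOUNDARY
```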
Contributor

If body is the same as above, would it work to define a module- or class-level variable?

Author

@jpic I can, but in my experience we usually don't need the same level of factoring in tests as in runtime code, and I tend to avoid shared state between tests.

Tests should be readable and independent. If we try to factor everything out, it becomes much harder to understand and maintain tests. Quick example: it took me literally 30 seconds to write those tests, because I could read the tests above, copy-paste the code, and edit it as needed.

It would have taken me far more time if I had to jump between 5 constants and understand whether each matched my needs, then figure out whether I should create a new one for my test (which may be slightly different) or edit the existing one in place (which could break other tests). What if someone edits that constant in the future? They would need to understand their tests, and mine, to avoid breaking anything.

I like that you can screenshot/copy-paste a test and understand what's going on. That's far more important than factoring, IMHO, which quickly becomes a nightmare with huge codebases and thousands of tests.

That being said, I'm not maintaining this project, and I'll happily move the whole body into a constant if you think it's the best thing :)

Contributor

Oh, I can live with anything; it was just my suggestion, because I like to refactor tests after writing them. Let's see what others think about it ;)

Congrats on your contribution, and have a great day!

@agateblue
Author

Hey there, what are we doing about this? :)

@carltongibson
Member

Hi @EliotBerriot did you see #1240 (comment)?

@agateblue
Author

Hi @carltongibson :) I did, but I don't really know what it means for the current PR, hence my question. Do we close this? Do we merge this? Do I need to amend this PR?

I understand the need to refactor how channels handles request bodies. I still think there was a backward-incompatible change and non-standard behaviour (according to Django's documentation) in the 2.1.7 release, and in that context I'd say fixing the issue is still relevant. It does not prevent a refactor at a later point.

Setting DATA_UPLOAD_MAX_MEMORY_SIZE=None works in theory but I don't want to allow anyone to send requests with megabytes of data in the body, unless it's a file.

@carltongibson
Member

Hey @EliotBerriot. Yes, I understand. It doesn't make any sense from a project-management POV to merge this, only to alter it again immediately. Better to think the approach through, do it right, and then roll a release with appropriate change notes so people know what has occurred.

If you're in a super rush here, you can deploy your branch (using pip's VCS support) or create a middleware that just wraps the check for you, which you can remove later.
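One hedged sketch of that middleware idea: wrap the ASGI application and, for multipart requests only, drop the Content-Length header that the upfront check reads, leaving Django's lazy multipart check to enforce limits. All names here are illustrative; this is a stopgap sketch, not code from channels:

```python
class MultipartSizeCheckBypass:
    """ASGI middleware sketch: hide Content-Length from the upfront size
    check for multipart requests only. Illustrative stopgap, not channels code."""

    def __init__(self, inner):
        self.inner = inner

    async def __call__(self, scope, receive, send):
        # ASGI headers are a list of (name, value) byte pairs
        headers = dict(scope.get("headers", []))
        if headers.get(b"content-type", b"").startswith(b"multipart/form-data"):
            scope = dict(scope)  # copy so we don't mutate the server's scope
            scope["headers"] = [
                (name, value)
                for name, value in scope["headers"]
                if name != b"content-length"
            ]
        return await self.inner(scope, receive, send)
```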

I am grateful for your effort here, and in particular the test cases that help show what's going on!

At heart, in patching one issue we've revealed a sub-optimal area. We'll improve that and make progress. I appreciate that, having found it, you want a fix released now, but as I said: measure twice, cut once. (It's top of my list.)

@agateblue
Author

> Hey @EliotBerriot. Yes, I understand. It doesn't make any sense from a project-management POV to merge this, only to alter it again immediately. Better to think the approach through, do it right, and then roll a release with appropriate change notes so people know what has occurred.

Sure, this makes complete sense, and we can close this. I wanted to avoid having an open PR lying around for months :D

> If you're in a super rush here, you can deploy your branch (using pip's VCS support) or create a middleware that just wraps the check for you, which you can remove later.

For now, I've pinned the previous version, and I'll follow progress on this topic as well. (I'm not sure I'll be able to help with the dev part, but I can test things, which is still something.)

Thank you for the dedication and maintenance effort :)

@agateblue agateblue closed this Feb 14, 2019
@carltongibson carltongibson reopened this Feb 14, 2019
@carltongibson
Member

No, don't close it. 😀

It won't be months. It'll be a week or so. (I just need an hour or two to sit down with it.)

I need it open so I don't think I've done it, when I haven't. Thanks. 👍

@carltongibson
Member

OK, @EliotBerriot. Thanks for your efforts here.

I've begun work on #1251, which incorporates the tests you've provided (very helpful, thank you!), so I'll close this now. I'll ping you on #1251 when that's ready for you to run. 👍
