Wrap parse_mimetype in an lru_cache #3341

Merged
merged 3 commits into from
Oct 13, 2018

orf
Contributor

@orf orf commented Oct 12, 2018

What do these changes do?

parse_mimetype seems like an ideal candidate for an lru_cache: its return value is immutable, parsing is non-trivial, it's called fairly often, and it takes a single, immutable argument (the mimetype string).
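A minimal sketch of the idea, assuming a simplified, hypothetical parser (not aiohttp's actual implementation) that returns an immutable NamedTuple. Because the function is pure and its only argument is a hashable string, functools.lru_cache can memoize it, and repeated calls with the same header return the very same object:

```python
import functools
from typing import NamedTuple, Tuple


class MimeType(NamedTuple):
    # Hypothetical, simplified stand-in for aiohttp's MimeType record.
    type: str
    subtype: str
    parameters: Tuple[Tuple[str, str], ...]


@functools.lru_cache(maxsize=56)
def parse_mimetype(mimetype: str) -> MimeType:
    """Simplified parser for values like 'text/html; charset=utf-8'."""
    if not mimetype:
        return MimeType("", "", ())
    parts = mimetype.split(";")
    params = []
    for item in parts[1:]:
        if not item.strip():
            continue
        key, _, value = item.partition("=")
        params.append((key.strip().lower(), value.strip(' "')))
    fulltype = parts[0].strip().lower()
    if fulltype == "*":
        fulltype = "*/*"
    mtype, _, stype = fulltype.partition("/")
    return MimeType(mtype, stype, tuple(params))


first = parse_mimetype("text/html; charset=utf-8")
second = parse_mimetype("text/html; charset=utf-8")
assert first is second  # the second call is served from the cache
```

Returning an immutable value matters here: every caller that hits the cache shares the same object.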

Are there changes in behavior for the user?

Nope.

Related issue number

None.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@@ -237,6 +237,7 @@ class MimeType:
parameters = attr.ib(type=MultiDict) # type: MultiDict[str]


@functools.lru_cache(maxsize=56)
Member

Nice idea. But why 56?

Contributor Author

No particular reason. The input is potentially controlled by an attacker, so the cache can't be too large. We could make it 100?

Member

There are 75 MIME types in nginx's default mime.types file, and many of them are uncommon. I'm pretty sure that 10 will be more than enough.

Contributor Author

The input string includes the charset as well, so every distinct type/charset combination gets its own cache entry; maybe 10 is too low? There isn't much harm in setting it higher. If it's too low, however, it defeats the entire point and just adds a bit of overhead.
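The trade-off discussed above can be sketched with a bounded functools.lru_cache. The `parse` function below is a hypothetical stand-in for an expensive, pure parser, and maxsize=10 is simply the value suggested in the review. A flood of distinct, attacker-chosen values can only evict entries, never grow memory past maxsize, and because the cache key is the raw string, charset variants occupy separate slots:

```python
import functools


@functools.lru_cache(maxsize=10)
def parse(value: str) -> str:
    # Hypothetical stand-in for an expensive, pure parsing function.
    return value.strip().lower()


# Simulate a client sending many distinct Content-Type values.
for i in range(1000):
    parse(f"text/x-junk-{i}")

info = parse.cache_info()
# Memory stays bounded: at most `maxsize` entries are ever retained.
assert info.currsize == 10
assert info.hits == 0 and info.misses == 1000

# A common value is a miss the first time, then a hit.
parse("text/html; charset=utf-8")
parse("text/html; charset=utf-8")
assert parse.cache_info().hits == 1

# Variants of the same type are distinct cache keys, because the key is
# the raw input string, not the parsed result.
parse("text/html; charset=UTF-8")
assert parse.cache_info().misses == 1002
```

If the working set of legitimate values is larger than maxsize, entries keep evicting each other and most calls pay the lookup overhead on top of the parse.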

@codecov-io

codecov-io commented Oct 12, 2018

Codecov Report

Merging #3341 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3341      +/-   ##
==========================================
- Coverage   98.03%   97.99%   -0.04%     
==========================================
  Files          44       44              
  Lines        8039     8039              
  Branches     1357     1357              
==========================================
- Hits         7881     7878       -3     
- Misses         65       67       +2     
- Partials       93       94       +1

Impacted Files              Coverage Δ
aiohttp/helpers.py          97.64% <100%> (ø)     ⬆️
aiohttp/tcp_helpers.py      90% <0%> (-6.67%)     ⬇️
aiohttp/client_reqrep.py    97.49% <0%> (-0.17%)  ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9ea8a02...b7e691d.

@asvetlov
Member

I'm curious: what is the performance boost?

@orf
Contributor Author

orf commented Oct 12, 2018

Master:

> %timeit parse_mimetype('text/html; charset=utf-8')
> 5.06 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

This patch:

> %timeit parse_mimetype('text/html; charset=utf-8')
> 95.8 ns ± 0.771 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
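A comparable measurement can be sketched outside IPython with the standard timeit module. The parser below is a hypothetical stand-in with roughly the shape of the real one; the trick is that lru_cache exposes the undecorated function as `__wrapped__`, so the same code can be timed with and without the cache:

```python
import functools
import timeit


@functools.lru_cache(maxsize=56)
def parse_mimetype(mimetype: str):
    # Hypothetical stand-in parser returning an immutable tuple.
    parts = mimetype.split(";")
    params = tuple(
        (k.strip().lower(), v.strip(' "'))
        for k, _, v in (p.partition("=") for p in parts[1:] if p.strip())
    )
    mtype, _, stype = parts[0].strip().lower().partition("/")
    return mtype, stype, params


HEADER = "text/html; charset=utf-8"
N = 100_000

# __wrapped__ bypasses the cache entirely.
uncached = timeit.timeit(lambda: parse_mimetype.__wrapped__(HEADER), number=N)
# Only the first call here misses; the remaining N - 1 are cache hits.
cached = timeit.timeit(lambda: parse_mimetype(HEADER), number=N)

print(f"uncached: {uncached / N * 1e9:.0f} ns per call")
print(f"cached:   {cached / N * 1e9:.0f} ns per call")
```

Absolute timings depend on the machine and Python version and should not be expected to match the figures above; the lambda adds the same small overhead to both measurements.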

@orf
Contributor Author

orf commented Oct 12, 2018

Ahh, one small thing: since every caller now receives the same cached object, the parameters object needs to be immutable. I've changed it to use MultiDictProxy.

Not sure if this is a breaking change, but the speedup is pretty nice if we can accept this.
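To illustrate why the cached result must be read-only, here is a sketch that uses the standard library's types.MappingProxyType as a stand-in for multidict's MultiDictProxy (both are read-only views over an underlying mapping). The two parse_params_* functions are hypothetical and only parse a single parameter:

```python
import functools
from types import MappingProxyType


@functools.lru_cache(maxsize=56)
def parse_params_mutable(header: str) -> dict:
    # BAD: every caller receives the *same* cached dict object.
    _, _, rest = header.partition(";")
    key, _, value = rest.partition("=")
    return {key.strip(): value.strip()}


@functools.lru_cache(maxsize=56)
def parse_params_readonly(header: str) -> MappingProxyType:
    # Better: hand out a read-only view so no caller can alter the shared result.
    _, _, rest = header.partition(";")
    key, _, value = rest.partition("=")
    return MappingProxyType({key.strip(): value.strip()})


h = "text/html; charset=utf-8"

# One caller mutates its result...
parse_params_mutable(h)["charset"] = "latin-1"
# ...and every later caller silently sees the corrupted value.
assert parse_params_mutable(h)["charset"] == "latin-1"

try:
    parse_params_readonly(h)["charset"] = "latin-1"
except TypeError:
    pass  # mutation is rejected; the cached value stays intact
assert parse_params_readonly(h)["charset"] == "utf-8"
```

This is also why the change may affect users: code that previously mutated the returned parameters would now raise instead.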

@asvetlov asvetlov merged commit aac7a69 into aio-libs:master Oct 13, 2018
@asvetlov
Member

thanks

@orf orf deleted the patch-1 branch October 13, 2018 01:22
@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https:/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019
Labels
bot:chronographer:provided There is a change note present in this PR outdated
4 participants