Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IpAddr::from_str is unnecessarily slow for IPv6 addresses #94825

Open
jyn514 opened this issue Mar 10, 2022 · 6 comments
Open

IpAddr::from_str is unnecessarily slow for IPv6 addresses #94825

jyn514 opened this issue Mar 10, 2022 · 6 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@jyn514
Copy link
Member

jyn514 commented Mar 10, 2022

from_str is implemented like so:

/// Read an IP Address, either IPv4 or IPv6.
fn read_ip_addr(&mut self) -> Option<IpAddr> {
self.read_ipv4_addr().map(IpAddr::V4).or_else(move || self.read_ipv6_addr().map(IpAddr::V6))
}

This unnecessarily penalizes ipv6 addresses, because the parser will always do a full linear scan of the input before it even starts to parse the address. Instead, it could use the knowledge that IPv4 addresses are at most 15 bytes long to skip directly to read_ipv6_addr for longer strings.

@rustbot label: +I-slow +T-libs +C-enhancement

@rustbot rustbot added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 10, 2022
@jyn514
Copy link
Member Author

jyn514 commented Mar 10, 2022

IPv4 addresses are at most 15 bytes long

This is true because octal addresses were banned in #83652.

@equt
Copy link

equt commented Mar 10, 2022

@rustbot claim

@equt
Copy link

equt commented Mar 12, 2022

I do agree that it could benefit the read_ip_addr, and maybe also the read_socket_addr. But I believe that

the parser will always do a full linear scan of the input

is not true.

If I didn't get it wrong, the current parser will realize the input is an IPv6 after parsing at most four bytes. And if we're going to use the input size as a factor to guess, something like 0000::1 will still suffer from the parser's backtracking.

@mbartlett21
Copy link
Contributor

the current parser will realize the input is an IPv6 after parsing at most four bytes

That seems correct

let mut groups = [0; 4];
for (i, slot) in groups.iter_mut().enumerate() {
*slot = p.read_separator('.', i, |p| {

@equt equt removed their assignment Oct 30, 2022
@Noratrieb
Copy link
Member

There are probably still further performance improvements that could be made, but unless someone has a real-world use case for parsing that many IPv6 addresses, this seems unnecessary.

@saethlin
Copy link
Member

unless someone has a real-world use case for parsing that many IPv6 addresses

My employer maintains an ETL process that mostly chomps through netflow data, spending a nontrivial amount of cycles parsing IP addresses depending on configuration.

I don't particularly see any reason to close this; if we can have super-fast IP address parsing, we should.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants