Replace the ioutil.ReadAll usage #36151
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
trying something out here: #36215
Let's get it straight first:
The only reason it's deprecated is that it was moved to another package: since Go 1.16, ioutil.ReadAll is just a thin wrapper that calls io.ReadAll.
Okay, I was not sure about this and collected some data.

Analysis

Using Benchmarking

To prove the hypothesis I created this benchmark:

package main
import (
"bytes"
"fmt"
"io"
"io/ioutil"
"testing"
)
func BenchmarkReadBytes(b *testing.B) {
sizes := []int{
100, // 100 bytes
100 * 1024, // 100KB
1024 * 1024, // 1MB
}
for _, size := range sizes {
b.Run(fmt.Sprintf("size: %d", size), func(b *testing.B) {
// emulate a file or an HTTP response
generated := bytes.Repeat([]byte{'a'}, size)
content := bytes.NewReader(generated)
b.ResetTimer()
b.Run("deprecated ioutil.ReadAll", func(b *testing.B) {
for i := 0; i < b.N; i++ {
content.Seek(0, io.SeekStart) // reset
data, err := ioutil.ReadAll(content)
if err != nil {
b.Fatal(err)
}
if len(data) != size {
b.Fatalf("size does not match, expected %d, actual %d", size, len(data))
}
}
})
b.Run("bytes.Buffer+io.Copy", func(b *testing.B) {
for i := 0; i < b.N; i++ {
content.Seek(0, io.SeekStart) // reset
buf := bytes.NewBuffer(nil)
_, err := io.Copy(buf, content)
if err != nil {
b.Fatal(err)
}
if buf.Len() != size {
b.Fatalf("size does not match, expected %d, actual %d", size, buf.Len())
}
}
})
})
}
}

And indeed, the numbers confirm it.

In terms of CPU (smaller is better): [benchmark chart]

In terms of memory (smaller is better): [benchmark chart]

Conclusion

Even a simple replacement of ioutil.ReadAll with bytes.Buffer + io.Copy reduces both CPU time and memory allocations.

Additionally, we can also investigate places where we might know the fixed buffer size and re-use the same buffer multiple times, like we do in filestream fingerprinting. However, this is more effort and will take significantly more time.
@rdner well done, great summary.
I estimated how big of a change it would be:
Just realised that we need to fix this in the https://github.com/elastic/elastic-agent-libs repository too (10 files).
I think we should prioritize the input side of Beats when consuming HTTP endpoints, especially for Metricbeat. During EAH I discussed this with some folks from platform SRE who were complaining about the memory usage of the "agent", which was mostly consuming metrics while running as a pod inside MKI. Given the impact that @rdner showed with this, I think we should start with replacing that part of the code, then talk with SRE-o11y and canary it to measure the impact.
@alexsapran I was planning to contribute to Go itself, replacing the io.ReadAll implementation upstream.
@rdner I'm browsing around looking at perf issues... any progress on the fix to Go itself? Or should we go ahead and fix our ReadAll usages and get those performance gains soon? Or if we want to break this down further, are there particular usages which may give us a better ROI on our time spent refactoring? I think we've already got SRE-o11y ready to monitor any perf gains, as Alex mentioned above.
Hey @rowlandgeoff |
I found the optimizations discussed here to be incredibly insightful. Great work on the findings and investigation!

Considering the current development cycle, Go 1.23 is frozen for the August release. Therefore, the most realistic timeframe for these changes to be included would be the Go 1.24 release in February 2025, assuming a CL with the improvements is approved and merged by then. The issue at go.dev/issues/50774 seems to have been stale for two years, so it will likely remain that way unless someone pushes the effort forward.

Readers and writers are excellent abstractions and two of the three most important interfaces in Go (alongside the empty interface). However, covering and optimizing all possible implementations and permutations of these is very difficult and time-consuming, often involving trade-offs. The Go standard library leverages these abstractions extensively, and if a particular case presents too many downsides, it is preferable to maintain stable performance in the standard library rather than optimize for a specific scenario.

That said, I would advise not waiting for this to land in the stdlib; the performance gains seem more than enough to justify keeping an optimized version of io.ReadAll around.
I spent some time looking into this (probably more time than I should have) and the conclusion I reached is a big "it depends". It is clear that io.ReadAll uses more resources than bytes.Buffer + io.Copy in general, but this is not always the case.

TL;DR

Most readers in the standard library implement io.WriterTo; I'll leave a list here for posterity. This seems to be the main optimization that makes bytes.Buffer + io.Copy fast, and io.ReadAll cannot take advantage of it. Third-party io.Readers might not implement this optimization, and that makes reads under 512 bytes up to 4x slower, because io.ReadAll preallocates a 512-byte buffer. For some use cases you might want faster performance in this scenario. That makes me skeptical about replacing io.ReadAll everywhere, because we might not know the underlying reader and context. But in general, it seems safe to do this replacement as long as we pay attention to these details.

$ cd $GOROOT/api; rg '\) WriteTo\(' go1.22.txt
118:pkg net, method (*TCPConn) WriteTo(io.Writer) (int64, error) #58808
125:pkg os, method (*File) WriteTo(io.Writer) (int64, error) #58808

Small Reads (< 512 bytes)

For smaller reads (up to 512 bytes) io.ReadAll is faster, since it preallocates a single 512-byte chunk of memory by default and does not need to grow it until it needs more space. The downside is that it will always allocate 512 bytes even if the io.Reader you have only yields 32 bytes of data; it trades a slightly larger memory footprint for CPU.

Also, there are io.Readers and there are io.Readers. Some readers implement io.WriterTo, an optimization that allows io.Copy to avoid creating an intermediary buffer. Some readers, let's call them "dumb readers", only implement Read. For io.ReadAll this optimization does not really matter, as it only relies on Read anyway.
So in conclusion, for readers yielding < 512 bytes:

- io.ReadAll is faster, thanks to its preallocated 512-byte buffer.
- bytes.Buffer + io.Copy can be up to 4x slower, particularly with "dumb readers" that do not implement io.WriterTo.
Remaining Reads (> 512 bytes)

For these sizes it does not really matter if the reader is dumb or not, since io.ReadAll does not take advantage of the optimization anyway. This means that bytes.Buffer + io.Copy is able to use io.WriterTo from the source and read it in one Go (no pun intended); with this there is a huge perf gain. If the reader is dumb, then we rely on bytes.Buffer's growing algorithm, which is less picky than the runtime's append growth ratio.
In conclusion, for > 512 byte reads:

- bytes.Buffer + io.Copy is the clear winner when the source implements io.WriterTo.
- Even with a "dumb reader", bytes.Buffer's growth algorithm still gives it an edge over io.ReadAll.
Thanks @mauri870 ! |
Given that most readers in the standard library do implement io.WriterTo, it seems wise for us to take advantage of this whenever we can. Unfortunately, there is such a multitude of io.Readers in the stdlib and third-party packages that it is hard to track the details; this starts to blur the line into micro-optimization territory IMHO. Generally, when you deal with readers you just want to read from them and not spend time figuring out which kind of reader it is or whether it yields x bytes of data. This is one of the reasons it is very unlikely that this would land in the stdlib: there are too many corner cases and details, and there is no way to make it faster in all scenarios; it is very dependent on the situation.

My plan is to add this optimized ReadAll to elastic-agent-libs alongside tests and benchmarks, like what @rdner did with the http optimized ReadAll. But in order to actually use this we need to identify places where we can use it without downsides. That means careful analysis of the surrounding context and preferably a benchmark to confirm it is actually a speedup. For example:
We could just blindly replace this with a bytes.Buffer + io.Copy, but before Go 1.22 os.File did not implement io.WriterTo. This means that if this file was < 512 bytes, this change would be 4x slower.
I believe this issue has deviated slightly from its original purpose. The initial goal was to improve the usage of io.ReadAll in the libbeat Elasticsearch client, which has already been addressed in #36275. This is confirmed by the benchmarks introduced in #40885. While there are still many instances of io.ReadAll across multiple projects, I think it's impractical to replace all occurrences with a buffer without first confirming that doing so doesn't introduce a slowdown in the scenarios previously mentioned. I've added an optimized version of ReadAll with accompanying benchmarks in elastic/elastic-agent-libs#229. If we want to spend more time investigating each occurrence, we should open a separate issue for that. Perhaps those more familiar with the codebase can highlight areas where these optimizations would provide significant benefits, allowing us to focus our efforts more effectively.
Ack. Thank you @mauri870.
Given the focus on OTel, it's safe to say we can close this.
While we were working on the offsite and analyzing some of the memory profiles of Filebeat, we saw that the output to ES was consuming memory while parsing the buffer from the _bulk response.

Looking further into it, we see that when we execute the bulk request and consume the response, we call ioutil.ReadAll(resp.Body), which is deprecated and consumes more memory, especially for big responses. We should replace this with a Buffer and maybe io.Copy to copy the response body into a consumable, reusable Buffer.

While this issue points to a specific case of the usage of ReadAll, taking a look into the codebase of other Beats, it looks like we can optimize those as well. The desired output is to write some Go benchmarks that compare the before and after.
/cc @AndersonQ @pierrehilbert