
Replace the ioutil.ReadAll usage #36151

Closed
alexsapran opened this issue Jul 25, 2023 · 17 comments

@alexsapran
Contributor

alexsapran commented Jul 25, 2023

While working on the offsite and analyzing some memory profiles of Filebeat, we noticed that the Elasticsearch output was consuming memory to parse the buffer from the _bulk response.

Looking further into it, we see that when executing the bulk request and consuming the response we call ioutil.ReadAll(resp.Body), which is deprecated and consumes more memory, especially for big responses.

We should replace this with a bytes.Buffer and possibly io.Copy, copying the response body into a consumable, reusable buffer.
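
A minimal sketch of the idea, assuming resp is the *http.Response from the _bulk request (illustrative names, not the actual libbeat code):

import (
	"bytes"
	"io"
	"net/http"
)

// readBody drains resp.Body into a caller-owned, reusable buffer
// instead of allocating a fresh slice via ioutil.ReadAll.
func readBody(resp *http.Response, buf *bytes.Buffer) ([]byte, error) {
	buf.Reset() // reuse the same buffer across requests
	if _, err := io.Copy(buf, resp.Body); err != nil {
		return nil, err
	}
	// note: the returned slice is only valid until the next Reset
	return buf.Bytes(), nil
}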

While this issue points to one specific usage of ReadAll, a look into the codebase of the other Beats suggests we can optimize those call sites as well.

The desired output is a Go benchmark that compares the before and after.

/cc @AndersonQ @pierrehilbert

@alexsapran added the Filebeat and help wanted labels Jul 25, 2023
@botelastic bot added the needs_team label Jul 25, 2023
@pierrehilbert added the Team:Elastic-Agent label Jul 25, 2023
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@botelastic bot removed the needs_team label Jul 25, 2023
@amitkanfer
Collaborator

trying something out here: #36215

@rdner
Member

rdner commented Feb 7, 2024

Let's get it straight first:

Looking further into it, we see that when we are executing the bulk request and consuming the response we call the ioutil.ReadAll(resp.Body) which is deprecated and consumes more memory, especially for those big responses.

The only reason it's deprecated is that it was moved to another package; it's now io.ReadAll.

We should replace this with the Buffer and maybe the io.Copy to copy the response Buffer into a consumable reusable Buffer.

Okay, I was not sure about this, so I collected some data:

Analysis

io.ReadAll allocates a buffer of 512 bytes and then re-allocates it to a larger size if the content exceeds this size. I believe in most of our cases it would probably re-allocate.

Using io.Copy and bytes.Buffer would work in a similar way: the first time you write to a bytes.Buffer it allocates a slice to accommodate the chunk being written; if the chunk is smaller than 64 bytes, the minimal allocation is 64 bytes. The next written chunk would probably cause another allocation. However, the buffer growth algorithm is quite sophisticated and probably more efficient in some edge cases.

Benchmarking

To prove the hypothesis I created this benchmark:

package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"testing"
)

func BenchmarkReadBytes(b *testing.B) {
	sizes := []int{
		100,         // 100 bytes
		100 * 1024,  // 100KB
		1024 * 1024, // 1MB
	}
	for _, size := range sizes {
		b.Run(fmt.Sprintf("size: %d", size), func(b *testing.B) {

			// emulate a file or an HTTP response
			generated := bytes.Repeat([]byte{'a'}, size)
			content := bytes.NewReader(generated)
			b.ResetTimer()

			b.Run("deprecated ioutil.ReadAll", func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					content.Seek(0, io.SeekStart) // reset
					data, err := ioutil.ReadAll(content)
					if err != nil {
						b.Fatal(err)
					}
					if len(data) != size {
						b.Fatalf("size does not match, expected %d, actual %d", size, len(data))
					}
				}
			})
			b.Run("bytes.Buffer+io.Copy", func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					content.Seek(0, io.SeekStart) // reset
					buf := bytes.NewBuffer(nil)
					_, err := io.Copy(buf, content)
					if err != nil {
						b.Fatal(err)
					}
					if buf.Len() != size {
						b.Fatalf("size does not match, expected %d, actual %d", size, buf.Len())
					}
				}
			})
		})
	}
}

And indeed, bytes.Buffer + io.Copy is significantly faster at every size I tested (100 bytes, 100KB, 1MB), and it allocates almost 5 times less memory:

go test -run=none -bench='BenchmarkReadBytes/' -benchmem -benchtime=1000x
goos: darwin
goarch: arm64
pkg: github.com/rdner/tar-test
BenchmarkReadBytes/size:_100/deprecated_ioutil.ReadAll-10                   1000               226.6 ns/op           512 B/op          1 allocs/op
BenchmarkReadBytes/size:_100/bytes.Buffer+io.Copy-10                        1000               173.9 ns/op           160 B/op          2 allocs/op
BenchmarkReadBytes/size:_102400/deprecated_ioutil.ReadAll-10                1000             55907 ns/op          514308 B/op         18 allocs/op
BenchmarkReadBytes/size:_102400/bytes.Buffer+io.Copy-10                     1000              6875 ns/op          106544 B/op          2 allocs/op
BenchmarkReadBytes/size:_1048576/deprecated_ioutil.ReadAll-10               1000            349308 ns/op         5241107 B/op         27 allocs/op
BenchmarkReadBytes/size:_1048576/bytes.Buffer+io.Copy-10                    1000             72474 ns/op         1048626 B/op          2 allocs/op
PASS
ok      github.com/rdner/tar-test       0.700s

In terms of CPU (smaller is better):

[benchmark chart omitted]

In terms of memory (smaller is better):

[benchmark chart omitted]

Conclusion

Even a simple replacement of ioutil.ReadAll (io.ReadAll) with bytes.Buffer + io.Copy (no buffer re-use) can give us a significant boost in both CPU and memory performance. So, it's worth spending time on this change.

Additionally, we can investigate places where we might know the buffer size up front and re-use the same buffer multiple times, like we do in filestream fingerprinting. However, this is more effort and will take significantly more time.
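
One common shape for that kind of reuse is a sync.Pool of buffers; a minimal sketch (illustrative only, not the filestream code):

import (
	"bytes"
	"io"
	"sync"
)

// bufPool hands out reusable buffers so repeated reads don't
// allocate a fresh slice every time.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// readWithPool drains r into a pooled buffer and returns a copy of
// the bytes, returning the buffer to the pool for the next caller.
func readWithPool(r io.Reader) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	if _, err := io.Copy(buf, r); err != nil {
		return nil, err
	}
	// copy out: the buffer's backing array is reused after Put
	return append([]byte(nil), buf.Bytes()...), nil
}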

@rdner self-assigned this Feb 7, 2024
@amitkanfer
Collaborator

@rdner well done, great summary.

@rdner
Member

rdner commented Feb 12, 2024

I estimated how big of a change it would be:

@rdner
Member

rdner commented Feb 13, 2024

Just realised that we need to fix this in the https://github.com/elastic/elastic-agent-libs repository too (10 files).

@alexsapran
Contributor Author

I think we should prioritize the input side of Beats when consuming HTTP endpoints, especially for Metricbeat. During EAH I talked with some folks from platform SRE who were complaining about the memory usage of the "agent", which was mostly consuming metrics while running as a pod inside MKI.

Given the impact that @rdner showed here, I think we should start by replacing that part of the code, then talk with SRE-o11y and canary it to measure the impact.

@rdner
Member

rdner commented Mar 5, 2024

@alexsapran I was planning to contribute to Go itself, replacing the ReadAll implementation there with a more efficient one. Then it's just a matter of updating to the next Go version and gaining all this performance for free. I'll keep you updated here.

@rowlandgeoff
Contributor

@rdner I'm browsing around looking at perf issues... any progress on the fix to Go itself? Or should we go ahead and fix our ReadAll usages and get those performance gains soon? Or if we want to break this down further, are there particular usages which may give us a better ROI on our time spent refactoring?

I think we've already got SRE-o11y ready to monitor any perf gains, as Alex mentioned above.

@pierrehilbert
Collaborator

Hey @rowlandgeoff
As @rdner is currently focused on our CI reliability, he hasn't made progress on this for a while.
If we can have your team's help to automate the triaging he is doing and to fix some of the issues (not related to the tests themselves) we are having, it will allow us to come back to this important topic.

@mauri870
Member

mauri870 commented Jul 4, 2024

@rdner I'm browsing around looking at perf issues... any progress on the fix to Go itself? Or should we go ahead and fix our ReadAll usages and get those performance gains soon? Or if we want to break this down further, are there particular usages which may give us a better ROI on our time spent refactoring?

I think we've already got SRE-o11y ready to monitor any perf gains, as Alex mentioned above.

I found the optimizations discussed here to be incredibly insightful. Great work on the findings and investigation! Considering the current development cycle, Go 1.23 is frozen for the August release. Therefore, the most realistic timeframe for these changes to be included would be the Go 1.24 release in February 2025, assuming a CL with the improvements is approved and merged by then. The issue at go.dev/issues/50774 seems to have been stale for two years, so it will likely remain that way unless someone pushes the effort forward.

Readers and writers are excellent abstractions and two of the three most important interfaces in Go (alongside the empty interface). However, covering and optimizing all possible implementations and permutations of these is very difficult and time-consuming, often involving trade-offs. The Go standard library leverages these abstractions extensively, and if a particular case presents too many downsides, it is preferable to maintain stable performance in the standard library rather than optimize for a specific scenario.

That said, I would advise not waiting for this to land in the stdlib; the performance gains seem more than enough to justify keeping an optimized version of io.ReadAll around.

@pierrehilbert assigned mauri870 and unassigned rdner Aug 29, 2024
@ycombinator changed the title from "Replace the ioutils.ReadAll usage" to "Replace the ioutil.ReadAll usage" Sep 3, 2024
@mauri870
Member

I spent some time looking into this (probably more time than I should have) and the conclusion I reached is a big "it depends".

It is clear that io.ReadAll uses more resources than a bytes.Buffer + io.Copy in general, but this is not always the case.

TL;DR:

Most Readers in the standard library implement io.WriterTo; I'll leave a list here for posterity. This seems to be the main optimization that makes bytes.Buffer + io.Copy fast. The io.ReadAll function cannot take advantage of it.
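
The fast path lives in io.Copy itself, which checks these interfaces before falling back to a chunked read loop. Conceptually (a simplified rendition, not the verbatim stdlib source):

// simplified version of what io.Copy does internally
func copyConceptually(dst io.Writer, src io.Reader) (int64, error) {
	// fast path: the source writes itself to dst directly,
	// skipping the intermediate copy buffer entirely
	if wt, ok := src.(io.WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// symmetric fast path on the destination side
	if rf, ok := dst.(io.ReaderFrom); ok {
		return rf.ReadFrom(src)
	}
	// slow path: shuttle bytes through a temporary buffer
	buf := make([]byte, 32*1024)
	var written int64
	for {
		nr, rerr := src.Read(buf)
		if nr > 0 {
			nw, werr := dst.Write(buf[:nr])
			written += int64(nw)
			if werr != nil {
				return written, werr
			}
		}
		if rerr == io.EOF {
			return written, nil
		}
		if rerr != nil {
			return written, rerr
		}
	}
}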

Third-party io.Readers might not implement this optimization, which makes < 512 byte reads via bytes.Buffer + io.Copy 4x slower than io.ReadAll, because io.ReadAll preallocates a 512-byte buffer. For some use cases you might want faster performance in this scenario. That makes me skeptical about replacing io.ReadAll everywhere, because we might not know the underlying reader and context. But in general it seems safe to do this replacement as long as we pay attention to these details.

$ cd $GOROOT/api; rg '\) WriteTo\('
go1.22.txt
118:pkg net, method (*TCPConn) WriteTo(io.Writer) (int64, error) #58808
125:pkg os, method (*File) WriteTo(io.Writer) (int64, error) #58808

go1.5.txt
585:pkg go/types, method (*Scope) WriteTo(io.Writer, int, bool)

go1.8.txt
211:pkg net, method (*Buffers) WriteTo(io.Writer) (int64, error)

go1.txt
197:pkg bytes, method (*Buffer) WriteTo(io.Writer) (int64, error)
4428:pkg net, method (*IPConn) WriteTo([]uint8, Addr) (int, error)
4479:pkg net, method (*UDPConn) WriteTo([]uint8, Addr) (int, error)
4498:pkg net, method (*UnixConn) WriteTo([]uint8, Addr) (int, error)
5756:pkg runtime/pprof, method (*Profile) WriteTo(io.Writer, int) error

go1.1.txt
29:pkg bufio, method (*Reader) WriteTo(io.Writer) (int64, error)
37:pkg bufio, method (ReadWriter) WriteTo(io.Writer) (int64, error)
47:pkg bytes, method (*Reader) WriteTo(io.Writer) (int64, error)
2636:pkg strings, method (*Reader) WriteTo(io.Writer) (int64, error)

Small Reads (< 512 bytes)

For smaller reads (up to 512 bytes) io.ReadAll is faster, since it preallocates a single 512-byte chunk of memory by default and does not need to grow it until it needs more space. The downside is that it will always allocate 512 bytes even if the io.Reader you have only yields 32 bytes of data; it trades a slightly larger memory footprint for CPU.

Also, there are io.Readers and there are io.Readers. Some readers implement io.WriterTo, an optimization that allows io.Copy to avoid creating an intermediary buffer. Some readers, let's call them "dumb readers", only implement Read. For io.ReadAll this optimization does not matter, as it only relies on Read anyway.
If the source is a dumb reader, then the performance of bytes.Buffer + io.Copy becomes much worse than io.ReadAll for < 512 bytes, since it cannot take advantage of this optimization:

BenchmarkReadAll/size_64B/io.ReadAll/WriterTo-32                 7117320               166.3 ns/op           512 B/op          1 allocs/op
BenchmarkReadAll/size_64B/io.ReadAll/Reader-32                   7319155               167.4 ns/op           512 B/op          1 allocs/op
BenchmarkReadAll/size_64B/ReadAll/WriterTo-32                   10699069               113.4 ns/op           112 B/op          2 allocs/op
BenchmarkReadAll/size_64B/ReadAll/Reader-32                      2122534               551.0 ns/op          1584 B/op          3 allocs/op
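
For benchmarks like this, a "dumb reader" can be simulated by wrapping any source in a type that exposes only Read, hiding its WriterTo method (a hypothetical helper, not necessarily what the benchmark above used):

// onlyReader hides any io.WriterTo the wrapped reader implements,
// forcing io.Copy onto its slow path.
type onlyReader struct {
	r io.Reader
}

func (o onlyReader) Read(p []byte) (int, error) {
	return o.r.Read(p)
}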

So in conclusion, for readers yielding < 512 bytes:

  • io.ReadAll never takes the io.WriterTo/io.ReaderFrom optimizations into account.
  • For dumb readers, io.ReadAll is 4x faster than bytes.Buffer + io.Copy due to the preallocated buffer.
  • If the source implements io.WriterTo, then bytes.Buffer + io.Copy is faster and uses less memory than io.ReadAll.

Remaining Reads (> 512 bytes)

For these sizes it does not really matter whether the reader is dumb or not, since io.ReadAll never takes advantage of the optimization. bytes.Buffer + io.Copy, on the other hand, can use io.WriterTo from the source and read it in one go (no pun intended), which is a huge perf gain. If the reader is dumb, then we rely on the bytes.Buffer growth algorithm, which is more efficient than the runtime's append growth ratio.

BenchmarkReadAll/size_100KB/io.ReadAll/WriterTo-32                 13582             88505 ns/op          514304 B/op         18 allocs/op
BenchmarkReadAll/size_100KB/io.ReadAll/Reader-32                   13425             89588 ns/op          514304 B/op         18 allocs/op
BenchmarkReadAll/size_100KB/ReadAll/WriterTo-32                    53968             22920 ns/op          106544 B/op          2 allocs/op
BenchmarkReadAll/size_100KB/ReadAll/Reader-32                      22172             54702 ns/op          261680 B/op         10 allocs/op

In conclusion, for > 512 byte reads:

  • io.ReadAll is slower and more resource-intensive in all cases; its only advantage was the 512-byte preallocated buffer, but at these sizes it needs to regrow multiple times, following the append growth ratio.
  • bytes.Buffer + io.Copy is the better choice for both dumb and optimized readers.

@amitkanfer
Collaborator

Thanks @mauri870 !
So what should our next step here be? Should we look into the different places where we use io.ReadAll and assess whether the average allocation is smaller or bigger than 512 bytes?

@mauri870
Member

Given that most readers in the standard library do implement io.WriterTo, it seems wise for us to take advantage of this whenever we can. Unfortunately, there is such a multitude of io.Readers in the stdlib and third-party packages that it is hard to track the details; this starts to blur into micro-optimization territory IMHO. Generally, when you deal with readers you just want to read from them, not spend time figuring out which kind of reader it is or whether it yields x bytes of data.

This is one of the reasons it is very unlikely that this would land in the stdlib: there are too many corner cases and details, and there is no way to make it faster in all scenarios; it is very dependent on the situation.

My plan is to add this optimized ReadAll to elastic-agent-libs alongside tests and benchmarks, like what @rdner did with the optimized HTTP ReadAll. But in order to actually use it we need to identify the places where it has no downsides. That means careful analysis of the surrounding context and preferably a benchmark to confirm it is actually a speedup.

For example:

f, _ := os.Open("x.txt") // error handling omitted for brevity
b, _ := io.ReadAll(f)    // reads the whole file into memory

We could just blindly replace this with bytes.Buffer + io.Copy, but before Go 1.22 os.File did not implement io.WriterTo. This means that if the file was < 512 bytes, this change would make the read 4x slower.
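
For reference, a size-hinted helper might look roughly like this (my own sketch and naming; the actual implementation in elastic/elastic-agent-libs#229 may differ):

// readAllHinted reads r to EOF into a slice preallocated to sizeHint
// bytes, avoiding both io.ReadAll's fixed 512-byte starting buffer
// and repeated regrowth for large payloads.
func readAllHinted(r io.Reader, sizeHint int) ([]byte, error) {
	if sizeHint < 0 {
		sizeHint = 0 // fall back to bytes.Buffer's own growth
	}
	buf := bytes.NewBuffer(make([]byte, 0, sizeHint))
	_, err := io.Copy(buf, r)
	return buf.Bytes(), err
}

For HTTP responses, resp.ContentLength could serve as the hint whenever the server sets it.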

@mauri870
Member

mauri870 commented Oct 1, 2024

I believe this issue has deviated slightly from its original purpose. The initial goal was to improve the usage of io.ReadAll in the libbeat Elasticsearch client, which has already been addressed in #36275. This is confirmed by the benchmarks introduced in #40885.

While there are still many instances of io.ReadAll across multiple projects, I think it's impractical to replace all occurrences with a buffer without first confirming that doing so doesn't introduce a slowdown in the scenarios previously mentioned. I've added an optimized version of ReadAll with accompanying benchmarks in elastic/elastic-agent-libs#229.

If we want to spend more time investigating each occurrence, we should open a separate issue for that. Perhaps those more familiar with the codebase can highlight areas where these optimizations would provide significant benefits, allowing us to focus our efforts more effectively.

@amitkanfer
Collaborator

Ack. Thank you @mauri870 .
@alexsapran @jeniawhite @rdner - are there specific places we need to focus on, or should we close this issue as done? (or both ;) )

@alexsapran
Contributor Author

Given the focus on OTel, it's safe to say we can close this.
