Implement PNG decode adaptor iterator #207
This incomplete PNG decoder presents as an Iterator reading u8 bytes directly from a Reader. Approaching a PNG decoder as an Iterator requires significantly less heap than the traditional batch approach. On Precursor, this enables PNG bytes to be intercepted at the network interface and processed into the much smaller gam::Bitmap format without ever having to store the original PNG file.
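To illustrate the general idea (not the decoder in this PR), here is a minimal sketch of an adaptor iterator that pulls bytes from a Reader and yields one result per pixel, so only a single pixel is held at a time. The `Monochrome` type, the `monochrome_from_reader` helper and the brightness weights are hypothetical, chosen only for the example.

```rust
use std::io::{Cursor, Read};

/// Hypothetical adaptor: consumes RGB bytes three at a time and yields one
/// black/white decision per pixel, so only one pixel is held at any moment.
struct Monochrome<I: Iterator<Item = u8>> {
    rgb: I,
}

impl<I: Iterator<Item = u8>> Iterator for Monochrome<I> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        // `?` ends the iteration as soon as the source runs dry,
        // which also drops a trailing partial pixel.
        let r = self.rgb.next()? as u32;
        let g = self.rgb.next()? as u32;
        let b = self.rgb.next()? as u32;
        // Integer approximation of brightness (weights sum to 256).
        let luma = (77 * r + 150 * g + 29 * b) >> 8;
        Some(luma >= 128)
    }
}

/// Streams bytes out of any reader; stops at the first I/O error.
fn monochrome_from_reader<R: Read>(reader: R) -> Vec<bool> {
    let bytes = reader.bytes().map_while(Result::ok);
    Monochrome { rgb: bytes }.collect()
}

fn main() {
    // White, black, red, green.
    let rgb = vec![255, 255, 255, 0, 0, 0, 255, 0, 0, 0, 255, 0];
    let pixels = monochrome_from_reader(Cursor::new(rgb));
    println!("{:?}", pixels);
}
```

The final `collect` is only there for the demonstration; a consumer that writes each pixel straight into its destination never needs the intermediate Vec.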
There is some benchmarking/tuning to be done at some stage with:
These 2 buffers consume heap space, but also probably impact speed. |
The |
caused by flakey Internet connection
I just discovered that the |
I'm unable to fully test this because of #208, but I did check sizes with and without ditherpunk: With
Without
The good news is the bloat seems to be gone in Most of the weight now sits in |
Actually, I was on the wrong branch. This is what it looks like with your branch. The performance is pretty slow -- takes maybe...3-5 minutes? to decode the PNG. I need to wrap a timer around it because my attention span isn't long enough to time it. But the memory usage seems pretty spot on: heap didn't go over 128k, which is really good. It's the usual trade-off tho, isn't it? speed versus space. :-/ |
Here's the runtime breakdown:
The decode took 10 seconds; the conversion to bitmap took 492 seconds (about 8.2 minutes); the modal display was...less than 18 seconds, but it includes the time it took for me to notice it finished and dismiss the modal, so it could be much much faster than that.
Anyways, the long pole in the tent is the conversion to the bitmap, stuff behind this line of code:
let bm = gam::Bitmap::from_png(&mut png, Some(modal_size));
No big worries, I'm quite pleased with where this is all going tbh. Performance tuning is much easier imho than space tuning, I'm betting there's probably just an issue where data is being sent between processes or something and a little shuffling of abstractions will clear it all up. I'll have a look later, for now, just getting the net stack working to the point where I could decode the images reliably was surprisingly hard. |
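A per-stage breakdown like this can be collected with a few labelled checkpoints. The sketch below is an assumption about how one might do it with std::time::Instant in hosted mode; it is not the code used to produce the log above, and the `Checkpoints` type and the stage labels are hypothetical.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: records the elapsed time between successive marks.
struct Checkpoints {
    last: Instant,
    laps: Vec<(&'static str, Duration)>,
}

impl Checkpoints {
    fn new() -> Self {
        Checkpoints { last: Instant::now(), laps: Vec::new() }
    }

    /// Closes the current stage under `label` and starts timing the next one.
    fn mark(&mut self, label: &'static str) {
        let now = Instant::now();
        self.laps.push((label, now - self.last));
        self.last = now;
    }

    /// Sum of all recorded stages.
    fn total(&self) -> Duration {
        self.laps.iter().map(|(_, d)| *d).sum()
    }
}

fn main() {
    let mut cp = Checkpoints::new();
    sleep(Duration::from_millis(20)); // stand-in for the header/decode stage
    cp.mark("decode");
    sleep(Duration::from_millis(5)); // stand-in for the bitmap conversion
    cp.mark("convert");
    for (label, elapsed) in &cp.laps {
        println!("{label}: {elapsed:?}");
    }
    println!("total: {:?}", cp.total());
}
```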
Yep that looks like the correct branch. Firstly the image is aligned correctly in the modal, and secondly there is some distortion in the image in a couple of places (behind the bunny and above the bunny's back) that is some problem with my PNG decode implementation. Have to chase that down at some point with fresh eyes.
Yep that is pretty slow!
Yep - that is the trade-off - but we now have functional image processing machinery that fits in the Precursor footprint. There are some easy blunt tools available for tuning (
I need to drill into these timings in much more detail. Where exactly did you place these checkpoints? And, how did you go about inserting the timer? (to save me reinventing the wheel)
Well done on the net-stack. The net-stack issues were going to come to light sooner or later - so it is undoubtedly better to have the hard work done sooner, and have it largely (hopefully) behind us. |
I looked at the log above a little harder and I think I can make some guesses
At this point we have consumed the entire Tcp header and know to expect a png.
At this point we have read only the first IHDR chunk of the PNG, and it has provided us with some basic info regarding the PNG data that will follow in the IDAT chunks.
At this point we have consumed all of the PNG data from the IDAT chunks AND completed the entire conversion to gam::Bitmap, pixel by pixel. I think I can conclude from this that the choke-point is actually in reading bytes from the network: it has taken 10 sec to read the TCP header and the PNG IHDR chunk, which together amount to well under 1k bytes. So I suspect reading one u8 at a time from the TcpStream is the culprit. And this little comment in the code might hold the solution:
|
Huh interesting! Here's the code used to generate the log: I didn't realize it was just streaming down byte-by-byte. Yes, that would take a very long time to download. I think BufRead might work, now that the underlying TcpStream works better, but I don't actually have a test case to check it on hand. |
Yep, confirmed BufReader works now! yay. |
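To make the byte-at-a-time cost concrete, here is a small sketch that counts how often the underlying reader is hit with and without std::io::BufReader. `CountingReader` and `count_reads` are hypothetical names for this example, and a Cursor stands in for the TcpStream, so it shows call counts only, not real network timings.

```rust
use std::io::{self, BufReader, Cursor, Read};

/// Hypothetical wrapper that counts calls made to the underlying reader.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Consumes `data` one byte at a time and returns
/// (bytes seen, calls that reached the underlying reader).
fn count_reads(data: Vec<u8>, buffered: bool) -> (usize, usize) {
    let mut counter = CountingReader { inner: Cursor::new(data), calls: 0 };
    let bytes_seen = if buffered {
        // BufReader pulls a large block at once and serves bytes from memory.
        BufReader::new(&mut counter).bytes().count()
    } else {
        // Every byte requested turns into its own read on the source.
        (&mut counter).bytes().count()
    };
    (bytes_seen, counter.calls)
}

fn main() {
    let data = vec![0u8; 1000];
    println!("unbuffered: {:?}", count_reads(data.clone(), false));
    println!("buffered:   {:?}", count_reads(data, true));
}
```

On a real socket each of those underlying reads is a trip into the network stack, which is why the per-byte path is so much slower than the call count alone suggests.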
The change to BufReader made <10% improvement in hosted mode. |
I pulled in your latest
That's a nice victory lap for fixing BufReader. 🎉 I'd say the decoder is fully usable now. The rendering needs a little more tweaking but a 19.2x speed improvement is already huge. |
The checksum calc is inner-loop so removed for performance gain. If a hacker can manipulate a byte then they can manipulate an RGB byte or a checksum byte - so the checksum doesn't really give any protection from hacking afaik. I presume the checksum is in there to detect corruption during storage or transmission - but if a couple of bytes are corrupted you will not notice the change in the image - and if a lot of bytes are corrupted then the distortion in the image will be obvious. This is even more the case in Precursor, where we are shrinking and dithering the image as a matter of course. So I don't really see the point of the checksum.
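For reference, the per-chunk checksum in PNG is a CRC-32 computed over the chunk type and chunk data. The bitwise sketch below shows why it sits in the inner loop: every byte costs eight shift-and-xor steps. It is an illustration only and not necessarily how the removed code computed it; table-driven variants are the usual faster form.

```rust
/// Bitwise CRC-32 as used by PNG chunks (reflected polynomial 0xEDB88320,
/// initial value and final xor of 0xFFFFFFFF).
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in bytes {
        crc ^= b as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn main() {
    // The IEND chunk has no data, so its CRC covers only the type bytes.
    println!("{:08X}", crc32(b"IEND"));
}
```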
The PNG filter specification calls for an initial prior_line = [0u8] (see: https://www.w3.org/TR/PNG/#9Filters), but [0u8] results in distortion ??????? so use [125u8].
I ran some quick tests in hosted mode:
|
I think those benchmarks in hosted mode are in agreement with what would happen on hardware -- my guess is the PNG decoder is fine, what's broken is the TCP performance. Sussing out that performance issue is for another day, though... |