Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PNG decode adaptor iterator #207

Merged
merged 23 commits into from
Aug 10, 2022
Merged

Conversation

nworbnhoj
Copy link
Contributor

This incomplete PNG decoder presents as an Iterator reading u8 bytes directly from a Reader.

Approaching a png decoder as an Iterator requires significantly less Heap than
the traditional batch approach.

On Precursor, this enables PNG bytes to be intercepted at the network interface
and processed into the much smaller gam::Bitmap format without ever having to
store the original PNG file.

This incomplete PNG decoder presents as an Iterator reading u8 bytes directly from a Reader.

Aproaching a png decoder as an Iterator requires significantly less Heap than
the traditional batch aproach.

On Precursor, this enables PNG bytes to be intercepted at the network interface
and processed into the much smaller gam::Bitmap format without ever having to
store the original PNG file.
@nworbnhoj
Copy link
Contributor Author

nworbnhoj commented Jul 27, 2022

There is some benchmarking/tuning to be done at some stage with:

const IDAT_BUFFER_LENGTH: usize = 256;
const INFLATED_BUFFER_LENGTH: usize = 1024;

These 2 buffers consume heap space, but also probably impact speed.

@nworbnhoj
Copy link
Contributor Author

The miniz_oxide InflateState struct is currently on the Stack.
inflate_state: InflateState::new(DataFormat::Zlib),
but it may be better on the Heap.
inflate_state: InflateState::new_boxed(DataFormat::Zlib)
So long as IDAT_BUFFER_LENGTH and INFLATED_BUFFER_LENGTH are relatively small then InflateState will also probably be small.

@nworbnhoj
Copy link
Contributor Author

I just discovered that the miniz_oxide InflateState struct allocated an internal buffer of [u8; 32768] - so it is not going to work very well on the Stack or the Heap. This will need a little more work....

@bunnie
Copy link
Member

bunnie commented Jul 29, 2022

I'm unable to fully test this because of #208, but I did check sizes with and without ditherpunk:

With

    PID   1:  628 k kernel
    PID   2: 1052 k gam
    PID   3:  784 k status
    PID   4:  796 k shellchat
    PID   5:  268 k ime-frontend
    PID   6:  164 k ime-plugin-shell
    PID   7:  880 k graphics-server
    PID   8:  208 k ticktimer-server
    PID   9:  128 k log-server
    PID  10:  228 k com
    PID  11:  176 k xous-names
    PID  12:  196 k keyboard
    PID  13:  184 k trng
    PID  14:  260 k llio
    PID  15:  172 k susres
    PID  16:  236 k codec
    PID  17:  176 k sha2
    PID  18:  200 k engine-25519
    PID  19:  220 k spinor
    PID  20:  832 k root-keys
    PID  21:  168 k jtag
    PID  22:  816 k net
    PID  23:  312 k dns
    PID  24: 1156 k pddb
    PID  25:  448 k modals
    PID  26:  268 k usb-device-xous
    PID  27: 1016 k vault
11972 k total

Without

    PID   1:  628 k kernel
    PID   2:  396 k gam
    PID   3:  780 k status
    PID   4:  692 k shellchat
    PID   5:  268 k ime-frontend
    PID   6:  164 k ime-plugin-shell
    PID   7:  488 k graphics-server
    PID   8:  208 k ticktimer-server
    PID   9:  128 k log-server
    PID  10:  228 k com
    PID  11:  176 k xous-names
    PID  12:  196 k keyboard
    PID  13:  184 k trng
    PID  14:  260 k llio
    PID  15:  172 k susres
    PID  16:  236 k codec
    PID  17:  176 k sha2
    PID  18:  200 k engine-25519
    PID  19:  220 k spinor
    PID  20:  828 k root-keys
    PID  21:  168 k jtag
    PID  22:  828 k net
    PID  23:  312 k dns
    PID  24: 1148 k pddb
    PID  25:  412 k modals
    PID  26:  272 k usb-device-xous
    PID  27: 1096 k vault
10864 k total

The good news is the bloat seems to be gone in status and modals!

Most of the weight now sits in gam and graphics-server which is where you'd expect it to go. I'm not quite sure how you did it but that's awesome.

@bunnie
Copy link
Member

bunnie commented Aug 4, 2022

Been wrestling with the network stack the past few days, and finally got it working well enough to do this:

photo1659621300

slowly coming together...

@bunnie
Copy link
Member

bunnie commented Aug 5, 2022

photo1659687130

Actually, I was on the wrong branch. This is what it looks like with your branch.

The performance is pretty slow -- takes maybe...3-5 minutes? to decode the PNG. I need to wrap a timer around it because my attention span isn't long enough to time it. But the memory usage seems pretty spot on: heap didn't go over 128k, which is really good.

It's the usual trade-off tho, isn't it? speed versus space. :-/

@bunnie
Copy link
Member

bunnie commented Aug 5, 2022

Here's the runtime breakdown:

INFO:shellchat::cmds::net_cmd: fetching response.... (services\shellchat\src\cmds\net_cmd.rs:315)
INFO:shellchat::cmds::net_cmd: trying to read entire response... (services\shellchat\src\cmds\net_cmd.rs:317)
INFO:shellchat::cmds::net_cmd: "HTTP/1.1 200 OK\r\n" (services\shellchat\src\cmds\net_cmd.rs:347)
INFO:shellchat::cmds::net_cmd: "Date: Fri, 05 Aug 2022 07:57:46 GMT\r\n" (services\shellchat\src\cmds\net_cmd.rs:347)
...[snip]...
INFO:shellchat::cmds::net_cmd: "Content-Type: image/png\r\n" (services\shellchat\src\cmds\net_cmd.rs:347)
INFO:shellchat::cmds::net_cmd: attr: content-type,  image/png
 (services\shellchat\src\cmds\net_cmd.rs:351)
INFO:shellchat::cmds::net_cmd: found end of header after 10 lines. (services\shellchat\src\cmds\net_cmd.rs:336)
INFO:shellchat::cmds::net_cmd: heap size: 65536 (services\shellchat\src\cmds\net_cmd.rs:364)
INFO:gam::bitmap::decode_png: DecodePng ready: size(313,245) bit-depth=8 color_type=2 bytes_per_line=940 (services\gam\src\bitmap\decode_png.rs:126)
INFO:shellchat::cmds::net_cmd: png decoded in 10371ms (services\shellchat\src\cmds\net_cmd.rs:367)
INFO:shellchat::cmds::net_cmd: bitmap created (services\shellchat\src\cmds\net_cmd.rs:373)
INFO:gam::bitmap: rotate image (services\gam\src\bitmap.rs:183)
INFO:shellchat::cmds::net_cmd: converted to bitmap in 492825ms (services\shellchat\src\cmds\net_cmd.rs:376)
INFO:shellchat::cmds::net_cmd: heap size: 131072 (services\shellchat\src\cmds\net_cmd.rs:378)
INFO:shellchat::cmds::net_cmd: showing image (services\shellchat\src\cmds\net_cmd.rs:380)
INFO:gam::modal::image: bitmap height 312 : margin 8 (services\gam\src\modal\image.rs:35)
INFO:gam::modal::image: bitmap height 312 : margin 8 (services\gam\src\modal\image.rs:35)
INFO:gam::modal::image: drawing bitmap (services\gam\src\modal\image.rs:44)
INFO:graphics_server: ClipObject size: 4124 (services\graphics-server\src\lib.rs:223)
INFO:graphics_server: ClipObject size: 4124 (services\graphics-server\src\lib.rs:223)
INFO:graphics_server: ClipObject size: 4124 (services\graphics-server\src\lib.rs:223)
INFO:gam::modal::image: bitmap height 312 : margin 8 (services\gam\src\modal\image.rs:35)
INFO:gam::modal::image: drawing bitmap (services\gam\src\modal\image.rs:44)
...[snip]...
INFO:shellchat::cmds::net_cmd: done in 18482 ms (services\shellchat\src\cmds\net_cmd.rs:383)

The decode took 10 seconds; the conversion to bitmap took 492 seconds (about 8.2 minutes); the modal display was...less than 18 seconds, but it includes the time it took for me to notice it finished and dismiss the modal, so it could be much much faster than that.

Anyways, the long pole in the tent is the conversion to the bitmap, stuff behind this line of code:

let bm = gam::Bitmap::from_png(&mut png, Some(modal_size));

No big worries, I'm quite pleased with where this is all going tbh. Performance tuning is much easier imho than space tuning, I'm betting there's probably just an issue where data is being sent between processes or something and a little shuffling of abstractions will clear it all up. I'll have a look later, for now, just getting the net stack working to the point where I could decode the images reliably was surprisingly hard.

@nworbnhoj
Copy link
Contributor Author

Actually, I was on the wrong branch. This is what it looks like with your branch.

Yep that looks like the correct branch. Firstly the image is aligned correctly in the modal, and secondly there is some distortion in the image in a couple of places (behind bunny and above bunnies back) that is some problem with my png decode implementation. Have to chase that down at some point with fresh eyes.

The performance is pretty slow -- takes maybe...3-5 minutes? to decode the PNG.

Yep that is pretty slow!

But the memory usage seems pretty spot on: heap didn't go over 128k, which is really good. It's the usual trade-off tho, isn't it? speed versus space. :-/

Yep - that is the trade-off - but we now have functional image processing machinery that fits in the Precursor footprint. There are some easy blunt tools available for tuning (IDAT_BUFFER_LENGTH INFLATED_BUFFER_LENGTH) and undoubtedly some code optimisations.

The decode took 10 seconds; the conversion to bitmap took 492 seconds (about 8.2 minutes); the modal display was...less than 18 seconds,

I need to drill into these timings in much more detail. Where exactly did you place these checkpoints? And, how did you go about inserting the timer? (to save me reinventing the wheel)

getting the net stack working to the point where I could decode the images reliably was surprisingly hard.

Well done on the net-stack. The net-stack issues were going to come to light sooner or later - so it is undoubtedly better to have the hard work done sooner, and have it largely (hopefully) behind us.

@nworbnhoj
Copy link
Contributor Author

I looked at the log above a little harder and I think I can make some guesses

INFO:shellchat::cmds::net_cmd: attr: content-type,  image/png
 (services\shellchat\src\cmds\net_cmd.rs:351)
INFO:shellchat::cmds::net_cmd: found end of header after 10 lines. 

At this point we have consumed the entire Tcp header and know to expect a png.

(services\shellchat\src\cmds\net_cmd.rs:336)
INFO:shellchat::cmds::net_cmd: heap size: 65536 (services\shellchat\src\cmds\net_cmd.rs:364)
INFO:gam::bitmap::decode_png: DecodePng ready: size(313,245) bit-depth=8 color_type=2 bytes_per_line=940 (services\gam\src\bitmap\decode_png.rs:126)
INFO:shellchat::cmds::net_cmd: png decoded in 10371ms 

At this point we have read only the first IHDR chunk of the png and it has provided us with some basic info regards the png data that will follow in the IDAT chunks

INFO:shellchat::cmds::net_cmd: bitmap created (services\shellchat\src\cmds\net_cmd.rs:373)
INFO:gam::bitmap: rotate image (services\gam\src\bitmap.rs:183)
INFO:shellchat::cmds::net_cmd: converted to bitmap in 492825ms (services\shellchat\src\cmds\net_cmd.rs:376)
INFO:shellchat::cmds::net_cmd: heap size: 131072 (services\shellchat\src\cmds\net_cmd.rs:378)

At this point we have consumed all of the png data from the IDAT chunks AND completed the entire conversion to gam::Bitmap pixel by pixel.

I think I can conclude from this that the choke-point is actually in the process of reading bytes from the network. This is because it has taken 10sec to read the TCP header and the PNG IHDR chuck - which only amounts to well less than 1k bytes.

So I suspect reading one u8 at a time from the TcpStream is the culprit. And, this little comment in the code might hold the solution:

// use std::io::BufRead; // not implemented yet!

@bunnie
Copy link
Member

bunnie commented Aug 6, 2022

Huh interesting! Here's the code used to generate the log:

https:/bunnie/xous-core/blob/d0f99ea44d862f83aa6368d9050e5298f2d9533d/services/shellchat/src/cmds/net_cmd.rs#L363-L384

I didn't realize it was just streaming down byte-by-byte. Yes, that would take a very long time to download.

I think BufRead might work, now that the underlying TcpStream works better, but I don't actually have a test case to check it on hand.

@bunnie
Copy link
Member

bunnie commented Aug 6, 2022

Yep, confirmed BufReader works now! yay.

@nworbnhoj
Copy link
Contributor Author

The change to BufReader made <10% improvement in hosted mode.

@bunnie
Copy link
Member

bunnie commented Aug 7, 2022

I pulled in your latest BufReader changes, and the performance is much better now:

INFO:shellchat::cmds::net_cmd: png decoded in 621ms (services\shellchat\src\cmds\net_cmd.rs:365)
INFO:shellchat::cmds::net_cmd: bitmap created (services\shellchat\src\cmds\net_cmd.rs:371)
INFO:gam::bitmap: rotate image (services\gam\src\bitmap.rs:183)
INFO:shellchat::cmds::net_cmd: converted to bitmap in 25666ms (services\shellchat\src\cmds\net_cmd.rs:374)

  • Decode time goes from 10371 ms -> 621 ms (94% improvement)
  • Render time goes from 492825ms -> 25666ms (95% improvement)

That's a nice victory lap for fixing BufReader. 🎉

I'd say the decoder is fully usable now. The rendering needs a little more tweaking but a 19.2x speed improvement is already huge.

The checksum calc is inner-loop so removed for performance gain.

If a hacker can manipulate a byte then then they can manipulate a RGB
byte or a checksum byte - so the checksum doesn't really give any
protection from hacking afaik. I presume the checksum is in there to
detect corruption during storage or transmission - but if a couple of
bytes are corrupted you will not notice the change in the image - and
if a lot of bytes are corrupted then the distortion in the image will be
obvious. This is even more the case in Precursor, where we are shrinking
and dithering the image as a matter of course. So I don't really see the
point of the checkum
png filter specification calls for initial prior_line = [0u8]
see: https://www.w3.org/TR/PNG/#9Filters
but [0u8] results in distortion ???????
so use [125u8]
@nworbnhoj
Copy link
Contributor Author

I ran some quick tests in hosted mode:

  • The time to convert local bunny.png file to gam::bitmap was around 100ms
  • The time to convert tcp bunny.png to gam::bitmap was 500-800ms
  • The time to get tcp bunny.png then read and drop each byte was 600-900ms
    My internet connection is pretty variable but the tcp step is taking much longer than the png-bitmap conversion in hosted mode at least.

@bunnie
Copy link
Member

bunnie commented Aug 9, 2022

I think those benchmarks in hosted mode are in agreement with what would happen on hardware -- my guess is the PNG decoder is fine, what's broken is the TCP performance. Sussing out that performance issue is for another day, though...

@bunnie bunnie marked this pull request as ready for review August 10, 2022 11:00
@bunnie bunnie merged commit 5e1145a into betrusted-io:main Aug 10, 2022
@nworbnhoj nworbnhoj deleted the iter branch August 10, 2022 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants