
Extremely slow for large archives #61

Open

AntoniosBarotsis opened this issue Sep 9, 2024 · 7 comments

@AntoniosBarotsis

As the title mentions, when I try to decompress a large archive (~11 GB, 800k+ files according to Windows File Explorer) it takes a very long time; after letting it run for just under an hour, I cancelled it. In contrast, the unrar binary that comes with a WinRAR installation took 12 minutes for the same archive. Considering this crate is a wrapper, I should be able to get nearly identical performance in both cases. Note that this was run in release, as well as with the following profile:

[profile.release]
lto = true
codegen-units = 1
opt-level = 3
strip = true

It is very possible that the code I used is wildly suboptimal; I can't tell.

This is coming from ouch-org/ouch#714 after I did my own testing and arrived at the numbers I mentioned above. You can find the code here. It reads in a test.rar archive and extracts all files to a test directory while keeping track of the number of files extracted.
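
For reference, a minimal single-threaded extraction loop with this crate looks roughly like the sketch below, based on the crate's open_for_processing API (extract_with_base and is_file are assumed from the crate's documented interface; the test.rar input and test output directory match the setup above):

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut archive = unrar::Archive::new("test.rar").open_for_processing()?;
    let mut extracted = 0usize;
    while let Some(header) = archive.read_header()? {
        archive = if header.entry().is_file() {
            extracted += 1;
            // extract_with_base writes the entry underneath the given directory
            header.extract_with_base("test")?
        } else {
            header.skip()?
        };
    }
    println!("extracted {extracted} files");
    Ok(())
}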

@muja
Owner

muja commented Sep 9, 2024

I've tried with the Linux codebase; it's ~2 GiB (528 MiB packed) with ~100k files. Extracting that rar file with the library takes over a minute for me. Extracting with rar x takes 6 seconds. I've tried different methods, both the basic_extract example and the more low-level unrar_sys/examples/lister.rs, which does nothing but call the FFI. There were multiple suspects on my list, but I've ruled them all out; even when not extracting at all and just calling test on every single entry, it takes 60 seconds.
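
For anyone reproducing that last measurement, a rough timing harness over a skip-only pass (no extraction, just walking the entries) captures the per-entry overhead; this is a sketch, with linux.rar standing in for the archive under test:

use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let start = Instant::now();
    let mut archive = unrar::Archive::new("linux.rar").open_for_processing()?;
    let mut entries = 0usize;
    while let Some(header) = archive.read_header()? {
        entries += 1;
        // skip() advances the cursor past the entry without extracting it
        archive = header.skip()?;
    }
    println!("visited {entries} entries in {:?}", start.elapsed());
    Ok(())
}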

This is obviously unacceptable but also impossible to fix from my side. Maybe there is a regression with a recent version, so my next approach would be to check out older versions and see if they yield the same results.

After that, there's not much I can do except mail the DLL authors or RIIR™, but this is just a fun side project that no longer has any use for me, so investing too much time is not really an option.

I'll report back

@justbispo

> Maybe there is a regression with a recent version

The first version of ouch that used this library was v0.5.0, which uses v0.5.2 of unrar.rs. I've tried this version of ouch and the same slow decompression speed exists, so if there was a regression, it wasn't recent.

@muja
Owner

muja commented Sep 10, 2024

I found out why unrar x is so fast: it uses multithreading (11 threads). Unfortunately the library does not provide any parameters for multithreaded extraction, so one must open multiple Archive objects in parallel and split the work among threads. With this very naive implementation I was able to bring the time for the linux.rar from 60+ seconds down to 10-15 seconds, almost on par with unrar x:

const NUM_THREADS: u32 = 16;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = std::env::args().nth(1).unwrap_or_else(|| "archive.rar".to_owned());
    let mut handles = Vec::with_capacity(NUM_THREADS as usize);
    for i in 0..NUM_THREADS {
        let file = file.clone();
        let handle = std::thread::spawn(move || {
            // Each thread opens its own handle on the same archive file.
            let mut archive = unrar::Archive::new(&file).open_for_processing()?;
            while let Some(header) = archive.read_header()? {
                // Shard entries across threads by CRC: every thread walks all
                // headers but only extracts the entries assigned to it.
                if header.entry().file_crc % NUM_THREADS == i {
                    archive = header.extract()?;
                } else {
                    archive = header.skip()?;
                }
            }
            anyhow::Ok(())
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap()?;
    }
    Ok(())
}

This would have to be done at the application level; designing a concept for this inside the library itself is a bit harder.
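
If the fan-out stays at the application level, the thread count probably shouldn't be hard-coded the way NUM_THREADS is above; std::thread::available_parallelism gives a reasonable default. A small sketch (worker_count is a hypothetical helper, not part of this crate):

use std::thread;

fn worker_count() -> u32 {
    // Fall back to a single worker if parallelism cannot be queried.
    thread::available_parallelism()
        .map(|n| n.get() as u32)
        .unwrap_or(1)
}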

@AntoniosBarotsis
Author

Nice!

I'm confused by the fact that the library doesn't allow for multithreading, though; doesn't the unrar binary use that same library? How does it do multithreading?

@muja
Owner

muja commented Sep 11, 2024

> Nice!
>
> I'm confused by the fact that the library doesn't allow for multithreading, though; doesn't the unrar binary use that same library? How does it do multithreading?

The binary doesn't use the extern "C" DLL functions that we have to use; it interacts directly with the C++ objects, so I'm assuming it can do more there, even though I haven't looked at exactly how it achieves multithreading.

@muja
Owner

muja commented Oct 7, 2024

@AntoniosBarotsis is there anything you feel has to be done on the library side? Otherwise we can close this, right?

@AntoniosBarotsis
Author

I haven't had the time to properly look into this, but I guess not. Keeping it open for anyone who stumbles on the same issue is also an option, though; up to you.
