Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ImoHash.SumFile returns "Seek: invalid offset" error if called on a file with a size that is greater than sampleThreshold but less than sampleSize #15

Closed
Hakkin opened this issue Aug 20, 2024 · 1 comment · Fixed by #16

Comments

@Hakkin
Copy link

Hakkin commented Aug 20, 2024

See schollz/croc#786 (comment)

The offending code is here:

imohash/imohash.go

Lines 107 to 131 in 3f885db

if f.Size() < int64(imo.sampleThreshold) || imo.sampleSize < 1 {
if _, err := io.Copy(imo.hasher, f); err != nil {
return emptyArray, err
}
} else {
buffer := make([]byte, imo.sampleSize)
if _, err := f.Read(buffer); err != nil {
return emptyArray, err
}
imo.hasher.Write(buffer) // these Writes never fail
if _, err := f.Seek(f.Size()/2, 0); err != nil {
return emptyArray, err
}
if _, err := f.Read(buffer); err != nil {
return emptyArray, err
}
imo.hasher.Write(buffer)
if _, err := f.Seek(int64(-imo.sampleSize), 2); err != nil {
return emptyArray, err
}
if _, err := f.Read(buffer); err != nil {
return emptyArray, err
}
imo.hasher.Write(buffer)
}

imohash tries to seek -imo.sampleSize from the end of the reader, but does not verify imo.sampleSize <= f.Size() before doing so, leading to Seek returning an error.

Here's a playground link demonstrating this with Sum (panics rather than returning an error): https://go.dev/play/p/wKCwBAo6nsZ

Note that this situation (seemingly?) isn't possible with the default SampleSize and SampleThreshold, since the default SampleThreshold>SampleSize. It's only an issue if you set custom values and SampleSize>SampleThreshold. I don't know the intricacies of the algorithm, so I'm not sure if it's even correct or makes sense to set the values like this.

@kalafut
Copy link
Owner

kalafut commented Aug 21, 2024

This is a good catch and not a particularly obscure edge case. Let me think about the correct way to handle it. Note: I'm traveling, and a PR might take a few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants