Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UntilPageHeaderReader may capture a position where it's not actually a beginning of a page #15

Open
TonalidadeHidrica opened this issue Nov 15, 2020 · 6 comments

Comments

@TonalidadeHidrica
Copy link
Contributor

Although extremely rare, the OggS magic pattern may occur in the places where a page does not start. Current UntilPageHeaderReader does not check for the integrity it captured, so it may capture a wrong page span and then return false HashMismatch error.

I have an idea of fixing this issue and currently working on it. It will require a large refactoring, and possibly cause breaking changes.

@est31
Copy link
Member

est31 commented Nov 15, 2020

Do you have any real world data that contains OggS? Any real world use case?

Generally the point of a magic is that it's, well, magic and the sequence that allows you to identify the start of a stream.

@est31
Copy link
Member

est31 commented Nov 15, 2020

Is your proposal to silently ignore hash mismatches and continue the search? What if there is a legitimate hash mismatch then? I think it'd be better to error in that instance than to silently omit a packet and mess up higher layers :).

@TonalidadeHidrica
Copy link
Contributor Author

One of the most simple example is when the comment header contains a text "OggS". I agree that finding the pattern in audio packets is rare, probably almost never happen tho.

@TonalidadeHidrica
Copy link
Contributor Author

My proposal in my brain is to silently skip it. If the skipped packet was within a valid packet position but the page simply contains an error, then we can detect it for the next time we captured a valid packet, since we will see a jump in "page sequence no".

@TonalidadeHidrica
Copy link
Contributor Author

My concern is about seeking. When parsing pages sequentially, maybe we don't have to care about it so much. But when capturing a page right after seeking, because of the misdetection we may lose multiple pages covered by the first "false page". This may cause a problem when binary searching, and that's why I am worrying about it. Do you have any idea to handle them both? Or am I thinking too much?

@est31
Copy link
Member

est31 commented Nov 15, 2020

One of the most simple example is when the comment header contains a text "OggS".

Hmm that's a good point. During normal operation it's not an issue I guess, mainly during seeking.

Do you have any idea to handle them both? Or am I thinking too much?

Nono, I think you are onto a good solution here :). It would be great if for the search of a page header you could specify whether you want pages with hash mismatches to be discarded or not, so that seeking ignores invalid hashes and normal operation when pages strictly follow each other would yield errors at a hash mismatch. I think such a change would be an improvement. Not sure if checking the sequence number is needed. PRs welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants