allocate_vec returns uninitialised memory #139

kaedroho · 2017-05-09T11:06:15Z

https:/tantivy-search/tantivy/blob/0dad02791c74e572ec005abb1f75e3601bd25db5/src/common/mod.rs#L31-L41

This could lead to a number of problems if not used carefully, but most Rust developers would not expect this behaviour from a "safe" function.

I really think this should be replaced with something safer, or at least marked as unsafe.


fulmicoton · 2017-05-09T13:23:21Z

Interesting...

A bit of background there.

Initializing a big Vec in Rust is dead slow in debug mode. I started using the unsafe set_len so that the unit tests would run in a manageable time.

> This could lead to a number of problems if not used carefully, but most Rust developers would not expect this behaviour from a "safe" function.

This is unsafe because it is, by definition, "undefined behavior". In my experience, however, zeroing out vectors that you use, for instance, as a memory arena can be even more dangerous. In C++ it is one of the common reasons code fails in optimized mode while seeming to work perfectly (until when, though?) in debug mode.

I also see how it can be a security issue (possibly exposing stale data from elsewhere within the same process)...

@kaedroho and @BurntSushi, I would be very interested if you could educate me on the subject. At the moment my opinion is really "In practice, I don't care." and "Zero-initialization has actually hidden some bugs from me in the past; I would have preferred things to fail fast".
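The fail-fast goal does not strictly require uninitialized memory. Below is a minimal sketch of one safe alternative, assuming a hypothetical helper named allocate_buffer (it is not tantivy's code): debug builds fill the buffer with a conspicuous non-zero byte, so code that reads bytes it never wrote tends to produce obviously wrong values, while release builds get plain zeroes. Everything stays in safe Rust, so there is no undefined behavior in either mode.

```rust
/// Byte pattern used in debug builds; 0xAA is an arbitrary, easy-to-spot choice.
const POISON: u8 = 0xAA;

/// Hypothetical replacement for allocate_vec: always fully initialized, but
/// deliberately not zeroed in debug builds, so reading bytes that were never
/// written tends to fail loudly instead of silently "working".
fn allocate_buffer(len: usize) -> Vec<u8> {
    let fill = if cfg!(debug_assertions) { POISON } else { 0u8 };
    vec![fill; len]
}
```

The trade-off is that debug builds still pay for the fill, so this addresses the "hidden bugs" concern but not the debug-mode speed problem.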

BTW, tantivy has a lot of much more unsafe code in its indexer, for performance reasons. If you have time, I'd be glad to discuss it with you and see if you can help me figure out alternatives.

BurntSushi · 2017-05-09T13:44:12Z

@fulmicoton To clear the air: allocate_vec as linked in this issue is wrong. Its implementation either needs to be completely safe or it needs to be marked as an unsafe function. Leaving it as-is is just about the worst sin one can commit in Rust. (That is, marking something as safe that isn't actually safe.)

Consider this example: https://is.gd/ipfyEu --- If you run it, you'll silently get nonsense data. This is because your main function contains undefined behavior, and the fact that there is no unsafe there is really bad. You should never have undefined behavior in safe code. If you do (and you do), it's a bug.

> Initializing a big Vec in Rust is dead slow in debug mode. I started using the unsafe set_len so that the unit tests would run in a manageable time.

You might consider turning on optimizations for running tests. It will increase compile times, of course.

Popping up a level, I will say this: I think justifying unsafe with debug mode improvements is not worth it, even if you made allocate_vec safe. I'd encourage you to either write your test to use smaller data, or to run that test with optimizations on.

> BTW, tantivy has a lot of much more unsafe code in its indexer, for performance reasons. If you have time, I'd be glad to discuss it with you and see if you can help me figure out alternatives.

Can you point to the files? I see src/datastruct/stacker/heap.rs, and that appears to have some interesting uses of unsafe.

I will also say this:

- I try very hard to have an actual benchmark that justifies each use of unsafe.
- I try very hard to document every use of unsafe with an argument for why it is safe. Example.
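As an illustration of the second practice, here is a small sketch (a hypothetical function, not taken from tantivy or from BurntSushi's crates) in which the unsafe block carries a SAFETY comment stating the invariant that makes it sound. The range is validated once up front so that per-element bounds checks can be skipped; whether that actually pays off is exactly what the benchmark from the first practice should decide.

```rust
/// Sums data[start..start + n], panicking if the range is out of bounds.
fn sum_range(data: &[u32], start: usize, n: usize) -> u64 {
    // Validate the whole range once; the unsafe block below relies on this.
    let end = start.checked_add(n).expect("range overflow");
    assert!(end <= data.len(), "range out of bounds");

    let mut total = 0u64;
    for i in start..end {
        // SAFETY: start <= i < end, and end <= data.len() was asserted above,
        // so i is always a valid index into data.
        total += u64::from(unsafe { *data.get_unchecked(i) });
    }
    total
}
```

In this particular case, iterating over data[start..end] would avoid both the unsafe block and the per-element checks, which is the kind of safe alternative a benchmark-first rule tends to surface.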

fulmicoton · 2017-05-09T14:20:15Z

> Leaving it as-is is just about the worst sin one can commit in Rust.

Well... This is not a public API. Surely having it as a public API would be an even greater sin. :)

@BurntSushi thanks for the explanation.

I'll see how I can improve that. At the very least, I will mark the function as unsafe and restrict its usage to Vec<u8>.
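If the function stays but becomes unsafe, the usual convention is a "# Safety" doc section spelling out what the caller must guarantee. A minimal sketch, assuming a hypothetical signature (assume_init_len is not tantivy's API) in which the caller initializes the spare capacity before the length is extended, so that set_len's own precondition (the new elements must be initialized) is actually met:

```rust
use std::ptr;

/// Marks the first len bytes of buf's allocation as part of the vector.
///
/// # Safety
///
/// * len must not exceed buf.capacity().
/// * Every byte of buf[..len] must already have been initialized by the
///   caller (e.g. via ptr::copy_nonoverlapping or ptr::write_bytes).
unsafe fn assume_init_len(mut buf: Vec<u8>, len: usize) -> Vec<u8> {
    debug_assert!(len <= buf.capacity());
    // SAFETY: forwarded to the caller through the contract above.
    unsafe { buf.set_len(len) };
    buf
}

/// Example caller: copies src into fresh capacity without zeroing it first.
fn copy_into_new_vec(src: &[u8]) -> Vec<u8> {
    let mut buf: Vec<u8> = Vec::with_capacity(src.len());
    // SAFETY: buf has capacity for src.len() bytes, the regions cannot
    // overlap (buf is a fresh allocation), and after the copy every byte of
    // buf[..src.len()] is initialized, which satisfies assume_init_len.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), buf.as_mut_ptr(), src.len());
        assume_init_len(buf, src.len())
    }
}
```

For this particular caller, src.to_vec() gives the same result in safe code; the sketch only shows where the safety obligation lives once the function is marked unsafe.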

> I think justifying unsafe with debug mode improvements is not worth it, even if you made allocate_vec safe. I'd encourage you to either write your test to use smaller data, or to run that test with optimizations on.

I guess I am not very patient as a person.

Yes, src/datastruct/stacker/* is the most complicated one. I am in the middle of simplifying it, but it will keep a lot of unsafe code for the memory arena.

The memory arena is there both for performance (bump allocation, and freeing everything at once instead of deallocating piece by piece) and for functionality.
Each thread works "in memory" until it reaches a limit. In the first versions of tantivy, the user had to express this limit in terms of "number of documents". Tantivy now lets you define the limit as a number of MB of anonymous memory used. (This amount of memory is then split across the indexing threads.)
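For readers unfamiliar with the pattern, here is a minimal safe-Rust sketch of a bump arena with a fixed memory budget. It is an illustration under assumptions, not tantivy's heap.rs: allocations are handed out as offsets into one preallocated Vec<u8> rather than as raw pointers, allocating is a bounds check plus an offset bump, and everything is released at once by reset. The per-thread budget is simply the total divided by the number of indexing threads; BumpArena and per_thread_budget are hypothetical names.

```rust
/// Hypothetical bump arena: a fixed-size buffer plus a "next free" offset.
struct BumpArena {
    buf: Vec<u8>,
    next: usize,
}

impl BumpArena {
    fn with_budget(bytes: usize) -> Self {
        BumpArena { buf: vec![0u8; bytes], next: 0 }
    }

    /// Reserves len bytes and returns their starting offset, or None when
    /// the budget is exhausted (the point at which a thread would flush).
    fn alloc(&mut self, len: usize) -> Option<usize> {
        let start = self.next;
        let end = start.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        self.next = end;
        Some(start)
    }

    fn slice_mut(&mut self, offset: usize, len: usize) -> &mut [u8] {
        &mut self.buf[offset..offset + len]
    }

    fn used(&self) -> usize {
        self.next
    }

    /// Frees every allocation at once. Old bytes remain in the buffer, but
    /// they are initialized, so reading them is still well-defined.
    fn reset(&mut self) {
        self.next = 0;
    }
}

/// Splits a total memory budget evenly across indexing threads.
fn per_thread_budget(total_bytes: usize, num_threads: usize) -> usize {
    total_bytes / num_threads.max(1)
}
```

Using offsets instead of pointers costs an index computation per access, but keeps the arena free of unsafe; whether that overhead matters for an indexer is a benchmarking question.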

Another usage is in the BitPacker reader. There are two uses of unsafe there that are probably uncalled for.
https:/tantivy-search/tantivy/blob/master/src/common/bitpacker.rs

I use a *const u8 pointer to the buffer. If I recall correctly, I ended up doing that as a workaround for the "self-borrowing struct" problem.
I also read through a *const u64 pointer to do the bit unpacking.

I have no elegant solution to the first part of the problem, and the solution to the second problem is probably somewhere in the byteorder crate.
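One possible safe shape for such a reader, sketched under assumptions: the struct borrows the buffer with a lifetime instead of holding a raw pointer (this only helps when the bytes are owned by something that outlives the reader, which may not hold in tantivy's case), and the 64-bit read uses the standard library's u64::from_le_bytes instead of dereferencing a *const u64. That swaps in std for the byteorder crate mentioned above. The bit layout (little-endian, values packed least-significant-bit first) is also an assumption and may not match tantivy's format; BitUnpacker here is a hypothetical type.

```rust
/// Hypothetical bit-unpacking reader over a borrowed byte buffer.
struct BitUnpacker<'a> {
    data: &'a [u8],
    num_bits: usize,
    mask: u64,
}

impl<'a> BitUnpacker<'a> {
    fn new(data: &'a [u8], num_bits: usize) -> Self {
        // The in-byte shift is at most 7, so num_bits + 7 must fit in 64 bits.
        assert!(num_bits <= 57, "this sketch reads one u64 per value");
        let mask = if num_bits == 0 { 0 } else { (1u64 << num_bits) - 1 };
        BitUnpacker { data, num_bits, mask }
    }

    /// Returns the idx-th packed value. Out-of-range indices panic through
    /// ordinary slice bounds checks instead of reading past the buffer.
    fn get(&self, idx: usize) -> u64 {
        let bit_pos = idx * self.num_bits;
        let byte_pos = bit_pos / 8;
        let bit_shift = bit_pos % 8;
        // Copy up to 8 bytes into a zero-padded array, so values near the end
        // of the buffer can be read without an unaligned out-of-bounds load.
        let mut word = [0u8; 8];
        let available = self.data.len().saturating_sub(byte_pos).min(8);
        word[..available].copy_from_slice(&self.data[byte_pos..byte_pos + available]);
        (u64::from_le_bytes(word) >> bit_shift) & self.mask
    }
}
```

The copy into a local array replaces the unaligned pointer read; for hot loops it may be worth benchmarking against a version that only pads the buffer once at construction time.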

> - I try very hard to have an actual benchmark that justifies each use of unsafe.
> - I try very hard to document every use of unsafe with an argument for why it is safe.

This sounds like great practice. I'll try to do the same, but I don't think I'll ever reach your degree of discipline / quality of code.

kaedroho · 2017-05-09T15:56:20Z

If you want to go down a safer route, it might be possible to refactor all the places that use allocate_vec(size) to use vec![0u8; size] instead. This syntax benefits from an upcoming optimisation which might solve the performance issue in debug mode (rust-lang/rust#40409).
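The suggested refactor is mechanical. A minimal sketch with hypothetical function names: vec![0u8; size] for a fresh buffer, plus clear followed by resize for reusing an existing buffer across iterations so its allocation is kept. Both are safe regardless of build mode; how fast they are in debug builds depends on the compiler version, which is what the linked rust-lang/rust#40409 concerns.

```rust
/// Hypothetical safe replacement for allocate_vec(size).
fn allocate_zeroed(size: usize) -> Vec<u8> {
    vec![0u8; size]
}

/// Reuses buf's existing allocation: clears it, then zero-fills it to size.
fn reset_zeroed(buf: &mut Vec<u8>, size: usize) {
    buf.clear();
    buf.resize(size, 0);
}
```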

fulmicoton · 2017-05-10T00:28:35Z

@kaedroho @BurntSushi I very recently found (#136) two horrible design choices that were extremely expensive in terms of memory (the easiest one to explain is that there is currently one hashmap per field).

Once this is fixed, the minimal memory usage in the unit tests should be much smaller, and it should be fine to do buffer allocation the "safe way".

@kaedroho Thanks for linking #40409; I didn't know a fix was on the way. This is awesome.

BurntSushi · 2017-05-10T00:33:07Z

@fulmicoton Thanks for being receptive. :-) It's crucial that awesome projects like this don't abuse unsafe! :-)


fulmicoton added a commit that referenced this issue May 11, 2017

FIXED inconsistent Term's field serialization.

c4d78e8

Also. Cleaned up the code to make sure that the logic is only in one place. Removed allocate_vec Closes #141 Closes #139 Closes #142 Closes #138

fulmicoton added a commit that referenced this issue May 11, 2017

FIXED inconsistent Term's field serialization.

f60d506

Also. Cleaned up the code to make sure that the logic is only in one place. Removed allocate_vec Closes #141 Closes #139 Closes #142 Closes #138

fulmicoton mentioned this issue May 11, 2017

FIXED inconsistent Term's field serialization. #144

Merged

fulmicoton closed this as completed in #144 May 11, 2017

fulmicoton added a commit that referenced this issue May 11, 2017

FIXED inconsistent Term's field serialization.

477b913

Also. Cleaned up the code to make sure that the logic is only in one place. Removed allocate_vec Closes #141 Closes #139 Closes #142 Closes #138

fulmicoton added this to the 0.4.0 milestone Jul 13, 2017
