SmallVec<[T; N]> is often larger than it needs to be #219

thomcc · 2020-04-30T20:02:26Z

Currently a usize is used to store the size when inline, but ideally if N < 256 a u8 would be used for the size, and u16 for N < 65536, etc.

The text was updated successfully, but these errors were encountered:

l0calh05t · 2020-04-30T21:07:37Z

This would make sense if SmallVec couldn't grow beyond the bounds of the stack allocation (like arrayvec, for example), but it can and will allocate from the heap if necessary.

thomcc · 2020-04-30T21:09:36Z

That is somewhat true with the current implementation, but it's certainly not true in general -- you'd just have to move from using a (e.g.) u8 to using a usize when you move from inline to heap allocated.

thomcc · 2020-04-30T21:50:36Z

The main difference here I guess is with SmallVec<[u8; N]> where N is low. Other cases don't matter quite as much (although might be slightly larger than they would be otherwise), but IME this is a somewhat common case for people who use SmallVec<[u8; N]> as a kind of SmallString.

l0calh05t · 2020-05-02T16:27:53Z

I still don't quite see how that would work. capacity is used to disambiguate between stack and heap storage and already doubles as the size when not spilled. It would really only work if your maximum capacity should be limited as well.

thomcc · 2020-05-02T17:28:11Z

You wouldn't be able to use capacity for that. That said I don't think this would cause really any additional branching or anything -- you already have to branch on representation in each method.

l0calh05t · 2020-05-02T18:15:36Z

But how would you disambiguate instead? Using an enum? Now you need additional storage for the enum's tag and still need to store two usizes for the heap case. And taking into account alignment, the following enum would take up a minimum of 32 bytes (on a machine with 8-byte pointers and usize):

enum Storage {
	Stack(u8, MaybeUninit<[u8; 4]>),
	Heap(*mut u8, usize, usize),
}

So you could only gain something if the array backing storage is larger than 3*size_of::<usize>() and has an alignment smaller than size_of::<usize>() . (Assuming size_of = align_of for primitives, which is true on most machines)

moshg · 2020-05-07T07:04:07Z

Though I don't know if this can be implemented, we can disambiguate by most significant bit of the pointer, because pointers don't overflow isize.

pickfire · 2020-05-07T10:27:31Z

Could we do something like what https:/maciejhirsz/beef did?

thomcc · 2020-05-07T12:45:35Z

Hmm, on further investigation a lot of this is actually fixed (in part, anyway) by the union feature, since as it is, we have a bunch of space that comes from the enum discriminant.

Or, it's still larger than it needs to be in some cases, but doesn't get better by doing what I suggested (I suspect if/when rustc gets more aggressive niche-filling optimizations that might change, but obviously this could be revisited if that ever happens and it makes a difference).

That said, I'm still not sure why this uses the size to differentiate than storing everything in the enum, which I think would eliminate the extra size overhead on stable (or am I mistaken again...), but I guess the preference (mentioned in #183) is to wait for unions to stabilize.

thomcc · 2020-05-07T12:51:44Z

Regarding beef, a quick look says that it's doing things which limit size/capacity (on the lean types) to u32, which is a tradeoff that might be undesirable. (I mean there probably exists some user of smallvec who cares about this, and it's not something I'd be willing to argue should change...)

There might be more too -- it mentions it only being 64-bit compatible and there are a lot of tricks you can do by taking advantage of the somewhat peculiar way that the address space is laid out on x86_64 and aarch64 ... I'm not sure this sort of thing is really worth the hassle for smallvec though. My gut would be to say "no not really".

Anyway I'm just going to close it since I filed it due to a misunderstanding. I still wish SmallVec was, err, smaller though >_>

thomcc closed this as completed May 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SmallVec<[T; N]> is often larger than it needs to be #219

SmallVec<[T; N]> is often larger than it needs to be #219

thomcc commented Apr 30, 2020 •

edited

Loading

l0calh05t commented Apr 30, 2020

thomcc commented Apr 30, 2020

thomcc commented Apr 30, 2020

l0calh05t commented May 2, 2020

thomcc commented May 2, 2020

l0calh05t commented May 2, 2020

moshg commented May 7, 2020 •

edited

Loading

pickfire commented May 7, 2020

thomcc commented May 7, 2020 •

edited

Loading

thomcc commented May 7, 2020

SmallVec<[T; N]> is often larger than it needs to be #219

SmallVec<[T; N]> is often larger than it needs to be #219

Comments

thomcc commented Apr 30, 2020 • edited Loading

l0calh05t commented Apr 30, 2020

thomcc commented Apr 30, 2020

thomcc commented Apr 30, 2020

l0calh05t commented May 2, 2020

thomcc commented May 2, 2020

l0calh05t commented May 2, 2020

moshg commented May 7, 2020 • edited Loading

pickfire commented May 7, 2020

thomcc commented May 7, 2020 • edited Loading

thomcc commented May 7, 2020

thomcc commented Apr 30, 2020 •

edited

Loading

moshg commented May 7, 2020 •

edited

Loading

thomcc commented May 7, 2020 •

edited

Loading