Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmallVec<[T; N]> is often larger than it needs to be #219

Closed
thomcc opened this issue Apr 30, 2020 · 10 comments
Closed

SmallVec<[T; N]> is often larger than it needs to be #219

thomcc opened this issue Apr 30, 2020 · 10 comments

Comments

@thomcc
Copy link
Contributor

thomcc commented Apr 30, 2020

Currently a usize is used to store the size when inline, but ideally if N < 256 a u8 would be used for the size, and u16 for N < 65536, etc.

@l0calh05t
Copy link
Contributor

This would make sense if SmallVec couldn't grow beyond the bounds of the stack allocation (like arrayvec, for example), but it can and will allocate from the heap if necessary.

@thomcc
Copy link
Contributor Author

thomcc commented Apr 30, 2020

That is somewhat true with the current implementation, but it's certainly not true in general -- you'd just have to move from using a (e.g.) u8 to using a usize when you move from inline to heap allocated.

@thomcc
Copy link
Contributor Author

thomcc commented Apr 30, 2020

The main difference here I guess is with SmallVec<[u8; N]> where N is low. Other cases don't matter quite as much (although might be slightly larger than they would be otherwise), but IME this is a somewhat common case for people who use SmallVec<[u8; N]> as a kind of SmallString.

@l0calh05t
Copy link
Contributor

I still don't quite see how that would work. capacity is used to disambiguate between stack and heap storage and already doubles as the size when not spilled. It would really only work if your maximum capacity should be limited as well.

@thomcc
Copy link
Contributor Author

thomcc commented May 2, 2020

You wouldn't be able to use capacity for that. That said I don't think this would cause really any additional branching or anything -- you already have to branch on representation in each method.

@l0calh05t
Copy link
Contributor

But how would you disambiguate instead? Using an enum? Now you need additional storage for the enum's tag and still need to store two usizes for the heap case. And taking into account alignment, the following enum would take up a minimum of 32 bytes (on a machine with 8-byte pointers and usize):

enum Storage {
	Stack(u8, MaybeUninit<[u8; 4]>),
	Heap(*mut u8, usize, usize),
}

So you could only gain something if the array backing storage is larger than 3*size_of::<usize>() and has an alignment smaller than size_of::<usize>() . (Assuming size_of = align_of for primitives, which is true on most machines)

@moshg
Copy link

moshg commented May 7, 2020

Though I don't know if this can be implemented, we can disambiguate by most significant bit of the pointer, because pointers don't overflow isize.

@pickfire
Copy link

pickfire commented May 7, 2020

Could we do something like what https:/maciejhirsz/beef did?

@thomcc
Copy link
Contributor Author

thomcc commented May 7, 2020

Hmm, on further investigation a lot of this is actually fixed (in part, anyway) by the union feature, since as it is, we have a bunch of space that comes from the enum discriminant.

Or, it's still larger than it needs to be in some cases, but doesn't get better by doing what I suggested (I suspect if/when rustc gets more aggressive niche-filling optimizations that might change, but obviously this could be revisited if that ever happens and it makes a difference).

That said, I'm still not sure why this uses the size to differentiate than storing everything in the enum, which I think would eliminate the extra size overhead on stable (or am I mistaken again...), but I guess the preference (mentioned in #183) is to wait for unions to stabilize.

@thomcc
Copy link
Contributor Author

thomcc commented May 7, 2020

Regarding beef, a quick look says that it's doing things which limit size/capacity (on the lean types) to u32, which is a tradeoff that might be undesirable. (I mean there probably exists some user of smallvec who cares about this, and it's not something I'd be willing to argue should change...)

There might be more too -- it mentions it only being 64-bit compatible and there are a lot of tricks you can do by taking advantage of the somewhat peculiar way that the address space is laid out on x86_64 and aarch64 ... I'm not sure this sort of thing is really worth the hassle for smallvec though. My gut would be to say "no not really".


Anyway I'm just going to close it since I filed it due to a misunderstanding. I still wish SmallVec was, err, smaller though >_>

@thomcc thomcc closed this as completed May 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants