Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate the copying of blobs when serving reads from the cache #10297

Closed
wants to merge 11 commits into from

Conversation

gangliao
Copy link
Contributor

@gangliao gangliao commented Jul 1, 2022

Summary:

The blob cache enables an optimization on the read path: when a blob is found in the cache, we can avoid copying it into the buffer provided by the application. Instead, we can simply transfer ownership of the cache handle to the target PinnableSlice. (Note: this relies on the Cleanable interface, which is implemented by PinnableSlice.)

This has the potential to save a lot of CPU, especially with large blob values.

This task is a part of #10156

Summary:

The blob cache enables an optimization on the read path: when a blob is found in the cache, we can avoid copying it into the buffer provided by the application. Instead, we can simply transfer ownership of the cache handle to the target `PinnableSlice`. (Note: this relies on the `Cleanable` interface, which is implemented by `PinnableSlice`.)

This has the potential to save a lot of CPU, especially with large blob values.
db/blob/blob_source.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

// application, we can simply transfer ownership of the cache handle to
// the target PinnableSlice. This has the potential to save a lot of
// CPU, especially with large blob values.
blob_entry.TransferTo(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there's a better way to do this. We could consider adding a PinnableSlice::PinSlice overload that takes a Slice and a CachableEntry; however, this would involve exposing CachableEntry (which is currently an internal/implementation class) in the public API. So we probably wouldn't want to do that.

What I'm thinking is using the existing PinSlice(const Slice& s, CleanupFunction f, void* arg1, void* arg2) overload (with ReleaseCacheHandle as the fn, and the cache and cache handle as args, like below; note that we're guaranteed to be dealing with a cached value here, so there is no need for an "own value" branch), and then make the CachableEntry relinquish ownership of the handle by calling blob_entry.TransferTo(nullptr);. What do you guys think? Bit of a hack I suppose (and we would definitely want to explain what's happening in a comment) but we wouldn't have to add a TransferTo overload to CachableEntry for just this specific use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, since there is no "own value" case here, we could also consider moving from CachableEntry to CacheHandleGuard. We would have to add a TransferTo(Cleanable*) method to CacheHandleGuard though b/c it doesn't have one rn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semi-related to this: we could also think about eliminating the copy that happens when we insert a blob into the cache (i.e. the string::assign that happens in BlobSource::PutBlobIntoCache). This would most likely involve a change to BlobFileReader's interface but that shouldn't be a big deal now that we always read blobs via the BlobSource layer. But this is definitely for later (could be a separate follow-up task)

Copy link
Contributor Author

@gangliao gangliao Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm thinking is using the existing PinSlice(const Slice& s, CleanupFunction f, void* arg1, void* arg2) overload (with ReleaseCacheHandle as the fn, and the cache and cache handle as args, like below;

I've thought about it this way before. But ReleaseCacheHandle is private. So we need to expose it, right?
Or, we plan to use a lambda function in here...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can use a lambda; I think it's cleaner than taking a dependency on CachableEntry just for the purposes of reusing ReleaseCacheHandle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. When I first thought about this, the ReleaseCacheHandle being private was my concern. I am fine with either.

Copy link
Contributor Author

@gangliao gangliao Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also think about eliminating the copy that happens when we insert a blob into the cache (i.e. the string::assign that happens in BlobSource::PutBlobIntoCache). This would most likely involve a change to BlobFileReader's interface but that shouldn't be a big deal now that we always read blobs via the BlobSource layer. But this is definitely for later (could be a separate follow-up task)

If so, who will take ownership? Oh I see. That's really similar to eliminate the copying of blobs when serving reads

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: an additional stretch task for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, who will take ownership? Oh I see. That's really similar to eliminate the copying of blobs when serving reads

We could look at e.g. how BlockContents or UncompressionDict are used for inspiration. BTW, as part of this additional task, we could also address this TODO: https:/facebook/rocksdb/blob/main/db/blob/blob_source.cc#L68-L69

@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants