Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segment Replication - Support realtime reads with GET requests #8536

Closed
Tracked by #6761
mch2 opened this issue Jul 7, 2023 · 3 comments · Fixed by #9212
Closed
Tracked by #6761

Segment Replication - Support realtime reads with GET requests #8536

mch2 opened this issue Jul 7, 2023 · 3 comments · Fixed by #9212
Assignees
Labels
bug Something isn't working discuss Issues intended to help drive brainstorming and decision making distributed framework feedback needed Issue or PR needs feedback Indexing:Replication Issues and PRs related to core replication framework eg segrep v2.10.0

Comments

@mch2
Copy link
Member

mch2 commented Jul 7, 2023

Describe the bug
Today document replication provides the option for "realtime" reads with the GET API that reads and parses source from the translog to serve strong reads. This same option should be supported while using segment replication because each shard copy still has the ability to read the translog, however today the NRTReplicationEngine only reads the index when serving GET requests.

Expected behavior
GET requests "realtime" option to be supported with segment replicaiton.

@mch2
Copy link
Member Author

mch2 commented Jul 9, 2023

Took a closer look at this - WIP mch2@db72e96

To support this on replicas we would need to use a versionMap similar to InternalEngine that tracks a live view of doc uuid to translog location.
There are two issues with this on NRT replicas.

  1. LiveVersionMap is built to clear after every refresh because with docrep all refreshed on docs are now searchable in the index. If the memory footprint of the map gets too large a local refresh will run that clears the map. We can't do this with SR because we only refresh when a new set of external segments arrives. This is similar to why WAIT_UNTIL is not currently supported as a RefreshPolicy during ingest, because refresh listeners will pile up on each shard where no local action can be taken.
  2. The map is completely wiped after every refresh and a new one created. With SR we can't assume all docs in our xlog have arrived with the latest set of segments and would need to preserve up to a certain translog loc/seqNo.

An alternative here is to route any Get req where "realtime" is specified to true to the primary shard in the group. The problem with this is by default all Get requests have realtime=true enabled, so if we do this all Get reqs would only route to primary shards and can be problematic under heavy load. I think providing this option is a better solution though for sensitive requests and if users are performing a high volume of GET requests then document replication would be a better option.

@mch2 mch2 added discuss Issues intended to help drive brainstorming and decision making feedback needed Issue or PR needs feedback and removed untriaged labels Jul 10, 2023
@mch2
Copy link
Member Author

mch2 commented Jul 11, 2023

We will also need to extend this support to MultiGet

@Poojita-Raj
Copy link
Contributor

Poojita-Raj commented Jul 27, 2023

Took a closer look at this - WIP mch2@db72e96

To support this on replicas we would need to use a versionMap similar to InternalEngine that tracks a live view of doc uuid to translog location. There are two issues with this on NRT replicas.

1. LiveVersionMap is built to clear after every refresh because with docrep all refreshed on docs are now searchable in the index.  If the memory footprint of the map gets too large a local refresh will run that clears the map.  We can't do this with SR because we only refresh when a new set of external segments arrives.  This is similar to why WAIT_UNTIL is not currently supported as a RefreshPolicy during ingest, because refresh listeners will pile up on each shard where no local action can be taken.

2. The map is completely wiped after every refresh and a new one created.  With SR we can't assume all docs in our xlog have arrived with the latest set of segments and would need to preserve up to a certain translog loc/seqNo.

An alternative here is to route any Get req where "realtime" is specified to true to the primary shard in the group. The problem with this is by default all Get requests have realtime=true enabled, so if we do this all Get reqs would only route to primary shards and can be problematic under heavy load. I think providing this option is a better solution though for sensitive requests and if users are performing a high volume of GET requests then document replication would be a better option.

The alternative scenario is:
(1) when GET request realtime=true, we route requests to primary shards that will use Internal Engine's existing mechanism to read from translog
and (2) when GET request realtime=false, we continue with existing behavior where it might route to NRTReplicationEngine in which case, it will read from index and not xlog

When realtime=true, users can also set the preference parameter for GET requests to "_local" or specific shards. In that case, we give priority to the user specified preference for the shard when we decide where the requests should be routed to.

https://opensearch.org/docs/1.3/api-reference/document-apis/get-documents/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working discuss Issues intended to help drive brainstorming and decision making distributed framework feedback needed Issue or PR needs feedback Indexing:Replication Issues and PRs related to core replication framework eg segrep v2.10.0
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants