[DOCS] Add searchable snapshots topic. #63040

Merged on Oct 22, 2020 (22 commits)

debadair

No description provided.

@debadair debadair added >docs General docs changes :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.10.0 labels Sep 30, 2020

@henningandersen henningandersen left a comment

I added a couple of comments, I am in doubt about how we want to frame searchable snapshots in docs so that we make it clear that only the full copy version is available now, but can add the other variant in the future without too much confusion. I would be inclined to not introduce the fully remote variant in docs now.

debadair commented Sep 30, 2020

> I added a couple of comments, I am in doubt about how we want to frame searchable snapshots in doc so that we make it clear that only the full copy version is available now, but can add the other variant in the future without too much confusion.

This initial version is something of an exploration of how we want to talk about searchable snapshots. I included the "fully-remote" option in this draft to make sure the descriptions we use now leave room for talking about it later. It introduces a number of new concepts/terms:

  • searchable snapshot - a snapshot of an index or data stream? that resides in a remote data store such as S3 and can be accessed dynamically at query time.

  • backup snapshot - a snapshot of a cluster, data stream, or index that resides in a remote data store such as S3 and is used for backup and recovery.

  • snapshot - captures the state of a cluster, index, or data stream at a particular point in time. See backup snapshot and searchable snapshot. (Currently defined as: A backup taken from a running Elasticsearch cluster. A snapshot can include backups of an entire cluster or only data streams and indices you specify.)

  • snapshot-backed index - a read-only index that relies on a searchable snapshot for redundancy.

  • fully-remote storage - a searchable snapshot that has no corresponding index in the cluster. Data is always loaded dynamically to process asynchronous searches.

We might be able to come up with a better name than "fully-remote storage". I'll add the other terms to the glossary & include them in this PR so we can fine-tune the definitions.

tlrx commented Oct 5, 2020

> We might be able to come up with a better name than "fully-remote storage". I'll add the other terms to the glossary & include them in this PR so we can fine-tune the definitions.

I agree with Henning: this "fully-remote storage" is not implemented. I'm afraid that mentioning it would confuse users or trigger questions that we don't have answers to yet. I think it should be removed from the glossary and from all docs.

@andreidan andreidan added v7.11.0 and removed v7.10.0 labels Oct 7, 2020

DaveCTurner commented Oct 16, 2020

I've expanded the conceptual docs a bit, see d4292ec. I think they cover everything I wanted to say, but please check for gaps.

I have avoided the "snapshot-backed index" terminology in favour of just calling them "searchable snapshots" or "searchable snapshot indices". I realise this is not so technically correct and we could go back on that and make a clearer distinction between the index and the snapshot behind it if we'd prefer.

@henningandersen henningandersen left a comment

Thanks @DaveCTurner. I left a few comments, in particular on the terminology. Otherwise looking good.

{search-snaps-cap} let you reduce your operating costs by treating the snapshot
as the authoritative copy of some of your indices. The high reliability of the
snapshot repository removes the need to keep multiple copies of their data in

"their" seems wrong here; did you mean "the"?

I meant their (as in "belonging to the indices") but it's awkward. Reworded in 29402a5


A searchable snapshot can be searched just like any other index.
{search-snaps-cap} are often used to access a large archive of historical data,
for which searches may sometimes be complex and time-consuming.

I wonder if we are here framing searchable snapshots as being slow, which is not necessarily the case. I think I get that the time span of the search will be large and this adds to query time, hence async search is necessary, but I would like to move that to the meat of the section more than this introducing paragraph.

I was trying not to, but yes it still suggests slowness. I moved this to a TIP in 8beb64b and replaced this sentence with one indicating that performance should be similar to a regular index.

other index. You can, for instance, use <<shard-allocation-filtering>> to
restrict these shards to a subset of your nodes.

Normally you will use {search-snaps-cap} via the

"use" implies "search" in my head, I would prefer to say "create" or "mount" here?

I wanted to talk about something more general than just creating them -- after all ILM does also take care of aliases which lets you search them too. How about manage? 90a2184


If a node fails while holding some zero-replica searchable snapshot then there
will be a brief window of time before {es} allocates these shards elsewhere.
During this window of time the cluster health will be `red` and searches that

I believe the cluster will currently only be yellow unless we worked on this recently?

Ah yes now that the recovery source is always "snapshot" this should be the case. Hedging my bets in c43841d by saying "not green".

@@ -450,6 +456,12 @@ in the <<glossary-mapping,mapping>>.
// end::routing-def[]
--

[[glossary-searchable-snapshot]] searchable snapshot ::
// tag::searchable-snapshot-def[]
A <<glossary-snapshot, snapshot>> of an index or data stream that resides in a remote data store

I wonder if the "searchable snapshot" object is the index or the snapshot? We talked about "snapshot-backed index", which is clearly the index. I think of it as the index that makes the snapshot searchable.

We could also define "searchable snapshot index/indices" here instead.

I find the current definition here slightly confusing in that any snapshot can be made searchable.

I have reworked the glossary entries in c026c9f.

@DaveCTurner DaveCTurner marked this pull request as ready for review October 20, 2020 14:32
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Oct 20, 2020
@elasticmachine elasticmachine added the Team:Docs Meta label for docs team label Oct 20, 2020
@DaveCTurner

I think this is ready for a full review now; @debadair your input would be useful but you opened the PR so I can't request a review from you.

The preview is up and running at https://elasticsearch_63040.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/searchable-snapshots.html

@debadair debadair left a comment

I think there's still some redundancy with the last section that could be cleaned up later.

@@ -450,6 +450,22 @@ in the <<glossary-mapping,mapping>>.
// end::routing-def[]
--

[[glossary-searchable-snapshot]] searchable snapshot ::
// tag::searchable-snapshot-def[]

I think we're still not quite there with these definitions. If a "searchable snapshot" is an index mounted from snapshot, do we even need the notion of the searchable snapshot index? As written, these definitions don't make the distinction between them clear.

I'd like us to be able to distinguish the index-in-the-snapshot from the index-in-the-cluster. In principle we are searching the index-in-the-snapshot, hence "searchable snapshot", and we implement this today by creating a corresponding index-in-the-cluster. I think the distinction is important since we may in future support searches directly against snapshots too. I've changed the wording slightly: "index in a snapshot" -> "snapshot of an index" -- does that help?

I think of it more like "searchable snapshot" is the concept, whereas a "searchable snapshot index" is a concrete index backed by a searchable snapshot. I.e., there is no object anywhere that is a "searchable snapshot", since all snapshots can be made searchable (through a "searchable snapshot index"). But I am also ok with the current text.

> do we even need the notion of the searchable snapshot index

I think referring to an index as just a "searchable snapshot" is unintuitive, since it is an index, not a snapshot.

Comment on lines 455 to 457
An index in a <<glossary-snapshot, snapshot>> that is mounted as a
<<glossary-searchable-snapshot-index, searchable snapshot index>> and can be
searched as if it were a regular index.

Suggested change
- An index in a <<glossary-snapshot, snapshot>> that is mounted as a
- <<glossary-searchable-snapshot-index, searchable snapshot index>> and can be
- searched as if it were a regular index.
+ A read-only index mounted from a <<glossary-snapshot, snapshot>> that can be searched like any other index. Searchable snapshots do not need
+ <<glossary-replica-shard,replica shards>> for resilience, since their data is
+ reliably stored in the snapshot repository.

Comment on lines +460 to +467
[[glossary-searchable-snapshot-index]] searchable snapshot index ::
// tag::searchable-snapshot-index-def[]
An <<glossary-index, index>> whose data is stored in a <<glossary-snapshot,
snapshot>> that resides in a separate <<glossary-snapshot-repository,snapshot
repository>> such as AWS S3. Searchable snapshot indices do not need
<<glossary-replica-shard,replica>> shards for resilience, since their data is
reliably stored outside the cluster.
// end::searchable-snapshot-index-def[]

Suggested change
- [[glossary-searchable-snapshot-index]] searchable snapshot index ::
- // tag::searchable-snapshot-index-def[]
- An <<glossary-index, index>> whose data is stored in a <<glossary-snapshot,
- snapshot>> that resides in a separate <<glossary-snapshot-repository,snapshot
- repository>> such as AWS S3. Searchable snapshot indices do not need
- <<glossary-replica-shard,replica>> shards for resilience, since their data is
- reliably stored outside the cluster.
- // end::searchable-snapshot-index-def[]

Comment on lines 66 to 69
We recommend that you <<indices-forcemerge, force-merge>> indices to a single
segment per shard before mounting them as {search-snaps}. Each read from a
snapshot repository takes time and costs money, and the fewer segments there
are the fewer reads are needed to restore the snapshot.

Moved up

Suggested change
- We recommend that you <<indices-forcemerge, force-merge>> indices to a single
- segment per shard before mounting them as {search-snaps}. Each read from a
- snapshot repository takes time and costs money, and the fewer segments there
- are the fewer reads are needed to restore the snapshot.

I'd rather this was down here. It's not very important to force-merge things before mounting them, and if you're mounting an existing snapshot you basically have no choice since you can't do anything about the segment count without restoring each index, merging it and re-snapshotting it.
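
For readers following the force-merge discussion, here is a minimal Python sketch of the two steps involved: merging an index to one segment per shard, then mounting it from a snapshot. It only builds the REST requests and sends nothing. The `_forcemerge` endpoint with `max_num_segments=1` and the `_snapshot/<repository>/<snapshot>/_mount` endpoint are, as far as I know, the relevant Elasticsearch APIs (the mount API was beta at the time of this PR). The repository, snapshot, and index names are hypothetical, and the snapshot step between the two calls is omitted.

```python
import json
from typing import Tuple
from urllib.parse import quote


def force_merge_request(index: str) -> Tuple[str, str]:
    """Build a request that merges each shard of `index` to a single segment."""
    return ("POST", f"/{quote(index, safe='')}/_forcemerge?max_num_segments=1")


def mount_request(repository: str, snapshot: str, index: str) -> Tuple[str, str, str]:
    """Build a request that mounts `index` from `snapshot` as a searchable snapshot index."""
    path = f"/_snapshot/{quote(repository, safe='')}/{quote(snapshot, safe='')}/_mount"
    body = json.dumps({"index": index})
    return ("POST", path, body)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(force_merge_request("logs-2020.09"))
    print(mount_request("my_repository", "my_snapshot", "logs-2020.09"))
```

As the comment above notes, the force-merge step is only possible before the snapshot is taken; for an existing snapshot only the mount request applies.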

Comment on lines 82 to 87
{search-snaps-cap} are ideal for managing a large archive of historical data.
Historical information is typically searched less frequently than recent data
and therefore may not need replicas for their performance benefits.

You can use <<async-search>> with {search-snaps}, which is especially useful
for more complex or time-consuming searches.

Suggested change
- {search-snaps-cap} are ideal for managing a large archive of historical data.
- Historical information is typically searched less frequently than recent data
- and therefore may not need replicas for their performance benefits.
- You can use <<async-search>> with {search-snaps}, which is especially useful
- for more complex or time-consuming searches.
+ {search-snaps-cap} are ideal for managing large archives of historical data.
+ Historical information is typically searched less frequently than recent data
+ and performance is less important.
+ For more complex or time-consuming searches, you can use <<async-search>> with {search-snaps}.

@DaveCTurner DaveCTurner Oct 22, 2020

I'd rather keep this wording as-is: "performance is less important" suggests to me that there's a general performance penalty for using searchable snapshots -- in fact the only drawback most of the time is the lack of replicas.

I'll apply the change to the wording re. async searches separately.
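
To make the async search point concrete, here is a small Python sketch of how a long-running search against a mounted index might be submitted and then polled. It only builds the requests. The `_async_search` endpoints and the `wait_for_completion_timeout` parameter are, to my knowledge, part of the Elasticsearch async search API; the index name, query, and search ID are hypothetical.

```python
import json
from typing import Tuple
from urllib.parse import quote, urlencode


def submit_async_search(index: str, query: dict, wait: str = "2s") -> Tuple[str, str, str]:
    """Build a request that starts a search and returns early if it takes longer than `wait`."""
    params = urlencode({"wait_for_completion_timeout": wait})
    path = f"/{quote(index, safe='')}/_async_search?{params}"
    return ("POST", path, json.dumps({"query": query}))


def poll_async_search(search_id: str) -> Tuple[str, str]:
    """Build a request that fetches the current results of a previously submitted search."""
    return ("GET", f"/_async_search/{quote(search_id, safe='')}")


if __name__ == "__main__":
    # Hypothetical index name and search ID, for illustration only.
    print(submit_async_search("old-logs", {"match_all": {}}))
    print(poll_async_search("abc123"))
```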

@henningandersen henningandersen left a comment

LGTM. Left a number of mostly minor comments.

Comment on lines 29 to 32
You can control the allocation of the shards of {search-snap} indices using the
same mechanisms as for regular indices. For example, you could use
<<shard-allocation-filtering>> to restrict {search-snap} shards to a subset of
your nodes.

I wonder if this should go below the ILM section in the interest of explaining the "easy/normal" option first and then the more advanced option afterwards?

I'm ambivalent -- I put it here since we're starting off by talking about how these indices are mostly manipulated (searched & allocated) as if they were normal indices, but I've moved it in 990707b.
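
As an illustration of the allocation point in the quoted text, the following Python sketch builds an index settings update that restricts a mounted index's shards to nodes carrying a given attribute. It assumes the index-level `index.routing.allocation.require.<attribute>` setting, which I believe applies to searchable snapshot indices the same way as to regular ones. The index name and the `box_type` node attribute are hypothetical.

```python
import json
from typing import Tuple
from urllib.parse import quote


def allocation_filter_request(index: str, attribute: str, value: str) -> Tuple[str, str, str]:
    """Build a settings update that requires shards of `index` to live on matching nodes."""
    setting = f"index.routing.allocation.require.{attribute}"
    path = f"/{quote(index, safe='')}/_settings"
    return ("PUT", path, json.dumps({setting: value}))


if __name__ == "__main__":
    # Hypothetical index name and node attribute, for illustration only.
    print(allocation_filter_request("mounted-logs", "box_type", "cold"))
```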

[[using-searchable-snapshots]]
=== Using {search-snaps}

Searching a {search-snap} is the same as searching any other index. Search

Should this be "Searching a searchable snapshot index", to be consistent with the glossary? I think it reads better too.

Same comment goes for a number of the "{search-snap}" mentions throughout.

Good point, I've added a few "index" or "shard" nouns throughout in 990707b.

@debadair debadair merged commit b95d9c4 into elastic:master Oct 22, 2020
debadair added a commit to debadair/elasticsearch that referenced this pull request Oct 22, 2020
* [DOCS] Add searchable snapshots topic.

* [DOCS] Add definitions & remove fully-remote storage.

* [DOCS] Fixed duplicate anchor.

* Expand conceptual docs for searchable snapshots

* Rewordings

* Glossary tidy-up

* Beta

* Reword

* More performance idea to a TIP

* use -> manage

* red -> not green

* Missing space?

* Update docs/reference/glossary.asciidoc

* Fix beta label

* Use more attributes, fix link titles

* Apply suggestions from code review

Co-authored-by: debadair

* Reformat

* Minor rewordings

* More minor rewordings

* Address Henning's comments

Co-authored-by: David Turner
Co-authored-by: James Rodewig
debadair added a commit to debadair/elasticsearch that referenced this pull request Oct 22, 2020
debadair added a commit that referenced this pull request Oct 22, 2020
debadair added a commit that referenced this pull request Oct 22, 2020
debadair added a commit to debadair/elasticsearch that referenced this pull request Oct 28, 2020
debadair added a commit to debadair/elasticsearch that referenced this pull request Oct 28, 2020
debadair added a commit that referenced this pull request Oct 28, 2020
debadair added a commit that referenced this pull request Oct 28, 2020
* [DOCS] Add searchable snapshots topic. (#63040)

* Fixed glossary entries

Co-authored-by: David Turner
Co-authored-by: James Rodewig
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >docs General docs changes Team:Distributed Meta label for distributed team Team:Docs Meta label for docs team v7.11.0 v8.0.0-alpha1