Merge branch '6.x' into ccr-6.x
* 6.x:
  Watcher: Make settings reloadable (#31746)
  [Rollup] Histo group config should support scaled_floats (#32048)
  lazy snapshot repository initialization (#31606)
  Add secure setting for watcher email password (#31620)
  Watcher: cleanup ensureWatchExists use (#31926)
  Add second level of field collapsing (#31808)
  Added lenient flag for synonym token filter (#31484) (#31970)
  Test: Fix a second case of bad watch creation
  [Rollup] Use composite's missing_bucket (#31402)
  Docs: Restyled cloud link in getting started
  Docs: Change formatting of Cloud options
  Re-instate link in StringFunctionUtils javadocs
  Correct spelling of AnalysisPlugin#requriesAnalysisSettings (#32025)
  Fix problematic chars in javadoc
  [ML] Move open job failure explanation out of root cause (#31925)
  [ML] Switch ML native QA tests to use a 3 node cluster (#32011)
dnhatn committed Jul 13, 2018
2 parents 7fde7a0 + 6e1b187 commit 0a90962
Showing 97 changed files with 1,879 additions and 699 deletions.
@@ -50,7 +50,49 @@ PUT /test_index
The above configures a `search_synonyms` filter, with a path of
`analysis/synonym.txt` (relative to the `config` location). The
`search_synonyms` analyzer is then configured with the filter.

Additional settings are:

* `expand` (defaults to `true`).
* `lenient` (defaults to `false`). If `true`, exceptions encountered while parsing the synonym configuration are
ignored. Note that only those synonym rules which cannot be parsed are skipped. For instance, consider the following
request:

[source,js]
--------------------------------------------------
PUT /test_index
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym" : {
                        "tokenizer" : "standard",
                        "filter" : ["my_stop", "synonym_graph"]
                    }
                },
                "filter" : {
                    "my_stop": {
                        "type" : "stop",
                        "stopwords": ["bar"]
                    },
                    "synonym_graph" : {
                        "type" : "synonym_graph",
                        "lenient": true,
                        "synonyms" : ["foo, bar => baz"]
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

With the above request, the word `bar` is skipped, but the mapping `foo => baz` is still added. However, if the mapping
being added were "foo, baz => bar", nothing would be added to the synonym list, because the target word of the mapping
is itself eliminated as a stop word. Similarly, if the mapping were "bar, foo, baz" and `expand` were set to `false`,
no mapping would be added, since with `expand=false` the target of the mapping is the first word. However, if
`expand=true`, the mappings added would be equivalent to `foo, baz => foo, baz`, i.e. all words other than the
stop word.
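
As a quick illustration (a sketch assuming the index from the request above exists; `_analyze` is simply run against
it), analyzing the text `foo` with the `synonym` analyzer should return the single token `baz`, since the lenient
filter kept the usable `foo => baz` part of the rule:

[source,js]
--------------------------------------------------
GET /test_index/_analyze
{
    "analyzer": "synonym",
    "text": "foo"
}
--------------------------------------------------
// NOTCONSOLE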

[float]
==== `tokenizer` and `ignore_case` are deprecated
47 changes: 45 additions & 2 deletions docs/reference/analysis/tokenfilters/synonym-tokenfilter.asciidoc
@@ -33,12 +33,55 @@ PUT /test_index

The above configures a `synonym` filter, with a path of
`analysis/synonym.txt` (relative to the `config` location). The
`synonym` analyzer is then configured with the filter.

This filter tokenizes synonyms with whatever tokenizer and token filters
appear before it in the chain.
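
For example (a minimal sketch; the index name, analyzer name, and synonym entries here are illustrative rather than
taken from this page), placing a `lowercase` filter before the `synonym` filter means the rule `Foo, BAR` is itself
lowercased when it is parsed, so it matches the lowercased tokens `foo` and `bar`:

[source,js]
--------------------------------------------------
PUT /synonym_chain_example
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "my_synonyms" : {
                        "tokenizer" : "whitespace",
                        "filter" : ["lowercase", "my_synonym_filter"]
                    }
                },
                "filter" : {
                    "my_synonym_filter" : {
                        "type" : "synonym",
                        "synonyms" : ["Foo, BAR"]
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE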

Additional settings are:

* `expand` (defaults to `true`).
* `lenient` (defaults to `false`). If `true`, exceptions encountered while parsing the synonym configuration are
ignored. Note that only those synonym rules which cannot be parsed are skipped. For instance, consider the following
request:

[source,js]
--------------------------------------------------
PUT /test_index
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym" : {
                        "tokenizer" : "standard",
                        "filter" : ["my_stop", "synonym"]
                    }
                },
                "filter" : {
                    "my_stop": {
                        "type" : "stop",
                        "stopwords": ["bar"]
                    },
                    "synonym" : {
                        "type" : "synonym",
                        "lenient": true,
                        "synonyms" : ["foo, bar => baz"]
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE

With the above request, the word `bar` is skipped, but the mapping `foo => baz` is still added. However, if the mapping
being added were "foo, baz => bar", nothing would be added to the synonym list, because the target word of the mapping
is itself eliminated as a stop word. Similarly, if the mapping were "bar, foo, baz" and `expand` were set to `false`,
no mapping would be added, since with `expand=false` the target of the mapping is the first word. However, if
`expand=true`, the mappings added would be equivalent to `foo, baz => foo, baz`, i.e. all words other than the
stop word.
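
As with the `synonym_graph` example above, this can be checked with `_analyze` (a sketch assuming the index from the
request above exists): analyzing `foo` with the `synonym` analyzer should return the token `baz`, the surviving half
of the lenient rule:

[source,js]
--------------------------------------------------
GET /test_index/_analyze
{
    "analyzer": "synonym",
    "text": "foo"
}
--------------------------------------------------
// NOTCONSOLE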


[float]
==== `tokenizer` and `ignore_case` are deprecated

3 changes: 3 additions & 0 deletions docs/reference/getting-started.asciidoc
@@ -104,10 +104,13 @@ With that out of the way, let's get started with the fun part...

== Installation

[TIP]
==============
You can skip installation completely by using our hosted
Elasticsearch Service on https://www.elastic.co/cloud[Elastic Cloud], which is
available on AWS and GCP. You can
https://www.elastic.co/cloud/elasticsearch-service/signup[try out the hosted service] for free.
==============

Elasticsearch requires at least Java 8. Specifically, as of this writing, it is recommended that you use the Oracle JDK version {jdk}. Java installation varies from platform to platform, so we won't go into those details here; Oracle's recommended installation documentation can be found on http://docs.oracle.com/javase/8/docs/technotes/guides/install/install_overview.html[Oracle's website]. Suffice it to say, before you install Elasticsearch, please check your Java version first by running the following (and then install/upgrade accordingly if needed):

102 changes: 102 additions & 0 deletions docs/reference/search/request/collapse.asciidoc
@@ -116,3 +116,105 @@ The default is based on the number of data nodes and the default search thread pool size.

WARNING: `collapse` cannot be used in conjunction with <<search-request-scroll, scroll>>,
<<search-request-rescore, rescore>> or <<search-request-search-after, search after>>.

==== Second level of collapsing

A second level of collapsing is also supported and is applied to `inner_hits`.
For example, the following request finds the top-scored tweets for
each country, and within each country finds the top-scored tweets
for each user.

[source,js]
--------------------------------------------------
GET /twitter/_search
{
    "query": {
        "match": {
            "message": "elasticsearch"
        }
    },
    "collapse" : {
        "field" : "country",
        "inner_hits" : {
            "name": "by_location",
            "collapse" : {"field" : "user"},
            "size": 3
        }
    }
}
--------------------------------------------------
// NOTCONSOLE


Response:
[source,js]
--------------------------------------------------
{
    ...,
    "hits": [
        {
            "_index": "twitter",
            "_type": "_doc",
            "_id": "9",
            "_score": ...,
            "_source": {...},
            "fields": {"country": ["UK"]},
            "inner_hits": {
                "by_location": {
                    "hits": {
                        ...,
                        "hits": [
                            {
                                ...,
                                "fields": {"user" : ["user124"]}
                            },
                            {
                                ...,
                                "fields": {"user" : ["user589"]}
                            },
                            {
                                ...,
                                "fields": {"user" : ["user001"]}
                            }
                        ]
                    }
                }
            }
        },
        {
            "_index": "twitter",
            "_type": "_doc",
            "_id": "1",
            "_score": ...,
            "_source": {...},
            "fields": {"country": ["Canada"]},
            "inner_hits": {
                "by_location": {
                    "hits": {
                        ...,
                        "hits": [
                            {
                                ...,
                                "fields": {"user" : ["user444"]}
                            },
                            {
                                ...,
                                "fields": {"user" : ["user1111"]}
                            },
                            {
                                ...,
                                "fields": {"user" : ["user999"]}
                            }
                        ]
                    }
                }
            }
        },
        ...
    ]
}
--------------------------------------------------
// NOTCONSOLE

NOTE: The second level of collapsing doesn't allow `inner_hits`.
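
For instance, a request along the following lines (a hypothetical sketch reusing the field names from the example
above) would be rejected, because the second-level `collapse` declares `inner_hits` of its own:

[source,js]
--------------------------------------------------
GET /twitter/_search
{
    "collapse" : {
        "field" : "country",
        "inner_hits" : {
            "name": "by_location",
            "collapse" : {
                "field" : "user",
                "inner_hits" : { "name": "by_user" } <1>
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
<1> Not allowed: the second level of collapsing cannot define its own `inner_hits`.
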
5 changes: 5 additions & 0 deletions docs/reference/setup/install.asciidoc
@@ -1,11 +1,16 @@
[[install-elasticsearch]]
== Installing Elasticsearch

[float]
=== Hosted Elasticsearch
Elasticsearch can be run on your own hardware or using our hosted
Elasticsearch Service on https://www.elastic.co/cloud[Elastic Cloud], which is
available on AWS and GCP. You can
https://www.elastic.co/cloud/elasticsearch-service/signup[try out the hosted service] for free.

[float]
=== Installing Elasticsearch Yourself

Elasticsearch is provided in the following package formats:

[horizontal]
@@ -135,7 +135,7 @@
import java.util.Map;
import java.util.TreeMap;

import static org.elasticsearch.plugins.AnalysisPlugin.requiresAnalysisSettings;

public class CommonAnalysisPlugin extends Plugin implements AnalysisPlugin {

@@ -201,11 +201,11 @@ public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
filters.put("cjk_width", CJKWidthFilterFactory::new);
filters.put("classic", ClassicFilterFactory::new);
filters.put("czech_stem", CzechStemTokenFilterFactory::new);
filters.put("common_grams", requriesAnalysisSettings(CommonGramsTokenFilterFactory::new));
filters.put("common_grams", requiresAnalysisSettings(CommonGramsTokenFilterFactory::new));
filters.put("decimal_digit", DecimalDigitFilterFactory::new);
filters.put("delimited_payload_filter", LegacyDelimitedPayloadTokenFilterFactory::new);
filters.put("delimited_payload", DelimitedPayloadTokenFilterFactory::new);
filters.put("dictionary_decompounder", requriesAnalysisSettings(DictionaryCompoundWordTokenFilterFactory::new));
filters.put("dictionary_decompounder", requiresAnalysisSettings(DictionaryCompoundWordTokenFilterFactory::new));
filters.put("dutch_stem", DutchStemTokenFilterFactory::new);
filters.put("edge_ngram", EdgeNGramTokenFilterFactory::new);
filters.put("edgeNGram", EdgeNGramTokenFilterFactory::new);
@@ -216,11 +216,11 @@ public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
filters.put("german_normalization", GermanNormalizationFilterFactory::new);
filters.put("german_stem", GermanStemTokenFilterFactory::new);
filters.put("hindi_normalization", HindiNormalizationFilterFactory::new);
filters.put("hyphenation_decompounder", requriesAnalysisSettings(HyphenationCompoundWordTokenFilterFactory::new));
filters.put("hyphenation_decompounder", requiresAnalysisSettings(HyphenationCompoundWordTokenFilterFactory::new));
filters.put("indic_normalization", IndicNormalizationFilterFactory::new);
filters.put("keep", requriesAnalysisSettings(KeepWordFilterFactory::new));
filters.put("keep_types", requriesAnalysisSettings(KeepTypesFilterFactory::new));
filters.put("keyword_marker", requriesAnalysisSettings(KeywordMarkerTokenFilterFactory::new));
filters.put("keep", requiresAnalysisSettings(KeepWordFilterFactory::new));
filters.put("keep_types", requiresAnalysisSettings(KeepTypesFilterFactory::new));
filters.put("keyword_marker", requiresAnalysisSettings(KeywordMarkerTokenFilterFactory::new));
filters.put("kstem", KStemTokenFilterFactory::new);
filters.put("length", LengthTokenFilterFactory::new);
filters.put("limit", LimitTokenCountFilterFactory::new);
@@ -229,8 +229,8 @@ public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
filters.put("multiplexer", MultiplexerTokenFilterFactory::new);
filters.put("ngram", NGramTokenFilterFactory::new);
filters.put("nGram", NGramTokenFilterFactory::new);
filters.put("pattern_capture", requriesAnalysisSettings(PatternCaptureGroupTokenFilterFactory::new));
filters.put("pattern_replace", requriesAnalysisSettings(PatternReplaceTokenFilterFactory::new));
filters.put("pattern_capture", requiresAnalysisSettings(PatternCaptureGroupTokenFilterFactory::new));
filters.put("pattern_replace", requiresAnalysisSettings(PatternReplaceTokenFilterFactory::new));
filters.put("persian_normalization", PersianNormalizationFilterFactory::new);
filters.put("porter_stem", PorterStemTokenFilterFactory::new);
filters.put("remove_duplicates", RemoveDuplicatesTokenFilterFactory::new);
@@ -241,10 +241,10 @@ public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
filters.put("serbian_normalization", SerbianNormalizationFilterFactory::new);
filters.put("snowball", SnowballTokenFilterFactory::new);
filters.put("sorani_normalization", SoraniNormalizationFilterFactory::new);
filters.put("stemmer_override", requriesAnalysisSettings(StemmerOverrideTokenFilterFactory::new));
filters.put("stemmer_override", requiresAnalysisSettings(StemmerOverrideTokenFilterFactory::new));
filters.put("stemmer", StemmerTokenFilterFactory::new);
filters.put("trim", TrimTokenFilterFactory::new);
filters.put("truncate", requriesAnalysisSettings(TruncateTokenFilterFactory::new));
filters.put("truncate", requiresAnalysisSettings(TruncateTokenFilterFactory::new));
filters.put("unique", UniqueTokenFilterFactory::new);
filters.put("uppercase", UpperCaseTokenFilterFactory::new);
filters.put("word_delimiter_graph", WordDelimiterGraphTokenFilterFactory::new);
@@ -256,8 +256,8 @@ public Map<String, AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
public Map<String, AnalysisProvider<CharFilterFactory>> getCharFilters() {
Map<String, AnalysisProvider<CharFilterFactory>> filters = new TreeMap<>();
filters.put("html_strip", HtmlStripCharFilterFactory::new);
filters.put("pattern_replace", requriesAnalysisSettings(PatternReplaceCharFilterFactory::new));
filters.put("mapping", requriesAnalysisSettings(MappingCharFilterFactory::new));
filters.put("pattern_replace", requiresAnalysisSettings(PatternReplaceCharFilterFactory::new));
filters.put("mapping", requiresAnalysisSettings(MappingCharFilterFactory::new));
return filters;
}

@@ -69,7 +69,7 @@ public void testProbabilityOfRelevance() {
* 4 | 1 | 0.03125 | 0.078125 | 0.00244140625 |
* }</pre>
*
* err = sum of last column
*/
public void testERRAt() {
List<RatedDocument> rated = new ArrayList<>();
@@ -94,7 +94,7 @@ public void testERRAt() {
* 4 | 1 | 0.03125 | 0.125 | 0.00390625 |
* }</pre>
*
* err = sum of last column
*/
public void testERRMissingRatings() {
List<RatedDocument> rated = new ArrayList<>();
