EQL: Add optional fields and limit joining keys on non-null values only #79677

astefan · 2021-10-22T14:05:53Z

Add optional fields, with usage inside queries' filters and as join keys.
Optional fields have a question mark in front of their name (?some_field) and can be used in standalone event queries and sequences.

If the field exists in at least one index that's getting queried, that field will actually be used in queries.
If the field doesn't exist in any of the indices, all its mentions in query filters will be replaced with null.
For sequences, optional fields as keys can have null as a value, whereas non-optional fields will be restricted to non-null values only.

For example, a query like

sequence by ?process.entity_id, process.pid
  [process where transID == 2]
  [file where transID == 0] with runs=2

can return a sequence with a join key [null,123].
If the sequence also uses process.pid as an optional field (sequence by ?process.entity_id, ?process.pid), it can return join keys such as [null,123], [null,null], and [512,null].
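The rules above can be sketched in a few lines of Java. This is a toy model under stated assumptions, not the code in this PR: the class name, the method names, and the error message are all hypothetical, and fields are represented as plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names come from the Elasticsearch code base.
final class OptionalFieldSketch {

    // A field written with a leading '?' is optional.
    static boolean isOptional(String key) {
        return key.startsWith("?");
    }

    static String fieldName(String key) {
        return isOptional(key) ? key.substring(1) : key;
    }

    // Filter usage: an optional field that exists in none of the queried indices
    // is replaced with the literal null; a missing mandatory field is an error.
    static String resolveInFilter(String key, Set<String> mappedFields) {
        String name = fieldName(key);
        if (mappedFields.contains(name)) {
            return name;
        }
        if (isOptional(key)) {
            return "null";
        }
        throw new IllegalArgumentException("hypothetical error: unknown field [" + name + "]");
    }

    // Join keys: a null value is accepted only in the position of an optional key.
    static boolean acceptsJoinKey(List<String> keys, List<?> values) {
        for (int i = 0; i < keys.size(); i++) {
            if (values.get(i) == null && isOptional(keys.get(i)) == false) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("?process.entity_id", "process.pid");
        System.out.println(acceptsJoinKey(keys, Arrays.asList(null, 123)));  // true
        System.out.println(acceptsJoinKey(keys, Arrays.asList(512, null)));  // false
        System.out.println(resolveInFilter("?missing_field", Set.of("process.pid"))); // null
    }
}
```

With both keys declared optional, acceptsJoinKey would accept all three combinations listed in the description.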

Restrict joining keys to non-null values only for non-optional fields

elasticmachine · 2021-10-22T15:21:06Z

Pinging @elastic/es-ql (Team:QL)

costin

The change can be made less intrusive by mapping the optional key onto existing concepts.
For example, for an optional attribute, construct a special unresolved attribute by initializing the unresolved message with a singleton string (say 'optional' or '?').
Don't keep track of the optional attributes; simply let the analyzer resolve all of them, and add a separate rule in Finish Analysis that looks for all unresolved attributes carrying that string and replaces them with a null literal.

This takes advantage of folding the expressions and also extracting the null constant if defined as a key. The verifier would not be tripped (and just has to be updated to consider the null type for key compatibility) while BoxedQuery & co only need to care about converting null to exists queries.
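A minimal sketch of that suggestion, assuming a toy expression model: the classes below only borrow names from the discussion and are not the QL classes, and the marker value and the rule name replaceMissingOptionals are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: these classes only borrow names from the discussion; they are not the QL classes.
final class OptionalMarkerSketch {

    // Marker stored in the unresolved message to flag a field written as ?field.
    static final String OPTIONAL_MARKER = "?";

    static class Expr {}

    static final class NullLiteral extends Expr {}

    static final class ResolvedAttribute extends Expr {
        final String name;

        ResolvedAttribute(String name) {
            this.name = name;
        }
    }

    static final class UnresolvedAttribute extends Expr {
        final String name;
        final String unresolvedMessage;

        UnresolvedAttribute(String name, String unresolvedMessage) {
            this.name = name;
            this.unresolvedMessage = unresolvedMessage;
        }
    }

    // Meant to run after regular resolution: anything still unresolved that carries
    // the marker is swapped for a null literal; other unresolved attributes are kept
    // so that a later verification step can still report them.
    static List<Expr> replaceMissingOptionals(List<Expr> analyzed) {
        List<Expr> result = new ArrayList<>();
        for (Expr e : analyzed) {
            boolean missingOptional = e instanceof UnresolvedAttribute
                && OPTIONAL_MARKER.equals(((UnresolvedAttribute) e).unresolvedMessage);
            result.add(missingOptional ? new NullLiteral() : e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Expr> analyzed = List.of(
            new ResolvedAttribute("process.pid"),
            new UnresolvedAttribute("process.entity_id", OPTIONAL_MARKER),
            new UnresolvedAttribute("proces.pid", "hypothetical unknown-field message")
        );
        for (Expr e : replaceMissingOptionals(analyzed)) {
            System.out.println(e.getClass().getSimpleName());
        }
    }
}
```

The point of the design is that no side collection of optional attributes has to be threaded through the parser or the planner; the marker travels with the attribute itself.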

costin · 2021-10-23T13:01:26Z

x-pack/plugin/eql/qa/common/src/main/java/org/elasticsearch/test/eql/EqlSpecLoader.java

+ if (arr != null) {
+ spec.joinKeys(arr.toArray(new String[0]));
+ } else {
+ spec.joinKeys(new String[0]);
+ }


spec.joinKeys(arr != null ? arr.toArray(new String[0]) : new String[0]);

costin · 2021-10-23T13:02:56Z

x-pack/plugin/eql/qa/common/src/main/resources/additional_test_queries.toml

@@ -430,3 +430,63 @@ query = '''
 process where substring(command_line, 5) regex (".*?net[1]? localgroup.*?", ".*? myappserver.py .*?")
 '''

+[[queries]]


👍
Please add some test(s) with until as well.

astefan · Oct 23, 2021 (edited)

It's not easy to add a test with until given the current data. That's why I added some more data to extra.data and created more tests with that data set. There is a commented-out test in test_extra.toml; it is commented out because I don't think until works as expected. I can uncomment it and adjust the join_keys to have a test with until.

costin · 2021-10-23T13:05:02Z

x-pack/plugin/eql/qa/common/src/main/resources/test_extra.toml

+expected_event_ids = [18,19,20]
+join_keys = ["512","123"]
+
+# Known issue: same key used as both optional and non-optional doesn't work as expected. Does it even make sense such scenario?


I can't think of a use case where this makes sense: a key has to be either optional or not.
This scenario needs to be prevented in the verifier.

costin · 2021-10-23T13:32:30Z

...in/eql/src/main/java/org/elasticsearch/xpack/eql/expression/OptionalUnresolvedAttribute.java

+import org.elasticsearch.xpack.ql.type.DataType;
+import org.elasticsearch.xpack.ql.type.DataTypes;
+
+public class OptionalUnresolvedAttribute extends UnresolvedAttribute {


As currently defined, the class is confusing: it extends UnresolvedAttribute, yet it can be resolved.
A better name might be OptionalAttribute.

costin · 2021-10-23T13:34:25Z

...ugin/eql/src/main/java/org/elasticsearch/xpack/eql/execution/assembler/ExecutionManager.java


 public ExecutionManager(EqlSession eqlSession) {
 this.session = eqlSession;
 this.cfg = eqlSession.configuration();
+ this.optionalKeys = new HashSet<>();


I'd opt for a LinkedHashSet (LHS) to help with debugging and to preserve the order of the keys specified in the initial query.
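For reference, a small sketch of the difference being pointed out, reading LHS as java.util.LinkedHashSet. The class name and key names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only; the class name and the key names are hypothetical.
final class KeyOrderSketch {

    // Copies the set into a list in the set's own iteration order.
    static List<String> inIterationOrder(Set<String> keys) {
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order and ignores repeated adds,
        // so the keys come back in the order they were written in the query.
        Set<String> optionalKeys = new LinkedHashSet<>();
        optionalKeys.add("process.pid");
        optionalKeys.add("process.entity_id");
        optionalKeys.add("user.name");
        optionalKeys.add("process.pid"); // duplicate, position unchanged

        System.out.println(inIterationOrder(optionalKeys));
        // prints [process.pid, process.entity_id, user.name]
    }
}
```

A plain HashSet makes no guarantee about iteration order, which is what makes debugging output harder to compare against the original query.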

costin · 2021-10-23T13:35:49Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/parser/EqlParser.java

@@ -38,6 +40,7 @@
 private static final Logger log = LogManager.getLogger();

 private final boolean DEBUG = false;
+ private final Set<Expression> keyOptionals = new HashSet<>();


Unless there's a performance problem, it's better for debugging to use a LinkedHashSet instead of a plain HashSet.

Also, the optional keys are attributes, not expressions.

costin · 2021-10-23T13:42:29Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/parser/EqlParser.java

@@ -65,6 +68,10 @@ public Expression createExpression(String expression, ParserParams params) {
 return invokeParser(expression, params, EqlBaseParser::singleExpression, AstBuilder::expression);
 }

+ public Set<Expression> keyOptionals() {


Why is this getter public? Why not use the same approach as for ParserParams, that is, create the list during the method call, since it's only needed inside the invokeParser method?

costin · 2021-10-23T13:50:56Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/parser/ExpressionBuilder.java

 try {
- return ctx != null ? visitList(this, ctx.expression(), Attribute.class) : emptyList();
+ List<Attribute> keys = visitList(this, ctx.expression(), Attribute.class);
+ for (Attribute key : keys) {


The information here is redundant and this optimization has too many repercussions on the method signatures.
Either the ? is remembered through a separate set, or the information is embedded inside the UnresolvedAttribute, either by subclassing or by using a special tag (such as a special string used for identifying it).
I opt for the latter since it avoids passing the set as a constructor argument to classes that have nothing to do with it. Instead, the discovery of the unresolved attribute can be done directly at resolution time or through a dedicated rule that looks for them.

costin · 2021-10-23T13:52:06Z

.../plugin/eql/src/main/java/org/elasticsearch/xpack/eql/querydsl/container/QueryContainer.java

@@ -120,6 +121,10 @@ public QueryContainer addColumn(Attribute attr) {
 return new Tuple<>(this, extractorRegistry.fieldExtraction(expression));
 }

+ if (expression instanceof OptionalUnresolvedAttribute) {


This looks like a leaked concern. It would be better to make the attribute foldable, which maps nicely to the concept of replacing it with null.

costin · 2021-10-23T14:03:41Z

x-pack/plugin/eql/src/main/java/org/elasticsearch/xpack/eql/analysis/Verifier.java

@@ -266,6 +269,9 @@ private void checkJoinKeyTypes(LogicalPlan plan, Set<Failure> localFailures) {
 }

 private static void doCheckKeyTypes(Join join, Set<Failure> localFailures, NamedExpression expectedKey, NamedExpression currentKey) {
+ if (expectedKey instanceof OptionalUnresolvedAttribute || currentKey instanceof OptionalUnresolvedAttribute) {


Better yet, just check if the datatype is not null

Luegg

It looks like the ? syntax will be used for two distinct things: a) treat a non-existing field as null instead of failing the query and b) also use null as a join key

I was not part of the discussions around the feature, but I'm wondering whether it wouldn't be better to keep these two concerns separated. In total, there are 4 different ways in which nulls and non-existing fields can be treated in join keys:

1. don't use nulls as join key, fail on non-existing field (new default)
2. use nulls as join key, fail on non-existing field
3. don't use nulls as join key, replace non-existing field with null
4. use nulls as join key, replace non-existing field with null (? syntax)

The second and third option will not be available to the user with this change but I think the second option is the behavior users might usually prefer if they want to use null as join keys (and get a proper error if they misspelled the field name).

Hence, would it make sense to have a separate syntax for including nulls as join keys? Something like sequence by foo with nulls, bar ...
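To make the four combinations concrete, here is a hedged sketch that treats the two concerns as independent flags. Everything in it (the class, the enum, the flag names) is hypothetical and only models the behavior described in this comment, not anything implemented in the PR.

```java
// Hypothetical model of the two independent concerns raised above.
final class JoinKeyPolicySketch {

    enum Outcome { FAIL_QUERY, DROP_EVENT, KEEP_EVENT }

    // nullOnMissing: replace a non-existing field with null instead of failing the query.
    // allowNullKey:  accept null as a join key value.
    static Outcome evaluate(boolean fieldExists, Object value, boolean nullOnMissing, boolean allowNullKey) {
        if (fieldExists == false) {
            if (nullOnMissing == false) {
                return Outcome.FAIL_QUERY;
            }
            value = null; // a missing field behaves like a null value
        }
        if (value == null && allowNullKey == false) {
            return Outcome.DROP_EVENT;
        }
        return Outcome.KEEP_EVENT;
    }

    public static void main(String[] args) {
        // Option 1 (new default): no nulls as keys, fail on a missing field.
        System.out.println(evaluate(false, null, false, false)); // FAIL_QUERY
        // Option 2: nulls as keys, still fail on a missing (for example misspelled) field.
        System.out.println(evaluate(false, null, false, true));  // FAIL_QUERY
        System.out.println(evaluate(true, null, false, true));   // KEEP_EVENT
        // Option 3: missing field becomes null, but null keys are not joined on.
        System.out.println(evaluate(false, null, true, false));  // DROP_EVENT
        // Option 4 (? syntax): missing field becomes null and null is a valid key.
        System.out.println(evaluate(false, null, true, true));   // KEEP_EVENT
    }
}
```

In this model the ? syntax sets both flags at once, which is why options 2 and 3 are not reachable with it.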

astefan · 2021-10-25T13:56:12Z

Synced with @Luegg offline on the concern he raised and we agreed it's worth investigating the possibility of being explicit about the "allow/disallow nulls" behavior for join keys:

if it would be a useful feature to have
if it makes sense for it to be added to EQL, in the context of join keys as expressions
if all of the above are valid points, what would be the syntax for it

Replace missing field based on a context either with literal or a dedicated attribute. This allows folding to occur without dealing with aliases or making the named expression act as a literal (which can lead to accidental folding). Make the mandatory key constraint explicit inside each filter query to help in propagating the constraint across sequences without special handling.

costin

In the interest of time, after a discussion with @astefan I've pushed an update based on my comments.
Essentially, instead of using one attribute that acts as a literal, a resolved attribute and an unresolved one all at once, 3 different attribute classes are introduced.
They simply act as markers and don't introduce new behavior. The upside is that the Analyzer and Optimizer can perform fine-grained replacement:

the analyzer does the resolution separately and, when a field is not found, replaces it with a literal (in a filter) or a MissionOptionalAttribute.
the optimizer adds non-null constraints based on all mandatory keys (the optional fields are excluded).
This simplifies things by avoiding passing any set or list around: the boxed query generator can determine what it needs from the type of the keys and add the or query when needed.

git log:
Replace missing field based on a context either with literal or a dedicated attribute. This allows folding to occur without dealing with aliases or making the named expression act as a literal (which can lead to accidental folding).

Make the mandatory key constraint explicit inside each filter query to help in propagating the constraint across sequences without special handling.
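As a rough illustration of the second point, the sketch below appends a non-null constraint to a filter for every mandatory key and skips the optional ones. It works on plain strings and hypothetical names; the actual optimizer rule operates on the logical plan, not on query text.

```java
import java.util.List;

// String-based illustration only; the real rule rewrites plan nodes, not text.
final class MandatoryKeyConstraintSketch {

    // Keys written with a leading '?' are optional and get no constraint.
    static String withKeyConstraints(String filter, List<String> keys) {
        StringBuilder sb = new StringBuilder("(").append(filter).append(")");
        for (String key : keys) {
            if (key.startsWith("?") == false) {
                sb.append(" and ").append(key).append(" != null");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> keys = List.of("?process.entity_id", "process.pid");
        System.out.println(withKeyConstraints("transID == 2", keys));
        // prints (transID == 2) and process.pid != null
    }
}
```

Because the constraint lives inside each filter, every query of a sequence carries it without any extra bookkeeping.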

elasticsearchmachine · 2021-10-26T13:24:45Z

💚 Backport successful

Status	Branch	Result
✅	7.16


bpintea

LGTM


astefan · 2021-10-27T11:47:58Z

CC @jrodewig


Add optional fields

b30f439

Restrict joining keys to non-null values only for non-optional fields

elasticsearchmachine added the v8.0.0 label Oct 22, 2021

Fix tests

20e45e2

This was referenced Oct 22, 2021

EQL: Filter out null join keys in sequence queries #78195

Closed

EQL: Add optional fields #78769

Closed

astefan added :Analytics/EQL, >feature labels Oct 22, 2021

elasticmachine added the Team:QL (Deprecated) label Oct 22, 2021

astefan requested review from costin, Luegg and bpintea October 22, 2021 15:21

costin requested changes Oct 23, 2021


Luegg reviewed Oct 25, 2021


costin approved these changes Oct 26, 2021


astefan added >bug v7.16.0 and removed >feature labels Oct 26, 2021

Add more tests

e0ab3bf

astefan added auto-backport-and-merge, auto-merge-without-approval, v7.16.1 labels Oct 26, 2021

elasticsearchmachine merged commit d4bf2dc into elastic:master Oct 26, 2021

astefan mentioned this pull request Oct 26, 2021

[7.16] EQL: Add optional fields and limit joining keys on non-null values only (#79677) #79807

Merged

bpintea approved these changes Oct 26, 2021


astefan deleted the eql_optional_fields2 branch October 26, 2021 14:38

jrodewig mentioned this pull request Oct 27, 2021

[DOCS] EQL: Document optional fields and null value restrictions #79910

Closed

jakelandis added v8.0.0-beta1 and removed v8.0.0 labels Oct 27, 2021

danhermann removed the v7.16.1 label Oct 27, 2021

jrodewig mentioned this pull request Nov 1, 2021

[DOCS] EQL: Document optional fields #80150

Merged

jrodewig added a commit that referenced this pull request Nov 3, 2021

[DOCS] EQL: Document optional fields (#80150)

a509205

Adds new sections for optional fields and optional `by` fields. Also revises some existing content to define **join keys**. Closes #79910 Relates to #79677

brokensound77 mentioned this pull request Jan 10, 2022

Add support for optional fields endgameinc/eql#58

Closed


astefan Oct 23, 2021 (edited)
