Add `indices` field to `_matchesPosition` to specify where in an array a match comes from #5005

LukasKalbertodt · 2024-10-14T10:31:25Z

Pull Request

Related issue

What does this PR do?

This adds an `indices` field to the objects returned in `_matchesPosition`. The new field describes which array element of the document the match was found in. Previously this could not be determined: users had no way to tell where inside an array a match originated.

Example document:

{
  "id": "123",
  "names": ["foo", "bar", "catnip"],
  "noarray": "dog cat fox",
  "nested": [
    ["dog", "cat"],
    ["fox", "bear"]
  ]
}

Searching for cat now returns this:

"_matchesPosition": {
  "names": [
    {
      "start": 0,
      "length": 3,
      "indices": [2]
    }
  ],
  "nested": [
    {
      "start": 0,
      "length": 3,
      "indices": [0, 1]
    }
  ],
  "noarray": [
    {
      "start": 4,
      "length": 3
    }
  ]
}

`indices` has to be an array because of nested arrays: one sometimes needs multiple indices to know which piece of data the match comes from.
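As an illustration of how a client might consume the new field, here is a minimal sketch that maps each match back to the text it refers to. The helper names `resolve_indices` and `matched_text` are hypothetical and not part of this PR. The sketch assumes that `start` and `length` are byte offsets relative to the matched array element (as the example output above suggests) and that the field key is a top-level key of the document.

```python
def resolve_indices(field_value, indices=None):
    """Follow `indices` through (possibly nested) arrays, outermost array first.

    If `indices` is missing, the field is not an array and is returned as-is.
    """
    value = field_value
    for i in indices or []:
        value = value[i]
    return value


def matched_text(field_value, match):
    """Return the matched substring for one `_matchesPosition` entry.

    Assumption: `start`/`length` are byte offsets into the matched element.
    """
    element = resolve_indices(field_value, match.get("indices"))
    raw = element.encode("utf-8")
    start, length = match["start"], match["length"]
    return raw[start:start + length].decode("utf-8")


document = {
    "id": "123",
    "names": ["foo", "bar", "catnip"],
    "noarray": "dog cat fox",
    "nested": [["dog", "cat"], ["fox", "bear"]],
}

matches_position = {
    "names": [{"start": 0, "length": 3, "indices": [2]}],
    "nested": [{"start": 0, "length": 3, "indices": [0, 1]}],
    "noarray": [{"start": 4, "length": 3}],
}

for field, field_matches in matches_position.items():
    for m in field_matches:
        print(field, m.get("indices"), matched_text(document[field], m))
```

Note that the consumer has to know that `names` and `nested` are arrays in order to apply the indices at the right level; that is the implicit knowledge discussed in the alternatives below.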

Alternative API designs

An alternative design would be to include the indices inside the key of _matchesPosition, e.g. foo.bar[2].baz. This is more intuitive to me and puts all "location inside document" information into one place, but has some disadvantages: the index needs to be parsed out of the key, which is annoying for end users. Also, JSON fields can have keys containing [2], so escaping would be necessary. Or one could insert . dots before the [2] (e.g. foo.bar.[2].baz) which might make parsing easier.
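To make the parsing cost concrete, here is a rough sketch of what a client would have to do with such keys. `parse_key` is a hypothetical helper, not something this PR provides, and it deliberately does no escaping, which exposes the ambiguity mentioned above.

```python
import re

# One path segment: either a field name (no dots or brackets) or a bracketed index.
_SEGMENT = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_key(key):
    """Split a key like 'foo.bar[2].baz' into field names (str) and indices (int)."""
    parts = []
    for name, index in _SEGMENT.findall(key):
        parts.append(int(index) if index else name)
    return parts


# Both spellings discussed above parse to the same path.
assert parse_key("foo.bar[2].baz") == ["foo", "bar", 2, "baz"]
assert parse_key("foo.bar.[2].baz") == ["foo", "bar", 2, "baz"]

# Ambiguity without escaping: a field literally named "bar[2]" is
# indistinguishable from element 2 of an array field named "bar".
assert parse_key("bar[2]") == ["bar", 2]
```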

Another alternative could be to just include the full path inside the object, e.g.:

{
  "start": 4,
  "length": 3,
  "path": ["foo", "bar", 2, "baz"]
}

String elements would mean fields in an object, numbers would mean indices into an array. This is nice as all path information is in one place. With this, unlike with the current design (of this PR), you would also not need to know the document structure to understand at what levels the indices actually apply.
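For comparison, here is a sketch of how a client could consume the `path` variant. `follow_path` and the sample document are made up for illustration; the point is only that the element types in the path tell the consumer whether to index into an object or an array, so no knowledge of the document structure is needed.

```python
def follow_path(document, path):
    """Walk `path` through a JSON-like document.

    Strings select a field of an object, integers select an element of an array.
    """
    value = document
    for step in path:
        if isinstance(step, int):
            if not isinstance(value, list):
                raise TypeError(f"expected an array before index {step!r}")
        elif not isinstance(value, dict):
            raise TypeError(f"expected an object before field {step!r}")
        value = value[step]
    return value


# Hypothetical document shaped to fit the path from the example above.
doc = {"foo": {"bar": [{"baz": "a"}, {"baz": "b"}, {"baz": "dog cat fox"}]}}
match = {"start": 4, "length": 3, "path": ["foo", "bar", 2, "baz"]}

text = follow_path(doc, match["path"])
# Assumption: start/length are byte offsets into the resolved string.
raw = text.encode("utf-8")
hit = raw[match["start"]:match["start"] + match["length"]].decode("utf-8")
print(hit)
```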

Of course, the disadvantage is that there is duplication with the keys inside _matchesPosition. One could also convert _matchesPosition to an array, but that's quite the breaking change and it would make some use cases more annoying.

In summary: I personally am fine with all three designs. The implicit "you have to know where the indices go" of the current design is not too bad; the parsing of the foo.bar[2].baz approach also seems ok; adding the nicely typed full path also probably doesn't hurt too much thanks to compression. Let me know what you think! I can change this PR to switch to another approach.

ManyTheFish · 2024-10-15T07:45:21Z

Hello @LukasKalbertodt,
Is your PR ready for review? If it's not, could you please convert it to a draft PR?

Thanks!

For matches inside arrays, this field holds the indices of the array elements that matched. For example, searching for `cat` inside `{ "a": ["dog", "cat", "fox"] }` would return `indices: [1]`. For nested arrays, this contains multiple indices, starting with the one for the top-most array. For matches in fields without arrays, `indices` is not serialized (does not exist) to save space.

LukasKalbertodt · 2024-10-15T11:37:04Z

@ManyTheFish The PR is ready for review (now that CI should be fixed...). The alternatives in the description are just considerations for you, to decide which path to choose.

Remove unreachable code

9bd43df

LukasKalbertodt force-pushed the array-indices-for-matches branch from 9fd1116 to 257a8f6 on October 15, 2024 11:34
