
Synthetic source #85649

Merged: 54 commits into elastic:master, May 10, 2022
Conversation

@nik9000 (Member) commented Apr 1, 2022


This attempts to shrink the index by implementing a "synthetic _source" field.
You configure it in the mapping:

{
  "mappings": {
    "_source": {
      "synthetic": true
    }
  }
}

And we just stop storing the _source field - kind of. When you go to access
the _source we regenerate it on the fly by loading doc values. Doc values
don't preserve the original structure of the source you sent so we have to
make some educated guesses. And we have a rule: the source we generate would
result in the same index if you sent it back to us. That way you can use it
for things like _reindex.
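For example, here's a minimal end-to-end sketch of what that looks like from the
API side. The index name test, the message field, and the unsecured localhost
cluster are all made up for illustration, and because this is behind a feature
flag the exact syntax may still change:

curl -XPUT 'localhost:9200/test' -H 'Content-Type: application/json' -d'{
  "mappings": {
    "_source": { "synthetic": true },
    "properties": {
      "message": { "type": "keyword" }
    }
  }
}'
curl -XPOST 'localhost:9200/test/_doc?refresh' -H 'Content-Type: application/json' -d'{
  "message": ["foo", "foo", "bar"]
}'
# the hit comes back with a synthesized _source, expected to look like
# {"message": ["bar", "foo"]} - sorted and de-duplicated, per the rules below
curl -XGET 'localhost:9200/test/_search?pretty'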

Fetching the _source from doc values does slow down loading somewhat. See
numbers further down.

Supported fields

This only works for the following fields:

  • boolean
  • byte
  • date
  • double
  • float
  • geo_point (with precision loss)
  • half_float
  • integer
  • ip
  • keyword
  • long
  • scaled_float
  • short
  • text (when there is a keyword sub-field that is compatible with this feature; see the sketch just below this list)
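
For the text case, that means a multi-field mapping along these lines (the
field names are made up for illustration); the generator can then rebuild the
text value from the keyword sub-field's doc values:

{
  "message": {
    "type": "text",
    "fields": {
      "raw": { "type": "keyword" }
    }
  }
}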

Educated guesses

The synthetic source generator:

  • sorts fields alphabetically
  • keeps the result as "objecty" as possible
  • pushes all arrays to the "leaf" fields
  • sorts most array values
  • removes duplicate text and keyword values

These are mostly artifacts of how doc values are stored.

sorted alphabetically

{
  "b": 1,
  "c": 2,
  "a": 3
}

becomes

{
  "a": 3,
  "b": 1,
  "c": 2
}

as "objecty" as possible

{
  "a.b": "foo"
}

becomes

{
  "a": {
    "b": "foo"
  }
}

pushes all arrays to the "leaf" fields

{
  "a": [
    {
      "b": "foo",
      "c": "bar"
    },
    {
      "c": "bort"
    },
    {
      "b": "snort"
    }
  ]
}

becomes

{
  "a": {
    "b": ["foo", "snort"],
    "c": ["bar", "bort"]
  }
}

sorts most array values

{
  "a": [2, 3, 1]
}

becomes

{
  "a": [1, 2, 3]
}

removes duplicate text and keyword values

{
  "a": ["bar", "baz", "baz", "baz", "foo", "foo"]
}

becomes

{
  "a": ["bar", "baz", "foo"]
}

_recovery_source

Elasticsearch's shard "recovery" process sometimes needs _source. So does
cross cluster replication. If you disable source or filter it somehow, we store
a _recovery_source field for as long as the recovery process might need it.
When everything is running smoothly that's generally a few seconds or minutes.
Then the field is removed on merge. This synthetic source feature continues
to produce _recovery_source and relies on it for recovery. It's possible
to synthesize _source during recovery, but we don't do it.

That means that synthetic source doesn't speed up writing the index. But in the
future we might be able to synthesize _source on the recovery side too, trading
less data written at index time for slower recovery and cross cluster
replication. That's an area of future improvement.

perf numbers

I loaded the entire tsdb data set with this change; here's the size:

           standard -> synthetic
store size  31.0 GB ->  7.0 GB  (77.5% reduction)
_source  24695.7 MB -> 47.6 MB  (99.8% reduction - synthetic is in _recovery_source)

A second _forcemerge a few minutes after rally finishes should remove the
remaining 47.6MB of _recovery_source.
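
For anyone reproducing this, the merge and the size check can be done along
these lines (index name tsdb and an unsecured localhost cluster assumed; the
size numbers above may well have come from rally itself, but the
analyze-disk-usage API gives a similar per-field breakdown):

# force a merge so segments still carrying _recovery_source get rewritten
curl -XPOST 'localhost:9200/tsdb/_forcemerge?max_num_segments=1'

# per-field disk usage report; run_expensive_tasks is required for the breakdown
curl -XPOST 'localhost:9200/tsdb/_disk_usage?run_expensive_tasks=true&pretty'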

With this, fetching the source for 1,000 documents seems to take about 500ms. I
spot checked a lot of different areas and haven't seen the hit differ much. I
expect this performance impact depends on the number of doc values fields
in the index and how sparse they are.
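
The fetch being timed is roughly this shape of request (index name tsdb
assumed); _source is returned by default, so each hit's _source is synthesized
from doc values on the way out:

curl -XPOST 'localhost:9200/tsdb/_search?pretty' -H 'Content-Type: application/json' -d'{
  "size": 1000,
  "query": { "match_all": {} }
}'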

@nik9000 requested a review from romseygeek on April 1, 2022 at 20:10
@nik9000 (Member Author) commented Apr 5, 2022

I hacked together something to test the differences:

export from=$(curl -s -HContent-Type:application/json -uelastic:D2MVMBAUE0fUDu30A6yO -XPOST -k 'https://localhost:9201/tsdb/_search?size=0&pretty' -d'{
  "aggs": {
    "min": {
      "min": {
        "field": "@timestamp",
        "format": "epoch_millis"
      }
    }
  }
}' | jq -r .aggregations.min.value_as_string)
export to=$(curl -s -HContent-Type:application/json -uelastic:D2MVMBAUE0fUDu30A6yO -XPOST -k 'https://localhost:9201/tsdb/_search?size=0&pretty' -d'{
  "aggs": {
    "max": {
      "max": {
        "field": "@timestamp",
        "format": "epoch_millis"
      }
    }
  }
}' | jq -r .aggregations.max.value_as_string)
for date in $(seq $from 1000 $to); do
  for id in $(curl -s -HContent-Type:application/json -uelastic:D2MVMBAUE0fUDu30A6yO -XPOST -k 'https://localhost:9201/tsdb/_search?size=10000&pretty' -d'{
    "stored_fields": ["__none__"],
    "sort": {"@timestamp": "desc"},
    "query": {
      "range": {
        "@timestamp": {
          "gte": '$date',
          "format": "epoch_millis"
        }
      }
    }
  }' | jq -r .hits.hits[]._id); do
    echo $date $id
    diff \
      <(curl -s -HContent-Type:application/json -uelastic:D2MVMBAUE0fUDu30A6yO -XGET -k 'https://localhost:9201/tsdb/_doc/'$id | jq 'del(._seq_no)' -S) \
      <(curl -s -HContent-Type:application/json -uelastic:D2MVMBAUE0fUDu30A6yO -XGET -k 'https://localhost:9200/tsdb/_doc/'$id | jq 'del(._seq_no)' -S)
  done
done | tee diffs

Which spits out:

1619630303410 aJUt5LaG4q6Jz3sDAAABeR4M1KU
1619630303410 JFpDQXhLo-DNlumpAAABeR4M1KU
1619630303410 uwFiL6jzbCaBLd-LAAABeR4M1KU
1619630303410 DDQFZr8fpaviwfqcAAABeR4M1GI
36c36
<               "pct": 0.0076478614634146345
---
>               "pct": 0.008
40c40
<               "pct": 0.003919529
---
>               "pct": 0.004
72c72
<               "pct": 0.4777109375
---
>               "pct": 0.47800000000000004
75c75
<               "pct": 0.015884058091486793
---
>               "pct": 0.016
97c97
<         "start_time": "2021-04-29T08:18:44Z"
---
>         "start_time": "2021-04-29T08:18:44.000Z"
1619630303410 iHhCyXEez-elWsWXAAABeR4M1GI
36c36
<               "pct": 0.2534879995
---
>               "pct": 0.253
40c40
<               "pct": 0.2534879995
---
>               "pct": 0.253
72c72
<               "pct": 0.01896543080120626
---
>               "pct": 0.019
75c75
<               "pct": 0.01896543080120626
---
>               "pct": 0.019
97c97
<         "start_time": "2021-04-29T14:31:13Z"
---
>         "start_time": "2021-04-29T14:31:13.000Z"
1619630303410 wWEXn97ymH76klIbAAABeR4M1GI
36c36
<               "pct": 0.165416355
---
>               "pct": 0.165
40c40
<               "pct": 0.00827081775
---
>               "pct": 0.008
72c72
<               "pct": 0.619265625
---
>               "pct": 0.619
75c75
<               "pct": 0.010295400826531081
---
>               "pct": 0.01
97c97
<         "start_time": "2021-04-07T10:08:31Z"
---
>         "start_time": "2021-04-07T10:08:31.000Z"

The test data has values like "pct": 0.010295400826531081 and the mapping is configured to use a scaled_float with a scaling factor of 1000, so what we actually store in doc values is "pct": 0.01 - and that is what we put in the synthetic source.
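
For reference, that mapping is along these lines (the field name pct is
assumed). scaled_float stores round(value * scaling_factor) as a long, so
0.010295400826531081 becomes 10, which reads back from doc values as
10 / 1000 = 0.01, and that rounded value is all synthetic source has to work
with:

{
  "pct": {
    "type": "scaled_float",
    "scaling_factor": 1000
  }
}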

@nik9000 (Member Author) commented Apr 5, 2022

One thing I've noticed that we probably don't want, but that I don't know how to get rid of, is copy_to - if you use copy_to to index foo.bar.message at message then both fields will have doc values for it and the _source will contain the value twice. This is ok for now, but would prevent us from using the source for recovery.

@nik9000 (Member Author) commented Apr 6, 2022

> One thing I've noticed that we probably don't want, but that I don't know how to get rid of, is copy_to - if you use copy_to to index foo.bar.message at message then both fields will have doc values for it and the _source will contain the value twice. This is ok for now, but would prevent us from using the source for recovery.

I've forbidden copy_to for synthetic source indices in this PR. We can figure out how to allow it later.
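
For illustration, a combination shaped like this (hypothetical field names) is
what now gets rejected: a synthetic _source index containing a field that uses
copy_to:

{
  "mappings": {
    "_source": { "synthetic": true },
    "properties": {
      "message": { "type": "keyword" },
      "note": { "type": "keyword", "copy_to": "message" }
    }
  }
}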

@nik9000 (Member Author) commented May 5, 2022

@romseygeek could you have another look at this? I've pushed some extra testing for round trips and it all passes. Well, sort of. I have to stub out a little of it because of mystery precision things. But I think we can get those in a follow up change.

@@ -203,4 +206,24 @@ protected void randomFetchTestFieldConfig(XContentBuilder b) throws IOException
protected boolean allowsNullValues() {
return false; // null is an error for constant keyword
}

A Contributor commented on the diff above:

We have enough test cases that have to implement these four identical 'empty' methods that maybe it's worth consolidating them into a NoSyntheticSourceTest interface with default methods, so the test cases can just implement that?

@nik9000 (Member Author) commented May 9, 2022

For those following along at home: this used to be activated with enabled: synthetic but now it is activated with synthetic: true. I'm debating with a few folks about which is better. But, because this is behind a feature flag, I think it's safe to merge it either way. And, since the code currently supports synthetic: true, that's what I'd like to merge in the first cut.
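
Concretely, earlier revisions of this PR used:

  "_source": { "enabled": "synthetic" }

and the version being merged uses:

  "_source": { "synthetic": true }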

@nik9000 (Member Author) commented May 9, 2022

Now that this is merged I've moved the follow up work to a meta issue: #86603

@romseygeek (Contributor) left a comment

LGTM. Thanks for all the back and forth, let's get this merged and look at the follow-ups.

@nik9000 (Member Author) commented May 10, 2022

I have some perf numbers from a hack that turns off _recovery_source. This shows the potential indexing speed improvement we could get from using synthetic source on the recovery side:

|                         Metric |   Baseline |   Contender |       Diff |   Unit |  Change |
|                 Min Throughput | 18024.6    | 19495.8     | 1471.23    | docs/s |  +8.16% |
|                Mean Throughput | 19426.9    | 21727.1     | 2300.2     | docs/s | +11.84% |
|              Median Throughput | 19169.7    | 21310.8     | 2141.08    | docs/s | +11.17% |
|                 Max Throughput | 22742.4    | 26960.5     | 4218.16    | docs/s | +18.55% |
|       Cumulative indexing time |   829.772  |   768.031   |  -61.741   |    min |  -7.44% |
|          Cumulative merge time |   235.641  |   230.36    |   -5.28152 |    min |  -2.24% |
| Cumulative merge throttle time |    39.8483 |    61.9718  |   22.1234  |    min | +55.52% |
|        Cumulative refresh time |    11.0899 |     7.02558 |   -4.06432 |    min | -36.65% |
|          Cumulative flush time |    41.3977 |    30.4483  |  -10.9495  |    min | -26.45% |

The short version is about 11% improvement in docs per second in TSDB, probably more in non-TSDB. Significantly faster merges, flushes, and refreshes - at least in TSDB, probably much faster in non-TSDB.

TSDB in its current form has a somewhat inefficient indexing pipeline, mostly because it can never skip the _id lookup on write. We will fix that eventually, but for now TSDB is known to be slower to write. So the 11% speed boost on write here will likely jump once that slowness is resolved. I'm running a test against a non-TSDB index now to see.

The merge time is funny to read - it looks like a 2% speed up, but I believe a lot of that speed up is being throttled. See the 55% bump in merge throttling time. My guess is that we're looking at a reduction in load from merge in the 25% range, similar to flush and refresh.

Here's what the disk looks like with _recovery_source enabled:

Device   r/s     w/s  rMB/s     wMB/s  ... wareq-sz  svctm  %util
md0     0.00  176.67   0.00     30.34  ...   175.88   0.00   0.00
md0     0.00  157.33   0.00     31.66  ...   206.07   0.00   0.00
md0     0.00  196.67   0.00     35.70  ...   185.91   0.00   0.00
md0     0.00  288.00   0.00     62.00  ...   220.44   0.00   0.00
md0     0.00  185.00   0.00     41.12  ...   227.62   0.00   0.00
md0     0.00  126.33   0.00     27.97  ...   226.71   0.00   0.00
md0     0.00  192.00   0.00     25.95  ...   138.38   0.00   0.00
md0     0.00  208.64   0.00     47.55  ...   233.36   0.00   0.00
md0     0.00 1167.00   0.00    206.48  ...   181.18   0.00   0.00
md0     0.00  206.64   0.00     23.55  ...   116.69   0.00   0.00
md0     0.00  221.67   0.00     30.54  ...   141.06   0.00   0.00
md0     0.00  158.33   0.00     29.90  ...   193.36   0.00   0.00
md0     0.00  208.00   0.00     33.23  ...   163.59   0.00   0.00
md0     0.00  266.33   0.00     70.11  ...   269.54   0.00   0.00
md0     0.00  122.67   0.00     12.91  ...   107.77   0.00   0.00
md0     0.00  184.67   0.00     28.96  ...   160.57   0.00   0.00
md0     0.00  951.67   0.00    103.28  ...   111.13   0.00   0.00
md0     0.00  214.00   0.00     31.92  ...   152.72   0.00   0.00
md0     0.00  184.00   0.00     31.07  ...   172.93   0.00   0.00

Note the bursty writes. Here's what it looks like without _recovery_source:

Device   r/s     w/s  rMB/s     wMB/s   ... wareq-sz  svctm  %util
md0     0.00  252.00   0.00     43.25   ...   175.74   0.00   0.00
md0     0.00  703.00   0.00     51.49   ...    75.00   0.00   0.00
md0     0.00  250.00   0.00     41.11   ...   168.40   0.00   0.00
md0     0.00  194.67   0.00     44.45   ...   233.82   0.00   0.00
md0     0.00  192.00   0.00     44.57   ...   237.71   0.00   0.00
md0     0.00  176.00   0.00     41.98   ...   244.23   0.00   0.00
md0     0.00  157.67   0.00     26.27   ...   170.62   0.00   0.00
md0     0.00  854.67   0.00     77.48   ...    92.83   0.00   0.00
md0     0.00  174.67   0.00     38.78   ...   227.35   0.00   0.00
md0     0.00  186.67   0.00     40.56   ...   222.51   0.00   0.00
md0     0.00  174.67   0.00     36.90   ...   216.34   0.00   0.00
md0     0.00  219.67   0.00     44.61   ...   207.96   0.00   0.00
md0     0.00  187.67   0.00     41.46   ...   226.21   0.00   0.00
md0     0.00   79.33   0.00     13.53   ...   174.67   0.00   0.00
md0     0.00  670.33   0.00     71.00   ...   108.46   0.00   0.00
md0     0.00  307.67   0.00     33.58   ...   111.75   0.00   0.00
md0     0.00  182.00   0.00     44.03   ...   247.74   0.00   0.00
md0     0.00  219.00   0.00     46.32   ...   216.57   0.00   0.00
md0     0.00  210.67   0.00     41.19   ...   200.21   0.00   0.00
md0     0.00  204.00   0.00     48.01   ...   241.01   0.00   0.00
md0     0.00  101.33   0.00     18.07   ...   182.63   0.00   0.00
md0     0.00  773.33   0.00    105.11   ...   139.18   0.00   0.00
md0     0.00  287.67   0.00     28.89   ...   102.82   0.00   0.00
md0     0.00  176.00   0.00     40.39   ...   234.99   0.00   0.00
md0     0.00  209.67   0.00     40.71   ...   198.82   0.00   0.00
md0     0.00  209.67   0.00     41.95   ...   204.86   0.00   0.00
md0     0.00  145.67   0.00     25.39   ...   178.52   0.00   0.00
md0     0.00  203.33   0.00     38.60   ...   194.41   0.00   0.00
md0     0.00  737.00   0.00     66.14   ...    91.90   0.00   0.00

The writes are less bursty. Still bursty, but less so. I believe the infrastructure that I used to run this captured graphs of this data over a longer period of time, but I don't know how to access it. I'm digging.

Edit:
Here is the indexing performance for non-tsdb indices:

|                         Metric |    Baseline |   Contender |       Diff |   Unit |   Change |
|                 Min Throughput | 54252.1     | 63616.7     | 9364.55    | docs/s |  +17.26% |
|                Mean Throughput | 55221.9     | 64975.2     | 9753.27    | docs/s |  +17.66% |
|              Median Throughput | 55085.7     | 65084.5     | 9998.85    | docs/s |  +18.15% |
|                 Max Throughput | 56526.7     | 66064.2     | 9537.54    | docs/s |  +16.87% |
|       Cumulative indexing time |   266.365   |   223.578   |  -42.7869  |    min |  -16.06% |
|          Cumulative merge time |   110.918   |    90.5038  |  -20.4143  |    min |  -18.40% |
| Cumulative merge throttle time |     1.19758 |     0.76905 |   -0.42853 |    min |  -35.78% |
|        Cumulative refresh time |     1.71403 |     1.25598 |   -0.45805 |    min |  -26.72% |
|          Cumulative flush time |     7.67595 |     5.885   |   -1.79095 |    min |  -23.33% |

This one is better: in the neighborhood of 17.5% rather than 11%.

@nik9000 merged commit a589456 into elastic:master on May 10, 2022
@nik9000 mentioned this pull request on May 10, 2022
@nik9000 (Member Author) commented May 10, 2022

I got charts! Here's disk write for non-tsdb indices:

[chart: stacked line graph of disk writes during the non-TSDB runs]

It's a stacked line graph of writes on all physical disks on the machine, so md0 above is basically the topmost line. The first run has _recovery_source and the second one doesn't. The second run writes faster and hits the disk less hard.

@nik9000 (Member Author) commented May 10, 2022

Here's the TSDB run:

[chart: stacked line graph of disk writes during the TSDB runs]

This time the second run has _recovery_source and the first one doesn't. It's the same picture: turning off _recovery_source hits the disk less hard and increases write speed.

@ruslaniv commented Dec 2, 2022

Is there any way to disable creation of _recovery_source because of this:
#82595 (comment)

@nik9000 (Member Author) commented Dec 2, 2022

We've talked a little about this - rebuilding the _source on the fly using synthetic _source. At the time we decided it wasn't worth it because folks were looking at doing other kinds of replication. I believe they are still working on that. In that replication mechanism we wouldn't need _recovery_source at all. That'd be lovely. No synthetic _source required. I still think that's a good plan.

@ruslaniv commented Dec 6, 2022

Nik, thank you for your answer!
Do you think the issue of "dangling" _recovery_source could be addressed in the near future? Right now this issue is causing our index to grow to 250GB instead of 50GB. Not only is this wasting 200GB of disk space, which is not critical, but the index no longer fits in available RAM, which is very critical.

@nik9000 (Member Author) commented Dec 6, 2022

> Not only is this wasting 200GB of disk space, which is not critical, but the index no longer fits in available RAM, which is very critical.

Bleh. And _source is stored next to _id and friends so you'll end up paging it in even if you weren't intending to load it from disk. Lovely. It looks like @DaveCTurner is talking to you on the linked issue about the dangling _recovery_source which is a good sign. He should be able to figure out what's going on for you.

I do think _recovery_source is being used much more now - mostly because folks are removing dense vectors from the _source, but partly because of synthetic _source. I wouldn't be surprised if we found more "fun" things in it now - but it should work as he describes. I read a lot of that code when working on this. But computers are sneaky.

Labels: >feature, :Search Foundations/Mapping (Index mappings, including merging and defining field types), v8.3.0