
[Segment Replication] Higher refresh count for segrep enabled indices #7296

Closed
dreamer-89 opened this issue Apr 24, 2023 · 3 comments
Labels: discuss, distributed framework, Indexing

@dreamer-89 (Member) commented:

As part of benchmarking for the Segment Replication GA release (#5147), the benchmark results show a higher number of background refreshes and a higher merge count (the latter possibly a side effect of the former) for segrep-enabled indices. The difference grows with more data ingestion.

Post-run totals:
Primary shard: ~34 GB
Total size: ~70 GB

Infra setup:
10 m5.xlarge data nodes, 200 GB storage, 8 GB JVM heap
3 cluster manager nodes

   
| Metric | Unit | Document Replication | Segment Replication | Diff % |
|:---|:---|---:|---:|---:|
| Cumulative indexing time of primary shards | min | 178.935 | 206.136 | 13.20% |
| Min cumulative indexing time across primary shard | min | 1.49993 | 1.76382 | 14.96% |
| Median cumulative indexing time across primary shard | min | 1.74738 | 2.07351 | 15.73% |
| Max cumulative indexing time across primary shard | min | 2.0436 | 2.35758 | 13.32% |
| Cumulative indexing throttle time of primary shards | min | 0 | 0 | N/A |
| Min cumulative indexing throttle time across primary shard | min | 0 | 0 | N/A |
| Median cumulative indexing throttle time across primary shard | min | 0 | 0 | N/A |
| Max cumulative indexing throttle time across primary shard | min | 0 | 0 | N/A |
| Cumulative merge time of primary shards | min | 30.3864 | 119.524 | 74.58% |
| Cumulative merge count of primary shards | | 77 | 6751 | 98.86% |
| Min cumulative merge time across primary shard | min | 0 | 0.8594 | 100.00% |
| Median cumulative merge time across primary shard | min | 0.30523 | 1.12452 | 72.86% |
| Max cumulative merge time across primary shard | min | 0.85787 | 1.84643 | 53.54% |
| Cumulative merge throttle time of primary shards | min | 8.13938 | 11.7186 | 30.54% |
| Min cumulative merge throttle time across primary shard | min | 0 | 0.04043 | 100.00% |
| Median cumulative merge throttle time across primary shard | min | 0.10155 | 0.1112 | 8.68% |
| Max cumulative merge throttle time across primary shard | min | 0.20813 | 0.28073 | 25.86% |
| Cumulative refresh time of primary shards | min | 19.6532 | 83.47 | 76.45% |
| Cumulative refresh count of primary shards | | 1493 | 37564 | 96.03% |
| Min cumulative refresh time across primary shard | min | 0.15265 | 0.69105 | 77.91% |
| Median cumulative refresh time across primary shard | min | 0.19396 | 0.84068 | 76.93% |
| Max cumulative refresh time across primary shard | min | 0.27413 | 0.97848 | 71.98% |
| Cumulative flush time of primary shards | min | 2.50282 | 0.72735 | -244.10% |
| Cumulative flush count of primary shards | | 200 | 200 | 0.00% |
| Min cumulative flush time across primary shard | min | 0.00327 | 0.00152 | -115.38% |
| Median cumulative flush time across primary shard | min | 0.02454 | 0.00483 | -408.64% |
| Max cumulative flush time across primary shard | min | 0.06605 | 0.04875 | -35.49% |
| Total Young Gen GC time | s | 94.818 | 26.689 | -255.27% |
| Total Young Gen GC count | | 2600 | 1657 | -56.91% |
| Total Old Gen GC time | s | 0 | 0 | N/A |
| Total Old Gen GC count | | 0 | 0 | N/A |
| Store size | GB | 104.083 | 104.158 | 0.07% |
| Translog size | GB | 69.4723 | 73.6275 | 5.64% |
| Heap used for segments | MB | 0 | 0 | N/A |
| Heap used for doc values | MB | 0 | 0 | N/A |
| Heap used for terms | MB | 0 | 0 | N/A |
| Heap used for norms | MB | 0 | 0 | N/A |
| Heap used for points | MB | 0 | 0 | N/A |
| Heap used for stored fields | MB | 0 | 0 | N/A |
| Segment count | | 1713 | 1807 | 5.20% |
| Min Throughput | docs/s | 151404 | 167391 | 9.55% |
| Mean Throughput | docs/s | 158532 | 172685 | 8.20% |
| Median Throughput | docs/s | 154796 | 172995 | 10.52% |
| Max Throughput | docs/s | 166173 | 174655 | 4.86% |
| 50th percentile latency | ms | 407.682 | 444.745 | 8.33% |
| 90th percentile latency | ms | 844.261 | 556.603 | -51.68% |
| 99th percentile latency | ms | 1820.86 | 693.143 | -162.70% |
| 99.9th percentile latency | ms | 3080.93 | 825.769 | -273.10% |
| 99.99th percentile latency | ms | 3919.41 | 893.44 | -338.69% |
| 100th percentile latency | ms | 4200.82 | 897.265 | -368.18% |
| 50th percentile service time | ms | 407.682 | 444.745 | 8.33% |
| 90th percentile service time | ms | 844.261 | 556.603 | -51.68% |
| 99th percentile service time | ms | 1820.86 | 693.143 | -162.70% |
| 99.9th percentile service time | ms | 3080.93 | 825.769 | -273.10% |
| 99.99th percentile service time | ms | 3919.41 | 893.44 | -338.69% |
| 100th percentile service time | ms | 4200.82 | 897.265 | -368.18% |
| error rate | % | 0 | 0 | 0.00% |
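
For reference, the cumulative refresh and merge counts above can be spot-checked on a live cluster via the index stats API. A minimal sketch, assuming the workload index is named `so`:

```
GET /so/_stats/refresh,merge
```

The `refresh.total` and `merges.total` fields in the response roughly correspond to what the benchmark reports as cumulative counts across primary shards.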
dreamer-89 added the discuss, untriaged, distributed framework, and Indexing labels on Apr 24, 2023
@dreamer-89 (Member, Author) commented:

The benchmark comparison for the `so` dataset also shows p50 and p90 indexing latency degraded by ~20% and ~40%, respectively, with segment replication.

Cluster setup:

1. 10 data nodes
2. 3 cluster manager nodes
3. m5.xlarge instance type

Steps:

1. Create the docrep index with the settings below:

```json
{
    "settings": {
        "index": {
            "number_of_shards": 40,
            "number_of_replicas": 1,
            "replication.type": "DOCUMENT"
        }
    }
}
```

2. Ingest data using:

```
opensearch-benchmark execute_test --pipeline benchmark-only --workload so --target-hosts "$host:80" --distribution-version=2.7.0 --include-tasks index-append --results-file=~/.benchmark/so_first_segrep --show-in-results=all --kill-running-processes;
```

3. Delete the index and repeat for segrep (see the settings sketch below).
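
For the segrep run in step 3, the index would presumably be recreated with the same shard and replica counts but with `replication.type` set to `SEGMENT`. A minimal sketch of those settings (not spelled out in the original steps):

```json
{
    "settings": {
        "index": {
            "number_of_shards": 40,
            "number_of_replicas": 1,
            "replication.type": "SEGMENT"
        }
    }
}
```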

One sample comparison is shown below (baseline: docrep, contender: segrep).

ubuntu@ip-172-31-49-145:~/OpenSearch-Benchmark$ opensearch-benchmark compare --baseline=8c33cbc6-6f0b-4e77-840e-4b16a0aabcc8  --contender=f615b039-6598-44f5-aab6-467d2671a6ba

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/


Comparing baseline
  TestExecution ID: 8c33cbc6-6f0b-4e77-840e-4b16a0aabcc8
  TestExecution timestamp: 2023-04-25 23:57:26
  TestProcedure: append-no-conflicts
  ProvisionConfigInstance: external

with contender
  TestExecution ID: f615b039-6598-44f5-aab6-467d2671a6ba
  TestExecution timestamp: 2023-04-26 00:13:05
  TestProcedure: append-no-conflicts
  ProvisionConfigInstance: external

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                        Metric |         Task |   Baseline |   Contender |     Diff |   Unit |
|--------------------------------------------------------------:|-------------:|-----------:|------------:|---------:|-------:|
|                    Cumulative indexing time of primary shards |              |    115.845 |     137.057 |  21.2119 |    min |
|             Min cumulative indexing time across primary shard |              |     2.4691 |     2.74135 |  0.27225 |    min |
|          Median cumulative indexing time across primary shard |              |    2.85135 |     3.51008 |  0.65873 |    min |
|             Max cumulative indexing time across primary shard |              |    3.56195 |     4.11007 |  0.54812 |    min |
|           Cumulative indexing throttle time of primary shards |              |          0 |           0 |        0 |    min |
|    Min cumulative indexing throttle time across primary shard |              |          0 |           0 |        0 |    min |
| Median cumulative indexing throttle time across primary shard |              |          0 |           0 |        0 |    min |
|    Max cumulative indexing throttle time across primary shard |              |          0 |           0 |        0 |    min |
|                       Cumulative merge time of primary shards |              |          0 |     103.464 |  103.464 |    min |
|                      Cumulative merge count of primary shards |              |          0 |        1421 |     1421 |        |
|                Min cumulative merge time across primary shard |              |          0 |     1.64495 |  1.64495 |    min |
|             Median cumulative merge time across primary shard |              |          0 |     2.45208 |  2.45208 |    min |
|                Max cumulative merge time across primary shard |              |          0 |      3.7272 |   3.7272 |    min |
|              Cumulative merge throttle time of primary shards |              |          0 |     37.8906 |  37.8906 |    min |
|       Min cumulative merge throttle time across primary shard |              |          0 |    0.315733 |  0.31573 |    min |
|    Median cumulative merge throttle time across primary shard |              |          0 |    0.837842 |  0.83784 |    min |
|       Max cumulative merge throttle time across primary shard |              |          0 |     1.71068 |  1.71068 |    min |
|                     Cumulative refresh time of primary shards |              |    27.2749 |      51.285 |  24.0101 |    min |
|                    Cumulative refresh count of primary shards |              |        459 |        7021 |     6562 |        |
|              Min cumulative refresh time across primary shard |              |   0.529367 |    0.871567 |   0.3422 |    min |
|           Median cumulative refresh time across primary shard |              |   0.693942 |     1.35739 |  0.66345 |    min |
|              Max cumulative refresh time across primary shard |              |   0.839817 |     1.61723 |  0.77742 |    min |
|                       Cumulative flush time of primary shards |              |     3.0851 |    0.900667 | -2.18443 |    min |
|                      Cumulative flush count of primary shards |              |         40 |          92 |       52 |        |
|                Min cumulative flush time across primary shard |              |     0.0184 |      0.0064 |   -0.012 |    min |
|             Median cumulative flush time across primary shard |              |  0.0753083 |     0.01775 | -0.05756 |    min |
|                Max cumulative flush time across primary shard |              |    0.14295 |     0.06245 |  -0.0805 |    min |
|                                       Total Young Gen GC time |              |     30.535 |        7.63 |  -22.905 |      s |
|                                      Total Young Gen GC count |              |       1011 |         531 |     -480 |        |
|                                         Total Old Gen GC time |              |          0 |           0 |        0 |      s |
|                                        Total Old Gen GC count |              |          0 |           0 |        0 |        |
|                                                    Store size |              |    69.5802 |     98.3018 |  28.7217 |     GB |
|                                                 Translog size |              |    30.4832 |     28.3262 | -2.15695 |     GB |
|                                        Heap used for segments |              |          0 |           0 |        0 |     MB |
|                                      Heap used for doc values |              |          0 |           0 |        0 |     MB |
|                                           Heap used for terms |              |          0 |           0 |        0 |     MB |
|                                           Heap used for norms |              |          0 |           0 |        0 |     MB |
|                                          Heap used for points |              |          0 |           0 |        0 |     MB |
|                                   Heap used for stored fields |              |          0 |           0 |        0 |     MB |
|                                                 Segment count |              |        647 |         915 |      268 |        |
|                                                Min Throughput | index-append |    65101.5 |     75422.1 |  10320.6 | docs/s |
|                                               Mean Throughput | index-append |    66831.3 |     78430.6 |  11599.4 | docs/s |
|                                             Median Throughput | index-append |    66893.6 |     78448.2 |  11554.6 | docs/s |
|                                                Max Throughput | index-append |    67664.2 |     80842.5 |  13178.3 | docs/s |
|                                       50th percentile latency | index-append |    400.337 |     465.148 |   64.811 |     ms |
|                                       90th percentile latency | index-append |    555.599 |     762.042 |  206.443 |     ms |
|                                       99th percentile latency | index-append |    7831.66 |     1478.02 | -6353.63 |     ms |
|                                     99.9th percentile latency | index-append |     9041.4 |     2005.12 | -7036.28 |     ms |
|                                      100th percentile latency | index-append |    9376.91 |     2674.49 | -6702.41 |     ms |
|                                  50th percentile service time | index-append |    400.337 |     465.148 |   64.811 |     ms |
|                                  90th percentile service time | index-append |    555.599 |     762.042 |  206.443 |     ms |
|                                  99th percentile service time | index-append |    7831.66 |     1478.02 | -6353.63 |     ms |
|                                99.9th percentile service time | index-append |     9041.4 |     2005.12 | -7036.28 |     ms |
|                                 100th percentile service time | index-append |    9376.91 |     2674.49 | -6702.41 |     ms |
|                                                    error rate | index-append |          0 |           0 |        0 |      % |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------


mch2 commented May 22, 2023

The root cause here is the same as #7423 (comment). We intentionally disable shard idle for SR-enabled indices because replicas cannot independently refresh themselves. The balance here is to bump up the refresh interval with SR to slow down refreshes/merges and write larger segments per refresh cycle, at the trade-off of replication lag.
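
For illustration, bumping the refresh interval on a segrep index can be done through the update-settings API. A minimal sketch, where the index name and the `30s` value are assumptions rather than a recommendation from this thread:

```
PUT /my-segrep-index/_settings
{
    "index": {
        "refresh_interval": "30s"
    }
}
```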


mch2 commented May 25, 2023

Closing this issue, as we know why this is happening and it is expected behavior with a lower refresh interval and segrep. Tracking the zero-replica optimization here.

mch2 closed this as completed May 25, 2023