
# Performance


## Overview

This page contains a summary of the Abacus scalability testing performed and the relevant conclusions and recommendations for running Abacus in production.

## Test landscape

Abacus deployed on SAP Cloud Platform with a MongoDB backend.

## Functional correctness

1. Set up the current landscape.
2. Run the perf test with:

   ```sh
   export DEBUG=abacus-perf-test
   export BATCH_SIZE=10
   export SECURED=true
   export OBJECT_STORAGE_CLIENT_ID=abacus-object-storage
   export OBJECT_STORAGE_CLIENT_SECRET=<secret>
   export SYSTEM_CLIENT_ID=abacus
   export SYSTEM_CLIENT_SECRET=<secret>

   cd cf-abacus/test/perf
   yarn run perf --collector https://abacus-usage-collector.<domain> --reporting https://abacus-usage-reporting.<domain> --auth-server https://api.<domain> --no-timestamp --num-executions 1 --limit 180 -x 7200000 --orgs 20000
   ```

3. Scale out Abacus.
4. Run the perf test again to verify that the org reports are intact:

   ```sh
   yarn run perf --collector https://abacus-usage-collector.<domain> --reporting https://abacus-usage-reporting.<domain> --auth-server https://api.<domain> --no-timestamp --num-executions 2 --limit 180 -x 7200000 --orgs 20000
   ```

Note: mind the different `--num-executions` values: 1 for the initial run, 2 for the verification run.

## Perf test

### Running the test

```sh
unset DEBUG
export BATCH_SIZE=10
export SECURED=true
export OBJECT_STORAGE_CLIENT_ID=abacus-object-storage
export OBJECT_STORAGE_CLIENT_SECRET=<secret>
export SYSTEM_CLIENT_ID=abacus
export SYSTEM_CLIENT_SECRET=<secret>

cd cf-abacus/test/perf
# --orgs, -i, -u: organizations, resource instances per org, usage docs per instance;
# -l (--limit): number of parallel requests (the "limit" column in the tables below)
yarn run perf --collector https://abacus-usage-collector.<domain> --reporting https://abacus-usage-reporting.<domain> --auth-server https://api.<domain> -x 720000 -l 1980 --orgs 500 -i 4 -u 10
```

## General observations

Submitting documents for a single organization slows down the test, since the aggregator reduces an organization's data sequentially (it locks by orgId).
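
The effect of that per-org lock can be pictured with a minimal promise-chain sketch (hypothetical code, not the actual Abacus dataflow implementation): work for the same orgId is serialized, while different orgs proceed in parallel.

```js
// Hypothetical sketch of per-key serialization; not the Abacus source.
const chains = new Map();

const lockByKey = (key, task) => {
  const prev = chains.get(key) || Promise.resolve();
  const next = prev.then(task, task); // start only after the previous task settles
  chains.set(key, next);
  return next;
};

// Stub standing in for the aggregator's reduction step.
const aggregate = (doc) =>
  new Promise((resolve) => setTimeout(() => resolve(doc), 10));

lockByKey('org-1', () => aggregate('doc-1')); // serialized with the next call
lockByKey('org-1', () => aggregate('doc-2')); // waits for doc-1
lockByKey('org-2', () => aggregate('doc-3')); // independent of org-1
```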

Large batch sizes (>10) reduce the number of parallel requests: batches keep hitting already-loaded collectors and cannot be load-balanced adequately. For example, 2 requests with 100 batched docs each are enough to produce 429s in the load test.
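
A quick back-of-the-envelope calculation (assuming the usual 500 orgs x 4 instances x 10 docs run) shows why large batches leave the load balancer too few requests to spread:

```js
// Number of HTTP requests the perf test issues for 20000 usage docs.
const docs = 500 * 4 * 10;
console.log(docs / 100); // BATCH_SIZE=100 -> 200 requests, few targets to balance
console.log(docs / 10);  // BATCH_SIZE=10  -> 2000 requests, much easier to spread
```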

Compression reduces docs/sec by 25%, so we disabled it.

The request size grows across collector : meter : accumulator : aggregator in a ratio of roughly 1 : 1.5 : 2 : 4. The CPU load grows in almost the same ratio as the request size:

- collector: 10.2-10.6%
- meter: 25.6-27.5%
- accumulator: 49.6-52%
- aggregator: 66.5-84.5%

CPU distribution after all optimizations (disabled compression, reduced BATCH_SIZE, function caches) with 2200 parallel users and orgs/instances/docs = 500/4/10 on the 10-15-20-40 (collector/meter/accumulator/aggregator) setup:

- collector: 23%
- meter: 34%
- accumulator: 68%
- aggregator: 18%

Doubling the number of accumulators also halves their CPU load.

MongoDB is not loaded:

- 16 CPUs loaded at most 38%
- 18% disk utilization
- write locks: average wait time 370 ms
- read locks: average wait time 171 ms

We normally see around 3 x 450 req/min = 1350 req/min, which is about 22.5 req/sec.

The theoretical maximum is now 3 x 300 = 900 requests in flight; however, one request takes about 11 seconds to process, which caps the throughput at roughly 81 req/sec.
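
This is Little's law: sustained throughput equals the number of in-flight requests divided by the time each one takes. A quick check with the numbers above:

```js
// throughput = concurrency / latency (Little's law)
const inflight = 3 * 300; // 3 instances x 300 parallel requests
const latency = 11;       // seconds per request
console.log((inflight / latency).toFixed(1)); // 81.8, matching the observed ~81 req/s
```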

## Results

### 10 collector x 10 meter x 20 accumulator x 20 aggregator x 6 reporting apps x 6 provisioning x 6 account instances

| orgs | instances | docs | limit | time [ms] | doc/s | Client remark | Server remark |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 20000 | 1 | 1 | 180 | 152532 | 131.12 | No batch | |
| 20000 | 1 | 1 | 180 | 162216 | 123.29 | No batch | |
| 20000 | 1 | 1 | 180 | 166265 | 120.28 | No batch | |
| 20000 | 1 | 1 | 180 | 372173 | 53.73 | BATCH_SIZE=100 | |
| 20000 | 1 | 1 | 180 | 366334 | 54.59 | BATCH_SIZE=100 | |
| 20000 | 1 | 1 | 180 | 372144 | 53.74 | BATCH_SIZE=100 | |
| 2000 | 2 | 5 | 180 | 370373 | 53.99 | BATCH_SIZE=100 | |
| 2000 | 2 | 5 | 180 | 379509 | 52.69 | BATCH_SIZE=100 | |
| 2000 | 2 | 5 | 180 | 367010 | 54.49 | BATCH_SIZE=100 | |
| 500 | 4 | 10 | 180 | 197879 | 101.07 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 180 | 201509 | 99.25 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 180 | 206702 | 96.75 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 180 | 377486 | 52.98 | BATCH_SIZE=100 | |
| 500 | 4 | 10 | 180 | 367390 | 54.43 | BATCH_SIZE=100 | |
| 500 | 4 | 10 | 180 | 369614 | 54.11 | BATCH_SIZE=100 | |
| 500 | 4 | 10 | 180 | 661005 | 30.25 | BATCH_SIZE=200 | |
| 500 | 4 | 10 | 180 | 641269 | 31.18 | BATCH_SIZE=200 | |
| 500 | 4 | 10 | 180 | 681034 | 29.36 | BATCH_SIZE=200 | |
| 500 | 4 | 10 | 180 | 175462 | 113.98 | No batch | |
| 500 | 4 | 10 | 180 | 179036 | 111.70 | No batch | |
| 500 | 4 | 10 | 180 | 175250 | 114.12 | No batch | |
| 500 | 4 | 10 | 360 | 123700 | 161.68 | No batch | |
| 500 | 4 | 10 | 360 | 138818 | 144.07 | No batch | |
| 500 | 4 | 10 | 360 | 119805 | 166.93 | No batch | |
| 500 | 4 | 10 | 720 | 99091 | 201.83 | No batch | |
| 500 | 4 | 10 | 720 | 101584 | 196.88 | No batch | |
| 500 | 4 | 10 | 720 | 109251 | 183.06 | No batch | |
| 500 | 4 | 10 | 720 | 117555 | 170.13 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 720 | 102742 | 194.66 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 720 | 106917 | 187.06 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 720 | | | BATCH_SIZE=100 | status code 429 |
| 500 | 4 | 10 | 720 | 142010 | 140.83 | BATCH_SIZE=10 | THROTTLE=200 |
| 500 | 4 | 10 | 720 | 171417 | 116.67 | BATCH_SIZE=10 | THROTTLE=200 |
| 500 | 4 | 10 | 720 | | | BATCH_SIZE=10 | THROTTLE=200; OOM in meter, stack size exceeded in dataflow |
| 500 | 4 | 10 | 720 | 146503 | 136.51 | BATCH_SIZE=10 | THROTTLE=200 |
| 500 | 4 | 10 | 900 | 111776 | 178.92 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 900 | 123097 | 162.47 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 900 | 112788 | 177.32 | BATCH_SIZE=10 | |

❗ limit: the number of parallel POST & GET requests to Abacus.

### 10 collector x 10 meter x 20 accumulator x 20 aggregator x 6 reporting apps x 10 provisioning x 10 account instances

| orgs | instances | docs | limit | time [ms] | doc/s | Client remark | Server remark |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 500 | 4 | 10 | 720 | 98666 | 202.70 | BATCH_SIZE=1 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1 |
| 500 | 4 | 10 | 720 | 84917 | 235.52 | BATCH_SIZE=1 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1 |
| 500 | 4 | 10 | 720 | 88237 | 226.66 | BATCH_SIZE=1 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1 |
| 500 | 4 | 10 | 720 | 111439 | 179.47 | BATCH_SIZE=10 | THROTTLE=50 |
| 500 | 4 | 10 | 720 | 124969 | 160.03 | BATCH_SIZE=10 | THROTTLE=50 |
| 500 | 4 | 10 | 720 | 117073 | 170.83 | BATCH_SIZE=10 | THROTTLE=50 |
| 500 | 4 | 10 | 720 | 98376 | 203.30 | BATCH_SIZE=10 | THROTTLE=100 |
| 500 | 4 | 10 | 720 | 108410 | 184.48 | BATCH_SIZE=10 | THROTTLE=100 |
| 500 | 4 | 10 | 720 | 108226 | 184.79 | BATCH_SIZE=10 | THROTTLE=100 |
| 500 | 4 | 10 | 720 | 108876 | 183.69 | BATCH_SIZE=10 | THROTTLE=100; BATCH_SIZE=10 |
| 500 | 4 | 10 | 720 | 112689 | 177.47 | BATCH_SIZE=10 | THROTTLE=100; BATCH_SIZE=10 |
| 500 | 4 | 10 | 720 | 157174 | 127.24 | BATCH_SIZE=10 | THROTTLE=100; BATCH_SIZE=10 |
| 500 | 4 | 10 | 720 | 113305 | 176.51 | BATCH_SIZE=10 | THROTTLE=200 |
| 500 | 4 | 10 | 720 | 115593 | 173.02 | BATCH_SIZE=10 | THROTTLE=200 |
| 500 | 4 | 10 | 720 | 117927 | 169.59 | BATCH_SIZE=10 | THROTTLE=200 |
| 500 | 4 | 10 | 720 | 157463 | 127.01 | BATCH_SIZE=10 | THROTTLE=100; DB_PARTITIONS=20 |
| 500 | 4 | 10 | 720 | 144918 | 138 | BATCH_SIZE=10 | THROTTLE=100; DB_PARTITIONS=20 |
| 500 | 4 | 10 | 360 | 131851 | 151.68 | BATCH_SIZE=1 | BATCH_SIZE=1; THROTTLE=100; DB_PARTITIONS=20 |
| 500 | 4 | 10 | 720 | | | BATCH_SIZE=1 | BATCH_SIZE=1; THROTTLE=100; DB_PARTITIONS=20; overload 429 |

BATCH_SIZE=10 improves docs/sec.
DB_PARTITIONS has no effect on docs/sec.

### 3 collector x 3 meter x 6 accumulator x 6 aggregator x 3 reporting apps x 2 provisioning x 2 account instances

| orgs | instances | docs | limit | time [ms] | doc/s | Client remark | Server remark |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 500 | 4 | 10 | 720 | | | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1; overload: 429 |
| 500 | 4 | 10 | 180 | 266293 | 75.10 | BATCH_SIZE=10 | BATCH_SIZE=10, REPLAY=1, DB per layer |
| 500 | 4 | 10 | 180 | 251337 | 79.57 | BATCH_SIZE=10 | BATCH_SIZE=10, REPLAY=1, DB per layer |
| 500 | 4 | 10 | 180 | 266333 | 75.09 | BATCH_SIZE=10 | BATCH_SIZE=10, REPLAY=1, DB per layer |
| 1500 | 4 | 10 | 180 | 784336 | 76.49 | BATCH_SIZE=10 | BATCH_SIZE=10, REPLAY=1, DB per layer |
| 500 | 4 | 10 | 180 | 260056 | 76.90 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1 |
| 500 | 4 | 10 | 180 | 257350 | 77.71 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1 |
| 500 | 4 | 10 | 180 | 295680 | 67.64 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1 |
| 500 | 4 | 10 | 360 | 230646 | 86.72 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1 |
| 500 | 4 | 10 | 360 | 305125 | 65.54 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1 |
| 500 | 4 | 10 | 360 | | | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1; overloaded 429 |
| 500 | 4 | 10 | 360 | | | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1; overloaded 429 |
| 500 | 4 | 10 | 360 | 278069 | 71.92 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, DBOPTS={"poolSize": 2} |
| 500 | 4 | 10 | 360 | 270547 | 73.92 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, DBOPTS={"poolSize": 2} |
| 500 | 4 | 10 | 180 | 355448 | 56.26 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, MongoDB 3.4 |
| 500 | 4 | 10 | 180 | 272724 | 73.33 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, MongoDB 3.4 |
| 500 | 4 | 10 | 180 | 285557 | 70.03 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, MongoDB 3.4 |
| 500 | 4 | 10 | 180 | 266749 | 74.97 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, MongoDB 3.4 |
| 500 | 4 | 10 | 360 | 229161 | 87.27 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, MongoDB 3.4 |
| 500 | 4 | 10 | 360 | | | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, MongoDB 3.4; overloaded 429 |
| 500 | 4 | 10 | 360 | | | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, MongoDB 3.4; overloaded 429 |
| 500 | 4 | 10 | 360 | 286702 | 69.75 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, MongoDB 3.4 |
| 500 | 4 | 10 | 180 | 218967 | 91.33 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, no vm context (eval) |
| 500 | 4 | 10 | 180 | 219563 | 91.09 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, no vm context (eval) |
| 500 | 4 | 10 | 180 | 218083 | 91.70 | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, no vm context (eval) |
| 500 | 4 | 10 | 360 | | | BATCH_SIZE=10 | THROTTLE=100, BATCH_SIZE=10, REPLAY=1, no vm context (eval); overload (429) |
| 500 | 4 | 10 | 180 | 229806 | 87.02 | BATCH_SIZE=10 | THROTTLE=200, BATCH_SIZE=10, REPLAY=1, no vm context (eval) |
| 500 | 4 | 10 | 180 | 224099 | 89.24 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, no vm context (eval) |
| 500 | 4 | 10 | 180 | 221548 | 90.27 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, no vm context (eval) |
| 500 | 4 | 10 | 180 | 220603 | 90.66 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, no vm context (eval) |
| 20000 | 1 | 1 | 180 | 229375 | 87.19 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true |
| 20000 | 1 | 1 | 180 | 230327 | 86.83 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 180 | 246147 | 81.25 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 180 | 236740 | 84.48 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 360 | 246550 | 81.19 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 360 | 222361 | 89.94 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 720 | 228052 | 87.69 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 720 | 225333 | 88.75 | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 1440 | | | BATCH_SIZE=10 | THROTTLE=180, BATCH_SIZE=10, REPLAY=1, IGNORE_ORGANIZATION=true; overload (429) |

Scaling out allows more parallel requests to wait for Abacus to finish.

CPU usage:

- collector: 10.2-10.6%
- meter: 25.6-27.5%
- accumulator: 49.6-52%
- aggregator: 66.5-84.5%

Response times for a single linux-container document/request on the 3/3/6/6 configuration:

- collector:
  - received: 550 bytes
  - sent: 0 bytes
  - time: 2.77777704 s
- meter:
  - received: 2279 bytes
  - sent: 471 bytes
  - time: 1.635493285 s
- accumulator:
  - received: 2464 bytes
  - sent: 480 bytes
  - time: 1.148580083 s
- aggregator:
  - received: 3040 bytes
  - sent: 398 bytes
  - time: 0.537826801 s

For 3/3/6/6 without eval:

- collector:
  - received: 478 bytes
  - sent: 0 bytes
  - time: 2.456934781 s
- meter:
  - received: 2297 bytes
  - sent: 478 bytes
  - time: 1.532333325 s
- accumulator:
  - received: 2534 bytes
  - sent: 487 bytes
  - time: 0.955769924 s
- aggregator:
  - received: 3234 bytes
  - sent: 396 bytes
  - time: 0.528227499 s

The received body sizes relate as 478 : 2297 : 2534 : 3234, i.e. each stage receives roughly 4.8, 1.1 and 1.27 times the payload of the previous one.
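
The stated factors are stage-to-stage, not relative to the collector; a quick check using the byte counts above:

```js
// Consecutive-stage body-size ratios from the 3/3/6/6 measurements.
const sizes = [478, 2297, 2534, 3234]; // collector, meter, accumulator, aggregator
sizes.slice(1).forEach((s, i) =>
  console.log((s / sizes[i]).toFixed(2))); // 4.81, 1.10, 1.28
```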

For one submitted doc (object-store) there are 18 calls to the aggregation function:

```
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 1 return value= 1
agrs =  0 0 100 return value= 100
agrs =  0 0 100 return value= 100
agrs =  0 0 100 return value= 100
agrs =  0 0 100 return value= 100
agrs =  0 0 100 return value= 100
agrs =  0 0 100 return value= 100
```

Caching the result of the computation gives a 3-second improvement (16 s vs. 19 s) for 200 documents.
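
Since the log shows the same arguments evaluated over and over, the result can be memoized. A minimal sketch of such a function cache (assuming the aggregation function is pure; illustrative only, not the actual Abacus implementation):

```js
// Cache results of a pure function keyed by its arguments.
const memoize = (fn) => {
  const cache = new Map();
  return (...args) => {
    const key = JSON.stringify(args);
    if (!cache.has(key)) cache.set(key, fn(...args));
    return cache.get(key);
  };
};

// Stand-in for the aggregation function traced in the log above.
const aggregateFn = memoize((aggregated, previous, current) => aggregated + current);

console.log(aggregateFn(0, 0, 100)); // 100, computed
console.log(aggregateFn(0, 0, 100)); // 100, served from the cache
```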

### 10 collector x 15 meter x 20 accumulator x 40 aggregator x 10 reporting apps x 10 provisioning x 10 account instances

| orgs | instances | docs | limit | time [ms] | doc/s | Client remark | Server remark |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 500 | 4 | 10 | 720 | 93615 | 213.64 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 720 | 111433 | 179.48 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 1440 | | | BATCH_SIZE=10 | overload: 429 |
| 500 | 4 | 10 | 1440 | 60270 | 330.84 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 1440 | 63964 | 312.67 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 1440 | 67759 | 295.16 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 1620 | 69500 | 287.76 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 1620 | 59768 | 334.62 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 1800 | 69687 | 286.99 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 1800 | 62333 | 320.85 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 1980 | | | | overload: 429 |
| 500 | 4 | 10 | 900 | | | BATCH_SIZE=10 | overload: 429 |
| 500 | 4 | 10 | 900 | | | BATCH_SIZE=10 | overload: 429 |
| 500 | 4 | 10 | 900 | 60252 | 331.93 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 900 | 69327 | 288.48 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 900 | 61787 | 323.69 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 500 | 4 | 10 | 900 | 137953 | 144.97 | BATCH_SIZE=10 | BATCH_SIZE=10 |
| 500 | 4 | 10 | 900 | 120780 | 165.59 | BATCH_SIZE=10 | BATCH_SIZE=10 |
| 20000 | 1 | 1 | 900 | 74439 | 268.67 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 20000 | 1 | 1 | 900 | 74661 | 267.87 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 20000 | 1 | 1 | 900 | 61930 | 322.94 | BATCH_SIZE=10 | BATCH_SIZE=10; MAX_INFLIGHT=400,300,200,100; OPTIMIZE_MEMORY=true; THROTTLE=100 |
| 20000 | 1 | 1 | 900 | 106691 | 187.45 | BATCH_SIZE=10 | BATCH_SIZE=10 |
| 20000 | 1 | 1 | 900 | 87355 | 228.95 | BATCH_SIZE=10 | BATCH_SIZE=10, EVAL_VMTYPE=vm |
| 20000 | 1 | 1 | 900 | 103616 | 193.02 | BATCH_SIZE=10 | BATCH_SIZE=10, EVAL_VMTYPE=vm |
| 20000 | 1 | 1 | 900 | 79066 | 252.95 | BATCH_SIZE=10 | BATCH_SIZE=10, DBOPTS={"poolSize": 2} |
| 20000 | 1 | 1 | 900 | 75954 | 263.31 | BATCH_SIZE=10 | BATCH_SIZE=10, DBOPTS={"poolSize": 2} |
| 500 | 4 | 10 | 900 | 90990 | 219.80 | BATCH_SIZE=10 | BATCH_SIZE=10, DBOPTS={"poolSize": 2} |
| 500 | 4 | 10 | 900 | 83552 | 239.37 | BATCH_SIZE=10 | BATCH_SIZE=10, DBOPTS={"poolSize": 2} |
| 500 | 4 | 10 | 900 | 97367 | 205.40 | BATCH_SIZE=10 | BATCH_SIZE=10, DBOPTS={"poolSize": 5} |
| 500 | 4 | 10 | 900 | 95553 | 209.30 | BATCH_SIZE=10 | BATCH_SIZE=10, DBOPTS={"poolSize": 5} |
| 500 | 4 | 10 | 900 | 76387 | 261.82 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory |
| 500 | 4 | 10 | 900 | 105158 | 190.19 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory |
| 500 | 4 | 10 | 900 | 106043 | 188.60 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory |
| 500 | 4 | 10 | 900 | 80701 | 247.82 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, DBOPTS={"poolSize": 5} |
| 500 | 4 | 10 | 1440 | 80701 | 232.19 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory |
| 500 | 4 | 10 | 1440 | 105597 | 189.39 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory |
| 500 | 4 | 10 | 1440 | 104675 | 191.06 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory |
| 500 | 4 | 10 | 1440 | 57675 | 346.77 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400,300,200,100; 2GB memory, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 1440 | 54891 | 364.35 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400,300,200,100; 2GB memory, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 2880 | | | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, IGNORE_ORGANIZATION=true; overload: 429 |
| 500 | 4 | 10 | 2880 | | | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, IGNORE_ORGANIZATION=true; overload: 429 |
| 500 | 4 | 10 | 2000 | 66993 | 298.53 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 2000 | 58275 | 343.20 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 2500 | 56811 | 353.04 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 2500 | 57828 | 345.85 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, IGNORE_ORGANIZATION=true |
| 500 | 4 | 10 | 2500 | 55186 | 362.41 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory |
| 500 | 4 | 10 | 2500 | 62319 | 320.92 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory |
| 500 | 4 | 10 | 2500 | 48892 | 409.06 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator cache |
| 500 | 4 | 10 | 2500 | | | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator cache; overload: 429 |
| 500 | 4 | 10 | 2500 | | | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator cache; overload: 429 |
| 500 | 4 | 10 | 2500 | 52167 | 383.38 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator cache |
| 500 | 4 | 10 | 2000 | 50229 | 398.17 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator cache |
| 500 | 4 | 10 | 2000 | 49978 | 400.17 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator cache |
| 500 | 4 | 10 | 2200 | 49896 | 400.83 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator cache |
| 500 | 4 | 10 | 2200 | 52465 | 381.20 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator cache |
| 500 | 4 | 10 | 2500 | | | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator & accumulator cache; overload (429) |
| 500 | 4 | 10 | 2200 | 48566 | 411.81 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator & accumulator cache |
| 500 | 4 | 10 | 2200 | 49957 | 400.34 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator & accumulator cache |
| 500 | 4 | 10 | 2200 | 49622 | 403.04 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator & accumulator cache |
| 500 | 4 | 10 | 2200 | 47464 | 421.37 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator, accumulator, meter cache |
| 500 | 4 | 10 | 2200 | 47874 | 417.76 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator, accumulator, meter cache |
| 500 | 4 | 10 | 2500 | 49985 | 400.12 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator, accumulator, meter cache |
| 500 | 4 | 10 | 2500 | 47261 | 423.18 | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator, accumulator, meter cache |
| 500 | 4 | 10 | 2500 | | | BATCH_SIZE=10 | BATCH_SIZE=10, MAX_INFLIGHT=400, 2GB memory, aggregator, accumulator, meter cache; overload (429) |

The aggregator is not a bottleneck.

### 40 collector x 20 meter x 20 accumulator x 40 aggregator x 10 reporting apps x 10 provisioning x 10 account instances

| orgs | instances | docs | limit | time [ms] | doc/s | Client remark | Server remark |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 500 | 4 | 10 | 1440 | 96930 | 206.33 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 1440 | 78036 | 256.29 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 1440 | 100476 | 199.05 | BATCH_SIZE=10 | |
| 500 | 4 | 10 | 1440 | 73662 | 271.51 | BATCH_SIZE=10 | no compression |
| 500 | 4 | 10 | 1440 | 67840 | 294.81 | BATCH_SIZE=10 | no compression |
| 500 | 4 | 10 | 1440 | 77100 | 259.40 | BATCH_SIZE=10 | no compression |
| 20000 | 1 | 1 | 1440 | 79866 | 250.42 | BATCH_SIZE=10 | no compression |
| 20000 | 1 | 1 | 1440 | 79827 | 250.54 | BATCH_SIZE=10 | no compression |
| 500 | 4 | 10 | 1960 | 76405 | 261.76 | BATCH_SIZE=10 | no compression |
| 500 | 4 | 10 | 1960 | | | BATCH_SIZE=10 | no compression; overloaded 429 |
| 500 | 4 | 10 | 1960 | | | BATCH_SIZE=10 | no compression; overloaded 429 |

Mongo statistics:

- 16 CPUs loaded at most 38%
- 18% disk utilization
- write locks: average wait time 370 ms
- read locks: average wait time 171 ms

Compression reduces docs/sec by 25%.

### 20 collector x 40 meter x 40 accumulator x 40 aggregator x 10 reporting apps x 10 provisioning x 10 account instances

| orgs | instances | docs | limit | time [ms] | doc/s | Client remark | Server remark |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 500 | 4 | 10 | 1440 | 76293 | 262.14 | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1 |
| 500 | 4 | 10 | 1440 | 68662 | 291.28 | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1 |
| 500 | 4 | 10 | 1440 | 71233 | 280.76 | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1 |
| 500 | 4 | 10 | 2880 | | | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1; overload 429 |
| 500 | 4 | 10 | 2880 | | | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1; overload 429 |
| 500 | 4 | 10 | 2000 | | | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1; overload 429 |
| 500 | 4 | 10 | 2000 | | | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1; overload 429 |
| 500 | 4 | 10 | 1620 | 97010 | 206.16 | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1 |
| 500 | 4 | 10 | 1620 | 97010 | 206.16 | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1 |
| 500 | 4 | 10 | 1620 | 97010 | 206.16 | BATCH_SIZE=10 | BATCH_SIZE=10; REPLAY=1 |

db-common network traffic:

- RX = 2.18 MiB/s
- TX = 3.78 MiB/s

db-aggregator-0 traffic:

- RX = 1.71 MiB/s
- TX = 1.43 MiB/s

The measured bandwidth was 35 MiB/s, using:

```sh
wget --output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip
```
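
Comparing the DB traffic with the measured bandwidth confirms the network is far from saturated (numbers from above):

```js
// db-common RX + TX vs. the measured 35 MiB/s link.
const dbCommon = 2.18 + 3.78; // MiB/s
console.log(((dbCommon / 35) * 100).toFixed(0) + '% of bandwidth'); // ~17%
```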
