
transaction logic improvement #12692

Closed · wants to merge 1 commit into master from profiling-txn

Conversation

@wilsonwang371 (Contributor)

server: change Txn() to use one transaction to avoid txn.End() extra overhead.

Based on our observations, the txn operation takes much more CPU time than a normal put operation. With some profiling, we found this is because of the txn.End() operation. To avoid the extra overhead, we should use one transaction inside Txn(). With this change, the txn-put benchmark shows a significant performance improvement.
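To illustrate the idea, here is a minimal self-contained sketch (hypothetical types, not the actual etcd applier code): the old path ends a read transaction and then opens a second write transaction to apply the ops, paying the End() cost twice, while the optimized path evaluates the compare and applies the op inside one write transaction.

package main

import (
	"fmt"
	"sync"
)

// backend is a toy stand-in for the storage backend; ends counts
// transaction End() calls, the overhead being discussed.
type backend struct {
	mu   sync.RWMutex
	data map[string]string
	ends int
}

func (b *backend) readTxn(f func(get func(string) string)) {
	b.mu.RLock()
	f(func(k string) string { return b.data[k] })
	b.mu.RUnlock()
	b.ends++ // End() of the read txn
}

func (b *backend) writeTxn(f func(get func(string) string, put func(string, string))) {
	b.mu.Lock()
	f(func(k string) string { return b.data[k] },
		func(k, v string) { b.data[k] = v })
	b.mu.Unlock()
	b.ends++ // End() of the write txn
}

// applyTxnOld models the old path: a read txn for the compare,
// then a separate write txn for the op -- two End() calls.
func applyTxnOld(b *backend, key, expect, newVal string) bool {
	ok := false
	b.readTxn(func(get func(string) string) { ok = get(key) == expect })
	if ok {
		b.writeTxn(func(_ func(string) string, put func(string, string)) { put(key, newVal) })
	}
	return ok
}

// applyTxnNew models the optimized path: one write txn evaluates
// the compare and applies the op -- a single End() call.
func applyTxnNew(b *backend, key, expect, newVal string) bool {
	ok := false
	b.writeTxn(func(get func(string) string, put func(string, string)) {
		if get(key) == expect {
			ok = true
			put(key, newVal)
		}
	})
	return ok
}

func main() {
	b := &backend{data: map[string]string{"k": ""}}
	applyTxnOld(b, "k", "", "v1")   // 2 End() calls
	applyTxnNew(b, "k", "v1", "v2") // 1 End() call
	fmt.Println(b.ends, b.data["k"])
}

Both paths produce the same final state; the optimized one simply halves the number of transaction boundaries on the write path.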

@codecov-io

Codecov Report

Merging #12692 (e177b82) into master (09f6b7f) will increase coverage by 1.08%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #12692      +/-   ##
==========================================
+ Coverage   63.18%   64.27%   +1.08%     
==========================================
  Files         449      409      -40     
  Lines       62349    32833   -29516     
==========================================
- Hits        39397    21104   -18293     
+ Misses      18381     9593    -8788     
+ Partials     4571     2136    -2435     
Flag | Coverage Δ
all  | 64.27% <ø> (?)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
client/v2/auth_role.go 0.00% <ø> (ø)
client/v2/auth_user.go 0.00% <ø> (ø)
client/v2/cancelreq.go 100.00% <ø> (ø)
client/v2/client.go 43.54% <ø> (ø)
client/v2/cluster_error.go 50.00% <ø> (ø)
client/v2/curl.go 10.00% <ø> (ø)
client/v2/discover.go 0.00% <ø> (ø)
client/v2/json.go 32.00% <ø> (ø)
client/v2/keys.go 64.77% <ø> (ø)
client/v2/members.go 43.39% <ø> (ø)
... and 716 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a4570a6...e177b82.

@ptabor (Contributor)

ptabor commented Feb 17, 2021

Thank you Wilson, the numbers that you posted on etcd-dev@ look impressive.
It would be good to post them here as well.

I'm not familiar with this code, but from scanning the logic, I understand that
we change a pair of RO->RW transactions into just an RW transaction if the transaction potentially mutates any resource.
The tradeoff is that an RW transaction takes an exclusive lock, so it blocks all concurrent RO transactions.
So please read: #10523
I'm not saying it's not worthwhile, but we should also consider the performance impact on read-intensive scenarios.

I also wonder whether this impact overlaps with #12529.

@ptabor requested a review from jingyih — February 17, 2021 08:13
@wilsonwang371 (Contributor, Author)

The original post is here:
https://groups.google.com/g/etcd-dev/c/qiQ1V_iMPKo/m/7OfHsr68BAAJ

We are working on a k8s cluster scaling project. The apiserver uses etcd as its backend database, and the apiserver storage Create/Update/Delete operations are based on the Txn operation.

However, comparing TXN-PUT against PUT with the etcd benchmark tool, we see that the performance penalty for TXN-PUT is very significant.

Our machine's configuration (S09-I6VD1-US):

  • Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz, 2 sockets, 64 vCPUs
  • 256GB RAM
  • 2 x 2TB NVMe SSD
  • Mellanox CX-5 NIC

With the PUT benchmark, QPS can reach as much as 30K. With the TXN-PUT benchmark, however, performance peaks at only around 16K.

Here is a graph of the details. The X-axis is the number of connections and clients (they use the same value); the Y-axis is the value size of the KV pair (the key is always 256 bytes).

[chart: PUT vs TXN-PUT throughput across connections and value sizes]

I added a new bool flag to the etcd PUT operation, called "IsCreate". When it is TRUE, the put operation checks whether the key already exists; if it does, the applier fails the operation and returns an error.

With this change, I can see a significant performance improvement in Create operations. Here is another plot that shows the percentage improvement after switching a Create operation from TXN-PUT to PUT: we can get an improvement of as much as 160%.

[chart: percentage improvement after switching Create from TXN-PUT to PUT]

With some further investigation, we found the overhead is in txn.End() in the applier's Txn() function. With the latest pull request, the Txn() write operation speed also increases by as much as around 160%.

[chart: Txn() write performance with this pull request]

@wilsonwang371 (Contributor, Author)

> Thank you Wilson, the numbers that you posted on etcd-dev@ look impressive.
> It would be good to post them here as well.
>
> I'm not familiar with this code, but from scanning the logic, I understand that
> we change a pair of RO->RW transactions into just an RW transaction if the transaction potentially mutates any resource.
> The tradeoff is that an RW transaction takes an exclusive lock, so it blocks all concurrent RO transactions.
> So please read: #10523
> I'm not saying it's not worthwhile, but we should also consider the performance impact on read-intensive scenarios.
>
> I also wonder whether this impact overlaps with #12529.

Hi Piotr,

Very good point. I think #12529 could be the root cause of the extra overhead in Txn(), but I haven't evaluated that yet.

At the same time, this again makes me wonder whether adding flags or arguments to the gRPC calls for some very common db operations, such as CREATE/UPDATE/DELETE, could help users improve their application performance.

When we are using Txn(), we don't know in advance how many if-then-else branches there are. We cannot assume the user is using it in the way we expected.

Wilson

@wilsonwang371 (Contributor, Author)

@ptabor
Hi Piotr,

I am wondering whether it would be better to have another message, TxnRequestInternal, that holds some preprocessed values, so that depending on those values we can run an optimized path for Txn().

Wilson

@ptabor (Contributor)

ptabor commented Apr 6, 2021

What's the motivation to have another message?

  • readability
  • avoiding computing the same intrinsic multiple times

Looking at the applierV3 interface, there is no prior art of using a different type than the 'public' ones, which makes the bar for precedent relatively high.

If the reason is readability, you can implement additional methods in the api/etcdserverpb package directly, such that txn.isReadonly() has a natural look & feel.

@wilsonwang371 (Contributor, Author)

> What's the motivation to have another message?
>
>   • readability
>   • avoiding computing the same intrinsic multiple times
>
> Looking at the applierV3 interface, there is no prior art of using a different type than the 'public' ones, which makes the bar for precedent relatively high.
>
> If the reason is readability, you can implement additional methods in the api/etcdserverpb package directly, such that txn.isReadonly() has a natural look & feel.

The reason I am suggesting a new type of message is that I want to do Txn preprocessing, and in the current TxnRequest implementation there is no extra field in which to save the preprocessed information. If we process the Txn information only when it arrives at applierV3, it might be too late to do TxnRequest optimizations.

Thanks

Wilson

@ptabor (Contributor)

ptabor commented Apr 7, 2021

> If we process the Txn information when it arrives at applierV3, it might be too late to do TxnRequest optimizations.

Do you have an actual scenario where it's too late? Is there data missing to make the decision?

@wilsonwang371 (Contributor, Author)

> If we process the Txn information when it arrives at applierV3, it might be too late to do TxnRequest optimizations.
>
> Do you have an actual scenario where it's too late? Is there data missing to make the decision?

Good point. This assumption is based on my understanding that the requests are handled asynchronously while applierV3 applies Txn synchronously. I assume some preprocessing inside the Txn RPC request context can save more execution time than doing it inside applierV3.

What do you think?

Thanks

Wilson

@wilsonwang371 (Contributor, Author)

> If we process the Txn information when it arrives at applierV3, it might be too late to do TxnRequest optimizations.
>
> Do you have an actual scenario where it's too late? Is there data missing to make the decision?

Hi Piotr,

I made some changes to the code, and I am going to gather new performance results and share them with you later.

What I did was check the number of operations in the Txn: if there is only one compare and one real operation, we use a write txn directly instead of starting with a read txn and later switching to a write txn.

Wilson
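The heuristic described above could look roughly like this. This is a sketch with simplified, illustrative types — not etcd's actual pb.TxnRequest or applier code — and the function name isSimpleTxn is hypothetical:

package main

import "fmt"

// Simplified stand-ins for the protobuf request types.
type Compare struct{ Key string }
type RequestOp struct{ Put bool }

type TxnRequest struct {
	Compare []Compare
	Success []RequestOp
	Failure []RequestOp
}

// isSimpleTxn reports whether the txn is small enough that starting
// directly with a write transaction is cheaper than opening a read txn
// and later switching to a write txn.
func isSimpleTxn(r *TxnRequest) bool {
	return len(r.Compare) <= 1 && len(r.Success) <= 1 && len(r.Failure) <= 1
}

func main() {
	// One compare, one op in the success branch: qualifies for the fast path.
	simple := &TxnRequest{
		Compare: []Compare{{Key: "k"}},
		Success: []RequestOp{{Put: true}},
	}
	// Three ops in one branch: does not qualify.
	big := &TxnRequest{Success: make([]RequestOp, 3)}
	fmt.Println(isSimpleTxn(simple), isSimpleTxn(big))
}

A check like this can run before the request reaches the applier, which matches the earlier suggestion of precomputing as much as possible before the apply step.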

@wilsonwang371 force-pushed the profiling-txn branch 2 times, most recently from d7ff8d0 to 49ffd6d — April 13, 2021 04:16
@ptabor (Contributor)

ptabor commented Apr 13, 2021

Thank you. It's a good point that the more we check/precompute before RAFT, the higher the performance we can get during apply, which is critical. Ideally, internal requests should be different from the public ones, to be able to easily evolve the system without fear of backward-compatibility / cross-version-compatibility risks. But we are not (yet) there.

In practice: I think in-memory access to inspect proto content should take negligible time in contrast to backend writes.

@wilsonwang371 (Contributor, Author)

With the latest change, I added some logic so that if the transaction has at most 1 operation in each of success, failure & compare, we skip the read transaction and directly use a write transaction.

I evaluated the performance, and we still see a performance improvement.

I may later do some mixed read/write testing.

[charts: benchmark results with the single-write-transaction fast path]

@wilsonwang371 (Contributor, Author)

I did some further evaluation. I modified txn-put so that it can run mixed transaction reads and transaction writes with a predefined ratio (ratio = read/write).

I plotted the results with the ratio ranging from 0.125 (mostly write) to 8 (mostly read).

The red & blue plots show the difference between my patch and the master branch. We get most of the performance improvement in the mostly-write case; in the mostly-read cases we didn't get much improvement. (This is what we expected, and thankfully my patch did not affect the read transactions much.)

[charts: mixed read/write benchmarks, patch vs master, ratio 0.125–8]

@wilsonwang371 (Contributor, Author)

@ptabor Piotr,

Can you take a look at the latest evaluation and give some suggestions?

Thanks

Wilson

@ptabor (Contributor)

ptabor commented Apr 19, 2021

Thank you, Wilson.

Forgive me a question: what is the unit of the chart values?
Also, I'm curious: what tool was used to generate them, and why are the values represented as triangles (instead of squares)?

@wilsonwang371 (Contributor, Author)

> Thank you, Wilson.
>
> Forgive me a question: what is the unit of the chart values?
> Also, I'm curious: what tool was used to generate them, and why are the values represented as triangles (instead of squares)?

I was using Python's matplotlib tripcolor to plot the graphs; tripcolor draws a triangulated pseudocolor plot, which is why the cells are triangles rather than squares.

Here is my code.

import pandas as pd
import matplotlib.pyplot as plt

# Each CSV row: ratio, connections, value size, then 5 trial measurements.
df = pd.read_csv('mixed-master.csv')
df['avg'] = df.iloc[:, 3:].sum(axis=1) / 5.0   # average of the 5 trials
bigdf = df.iloc[:, [0, 1, 2, 8]]               # ratio, connections, value size, avg

plt.figure(figsize=(16, 20))
for count, (val, g) in enumerate(bigdf.groupby('ratio'), start=1):
    plt.subplot(3, 2, count)
    plt.tripcolor(g.iloc[:, 1], g.iloc[:, 2], g.iloc[:, 3])
    plt.title('Master Branch Mixed-Txn Ratio {} Performance'.format(val))
    plt.yscale('log', base=2)
    plt.ylabel('Value Size')
    plt.xscale('log', base=2)
    plt.xlabel('Connections Amount')
    plt.colorbar()
plt.show()


df2 = pd.read_csv('mixed-txn-optimized.csv')
df2['avg'] = df2.iloc[:, 3:].sum(axis=1) / 5.0
# .copy() so that adding the 'minus' column below does not
# trigger a SettingWithCopyWarning
bigdf2 = df2.iloc[:, [0, 1, 2, 8]].copy()

plt.figure(figsize=(16, 20))
for count, (val, g) in enumerate(bigdf2.groupby('ratio'), start=1):
    plt.subplot(3, 2, count)
    # plot the per-ratio group, not the whole frame
    plt.tripcolor(g.iloc[:, 1], g.iloc[:, 2], g.iloc[:, 3])
    plt.title('Optimized Mixed-Txn Ratio {} Performance'.format(val))
    plt.yscale('log', base=2)
    plt.ylabel('Value Size')
    plt.xscale('log', base=2)
    plt.xlabel('Connections Amount')
    plt.colorbar()
plt.show()


# Row-wise difference between the optimized and master averages.
bigdf2['minus'] = bigdf2.iloc[:, 3] - bigdf.iloc[:, 3]

plt.figure(figsize=(16, 20))
for count, (val, g) in enumerate(bigdf2.groupby('ratio'), start=1):
    plt.subplot(3, 2, count)
    plt.tripcolor(g.iloc[:, 1], g.iloc[:, 2], g.iloc[:, 4],
                  cmap=plt.get_cmap('seismic'))
    plt.title('(Optimized Txn - Master) Mixed-Txn Ratio {} Performance'.format(val))
    plt.yscale('log', base=2)
    plt.ylabel('Value Size')
    plt.xscale('log', base=2)
    plt.xlabel('Connections Amount')
    plt.colorbar()
plt.show()

And here are the result files.

mixed-txn-optimized.csv
mixed-master.csv

@@ -203,18 +204,21 @@ func isTxnSerializable(r *pb.TxnRequest) bool {
return true
}

func isTxnReadonly(r *pb.TxnRequest) bool {
func isTxnReadonlyAndSimple(r *pb.TxnRequest) (bool, bool) {
@ptabor (Contributor)

Thank you for the rename. Still, please add a comment explaining what 'simple' means.

@wilsonwang371 (Contributor, Author)

I changed this function back and moved the logic out.

@ptabor (Contributor)

ptabor commented Apr 19, 2021

Got it. Thank you

  1. Could you please plot the relative improvement, optimized / master?
     bigdf2['minus'] = bigdf2.iloc[:, 3] / bigdf.iloc[:, 3]

  2. Final check: what happens for a read/write ratio of 100 (so strongly read-dominated servers)?

@wilsonwang371 force-pushed the profiling-txn branch 2 times, most recently from b9a28b5 to 87189b8 — April 19, 2021 19:35
@wilsonwang371 (Contributor, Author)

> Got it. Thank you
>
> 1. Could you please plot the relative improvement, optimized / master?
>    bigdf2['minus'] = bigdf2.iloc[:, 3] / bigdf.iloc[:, 3]
> 2. Final check: what happens for a read/write ratio of 100 (so strongly read-dominated servers)?

This is the result of

bigdf2['minus'] = bigdf2.iloc[:, 3] / bigdf.iloc[:, 3]

[chart: relative improvement, optimized / master]

Regarding the 2nd question, I will post updates when new data is available.

@wilsonwang371 (Contributor, Author)

wilsonwang371 commented Apr 20, 2021

> Got it. Thank you
>
> 1. Could you please plot the relative improvement, optimized / master?
>    bigdf2['minus'] = bigdf2.iloc[:, 3] / bigdf.iloc[:, 3]
> 2. Final check: what happens for a read/write ratio of 100 (so strongly read-dominated servers)?

Regarding the second question: I did the evaluation and plotted

bigdf2.iloc[:, 3] / bigdf.iloc[:, 3]

Yes, we can see performance degradation in certain cases; the worst is around 90% of the master branch performance.

By the way, I forgot to mention that my mixed read/write test actually uses a range request that returns a large range response for its read operations. This is the case where we would expect the worst performance.

mixed-master-100.csv
mixed-txn-optimized-100.csv

[chart: read/write ratio = 100, optimized / master]

@ptabor (Contributor)

ptabor commented Apr 23, 2021

Thank you for checking. No clear win, unfortunately. But from what I've seen, K8s clusters have a Range/Txn ratio of ~4, so for a k8s workload it is positive.

That probably means we would need to gate the new behavior behind a flag.
If we were to build a recommendation for when it's worth disabling the flag, would it be a read/write ratio > 50?

@wilsonwang371 (Contributor, Author)

> Thank you for checking. No clear win, unfortunately. But from what I've seen, K8s clusters have a Range/Txn ratio of ~4, so for a k8s workload it is positive.
>
> That probably means we would need to gate the new behavior behind a flag.
> If we were to build a recommendation for when it's worth disabling the flag, would it be a read/write ratio > 50?

Hi Piotr,

What do you think about using ReadTx() instead of ConcurrentReadTx() in the applier's Txn()?

That way, we would not deep-copy the buffer, and at the same time we would not be taking a write lock while doing read operations in the applier's Txn().
