transaction logic improvement #12692
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master   #12692      +/-   ##
==========================================
+ Coverage   63.18%   64.27%   +1.08%
==========================================
  Files         449      409      -40
  Lines       62349    32833   -29516
==========================================
- Hits        39397    21104   -18293
+ Misses      18381     9593    -8788
+ Partials     4571     2136    -2435
```
Thank you Wilson, the numbers that you posted on etcd-dev@ look impressive. I'm not familiar with this code, but I scanned the logic. I also wonder whether this impact overlaps with #12529.
The original post is here:

We are working on a k8s cluster scaling project. The apiserver uses etcd as its backend database, and the apiserver storage Create/Update/Delete operations are based on the Txn operation. However, comparing TXN-PUT against PUT with the etcd benchmark tool, we see that the performance penalty for TXN-PUT is very significant.

Our machine's configuration:

- S09-I6VD1-US
- Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
- 256GB
- NVMe SSD-2T * 2
- Mellanox CX-5

With the PUT benchmark, QPS can reach as much as 30K. With the TXN-PUT benchmark, however, performance peaks at only around 16K. Here is a graph of the details. The X-axis is the number of connections and clients (they use the same value); the Y-axis is the size of the value of the KV pair (the key is always 256 bytes).

I added a new bool flag to the etcd PUT operation, called "IsCreate". When it is true, the put operation checks whether the key already exists; if it does, the applier fails the operation and returns an error. With this change, I see a significant performance improvement in Create operations. Here is another plot demonstrating the improvement percentage after switching from TXN-PUT to PUT for a Create operation: as much as 160%.

With some further investigation, we found the overhead is in txn.End() in the applier's Txn() function. With the latest pull request, the Txn() write operation speed increases by as much as around 160% too.
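The create-if-absent pattern behind these numbers can be sketched against a toy in-memory store. This is an illustrative Go sketch, not etcd's applier code; `store`, `createViaTxn`, and `createViaPut` are hypothetical names, and the real cost difference (the extra read transaction and `txn.End()` in the apply path) is not modeled by the toy, only the API shapes:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKeyExists mirrors the failure an "IsCreate" put would return.
var ErrKeyExists = errors.New("key already exists")

// store is a toy stand-in for the etcd key space.
type store struct{ kv map[string]string }

// createViaTxn models the two-phase pattern the apiserver uses today:
// a compare (does the key exist?) followed by a put in the success
// branch, which in etcd costs a read transaction plus a write
// transaction.
func (s *store) createViaTxn(k, v string) error {
	if _, ok := s.kv[k]; ok { // compare phase
		return ErrKeyExists
	}
	s.kv[k] = v // success branch
	return nil
}

// createViaPut models the proposed PUT with an IsCreate flag: the
// existence check happens inside the single write path.
func (s *store) createViaPut(k, v string, isCreate bool) error {
	if isCreate {
		if _, ok := s.kv[k]; ok {
			return ErrKeyExists
		}
	}
	s.kv[k] = v
	return nil
}

func main() {
	s := &store{kv: map[string]string{}}
	fmt.Println(s.createViaPut("/pods/a", "v1", true)) // <nil>
	fmt.Println(s.createViaPut("/pods/a", "v2", true)) // key already exists
}
```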
Hi Piotr, very good point. I think #12529 could be the root cause of the extra overhead in Txn(), but I haven't evaluated this yet. At the same time, this again makes me wonder whether adding flags or arguments to the gRPC calls for very common db operations such as CREATE/UPDATE/DELETE could help users improve their application performance. When we use Txn(), we don't know in advance how many if-then-else branches there are, and we cannot assume the user is using it the way we expect. Wilson
@ptabor I am wondering whether it would be better to have another message, called TxnRequestInternal, which holds some preprocessed values, so that depending on those values we can run an optimized path in Txn(). Wilson
What's the motivation to have another message? Looking at the code: if the reason is readability, you can implement additional methods on the existing type.
The reason I am suggesting a new type of message is that I want to do Txn preprocessing, and the current TxnRequest implementation has no extra field to save the preprocessed information. If we only process the Txn information when it arrives at applierV3, it might be too late to do TxnRequest optimizations. Thanks, Wilson
Do you have an actual scenario where it's too late? Is there data missing to make the decision?
Good point. This assumption is based on my understanding that requests arrive asynchronously while applierv3 applies Txns synchronously. I assume that doing some preprocessing inside the Txn RPC request context would save more execution time than doing it inside applierv3. What do you think? Thanks, Wilson
Force-pushed from 69f2540 to 9e6ac4b.
Hi Piotr, I made some changes to the code; I am collecting new performance results and will share them with you later. What I did was check the number of operations in the Txn: if there is only one compare and one real operation, we use a write txn directly instead of starting with a read txn and later switching to a write txn. Wilson
Force-pushed from d7ff8d0 to 49ffd6d.
Thank you. It's a good point that the more we check/precompute before raft, the higher performance we can get during apply, which is critical. Ideally, internal requests should be different from the public ones, so we could easily evolve the system without fear of backward-compatibility / cross-version-compatibility risks. But we are not there yet. In practice, I think in-memory access to inspect proto content should take negligible time in contrast to backend writes.
With the latest change, I added some logic so that if the transaction has at most one operation in each of success, failure, and compare, we skip the read transaction and directly use a write transaction. I evaluated the performance and we still see an improvement. I may later do some mixed read/write testing.
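The "at most one operation per branch" condition can be sketched as a small predicate. The types and the `isSimpleTxn` name below are simplified stand-ins for the actual `*pb.TxnRequest` check in the patch, not its code:

```go
package main

import "fmt"

// op is a stand-in for pb.RequestOp.
type op struct{}

// txnRequest is a stand-in for pb.TxnRequest: a list of compares and
// the two branches of operations.
type txnRequest struct {
	Compare          []struct{}
	Success, Failure []op
}

// isSimpleTxn mirrors the condition described above: at most one
// compare and at most one operation per branch. For such requests the
// applier can open a single write transaction up front instead of
// starting with a read transaction, saving an extra txn.End() on the
// hot path.
func isSimpleTxn(r *txnRequest) bool {
	return len(r.Compare) <= 1 && len(r.Success) <= 1 && len(r.Failure) <= 1
}

func main() {
	// A create-style txn: one compare, one put in the success branch.
	createLike := &txnRequest{
		Compare: make([]struct{}, 1),
		Success: []op{{}},
	}
	fmt.Println(isSimpleTxn(createLike)) // true
}
```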
I did some further evaluation. I modified txn-put so that it can run mixed transaction reads and writes with a predefined ratio, and plotted the results with the ratio ranging from 0.125 (mostly write) to 8 (mostly read). The red-and-blue plot is the difference between my patch and the master branch. We get most of the performance improvement in the mostly-write case; in the mostly-read cases we didn't see much improvement. (This is what we expected, and thankfully my patch did not affect read transactions much.)
@ptabor Piotr, can you take a look at the latest evaluation and give some suggestions? Thanks, Wilson
Thank you, Wilson. Forgive me a question: what's the unit of the chart values?
I was using python tripcolor to plot the graph. Here is my code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Master-branch results: average QPS over the 5 runs in columns 3..7.
df = pd.read_csv('mixed-master.csv')
df['avg'] = df.iloc[:, 3:].sum(axis=1) / 5.0
bigdf = df.iloc[:, [0, 1, 2, 8]]  # ratio, connections, value size, avg

plt.figure(figsize=(16, 20))
count = 1
for val, g in bigdf.groupby('ratio'):
    plt.subplot(3, 2, count)
    count += 1
    plt.tripcolor(g.iloc[:, 1], g.iloc[:, 2], g.iloc[:, 3])
    plt.title('Master Branch Mixed-Txn Ratio {} Performance'.format(val))
    plt.yscale('log', base=2)
    plt.ylabel('Value Size')
    plt.xscale('log', base=2)
    plt.xlabel('Connections Amount')
    plt.colorbar()
plt.show()

# Patched-branch results, same layout.
df2 = pd.read_csv('mixed-txn-optimized.csv')
df2['avg'] = df2.iloc[:, 3:].sum(axis=1) / 5.0
bigdf2 = df2.iloc[:, [0, 1, 2, 8]]

plt.figure(figsize=(16, 20))
count = 1
for val, g in bigdf2.groupby('ratio'):
    plt.subplot(3, 2, count)
    count += 1
    # Plot the per-ratio group, not the whole frame (the original
    # snippet accidentally passed df2 here).
    plt.tripcolor(g.iloc[:, 1], g.iloc[:, 2], g.iloc[:, 3])
    plt.title('Optimized Mixed-Txn Ratio {} Performance'.format(val))
    plt.yscale('log', base=2)
    plt.ylabel('Value Size')
    plt.xscale('log', base=2)
    plt.xlabel('Connections Amount')
    plt.colorbar()
plt.show()

# Difference plot: optimized minus master, row by row.
bigdf2['minus'] = bigdf2.iloc[:, 3] - bigdf.iloc[:, 3]

plt.figure(figsize=(16, 20))
count = 1
for val, g in bigdf2.groupby('ratio'):
    plt.subplot(3, 2, count)
    count += 1
    plt.tripcolor(g.iloc[:, 1], g.iloc[:, 2], g.iloc[:, 4],
                  cmap=plt.get_cmap('seismic'))
    plt.title('(Optimized Txn - Master) Mixed-Txn Ratio {} Performance'.format(val))
    plt.yscale('log', base=2)
    plt.ylabel('Value Size')
    plt.xscale('log', base=2)
    plt.xlabel('Connections Amount')
    plt.colorbar()
plt.show()
```

And here are the result files.
Force-pushed from 49ffd6d to da5f3f3.
server/etcdserver/v3_server.go (outdated)

```diff
@@ -203,18 +204,21 @@ func isTxnSerializable(r *pb.TxnRequest) bool {
 	return true
 }

-func isTxnReadonly(r *pb.TxnRequest) bool {
+func isTxnReadonlyAndSimple(r *pb.TxnRequest) (bool, bool) {
```
Thank you for the rename. Still, please add a comment explaining what 'simple' means.
I changed this function back and moved the logic out.
Got it. Thank you
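For context, the original `isTxnReadonly` check classifies a txn as read-only only when every operation in both branches is a Range. A self-contained sketch with stand-in types (the real function operates on `*pb.TxnRequest` in server/etcdserver/v3_server.go; this mirrors its shape, not its exact code):

```go
package main

import "fmt"

// requestOp is a stand-in for pb.RequestOp: exactly one of the fields
// is set, mirroring the protobuf oneof.
type requestOp struct {
	rangeReq  *struct{} // RequestRange
	putReq    *struct{} // RequestPut
	deleteReq *struct{} // RequestDeleteRange
}

type txnRequest struct {
	Success, Failure []requestOp
}

// isTxnReadonly reports whether every operation in both branches is a
// Range read; only then can the applier serve the whole txn from a
// read transaction.
func isTxnReadonly(r *txnRequest) bool {
	for _, branch := range [][]requestOp{r.Success, r.Failure} {
		for _, u := range branch {
			if u.rangeReq == nil { // a put or delete makes it a write txn
				return false
			}
		}
	}
	return true
}

func main() {
	ro := &txnRequest{Success: []requestOp{{rangeReq: &struct{}{}}}}
	fmt.Println(isTxnReadonly(ro)) // true
}
```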
…overhead. Signed-off-by: Wilson Wang <[email protected]>
Force-pushed from b9a28b5 to 87189b8.
This is the result of …

Regarding the 2nd question, I will post updates when new data is available.
Regarding the second question: I did the evaluation and plotted the results. Yes, we can see performance degradation in certain cases; the worst is around 90% of the master branch's performance. By the way, I forgot to mention that my mixed read/write test actually uses a range request that returns a large range response for its read operations, so this is the case where we should expect the worst performance.
Thank you for checking. No clear win, unfortunately. But from what I've seen, K8s clusters have a Range/Txn ratio of ~4. That probably means we would need to gate the new behavior behind a flag.
Hi Piotr, what do you think about using ReadTx() instead of ConcurrentReadTx() in the applier's Txn()? That way we would not deep-copy the buffer, and at the same time we would not be taking a write lock while doing the read operations in the applier's Txn().
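The tradeoff behind this question can be sketched with a toy backend: a concurrent read txn copies the buffer up front so that subsequent reads don't hold the shared lock, while a plain read txn avoids the copy but holds the lock for the duration of the read, contending with writers. This is an illustrative sketch, not etcd's actual `backend` package:

```go
package main

import (
	"fmt"
	"sync"
)

// backend sketches the tradeoff; txBuffer plays the role of the
// not-yet-committed write buffer in etcd's backend.
type backend struct {
	mu       sync.RWMutex
	txBuffer map[string]string
}

// readTx reads the shared buffer under the read lock: no copy, but the
// lock is held for the whole read, so a writer needing the write lock
// must wait.
func (b *backend) readTx(key string) string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	return b.txBuffer[key]
}

// concurrentReadTx deep-copies the buffer while briefly holding the
// read lock, then serves reads from the private copy; writers are
// blocked only during the copy, at the cost of the allocation.
func (b *backend) concurrentReadTx() map[string]string {
	b.mu.RLock()
	defer b.mu.RUnlock()
	cp := make(map[string]string, len(b.txBuffer))
	for k, v := range b.txBuffer {
		cp[k] = v
	}
	return cp
}

func main() {
	b := &backend{txBuffer: map[string]string{"a": "1"}}
	fmt.Println(b.readTx("a"), b.concurrentReadTx()["a"]) // 1 1
}
```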
server: change Txn() to use one transaction to avoid txn.End() extra overhead.
Based on our observations, the txn operation takes a lot of CPU time compared with a normal put operation. With some profiling, we found it is because of the txn.End() operation. To avoid the extra overhead, we should use one transaction inside Txn(). With this change, the benchmark txn-put operation performance gets a significant improvement.