Trillian batch operations with > 1k entries fail on MariaDB >= 10.3 #1845

Closed
mhutchinson opened this issue Sep 18, 2019 · 21 comments

@mhutchinson
Contributor

go test ./testonly/hammer/...
--- FAIL: TestInProcessMapHammer (9.77s)
hammer_test.go:129: hammer failure: failed to verify inclusion proof for Index=922cab9b7de0da6c6b65792d3030303030303234000000000000000000000000: calculated root: 3af78a4fc25a48d822fd41eed4d56b3afff53cf6ac4f603427120b9127eece9c, want: 8192c68b3ec424a648ef33ae4d3b970b7ff870f12aa6fca03214652d25363e61
FAIL
FAIL github.com/google/trillian/testonly/hammer 10.203s
? github.com/google/trillian/testonly/hammer/maphammer [no test files]
? github.com/google/trillian/testonly/hammer/mapreplay [no test files]
FAIL

Whatever else happens, we need to be explicit about which versions of MySQL we support.

It's also worth seeing whether this version of MySQL can be supported without breaking backwards compatibility.

@gdbelvin
Contributor

Occasionally passing for me

go test ./testonly/hammer/
ok  	github.com/google/trillian/testonly/hammer	9.462s

mysql -V
mysql  Ver 8.0.17 for osx10.14 on x86_64 (Homebrew)

go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/gbelvin/Library/Caches/go-build"
GOENV="/Users/gbelvin/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/gbelvin/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/gbelvin/go/src/github.com/google/trillian/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/xq/1dznj60166q1qlzqry5c9c6h006zkw/T/go-build226338431=/tmp/go-build -gno-record-gcc-switches -fno-common"

mhutchinson added a commit that referenced this issue Sep 30, 2019
This ensures that issues such as #1845 can't be introduced and go unnoticed.
@mhutchinson
Contributor Author

This works reliably for me with a min/max leaf batch of 100, but 150 fails reliably (admittedly on a small sample size):

HAMMER_OPTS="--operations=500 --min_leaves=150 --max_leaves=150 --invalid_chance=0" ./integration/maphammer.sh

@gdbelvin
Contributor

gdbelvin commented Oct 2, 2019

Same error around the wrong roots?

@mhutchinson
Contributor Author

Still fails to validate inclusion proofs. I'm looking into it now.

mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 22, 2019
There is no reason in the real world that the number of leaves fetched by clients in a batch would have any relationship to the number of leaves written at a time. The single parameter was an artefact of the way the hammer was written. Allowing these to be configured separately better approximates the real world and allows the hammer to hit the map more flexibly.

Making either of the read or write flags be above 100 triggers the issue in google#1845, though it does so differently for each of the batch sizes.
mhutchinson added a commit that referenced this issue Oct 22, 2019
There is no reason in the real world that the number of leaves fetched by clients in a batch would have any relationship to the number of leaves written at a time. The single parameter was an artefact of the way the hammer was written. Allowing these to be configured separately better approximates the real world and allows the hammer to hit the map more flexibly.

Making either of the read or write flags be above 100 triggers the issue in #1845, though it does so differently for each of the batch sizes.
@mhutchinson
Contributor Author

Confirmed that batches of > 100 work on MariaDB 10.1 and 10.2, but fail on 10.3 and 10.4.

@mhutchinson changed the title from "./testonly/hammer_test.go fails on mariadb 10.3.12" to "Map read/write batches > 100 can fail on MariaDB >= 10.3" on Oct 23, 2019
mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 23, 2019
This doesn't trigger google#1845 but that in itself tells us something.
mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 23, 2019
This doesn't trigger google#1845 but that in itself tells us something.
mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 23, 2019
This allows for useful diagnostics when digging into issues such as google#1845, provided the correct command-line args are passed, e.g.:

`go test -v ./storage/mysql/... -count 1 -args -stderrthreshold=INFO`
@gdbelvin
Contributor

Thanks for doing all this research. Nailing this down to particular DB versions is a big step forward. Props.

@mhutchinson
Contributor Author

mhutchinson commented Oct 24, 2019

This may be useful to other people needing to switch between DB versions to isolate problems. So that I'm not the only one with the dubious honour of knowing how to do this, here are the Homebrew steps I use. Note that in all cases I don't care about the DB contents; these steps should not be followed if you want a data-preserving upgrade/downgrade procedure:

brew services list # shows that, e.g. 10.2 is running
brew services stop [email protected]
rm -rf /usr/local/var/mysql
brew install [email protected] # or brew reinstall [email protected] if you've been here before
brew services start [email protected]
mysql -u root -e 'SELECT @@GLOBAL.version;' # confirm that you're running the server you think

Edit: Note that MariaDB 10.4 changes the security options quite a bit, so when using this version you need to do the following to get the previous behaviour of a passwordless root login (I'm not suggesting this is a good idea, just pointing out that it's required to maintain the old behaviour):

sudo mysql -e 'ALTER USER root@localhost IDENTIFIED VIA mysql_native_password USING PASSWORD("");'

@mhutchinson
Contributor Author

Here's a pretty key finding: this works on the newer versions of MariaDB if --large_preload_fix=false, but not if --large_preload_fix=true. Following the chain of code down, and given the observation that revisions 0 and 1 are always calculated correctly, I'm going to go out on a limb and suggest that the queries downstream of the MySQL impl of GetMerkleNodes are broken for newer MariaDB versions when the batch size of the nodes is large. It's likely that skipping the preload means that many smaller fetches are performed instead of one giant batch, but this needs confirming.

@pavelkalinnikov how likely is this preload to be changed by your ongoing work? I feel that I'm getting closer to root causing this now, but if it's all about to be thrown away then I'll save my time for something more useful.

@pav-kv
Contributor

pav-kv commented Oct 24, 2019

Wow, this is a great finding! My work largely overlaps with this option indeed:

  1. PR #1902 (Replace legacy sparse Merkle tree writer) reimplements it entirely. However, I believe at this point the underlying queries and the exact nodes/tiles that are fetched don't change.
  2. Follow-up PRs will drop the SubtreeCache machinery sitting in between the Merkle tree code and Get/SetMerkleNodes. The cache can do sneaky reads while writing stuff, and I actually believe that the highest chance of a bug is in there.
  3. I have an action item to look at removing this option, i.e. --large_preload_fix=false by default. My current belief is that after my refactoring the large preload won't be needed.

It definitely makes sense to look at why it doesn't work now to then demonstrate that the issue was fixed and how. I am happy to play with it, and see if my interference is needed there.

Also, I have a totally made-up idea on why this might be failing. Could be a combination of the following:

  1. A ±1 error in the revision number that the SubtreeCache/SMT code fetches/writes.
  2. A thing that changes between MariaDB versions has something to do with how it reads "dirty" rows, e.g. one version could read from a snapshot while another reads the "dirty" data. Maybe the default isolation level is different, or something like that.
  3. A race or other condition causing multiple reads/writes of the same tile.

Not sure what's so magical about this exact hundred threshold...

@pav-kv
Contributor

pav-kv commented Oct 24, 2019

Check this thing out introduced in MariaDB 10.3: https://mariadb.com/kb/en/library/conversion-of-big-in-predicates-into-subqueries/. Note the condition enabling this optimization: "the IN list has more than 1000 elements".

Our subtrees query uses a large IN statement, which means it might be impacted by this change.
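For illustration, the problematic pattern is roughly the following (a simplified sketch, not the actual storage code; the function, table and column names are assumptions, and the usual context/database/sql/fmt/strings imports apply):

```go
// Sketch of a subtree read that puts one "?" placeholder per subtree ID into
// an IN-list. With ~100-leaf write batches and ~10 strata per path,
// len(subtreeIDs) can exceed MariaDB 10.3's 1000-element threshold, at which
// point the server rewrites the IN-list into a subquery.
func selectSubtrees(ctx context.Context, tx *sql.Tx, treeID int64, subtreeIDs [][]byte) (*sql.Rows, error) {
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(subtreeIDs)), ",")
	query := fmt.Sprintf(
		"SELECT SubtreeId, Nodes FROM Subtree WHERE TreeId = ? AND SubtreeId IN (%s)",
		placeholders)
	args := make([]interface{}, 0, len(subtreeIDs)+1)
	args = append(args, treeID)
	for _, id := range subtreeIDs {
		args = append(args, id)
	}
	return tx.QueryContext(ctx, query, args...)
}
```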

The map is split into 11 strata levels:

var defaultMapStrata = []int{8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 176}

The topmost stratum is queried separately, which means the paths in the bottom "shards" span exactly 10 strata. Now 100 random leaf updates probably span a little under 100 shards (birthday paradox might subtract 1 or 2 from that). Multiply that by 10 strata, and maybe do +1 for the root stratum, and we get just above the magical 1000 threshold.

Printing len(nodeIDs) here could help confirm this idea:

for _, nodeID := range nodeIDs {
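A one-line diagnostic sketch, assuming glog is already imported in that file (Trillian's storage code uses it), placed just before the loop above:

```go
// Log the batch size so we can see when it crosses the ~1000 IN-list threshold.
glog.Infof("fetching %d node IDs in one batch", len(nodeIDs))
```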

Also, changing the strata layout should result in a different "magic threshold", e.g. for

var defaultMapStrata = []int{8, 8, 8, 8, 8, 8, 8, 8, 8, 184}

it should be ~112.
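As a rough sanity check of that arithmetic, a small standalone sketch (assuming each leaf path touches one distinct subtree per stratum below the separately queried top stratum, plus one top-stratum read) lands in the same ballpark:

```go
package main

import "fmt"

func main() {
	const inListThreshold = 1000 // MariaDB 10.3's big-IN conversion threshold.
	// 10 strata below the top for {8x10, 176}, 9 for {8x9, 184}.
	for _, strataBelowTop := range []int{10, 9} {
		batch := (inListThreshold - 1) / strataBelowTop
		fmt.Printf("%d strata below top: batches of roughly >%d leaves cross the threshold\n",
			strataBelowTop, batch)
	}
}
```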

@pav-kv
Contributor

pav-kv commented Oct 24, 2019

Confirmed.

mhutchinson added a commit that referenced this issue Oct 25, 2019
This allows for useful diagnostics when digging into issues such as #1845, provided the correct command-line args are passed, e.g.:

`go test -v ./storage/mysql/... -count 1 -args -stderrthreshold=INFO`
mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 25, 2019
These were intended to show that google#1845 could be easily reproduced with batches over 1k but they run fine. More investigation is needed, but more tests are always good.
@mhutchinson
Contributor Author

Failing test case! #1930

mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 25, 2019
This will flush out issues like google#1845 faster and more precisely than relying on the hammer.
mhutchinson added a commit that referenced this issue Oct 25, 2019
This will flush out issues like #1845 faster and more precisely than relying on the hammer.
mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 25, 2019
These were intended to show that google#1845 could be easily reproduced with batches over 1k but they run fine. More investigation is needed, but more tests are always good.
mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 28, 2019
These were intended to show that google#1845 could be easily reproduced with batches over 1k but they run fine. More investigation is needed, but more tests are always good.
@mhutchinson changed the title from "Map read/write batches > 100 can fail on MariaDB >= 10.3" to "Trillian batch operations with > 1k entries fail on MariaDB >= 10.3" on Oct 28, 2019
@mhutchinson
Contributor Author

Updated the title: as the tests in #1930 demonstrate, this affects the log API too. The log storage layer can be coerced into failing in multiple ways through batch operations, but only one of these is wired to the gRPC API: QueueLeaves (when >= 1k duplicate entries are queued).

The root cause appears to be https://jira.mariadb.org/browse/MDEV-12176, which is still attracting discussion from the community. It's unclear why this optimization results in a functional change for all of our lookups. There is one report of behavioural changes, allegedly fixed in 10.3.3, but all ongoing discussions are around the performance impact.

Getting further into this may be a waste of our resources. As a strawman, I suggest we remove MariaDB 10.3+ from Trillian's supported configurations, unless we believe there is sufficient motivation to continue supporting it. Any fix will require updating the SQL, which may impact performance/correctness in other situations.

mhutchinson added a commit to mhutchinson/trillian that referenced this issue Oct 29, 2019
These show that google#1845 can be reproduced with batches over 1k on newer
versions of MariaDB. Some of these tests are map specific, but the
QueueLeaves test can be triggered via the Trillian gRPC API.
mhutchinson added a commit that referenced this issue Oct 30, 2019
These show that #1845 can be reproduced with batches over 1k on newer
versions of MariaDB. Some of these tests are map specific, but the
QueueLeaves test can be triggered via the Trillian gRPC API.

Failing tests are all marked to be skipped until #1845 is resolved.
mhutchinson added a commit to mhutchinson/trillian that referenced this issue Nov 11, 2019
These tests would likely fail if they were run for more operations. We have found that batch sizes of < 100 are reliable, so we should stick to that for stability and clarity.
@pgporada
Contributor

pgporada commented Jan 31, 2020

According to a MariaDB developer comment,

in_predicate_conversion_threshold now works on release builds too, this was fixed in https://jira.mariadb.org/browse/MDEV-16871. Should be available with 10.3.18 onwards.
You can set in_predicate_conversion_threshold to 0 to disable this optimization, this would be a temporary workaround.

Unfortunately for us, MariaDB 10.3.18 isn't available in Amazon RDS yet. 🤕
I've sent a support request off to Amazon asking for a newer stable version of MariaDB 10.3.x to be added.

@pav-kv
Contributor

pav-kv commented Jan 22, 2021

@pgporada Have you migrated to a newer MariaDB?

@pav-kv
Contributor

pav-kv commented Jan 22, 2021

@mhutchinson Are there any more action items here?

@mhutchinson
Contributor Author

This was only ever known to cause issues with the DB-backed Map, which was experimental and is now EOL (#2284). So I'm not planning more work here. Any map work will be on the experimental/batchmap from now on.

@pgporada
Contributor

pgporada commented Feb 11, 2021

We did upgrade to MariaDB 10.3.18 prior to me leaving for paternity leave (October 2020). RDS didn't allow us to change the optimizer settings that we needed, and per a support email they had no intention of ever letting us do so. It would take an application change to allow setting in_predicate_conversion_threshold on a per-query basis. I could figure out how to implement that.
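For reference, a minimal sketch of what that application change might look like, assuming a *sql.DB handle to MariaDB >= 10.3; the pinned connection matters because database/sql pools connections, so a bare SET SESSION wouldn't reliably apply to the next query:

```go
// Sketch (not Trillian code): pin one connection, disable the big-IN
// conversion for that session, then run the large IN-list query on it.
conn, err := db.Conn(ctx)
if err != nil {
	return err
}
defer conn.Close()
if _, err := conn.ExecContext(ctx,
	"SET SESSION in_predicate_conversion_threshold = 0"); err != nil {
	return err
}
// ... issue the batched query via conn.QueryContext(ctx, ...) ...
```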

@pphaneuf
Contributor

@pgporada: Is this issue affecting your (non-map) deployment?

@pav-kv
Contributor

pav-kv commented Feb 12, 2021

This issue is activated when a query has an IN-clause with >= 1k entries. It was discovered in map tests, and was failing in tree storage on fetching many tiles [query]. Logs don't fetch that many tiles under any circumstances; it's always O(log2(tree_size)).

However, I can see some queries in logs that do have a variable-length IN-clause:

  1. GetLeavesByIndex with many indices in the request. CTFE doesn't use it, instead it uses a better GetLeavesByRange.
  2. GetLeavesByHash with many hashes in the request. CTFE doesn't use this RPC.
  3. getLeafDataByIdentityHash. This one is activated by the QueueLeaves RPC. CTFE doesn't use it with more than 1 leaf, so it is not affected.

I believe CT logs are not affected by this bug. As for non-CT uses of the log, we could add a limit on the number of entries to submit/query, for all of the 3 cases above.
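Purely as an illustration of such a limit (the constant and helper below are not existing Trillian code), a client could instead chunk its submissions so each request stays under the threshold:

```go
// Keep each request below the MariaDB IN-list conversion threshold.
const maxLeavesPerRequest = 999

// chunkLeaves splits a large submission into threshold-sized batches.
func chunkLeaves(leaves []*trillian.LogLeaf) [][]*trillian.LogLeaf {
	var chunks [][]*trillian.LogLeaf
	for len(leaves) > maxLeavesPerRequest {
		chunks = append(chunks, leaves[:maxLeavesPerRequest])
		leaves = leaves[maxLeavesPerRequest:]
	}
	return append(chunks, leaves)
}
```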

@pphaneuf
Contributor

Thanks for the explanation, Pavel!
