Improve online_delete configuration and DB tuning: #3429

ximinez · 2020-06-02T22:50:34Z

Document delete_batch, back_off_milliseconds, age_threshold_seconds.
Convert those time values to chrono types.
Fix bug that ignored age_threshold_seconds.
Add a "recovery buffer" to the config that gives the node a chance to
recover before aborting online delete.
Add begin/end log messages around the SQL queries.
Add a new configuration section: [sqlite] to allow tuning the sqlite
database operations. Ignored on full/large history servers.
Update documentation of [node_db] and [sqlite] in the
rippled-example.cfg file.
Resolves ripple#3321

I added the "recovery buffer" function that wasn't described in the original issue to scratch my own itch, because my local machine was losing sync for a fraction of a second while the ledger was being copied over. This resolved that issue. I defaulted it to not run, so it doesn't cause unexpected behavior.

Note this includes two commits, one from @mtrippled . Both need to be reviewed.

This change is

evaluations for validated ledger age.

* Document delete_batch, back_off_milliseconds, age_threshold_seconds. * Convert those time values to chrono types. * Fix bug that ignored age_threshold_seconds. * Add a "recovery buffer" to the config that gives the node a chance to recover before aborting online delete. * Add begin/end log messages around the SQL queries. * Add a new configuration section: [sqlite] to allow tuning the sqlite database operations. Ignored on full/large history servers. * Update documentation of [node_db] and [sqlite] in the rippled-example.cfg file. * Resolves XRPLF#3321

MarkusTeufelberger · 2020-06-02T23:31:32Z

cfg/rippled-example.cfg

-online_delete=2000
+type=NuDB
+path=/var/lib/rippled/db/nudb
+online_delete=512


If such a small value is the default, please consider also adding a shard_db section with a limit of a few dozen/hundred GiB by default.

@MarkusTeufelberger Thanks for that feedback. Those default settings are definitely not yet set in stone. My understanding is that the suggestion for the smaller default was to make it easier for more people to participate without requiring huge resources. (The original ticket didn't have a lot of detail.) Of course, there's a balance to be found, because we already have pretty high hardware requirements, and we do want as many nodes as possible to contribute to history storage as a way to help the network.

I'm not sure if shards are ready to be set by default. I'll defer to @miguelportilla on that question.

MarkusTeufelberger · 2020-06-02T23:32:10Z

cfg/rippled-example.cfg

-# delete old ledgers while maintaining at least 2000. Do not require an
-# external administrative command to initiate deletion.
+# found at https://xrpl.org/capacity-planning.html#node-db-type
+# type=NuDB is recommended for non-validators with fast SSDs. Validators or


I would use a none database backend on a validator. They don't need to concern themselves with historic data at all.

The linked capacity planning page recommends RocksDB, but I can definitely see your point about using none (or memory). My biggest concern with using those options on the individual node level, is that it means it will take much longer to start and restart, because the node will have to download the entire ledger every time. On the global level, I'm thinking about a failsafe / worst case situation. For example, if every validator shuts down simultaneously 😱, they will need to have some data on disk so they can bootstrap restarting the network.

Anecdotally, my full history node takes about half an hour to start with --load vs. a few minutes at most to get synced with --net.

Bootstrapping can be done safely with --ledgerfile (https://xrpl.org/commandline-usage.html#initial-ledger-options) and that's probably better anyways a better way forward than trusting that data was stored to disk correctly in such a catastrophic scenario. The file itself could be hashed and communicated out-of-band in a lot of different ways compared to a node database that's not even a full shard in most cases.

@MarkusTeufelberger: re: slow start times with --load: are you using SSDs?

Of course. Over a dozen TB of 'em.

I think that --net should become the default now, unless a different command line option (for which I don't have a good name) is specified.

@nbougalis I think making --net the default is out of scope for this PR. Do you agree?

cfg/rippled-example.cfg

src/ripple/core/impl/Config.cpp

cfg/rippled-example.cfg

(Include these items in the main commit when squashing.) * Update rippled URLs in documentation. * Full history and fetch depth use max() value instead of an arbitrary value of 1 billion.

nbougalis

Left some comments. Still thinking about this.

nbougalis · 2020-06-05T02:26:31Z

src/ripple/core/DatabaseCon.h

 };

 template <std::size_t N, std::size_t M>
 DatabaseCon(
 Setup const& setup,
 std::string const& DBName,
+ bool useCommonPragma,


I don't know if I like this; I'm generally allergic to functions that accept a bool and use it to do one or the other thing. Why can't we set the right values in setup directly?

It would be pretty simple to set the values in the Setup object, now that you mention it. I might need to define a member function or two, just to make the internals easier, and to make it easier switch between the options for Setup objects that are used to create multiple DatabaseCons

PS. I went down this path because I initially thought the other ctor override would need to use the "common pragma" but without a Setup object being passed in. I later realized none of them do, but didn't backtrack to see if the bool was still a good idea. So thanks for pointing this out!

nbougalis · 2020-06-05T02:29:43Z

src/ripple/core/DatabaseCon.h

+ static std::unique_ptr<std::vector<std::string> const> CommonPragma;
+ /// Shortcut used by the database connections that ignore the common
+ /// pragma strings
+ static const std::vector<std::string> NoCommonPragma;


μNit: weird case for the name(s) here. Also confused as to why two separate things are needed.

A lot of other static class variables have capitalized names, but I don't necessarily have to keep doing it.

Edit: I looked again, and didn't find any non-const examples. So 🤷‍♂️ I'll make it more sensible.

nbougalis · 2020-06-05T02:41:17Z

src/ripple/app/main/DBInit.h

+// full history nodes. This is because such a large amount of data will
+// be more difficult to recover if a rare failure occurs, which are
+// more likely with some of the other available tuning settings.
+inline constexpr std::uint32_t SQLITE_TUNING_CUTOFF = 100'000'000;


I understand the rationale behind this, but I'm not sure if it's a good idea. I think that it's reasonable to warn in the configuration file and even on every startup about these options, but making the options entirely unavailable is not. Ironically, the existing logic that enforces this will prevent choosing options that are safer than the default even if the server operator really wants them.

I'll give this some more thought, but right now, it seems like a bad idea.

I can change it to warning log message(s) pretty easily, and update the config file.

* changed the sqlite ledger_history cutoff to log a warning instead of ignoring sqlite settings. updated documentation. * integrated the commonpragma back in to the setup struct and restored the interface of the databasecon ctors.

* involved moving some test logging subclasses around to be more easily accessible, and some minor refactoring of Env.

src/ripple/core/impl/DatabaseCon.cpp

mtrippled · 2020-06-08T22:30:22Z

src/ripple/app/main/Main.cpp

@@ -555,7 +556,9 @@ run(int argc, char** argv)
 session << "PRAGMA temp_store_directory=\"" << tmpPath.string()
 << "\";";
 session << "VACUUM;";
- session << "PRAGMA journal_mode=WAL;";


The existing behavior is that the journal_mode=OFF during the VACUUM activity introduces risk of corruption. Instead, I think the behavior of VACUUM should reflect the new config options and defaults for the txdb. Namely, use dbSetup.usePragma() (not noPragma()) and set the configs, then add the "PRAGMA temp_store_directory=" line and then execute VACUUM. Basically, treat the vacuum with the same "safety_mode" as normal operation.

Turns out that temp_store_directory is deprecated: https://www.sqlite.org/pragma.html#pragma_temp_store_directory.

It looks like you're right about the journal_mode: https://www.sqlite.org/lang_vacuum.html

The VACUUM command works by copying the contents of the database into a temporary database file and then overwriting the original with the contents of the temporary file. When overwriting the original, a rollback journal or write-ahead log WAL file is used just as it would be for any other database transaction.

So I'll update that.

mtrippled · 2020-06-08T23:15:18Z

src/ripple/app/misc/SHAMapStoreImp.cpp

@@ -348,23 +359,14 @@ SHAMapStoreImp::run()

 // will delete up to (not including) lastRotated
 if (validatedSeq >= lastRotated + deleteInterval_ &&
- canDelete_ >= lastRotated - 1)
+ canDelete_ >= lastRotated - 1 && !health())


The health() check is also performed a few lines later, during clearPrior(). Why introduce it here as well?

Primarily, it's to prevent writing the log message if no actual work is going to get done. Compare to what it was doing before: log the "rotating" message, then do a health() check, then clearPrior().

If any change should be made, I'd think it would be to take the first health() check out of clearPrior(). What do you think?

Yes, removing the first health() from clearPrior() makes sense.

mtrippled · 2020-06-08T23:18:13Z

src/ripple/app/misc/SHAMapStoreImp.cpp

@@ -378,14 +380,13 @@ SHAMapStoreImp::run()
 default:;
 }

+ JLOG(journal_.trace()) << "copying ledger " << validatedSeq;


May as well make this debug(), so we can see how long it takes to copy ledgers without setting the level to trace(). This is a very low-frequency message.

Yeah, that makes sense.

mtrippled · 2020-06-08T23:18:56Z

src/ripple/app/misc/SHAMapStoreImp.cpp


+ JLOG(journal_.trace()) << "freshening caches";


This may as well be debug() for the same reason as above.

mtrippled · 2020-06-08T23:21:43Z

src/ripple/app/misc/SHAMapStoreImp.cpp

+ assert(false);
+ return false;
+ }
+ JLOG(journal_.trace()) << "End: Look up lowest value of: " << minQuery;


Why have this check? Does it catch a non-existent table or some other error that makes checking if m is unset insufficient?

I honestly don't remember for sure why I added the if (auto db = database.checkoutDb()) check... I think it was so that it would go out of scope and release the lock sooner, and then I added the else because it "wouldn't hurt". It would make more sense to rewrite it as a simple local block.

mtrippled · 2020-06-08T23:34:21Z

src/ripple/app/misc/SHAMapStoreImp.cpp

+ else
+ {
+ assert(false);
+ return false;


No caller of clearSql() checks the return value. This function can return void instead.

Good catch! Will fix.

src/ripple/app/misc/SHAMapStoreImp.cpp

* Make `safety_level` and other tuning options mutually exclusive * Clean up VACUUM functionality and tuning to use settings. * Fix bugs in the config setup where it still ignored tuning settings if full history. Add tests to catch that mistake. * Fix some bugs in the unit tests, and make them more robustly check error messages.

* Change some log message levels. * clearSql no longer returns a bool that every caller ignores. * Restructure a few unnecessary validity checks

seelabs

Left some minor comments, but overall this looks good.

src/ripple/app/misc/NetworkOPs.cpp

cfg/rippled-example.cfg

src/ripple/core/DatabaseCon.h

src/ripple/app/misc/SHAMapStoreImp.cpp

src/ripple/core/impl/DatabaseCon.cpp

src/ripple/nodestore/impl/Shard.cpp

src/test/jtx/CheckMessageLogs.h

mtrippled · 2020-06-18T18:36:25Z

src/ripple/app/main/Main.cpp

@@ -552,9 +549,6 @@ run(int argc, char** argv)
 std::cout << "VACUUM beginning. page_size: " << pageSize
 << std::endl;

- session << "PRAGMA journal_mode=OFF;";
- session << "PRAGMA temp_store_directory=\"" << tmpPath.string()


The purpose of the temp_store_directory is to make sure there's enough space to perform a VACUUM, which essentially requires as much as the entire database being VACUUMed. The temp_store in memory will likely cause a system to run out of RAM. Imagine transaction DB with 2TB+. (https://sqlite.org/tempfiles.html)

Good catch! I'll change that.

mDuo13

I looked over the documentation changes. Definitely a big improvement over the previous version! I left a few optional suggestions to tune up the language in a few places, but 👍 with or without those changes.

mDuo13 · 2020-06-22T21:34:00Z

cfg/rippled-example.cfg

@@ -892,6 +939,12 @@
 # [import_db] Settings for performing a one-time import (optional)
 # [database_path] Path to the book-keeping databases.
 #
+# There are 4 or 5 bookkeeping SQLite databases that the server creates and


Why is it "4 or 5"? Is one of them only added with certain configurations? Without more context, the phrase "4 or 5" feels uncertain and vague. Maybe "4 to 5" would carry the right implication.

I think state.db is created if you're using online_delete, so that would be the 5th one. I'll make the change.

mDuo13 · 2020-06-22T21:36:13Z

cfg/rippled-example.cfg

+# maintains. If you omit this configuration setting, it will default to
+# creating a directory called "db" located in the same place as your
+# rippled.cfg file. Partial pathnames will be considered relative to
+# the location of the rippled executable.


nit: instead of "it will default to creating a directory", just say, "the server creates a directory." Similarly, instead of "will be considered relative" just say "are relative".

Changed and reworded to be less passive tone.

mDuo13 · 2020-06-22T21:41:20Z

cfg/rippled-example.cfg

-# external administrative command to initiate deletion.
+# found at https://xrpl.org/capacity-planning.html#node-db-type
+# type=NuDB is recommended for non-validators with fast SSDs. Validators or
+# slow / spinning disks should use RocksDB.


Consider adding a note like this:

Caution: Spinning disks are not recommended. They do not perform well enough to consistently remain synced to the network.

mDuo13 · 2020-06-22T21:44:42Z

src/ripple/app/ledger/Ledger.cpp

@@ -228,14 +228,14 @@ Ledger::Ledger(
 !txMap_->fetchRoot(SHAMapHash{info_.txHash}, nullptr))
 {
 loaded = false;
- JLOG(j.warn()) << "Don't have TX root for ledger";
+ JLOG(j.warn()) << "Don't have TX root for ledger" << info_.seq;


Nit: as long as we're changing these messages, we could make them clearer by removing the abbreviations "TX" (here) and "AS" (below, line 238). Instead, just say "transaction" and "state data".

* Rename retry_buffer_seconds to retry_wait_seconds. * Improve documentation and examples in example.cfg file. * Simplify the DatabaseCon::Setup pragma handling. * Other misc cleanups

seelabs

👍 I did leave one nit that I think we should resolve (the Base parameter on the DatabaseCon constructor). However since that's so minor I'll give a thumbs up now.

src/ripple/core/DatabaseCon.h

mtrippled · 2020-06-24T16:11:04Z

src/ripple/app/main/Main.cpp

+ // Only the most trivial databases will fit in memory on typical
+ // (recommended) software. Force temp files to be written to disk
+ // regardless of the config settings.
+ session << boost::format(CommonDBPragmaTemp) % "file";


The original behavior needs to come back--namely, that the actual temp directory be specified. Otherwise, SQLIte will by default use the directory in which the database is.

Alternately, I think that getting rid of this feature altogether makes a lot of sense. If somebody really wants to VACUUM a database, it's easier to use the command line tool.

mtrippled

LGTM
👍

* Improved some of the documentation and log messages * Removed the unnecessary `DatabaseCon::Base` ctor helper class

mDuo13 · 2020-06-24T21:51:08Z

New changes 👍 from me.

* Document delete_batch, back_off_milliseconds, age_threshold_seconds. * Convert those time values to chrono types. * Fix bug that ignored age_threshold_seconds. * Add a "recovery buffer" to the config that gives the node a chance to recover before aborting online delete. * Add begin/end log messages around the SQL queries. * Add a new configuration section: [sqlite] to allow tuning the sqlite database operations. Ignored on full/large history servers. * Update documentation of [node_db] and [sqlite] in the rippled-example.cfg file. Resolves #3321

mtrippled and others added 2 commits June 2, 2020 18:33

Make server_info report consistent with internal

063c3b8

evaluations for validated ledger age.

ximinez added the Documentation README changes, code comments, etc. label Jun 2, 2020

ximinez requested review from thejohnfreeman and mtrippled June 2, 2020 22:50

ximinez assigned thejohnfreeman and mtrippled Jun 2, 2020

MarkusTeufelberger reviewed Jun 2, 2020

View reviewed changes

ximinez added 2 commits June 3, 2020 12:19

[FOLD] Small cleanup that I missed when preparing the PR

4b40877

[FOLD] Clarify the description of 'recovery_buffer_seconds'

e42c1ee

nbougalis reviewed Jun 4, 2020

View reviewed changes

cfg/rippled-example.cfg Outdated Show resolved Hide resolved

src/ripple/core/impl/Config.cpp Outdated Show resolved Hide resolved

cfg/rippled-example.cfg Outdated Show resolved Hide resolved

ximinez added 2 commits June 4, 2020 13:53

[FOLD] Remove unused catch variable

f10c335

[FOLD] Review feedback from @nbougalis:

088d0af

(Include these items in the main commit when squashing.) * Update rippled URLs in documentation. * Full history and fetch depth use max() value instead of an arbitrary value of 1 billion.

nbougalis reviewed Jun 5, 2020

View reviewed changes

ximinez added 2 commits June 5, 2020 19:02

[fold] review feeback from @nbougalis 2:

6b96876

* changed the sqlite ledger_history cutoff to log a warning instead of ignoring sqlite settings. updated documentation. * integrated the commonpragma back in to the setup struct and restored the interface of the databasecon ctors.

[fold] add unit tests for log messages added last commit

18f5d07

* involved moving some test logging subclasses around to be more easily accessible, and some minor refactoring of Env.

ximinez requested a review from nbougalis June 8, 2020 20:45

mtrippled requested changes Jun 9, 2020

View reviewed changes

mtrippled reviewed Jun 9, 2020

View reviewed changes

src/ripple/app/misc/SHAMapStoreImp.cpp Show resolved Hide resolved

carlhua assigned gregtatcam and unassigned thejohnfreeman Jun 15, 2020

ximinez added 2 commits June 16, 2020 14:50

[FOLD] Review feedback from @mtravis, part 2:

10e02e6

* Change some log message levels. * clearSql no longer returns a bool that every caller ignores. * Restructure a few unnecessary validity checks

[FOLD] Remove an unnecessary health check

e060249

carlhua assigned seelabs and unassigned gregtatcam Jun 17, 2020

seelabs reviewed Jun 17, 2020

View reviewed changes

mtrippled reviewed Jun 18, 2020

View reviewed changes

mDuo13 reviewed Jun 22, 2020

View reviewed changes

[FOLD] Review feedback from @seelabs and @mtrippled:

c1fff2e

* Rename retry_buffer_seconds to retry_wait_seconds. * Improve documentation and examples in example.cfg file. * Simplify the DatabaseCon::Setup pragma handling. * Other misc cleanups

seelabs approved these changes Jun 24, 2020

View reviewed changes

src/ripple/core/DatabaseCon.h Outdated Show resolved Hide resolved

src/ripple/core/DatabaseCon.h Outdated Show resolved Hide resolved

carlhua requested a review from mtrippled June 24, 2020 12:24

mtrippled requested changes Jun 24, 2020

View reviewed changes

mtrippled approved these changes Jun 24, 2020

View reviewed changes

[FOLD] Review feedback from @mDuo13 and @seelabs:

2b01158

* Improved some of the documentation and log messages * Removed the unnecessary `DatabaseCon::Base` ctor helper class

carlhua added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jun 24, 2020

This was referenced Jun 25, 2020

Proposed 1.6.0-b8 #3459

Closed

Proposed 1.6.0-b8 #3461

Closed

Proposed 1.6.0-b8 #3462

Closed

Proposed 1.6.0-b8 #3483

Merged

manojsdoshi closed this in #3483 Jun 30, 2020

ximinez deleted the onlinedeleteconfig branch June 30, 2020 22:26

Improve online_delete configuration and DB tuning: #3429

Improve online_delete configuration and DB tuning: #3429

Conversation

ximinez commented Jun 2, 2020 • edited Loading

Choose a reason for hiding this comment

ximinez Jun 3, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nbougalis left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ximinez Jun 5, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seelabs left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mDuo13 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seelabs left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mtrippled left a comment

Choose a reason for hiding this comment

mDuo13 commented Jun 24, 2020

ximinez commented Jun 2, 2020 •

edited

Loading

ximinez Jun 3, 2020 •

edited

Loading

ximinez Jun 5, 2020 •

edited

Loading