
Implement history shards (RIPD-1289) #2258

Closed
wants to merge 4 commits

Conversation

Contributor
@miguelportilla commented Nov 4, 2017

Introduce ledger history shards.

Ripple sharding is a way of distributing historical data. Rippled can be easily configured to begin storing ledger history in shards. Shard data is automatically shared with peers via the ledger acquire process. It is worth noting that shards do not replace the node store: although redundant, it is entirely possible to hold full history in both the node store and the shard store. However, an effective configuration might limit the node store to recent history only. With the current implementation, each shard stores 2^14 ledgers, which may not be optimal. Determining a suitable value will involve carefully considering different aspects of the process. For one, the node store history size should be at minimum twice the ledgers per shard, because the current shard may be chosen to be stored and it would be wasteful to reacquire the data. Another consideration is the unit of work performed when the server decides to retain the current shard. The time to acquire, the number of file handles, and memory cache usage are also affected.
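
For concreteness, here is a minimal sketch of the sizing guidance above; the 2^14 figure and the constant name ledgersPerShard come from this PR, while minNodeStoreHistory is illustrative only:

```cpp
#include <cstdint>

// Each shard currently stores 2^14 ledgers.
constexpr std::uint32_t ledgersPerShard = 1u << 14;                 // 16384

// The node store should retain at least two shards' worth of recent history,
// so the shard currently being built never has to be reacquired from peers.
constexpr std::uint32_t minNodeStoreHistory = 2 * ledgersPerShard;  // 32768
```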

Acquiring shards begins after synchronizing with the network and backfilling ledgers. During this time of lower network activity, the shard database may be asked to select a shard to acquire. If a shard is selected, the ledger acquire process begins with the sequence of the last ledger in the shard and works backward to the first. Shards will continue to be acquired until the maximum disk space allocated for shards is reached, at which point the current shard may replace an older shard.

Like the node store, the shard store derives from the base class Database. The common interface makes it possible to substitute for the node store when storing or fetching in various places in the project. For instance, synchronization filters are used by the inbound ledger process to save data received from peers to a store; specifying the database object for the filter lets us choose where the data is stored. Similarly, SHAMaps are told, through a Family, which database to use when fetching or storing. The derived stores themselves rely on this commonality when copying ledgers from one store to another or validating their state and transaction trees.
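
To illustrate the shared-interface point, a rough, self-contained sketch follows; the class names echo this PR, but the signatures and bodies are simplified stand-ins rather than rippled's actual API:

```cpp
#include <cstdint>
#include <memory>

struct NodeObject;  // opaque payload for this sketch

// Common base: both stores expose the same fetch/store surface, so sync
// filters, SHAMap families, and the copy/validate routines can use either.
struct Database
{
    virtual ~Database() = default;
    virtual std::shared_ptr<NodeObject> fetch(std::uint32_t seq) = 0;
    virtual void store(std::shared_ptr<NodeObject> const& obj, std::uint32_t seq) = 0;
};

struct DatabaseNodeImp : Database {};   // node store (recent or full history)
struct DatabaseShardImp : Database {};  // shard store (history in fixed chunks)

// Copying a ledger needs only the base interface, so either store works.
void copyLedger(Database& from, Database& to, std::uint32_t seq)
{
    if (auto obj = from.fetch(seq))
        to.store(obj, seq);
}
```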

The following is a simplified, high-level explanation of the process involved in storing shards.

LedgerMaster asks DatabaseShard for a ledger to acquire. If a ledger is requested, InboundLedgers is asked to acquire it. If it is not stored locally, InboundLedgers asks the Overlay to request the ledger from peers. As data is received from peers, it is assembled and stored. Once all data pertaining to the ledger has been received, the DatabaseShard is notified.
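
As a rough, self-contained sketch of that flow (the types below are toy stand-ins for DatabaseShard and InboundLedgers, not their real interfaces):

```cpp
#include <cstdint>
#include <optional>

// Toy stand-in for DatabaseShard: suggests a ledger to acquire and is
// notified once that ledger has been fully assembled and stored.
struct ToyShardStore
{
    std::optional<std::uint32_t> ledgerToAcquire() const { return next_; }
    void notifyStored(std::uint32_t) { next_.reset(); }
    std::optional<std::uint32_t> next_;
};

// Toy stand-in for InboundLedgers: checks local stores first, otherwise asks
// the Overlay to request the ledger from peers; returns true when complete.
struct ToyInboundLedgers
{
    bool acquire(std::uint32_t) { return true; }
};

// Driven by LedgerMaster after the server has synchronized and backfilled.
void pollShardAcquire(ToyShardStore& shards, ToyInboundLedgers& inbound)
{
    if (auto const seq = shards.ledgerToAcquire())
    {
        if (inbound.acquire(*seq))
            shards.notifyStored(*seq);
    }
}
```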

The focus of this review should be the ‘Implement DatabaseShard’ commit. I apologize in advance for its size, but slicing it smaller proved difficult because the dependencies are tightly interleaved. I will do my best to answer questions and provide clarity where needed.

@sublimator
Contributor

Interesting!

Collaborator
@bachase left a comment

Partial review progress, focusing on Shard and DatabaseShard. Generally looks good, but it would be nice to have unit tests covering those classes.

Looks like the AppVeyor build failed due to missing boost serialization lib.

CMakeLists.txt Outdated
else()
set(BOOST_ROOT ${Boost_INCLUDE_DIRS})
Collaborator

Was moving this set intentional?

Contributor Author

I can't find a reason for this change, will revert.

//------------------------------------------------------------------------------
/*
This file is part of rippled: https://github.com/ripple/rippled
Copyright (c) 2012, 2013 Ripple Labs Inc.
Collaborator

Nit: stale dates?

Contributor Author

Will fix.

int cacheAge, beast::Journal& j)
: index_(index)
, firstSeq_(std::max(genesisSeq, detail::firstSeq(index)))
, lastSeq_(detail::lastSeq(index))
Collaborator

Should this be std::max(genesisSeq, detail::lastSeq(index))?

Contributor Author

Yes, it should be std::max(firstSeq_, detail::lastSeq(index)).

namespace NodeStore {

Shard::Shard(std::uint32_t index, int cacheSz,
int cacheAge, beast::Journal& j)
Collaborator

Should cacheAge be a chrono::duration?

Contributor Author

The TaggedCache ctor takes a rep and converts it to a duration. I'll change Shard to use the same type, but changing the TaggedCache ctor is beyond the scope of this PR.
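
For reference, a standalone sketch of the signature change under discussion, assuming std::chrono::seconds for the cache age; the real Shard constructor takes more parameters than shown:

```cpp
#include <chrono>
#include <cstdint>

// Taking the cache age as a duration makes the unit explicit at call sites.
class ShardSketch
{
public:
    ShardSketch(std::uint32_t index, int cacheSz, std::chrono::seconds cacheAge)
        : index_(index), cacheSz_(cacheSz), cacheAge_(cacheAge)
    {
    }

private:
    std::uint32_t index_;
    int cacheSz_;
    std::chrono::seconds cacheAge_;
};

// Usage: ShardSketch shard{0, 16384, std::chrono::seconds{60}};
```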

return false;
}

if (backend_->fdlimit() == 0)
Collaborator

What does an fdlimit of 0 mean? It's not a failure in opening the shard?

Contributor Author

Some of the backends do not use files and therefore have no file handle requirements, e.g. MemoryBackend and NullBackend. They can be used in a unit test.


private:
Application& app_;
mutable std::mutex m_;
Collaborator

I don't see all member functions taking this lock (init, prepare). Do only some members need protection?

Contributor Author
@miguelportilla Nov 10, 2017

prepare takes the lock. There are a couple of private functions that assume the lock is held, and that is noted in the comments. init doesn't, as it should only be called once and before anything else. I will lock and throw if it is erroneously called more than once; same for validate. Thanks!

Collaborator

👍 One option that I learned from @scottschurr is to have those other private functions take the lock by reference to make it clearer for readers that the lock was already held. I trust the code more than the comments ;). Not necessary, just for your consideration.
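
A minimal sketch of that pattern, using the function names from this exchange (prepare, findShardIndexToAdd); everything else is illustrative:

```cpp
#include <mutex>

class DatabaseShardSketch
{
public:
    void prepare()
    {
        std::lock_guard<std::mutex> lock(m_);
        findShardIndexToAdd(lock);  // cannot be called without a lock in hand
    }

private:
    // Taking the lock by reference documents, and enforces at the call site,
    // that m_ is already held when this runs.
    void findShardIndexToAdd(std::lock_guard<std::mutex> const&)
    {
        // ... work that assumes m_ is held ...
    }

    std::mutex m_;
};
```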

Contributor Author

Thanks for the suggestion, will add.

return {};
}
ledger->setFull();
//ledger->setImmutable(app_.config());
Collaborator

Is this commented-out line intended?

Contributor Author
@miguelportilla Nov 21, 2017

Unnecessary, removed.

return fetchInternal(hash, *backend);
}

// Lock must be held
Collaborator

I don't see the lock acquired prior to calling.

Contributor Author

findShardIndexToAdd is called from prepare. The first line in prepare takes the lock.

std::uint32_t
seqToShardIndex(std::uint32_t const seq)
{
return (seq - 1) / ledgersPerShard;
Collaborator

Not a sequence number we should see, but if seq=0 this will not return the expected result.

Contributor Author

True, I'll add an assert.
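
For illustration, the guarded version might look like this; the formula and the 2^14 ledgers-per-shard value are from this PR, and the assert is the one mentioned above:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t ledgersPerShard = 1u << 14;

std::uint32_t
seqToShardIndex(std::uint32_t const seq)
{
    // seq 0 is not a valid ledger sequence; the unsigned wrap of (seq - 1)
    // would otherwise yield a huge shard index.
    assert(seq != 0);
    return (seq - 1) / ledgersPerShard;
}
```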

assert(numShards <= maxShardIndex + 1);

// If equal, have all the shards
if (numShards >= maxShardIndex + 1)
Collaborator

This comparison seems inconsistent with the assert above.

Contributor
@nbougalis left a comment

I’m still working through this. Left some minor comments, will focus on the logic next.

prevMissing(mCompleteLedgers, mPubLedger->info().seq);
ScopedLockType sl(mCompleteLock);
missing = prevMissing(mCompleteLedgers,
mPubLedger->info().seq, std::uint32_t(32600));
Contributor

This should be a symbolic constant, like earliestAvailableLedger or somesuch.

"Invalid [shard_db] configuration";
return false;
}
shardStore_->validate();
Contributor

Shouldn’t this just return a true or false?

Contributor Author

I am not sure a boolean return would be very useful. One or more shards may fail to validate; that is logged, but it is up to the user to take action. In the unlikely event that a shard is corrupt, they must delete the shard, the same as with the node store. If it is missing a node (which should never happen for a complete shard), it will heal automatically through SHAMap triggering acquisition of the missing nodes.

namespace ripple {
namespace NodeStore {

class DatabaseNode : public Database
Contributor

The purpose of this class is unclear to me. Why do we need it?

Contributor Author

We don't, will fix.

*/
virtual
bool
hasLedger(std::uint32_t seq) = 0;
Contributor

No change needed, just food for thought. Should we prefer generic names like contains over something like hasLedger?

Contributor Author

Definitely.

Collaborator
@bachase left a comment

Finished first pass on all the changes. Looks great overall.

@@ -180,7 +181,7 @@ class LedgerMaster
LedgerIndex ledgerIndex);

boost::optional <NetClock::time_point> getCloseTimeByHash (
LedgerHash const& ledgerHash);
LedgerHash const& ledgerHash, std::uint32_t index);
Collaborator

Consider changing std::uint32_t index to LedgerIndex index to match the rest of this file.

{
JLOG(m_journal.trace())
<< "fetchForHistory want fetch pack " << missing;
fetch_seq_ = missing;
Collaborator

Just trying to follow the larger flow here, but does fetch_seq_ = missing imply only one ledger is being fetched at a time?

Contributor Author

fetch_seq simply tracks the ledger sequence for the last fetch pack requested. Its purpose is to prevent requesting the same fetch more than once. Generally, more than one ledger is fetched at the same time. For instance, if the ledger isn't present locally and there isn't a fetch pack available, we request the prior ten ledgers at once.

Collaborator

So if someone calls

fetchForHistory(2);
fetchForHistory(1);
fetchForHistory(2);
fetchForHistory(1);

Does that mean multiple fetches will happen for the same ledger?

Contributor Author
@miguelportilla Nov 14, 2017

I am glad you asked, as it made me realize fetchForHistory should be private! The code in the function was originally in doAdvance and I broke it out because it was too tall. doAdvance is called from the job queue, and it can and will call fetchForHistory with a prior sequence. That is by design and it won't hurt anything, as InboundLedgers::acquire maintains only one fetch operation per ledger. So multiple fetches will not happen for the same ledger no matter how many times it's called.

if (inbound && inbound->isComplete ())
return inbound->getLedger();
return {};
if (inbound->isFailed())
Collaborator

Previously there was a check that inbound was not null. No longer needed?

Contributor Author

Not needed.

mHash << " cannot be a ledger";
mFailed = true;
return true;
deserializeHeader(makeSlice(node->getData()), true),
Collaborator

This branch looks very similar to lines 293-307. Consider making a local lambda to combine them.

Contributor Author

Will do.

@@ -87,6 +87,8 @@ class SHAMap
mutable SHAMapState state_;
SHAMapType type_;
bool backed_ = true; // Map is backed by the database
bool full_ = false; // Indicates all nodes should reside in a local store.
Collaborator

How should I think of full_ in relation to backed_?

Contributor Author

If backed_ is true, then full_ determines whether or not every node pertaining to this SHAMap resides in a database.

Collaborator

The wording might be confusing. Maybe we can change the comment on "full_" to "Map is believed complete in database" or something.

@@ -150,6 +158,8 @@ class SHAMap
Marked `const` because the data is not part of
the map contents.
*/
void setFull ();
Collaborator

Is the comment above on L158-160 meant for some other member function?

Contributor Author

Yes, will fix, thanks!

Contributor
@mellery451 left a comment

👍

}

boost::optional<Blob>
AccountStateSF::getNode(SHAMapHash const& nodeHash) const
AccountStateSF::getNode(SHAMapHash const& nodeHash,
std::uint32_t ledgerSeq) const
Contributor

nit: should we leave the second parameter unnamed here since it is not used?

Blob&& nodeData, SHAMapTreeNode::TNType type) const
void
ConsensusTransSetSF::gotNode(bool fromFilter, SHAMapHash const& nodeHash,
std::uint32_t ledgerSeq, Blob&& nodeData, SHAMapTreeNode::TNType type) const
Contributor

same here - both ledgerSeq are unused

if (mLedger)
tryDB(mLedger->stateMap().family());
else if(mReason == Reason::SHARD)
tryDB(*app_.shardFamily());
Contributor

do we need to check that app_.shardFamily() is set, or is that already guaranteed by callers?

Contributor Author

yes, guaranteed by the caller.

{
case Reason::SHARD:
app_.getShardStore()->setStored(mLedger);
// fall through
Contributor

consider wording like this //TODO c++17: [[fallthrough]]

void
save(Archive& ar,
ripple::ClosedInterval<T> const& ci,
const unsigned int version)
Contributor

version is unused - is it just an optional way to version your object internals?

Contributor Author

Yes

case ok:
++fetchHitCount_;
if (nObj)
fetchSz_ += nObj->getData().size();
Contributor

//TODO c++17: [[fallthrough]] ... or just add break

if (ledger->info().hash != hash || ledger->info().seq != seq)
{
JLOG(j_.error()) <<
"shard " << std::to_string(seqToShardIndex(seq)) <<
Contributor

std::uint32_t is streamable, so is there any benefit in using std::to_string for these?

s += std::to_string(incomplete_->index());
else
s.pop_back();
JLOG(j_.fatal()) << s;
Contributor

is this a FATAL log because it's an error, or just to ensure it shows up in the log?

DatabaseShardImp::asyncFetch(uint256 const& hash,
std::uint32_t seq, std::shared_ptr<NodeObject>& object)
{
TaggedCache<uint256, NodeObject>* pCache;
Contributor

consider making these 18 or so lines a selectCache function.

int
DatabaseShardImp::getDesiredAsyncReadCount(std::uint32_t seq)
{
auto const shardIndex {seqToShardIndex(seq)};
Contributor

that proposed selectCache function could be used here too
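
A rough sketch of such a helper, using toy types in place of Shard and TaggedCache; the member names (complete_, incomplete_, pCache, nCache) follow the code quoted in this review, but the real types and locking are omitted:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <utility>

constexpr std::uint32_t ledgersPerShard = 1u << 14;

std::uint32_t
seqToShardIndex(std::uint32_t seq)
{
    return (seq - 1) / ledgersPerShard;
}

// Toy stand-in for Shard, exposing just its index and the two per-shard caches.
struct ToyShard
{
    std::uint32_t index = 0;
    std::shared_ptr<int> pCache = std::make_shared<int>();
    std::shared_ptr<int> nCache = std::make_shared<int>();
};

using CachePair = std::pair<std::shared_ptr<int>, std::shared_ptr<int>>;

// One place that maps a ledger sequence to the caches of the shard holding it;
// both asyncFetch and getDesiredAsyncReadCount could then call this.
std::optional<CachePair>
selectCache(std::map<std::uint32_t, ToyShard> const& complete,
    ToyShard const* incomplete, std::uint32_t seq)
{
    auto const shardIndex = seqToShardIndex(seq);
    auto const it = complete.find(shardIndex);
    if (it != complete.end())
        return std::make_pair(it->second.pCache, it->second.nCache);
    if (incomplete && incomplete->index == shardIndex)
        return std::make_pair(incomplete->pCache, incomplete->nCache);
    return std::nullopt;
}
```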

Collaborator
@ximinez commented Nov 19, 2017

The latest commit is causing all kinds of build errors.

Collaborator
@bachase left a comment

👍 on the latest commits.

auto it = complete_.find(shardIndex);
if (it != complete_.end())
{
cache->first = it->second->pCache();
Collaborator

You need to initialize the optional before assigning to the pair members, or just do a cache = std::make_pair. This crashed for me when running with asserts enabled.

Contributor Author

Good catch, fixed.
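
For readers following along, a minimal self-contained illustration of the bug and the fix, using boost::optional as the PR does:

```cpp
#include <boost/optional.hpp>
#include <utility>

int main()
{
    boost::optional<std::pair<int, int>> cache;

    // Buggy pattern: dereferencing an unengaged optional to assign one of its
    // members is undefined behavior and trips the debug assert noted above.
    // cache->first = 42;

    // Fix: engage the optional with a fully constructed pair, then use it.
    cache = std::make_pair(42, 7);
    return cache->first;
}
```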

cache = std::make_pair(incomplete_->pCache(),
incomplete_->nCache());
}
return std::move(cache);
Collaborator

You don't need the std::move here.

In file included from /home/eah/dev/rippled-merge/src/ripple/unity/nodestore.cpp:32:
/home/eah/dev/rippled-merge/src/ripple/nodestore/impl/DatabaseShardImp.cpp:740:12: warning: 
      moving a local object in a return statement prevents copy elision
      [-Wpessimizing-move]
    return std::move(cache);
           ^
/home/eah/dev/rippled-merge/src/ripple/nodestore/impl/DatabaseShardImp.cpp:740:12: note: 
      remove std::move call here
    return std::move(cache);
           ^~~~~~~~~~     ~
1 warning generated.

@codecov-io commented Dec 1, 2017

Codecov Report

Merging #2258 into develop will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff            @@
##           develop    #2258   +/-   ##
========================================
  Coverage    69.64%   69.64%           
========================================
  Files          705      705           
  Lines        58426    58426           
========================================
  Hits         40691    40691           
  Misses       17735    17735

Continue to review full report at Codecov.

Last update dc9e9f4...a75d4ea.

Collaborator
@ximinez left a comment

I have not done a full review, but the issues I had with building and the move warning have been resolved. 👍

Collaborator
@JoelKatz left a comment

👍 I believe these changes are safe and should be merged to allow work on shards to continue and to get the network supporting the protocol changes for shards. But I do believe we should also hold off advising people to enable sharding until we get some of the remaining missing features taken care of -- particularly ensuring that enabling sharding doesn't significantly increase resource consumption.

Contributor
@sublimator commented Dec 15, 2017 via email

@miguelportilla added the "Passed" label (Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.) Dec 19, 2017
Collaborator
@ripplelabs-jenkins commented Dec 19, 2017

Jenkins Build Summary

Built from this commit

Built at 20171219 - 22:52:17

Test Results

Build Type Result Status
clang.debug.unity 970 cases, 0 failed, t: 385s PASS ✅
coverage 970 cases, 0 failed, t: 612s PASS ✅
clang.debug.nounity 968 cases, 0 failed, t: 344s PASS ✅
gcc.debug.unity 970 cases, 0 failed, t: 432s PASS ✅
gcc.debug.nounity 968 cases, 0 failed, t: 357s PASS ✅
clang.release.unity 969 cases, 0 failed, t: 467s PASS ✅
gcc.release.unity 969 cases, 0 failed, t: 501s PASS ✅

Collaborator
@scottschurr

Incorporated into 0.90.0-b4 as commits 819ea46, aeda243, and 718d217.

Labels
Passed (Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.)

10 participants