feat: overload the PosixFileSystemAdaptor interface to generate real snapshot when follower installs snapshot #245

@panlei-coder panlei-coder commented Mar 30, 2024

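(Note: the main code changes are in the last commit, feat: overload the PosixFileSystemAdaptor interface to generate real …

Routine snapshots should only advance log truncation, so that the raft log does not grow too large and take up too much disk space. To support this, the Node::Snapshot interface in the braft code was changed so that the snapshot truncation position can be set by the caller (see PR: https:/pikiwidb/braft/pull/2).
At the same time, when a follower needs to install a snapshot, the leader must still be able to send a snapshot to the follower. There are two solutions:
(1) Modify the braft code directly (see PR: https:/pikiwidb/braft/pull/3). This approach is not very elegant.
(2) Overload the PosixFileSystemAdaptor::open interface. This function is called from FileServiceImpl::get_file (file_service.cpp), shown in the figures below, so we only need to generate the required snapshot data synchronously before the follower actually reads it. In addition, the snapshot truncation position can be set to min(the log index corresponding to the largest SequenceNum that each Column Family has already persisted to disk). This only requires changes in the pikiwidb layer and no changes to braft at all, which is more elegant.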
leader:
image
follower
image

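Test:
Run the save_load.sh script. In the script, the leader performs two rounds of data insertion (10000 entries each) and two snapshot truncations. After the second round completes, the snapshot truncation point is at 20001, but the snapshot contains no data. After the follower node joins the cluster, the snapshot with truncation point 20001 is filled with the real snapshot data.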
20099feedeb6616bca3ec96a99d4081
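
Approach (2) from the description can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in this PR: `LazySnapshotAdaptor`, `PrepareForOpen`, and the in-memory set of already generated directories are hypothetical, and the sketch assumes the opened file sits directly inside the snapshot directory. A real adaptor would derive from braft's PosixFileSystemAdaptor, run a check like this at the top of open(), and then forward to the base class.

```cpp
#include <fcntl.h>

#include <filesystem>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: decides on each open() whether the snapshot directory
// that holds `path` still needs its real data generated.
class LazySnapshotAdaptor {
 public:
  using Generator = std::function<void(const std::string& snapshot_dir)>;

  explicit LazySnapshotAdaptor(Generator generate) : generate_(std::move(generate)) {}

  // Call at the top of open(). Returns true if this call generated the snapshot.
  bool PrepareForOpen(const std::string& path, int oflag) {
    // Only a read-only open (the leader serving a file) triggers generation.
    if ((oflag & O_ACCMODE) != O_RDONLY) {
      return false;
    }
    // Assumption: the file lives directly inside the snapshot directory.
    const std::string dir = std::filesystem::path(path).parent_path().string();

    // Serialize concurrent readers so each directory is generated only once.
    std::lock_guard<std::mutex> guard(mutex_);
    if (!generated_.insert(dir).second) {
      return false;  // an earlier open() already generated this directory
    }
    generate_(dir);
    return true;
  }

 private:
  Generator generate_;
  std::mutex mutex_;
  std::set<std::string> generated_;
};
```

The generator callback is where the checkpoint would be written into the snapshot directory; keeping it as a callback leaves the adaptor free of storage-layer details.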

@github-actions github-actions bot added the ✏️ Feature New feature or request label Mar 30, 2024
@panlei-coder panlei-coder marked this pull request as draft March 30, 2024 09:03
};
node_options_.checkpoint_callback = checkpoint_callback;
snapshot_adaptor_ = new PPosixFileSystemAdaptor();
node_options_.snapshot_file_system_adaptor = &snapshot_adaptor_;
Collaborator

node_options has a parameter for taking snapshots on a timer, snapshot_interval_s. Consider setting it to zero.

Collaborator Author

It is already set to 0.

if (!node_) {
return ERROR_LOG_AND_STATUS("Node is not initialized");
}
braft::SynchronizedClosure done;
- node_->snapshot(&done);
+ node_->snapshot(&done); // @todo self_snapshot_index
done.wait();
Collaborator

Since we want a RocksDB event listener to trigger the call to this function, we do not necessarily need to wait synchronously here.

Collaborator

So consider adding a parameter that controls whether to wait.

Collaborator Author

OK

braft::FileAdaptor* PPosixFileSystemAdaptor::open(const std::string& path, int oflag,
const ::google::protobuf::Message* file_meta, butil::File::Error* e) {
// checkpoint callback
PRAFT.GenerateRealSnapshot();
Collaborator

This drives on_snapshot_save to generate a snapshot. If the system keeps moving forward, won't on_snapshot_save write into a new directory rather than the one the current PPosixFileSystemAdaptor::open call is looking for?

Collaborator

Or, by customizing the snapshot index we pass in, can this interface still overwrite the directory generated last time?

Collaborator Author

done

std::string prefix = "local://" + g_config.dbpath + "_praft";
node_options_.log_uri = prefix + "/log";
node_options_.raft_meta_uri = prefix + "/raft_meta";
node_options_.snapshot_uri = prefix + "/snapshot";
// node_options_.disable_cli = FLAGS_disable_cli;
snapshot_adaptor_ = new PPosixFileSystemAdaptor();
Collaborator

This is allocated with new. Even though it is allocated only once, it would be better to have a matching delete.

Collaborator Author

scoped_refptr<braft::FileSystemAdaptor> snapshot_adaptor_ = nullptr;
It is declared like this, so there should be no need to call delete explicitly, right?
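
The ownership question in this thread can be illustrated with a small sketch of intrusive reference counting. `RefCounted`, `RefPtr`, and `TrackedAdaptor` are hypothetical stand-ins written for this example (and not thread-safe); they only mimic the general pattern of a scoped_refptr-style pointer calling AddRef()/Release() on its target. The point is that the object deletes itself when the last reference goes away, so no explicit delete is written at the allocation site.

```cpp
#include <utility>

// Hypothetical intrusive reference count; not thread-safe, for illustration only.
class RefCounted {
 public:
  void AddRef() { ++refs_; }
  void Release() {
    if (--refs_ == 0) {
      delete this;  // the last reference destroys the object
    }
  }

 protected:
  virtual ~RefCounted() = default;

 private:
  int refs_ = 0;
};

// Hypothetical smart pointer in the spirit of scoped_refptr.
template <typename T>
class RefPtr {
 public:
  RefPtr() = default;
  RefPtr(T* p) : ptr_(p) {
    if (ptr_) ptr_->AddRef();
  }
  RefPtr(const RefPtr& other) : RefPtr(other.ptr_) {}
  RefPtr& operator=(RefPtr other) {
    std::swap(ptr_, other.ptr_);  // the old target is released when `other` dies
    return *this;
  }
  ~RefPtr() {
    if (ptr_) ptr_->Release();
  }
  T* get() const { return ptr_; }

 private:
  T* ptr_ = nullptr;
};

// Hypothetical adaptor that reports its own destruction through a flag.
class TrackedAdaptor : public RefCounted {
 public:
  explicit TrackedAdaptor(bool* destroyed) : destroyed_(destroyed) {}
  ~TrackedAdaptor() override { *destroyed_ = true; }

 private:
  bool* destroyed_;
};
```

With this pattern, `RefPtr<TrackedAdaptor> p = new TrackedAdaptor(&flag);` needs no matching delete: the flag flips once the last `RefPtr` holding the object is destroyed or reassigned.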


void PRaft::recursive_copy(const std::filesystem::path& source, const std::filesystem::path& destination) {
if (std::filesystem::is_regular_file(source)) {
if (source.filename() == "__raft_snapshot_meta") {
Collaborator

The corresponding macro is BRAFT_SNAPSHOT_META_FILE. Ideally the macro should be used here, but it is defined in a .cpp file.

Collaborator

I see that you define your own macro further down, so use it here.

Collaborator Author

Right, this spot was missed.


braft::FileAdaptor* PPosixFileSystemAdaptor::open(const std::string& path, int oflag,
const ::google::protobuf::Message* file_meta, butil::File::Error* e) {
if ((oflag & 0x01) == 0) { // This is a read operation
Collaborator

0x01 should be replaced with a macro. Looking at the code, when the meta file is opened, shouldn't oflag & O_RDONLY | O_CLOEXEC be == 1?

Collaborator Author

OK. The definition should be "#define O_RDONLY 00", so the result after & should be 0, right?
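
On the flag check discussed in this thread: because O_RDONLY is 0 on Linux, `oflag & O_RDONLY` is always 0 and cannot distinguish reads from writes. POSIX provides the O_ACCMODE mask for extracting the access mode, so one possible macro-based check is sketched below. `IsReadOnlyOpen` is a hypothetical helper name. Note that it is stricter than `(oflag & 0x01) == 0`, which on Linux also matches O_RDWR.

```cpp
#include <fcntl.h>

// Returns true when the open(2) flags request read-only access.
// The access mode is extracted with O_ACCMODE and compared to O_RDONLY,
// so extra flags such as O_CLOEXEC do not affect the result.
inline bool IsReadOnlyOpen(int oflag) {
  return (oflag & O_ACCMODE) == O_RDONLY;
}
```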

std::string snapshot_path;

// parse snapshot path
if (found_pos != std::string::npos) {
Collaborator

brpc/src/butil/files/file_path.h has a path utility class that can be used directly to handle path information, and it is cross-platform.

Collaborator Author

OK

INFO("start generate snapshot");
braft::LocalSnapshotMetaTable snapshot_meta_memtable;
std::string meta_path = snapshot_path + "/" PBRAFT_SNAPSHOT_META_FILE;
braft::FileSystemAdaptor* fs = braft::default_file_system();
Collaborator

It looks like casting this would be enough here?

Collaborator Author

I tried using this, but it caused some problems at runtime.

}

// check whether snapshots have been created
std::lock_guard guard(mutex_);
Collaborator

Code called by braft runs in a bthread, so consider using butex here.

Collaborator Author

Will do.

@@ -36,6 +36,8 @@ void RaftNodeCmd::DoCmd(PClient* client) {
DoCmdAdd(client);
} else if (!strcasecmp(cmd.c_str(), "REMOVE")) {
DoCmdRemove(client);
} else if (!strcasecmp(cmd.c_str(), "DSS")) {
Collaborator

This DSS name could also be changed. It was just an abbreviation picked for convenience when we needed a way to trigger a snapshot.

Collaborator Author

ok

@@ -132,7 +124,7 @@ Status Storage::CreateCheckpoint(const std::string& dump_path, int i) {

// 3) Create a checkpoint
std::unique_ptr<rocksdb::Checkpoint> checkpoint_guard(checkpoint);
- s = checkpoint->CreateCheckpoint(tmp_dir, kNoFlush, nullptr);
+ s = checkpoint->CreateCheckpoint(tmp_dir, kFlush, nullptr);
Collaborator

Do we still need to flush on every checkpoint?

Collaborator Author

Hmm, as things stand it should no longer be needed. This part can actually be changed.

@Mixficsol Mixficsol mentioned this pull request Apr 15, 2024
24 tasks
@panlei-coder panlei-coder deleted the snapshot_before_reader branch May 13, 2024 08:37
4 participants