Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sled database is not really closed after dropping sled::Db. #1234

Closed
vi opened this issue Jan 4, 2021 · 13 comments
Closed

Sled database is not really closed after dropping sled::Db. #1234

vi opened this issue Jan 4, 2021 · 13 comments

Comments

@vi
Copy link
Contributor

vi commented Jan 4, 2021

Example code snippet (non-standalone):

        let db: sled::Db = sled::open(&opts.database_path)?;
        db.insert(b"qqq2", b"www3")?;
        db.flush()?;
        drop(db);


        let db: sled::Db = sled::open(&opts.database_path)?;
        drop(db);
  1. expected result: always works. No extra threads or file descriptors or other remnants remain after dropping all sled::Db instances.
  2. actual result: sometimes fails to close the file descriptor on drop, then fails flock on the next open.
  3. sled version: 0.34.6
  4. rustc version: rustc 1.49.0-nightly (8dae8cdcc 2020-10-12)
  5. operating system: Linux
@vi vi added the bug label Jan 4, 2021
@spacejam
Copy link
Owner

spacejam commented Jan 5, 2021

Is this a sled issue or does this have to do with normal linux fd reaping behavior?

@vi
Copy link
Contributor Author

vi commented Jan 5, 2021

What do you mean by "fd reaping"? Releasing flock on descriptor close or closing descriptors on process exit?

@vi
Copy link
Contributor Author

vi commented Jan 5, 2021

Does Sled use any global singetons under the hood by the way?

@spacejam
Copy link
Owner

spacejam commented Jan 5, 2021

There is a choice about whether to fail if the advisory lock cannot be taken immediately, or to block indefinitely. When testing sled, we use the blocking approach because fd reaping is a lazy process and can take a few moments, and under test we know that we are the only ones using a database. But when running outside of test mode, we do not block indefinitely, because that would provide the user with no actionable information about what the problem is. The code you've listed in the example is expected to fail often, because reaping is lazy. Please keep the database open while using it, and avoid using it from multiple processes. The lazy reaping should almost always happen before you can get around to opening the database again in a new process.

@spacejam spacejam closed this as completed Jan 5, 2021
@spacejam spacejam removed the bug label Jan 5, 2021
@vi
Copy link
Contributor Author

vi commented Jan 5, 2021

How do I wait for database to be free for reuse by external tools?

Maybe there should be some sled::wait_for_reaper_finish() or sled::Db::terminate() (+ async versions), so that user can know that the process is now in the same state as before using Sled.


Also sled::Db's documentation should then obviously mention that it starts some background threads and does not wait for them to stop on drop, as it is rather unintuitive.

README
philosophy
don't make the user think. the interface should be obvious.

From a distance Sled looks like a file-backed thread-safe BTreeMap. But that background activity that keeps the file busy makes it trickier to understand.

@vi
Copy link
Contributor Author

vi commented Jan 5, 2021

For example, how would a correct code for closing the database, then causing underlying filesystem unmount (and storage eject) would look like?

@divergentdave
Copy link
Collaborator

Sled closes the file before drop returns. The FD reaping process @spacejam refers to is a behavior of the Linux kernel. There isn't anything else sled can do during shutdown to affect this.

I have had need to repeatedly close and reopen a database repeatedly for crash testing, and this can be done with a blocking flock() call on the file before opening the database. (via fs2's FileExt::lock_exclusive()) However, I would not recommend this for normal use. Unnecessarily closing and reopening a database is expensive, as each time it will perform database recovery, reading the snapshot/log to load metadata.

@vi
Copy link
Contributor Author

vi commented Jan 5, 2021

Sled closes the file before drop returns.

If so, flock's lock gets released automatically and further open should succeed.

$ strace -fe flock,close,openat ../target/debug/qqqq '*' --dbtest |& grep 'close\|flock\|sled/db'
...
openat(AT_FDCWD, "/home/vi/code/ddb/db.sled/db", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 3
flock(3, LOCK_EX|LOCK_NB)               = 0
close(4)                                = 0
close(4)                                = 0
[pid  9163] close(5)                    = 0
[pid  9163] close(4)                    = 0
[pid  9163] close(4)                    = 0
[pid  9163] close(4)                    = 0
[pid  9163] close(4)                    = 0
[pid  9163] openat(AT_FDCWD, "/home/vi/code/ddb/db.sled/db", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 4
[pid  9163] flock(4, LOCK_EX|LOCK_NB)   = -1 EAGAIN (Resource temporarily unavailable)
[pid  9163] close(4)                    = 0
Error: IO error: could not acquire lock on "/home/vi/code/ddb/db.sled/db": Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }

But I don't see a close before the return from drop.

@vi
Copy link
Contributor Author

vi commented Jan 5, 2021

Simplified repro:

fn main() {
    let db: sled::Db = sled::open("/home/vi/code/ddb/db.sled").unwrap();
    db.insert(b"qqq2", format!("{:?}",std::time::Instant::now()).as_bytes()).unwrap();
    db.flush().unwrap();
    drop(db);
    eprintln!("Finished closing the database?");

    std::thread::sleep(std::time::Duration::from_secs(1000000));
}

clearly shows that database's primary file descriptor (3) is closed way after it returns from the drop.

strace run:

$ strace -ttfe flock,close,openat,write ../target/debug/sleddebug '*' --dbtest |& grep 'close\|flock\|sled/db\|write(2,'
...
18:44:54.078089 openat(AT_FDCWD, "/home/vi/code/ddb/db.sled/db", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 3
18:44:54.078169 flock(3, LOCK_EX|LOCK_NB) = 0
18:44:54.641858 close(4)                = 0
18:44:55.223718 close(4)                = 0
[pid 12739] 18:44:55.657225 close(5)    = 0
[pid 12739] 18:44:55.658770 close(4)    = 0
[pid 12739] 18:44:55.659233 close(4)    = 0
[pid 12739] 18:44:58.617484 write(2, "Finished closing the database?\n", 31Finished closing the database?
[pid 12742] 18:45:00.338616 close(3)    = 0
^C

@divergentdave
Copy link
Collaborator

I cannot reproduce your results with sled 0.34.6. Perhaps you are using an older version with a bug in a Drop impl?

09:47:29.800481 openat(AT_FDCWD, "db.sled/db", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 3
09:47:29.800571 flock(3, LOCK_EX|LOCK_NB) = 0
09:47:29.998396 openat(AT_FDCWD, "/home/david/scratch/sled-vitaly-drop/db.sled", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
09:47:29.998730 close(4)                = 0
09:47:29.998817 openat(AT_FDCWD, "/home/david/scratch/sled-vitaly-drop/db.sled/snap.00000000000000AE", O_RDONLY|O_CLOEXEC) = 4
09:47:29.999346 close(4)                = 0
strace: Process 538 attached
strace: Process 539 attached
strace: Process 540 attached
strace: Process 541 attached
strace: Process 542 attached
strace: Process 543 attached
strace: Process 544 attached
[pid   537] 09:47:30.021952 openat(AT_FDCWD, "db.sled/snap.00000000000000E9.generating", O_WRONLY|O_CREAT|O_CLOEXEC, 0666) = 4
[pid   537] 09:47:30.022215 write(4, "\352\0\0\0\0\0\0\0\1\1G\0\0\0\0\0\0\0\0G\31\1\34\0\0\0\0\0\0\0\0\34"..., 90) = 90
[pid   537] 09:47:30.022584 write(4, "Z\0\0\0\0\0\0\0", 8) = 8
[pid   537] 09:47:30.022755 write(4, "\325\311\254]", 4) = 4
[pid   537] 09:47:30.025462 openat(AT_FDCWD, "/home/david/scratch/sled-vitaly-drop/db.sled", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 5
[pid   537] 09:47:30.025725 close(5)    = 0
[pid   537] 09:47:30.025893 close(4)    = 0
[pid   537] 09:47:30.025970 openat(AT_FDCWD, "db.sled/blobs", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
[pid   537] 09:47:30.026161 close(4)    = 0
strace: Process 545 attached
[pid   545] 09:47:30.030565 +++ exited with 0 +++
[pid   537] 09:47:30.033094 close(3)    = 0
[pid   537] 09:47:30.033289 write(2, "Finished closing the database?\n", 31Finished closing the database?

@vi
Copy link
Contributor Author

vi commented Jan 5, 2021

Cargo.lock shows that the sled is version = "0.34.6".

Shall I try with Sled from master branch?

@vi
Copy link
Contributor Author

vi commented Jan 5, 2021

Note: the problem is not reproducible with initially empty database (neither with 0.34.6 nor with master commit aa3fcf3).

I use a database that takes about 55MB on disk.

@spacejam
Copy link
Owner

spacejam commented Jan 5, 2021

Again, this is expected behavior due to the way that advisory locking works on linux. Even though there may be a backing IO thread that holds the file open, advisory locking does not always immediately release the file when a process calls close. Especially on other systems like macos, there are a lot of reasons why this would not immediately be openable without using a blocking lock request, which is not an option for us because it would cause locking that the user would get no feedback about, and a lot of issues would be opened about the database stalling for no reason.

Again, please do not rapidly open and close the same database. This is not supported behavior. Keep a single open database.

Repository owner locked as resolved and limited conversation to collaborators Jan 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants