dvc pull crashing on a FSx Lustre file system #10502

Open
rrazavipour opened this issue Aug 5, 2024 · 2 comments
Labels
A: data-sync (Related to dvc get/fetch/import/pull/push)
awaiting response (we are waiting for your reply, please respond! :))
triage (Needs to be triaged)

Comments

@rrazavipour

Bug Report

dvc pull

Description

dvc pull crashes with sqlite3.OperationalError: disk I/O error, raised while handling an earlier MemoryError (see the traceback below).

Reproduce

This happens when trying to pull 420G of data onto an Amazon FSx Lustre file system.
I complete the git clone.
Then I only run dvc pull. After many hours of operation, I get
the error mentioned above.

Expected

dvc pull to complete

Environment information

[ec2-user@ip-10-0-1-122 ~]$ dvc doctor
DVC version: 3.53.0 (pip)

Platform: Python 3.9.16 on Linux-6.1.97-104.177.amzn2023.x86_64-x86_64-with-glibc2.34
Subprojects:
dvc_data = 3.15.1
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.4.0
scmrepo = 3.3.6
Supports:
http (aiohttp = 3.10.0, aiohttp-retry = 2.8.3),
https (aiohttp = 3.10.0, aiohttp-retry = 2.8.3),
s3 (s3fs = 2024.6.1, boto3 = 1.34.131)
Config:
Global: /home/ec2-user/.config/dvc
System: /etc/xdg/dvc

Additional Information (if any):
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 211, in main
    ret = cmd.do_run()
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/command.py", line 27, in do_run
    return self.run()
  File "/usr/local/lib/python3.9/site-packages/dvc/commands/data_sync.py", line 35, in run
    stats = self.repo.pull(
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/pull.py", line 42, in pull
    stats = self.checkout(
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 58, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/checkout.py", line 142, in checkout
    diff = compare(old, new, relink=relink, delete=True, callback=pb.as_callback())
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/checkout.py", line 315, in compare
    ret = _compare(
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/checkout.py", line 243, in _compare
    for change in idiff(
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 320, in diff
    yield from changes
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 230, in _diff
    new_dir_items, new_unknown = _get_items(new, key, new_entry, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/diff.py", line 152, in _get_items
    items = dict(index.ls(key, detail=True))
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/view.py", line 128, in ls
    self._index._ensure_loaded(root_key)
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/index.py", line 759, in _ensure_loaded
    entry = self.get(prefix)
  File "/usr/lib64/python3.9/_collections_abc.py", line 763, in get
    return self[key]
  File "/usr/local/lib/python3.9/site-packages/dvc_data/index/index.py", line 671, in __getitem__
    item = self._trie.get(key)
  File "/usr/lib64/python3.9/_collections_abc.py", line 763, in get
    return self[key]
  File "/usr/local/lib/python3.9/site-packages/sqltrie/serialized.py", line 58, in __getitem__
    raw = self._trie[key]
  File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 266, in __getitem__
    row = self._get_node(key)
  File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 202, in _get_node
    rows = list(self._traverse(key))
  File "/usr/local/lib/python3.9/site-packages/sqltrie/sqlite/sqlite.py", line 191, in _traverse
    self._conn.executescript(STEPS_SQL.format(path=path, root=self._root_id))
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/dvc", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 236, in main
    ret = _log_exceptions(exc) or 255
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 147, in _log_exceptions
    _log_unknown_exceptions()
  File "/usr/local/lib/python3.9/site-packages/dvc/cli/__init__.py", line 49, in _log_unknown_exceptions
    logger.debug("Version info for developers:\n%s", get_dvc_info())
  File "/usr/local/lib/python3.9/site-packages/dvc/info.py", line 38, in get_dvc_info
    with Repo() as repo:
  File "/usr/local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 209, in __init__
    self.state = State(self.root_dir, self.site_cache_dir, self.dvcignore)
  File "/usr/local/lib/python3.9/site-packages/dvc_data/hashfile/state.py", line 92, in __init__
    self.links = Cache(links_dir)
  File "/usr/local/lib/python3.9/site-packages/dvc_data/hashfile/cache.py", line 59, in __init__
    super().__init__(directory=directory, timeout=timeout, disk=disk, **settings)
  File "/usr/local/lib/python3.9/site-packages/diskcache/core.py", line 478, in __init__
    self.reset(key, value, update=False)
  File "/usr/local/lib/python3.9/site-packages/diskcache/core.py", line 2431, in reset
    ((old_value,),) = sql(
sqlite3.OperationalError: disk I/O error
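
Both tracebacks end inside SQLite calls (sqltrie's executescript and diskcache's sql), so one thing that may help narrow this down is checking whether plain SQLite works at all in the directories involved. The snippet below is only a rough sketch, not something DVC provides: probe_sqlite is a hypothetical helper, the directory to pass in is an assumption (for example the repo's .dvc directory or the site cache directory), and a passing probe does not rule out failures that only appear under hours of load.

```python
import os
import sqlite3
import tempfile


def probe_sqlite(directory):
    """Return (ok, message) after a small SQLite write/read round trip.

    Hypothetical diagnostic helper; `directory` is whichever location
    you suspect (an assumption, not taken from the report).
    """
    # Create an empty file in the target directory; SQLite treats a
    # zero-byte file as a new, empty database.
    fd, db_path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path, timeout=5)
        try:
            conn.execute("CREATE TABLE probe (k TEXT PRIMARY KEY, v INTEGER)")
            conn.executemany(
                "INSERT INTO probe VALUES (?, ?)",
                ((f"key-{i}", i) for i in range(1000)),
            )
            conn.commit()
            (count,) = conn.execute("SELECT COUNT(*) FROM probe").fetchone()
        finally:
            conn.close()
        return count == 1000, f"wrote and read back {count} rows"
    except sqlite3.Error as exc:
        # OperationalError ("disk I/O error", "database is locked", ...)
        # is a subclass of sqlite3.Error and lands here.
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        # Remove the probe database and any journal/WAL side files.
        for suffix in ("", "-journal", "-wal", "-shm"):
            try:
                os.remove(db_path + suffix)
            except FileNotFoundError:
                pass
```

Running print(probe_sqlite("/path/on/lustre")) and the same call against a local disk directory would show whether the two locations behave differently; the path here is a placeholder.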

@shcheklein added the A: data-sync, triage, and awaiting response labels on Aug 6, 2024
@shcheklein
Member

@rrazavipour is there anything specific about the structure of this data (e.g. very deeply nested, or a very large number of directories)? How many files are there overall? Does this happen only on this FSx Lustre file system? What instance size are you using on AWS?
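
For the questions about file count and nesting, a short script along these lines could collect the numbers. It is a sketch under assumptions: summarize_tree is a hypothetical helper, it has to run somewhere the dataset is already present on disk, and skipping .dvc and .git is a choice made here so that internal files do not inflate the totals.

```python
import os


def summarize_tree(root):
    """Walk `root` and return file count, directory count, and max depth."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    files = dirs = max_depth = 0
    for current, dirnames, filenames in os.walk(root):
        # Prune DVC and Git internals in place so os.walk skips them
        # (an assumption about what should count as "data").
        dirnames[:] = [d for d in dirnames if d not in (".dvc", ".git")]
        dirs += len(dirnames)
        files += len(filenames)
        # Depth of the current directory relative to root (root is 0).
        depth = current.rstrip(os.sep).count(os.sep) - base_depth
        max_depth = max(max_depth, depth)
    return {"files": files, "dirs": dirs, "max_depth": max_depth}
```

Calling summarize_tree(".") from the top of the workspace returns a small dict that can be pasted into a reply.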

@rrazavipour
Author

rrazavipour commented Aug 6, 2024 via email
