Can't view tensorboard during training #7

Open
GeoffNN opened this issue Sep 29, 2020 · 0 comments

GeoffNN (Contributor) commented Sep 29, 2020

I'm trying to view results during training by running cox-tensorboard --logdir OUTDIR --format-str param-{param} while training with the robustness repo (python -m robustness.main ...). I get the following error, which seems to indicate that cox can't open the HDF5 store (store.h5) for reading while the training run still has it open for writing:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/pandas/io/pytables.py", line 697, in open
    self._handle = tables.open_file(self._path, self._mode, **kwargs)
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/tables/file.py", line 315, in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/tables/file.py", line 778, in __init__
    self._g_new(filename, mode, **params)
  File "tables/hdf5extension.pyx", line 492, in tables.hdf5extension.File._g_new
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5F.c", line 509, in H5Fopen
    unable to open file
  File "H5Fint.c", line 1400, in H5F__open
    unable to open file
  File "H5Fint.c", line 1615, in H5F_open
    unable to lock the file
  File "H5FD.c", line 1640, in H5FD_lock
    driver lock request failed
  File "H5FDsec2.c", line 941, in H5FD_sec2_lock
    unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'

End of HDF5 error back trace

Unable to open/create file '/home/ubuntu/logs/a69229d4-56c6-421d-bccb-73cc1d21b0d5/store.h5'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/tensorboard_view.py", line 59, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/tensorboard_view.py", line 24, in main
    reader = CollectionReader(args.logdir)
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/readers.py", line 53, in __init__
    raise e
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/readers.py", line 42, in __init__
    store = Store(self.directory, exp_id, new=False, mode='r')
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/cox/store.py", line 90, in __init__
    self.store = pd.HDFStore(os.path.join(exp_path, STORE_BASENAME), mode=mode)
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/pandas/io/pytables.py", line 553, in __init__
    self.open(mode=mode, **kwargs)
  File "/home/ubuntu/anaconda3/envs/rrat/lib/python3.8/site-packages/pandas/io/pytables.py", line 729, in open
    raise IOError(str(err)) from err
OSError: HDF5 error back trace

  File "H5F.c", line 509, in H5Fopen
    unable to open file
  File "H5Fint.c", line 1400, in H5F__open
    unable to open file
  File "H5Fint.c", line 1615, in H5F_open
    unable to lock the file
  File "H5FD.c", line 1640, in H5FD_lock
    driver lock request failed
  File "H5FDsec2.c", line 941, in H5FD_sec2_lock
    unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'

End of HDF5 error back trace

Unable to open/create file '/home/ubuntu/logs/a69229d4-56c6-421d-bccb-73cc1d21b0d5/store.h5'
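The errno = 11 at the bottom of the HDF5 trace is EAGAIN on Linux ("Resource temporarily unavailable"). As far as I can tell, HDF5 1.10 and later take an advisory flock() lock when opening a file through the sec2 driver (exclusive for a writer, shared for a reader, non-blocking), so the reader fails immediately while the training process keeps store.h5 open. The minimal sketch below reproduces the same errno using only the Python standard library. It assumes a Linux host; the locked file is an empty placeholder, not a real HDF5 store, and the helper name is hypothetical.

```python
import errno
import fcntl
import os
import tempfile


def try_shared_lock_while_writer_holds(path: str) -> int:
    """Return 0 if a non-blocking shared lock succeeds, otherwise the errno."""
    # Stand-in for the training process: it keeps the store open with an exclusive lock.
    writer = open(path, "a+b")
    try:
        fcntl.flock(writer.fileno(), fcntl.LOCK_EX)
        # Stand-in for the viewer: a separate open() creates a separate open file
        # description, so flock() treats it as a competing lock holder.
        reader = open(path, "rb")
        try:
            fcntl.flock(reader.fileno(), fcntl.LOCK_SH | fcntl.LOCK_NB)
            return 0
        except OSError as exc:  # BlockingIOError when the lock is held elsewhere
            return exc.errno
        finally:
            reader.close()
    finally:
        writer.close()  # closing the writer's descriptor releases its exclusive lock


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.h5")  # placeholder file, not real HDF5 data
        err = try_shared_lock_while_writer_holds(path)
        # On Linux with glibc this prints: 11 Resource temporarily unavailable
        print(err, os.strerror(err))
```

Once the writer's descriptor is closed, the same shared lock succeeds. This is consistent with the error only appearing while training is still running.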

Is this expected behavior? It seems to go against the typical TensorBoard use case of monitoring a run while it trains. Running TensorBoard directly doesn't show which curve corresponds to which parameters, so it would be nice to have read-only access to the tables, just to rename the curves in TensorBoard.
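If read-only access is all that's needed, one possible stopgap (a workaround, not a fix in cox) is to disable HDF5 file locking in the viewer process only. Recent HDF5 releases honor the HDF5_USE_FILE_LOCKING=FALSE environment variable for this. The sketch below assumes the cox-tensorboard entry point from the command above and only changes the environment that command is launched with. The helper names env_without_hdf5_locking, viewer_command and launch_viewer are hypothetical. Without the lock, nothing stops the viewer from reading the store mid-write, so a refresh may occasionally fail or show slightly stale tables.

```python
import os
import subprocess
from typing import List, Mapping, Optional


def env_without_hdf5_locking(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with HDF5 advisory file locking turned off."""
    env = dict(os.environ if base is None else base)
    # Read by the HDF5 C library (used by PyTables/pandas). It only needs to be set
    # for the viewer; the training process keeps its normal exclusive lock.
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return env


def viewer_command(logdir: str, format_str: str) -> List[str]:
    # Same invocation as in the issue. The list form avoids shell quoting of "{param}".
    return ["cox-tensorboard", "--logdir", logdir, "--format-str", format_str]


def launch_viewer(logdir: str, format_str: str) -> subprocess.CompletedProcess:
    # Runs the viewer with locking disabled. Raises if the command exits non-zero.
    return subprocess.run(
        viewer_command(logdir, format_str),
        env=env_without_hdf5_locking(),
        check=True,
    )
```

For example, launch_viewer("OUTDIR", "param-{param}") runs the same command as above. The equivalent one-off shell form is HDF5_USE_FILE_LOCKING=FALSE cox-tensorboard --logdir OUTDIR --format-str param-{param}.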
