Add CometLogger for metrics logging via comet.ml #1221
@@ -0,0 +1,53 @@
.. _comet_logging:

================
Logging to Comet
================

This deep-dive will guide you through how to set up logging to Comet in torchtune.

.. grid:: 1

    .. grid-item-card:: :octicon:`mortar-board;1em;` What this deep-dive will cover

      * How to get started with Comet
      * How to use the :class:`~torchtune.utils.metric_logging.CometLogger`
      * How to log configs, metrics, and model checkpoints to Comet

torchtune supports logging your training runs to `Comet <https://www.comet.com/site/?utm_source=torchtune&utm_medium=docs&utm_content=docs>`_.

Review comment: nit: we lowercase torchtune everywhere, even at the start of a sentence :)

An example Comet workspace from a torchtune fine-tuning run can be seen in the screenshot below.

.. image:: ../_static/img/comet_torchtune_project.png
  :alt: torchtune workspace in Comet
  :width: 100%
  :align: center

.. note::

  You will need to install the :code:`comet_ml` package to use this feature.
  You can install it via pip with :code:`pip install comet_ml`, then log in to Comet from the command line:

  .. code-block:: bash

    comet login

Metric Logger
-------------

The only change you need to make is to add the metric logger to your config. Comet will log the metrics and model checkpoints for you.

.. code-block:: yaml

    # enable logging to the built-in CometLogger
Review comment: and here
    metric_logger:
      _component_: torchtune.utils.metric_logging.CometLogger
Review comment: 😍
      # the Comet project to log to
      project: comet-examples-torchtune
      experiment_name: my-experiment-name

We automatically grab the config from the recipe you are running and log it to Comet. You can find it in the Comet Hyperparameters tab and the actual file in the :code:`Assets & Artifacts` tab.

.. note::

  Click on this sample `Comet project to see what it looks like after fine-tuning <https://www.comet.com/examples/comet-example-torchtune-mistral/>`_.
  The config used to train the models can be found `here <https://www.comet.com/examples/comet-example-torchtune-mistral/0aabcd062de548bbbd30912544aaa41a?experiment-tab=params>`_.
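
To make the link between the YAML block and the logger's constructor concrete, here is a minimal sketch of how a `_component_` entry could be turned into an object: the dotted path is imported and the remaining keys are passed as keyword arguments. `instantiate_component` is a hypothetical helper written for illustration, not torchtune's actual config code, and `logging.Logger` stands in for `CometLogger` so the sketch runs without `comet_ml` installed.

```python
import importlib
from typing import Any, Dict


def instantiate_component(cfg: Dict[str, Any]) -> Any:
    """Hypothetical helper: build the object named by ``_component_``.

    Every other key in ``cfg`` is forwarded as a keyword argument.
    """
    kwargs = dict(cfg)  # copy so the caller's config is not mutated
    dotted_path = kwargs.pop("_component_")
    module_name, _, attr_name = dotted_path.rpartition(".")
    component = getattr(importlib.import_module(module_name), attr_name)
    return component(**kwargs)


# Stand-in config (assumption): logging.Logger replaces CometLogger here,
# and ``name`` plays the role that ``project`` plays in the YAML config.
metric_logger_cfg = {
    "_component_": "logging.Logger",
    "name": "comet-examples-torchtune",
}
metric_logger = instantiate_component(metric_logger_cfg)
print(type(metric_logger).__name__, metric_logger.name)
```

Because keys are forwarded by name in a scheme like this, the key names in the config need to match the constructor's parameter names.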
@@ -17,6 +17,7 @@
from tests.test_utils import assert_expected, captured_output

from torchtune.utils.metric_logging import (
    CometLogger,
    DiskLogger,
    StdoutLogger,
    TensorBoardLogger,

@@ -165,3 +166,36 @@ def test_save_config(self) -> None:
            expected_config_path = "torchtune_config.yaml"
            mock_save.assert_called_once_with(cfg, expected_config_path)
            mock_wandb_save.assert_called_once_with(expected_config_path)


class TestCometLogger:
    def test_log(self) -> None:
        # Patch comet_ml.start, which CometLogger calls to create the experiment
        with patch("comet_ml.start") as mock_experiment:
            # The constructor parameter is ``project``, not ``project_name``
            logger = CometLogger(project="test_project")
            for i in range(5):
                logger.log("test_log", float(i) ** 2, i)
            logger.close()

            assert mock_experiment.return_value.log_metric.call_count == 5
            for i in range(5):
                mock_experiment.return_value.log_metric.assert_any_call(
                    "test_log", float(i) ** 2, step=i
                )

    def test_log_dict(self) -> None:
        with patch("comet_ml.start") as mock_experiment:
            logger = CometLogger(project="test_project")
            metric_dict = {f"log_dict_{i}": float(i) ** 2 for i in range(5)}
            logger.log_dict(metric_dict, 1)
            logger.close()

            mock_experiment.return_value.log_metrics.assert_called_with(
                metric_dict, step=1
            )

    def test_log_config(self) -> None:
        with patch("comet_ml.start") as mock_experiment:
            logger = CometLogger(project="test_project")
            cfg = OmegaConf.create({"a": 1, "b": 2})
            logger.log_config(cfg)
            mock_experiment.return_value.log_parameters.assert_called_with(cfg)
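
The tests above rely on `unittest.mock`: patching the object that creates the experiment makes every call on its return value inspectable. A self-contained sketch of that idea follows. `SketchLogger` and its `start_fn` parameter are hypothetical stand-ins, used only so the example runs without `comet_ml` or torchtune.

```python
from unittest.mock import MagicMock


class SketchLogger:
    """Hypothetical stand-in for a metric logger that wraps an experiment."""

    def __init__(self, start_fn):
        # start_fn plays the role of the experiment factory being patched
        self.experiment = start_fn()

    def log(self, name, data, step):
        self.experiment.log_metric(name, data, step=step)


mock_start = MagicMock()
logger = SketchLogger(mock_start)
for i in range(5):
    logger.log("test_log", float(i) ** 2, i)

# The mock's return value is the fake experiment the logger received
log_metric = mock_start.return_value.log_metric
call_count = log_metric.call_count
log_metric.assert_any_call("test_log", 4.0, step=2)
```

`assert_any_call` passes if at least one recorded call matches, which suits a loop that logs several steps; `assert_called_with` only checks the most recent call.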
@@ -8,7 +8,7 @@
import time
from pathlib import Path

-from typing import Mapping, Optional, Union
+from typing import List, Mapping, Optional, Union

from numpy import ndarray
from omegaconf import DictConfig, OmegaConf

@@ -317,3 +317,127 @@ def close(self) -> None:
        if self._writer:
            self._writer.close()
            self._writer = None


class CometLogger(MetricLoggerInterface):
    """Logger for use w/ Comet (https://www.comet.com/site/).

    For more information about arguments expected by Comet, see
    https://www.comet.com/docs/v2/guides/experiment-management/configure-sdk/#for-the-experiment.

    Args:
        api_key (Optional[str]): Comet API key. It's recommended to configure the API Key from the environment.
Review comment: should we just encourage users to use
Review comment: Yes we should encourage users to use Saving your Comet API Key in a git tracked file is a bad practice and we are of course not recommending it. Having the possibility to set your Comet API Key in your configuration is useful when you are rendering your configuration file and injecting the Comet API Key automatically from a secret manager place. I do not know if that's something that's frequent when using torchtune.
Review comment: Actually I expect people will mostly just use the API key? By exporting it before running
        workspace (Optional[str]): Comet workspace name. If not provided, uses the default workspace.
        project (Optional[str]): Comet project name. Defaults to Uncategorized.
        experiment_key (Optional[str]): The key for comet experiment to be used for logging. Must be an alphanumeric
            string whose length is between 32 and 50 characters.
Review comment: what is this needed for?
Review comment: The
        mode (Optional[str]): Control how the Comet experiment is started. "get": Continue logging to an existing
            experiment identified by the `experiment_key` value. "create": Always creates a new experiment, useful
            for HPO sweeps. "get_or_create" (default): Starts a fresh experiment if required, or persists logging to
            an existing one.
Review comment: you can make a bulleted list here. also minor nit, can we make the default option the first bullet?
Review comment: I give it a try but I cannot seems to have found the right format. Here is how it looks if I align it with the arguments list And I indent them even by one space, sphinx-build is complaining: Do you know what is the proper way to format it?
Review comment: I re-ordered the options but couldn't find the right format to render an indented list. I'm not sure if the Google docstring support it.
        online (Optional[bool]): If True, the data will be logged to Comet server, otherwise it will be stored locally
            in an offline experiment. Default is `True`.
        experiment_name (Optional[str]): Name of the experiment. If not provided, Comet will auto-generate a name.
Review comment: I think it's still unhappy about this @Lothiraldan
Review comment: Yes, I'm working on understanding why is it complaining. It only is complaining with the latest version of pydocstring.
        tags (Optional[List[str]]): Tags to associate with the experiment.
        log_code (bool): Whether to log the source code. Defaults to True.
        **kwargs: additional arguments to pass to ``comet_ml.ExperimentConfig``. See
            https://www.comet.com/docs/v2/api-and-sdk/python-sdk/reference/Experiment-Creation/#comet_ml.ExperimentConfig
Review comment: might need to type this (annoyingly)

    Example:
        >>> from torchtune.utils.metric_logging import CometLogger
        >>> logger = CometLogger(project="my_project", workspace="my_workspace")
        >>> logger.log("my_metric", 1.0, 1)
        >>> logger.log_dict({"my_metric": 1.0}, 1)
        >>> logger.close()

    Raises:
        ImportError: If `comet_ml` package is not installed.

    Note:
        This logger requires the comet_ml package to be installed.
        You can install it with ``pip install comet_ml``.
        You need to set up your Comet.ml API key before using this logger.
        You can do this by setting the COMET_API_KEY environment variable
        or by calling ``comet_ml.login()`` with your API key.
Review comment: personally I find comet login the easiest to do and is more comfortable for folks who don't know much code. What are your thoughts on just always recommending to setup via comet login so folks do not get confused with the different options?
Review comment: Same comment as in #1221 (comment), if we expect users to be setting up a config and doing As far as I can tell torchtune seems to heavily favor the CLI
Review comment: Sorry I was under the impression you can run Either way I agree with @dzheng256, since CLI is the primary entry point for most users via
Review comment: @RdoubleA @dzheng256 Sorry about the confusion, there is bot a
    """

    def __init__(
        self,
        api_key: Optional[str] = None,
        workspace: Optional[str] = None,
        project: Optional[str] = None,
        experiment_key: Optional[str] = None,
        mode: Optional[str] = None,
        online: Optional[bool] = None,
        experiment_name: Optional[str] = None,
        tags: Optional[List[str]] = None,
        log_code: bool = True,
        **kwargs,
Review comment: and here
    ):
        try:
            import comet_ml
        except ImportError as e:
            # Trailing space keeps the two adjacent string literals from running together
            raise ImportError(
                "``comet_ml`` package not found. Please install comet_ml using `pip install comet_ml` to use CometLogger. "
                "Alternatively, use the ``StdoutLogger``, which can be specified by setting metric_logger_type='stdout'."
            ) from e

        _, self.rank = get_world_size_and_rank()

        # Declare it early so further methods don't crash in case of
        # Experiment Creation failure due to mis-named configuration for
        # example
        self.experiment = None

        # Only rank 0 creates the Comet experiment; other ranks keep experiment=None
        if self.rank == 0:
            self.experiment = comet_ml.start(
                api_key=api_key,
                workspace=workspace,
                project=project,
                experiment_key=experiment_key,
                mode=mode,
                online=online,
                experiment_config=comet_ml.ExperimentConfig(
                    log_code=log_code, tags=tags, name=experiment_name, **kwargs
                ),
            )

    def log(self, name: str, data: Scalar, step: int) -> None:
        if self.experiment is not None:
            self.experiment.log_metric(name, data, step=step)

    def log_dict(self, payload: Mapping[str, Scalar], step: int) -> None:
        if self.experiment is not None:
            self.experiment.log_metrics(payload, step=step)

    def log_config(self, config: DictConfig) -> None:
        if self.experiment is not None:
            resolved = OmegaConf.to_container(config, resolve=True)
            self.experiment.log_parameters(resolved)

            # Also try to save the config as a file
            try:
                self._log_config_as_file(config)
            except Exception as e:
                log.warning(f"Error saving Config to disk.\nError: \n{e}.")
                return

    def _log_config_as_file(self, config: DictConfig):
        output_config_fname = Path(
            os.path.join(
                config.checkpointer.checkpoint_dir,
                "torchtune_config.yaml",
            )
        )
        OmegaConf.save(config, output_config_fname)

        self.experiment.log_asset(
            output_config_fname, file_name="torchtune_config.yaml"
        )

    def close(self) -> None:
        if self.experiment is not None:
            self.experiment.end()

    def __del__(self) -> None:
        self.close()
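
The class above creates an experiment only on rank 0 and turns every logging call into a no-op elsewhere. The sketch below isolates that guard with stand-in pieces: `FakeExperiment`, `RankZeroLogger`, and the explicit `rank` argument are hypothetical, introduced only so the example runs without `comet_ml` or a distributed setup. Resetting `experiment` to `None` in `close` is an addition of this sketch, not part of the PR; it would keep a later `__del__` from ending the experiment a second time.

```python
from typing import Dict, List, Optional, Tuple


class FakeExperiment:
    """Hypothetical stand-in that records what a real experiment would receive."""

    def __init__(self) -> None:
        self.metrics: List[Tuple[str, float, int]] = []
        self.end_calls = 0

    def log_metric(self, name: str, value: float, step: int) -> None:
        self.metrics.append((name, value, step))

    def end(self) -> None:
        self.end_calls += 1


class RankZeroLogger:
    def __init__(self, rank: int) -> None:
        # Set the attribute before anything can fail so close() is always safe
        self.experiment: Optional[FakeExperiment] = None
        if rank == 0:
            self.experiment = FakeExperiment()

    def log(self, name: str, data: float, step: int) -> None:
        if self.experiment is not None:
            self.experiment.log_metric(name, data, step=step)

    def close(self) -> None:
        if self.experiment is not None:
            self.experiment.end()
            self.experiment = None  # sketch-only: makes close() idempotent


# Simulate two ranks: only rank 0 owns an experiment
loggers: Dict[int, RankZeroLogger] = {rank: RankZeroLogger(rank) for rank in range(2)}
for rank_logger in loggers.values():
    rank_logger.log("loss", 0.5, step=1)

rank0_experiment = loggers[0].experiment
loggers[0].close()
loggers[0].close()  # second call is a no-op
```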
Review comment: one last minor nit, I promise