Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CometLogger for metrics logging via comet.ml #1221

Merged
merged 30 commits into from
Aug 12, 2024
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
bd90921
Add comet_ml dep
dzheng256 Jul 25, 2024
fb86fef
Add CometLogger
dzheng256 Jul 25, 2024
d241cd7
Add TestCometLogger
dzheng256 Jul 25, 2024
93d75f6
Update docs
dzheng256 Jul 25, 2024
a5d253e
Formatting
dzheng256 Jul 25, 2024
14eadcd
Prefer comet_ml.login to comet_ml.init
dzheng256 Jul 25, 2024
edb00cf
Using typing.List for older python
dzheng256 Jul 25, 2024
a3cff9e
empty commit
dzheng256 Jul 29, 2024
6b39e87
Update pyproject.toml
dzheng256 Aug 8, 2024
ecd1ec3
Update torchtune/utils/metric_logging.py
dzheng256 Aug 8, 2024
2adccc7
Switch to new experiment creation API
dzheng256 Aug 8, 2024
c79f86c
Add deep dive
dzheng256 Aug 8, 2024
7fd3419
Rename project_name -> project, double backticks
dzheng256 Aug 8, 2024
d3d84c4
Rename project_name -> project
dzheng256 Aug 8, 2024
cda7ebc
Expose all of comet_ml.start parameters
Lothiraldan Aug 8, 2024
e5a6106
Add config logging as a file.
Lothiraldan Aug 8, 2024
6da5f5f
Fix mistake
Lothiraldan Aug 8, 2024
65e8c86
Fix another typo in _log_config_as_file
Lothiraldan Aug 8, 2024
1a4e1f1
Fix lints and doc issues
Lothiraldan Aug 8, 2024
4abe775
Add missing screenshot and add Comet Logging to the Nav
Lothiraldan Aug 8, 2024
3195e78
Fix docstring format
Lothiraldan Aug 8, 2024
089adad
Add Comet description to the docstring
Lothiraldan Aug 8, 2024
e930526
Point to comet login for setting up Comet authorization
Lothiraldan Aug 9, 2024
68b1f7b
Fix docstring lint issue
Lothiraldan Aug 9, 2024
025ad3a
Re-order valid values for mode parameter
Lothiraldan Aug 9, 2024
0b61dbc
Fix unit tests
Lothiraldan Aug 9, 2024
48a70f8
Update Comet links and screenshot
Lothiraldan Aug 12, 2024
bfde03b
Try to clarify experiment_key docstring.
Lothiraldan Aug 12, 2024
4c94eb1
Update mode argument docstring with proper format
Lothiraldan Aug 12, 2024
27c90e3
Update Comet name and doc link in the Readme
Lothiraldan Aug 12, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ torchtune focuses on integrating with popular tools and libraries from the ecosy
- [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html) for distributed training
- [torchao](https:/pytorch-labs/ao) for lower precision dtypes and [post-training quantization](recipes/quantize.py) techniques
- [Weights & Biases](https://wandb.ai/site) for [logging](https://pytorch.org/torchtune/main/deep_dives/wandb_logging.html) metrics and checkpoints, and tracking training progress
- [CometML](https://www.comet.com/site/) as another option for [logging](https://pytorch.org/torchtune/stable/api_ref_utilities.html#metric-logging)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [CometML](https://www.comet.com/site/) as another option for [logging](https://pytorch.org/torchtune/stable/api_ref_utilities.html#metric-logging)
- [CometML](https://www.comet.com/site/) as another option for [logging](https://pytorch.org/torchtune/main/deep_dives/comet_logging.html)

one last minor nit, I promise

- [ExecuTorch](https://pytorch.org/executorch-overview) for [on-device inference](https:/pytorch/executorch/tree/main/examples/models/llama2#optional-finetuning) using fine-tuned models
- [bitsandbytes](https://huggingface.co/docs/bitsandbytes/main/en/index) for low memory optimizers for our [single-device recipes](recipes/configs/llama2/7B_full_low_memory.yaml)

Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions docs/source/api_ref_utilities.rst
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@ Various logging utilities.
:toctree: generated/
:nosignatures:

metric_logging.CometLogger
metric_logging.WandBLogger
metric_logging.TensorBoardLogger
metric_logging.StdoutLogger
Expand Down
53 changes: 53 additions & 0 deletions docs/source/deep_dives/comet_logging.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
.. _comet_logging:

================
Logging to Comet
================

This deep-dive will guide you through how to set up logging to Comet in torchtune.

.. grid:: 1

.. grid-item-card:: :octicon:`mortar-board;1em;` What this deep-dive will cover

* How to get started with Comet
* How to use the :class:`~torchtune.utils.metric_logging.CometLogger`
* How to log configs, metrics, and model checkpoints to Comet

Torchtune supports logging your training runs to `Comet <https://www.comet.com/site/?utm_source=torchtune&utm_medium=docs&utm_content=docs>`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we lowercase torchtune everywhere, even at the start of a sentence :)

An example Comet workspace from a torchtune fine-tuning run can be seen in the screenshot below.

.. image:: ../_static/img/comet_torchtune_project.png
SalmanMohammadi marked this conversation as resolved.
Show resolved Hide resolved
:alt: torchtune workspace in Comet
:width: 100%
:align: center

.. note::

You will need to install the :code:`comet_ml` package to use this feature.
You can install it via pip:
SalmanMohammadi marked this conversation as resolved.
Show resolved Hide resolved

.. code-block:: bash

comet login

Metric Logger
-------------

The only change you need to make is to add the metric logger to your config. Comet will log the metrics and model checkpoints for you.

.. code-block:: yaml

# enable logging to the built-in CometLogger
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# enable logging to the built-in CometLogger
# enable logging to the built-in CometLogger

metric_logger:
_component_: torchtune.utils.metric_logging.CometLogger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

# the Comet project to log to
project: comet-examples-torchtune
experiment_name: my-experiment-name
SalmanMohammadi marked this conversation as resolved.
Show resolved Hide resolved

We automatically grab the config from the recipe you are running and log it to Comet. You can find it in the Comet Hyperparameters tab and the actual file in the :code:`Assets & Artifacts` tab.

.. note::

Click on this sample `Comet project to see how it will looks like after fine-tuning <https://www.comet.com/examples/comet-example-torchtune-mistral/>`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Click on this sample `Comet project to see how it will looks like after fine-tuning <https://www.comet.com/examples/comet-example-torchtune-mistral/>`_.
Click on this sample `Comet project to see the logged metrics after fine-tuning <https://www.comet.com/examples/comet-example-torchtune-mistral/>`_.

The config used to train the models can be found `here <https://www.comet.com/examples/comet-example-torchtune-mistral/0aabcd062de548bbbd30912544aaa41a?experiment-tab=params>`_.
1 change: 1 addition & 0 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@ torchtune tutorials.
:hidden:

deep_dives/checkpointer
deep_dives/comet_logging
deep_dives/configs
deep_dives/recipe_deepdive
deep_dives/wandb_logging
Expand Down
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ tune = "torchtune._cli.tune:main"
[project.optional-dependencies]
dev = [
"bitsandbytes>=0.43.0",
"comet_ml>=3.44.2",
dzheng256 marked this conversation as resolved.
Show resolved Hide resolved
"pre-commit",
"pytest==7.4.0",
"pytest-cov",
Expand Down
34 changes: 34 additions & 0 deletions tests/torchtune/utils/test_metric_logging.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
from tests.test_utils import assert_expected, captured_output

from torchtune.utils.metric_logging import (
CometLogger,
DiskLogger,
StdoutLogger,
TensorBoardLogger,
Expand Down Expand Up @@ -165,3 +166,36 @@ def test_save_config(self) -> None:
expected_config_path = "torchtune_config.yaml"
mock_save.assert_called_once_with(cfg, expected_config_path)
mock_wandb_save.assert_called_once_with(expected_config_path)


class TestCometLogger:
def test_log(self) -> None:
with patch("comet_ml.Experiment") as mock_experiment:
logger = CometLogger(project_name="test_project")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger = CometLogger(project_name="test_project")
logger = CometLogger(project="test_project")

for i in range(5):
logger.log("test_log", float(i) ** 2, i)
logger.close()

assert mock_experiment.return_value.log_metric.call_count == 5
for i in range(5):
mock_experiment.return_value.log_metric.assert_any_call(
"test_log", float(i) ** 2, step=i
)

def test_log_dict(self) -> None:
with patch("comet_ml.Experiment") as mock_experiment:
logger = CometLogger(project_name="test_project")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger = CometLogger(project_name="test_project")
logger = CometLogger(project="test_project")

metric_dict = {f"log_dict_{i}": float(i) ** 2 for i in range(5)}
logger.log_dict(metric_dict, 1)
logger.close()

mock_experiment.return_value.log_metrics.assert_called_with(
metric_dict, step=1
)

def test_log_config(self) -> None:
with patch("comet_ml.Experiment") as mock_experiment:
logger = CometLogger(project_name="test_project")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger = CometLogger(project_name="test_project")
logger = CometLogger(project="test_project")

cfg = OmegaConf.create({"a": 1, "b": 2})
logger.log_config(cfg)
mock_experiment.return_value.log_parameters.assert_called_with(cfg)
126 changes: 125 additions & 1 deletion torchtune/utils/metric_logging.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
import time
from pathlib import Path

from typing import Mapping, Optional, Union
from typing import List, Mapping, Optional, Union

from numpy import ndarray
from omegaconf import DictConfig, OmegaConf
Expand Down Expand Up @@ -317,3 +317,127 @@ def close(self) -> None:
if self._writer:
self._writer.close()
self._writer = None


class CometLogger(MetricLoggerInterface):
"""Logger for use w/ Comet (https://www.comet.com/site/).
SalmanMohammadi marked this conversation as resolved.
Show resolved Hide resolved
For more information about arguments expected by Comet, see
https://www.comet.com/docs/v2/guides/experiment-management/configure-sdk/#for-the-experiment.

Args:
api_key (Optional[str]): Comet API key. It's recommended to configure the API Key from the environment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just encourage users to use comet login?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should encourage users to use comet login.

Saving your Comet API Key in a git tracked file is a bad practice and we are of course not recommending it. Having the possibility to set your Comet API Key in your configuration is useful when you are rendering your configuration file and injecting the Comet API Key automatically from a secret manager place.

I do not know if that's something that's frequent when using torchtune.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I expect people will mostly just use the API key? By exporting it before running tune run .... Otherwise, where would they call comet.login()?

workspace (Optional[str]): Comet workspace name. If not provided, uses the default workspace.
project (Optional[str]): Comet project name. Defaults to Uncategorized.
experiment_name (Optional[str]): The name for comet experiment to be used for logging.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
experiment_name (Optional[str]): The name for comet experiment to be used for logging.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's still unhappy about this @Lothiraldan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm working on understanding why is it complaining. It only is complaining with the latest version of pydocstring.

experiment_key (Optional[str]): The key for comet experiment to be used for logging. Must be an alphanumeric
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this needed for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The experiment_key is used in two situations:

  • When resuming a Comet Experiment, it is the ID of the Experiment to resume logging data to.
  • When creating a new Comet Experiment, we generate a random ID by default. In various cases, it is useful to control the new ID (like having the same ID for the Comet Experiment and the Cloud provider training job).

string whose length is between 32 and 50 characters.
mode (Optional[str]): Control how the Comet experiment is started. "get": Continue logging to an existing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make a bulleted list here

Suggested change
mode (Optional[str]): Control how the Comet experiment is started. "get": Continue logging to an existing
mode (Optional[str]): Control how the Comet experiment is started.
- ``"get"``: Continue logging to an existing [...]
- ``"create"``: [...]
- ``"get_or_create"``: [...]

also minor nit, can we make the default option the first bullet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I give it a try but I cannot seems to have found the right format. Here is how it looks if I align it with the arguments list
Screenshot 2024-08-08 at 21-36-06 CometLogger — torchtune main documentation

And I indent them even by one space, sphinx-build is complaining:

.../torchtune/torchtune/utils/metric_logging.py:docstring of torchtune.utils.metric_logging.CometLogger:20:Unexpected indentation.

Do you know what is the proper way to format it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-ordered the options but couldn't find the right format to render an indented list. I'm not sure if the Google docstring support it.

experiment identified by the `experiment_key` value. "create": Always creates of a new experiment, useful
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
experiment identified by the `experiment_key` value. "create": Always creates of a new experiment, useful
experiment identified by the ``experiment_key`` value. "create": Always creates of a new experiment, useful

for HPO sweeps. "get_or_create" (default): Starts a fresh experiment if required, or persists logging to
an existing one.
online (Optional[bool]): If True, the data will be logged to Comet server, otherwise it will be stored locally
in offline experiment. Default is `True`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in offline experiment. Default is `True`.
in an offline experiment. Default is ``True``.

experiment_name (Optional[str]): Name of the experiment. If not provided, Comet will auto-generate a name.
tags (Optional[List[str]]): Tags to associate with the experiment.
log_code (bool): Whether to log the source code. Defaults to True.
**kwargs: additional arguments to pass to Comet.start. See
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**kwargs: additional arguments to pass to Comet.start. See
**kwargs: additional arguments to pass to ``Comet.start``. See

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might need to type this (annoyingly)

Suggested change
**kwargs: additional arguments to pass to Comet.start. See
**kwargs: additional arguments to pass to Comet.start. See

https://www.comet.com/docs/v2/api-and-sdk/python-sdk/reference/Experiment-Creation/#comet_ml.ExperimentConfig

Example:
>>> from torchtune.utils.metric_logging import CometLogger
>>> logger = CometLogger(project_name="my_project", workspace="my_workspace")
>>> logger.log("my_metric", 1.0, 1)
>>> logger.log_dict({"my_metric": 1.0}, 1)
>>> logger.close()

Raises:
ImportError: If `comet_ml` package is not installed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ImportError: If `comet_ml` package is not installed.
ImportError: If ``comet_ml`` package is not installed.


Note:
This logger requires the comet_ml package to be installed.
You can install it with ``pip install comet_ml``.
You need to set up your Comet.ml API key before using this logger.
You can do this by setting the COMET_API_KEY environment variable
or by calling ``comet_ml.login()`` with your API key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I find comet login the easiest to do and is more comfortable for folks who don't know much code. What are your thoughts on just always recommending to setup via comet login so folks do not get confused with the different options?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comet login works great in interactive situations but won't help in automated setups. I'm happy to highlight the comet login option and links to other options in the Comet docs if that works for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as in #1221 (comment), if we expect users to be setting up a config and doing tune run ... how are they supposed to use comet.login()? We would have to call that inside the logger, and incorporate some API key via a secrets manager solution which would be different for every user.

As far as I can tell torchtune seems to heavily favor the CLI tune run style usage, rather than importing it and calling tune run programmatically. If this is indeed the case, I think we should actually remove the mention to comet.login() and have people export the api key, what do you guys think?

Copy link
Contributor

@RdoubleA RdoubleA Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was under the impression you can run comet login from CLI, is this not the case?

Either way I agree with @dzheng256, since CLI is the primary entry point for most users via tune run I would lean towards suggesting whatever fits that workflow best whether that's setting environment variables or a CLI command. I suppose for automated setups environment variable is better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RdoubleA @dzheng256 Sorry about the confusion, there is bot a comet login command-line script and comet_ml.login that one can call in their Python script. I will clean up and double-check everything to point people to use the CLI comet login.

"""

def __init__(
self,
api_key: Optional[str] = None,
workspace: Optional[str] = None,
project: Optional[str] = None,
experiment_key: Optional[str] = None,
mode: Optional[str] = None,
online: Optional[bool] = None,
experiment_name: Optional[str] = None,
tags: Optional[List[str]] = None,
log_code: bool = True,
**kwargs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

):
try:
import comet_ml
except ImportError as e:
raise ImportError(
"``comet_ml`` package not found. Please install comet_ml using `pip install comet_ml` to use CometLogger."
"Alternatively, use the ``StdoutLogger``, which can be specified by setting metric_logger_type='stdout'."
) from e

_, self.rank = get_world_size_and_rank()

# Declare it early so further methods don't crash in case of
# Experiment Creation failure due to mis-named configuration for
# example
self.experiment = None

if self.rank == 0:
self.experiment = comet_ml.start(
api_key=api_key,
workspace=workspace,
project=project,
experiment_key=experiment_key,
mode=mode,
online=online,
experiment_config=comet_ml.ExperimentConfig(
log_code=log_code, tags=tags, name=experiment_name, **kwargs
),
)

def log(self, name: str, data: Scalar, step: int) -> None:
if self.experiment is not None:
self.experiment.log_metric(name, data, step=step)

def log_dict(self, payload: Mapping[str, Scalar], step: int) -> None:
if self.experiment is not None:
self.experiment.log_metrics(payload, step=step)

def log_config(self, config: DictConfig) -> None:
if self.experiment is not None:
resolved = OmegaConf.to_container(config, resolve=True)
self.experiment.log_parameters(resolved)

# Also try to save the config as a file
try:
self._log_config_as_file(config)
except Exception as e:
log.warning(f"Error saving Config to disk.\nError: \n{e}.")
return

def _log_config_as_file(self, config: DictConfig):
output_config_fname = Path(
os.path.join(
config.checkpointer.checkpoint_dir,
"torchtune_config.yaml",
)
)
OmegaConf.save(config, output_config_fname)

self.experiment.log_asset(
output_config_fname, file_name="torchtune_config.yaml"
)

def close(self) -> None:
if self.experiment is not None:
self.experiment.end()

def __del__(self) -> None:
self.close()
Loading