Simplify logging #906
This is small, but a simple starting point would be to move to the `LoggingMixin` instead of a global logger. I believe this would still be backwards compatible and wouldn't change any behavior, but it would at least give users opportunities to override logging behavior, and it is overall more idiomatic in the Airflow world.
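To illustrate the difference from a module-global logger, here is a minimal stdlib-only sketch. `LoggingMixinSketch` is a hypothetical, simplified stand-in for Airflow's real `LoggingMixin` (which lives in `airflow.utils.log.logging_mixin` and does more than this), and `DbtRunnerSketch` is a hypothetical class standing in for a Cosmos operator or helper. The idea shown is that each class lazily gets its own named logger and never attaches handlers, so users can tune it through ordinary logging configuration.

```python
import logging


class LoggingMixinSketch:
    """Hypothetical, simplified stand-in for Airflow's LoggingMixin.

    Provides a lazily created ``log`` property named after the concrete
    class's module and name. It never attaches any handlers itself.
    """

    _log = None  # replaced by a per-instance logger on first access

    @property
    def log(self) -> logging.Logger:
        if self._log is None:
            cls = type(self)
            self._log = logging.getLogger(f"{cls.__module__}.{cls.__name__}")
        return self._log


class DbtRunnerSketch(LoggingMixinSketch):
    """Hypothetical class standing in for a Cosmos operator or helper."""

    def run(self) -> str:
        self.log.info("Running dbt")
        return self.log.name


if __name__ == "__main__":
    # The application (e.g. Airflow) configures handlers once, on the root logger.
    logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")
    runner = DbtRunnerSketch()
    runner.run()
    # Users can quiet just this class's logger without touching any handlers.
    logging.getLogger(f"{__name__}.DbtRunnerSketch").setLevel(logging.WARNING)
    runner.run()  # the INFO message is now suppressed
```

The mixin deliberately adds no handler: output goes wherever the application has configured the root logger, which is what keeps records from being printed more than once.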
This issue was reported again by the community. This was the Slack message body, in case we lose the above thread data:
Tentatively adding this to the 1.5 milestone.
Big post incoming.

## Issue

Users have been seeing duplicate logs and other logging-related issues (see #639, #747, #906) for a while. In my experience, logging issues have gotten a little out of hand. When running the dev container, I noticed that the logs were being tripled. That was new to me; I didn't recall it happening the last time I worked with Cosmos. After running the
## Python logging idioms

There are a handful of issues causing this, but I think the simplest way to make progress toward the root problem is to look at how this is normally done. Let's say you do this:

```python
# a/b/c.py
import logging

logger = logging.getLogger(__name__)
logger.info("Hello, world!")
```

The logger's namespace is … This means that the ... Except, what I just said is not entirely true! Because of … The reason …

Full demo:

```python
>>> import logging
>>> import httpx
>>> root = logging.getLogger()
>>> root.info("Hello world")
>>> # Nothing happens
>>> res = httpx.get("https://google.com/")
>>> root.addHandler(logging.StreamHandler())
>>> root.info("Hello world")
>>> root.setLevel(logging.INFO)
>>> root.info("Hello world")
Hello world
>>> # httpx uses a logging.Logger without a handler, so it relies on our root logger's handler to print
>>> res = httpx.get("https://google.com/")
HTTP Request: GET https://google.com/ "HTTP/1.1 301 Moved Permanently"
>>> # Here we call logging.getLogger() again. It returns the *same* root logger object,
>>> # so this adds a second StreamHandler to it. With 2x stream handlers, logs are doubled.
>>> root_v2 = logging.getLogger()
>>> root_v2.addHandler(logging.StreamHandler())
>>> res = httpx.get("https://google.com/")
HTTP Request: GET https://google.com/ "HTTP/1.1 301 Moved Permanently"
HTTP Request: GET https://google.com/ "HTTP/1.1 301 Moved Permanently"
>>> # Note that if we attach the handler to a namespaced logger instead, the httpx logs
>>> # don't _triple_, since the "httpx" logger is not a child of "a".
>>> namespaced_logger = logging.getLogger("a")
>>> namespaced_logger.addHandler(logging.StreamHandler())
>>> res = httpx.get("https://google.com/")
HTTP Request: GET https://google.com/ "HTTP/1.1 301 Moved Permanently"
HTTP Request: GET https://google.com/ "HTTP/1.1 301 Moved Permanently"
>>> # However, a child logger's records do triple: "a.b" propagates to the
>>> # StreamHandler on "a", then to the two StreamHandlers on the root logger.
>>> namespaced_logger_v2 = logging.getLogger("a.b")
>>> namespaced_logger_v2.info("Hello world")
Hello world
Hello world
Hello world
```

## Issues with Cosmos's logging

For context: the justification for the way Cosmos's logging is currently set up is to add a colored … Under the hood, I would argue the reason Cosmos's logging is a little strange is that it doesn't seem to adhere to the idioms I've mentioned above in a few ways:
## Solutions

I'm sorting these in order from least controversial to most controversial. They're also ordered in terms of "steps".

1. All
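The `httpx` demo above needs network access. The sketch below reproduces the same propagation mechanics with only the standard library, pointing every handler at one `io.StringIO` buffer so duplicates can be counted. `emit_and_count` is a hypothetical helper written for this illustration, and the logger names `a` and `a.b` mirror the demo. The last call shows the idiom argued for above: library code attaches no handlers, and the application attaches exactly one, to the root logger.

```python
import io
import logging


def emit_and_count(logger_name: str, root_handlers: int, a_handlers: int) -> int:
    """Log one INFO message and return how many times it was written.

    Hypothetical helper: attaches ``root_handlers`` StreamHandlers to the root
    logger and ``a_handlers`` to the logger named "a", all writing to a shared
    buffer, logs once via ``logger_name``, then removes the handlers again.
    """
    buf = io.StringIO()
    root = logging.getLogger()
    a = logging.getLogger("a")
    added = []
    for target, n in ((root, root_handlers), (a, a_handlers)):
        for _ in range(n):
            handler = logging.StreamHandler(buf)
            target.addHandler(handler)
            added.append((target, handler))
    old_level = root.level
    root.setLevel(logging.INFO)  # "a" and "a.b" inherit this effective level
    try:
        logging.getLogger(logger_name).info("Hello world")
    finally:
        for target, handler in added:
            target.removeHandler(handler)
        root.setLevel(old_level)
    return buf.getvalue().count("Hello world")


if __name__ == "__main__":
    # getLogger() with no name always returns the single root logger.
    assert logging.getLogger() is logging.getLogger()
    print(emit_and_count("httpx", root_handlers=2, a_handlers=1))  # 2: "httpx" is not under "a"
    print(emit_and_count("a.b", root_handlers=2, a_handlers=1))    # 3: a.b -> a -> root
    # Idiomatic setup: no handlers on library loggers, one on the root.
    print(emit_and_count("a.b", root_handlers=1, a_handlers=0))    # 1
```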
Do you guys have a suggested workaround or fix we can implement for the time being?
@gbatiz #1047 puts a band-aid on the issue. In addition to setting … That said, I feel like it is still a band-aid. The more I think about it, the more I am in favor of scrapping the custom logging entirely and ditching the whole …
Thanks!
I'd +1 removing it entirely too.
This PR addresses #906 and fixes the issues with Cosmos logging once and for all.\*

> \* Actually, there is another issue with logs being polluted with warnings about "source" nodes in 1.5.0, but that is a separate matter!

I have a long explanation of how Python's `logging` module works, and the idioms it expects from end users of the module, here: #906 (comment)

The choices I made, explained:

- Although I don't entirely agree with adding `(astronomer-cosmos)` to all the logs, clearly at least one user, and possibly many more, want it, and I don't believe we should remove it. The objective of this PR was therefore to preserve the feature while guarding against future issues.
  - Why I can't say I'm a fan of it: adding `(astronomer-cosmos)` to logs seems to be a symptom of other problems with the Cosmos library, specifically how it impacts performance when users do not set it up effectively, and the prefix was added to help people diagnose these issues. I think ultimately we want to move away from this. Other components of the Airflow ecosystem do not feel compelled to do things like this. Also, the module path can be handled in the `log_format` if users really want it.
  - How I future-proofed: as per the long post linked above, the core issue is that there should not be tons of StreamHandlers being created. The proper and typical use of the `logging` module, with few exceptions, is to let logs propagate upward to a root logger. The Cosmos logs caused problems for so long because they deviated a lot from this.
- I think the "least astonishing" default behavior is to make no modifications to the base logging behavior whatsoever. This is also less likely to turn into future issues if further changes are made to the custom logging.
- One thing I never mentioned: I found it odd that Cosmos did not "work out of the box" by default and that, despite using Astronomer's own Airflow platform (!), I had to set a config option to keep Cosmos logging from being a nightmare (i.e. set `propagate_logs` to false). Previous logs referenced the Celery Executor as having issues, even though it is one of the two most popular ways to run Airflow in production. Something like this should just work out of the box for the majority of users!
- For task execution, Cosmos should use the more Airflow-idiomatic `LoggingMixin` class whenever appropriate. It can also be used in scheduler/webserver logging contexts, but I think globally scoped loggers are less out of place there.
  - These will not use the `get_logger()` implementation. That is intentional and probably desirable: these logs do not need to be "enriched" because they are isolated in the task execution logs.

Oh, also: I fixed an issue in `project.entry_points` in `pyproject.toml` while I was at it.

## Breaking Change?

- Removes the `propagate_logs` conf option, although removing it will not break users' builds. There is now a `rich_logging` conf option instead, which is disabled by default.
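One way to keep an opt-in `(astronomer-cosmos)` prefix without creating extra StreamHandlers is to annotate records as they pass through the library's logger and leave output entirely to the handlers the application already has. The sketch below assumes that approach; `get_logger` and its `rich_logging` argument are hypothetical stand-ins named after the options described in this PR, not Cosmos's actual implementation, and the prefix is added with a stdlib `logging.Filter`.

```python
import logging

PREFIX = "(astronomer-cosmos)"


class _PrefixFilter(logging.Filter):
    """Prepend PREFIX to each record logged directly through this logger.

    Logger-level filters only see records created on that logger, not ones
    propagated from child loggers, so each record is prefixed at most once.
    """

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = f"{PREFIX} {record.msg}"
        return True  # never drop records, only annotate them


def get_logger(name: str, rich_logging: bool = False) -> logging.Logger:
    """Hypothetical helper: return a plain named logger with no handlers.

    When rich_logging is True, the prefix filter is attached (only once).
    Output is left to handlers configured by the application, e.g. Airflow's.
    """
    logger = logging.getLogger(name)
    if rich_logging and not any(isinstance(f, _PrefixFilter) for f in logger.filters):
        logger.addFilter(_PrefixFilter())
    return logger


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
    get_logger("cosmos.demo").info("plain message")
    get_logger("cosmos.rich", rich_logging=True).info("prefixed message")
```

Because the default path returns an untouched `logging.getLogger(name)`, disabling the option leaves base logging behavior exactly as the application configured it.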
Closed in #1108
We add colors and other formatting to the logs that make them very difficult to read. This came up in the #airflow-dbt channel on the Airflow Slack: https://apache-airflow.slack.com/archives/C059CC42E9W/p1711624722823329
We should look at how we do logging today and potentially think about an overhaul. We should be able to remove the colors, duplicate timestamps, etc.