
Annoying duplicate celery logging in Airflow task logs when running in celery worker #639

Closed
agreenburg opened this issue Nov 1, 2023 · 8 comments · Fixed by #648
Labels
area:logging Related to logging, like log levels, log formats, error logging, etc

Comments

@agreenburg
Contributor

agreenburg commented Nov 1, 2023

As of astronomer-cosmos version 1.2.1:

When running tasks in a managed Astronomer cluster (using Celery workers), each log message from the astronomer-cosmos custom logger is being re-logged as a warning level message by the Celery logger. This is very annoying and makes the logs difficult to read:

[2023-11-01, 20:19:54 UTC] {local.py:194} INFO - Cloning project to writable temp directory /tmp/tmpy62_c947 from /usr/local/airflow/include/analytics_dbt
[2023-11-01, 20:19:54 UTC] {log.py:232} WARNING - [2023-11-01, 20:19:54 UTC] {local.py:194} INFO - (astronomer-cosmos) - Cloning project to writable temp directory /tmp/tmpy62_c947 from /usr/local/airflow/include/analytics_dbt
[2023-11-01, 20:19:54 UTC] {log.py:232} WARNING - [2023-11-01, 20:19:54 UTC] {subprocess.py:60} INFO - (astronomer-cosmos) - Tmp dir root location: /tmp
[2023-11-01, 20:19:54 UTC] {subprocess.py:60} INFO - Tmp dir root location: /tmp
[2023-11-01, 20:19:54 UTC] {log.py:232} WARNING - [2023-11-01, 20:19:54 UTC] {subprocess.py:73} INFO - (astronomer-cosmos) - Running command: ['/usr/local/airflow/dbt_venv/bin/dbt', 'deps']
[2023-11-01, 20:19:54 UTC] {subprocess.py:73} INFO - Running command: ['/usr/local/airflow/dbt_venv/bin/dbt', 'deps']
[2023-11-01, 20:19:54 UTC] {log.py:232} WARNING - [2023-11-01, 20:19:54 UTC] {subprocess.py:84} INFO - (astronomer-cosmos) - Command output:
[2023-11-01, 20:19:54 UTC] {subprocess.py:84} INFO - Command output:

I think this may be because of the way that stdout/stderr are redirected by Celery. Is it possible to make the custom logging implementation more compatible with Airflow's built-in logging/Celery?

@tatiana
Collaborator

tatiana commented Nov 2, 2023

Hi @agreenburg, duplicate logs are indeed annoying.

Giving some context as to why they are duplicated:

  • There was a request for us to have an identifier in Cosmos logs: Improve logs to include astronomer-cosmos identifier #450. We managed to accomplish this by using a new logger handler. Attempts to change the Airflow log formatter failed since we didn't want to change all Airflow logs to contain this identifier.
  • Initially, we disabled log propagation, so there was no log duplication. However, this created issues for users running in providers such as GCP, because their logs would not be forwarded, which led to this: Fix log propagation issue #498

How could we improve this? If we can't find a way to better use the Airflow logging system, we could look into exposing an Airflow config that could be used by Cosmos to disable log propagation, for instance.

@agreenburg
Contributor Author

agreenburg commented Nov 2, 2023

Thanks for the context, @tatiana; that's helpful.

It seems like adding a parameter when constructing the DAG might be a simple approach, so that we can override the default behavior.

Something like this, maybe? Adding propagate_logs=True|False to either the DbtDag init or the ExecutionConfig:

import os
from datetime import datetime

from cosmos import DbtDag, ExecutionConfig, ProjectConfig

my_cosmos_dag = DbtDag(
    project_config=ProjectConfig(
        "/usr/local/airflow/dags/my_dbt_project",
    ),
    profile_config=profile_config,  # defined elsewhere
    execution_config=ExecutionConfig(
        dbt_executable_path=f"{os.environ['AIRFLOW_HOME']}/dbt_venv/bin/dbt",
    ),
    propagate_logs=False,
    # normal dag parameters
    schedule_interval="@daily",
    start_date=datetime(2023, 1, 1),
    catchup=False,
    dag_id="my_cosmos_dag",
    default_args={"retries": 2},
)

@jlaneve
Collaborator

jlaneve commented Nov 2, 2023

I wonder if this should be set with an environment variable / Airflow config too? Just thinking that if you have this issue in one of your DAGs, you'll likely have them in all of your DAGs within the same environment.

I think it's still worth being able to override the env var / config with an explicit parameter, so maybe a precedence order like so:

  1. If you have the explicit propagate_logs param, use that value
  2. If you have the env var / Airflow config set (and not the explicit param), use that value
  3. If you don't have anything set, default to propagate logs (current behavior)
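The precedence order above could be resolved with a small helper. This is a hedged sketch only: the function name `resolve_propagate_logs` is hypothetical, not anything in the Cosmos codebase, and it reads the config via the `AIRFLOW__COSMOS__PROPAGATE_LOGS` environment variable (the form Airflow uses to expose config options):

```python
import os
from typing import Optional


def resolve_propagate_logs(explicit: Optional[bool] = None) -> bool:
    """Illustrative precedence: explicit param > env var/config > default."""
    # 1. An explicit propagate_logs parameter wins.
    if explicit is not None:
        return explicit
    # 2. Next, the env var / Airflow config (Airflow exposes config options
    #    as AIRFLOW__<SECTION>__<KEY> environment variables).
    env_value = os.environ.get("AIRFLOW__COSMOS__PROPAGATE_LOGS")
    if env_value is not None:
        return env_value.strip().lower() in ("true", "1", "t", "yes")
    # 3. Nothing set: default to propagating logs (current behavior).
    return True
```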

What do you think @agreenburg @tatiana?

@tatiana
Collaborator

tatiana commented Nov 2, 2023

@jlaneve I agree - I wonder if it wouldn't be simpler just to do (2) and (3)!

@agreenburg The challenge with the approach you suggested is that we'd have to make this parameter available wherever we're instantiating the logger handler, which can get a bit messy.

We could introduce the Airflow config:

AIRFLOW__COSMOS__PROPAGATE_LOGS = "True"

This would require a change only to the logger module itself (and the Cosmos docs):

from airflow.configuration import conf
conf.getboolean("cosmos", "propagate_logs")
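To make the idea concrete, here is a minimal sketch of what the logger-module change could look like. The function name `configure_cosmos_logger` and the fallback handling are assumptions for illustration, not the actual Cosmos source:

```python
import logging


def configure_cosmos_logger(name: str = "cosmos") -> logging.Logger:
    """Hypothetical sketch: honor an optional [cosmos] propagate_logs setting."""
    try:
        from airflow.configuration import conf

        # Fall back to current behavior (propagation on) when the
        # section/option is absent.
        propagate = conf.getboolean("cosmos", "propagate_logs", fallback=True)
    except Exception:
        # Config system unavailable (e.g. outside an Airflow environment).
        propagate = True

    logger = logging.getLogger(name)
    logger.propagate = propagate
    return logger
```

Setting AIRFLOW__COSMOS__PROPAGATE_LOGS=False would then silence the duplicated records without touching any DAG code.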

@agreenburg
Contributor Author

Agreed that the Airflow config approach would be the simplest. I can't think of a use case where someone would want different behavior in different DAGs.

@tatiana
Collaborator

tatiana commented Nov 2, 2023

We could implement the Airflow config with a default value, which would satisfy the (2) and (3) scenarios described by @jlaneve, and in the future, if per-DAG control proves necessary, we can implement (1).

@agreenburg would you be interested in making this contribution?

@agreenburg
Contributor Author

@tatiana sure thing. I will put together a submission.

@agreenburg
Contributor Author

It looks like in order to register default Airflow configurations for Cosmos, we would need to set this package up as a provider, which it currently is not. Are you comfortable with me adding a get_provider_info for Cosmos? Any risks there?
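For reference, registering the config section as a provider might look roughly like the sketch below. The entry-point convention and the `config` key follow Airflow's provider mechanism, but the exact dict shape shown here is an assumption, not the merged Cosmos code:

```python
def get_provider_info():
    """Hypothetical provider metadata so Airflow can discover the
    [cosmos] config section and its default values."""
    return {
        "package-name": "astronomer-cosmos",
        "name": "Astronomer Cosmos",
        "description": "Orchestrate dbt projects from Airflow.",
        "config": {
            "cosmos": {
                "description": "Options for the astronomer-cosmos package.",
                "options": {
                    "propagate_logs": {
                        "description": "Whether Cosmos log records propagate "
                        "to the standard Airflow handlers.",
                        "type": "boolean",
                        "default": "True",
                    },
                },
            },
        },
    }
```

This function would then be exposed through the `apache_airflow_provider` entry point in the package metadata so Airflow can pick it up.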

@tatiana tatiana added the area:logging Related to logging, like log levels, log formats, error logging, etc label Nov 8, 2023
tatiana pushed a commit that referenced this issue Nov 9, 2023
…opagation if desired (#648)

Add Airflow config check for cosmos/propagate_logs to allow override of
default propagation behavior.
Expose entry-point so that Airflow can theoretically detect
configuration default.

Closes #639 

## Breaking Change?

This is backward-compatible as it falls back to default behavior if the
`cosmos` section or `propagate_logs` option don't exist.

## Checklist

- [X] I have made corresponding changes to the documentation (if
required)
- [X] I have added tests that prove my fix is effective or that my
feature works

---------

Co-authored-by: Andrew Greenburg <[email protected]>
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this issue Jul 14, 2024