FacebookAdsReportToGcsOperator method _flush_rows() infers field names from first data point instead of declared fields #34173

Open
1 of 2 tasks
Taishan314 opened this issue Sep 7, 2023 · 7 comments
Labels
area:providers good first issue kind:bug This is a clearly a bug provider:google Google (including GCP) related issues

Comments

@Taishan314

Taishan314 commented Sep 7, 2023

Apache Airflow version

2.5.3+composer

What happened

I created a task to retrieve insight-level ad data using the FacebookAdsReportToGcsOperator. While the pipeline was running, the DAG failed with the following error:

[2023-09-07, 11:18:38 UTC] {taskinstance.py:1778} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/transfers/facebook_ads_to_gcs.py", line 151, in execute
    total_row_count = self._decide_and_flush(converted_rows_with_action=converted_rows_with_action)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/transfers/facebook_ads_to_gcs.py", line 183, in _decide_and_flush
    self._flush_rows(
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/transfers/facebook_ads_to_gcs.py", line 213, in _flush_rows
    writer.writerows(converted_rows)
  File "/opt/python3.8/lib/python3.8/csv.py", line 157, in writerows
    return self.writer.writerows(map(self._dict_to_list, rowdicts))
  File "/opt/python3.8/lib/python3.8/csv.py", line 149, in _dict_to_list
    raise ValueError("dict contains fields not in fieldnames: "
ValueError: dict contains fields not in fieldnames: 'action_values'

The field 'action_values' was in my requested fields, but it didn't appear in every data point in the data set. Upon inspecting the code, I found that the _flush_rows() method infers the field names (stored in the headers variable) from the first data point only.
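
The same failure can be reproduced without Airflow or the Facebook API, using only the standard library csv module. The following is a minimal sketch: the row values are made up, and the helper name write_with_first_row_headers is hypothetical. It only mirrors the header inference described above.

```python
import csv
import io

# Made-up sample rows: the second row has a key the first row lacks.
rows = [
    {"ad_id": "1", "spend": "3.50"},
    {"ad_id": "2", "spend": "1.20", "action_values": "[...]"},
]


def write_with_first_row_headers(rows):
    """Write rows to CSV, taking the header from the first row only."""
    headers = rows[0].keys()
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=headers)
    writer.writeheader()
    # DictWriter raises ValueError for any key that is not in fieldnames
    # (its default is extrasaction="raise").
    writer.writerows(rows)
    return buffer.getvalue()


error_message = None
try:
    write_with_first_row_headers(rows)
except ValueError as err:
    # The message names the unexpected key, as in the traceback above.
    error_message = str(err)
```

Whether the error appears therefore depends on which data point happens to come first, which would explain why the task does not fail for every report.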

Is it possible to get this method amended to infer headers from all requested fields?

What you think should happen instead

The _flush_rows() method shouldn't take the headers (fields) from the first data point. It should take them from the requested fields, or at least look at all data points and use the one with the most fields.

How to reproduce

Create and run a task using the FacebookAdsReportToGcsOperator.

api_version = "v17.0"
fields = ["account_name", "estimated_ad_recall_rate", "video_avg_time_watched_actions", "video_p100_watched_actions", "video_p95_watched_actions", "video_p25_watched_actions", "video_play_actions", "account_id", "account_currency", "campaign_name", "campaign_id", "objective", "adset_name", "adset_id", "ad_name", "ad_id", "reach", "impressions", "clicks", "spend", "actions", "action_values"]
params = {"level": "ad", "time_range": {"since": "2023-09-23", "until": "2023-09-29"}, "breakdowns": ["age", "gender"], "action_breakdowns": ["action_type"], "action_report_time": "conversion", "time_increment": 1}

Operating System

Windows 10

Versions of Apache Airflow Providers

apache-airflow-providers-google==10.7.0
apache-airflow-providers-facebook==3.2.1

Deployment

Google Cloud Composer

Deployment details

image version: composer-2.4.1-airflow-2.5.3
python version: 3 


Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Taishan314 Taishan314 added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Sep 7, 2023
@boring-cyborg

boring-cyborg bot commented Sep 7, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@potiuk potiuk added good first issue and removed needs-triage label for new issues that we didn't triage yet labels Sep 7, 2023
@eladkal
Contributor

eladkal commented Sep 7, 2023

Is it a bug?
Please see #24758 (comment)

@eladkal eladkal added provider:google Google (including GCP) related issues area:providers pending-response and removed area:core labels Sep 8, 2023
@Taishan314
Author

Taishan314 commented Sep 8, 2023

It is a bug: the _flush_rows method takes the keys of the first data point as the headers variable, instead of using 'fields'.

def _flush_rows(self, converted_rows: list[Any] | None, object_name: str):
    if converted_rows:
        headers = converted_rows[0].keys()
        with tempfile.NamedTemporaryFile("w", suffix=".csv") as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=headers)
            writer.writeheader()
            writer.writerows(converted_rows)
            csvfile.flush()
            hook = GCSHook(
                gcp_conn_id=self.gcp_conn_id,
                impersonation_chain=self.impersonation_chain,
            )
            hook.upload(
                bucket_name=self.bucket_name,
                object_name=object_name,
                filename=csvfile.name,
                gzip=self.gzip,
            )
            self.log.info("%s uploaded to GCS", csvfile.name)
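
One possible direction for a fix, sketched here outside of Airflow so it can be run on its own: build the header from the declared fields first, then append any other keys that appear in the rows. The helper names build_headers and rows_to_csv are hypothetical and not part of the provider. Appending the extra keys rests on the assumption that rows can carry columns that are not in fields (for example breakdown columns such as age or gender), so using fields alone might still raise the same ValueError.

```python
import csv
import io


def build_headers(declared_fields, rows):
    """Return declared fields first, then any other keys seen in rows."""
    headers = list(declared_fields)
    seen = set(headers)
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                headers.append(key)
    return headers


def rows_to_csv(declared_fields, rows):
    """Serialize rows to CSV text with a header that covers every row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=build_headers(declared_fields, rows),
        restval="",  # value written when a row lacks one of the columns
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data: "age" is not a declared field, and the first row
# has no "action_values".
sample_fields = ["ad_id", "spend", "action_values"]
sample_rows = [
    {"ad_id": "1", "spend": "3.50", "age": "18-24"},
    {"ad_id": "2", "spend": "1.20", "action_values": "x", "age": "25-34"},
]
sample_csv = rows_to_csv(sample_fields, sample_rows)
```

In the operator itself, the equivalent change would presumably replace the `headers = converted_rows[0].keys()` line. Since rows are flushed in batches, starting from the declared fields also keeps the column order the same from one uploaded file to the next, as long as no extra keys appear.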

(Apologies for closing the issue, it was an accident)

@Taishan314 Taishan314 reopened this Sep 8, 2023
@eladkal
Contributor

eladkal commented Sep 8, 2023

If you found the problem maybe you can open a PR with the fix? :)

@Taishan314
Author

Yeah sure, I'll give it a go

@abinaya-sh

Hey Guys, I would like to get an update on this because I am facing the same issue! Has this been resolved in any versions?

@Taishan314
Author

No, apologies, I have not had time to do it. I'll remove myself from the task so someone else can pick it up.

@Taishan314 Taishan314 removed their assignment Aug 1, 2024