[CT-985] [Feature] on-run-end hook and results variable support for source freshness command #5609

oravi · 2022-08-03T16:47:57Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Currently dbt source freshness does not support running an on-run-end hook.
Other dbt commands like dbt run, dbt test, dbt snapshot and more already support running an on-run-end hook and getting results jinja variable as their context.
Using an on-run-end hook and the results variable in the dbt source freshness command, we can upload the source freshness results to the data warehouse. This could be used for displaying source freshness results on a dashboard, monitoring, and alerting.

Describe alternatives you've considered

The dbt source freshness command already writes the freshness results to a file called sources.json.
A possible alternative is to process this json file to get the source freshness results.
This solution requires implementing an external solution that will process this json and automatically upload the results to the data warehouse after every execution of the dbt source freshness command.
This alternative is harder to implement and requires implementing code outside of the dbt project.
Additionally, this solution requires orchestrating this new tool and giving it access to the same project so it can extract the source freshness results from the sources.json file.
Support for on-run-end and results variable will simplify this dramatically, enabling it to be implemented in the dbt project itself.

Who will this benefit?

dbt users who rely on on-run-end hook and the results variable to monitor their dbt commands, executions and results.
dbt users who want to build dashboards to present their source freshness results.
dbt users who rely on dbt packages like Elementary to upload their dbt artifacts and monitor their dbt projects.

Are you interested in contributing this feature?

Yes

Anything else?

No response

The text was updated successfully, but these errors were encountered:

jtalmi · 2022-08-08T22:28:04Z

@lostmygithubaccount is something I can contribute to?

lostmygithubaccount · 2022-08-11T20:37:29Z

@jtalmi apologies I missed that comment! I've been messing around with how I check my GitHub notifications and I think it fell through the cracks. We would welcome contribution on this issue and can likely provide some guidance if needed.

@oravi thank you for opening. dbt source freshness is a bit different than the other commands that support on-run-end hooks, but we don't see any reason not to allow this. That said, it's likely not something we can prioritize in the near future and so we would be looking for help with the community (it sounds like @jtalmi might be up for it!). We'll leave this issue open on the backlog and mark it as help wanted.

jtcohen6 · 2022-08-12T13:24:20Z

The on-run-* hook + context behavior is defined in the safe_run_hooks, before_run, and after_run methods of the RunTask:

dbt-core/core/dbt/task/run.py

Lines 318 to 393 in a3b018f

 def run_hooks(self, adapter, hook_type: RunHookType, extra_context): 

 ordered_hooks = self.get_hooks_by_type(hook_type) 

 # on-run-* hooks should run outside of a transaction. This happens 

 # b/c psycopg2 automatically begins a transaction when a connection 

 # is created. 

 adapter.clear_transaction() 

 if not ordered_hooks: 

 return 

 num_hooks = len(ordered_hooks) 

 with TextOnly(): 

 fire_event(EmptyLine()) 

 fire_event(HooksRunning(num_hooks=num_hooks, hook_type=hook_type)) 

 startctx = TimestampNamed("node_started_at") 

 finishctx = TimestampNamed("node_finished_at") 

 for idx, hook in enumerate(ordered_hooks, start=1): 

 hook._event_status["started_at"] = datetime.utcnow().isoformat() 

 hook._event_status["node_status"] = RunningStatus.Started 

 sql = self.get_hook_sql(adapter, hook, idx, num_hooks, extra_context) 

 hook_text = "{}.{}.{}".format(hook.package_name, hook_type, hook.index) 

 hook_meta_ctx = HookMetadata(hook, self.index_offset(idx)) 

 with UniqueID(hook.unique_id): 

 with hook_meta_ctx, startctx: 

 fire_event( 

 PrintHookStartLine( 

 statement=hook_text, 

 index=idx, 

 total=num_hooks, 

 node_info=hook.node_info, 

 ) 

 ) 

 with Timer() as timer: 

 if len(sql.strip()) > 0: 

 response, _ = adapter.execute(sql, auto_begin=False, fetch=False) 

 status = response._message 

 else: 

 status = "OK" 

 self.ran_hooks.append(hook) 

 hook._event_status["finished_at"] = datetime.utcnow().isoformat() 

 with finishctx, DbtModelState({"node_status": "passed"}): 

 hook._event_status["node_status"] = RunStatus.Success 

 fire_event( 

 PrintHookEndLine( 

 statement=hook_text, 

 status=status, 

 index=idx, 

 total=num_hooks, 

 execution_time=timer.elapsed, 

 node_info=hook.node_info, 

 ) 

 ) 

 # `_event_status` dict is only used for logging. Make sure 

 # it gets deleted when we're done with it 

 del hook._event_status["started_at"] 

 del hook._event_status["finished_at"] 

 del hook._event_status["node_status"] 

 self._total_executed += len(ordered_hooks) 

 with TextOnly(): 

 fire_event(EmptyLine()) 

 def safe_run_hooks( 

 self, adapter, hook_type: RunHookType, extra_context: Dict[str, Any] 

 ) -> None: 

 try: 

 self.run_hooks(adapter, hook_type, extra_context) 

 except RuntimeException: 

 fire_event(DatabaseErrorRunning(hook_type=hook_type.value)) 

 raise

dbt-core/core/dbt/task/run.py

Lines 418 to 439 in a3b018f

 def after_run(self, adapter, results): 

 # in on-run-end hooks, provide the value 'database_schemas', which is a 

 # list of unique (database, schema) pairs that successfully executed 

 # models were in. For backwards compatibility, include the old 

 # 'schemas', which did not include database information. 

 database_schema_set: Set[Tuple[Optional[str], str]] = { 

 (r.node.database, r.node.schema) 

 for r in results 

 if r.node.is_relational 

 and r.status not in (NodeStatus.Error, NodeStatus.Fail, NodeStatus.Skipped) 

 } 

 self._total_executed += len(results) 

 extras = { 

 "schemas": list({s for _, s in database_schema_set}), 

 "results": results, 

 "database_schemas": list(database_schema_set), 

 } 

 with adapter.connection_named("master"): 

 self.safe_run_hooks(adapter, RunHookType.End, extras)

The reason FreshnessTask doesn't run hooks today is that it inherits from the GraphRunnableTask (which doesn't know about on-run-* hooks), rather than the RunTask. To turn on hooks for source freshness, that hook code might need to move back onto GraphRunnableTask to accommodate. If that's the approach taken, CompileTask would need to disable those hook methods in order to preserve existing behavior (= not running on-run-* hooks).

oravi · 2022-08-19T06:01:31Z

Thanks @jtcohen6 ! could you please share a bit more context about GraphRunnableTask and RunTask? What's the unique purpose of each one of them? Specifically curious to learn why FreshnessTask does not inherit from RunTask.

Also in case we will go with the suggested approach - as you mentioned I saw that CompileTask inherits from GraphRunnableTask and RunTask inherits from CompileTask. When you say that we need to disable those hooks to avoid running them during compile - you mean we just need to override them inside CompileTask and do nothing inside their implementation (just return). Then in RunTask should we specifically call the implementation of GraphRunnableTask?

Thanks for all your help here!

elongl · 2023-01-06T00:16:48Z

Hi @jtcohen6 @lostmygithubaccount,
I'm thinking of picking this task up soon.
I would appreciate a confirmation that this feature could be merged once implemented and reviewed.
Thanks.

lostmygithubaccount · 2023-01-06T00:52:53Z

hi @elongl, Jeremy or one of the engineers should be able to answer for sure -- I think so, though there are some ongoing internal API refactors for v1.5 that could affect this

jtcohen6 · 2023-02-25T09:41:36Z

@oravi @elongl Sorry for the long delay getting back to you!

Our current task logic is implemented with a lot of class inheritance, rather than composition, which makes this logic much trickier to follow than it should be. That is something we'd be hoping to improve as part of our API refactors over the next several months, but we haven't touched this part of the codebase just yet.

I think the desirable outcomes here are:

Move logic on running hooks up into GraphRunnableTask
Use that same logic in FreshnessTask, which should run hooks (this issue)
Disable that logic in CompileTask, which shouldn't run hooks

If one of you wanted to open a PR for that, I'd be happy to see it merged!

CompileTask, which shouldn't run hooks

This is status quo & my opinion — does anyone have a different opinion?

ofek1weiss · 2023-12-31T14:20:28Z

Hi @jtcohen6 👋
I'd be happy to contribute this feature, tho i have a question about the solution you suggested:
As i look at the code now, it seems that more classes inherit from GraphRunnableTask (like CloneTask and ListTask), so your suggested solution of moving the logic to it and disabling it in the inheriting classes sounds a bit messy, so i thought of a few solutions:

Make FreshnessTask inherit from RunTask as well, tho there might be a reason i do not fully understand that explains why this cannot be the case.
Add a new class to the hierarchy, something like HooksTask that exists lower in the hierarchy tree, and both RunTask and FreshnessTask inherit.

ofek1weiss · 2023-12-31T16:07:17Z

Went ahead and opened this PR that implements this suggested solution:

Make FreshnessTask inherit from RunTask as well, tho there might be a reason i do not fully understand that explains why this cannot be the case.

Functionally it seems to work fine.

oravi added enhancement New feature or request triage labels Aug 3, 2022

github-actions bot changed the title ~~[Feature] on-run-end hook and results variable support for source freshness command~~ [CT-985] [Feature] on-run-end hook and results variable support for source freshness command Aug 3, 2022

lostmygithubaccount self-assigned this Aug 3, 2022

jtcohen6 added the Team:Execution label Aug 9, 2022

lostmygithubaccount added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed triage labels Aug 11, 2022

lostmygithubaccount removed their assignment Aug 11, 2022

NiallRees mentioned this issue Aug 17, 2022

Add source freshness status to sources table brooklyn-data/dbt_artifacts#166

Open

Maayan-s mentioned this issue Aug 21, 2022

Support source-freshness results and Slack alerts elementary-data/elementary#257

Closed

lostmygithubaccount added the triage label Jan 6, 2023

jtcohen6 self-assigned this Jan 9, 2023

jtcohen6 removed the triage label Feb 25, 2023

jtcohen6 removed their assignment Feb 25, 2023

glsdown mentioned this issue Mar 31, 2023

Feature request: Add source/fact table for source-freshness executions brooklyn-data/dbt_artifacts#272

Closed

glsdown mentioned this issue Apr 26, 2023

[Feature]: Freshness executions brooklyn-data/dbt_artifacts#292

Closed

3 tasks

jtcohen6 removed the Team:Execution label Jul 19, 2023

ofek1weiss mentioned this issue Dec 31, 2023

added hook support for dbt source freshness #9322

Closed

5 tasks

This was referenced Jan 12, 2024

Feature/source freshness hooks #9366

Merged

[CT-3394] [Epic] Stabilize External Interfaces #9099

Closed

MichelleArk self-assigned this Jan 16, 2024

MichelleArk closed this as completed in #9366 Jan 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-985] [Feature] on-run-end hook and results variable support for source freshness command #5609

[CT-985] [Feature] on-run-end hook and results variable support for source freshness command #5609

oravi commented Aug 3, 2022

jtalmi commented Aug 8, 2022

lostmygithubaccount commented Aug 11, 2022

jtcohen6 commented Aug 12, 2022

oravi commented Aug 19, 2022

elongl commented Jan 6, 2023

lostmygithubaccount commented Jan 6, 2023

jtcohen6 commented Feb 25, 2023

ofek1weiss commented Dec 31, 2023

ofek1weiss commented Dec 31, 2023 •

edited

Loading

[CT-985] [Feature] on-run-end hook and results variable support for source freshness command #5609

[CT-985] [Feature] on-run-end hook and results variable support for source freshness command #5609

Comments

oravi commented Aug 3, 2022

Is this your first time submitting a feature request?

Describe the feature

Describe alternatives you've considered

Who will this benefit?

Are you interested in contributing this feature?

Anything else?

jtalmi commented Aug 8, 2022

lostmygithubaccount commented Aug 11, 2022

jtcohen6 commented Aug 12, 2022

oravi commented Aug 19, 2022

elongl commented Jan 6, 2023

lostmygithubaccount commented Jan 6, 2023

jtcohen6 commented Feb 25, 2023

ofek1weiss commented Dec 31, 2023

ofek1weiss commented Dec 31, 2023 • edited Loading

ofek1weiss commented Dec 31, 2023 •

edited

Loading