Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For source freshness, allow null error_after with non-null warning_after #3874

Closed
lchoward opened this issue Sep 13, 2021 · 2 comments · Fixed by #3955
Closed

For source freshness, allow null error_after with non-null warning_after #3874

lchoward opened this issue Sep 13, 2021 · 2 comments · Fixed by #3955
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors!

Comments

@lchoward
Copy link

Describe the feature

I have a few sources where it would be great to have:

  • non-null warning_after
  • null error_after

Today, if you leave error_after empty while specifying a warning_after value or if you set error_after: null, the error_after value defaults to the parent value.

Describe alternatives you've considered

Not the same, but a hacky workaround is to set a high error_after value that should never hit.

Additional context

Who will this benefit?

I haven't seen too many people ask about this, but I'd imagine there must be many that would use this! Especially if highlighted in the docs as a use case.

Are you interested in contributing this feature?

No. But if I'm the only one asking for it, I'll reconsider lol.

@lchoward lchoward added enhancement New feature or request triage labels Sep 13, 2021
@jtcohen6 jtcohen6 added good_first_issue Straightforward + self-contained changes, good for new contributors! and removed triage labels Sep 19, 2021
@jtcohen6
Copy link
Contributor

@lchoward Thanks for opening—totally fair thing to want. I think there's a bit of plumbing required to make this possible, but it is possible. Since I haven't seen too many folks asking for this, and since the "hacky" workaround (error_after: {period: day, count: 100000}) actually feels like a fairly reasonable one, I'm going to leave this as an exercise to the community contributor who wants to see this through. It could be you :)

Basically, we'd want to take the current merge/override behavior of the freshness dict, and extend that same behavior to the lower-level warn_after + error_after dicts. Here's the desirable behavior, adapting the example in the docs:

sources:
  - name: top_level_source
    freshness: # default freshness
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: loaded_at

    tables:

      - name: src_a
        freshness:
          warn_after: {count: 6, period: hour}
          # use the default error_after defined above

      - name: src_b
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {} # use the default error_after defined above

      - name: src_c
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: null # override: disable error_after for this table

      - name: src_d
        freshness: null # override: disable freshness for this table

In order to achieve that behavior, here's a rough overview of the steps involved:

  • Time class: Change from Replaceable to Mergeable. This is important later, in order to achieve the desirable outcome of returning different results when a child source table defines error_after: {} (pass-through) vs. error_after: None (override/disable). Incidentally, this would also make it possible for a specific source table to override just the count or period of its parent source's warn_after or error_after.
  • Change count and period to be optional types with default None (i.e. Optional[<type>] = None). This is necessary to enable the {} option in the example above, which should have different behavior from the actual null case above.
  • Define a __bool__ magic method that returns True if both count and period are not None—this will be important for checking that a source actually has freshness defined, if freshness looks like error_after: {}, warn_after: {}. Currently, this raises a validation error, but it won't if the Time properties become optional.
  • FreshnessThreshold class: Change warn_after and error_after defaults from None to field(default_factory=Time). Then, change the __bool__ magic method to leverage the __bool__ methods defined in Time above, i.e. bool(self.warn_after) rather than warn_after is not None, now that they aren't None by default.
  • Finally, merge_freshness method: Instead of a simple base.merged(update) when both the parent + child source have freshness defined, we need to perform the same conditional checks one level deeper. So: If both parent error_after and child error_after are not None, take the merged value; if the parent is None and the child isn't, take the child's; if the child's is None, return None.

I don't know if that's an exhaustive list. There's also tests, of course: unit tests that would need updating to handle the changed classes, plus integration tests for the net-new functionality. I do think this is doable, and fair game for a motivated contributor, so I'd welcome a PR for it—it's just not high on the team's priority list.

@kadero
Copy link
Contributor

kadero commented Sep 26, 2021

Hello @jtcohen6 @ichoward 👋,

Indeed, this feature could be great for us too.
Happy to propose a PR for this feature if it's ok for you 🙂.

@jtcohen6 thanks for the implementation plan, it seems ok to me 👍.

Thanks 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants