Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Generic way to handle nested events on Event Source Data Classes #2678

Open
2 tasks done
rafaelgsr opened this issue Jul 5, 2023 · 5 comments
Open
2 tasks done
Labels
help wanted Could use a second pair of eyes/hands need-customer-feedback Requires more customers feedback before making or revisiting a decision RFC

Comments

@rafaelgsr
Copy link
Contributor

rafaelgsr commented Jul 5, 2023

Is this related to an existing feature request or issue?

#2348

Which Powertools for AWS Lambda (Python) utility does this relate to?

Event Source Data Classes

Summary

This RFC proposes the implementation of a generic way to decode/unwrap nested events in Event Source Data Classes. When handling events, it may happen that the event passed into the lambda_handler wraps another event that was originated on a different resource. For example, an SQSEvent that wraps an event originated on Amazon S3.

The goal is to simplify how developers access and handle nested events, even when they are multi-level.

Use case

At this point, when the developer decorates the lambda_handler with @event_source, they declare the data_class for the event that invoked the Lambda function. If this event wraps another event - or a chain of events originated somewhere else, the developer has to manually instantiate new objects using the appropriate Event Source Data Classes.

This was partially addressed by #2348, but it covers only SQSEvents wrapping S3Events and SNSEvents. In practice, there can be dozens of different combinations of events, with more to come as AWS adds new features and use cases to the services. Moreover, nested events as such may be multi-level: S3 -> SNS -> SQS.

Implementing on Powertools each possible combination is not feasible, therefore I'm proposing a generic implementation that would also facilitate how developers are using this utility. With this approach, the developer would be able to traverse the chain of events until the root event, and still have access to relevant attributes from the parent i.e., the event wrapper.

Proposal

The proposal is to have a class called EventWrapper extending DictWrapper. This new class has the generic implementation to decode/unwrap nested events. Each relevant Event Source Data Class, such as SNSEvent, extends EventWrapper instead of DictWrapper. With that, SNSEvent (and the other relevant data classes) inherits a method called decode_nested_events (or decode_nested_event in singular - I'll elaborate on the singular vs. plural further down on the Potential Challenges section).

The method decode_nested_events accepts nested_event_class (the wrapped data class), and optionally the nested_event_content_deserializer (the implementation of the nested content deserializer, that can be in the format of JSON, Base64, GZIP, etc.). If no nested_event_content_deserializer is passed, the default is self._json_deserializer. For the latter, I intend to add implementations for the most common deserializers in EventWrapper. An example of that is compressed JSON CloudWatch Logs data coming in KinesisStreamEvents.

Here is a mock implementation (with no types yet):

def decode_nested_events(
        self, 
        nested_event_class, 
        nested_event_content_deserializer = None):
    
    if nested_event_content_deserializer is None:
        nested_event_content_deserializer = self._json_deserializer

    for content in self.nested_event_contents:
        yield nested_event_class(nested_event_content_deserializer(content))

I'm also planning to add a reference to the parent event into the nested event, so that the chain of events can be traversed back and forth. Another shortcut that I'm planning to add is a direct access to the root event. For the chain S3 -> SNS -> SQS, there would be a method decode_root_event to return an S3Event directly from the SQSEvent.

Here is a mock implementation of nested_event_contents:

@property
def nested_event_contents(self):
    for record in self["Records"]:
        yield record["body"]

The implementation of nested_event_contents assumes that self has an attribute called Records, and each Record has the attribute body with the actual content of the nested event. At this point, I assume this is the most common use case, but I know there are exceptions, hence this mock implementation may change. Nevertheless, the outliers have the possibility to override just this method to return the right content. An example of that is SNSEvent that would override that to return self.sns_message instead.

Note: Based on the analysis I've done so far, we can implement the proposal on this RFC without breaking changes.

Out of scope

Nothing so far.

Potential challenges

Plural vs. singular

One implementation challenge that affects the UX is the JSON format of these events. For instance, the JSON object representing an SQS event has the attribute Records. Within each Record, there is a nested event. As a consequence, using the same example, SQSEvent contains a collection of nested events instead of just one. Note that in the sample code above, I return an Iterator instead of just one event because of that.

In practice, there are multiple use cases that we empirically know that the event contains just one single Record. With that in mind, EventWrapper could provide a method called decode_nested_event (in singular) to allow decoding the nested evant in the Record[0]. This aims to allow an experience as such: event.decode_nested_event(SNSEvent).decode_nested_event(S3Event) to get direct access to the S3Event wrapped within an SNSEvent with is wrapped within an SQSEvent.

For the Iterable version (plural), the UX would be similar to (consider that event is an instance of SQSEvent):

sns_events = event.decode_nested_events(SNSEvent)
for sns_event in sns_events:
    s3_events = sns_event.decode_nested_events(S3Event)
    for s3_event in s3_events:
        print(s3_event.bucket.name)

Dependencies and Integrations

No response

Alternative solutions

No response

Acknowledgment

@rafaelgsr rafaelgsr added RFC triage Pending triage from maintainers labels Jul 5, 2023
@leandrodamascena leandrodamascena removed the triage Pending triage from maintainers label Jul 5, 2023
@leandrodamascena
Copy link
Contributor

Hello @rafaelgsr! This is fantastic because it solves a need that we have to create events every time we want to unwrap nested events. That way, we make it linear and very easy.

Let's analyze this RFC until next week, ok?

@leandrodamascena leandrodamascena added the help wanted Could use a second pair of eyes/hands label Jul 5, 2023
@heitorlessa heitorlessa self-assigned this Jul 10, 2023
@heitorlessa
Copy link
Contributor

Update: we're still prioritizing Pydantic v2 POC as it impacts a larger set of customers. We'll get back to review this RFC as soon as we're able.

@heitorlessa heitorlessa added the need-customer-feedback Requires more customers feedback before making or revisiting a decision label Sep 20, 2023
@heitorlessa heitorlessa removed their assignment Nov 10, 2023
@seshubaws
Copy link
Contributor

Hello @rafaelgsr, thank you for writing this RFC, this will be extremely useful to have! I would love to work with you to get the RFC and UX finalized. Some questions/comments:

“The implementation of nested_event_contents assumes that self has an attribute called Records, and each Record has the attribute body with the actual content of the nested event. At this point, I assume this is the most common use case, but I know there are exceptions, hence this mock implementation may change. Nevertheless, the outliers have the possibility to override just this method to return the right content. An example of that is SNSEvent that would override that to return self.sns_message instead.”

Should we consider adding the methods for some of the most popular outlier events for how they would return the right content? This would probably help users out as they would quickly know how to get the right info.

"In practice, there are multiple use cases that we empirically know that the event contains just one single Record. With that in mind, EventWrapper could provide a method called decode_nested_event (in singular) to allow decoding the nested evant in the Record[0]"

I think having a method to return only Record[0] is a great idea! We could definitely do that in addition to having a method for the Iterable version. Was the potential challenge to do this or not? Or did I misread something in that section?

@seshubaws
Copy link
Contributor

seshubaws commented Jan 17, 2024

Discussed offline with @rafaelgsr and got my questions clarified:

“The implementation of nested_event_contents assumes that self has an attribute called Records, and each Record has the attribute body with the actual content of the nested event. At this point, I assume this is the most common use case, but I know there are exceptions, hence this mock implementation may change. Nevertheless, the outliers have the possibility to override just this method to return the right content. An example of that is SNSEvent that would override that to return self.sns_message instead.”

Should we consider adding the methods for some of the most popular outlier events for how they would return the right content? This would probably help users out as they would quickly know how to get the right info.

The original comment was more saying we should implement some way for the .Record call to automatically call the specific event's Record equivalent, i.e. SNSEvent.Record should call SNSEvent.sns_message. I believe this means all event classes would share the DictWrapper as a super class.

"In practice, there are multiple use cases that we empirically know that the event contains just one single Record. With that in mind, EventWrapper could provide a method called decode_nested_event (in singular) to allow decoding the nested event in the Record[0]"

I think having a method to return only Record[0] is a great idea! We could definitely do that in addition to having a method for the Iterable version. Was the potential challenge to do this or not? Or did I misread something in that section?

Yes, we should implement both a method to get Record[0] and have a method that goes through Records in general.

@leandrodamascena
Copy link
Contributor

Hello @rafaelgsr and @seshubaws! I think we can create a PoC and discuss this implementation, what do you think?

I'm sure we'll have a lot of things to discuss about the best API design, user experience, plural vs singular and others when this code is created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Could use a second pair of eyes/hands need-customer-feedback Requires more customers feedback before making or revisiting a decision RFC
Projects
Status: Pending review
Development

No branches or pull requests

4 participants