aws-stepfunctions: add keyPath parameter to the S3JsonItemReader constructor #29889

Open
2 tasks
Lykan9999 opened this issue Apr 18, 2024 · 3 comments
Labels
@aws-cdk/aws-stepfunctions Related to AWS StepFunctions effort/small Small work item – less than a day of effort feature-request A feature should be added or improved. p2

Comments

@Lykan9999
Describe the feature

Add an optional keyPath parameter to the S3JsonItemReader constructor to allow more dynamic S3 object fetching. Right now the key field only accepts a hard-coded string value, so we can't specify an object key that is generated dynamically by a previous step.

Use Case

DistributedMaps are great for orchestrating iterable tasks on big data. Imagine a flow where the first step is a data-fetching task that reads a huge dataset from a data vendor and writes a JSON dump of the responses to an S3 bucket, and you then want to iterate over each response in a DistributedMap. The data-fetching service can generate the object key dynamically and set it at, for example, $.data.objectKey; the DistributedMap can then reference it as follows:

...
itemReader: new sfn.S3JsonItemReader({
    bucket: bucket,
    keyPath: `$.data.objectKey`
})
...

Proposed Solution

Add a keyPath parameter to the S3JsonItemReader constructor.
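One way the proposed option could behave is sketched below. This is only an illustration, not the CDK implementation: the `ReaderKeyOptions` interface and the `renderKeyParameter` helper are hypothetical names, and the rule that exactly one of `key` or `keyPath` must be given is an assumption. The only part taken from Step Functions itself is the convention that a parameter name ending in `.$` takes a JSONPath value.

```typescript
// Hypothetical option shape: `keyPath` is the proposed addition and is
// not an existing CDK property.
interface ReaderKeyOptions {
  readonly key?: string;     // static object key, e.g. "dumps/latest.json"
  readonly keyPath?: string; // JSONPath into the state input, e.g. "$.data.objectKey"
}

// Returns the single entry that would go into the ItemReader "Parameters"
// block: "Key" for a static value, "Key.$" for a JSONPath reference.
function renderKeyParameter(opts: ReaderKeyOptions): Record<string, string> {
  const { key, keyPath } = opts;

  // Assumption: the two options are mutually exclusive and one is required.
  if ((key === undefined) === (keyPath === undefined)) {
    throw new Error("Specify exactly one of 'key' or 'keyPath'");
  }

  if (keyPath !== undefined) {
    if (!keyPath.startsWith("$")) {
      throw new Error(`keyPath must be a JSONPath starting with '$', got '${keyPath}'`);
    }
    return { "Key.$": keyPath };
  }

  return { Key: key as string };
}
```

Because `keyPath` would be optional and the existing `key` behaviour is unchanged, a design along these lines should not break current callers.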

Other Information

I am not sure whether there is an existing way to do this; if there is, please enlighten me and pardon my ignorance.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.131.0

Environment details (OS name and version, etc.)

MacOS Sonoma 14.4.1 (23E224)

@Lykan9999 Lykan9999 added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Apr 18, 2024
@github-actions github-actions bot added the @aws-cdk/aws-stepfunctions Related to AWS StepFunctions label Apr 18, 2024
@khushail khushail added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Apr 18, 2024
@khushail khushail self-assigned this Apr 18, 2024
@khushail khushail removed the needs-triage This issue or PR still needs to be triaged. label Apr 18, 2024
@khushail
Contributor

Thanks @Lykan9999 for reaching out. Correct me if I have misunderstood. You want to make the keys dynamic, so I think the best way to implement this would be to make the 'key' field accept dynamic values rather than only a hard-coded string.
However, I would leave it up to the community and team to decide on the implementation.

@khushail khushail added p2 effort/small Small work item – less than a day of effort and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Apr 18, 2024
@khushail khushail removed their assignment Apr 18, 2024
@dreessan
This would be useful to us as well. We're trying to iterate over a fetched SQL table. The simplest way to iterate over rows with Athena is to store the SELECT result as a .csv, whose output location we cannot specify ahead of time.

@svleeuwen
svleeuwen commented May 29, 2024

This is an essential feature. We use this in all our workflows (fetch from db -> write to S3 -> use map to iterate and process).

The key should have a unique name to prevent it from being overwritten when several processes run in parallel, so we can't hard-code it.

When editing the state machine in the AWS console, you can do this, and you can also specify the bucket as a path.
image
Resulting in the configuration:

"ItemReader": {
  "Resource": "arn:aws:states:::s3:getObject",
  "ReaderConfig": {
    "InputType": "JSON"
  },
  "Parameters": {
    "Bucket.$": "$.bucketName",
    "Key.$": "$.fileKey"
  }
},

A hard-coded bucket name with a dynamic key is supported as well.
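For reference, the mixed case (hard-coded bucket, dynamic key) can be sketched as a small standalone builder that emits the same ItemReader shape as the configuration above. Everything here is illustrative: `StaticOrPath`, `param`, and `buildJsonItemReader` are hypothetical helpers rather than CDK APIs, and `my-data-bucket` is a placeholder bucket name.

```typescript
// A value is either a literal or a JSONPath into the state input.
type StaticOrPath = { value: string } | { path: string };

// Emits "Name" for a literal and "Name.$" for a JSONPath reference.
function param(name: string, v: StaticOrPath): Record<string, string> {
  return "path" in v ? { [`${name}.$`]: v.path } : { [name]: v.value };
}

// Builds the ItemReader block for a JSON object stored in S3.
function buildJsonItemReader(bucket: StaticOrPath, key: StaticOrPath) {
  return {
    Resource: "arn:aws:states:::s3:getObject",
    ReaderConfig: { InputType: "JSON" },
    Parameters: { ...param("Bucket", bucket), ...param("Key", key) },
  };
}

// Hard-coded bucket (placeholder name), key read from the state input.
const reader = buildJsonItemReader(
  { value: "my-data-bucket" },
  { path: "$.fileKey" },
);
console.log(JSON.stringify({ ItemReader: reader }, null, 2));
```

Passing `{ path: "$.bucketName" }` as the first argument instead should yield the fully dynamic variant shown in the console output above.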

4 participants