DynamoDB: Add ctk load table interface for processing CDC events
#247
About
Running DynamoDB CDC events through Kinesis and processing them with an AWS Lambda is often cumbersome, and not well suited to collaboration and development. This patch provides a standalone implementation, as a sibling to the corresponding full-load implementation, the DynamoDB Table Loader.
Documentation
Preview: https://cratedb-toolkit--247.org.readthedocs.build/io/dynamodb/cdc.html
Synopsis
Use AWS for real, or exercise using LocalStack.
Install
pip install 'cratedb-toolkit[kinesis] @ git+https://github.com/crate/cratedb-toolkit.git@dynamodb-cdc-standalone'
Details
This data nozzle taps into Change data capture for DynamoDB Streams, in this case using Kinesis Data Streams for maximum universality. Using Kinesis is a reasonable choice: we will also use it to ingest other event/record types in the future, hence the protocol identifier kinesis+dynamodb+cdc://. On the egress side, towards CrateDB, it will use the data/aux column strategy.
Being standalone doesn't mean it's not cloud-ready. It is just more universal, because it can be used in an ad hoc / standalone operations mode, in development sandboxes, and can also be invoked in any other managed Python environment at your disposal.
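To illustrate the ingest-to-egress path described above, here is a minimal sketch of how one DynamoDB change event could be decoded before it is written to CrateDB. It is not the implementation in this patch. The field names eventName, dynamodb, Keys and NewImage follow the DynamoDB stream record format; the function names decode_attribute and translate_event are hypothetical. How the patch splits fields between the data and aux columns is not stated here, so the sketch assumes the whole decoded image goes into data and aux stays empty.

```python
from decimal import Decimal


def decode_attribute(value):
    """Convert one DynamoDB typed attribute, e.g. {"N": "42"}, to a Python value."""
    # Each attribute is a single-entry mapping of type tag to payload.
    ((type_tag, payload),) = value.items()
    if type_tag in ("S", "BOOL", "B"):
        # Binary values stay in their base64 text form in this sketch.
        return payload
    if type_tag == "N":
        number = Decimal(payload)
        return int(number) if number == number.to_integral_value() else float(number)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [decode_attribute(item) for item in payload]
    if type_tag == "M":
        return {key: decode_attribute(item) for key, item in payload.items()}
    if type_tag in ("SS", "BS"):
        return list(payload)
    if type_tag == "NS":
        return [decode_attribute({"N": item}) for item in payload]
    raise ValueError(f"Unsupported DynamoDB type tag: {type_tag}")


def decode_image(image):
    """Decode a full item image (attribute name -> typed attribute)."""
    return {name: decode_attribute(value) for name, value in image.items()}


def translate_event(event):
    """Map a CDC event to an operation description (hypothetical output shape)."""
    name = event["eventName"]
    record = event["dynamodb"]
    keys = decode_image(record.get("Keys", {}))
    if name in ("INSERT", "MODIFY"):
        operation = "insert" if name == "INSERT" else "update"
        # Assumption: the complete new image lands in `data`, `aux` stays empty.
        return {
            "operation": operation,
            "keys": keys,
            "data": decode_image(record["NewImage"]),
            "aux": {},
        }
    if name == "REMOVE":
        return {"operation": "delete", "keys": keys}
    raise ValueError(f"Unknown event name: {name}")


# Made-up sample event, trimmed to the fields the sketch reads.
SAMPLE_EVENT = {
    "eventName": "INSERT",
    "dynamodb": {
        "Keys": {"device": {"S": "foo"}},
        "NewImage": {
            "device": {"S": "foo"},
            "temperature": {"N": "42.5"},
            "tags": {"L": [{"S": "a"}, {"N": "1"}]},
        },
    },
}

if __name__ == "__main__":
    print(translate_event(SAMPLE_EVENT))
```

Keeping the decoding step separate from the Kinesis polling and the SQL generation makes it easy to exercise against recorded events, for example when testing with LocalStack.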
Backlog I
create_stream, iterator_type, sleep_time_no_records, etc.
Backlog II
See DynamoDB: General backlog #231.
/cc @juanpardo, @hlcianfagna, @hammerhead, @wierdvanderhaar, @karynzv