feat(glue): glue tables can include storage parameters #24498

Merged
merged 44 commits into from
Jul 31, 2023
Changes from 10 commits
Commits
a3dfceb
removal: duplicate sentence
Rizxcviii Mar 7, 2023
3daf56a
addition: removal policies, dont want them in account after test
Rizxcviii Mar 7, 2023
37bcd79
addition: integ test construct
Rizxcviii Mar 7, 2023
1b5494b
adding storageParameters attribute to construct
Rizxcviii Mar 7, 2023
19d5a54
specific comment
Rizxcviii Mar 7, 2023
ced55e5
addition: README for the storageParameters
Rizxcviii Mar 7, 2023
a615848
removing quote from README
Rizxcviii Mar 8, 2023
7a90bc0
updating paramter comment
Rizxcviii Mar 8, 2023
c729a3f
integ test for non-functioning key
Rizxcviii Mar 8, 2023
0c5fa9a
updating README
Rizxcviii Mar 8, 2023
a71a336
addition: adding enums
Rizxcviii Mar 10, 2023
fba89cc
modification: StorageParameter as namespace for Keys, not values
Rizxcviii Mar 23, 2023
1c83552
modification: creating base parameter
Rizxcviii Mar 23, 2023
64242f3
modification: updating README
Rizxcviii Mar 23, 2023
d4da5b5
modification: updating README
Rizxcviii Mar 23, 2023
9e9c5e1
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Apr 3, 2023
db3cc72
modification: using alpha integ module
Rizxcviii Apr 3, 2023
d14beff
creating class like enum
Rizxcviii Apr 4, 2023
cc812d3
class like enum
Rizxcviii Apr 4, 2023
844fd4c
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Apr 13, 2023
bc6db05
removing extra value
Rizxcviii Apr 13, 2023
bd7a5d6
comment update
Rizxcviii Apr 13, 2023
2e6462b
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Apr 13, 2023
715dc36
changing to use enum like class
Rizxcviii Apr 13, 2023
c3c3b78
integ
Rizxcviii Apr 14, 2023
2f3944a
Merge branch 'main' into feature/glue-table-properties-parameters
comcalvi Apr 14, 2023
edb61ad
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Apr 17, 2023
a0081f7
updating comment
Rizxcviii May 2, 2023
35d27ac
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Jun 13, 2023
5d1d238
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Jun 26, 2023
8e7b802
fixing pkglint
Rizxcviii Jun 26, 2023
a46fe77
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Jul 3, 2023
a8c07dc
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Jul 13, 2023
6fe3da2
adding @see for external tables to redshift documentation
Rizxcviii Jul 17, 2023
3dbe0bb
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Jul 17, 2023
993fc4b
renaming storage descriptor attribute
Rizxcviii Jul 28, 2023
b6ce865
creating test case for duplicate keys
Rizxcviii Jul 28, 2023
fd3c961
changing API to be more simple
Rizxcviii Jul 28, 2023
3abcbf1
removing vendor
Rizxcviii Jul 28, 2023
8807835
Merge branch 'main' into feature/glue-table-properties-parameters
Rizxcviii Jul 28, 2023
13346da
misc
Rizxcviii Jul 31, 2023
5e0759b
broken example
Rizxcviii Jul 31, 2023
098a80c
integ test with additional param
Rizxcviii Jul 31, 2023
cb96655
Merge branch 'main' into feature/glue-table-properties-parameters
mrgrain Jul 31, 2023
19 changes: 18 additions & 1 deletion packages/@aws-cdk/aws-glue/README.md
@@ -225,7 +225,24 @@ new glue.Table(this, 'MyTable', {
});
```

By default, an S3 bucket will be created to store the table's data and stored in the bucket root. You can also manually pass the `bucket` and `s3Prefix`:
Glue tables can be configured with user-defined properties that describe the physical storage of the table data. Set them through the `storageParameters` property:

```ts
declare const myBucket: s3.Bucket;
declare const myDatabase: glue.Database;
new glue.Table(this, 'MyTable', {
  database: myDatabase,
  columns: [{
    name: 'col1',
    type: glue.Schema.STRING,
  }],
  dataFormat: glue.DataFormat.JSON,
  storageParameters: {
    'skip.header.line.count': 1,
    separatorChar: ',',
  }
});
```
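
As a rough sketch of how a caller might assemble that map from typed options, the helper below builds a `storageParameters` object using only the two keys shown in the README example. The names `DelimitedTextOptions` and `delimitedTextParameters` are hypothetical and are not part of the `aws-glue` module.

```ts
// Hypothetical helper, NOT part of aws-glue: builds a `storageParameters`
// map from typed options, using the two keys from the README example.
interface DelimitedTextOptions {
  /** Number of leading lines to skip in each file. */
  readonly skipHeaderLines?: number;
  /** Field separator. */
  readonly separator?: string;
}

function delimitedTextParameters(options: DelimitedTextOptions): { [key: string]: any } {
  const params: { [key: string]: any } = {};
  if (options.skipHeaderLines !== undefined) {
    // Reject values that cannot be a line count.
    if (!Number.isInteger(options.skipHeaderLines) || options.skipHeaderLines < 0) {
      throw new Error('skipHeaderLines must be a non-negative integer');
    }
    params['skip.header.line.count'] = options.skipHeaderLines;
  }
  if (options.separator !== undefined) {
    params.separatorChar = options.separator;
  }
  return params;
}

// The result can be passed as `storageParameters` when constructing a table.
const storageParameters = delimitedTextParameters({ skipHeaderLines: 1, separator: ',' });
console.log(storageParameters);
```

Because the property is typed as `{ [key: string]: any }`, a wrapper like this is one way to get compile-time checking for the keys a project actually relies on.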

### Partition Keys

30 changes: 30 additions & 0 deletions packages/@aws-cdk/aws-glue/lib/table.ts
@@ -183,6 +183,35 @@ export interface TableProps {
* @default - The parameter is not defined
*/
readonly enablePartitionFiltering?: boolean;

/**
* The user-supplied properties for the description of the physical storage of this table. These properties help describe the format of the data that is stored within the crawled data sources.
*
* There are reserved keys that are used by AWS Glue. They CAN be mutated, but they are best left alone.
*
* Any key/value pair can be submitted, but its functionality is not guaranteed.
*
* @see https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html#r_CREATE_EXTERNAL_TABLE-parameters - under "TABLE PROPERTIES" contains a non-exhaustive list of the keys that have functionality.
*
* @example
*
* declare const glueDatabase: glue.IDatabase;
* const table = new glue.Table(this, 'Table', {
* database: glueDatabase,
* columns: [{
* name: 'col1',
* type: glue.Schema.STRING,
* }],
* dataFormat: glue.DataFormat.CSV,
* storageParameters: {
* foo: 'bar', // Will have no effect
* 'skip.header.line.count': 1, // Will be used to skip the first line of the file
* },
* });
*
* @default - The parameter is not defined
*/
readonly storageParameters?: { [key: string]: any };
}
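
Since the doc comment notes that arbitrary keys are accepted but only some have an effect, a caller may want to flag keys that fall outside a known list. The sketch below is one possible approach. The function name and the one-entry key set are illustrative assumptions; the authoritative (and non-exhaustive) list is the Redshift "TABLE PROPERTIES" documentation linked in the comment.

```ts
// Illustrative only: a stand-in for the documented list of functional keys.
// 'skip.header.line.count' is the one key the doc comment says has an effect.
const EXAMPLE_FUNCTIONAL_KEYS: ReadonlySet<string> = new Set(['skip.header.line.count']);

interface ParameterReport {
  readonly recognized: string[];
  readonly unrecognized: string[];
}

// Hypothetical helper: splits the keys of a storageParameters map into those
// found in `functionalKeys` and those that are merely passed through.
function reportStorageParameters(
  params: { [key: string]: any },
  functionalKeys: ReadonlySet<string> = EXAMPLE_FUNCTIONAL_KEYS,
): ParameterReport {
  const recognized: string[] = [];
  const unrecognized: string[] = [];
  for (const key of Object.keys(params)) {
    (functionalKeys.has(key) ? recognized : unrecognized).push(key);
  }
  return { recognized, unrecognized };
}

const report = reportStorageParameters({ foo: 'bar', 'skip.header.line.count': 1 });
console.log(report.unrecognized); // keys that are stored but may have no effect
```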

/**
@@ -330,6 +359,7 @@ export class Table extends Resource implements ITable {
serdeInfo: {
serializationLibrary: props.dataFormat.serializationLibrary.className,
},
parameters: props.storageParameters,
},

tableType: 'EXTERNAL_TABLE',
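
The hunk above forwards `props.storageParameters` unchanged into the storage descriptor. A simplified, standalone sketch of that pass-through is shown below. `SketchTableProps` and `renderStorageDescriptor` are hypothetical stand-ins, and the real construct builds a full table resource with many more fields.

```ts
// Hypothetical, simplified shapes; only the two fields from the hunk are modeled.
interface SketchTableProps {
  readonly serializationLibraryClassName: string;
  readonly storageParameters?: { [key: string]: any };
}

function renderStorageDescriptor(props: SketchTableProps) {
  return {
    serdeInfo: {
      serializationLibrary: props.serializationLibraryClassName,
    },
    // Passed through unchanged; `undefined` when the caller sets nothing.
    parameters: props.storageParameters,
  };
}

const withParams = renderStorageDescriptor({
  serializationLibraryClassName: 'org.openx.data.jsonserde.JsonSerDe',
  storageParameters: { 'skip.header.line.count': 2, separatorChar: ',', foo: 'bar' },
});
const withoutParams = renderStorageDescriptor({
  serializationLibraryClassName: 'org.openx.data.jsonserde.JsonSerDe',
});

console.log(JSON.stringify(withParams));
// JSON.stringify drops properties whose value is undefined,
// so `parameters` does not appear in this output.
console.log(JSON.stringify(withoutParams));
```

Leaving the value `undefined` rather than defaulting to `{}` is consistent with the documented default, "The parameter is not defined".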
@@ -1,15 +1,15 @@
{
"version": "20.0.0",
"version": "30.1.0",
"files": {
"eef5abdc0f1ee16e5be447f60688757df6726f3c2d1d06c136e9bbdb99d96e1f": {
"553f4bc301289a3e09b1cd03c07892e4c62da6b35d057a57fe1613c230c27ef6": {
"source": {
"path": "aws-cdk-glue.template.json",
"packaging": "file"
},
"destinations": {
"current_account-current_region": {
"bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
"objectKey": "eef5abdc0f1ee16e5be447f60688757df6726f3c2d1d06c136e9bbdb99d96e1f.json",
"objectKey": "553f4bc301289a3e09b1cd03c07892e4c62da6b35d057a57fe1613c230c27ef6.json",
"assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"
}
}
@@ -2,8 +2,8 @@
"Resources": {
"DataBucketE3889A50": {
"Type": "AWS::S3::Bucket",
"UpdateReplacePolicy": "Retain",
"DeletionPolicy": "Retain"
"UpdateReplacePolicy": "Delete",
"DeletionPolicy": "Delete"
},
"MyDatabase1E2517DB": {
"Type": "AWS::Glue::Database",
@@ -328,8 +328,8 @@
"Version": "2012-10-17"
}
},
"UpdateReplacePolicy": "Retain",
"DeletionPolicy": "Retain"
"UpdateReplacePolicy": "Delete",
"DeletionPolicy": "Delete"
},
"MyEncryptedTableBucket7B28486D": {
"Type": "AWS::S3::Bucket",
@@ -423,11 +423,6 @@
}
}
},
"MyPartitionFilteredTableBucket6ACAA137": {
"Type": "AWS::S3::Bucket",
"UpdateReplacePolicy": "Retain",
"DeletionPolicy": "Retain"
},
"MyPartitionFilteredTable324BA27A": {
"Type": "AWS::Glue::Table",
"Properties": {
@@ -477,13 +472,18 @@
[
"s3://",
{
"Ref": "MyPartitionFilteredTableBucket6ACAA137"
"Ref": "DataBucketE3889A50"
},
"/"
]
]
},
"OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
"Parameters": {
"separatorChar": ",",
"skip.header.line.count": 2,
"foo": "bar"
},
"SerdeInfo": {
"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
},
@@ -0,0 +1,19 @@
{
"version": "30.1.0",
"files": {
"21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22": {
"source": {
"path": "awscdkgluetableintegDefaultTestDeployAssert8BFB5B70.template.json",
"packaging": "file"
},
"destinations": {
"current_account-current_region": {
"bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
"objectKey": "21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json",
"assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"
}
}
}
},
"dockerImages": {}
}
@@ -0,0 +1,36 @@
{
"Parameters": {
"BootstrapVersion": {
"Type": "AWS::SSM::Parameter::Value<String>",
"Default": "/cdk-bootstrap/hnb659fds/version",
"Description": "Version of the CDK Bootstrap resources in this environment, automatically retrieved from SSM Parameter Store. [cdk:skip]"
}
},
"Rules": {
"CheckBootstrapVersion": {
"Assertions": [
{
"Assert": {
"Fn::Not": [
{
"Fn::Contains": [
[
"1",
"2",
"3",
"4",
"5"
],
{
"Ref": "BootstrapVersion"
}
]
}
]
},
"AssertDescription": "CDK bootstrap stack version 6 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI."
}
]
}
}
}
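
The `CheckBootstrapVersion` rule above combines `Fn::Not` with `Fn::Contains`, so the assertion passes only when the `BootstrapVersion` value is not one of the listed strings. A minimal sketch of the same logic in TypeScript follows. The function name is hypothetical, and CloudFormation evaluates the real rule itself at deploy time.

```ts
// Versions the rule rejects, copied from the `Fn::Contains` list in the template.
const REJECTED_BOOTSTRAP_VERSIONS = ['1', '2', '3', '4', '5'];

// Mirrors `Fn::Not` over `Fn::Contains`: returns true when the assertion passes.
function bootstrapVersionAssertionPasses(bootstrapVersion: string): boolean {
  return REJECTED_BOOTSTRAP_VERSIONS.indexOf(bootstrapVersion) === -1;
}

console.log(bootstrapVersionAssertionPasses('5'));
console.log(bootstrapVersionAssertionPasses('6'));
```

The comparison is on strings because the SSM parameter is resolved as `AWS::SSM::Parameter::Value<String>`.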
@@ -1 +1 @@
{"version":"20.0.0"}
{"version":"30.1.0"}
@@ -1,14 +1,12 @@
{
"version": "20.0.0",
"version": "30.1.0",
"testCases": {
"integ.table": {
"aws-cdk-glue-table-integ/DefaultTest": {
"stacks": [
"aws-cdk-glue"
],
"diffAssets": false,
"stackUpdateWorkflow": true
"assertionStack": "aws-cdk-glue-table-integ/DefaultTest/DeployAssert",
"assertionStackName": "awscdkgluetableintegDefaultTestDeployAssert8BFB5B70"
}
},
"synthContext": {},
"enableLookups": false
}
}
@@ -1,12 +1,6 @@
{
"version": "20.0.0",
"version": "30.1.0",
"artifacts": {
"Tree": {
"type": "cdk:tree",
"properties": {
"file": "tree.json"
}
},
"aws-cdk-glue.assets": {
"type": "cdk:asset-manifest",
"properties": {
@@ -23,7 +17,7 @@
"validateOnSynth": false,
"assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}",
"cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}",
"stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/eef5abdc0f1ee16e5be447f60688757df6726f3c2d1d06c136e9bbdb99d96e1f.json",
"stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/553f4bc301289a3e09b1cd03c07892e4c62da6b35d057a57fe1613c230c27ef6.json",
"requiresBootstrapStackVersion": 6,
"bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version",
"additionalDependencies": [
@@ -93,12 +87,6 @@
"data": "MyEncryptedTable981A88C6"
}
],
"/aws-cdk-glue/MyPartitionFilteredTable/Bucket/Resource": [
{
"type": "aws:cdk:logicalId",
"data": "MyPartitionFilteredTableBucket6ACAA137"
}
],
"/aws-cdk-glue/MyPartitionFilteredTable/Table": [
{
"type": "aws:cdk:logicalId",
@@ -143,6 +131,59 @@
]
},
"displayName": "aws-cdk-glue"
},
"awscdkgluetableintegDefaultTestDeployAssert8BFB5B70.assets": {
"type": "cdk:asset-manifest",
"properties": {
"file": "awscdkgluetableintegDefaultTestDeployAssert8BFB5B70.assets.json",
"requiresBootstrapStackVersion": 6,
"bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version"
}
},
"awscdkgluetableintegDefaultTestDeployAssert8BFB5B70": {
"type": "aws:cloudformation:stack",
"environment": "aws://unknown-account/unknown-region",
"properties": {
"templateFile": "awscdkgluetableintegDefaultTestDeployAssert8BFB5B70.template.json",
"validateOnSynth": false,
"assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-deploy-role-${AWS::AccountId}-${AWS::Region}",
"cloudFormationExecutionRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-cfn-exec-role-${AWS::AccountId}-${AWS::Region}",
"stackTemplateAssetObjectUrl": "s3://cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}/21fbb51d7b23f6a6c262b46a9caee79d744a3ac019fd45422d988b96d44b2a22.json",
"requiresBootstrapStackVersion": 6,
"bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version",
"additionalDependencies": [
"awscdkgluetableintegDefaultTestDeployAssert8BFB5B70.assets"
],
"lookupRole": {
"arn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-lookup-role-${AWS::AccountId}-${AWS::Region}",
"requiresBootstrapStackVersion": 8,
"bootstrapStackVersionSsmParameter": "/cdk-bootstrap/hnb659fds/version"
}
},
"dependencies": [
"awscdkgluetableintegDefaultTestDeployAssert8BFB5B70.assets"
],
"metadata": {
"/aws-cdk-glue-table-integ/DefaultTest/DeployAssert/BootstrapVersion": [
{
"type": "aws:cdk:logicalId",
"data": "BootstrapVersion"
}
],
"/aws-cdk-glue-table-integ/DefaultTest/DeployAssert/CheckBootstrapVersion": [
{
"type": "aws:cdk:logicalId",
"data": "CheckBootstrapVersion"
}
]
},
"displayName": "aws-cdk-glue-table-integ/DefaultTest/DeployAssert"
},
"Tree": {
"type": "cdk:tree",
"properties": {
"file": "tree.json"
}
}
}
}