Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support dropping columns as a metadata-only operation #1076

Closed
wants to merge 6 commits into from

Conversation

liwensun
Copy link
Contributor

@liwensun liwensun commented Apr 15, 2022

Description

This PR proposes to support ALTER TABLE DROP COLUMN as a metadata-only operation in Delta. No changes needed for the underlying Parquet files. This is only supported for Delta tables that enable column mapping. With column mapping, if a new column with the same name is added later, it won't be confused with the dropped column because the new column will have a different physical name than the one being dropped.

Note that there is no way to physically delete the dropped column from Parquet right now. This will be addressed as future work.

This PR partially addresses issue #1064.

How was this patch tested?

unit tests

Does this PR introduce any user-facing changes?

Add a new SQL command

  • ALTER TABLE table_name DROP COLUMN col_name
  • ALTER TABLE table_name DROP COLUMNS (col_name1, col2_name2, ...)

GitOrigin-RevId: 5bac741beced1ec847ebcc07d050234f82ece124
GitOrigin-RevId: 09280364501777e2b0953d56294775457b6adc97
@liwensun liwensun changed the title Support dropping columns as a metadata operation Support dropping columns as a metadata-only operation Apr 15, 2022
private def columnMappingAdviceMessage: String = {
s"""
|Please upgrade your Delta table to reader version 2 and writer version 5
| and change the column mapping mode to name mapping. You can use the following command:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name in quotes?


def columnRenameNotSupported: Throwable = {
val adviceMsg = columnMappingAdviceMessage
new AnalysisException(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we planning to convert the exception to DeltaAnlysisException?

}
def foundInvalidCharsInColumnNames(cause: Throwable): Throwable =
new DeltaAnalysisException(
errorClass = "UNSUPPORTED_INVALID_CHARACTERS_IN_COLUMN_NAME",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to include the column name here to make it easy for the user to find the column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The column name shows up in the child exception, so i think that would be ok. it's a bit involved to surface it to the top level.

override def run(sparkSession: SparkSession): Seq[Row] = {
if (!sparkSession.sessionState.conf.getConf(
DeltaSQLConf.DELTA_ALTER_TABLE_DROP_COLUMN_ENABLED)) {
val errorMessage = "Drop Column is not enabled in Delta."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop column is not enabled in current Spark session? May be mention the SQL config to enable it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we want to mention this flag right now. this flag is mainly for merging code without the support of physical drop.

val droppedColumnSet = columnsToDrop.map(UnresolvedAttribute(_).name).toSet
val droppingPartitionCols = metadata.partitionColumns.exists(droppedColumnSet.contains(_))
if (droppingPartitionCols) {
throw DeltaErrors.dropPartitionColumnNotSupported
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include the partition column name in the error message?

Comment on lines 276 to 295
if (sparkSession.sessionState.conf.getConf(
DeltaSQLConf.DELTA_ALTER_TABLE_CHANGE_COLUMN_CHECK_EXPRESSIONS)) {
columnsToDrop.foreach { columnParts =>
// need to check if the column to rename is referenced by check constraints
val dependentConstraints =
Constraints.findDependentConstraints(sparkSession, columnParts, newMetadata)
if (dependentConstraints.nonEmpty) {
throw DeltaErrors.foundViolatingConstraintsForColumnChange(
"drop", UnresolvedAttribute(columnParts).name, dependentConstraints)
}

// need to check if the renaming the column would affect any generated columns
val dependentGenCols = SchemaUtils.findDependentGeneratedColumns(
sparkSession, columnParts, txn.protocol, newMetadata.schema)
if (dependentGenCols.nonEmpty) {
throw DeltaErrors.foundViolatingGeneratedColumnsForColumnChange(
"drop", UnresolvedAttribute(columnParts).name, dependentGenCols)
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same code is repeated in RENAME with minimal difference. Convert it to a utility method, so that we don't miss updating one if we add a new check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call.

spark.sql(s"alter table t1 rename column map to `${colName("map")}`")

// try drop map after rename to arbitrary chars now
spark.sql(s"alter table t1 drop columns `${colName("map")}`")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verify it actually removed the column? Same below.


simpleNestedData.write.format("delta").mode("append").saveAsTable("t1")

assertException("Cannot drop column a") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for: Cannot drop column a due to dependency (details on why it can't be dropped)?

class DeltaDropColumnSuite extends QueryTest with DeltaArbitraryColumnNameSuiteBase {

override protected val sparkConf: SparkConf =
super.sparkConf.set(DeltaSQLConf.DELTA_ALTER_TABLE_DROP_COLUMN_ENABLED.key, "true")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test with DELTA_ALTER_TABLE_DROP_COLUMN_ENABLED set to false and try dropping?

override protected val sparkConf: SparkConf =
super.sparkConf.set(DeltaSQLConf.DELTA_ALTER_TABLE_DROP_COLUMN_ENABLED.key, "true")

test("drop column") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many test cases within a single test. Is possible to split them into smaller and easy to understand tests?

GitOrigin-RevId: a40fd48ca89428254a1f728703417b9a6359a5fd
GitOrigin-RevId: 9829b0fc55bebbbc970faa0ebec5635057951de9
GitOrigin-RevId: e1d641b61c770f97c0a6b17cf580faeb699de221
Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vkorukanti vkorukanti closed this in 58b2510 May 3, 2022
@vkorukanti vkorukanti added this to the 2.0 milestone Jun 28, 2022
jbguerraz pushed a commit to jbguerraz/delta that referenced this pull request Jul 6, 2022
Support ALTER TABLE DROP COLUMN as a metadata only operation. This feature is not ready for release yet, so it's currently behind a feature flag.

Closes delta-io#1076

Unit tests

GitOrigin-RevId: 181d6dc38f5c074eb16710191795118e78fdf972
jbguerraz pushed a commit to jbguerraz/delta that referenced this pull request Jul 6, 2022
Support ALTER TABLE DROP COLUMN as a metadata only operation. This feature is not ready for release yet, so it's currently behind a feature flag.

Closes delta-io#1076

Unit tests

GitOrigin-RevId: 181d6dc38f5c074eb16710191795118e78fdf972
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants