[Feature Request] [Kernel] Delete a partition or delete Parquet files from a table #3988
Open
Labels
enhancement
New feature or request
Feature request
Which Delta project/connector is this regarding?
Overview
Our pipeline generates Parquet files outside of a Delta or Spark context, so we use Delta Standalone to add the Parquet files to a partition of a Delta table. If the pipeline regenerates the Parquet files for a partition, we need to be able to remove the old Parquet files for that partition and add the new ones to the same partition.
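To make the requested semantics concrete, here is a minimal sketch in plain Java of what "replace the files of one partition" means in terms of log actions. It is only an illustration: `PartitionReplaceSketch`, `ActiveFile`, `Action`, and `replacePartition` are hypothetical names, not Delta Kernel or Delta Standalone APIs, and the file paths are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a partition overwrite; none of these types are Delta APIs.
final class PartitionReplaceSketch {

    // A data file that is currently active in the table, with its partition values.
    record ActiveFile(String path, Map<String, String> partitionValues) {}

    // A log action: kind is either "remove" or "add", applied to a file path.
    record Action(String kind, String path) {}

    // Builds the actions for a single commit that replaces every active file in
    // `partition` with `newPaths`: one remove per old file, then one add per new file.
    static List<Action> replacePartition(List<ActiveFile> activeFiles,
                                         Map<String, String> partition,
                                         List<String> newPaths) {
        List<Action> actions = new ArrayList<>();
        for (ActiveFile file : activeFiles) {
            // Only files whose partition values match exactly are removed.
            if (file.partitionValues().equals(partition)) {
                actions.add(new Action("remove", file.path()));
            }
        }
        for (String path : newPaths) {
            actions.add(new Action("add", path));
        }
        return actions;
    }

    public static void main(String[] args) {
        List<ActiveFile> active = List.of(
            new ActiveFile("date=2024-01-01/old-0.parquet", Map.of("date", "2024-01-01")),
            new ActiveFile("date=2024-01-02/keep-0.parquet", Map.of("date", "2024-01-02")));

        List<Action> actions = replacePartition(
            active,
            Map.of("date", "2024-01-01"),
            List.of("date=2024-01-01/new-0.parquet"));

        // Print the actions that the single commit would contain.
        actions.forEach(a -> System.out.println(a.kind() + " " + a.path()));
    }
}
```

The point of the sketch is that the removes and the adds belong to the same commit, so readers never see the partition empty or containing both the old and the new files.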
Motivation
This feature would add bulk-level updates (whole files or whole partitions) alongside the row-level updates currently supported by Delta Kernel.
Further details
Delta Standalone supports `AddFile` and `RemoveFile` operations on the Delta log.
The Delta Kernel equivalent for `AddFile` is creating `DataFileStatus` objects and then using `Transaction.generateAppendActions`. There doesn't seem to be an equivalent in the code for a `RemoveFile` operation.
In lieu of a `RemoveFile` operation, we could work around that by having the ability to delete a partition. We can then recreate the partition via `DataFileStatus`/`Transaction.generateAppendActions` calls.
Willingness to contribute
The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?