Delta Sharing Server support for CDF request on tables readerVersion 3 #622

Open
danielmcstr opened this issue Dec 19, 2024 · 4 comments

@danielmcstr

I'm looking for information about plans to support Change Data Feed (CDF) requests on tables with readerVersion 3 in the Delta Sharing Server. Currently, new Delta tables with readerVersion greater than 1 are only readable by delta-kernel, as delta-standalone does not support readerVersion > 1.

The implementation of queryCDF for DeltaKernelTable is currently hardcoded to throw a "not implemented yet" exception:

    override def queryCDF(
        cdfOptions: Map[String, String],
        includeHistoricalMetadata: Boolean = false,
        maxFiles: Option[Int],
        pageToken: Option[String],
        responseFormatSet: Set[String] = Set("parquet"),
        includeEndStreamAction: Boolean): QueryResult = {
      throw new DeltaSharingUnsupportedOperationException("not implemented yet")
    }

Is there a roadmap or timeline for implementing CDF support for readerVersion 3 tables?

Thank you!

@OussamaSaoudi-db
Collaborator

Hi @danielmcstr, thanks for the issue! I'm not sure if this is entirely relevant to your use case, but delta-sharing has recently added support for Change Data Feed on version 3 tables with deletion vectors through a Python/pandas API.

If this is indeed relevant, then I can say that support for more reader features is in the works in delta-kernel-rs, which will make those features available to delta-sharing's Python connector.

@danielmcstr
Author

Hi @OussamaSaoudi-db, thank you for the information.

I could not use the Delta Sharing Server to make a request to the endpoint /shares/{table.share}/schemas/{table.schema}/tables/{table.name}/changes with newer Delta tables. It complains that the Delta protocol version (3,7) is too new for this version of Delta Standalone Reader/Writer (1,2).
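For anyone trying to reproduce this, here is a minimal sketch of how the request URL for that endpoint can be assembled. The helper name `changes_url` and the example server prefix are hypothetical; `startingVersion` and `endingVersion` are the query parameters the Delta Sharing protocol defines for the changes endpoint, as far as I understand it.

```python
from urllib.parse import quote, urlencode


def changes_url(prefix, share, schema, table, starting_version, ending_version=None):
    """Build the URL for the table-changes (CDF) endpoint.

    `prefix` is the sharing server's base endpoint (a hypothetical value
    is used in the example below).
    """
    # Percent-encode each path segment so unusual names cannot break the path.
    path = "/shares/{}/schemas/{}/tables/{}/changes".format(
        quote(share, safe=""), quote(schema, safe=""), quote(table, safe="")
    )
    params = {"startingVersion": starting_version}
    if ending_version is not None:
        params["endingVersion"] = ending_version
    return prefix.rstrip("/") + path + "?" + urlencode(params)


# Hypothetical server prefix and table coordinates, for illustration only.
print(changes_url("https://example.com/delta-sharing/", "share1", "default", "events", 0, 5))
```

This only builds the URL; the request itself still needs the bearer token from the profile file, and on the server side it still ends up in the code path that raises the error above.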

@OussamaSaoudi-db
Collaborator

Hmm, it seems that the endpoint you're using is going through Delta Standalone 🤔 The features I mentioned require you to go through the Python connector to leverage delta-kernel-rs. I'm not familiar with Standalone, so I'll defer to @linzhou-db, who may have more context. Sorry I couldn't be of much more help 😔

@aimtsou

aimtsou commented Dec 26, 2024

Good afternoon @OussamaSaoudi-db, @danielmcstr:

I can read a table with Delta Sharing Server version 1.2.0 and Delta Sharing Client 1.2.0 with delta-kernel-rust-sharing-wrapper 0.1.0 when the table has the following properties:

delta.enableChangeDataFeed=true
delta.enableDeletionVectors=true
delta.feature.changeDataFeed=supported
delta.feature.deletionVectors=supported
delta.minReaderVersion=3
delta.minWriterVersion=7
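To make the link between these properties and the "(3,7) is too new" error explicit, here is a small sketch that derives the protocol versions and feature names from such a property map. The function names are hypothetical, the defaults of 1 and 2 for missing version properties are an assumption, and the readability check is a simplification: a real reader at version 3 also has to support each listed reader feature.

```python
def protocol_from_properties(props):
    """Extract (minReaderVersion, minWriterVersion, features) from table properties."""
    # Assumed defaults when the version properties are absent.
    reader = int(props.get("delta.minReaderVersion", 1))
    writer = int(props.get("delta.minWriterVersion", 2))
    prefix = "delta.feature."
    # Collect feature names whose property value is "supported".
    features = sorted(
        key[len(prefix):]
        for key, value in props.items()
        if key.startswith(prefix) and value == "supported"
    )
    return reader, writer, features


def readable_by(min_reader_version, max_supported_reader_version):
    """Simplified check: the reader must support at least the table's minReaderVersion."""
    return min_reader_version <= max_supported_reader_version


props = {
    "delta.enableChangeDataFeed": "true",
    "delta.enableDeletionVectors": "true",
    "delta.feature.changeDataFeed": "supported",
    "delta.feature.deletionVectors": "supported",
    "delta.minReaderVersion": "3",
    "delta.minWriterVersion": "7",
}

reader, writer, features = protocol_from_properties(props)
print(reader, writer, features)  # 3 7 ['changeDataFeed', 'deletionVectors']
print(readable_by(reader, 1))    # False: a reader limited to version 1 rejects the table
```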

I can also read them with Delta Sharing Client 1.3.1 and delta-kernel-rust-sharing-wrapper 0.2.1.

What I do is call spark.read.format("deltaSharing").load(urls[4]) after I have created all the short-lived URLs for the tables returned by the Delta Sharing client.
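As a rough sketch of how the same read could be pointed at the change feed, the snippet below assumes each entry of `urls` is a table path of the form `<profile>#<share>.<schema>.<table>` (the connector's usual convention; how the list was built is not shown here). The helper names are hypothetical. `readChangeFeed` and `startingVersion` are the options the Spark connector documents for CDF reads; whether they work against readerVersion 3 tables is the open question of this issue.

```python
def table_url(profile_path, share, schema, table):
    """Join a profile file path and table coordinates into a Delta Sharing table path."""
    return f"{profile_path}#{share}.{schema}.{table}"


def read_changes(spark, url, starting_version=0):
    """Request the change feed of a shared table through the Spark connector.

    `spark` is an existing SparkSession with the delta-sharing connector
    on its classpath; this function is not called in this sketch.
    """
    return (
        spark.read.format("deltaSharing")
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .load(url)
    )


# Hypothetical profile path and table coordinates, for illustration only.
print(table_url("/tmp/config.share", "share1", "default", "events"))
```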
