Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(scheduler): add csv support for get_file_metadata grpc method #1011

Closed
wants to merge 5 commits into from

Conversation

etolbakov
Copy link
Contributor

Which issue does this PR close?

Closes #.
I was exploring the code and came across the TODO about the csv file format support. So decided to address it.

Rationale for this change

The functionality extension.

What changes are included in this PR?

  • add csv / tlb file format support
  • fix minor typos

Are there any user-facing changes?

No

@@ -297,13 +298,24 @@ impl<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan> SchedulerGrpc
// TODO shouldn't this take a ListingOption object as input?

let GetFileMetadataParams { path, file_type } = request.into_inner();
let file_format: Arc<dyn FileFormat> = match file_type.as_str() {
let file_format: Result<Arc<dyn FileFormat>, Status> = match file_type.as_str() {
"parquet" => Ok(Arc::new(ParquetFormat::default())),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that in the benchmark there's slightly different default for parquet:

ParquetFormat::default().with_enable_pruning(Some(true))

I wonder if we need consistency or it's ok to leave it as is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pruning is default already, so in practice it will be the same.

_ => Err(tonic::Status::unimplemented(
"get_file_metadata unsupported file type",
)),
}?;
Copy link
Contributor Author

@etolbakov etolbakov Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to drop the "short circuit" from here to resolve the rust compilation error:

Type mismatch [E0308] expected `Result<Arc<CsvFormat>, Status>`, but found `Result<Arc<ParquetFormat>, Status>` 

please let me know if there's a better way of doing it?

@etolbakov
Copy link
Contributor Author

etolbakov commented Apr 29, 2024

@andygrove
Could you please take a look at this change when you have time?
Provided that there were no open ticket (but only todo in the code) I understand that maybe this change is of no use.

UPD: I will resolve the failing checks if we decide to proceed with this PR.

@etolbakov
Copy link
Contributor Author

@andygrove just a ping in case this notification is lost.
Could you please take a look?

@Dandandan Dandandan closed this Nov 20, 2024
@Dandandan Dandandan reopened this Nov 26, 2024
@etolbakov
Copy link
Contributor Author

Thank you for the feedback @Dandandan
I'll resolve the conflicts and address your comments in the next couple of days!

@milenkovicm
Copy link
Contributor

milenkovicm commented Nov 27, 2024

@Dandandan @etolbakov get_file_metadata has been removed from latest code (#1110). it was not clear what it was used for, no documentation nor any of the internal ballista code was using it.

can you please elaborate what was the point of it ?

@etolbakov
Copy link
Contributor Author

@milenkovicm @Dandandan
sorry for the confusion, as per description I was exploring the code and came across the TODO about the csv file format support. So decided to address it.
If it doesn't seem to be useful enough or creates more hassle we can close the ticket.
I'm happy to look into something else.

@milenkovicm
Copy link
Contributor

As project was not actively maintained and lot of work accumulated we decide to change project's scope and remove code so maintainers have smaller code base to maintain ( some details in #1067 ). This specific functionality has been removed as there is no clear benefit of keeping it.

Currently, there is active work ongoing towards #974

@etolbakov if you wish to contribute please join discord channel, there are ongoing discussions on tasks and priority

@etolbakov
Copy link
Contributor Author

@milenkovicm makes sense! Thanks!
I’m already in the discord, will check the ballista stream.
Happy if you close the issue.

@etolbakov etolbakov closed this Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants