How to get the original dataset name with username? #7311

Open
npuichigo opened this issue Dec 8, 2024 · 0 comments
Labels
enhancement New feature or request

Contributor

npuichigo commented Dec 8, 2024

Feature request

This issue is related to Ray Data (ray-project/ray#49008), which needs to check whether a dataset is still the original one, i.e. unmodified since `load_dataset`, and whether its Parquet files are already available on the HF Hub.

The current solution is to get the dataset name, config, and split, then call `load_dataset` again and compare fingerprints. However, this cannot recover the correct dataset name if it contains a username. So how can I get the dataset name with the username prefix, or is there another way to query whether a dataset is the original one with Parquet files available?
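For reference, a minimal sketch of the workaround described above, not the actual Ray implementation. It assumes the name, config, and split are read from `ds.info.dataset_name`, `ds.info.config_name`, and `ds.split`, and that fingerprints are compared through the private `_fingerprint` attribute. The `loader` parameter and the `fake_hub` stand-ins are hypothetical, added only so the failure mode can be shown without network access.

```python
from types import SimpleNamespace


def is_unmodified_hub_dataset(ds, loader=None):
    """Reload by (name, config, split) and compare fingerprints."""
    if loader is None:
        # Imported lazily so the sketch also works with a stand-in loader.
        from datasets import load_dataset as loader

    name = ds.info.dataset_name
    config = ds.info.config_name
    split = ds.split
    if name is None or config is None or split is None:
        return False  # not enough metadata to reload the dataset
    try:
        reloaded = loader(name, config, split=str(split))
    except Exception:
        # Happens when the repo is "username/name" but only "name" is known.
        return False
    # Assumption: _fingerprint is a private attribute and may change.
    return reloaded._fingerprint == ds._fingerprint


# Hypothetical stand-ins: a fake Hub keyed by full repo id, config, and split.
fake_hub = {
    ("username/my_dataset", "default", "train"): SimpleNamespace(_fingerprint="abc"),
    ("plain_dataset", "default", "train"): SimpleNamespace(_fingerprint="xyz"),
}


def fake_loader(name, config, split):
    # Raises KeyError for unknown repo ids, mimicking a failed lookup.
    return fake_hub[(name, config, split)]


# The recorded name lacks the "username/" prefix, so the reload fails.
user_ds = SimpleNamespace(
    info=SimpleNamespace(dataset_name="my_dataset", config_name="default"),
    split="train",
    _fingerprint="abc",
)
# A dataset without a username prefix reloads and matches.
plain_ds = SimpleNamespace(
    info=SimpleNamespace(dataset_name="plain_dataset", config_name="default"),
    split="train",
    _fingerprint="xyz",
)

user_result = is_unmodified_hub_dataset(user_ds, loader=fake_loader)
plain_result = is_unmodified_hub_dataset(plain_ds, loader=fake_loader)
```

With these stand-ins the check succeeds only for the dataset whose repo id has no username prefix, which is the gap this request is about.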

@lhoestq

Motivation

ray-project/ray#49008

Your contribution

I would like to fix this.

npuichigo added the enhancement (New feature or request) label Dec 8, 2024