Describe the bug
When a dataset is created from a list of datapoints taken from another dataset, information about the individual items is lost: either the datatype is lost or the values are lost. See the examples below.
-> What is the best way to create a dataset from a list of datapoints?
e.g.: When running this code:

from datasets import load_dataset, Dataset

commonvoice_data = load_dataset("mozilla-foundation/common_voice_17_0", "it", split="test", streaming=True)
datapoint = next(iter(commonvoice_data))
out = [datapoint]
new_data = Dataset.from_list(out)  # this loses datatype information
new_data2 = Dataset.from_list(out, features=commonvoice_data.features)  # this loses value information
We get the following:
datapoint: (the original datapoint)
Original Dataset Features: the audio column has its values (path & array) and has the correct datatype (Audio).
New Dataset 1 Features:
New Dataset 2 Features:
Steps to reproduce the bug
Run the code above.
Expected behavior
Expected:
datapoint == new_data[0]
AND
datapoint == new_data2[0]
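A note on checking this expectation: when a datapoint contains array values, a plain == between two datapoints may not reduce to a single True/False. The following is a minimal sketch of a recursive comparison, assuming datapoints are nested dicts, lists and scalars. The helper name datapoints_equal and the example values are hypothetical and are not part of the datasets library; values are compared exactly, with no floating-point tolerance.

```python
from collections.abc import Mapping, Sequence


def datapoints_equal(a, b):
    """Recursively compare two datapoints (nested dicts, lists, scalars).

    Hypothetical helper for illustration; not part of the datasets library.
    """
    # Array-like values (anything exposing .tolist(), such as a NumPy array)
    # are converted to plain Python lists so that == yields a single bool.
    if hasattr(a, "tolist"):
        a = a.tolist()
    if hasattr(b, "tolist"):
        b = b.tolist()
    if isinstance(a, Mapping) and isinstance(b, Mapping):
        # Same keys, and every value equal under the same rules.
        return a.keys() == b.keys() and all(
            datapoints_equal(a[k], b[k]) for k in a
        )
    if isinstance(a, (str, bytes)) or isinstance(b, (str, bytes)):
        # Strings are sequences too; compare them directly.
        return a == b
    if isinstance(a, Sequence) and isinstance(b, Sequence):
        return len(a) == len(b) and all(
            datapoints_equal(x, y) for x, y in zip(a, b)
        )
    return a == b


# Made-up example values, shaped loosely like an audio datapoint.
original = {
    "audio": {"path": "clip.mp3", "array": [0.0, 0.5], "sampling_rate": 48000},
    "sentence": "ciao",
}
round_tripped = {
    "audio": {"path": "clip.mp3", "array": [0.0, 0.5], "sampling_rate": 48000},
    "sentence": "ciao",
}
# Hypothetical datapoint whose audio values did not survive.
lossy = {
    "audio": {"path": None, "array": None, "sampling_rate": None},
    "sentence": "ciao",
}

print(datapoints_equal(original, round_tripped))
print(datapoints_equal(original, lossy))
```

With a helper like this, the expectation above would read datapoints_equal(datapoint, new_data[0]) and datapoints_equal(datapoint, new_data2[0]).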
Environment info
datasets version: 3.1.0
huggingface_hub version: 0.26.2
fsspec version: 2024.3.1