Describe the bug
When loading large 2D arrays (1000 × 1152) in large numbers (2,000 arrays per file in this case) with `load_dataset`, the error `OSError: Invalid flatbuffers message` is raised. When only 300 arrays of this size are stored per file, they load correctly.
With 2,000 2D arrays per file, about 100 files are generated, each about 5–6 GB in size. But with only 300 2D arrays per file, about 600 files are generated, which is too many files.
Steps to reproduce the bug
error: `OSError: Invalid flatbuffers message`
reproduce: here is just an example; the real 2D matrix is the output of the ESM large model, and the matrix size is approximate.
Expected behavior
`load_dataset` should load the dataset normally, as `feather.read_feather` does.
Plus,
`load_dataset("parquet", data_files='test.arrow', split="train")` works fine.
Environment info
- `datasets` version: 3.2.0
- `huggingface_hub` version: 0.26.5
- `fsspec` version: 2024.9.0