How to make a custom LeRobotDataset with v2? #547

Open
alik-git opened this issue Dec 4, 2024 · 6 comments

@alik-git

alik-git commented Dec 4, 2024

Hi folks, thanks for the amazing open source work!

I am trying to make a custom dataset to use with the LeRobotDataset format.

The readme says to copy the example scripts here which I've done, and I have a working format script of my own.

If your dataset format is not supported, implement your own in `lerobot/common/datasets/push_dataset_to_hub/${raw_format}_format.py` by copying examples like [pusht_zarr](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/pusht_zarr_format.py), [umi_zarr](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/umi_zarr_format.py), [aloha_hdf5](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/aloha_hdf5_format.py), or [xarm_pkl](https://github.com/huggingface/lerobot/blob/main/lerobot/common/datasets/push_dataset_to_hub/xarm_pkl_format.py).

But when it comes time to create the dataset, `push_dataset_to_hub.py` uses `LeRobotDataset.from_preloaded`, which is no longer supported in dataset v2:

lerobot_dataset = LeRobotDataset.from_preloaded(

So I'm just wondering what the proper way of loading your own custom local dataset is?

Thank you in advance for your help!

alik-git changed the title from "How to make a custom dataset with v2?" to "How to make a custom LeRobotDataset with v2?" on Dec 4, 2024
@alik-git
Author

alik-git commented Dec 4, 2024

Okay, so I've found a workaround for now: I initialize an empty dataset, add the frames to it, and then I can load it after calling `dataset.consolidate()`. If this is the proper way to do it, please let me know and I'll make a PR with updates to the docs.

Otherwise please let me know what the right way to do this is. Thank you! I'll update this issue with my code once I've cleaned it up.

@Robert-hua

I encountered the same issue.

@taochenshh

@aliberts I also hit the same issue. The documentation on how to generate a custom dataset is out of date (the code doesn't run anymore). Could you please update the instructions and the relevant scripts for custom dataset generation? Thanks.

@aliberts
Collaborator

Hey there,
Yes, all the push_to_hub scripts are deprecated in favor of the scripts in examples/port_datasets (just one for now).

Basically, you need to create a new empty dataset using LeRobotDataset.create(), then add individual frames using add_frame(), then save the added frames into an episode using save_episode() (which actually saves data).
Then at the end you need to call the consolidate() method to handle a few more things (we will try to get rid of this step in future iterations) before finally calling the push_to_hub() method.
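
For readers following along, the steps above can be sketched roughly as below. This is only a sketch under assumptions, not the maintainers' reference script: the method names (`add_frame()`, `save_episode()`, `consolidate()`) come from this comment, but the helper `port_episodes`, its arguments, the `task` keyword, and the argument names in the commented-out `create()` call are hypothetical and may not match your installed version.

```python
from typing import Any, Iterable, Mapping


def port_episodes(
    dataset: Any,
    episodes: Iterable[Iterable[Mapping[str, Any]]],
    task: str,
) -> int:
    """Write pre-loaded episodes into an already-created, empty dataset.

    `dataset` is expected to expose add_frame(), save_episode() and
    consolidate() as described in the comment above. Passing `task` to
    save_episode() is an assumption about the signature.
    Returns the number of episodes that were saved.
    """
    n_saved = 0
    for frames in episodes:
        n_frames = 0
        for frame in frames:
            # Each frame is a dict of feature name -> value for one timestep.
            dataset.add_frame(dict(frame))
            n_frames += 1
        if n_frames == 0:
            continue  # nothing was buffered, so there is no episode to save
        # Flush the buffered frames to disk as one episode.
        dataset.save_episode(task=task)
        n_saved += 1
    # Final bookkeeping step, called once after all episodes are saved.
    dataset.consolidate()
    return n_saved


# Hypothetical usage (argument names are assumptions, check your version):
# from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
# dataset = LeRobotDataset.create(repo_id="user/my_dataset", fps=30, features=my_features)
# port_episodes(dataset, my_episodes, task="pick up the cube")
# dataset.push_to_hub()
```

Keeping the loop in a helper that only relies on those three methods makes it easy to dry-run the conversion against a stub object before touching real data.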

You can find more info about the changes of this new api in the PR (#461)

We will remove push_to_hub.py scripts in the future after adding more equivalent scripts like the one mentioned above in the examples section. Hope this helps!

@aliberts
Collaborator

Will update the Readme soon!

@reproduce-bot

The following script is generated by AI Agent to help reproduce the issue:

`lerobot/reproduce.py`:

```python
import os
import pytest
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

def test_custom_lerobot_dataset():
    try:
        repo_id = "custom_repo"
        hf_dataset = None  # This should be replaced with actual dataset object
        episode_data_index = None  # This should be replaced with actual episode data index
        info = None  # This should be replaced with actual info
        videos_dir = "/path/to/videos"  # This should be replaced with actual videos directory

        # Attempt to create a LeRobotDataset using the from_preloaded method
        lerobot_dataset = LeRobotDataset.from_preloaded(
            repo_id=repo_id,
            hf_dataset=hf_dataset,
            episode_data_index=episode_data_index,
            info=info,
            videos_dir=videos_dir,
        )
        raise AssertionError("Test failed: from_preloaded method did not throw an error as expected.")
    except AttributeError as e:
        raise AssertionError(e)
    except Exception as e:
        raise AssertionError(e)

if __name__ == "__main__":
    test_custom_lerobot_dataset()
```

How to run:

```
python3 lerobot/reproduce.py
```

Expected Result:

```
Traceback (most recent call last):
  File "lerobot/reproduce.py", line 14, in test_custom_lerobot_dataset
    lerobot_dataset = LeRobotDataset.from_preloaded(
AttributeError: type object 'LeRobotDataset' has no attribute 'from_preloaded'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "lerobot/reproduce.py", line 28, in <module>
    test_custom_lerobot_dataset()
  File "lerobot/reproduce.py", line 23, in test_custom_lerobot_dataset
    raise AssertionError(e)
AssertionError: type object 'LeRobotDataset' has no attribute 'from_preloaded'
```

Thank you for your valuable contribution to this project and we appreciate your feedback! Please respond with an emoji if you find this script helpful. Feel free to comment below if any improvements are needed.

Best regards from an AI Agent!
