include meta data in tasks #6786

Open
pdlje82 opened this issue Dec 12, 2024 · 4 comments

Comments

@pdlje82

pdlje82 commented Dec 12, 2024

Is your feature request related to a problem? Please describe.
Actually this is not a feature request, I just want your opinion on this workflow:

I have a lot of metadata I want to include in the JSON tasks, so that I can filter these tasks later in Label Studio.

Describe the solution you'd like

I figured out that metadata can be included in a task like this:

[
    {
        "data": {
            "image": "s3:/...1.jpg",
            "meta_stuff_I_want_to_add": "meta data string1",    # <--
            "more_meta_stuff": "meta data string2"              # <--
        },
        "annotations": [],
        "predictions": [
            {
                "result": [
                    {....

After importing to LS, the metadata appears like this:
[image]

Also, the metadata is later saved in the annotation, which is synced to cloud storage, which I need.
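For readers who generate such task files from a script, here is a minimal Python sketch of the layout described above. `build_task` and the bucket path are hypothetical names used only for illustration; the metadata keys are the ones from the example. It assumes nothing about Label Studio beyond the task shape already shown in this thread.

```python
import json


def build_task(image_url, metadata):
    """Build one task dict with metadata stored next to the image in "data"."""
    if "image" in metadata:
        raise ValueError('metadata must not overwrite the "image" key')
    data = {"image": image_url}
    data.update(metadata)  # each metadata key becomes its own field in "data"
    return {"data": data, "annotations": [], "predictions": []}


tasks = [
    build_task(
        "s3://example-bucket/1.jpg",  # hypothetical path, not from the thread
        {
            "meta_stuff_I_want_to_add": "meta data string1",
            "more_meta_stuff": "meta data string2",
        },
    )
]

# json.dumps emits valid JSON, so no "#" comments or Python literals leak in.
tasks_json = json.dumps(tasks, indent=4)
print(tasks_json)
```

Serializing with `json.dumps` rather than hand-writing the file avoids the trailing-comma and comment problems that hand-edited JSON tends to pick up.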

(1)
So, is this task setup "correct"? Is there any other, better way to do this?

(2)
To keep filtering in Label Studio performant, should I rather use one-hot encoding:

[
    {
        "data": {
            "image": "s3:/...1.jpg",
            "meta_feature_1": True,
            "meta_feature_2": False,
            ...

or stick to strings:

[
    {
        "data": {
            "image": "s3:/...1.jpg",
            "feature": "meta_feature_1",
            ...

What is your experience? With one-hot encoding I might end up with 100+ additional columns. Would this be a problem for the backend DB or for Label Studio itself?
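To make the two encodings concrete, here is a small Python sketch that converts between them. `ALL_FEATURES`, `to_one_hot`, and `to_single_string` are hypothetical helpers, and the sketch assumes exactly one feature is active per task. It only shows that the two layouts carry the same information; it says nothing about which one filters faster in Label Studio.

```python
ALL_FEATURES = ["meta_feature_1", "meta_feature_2"]  # hypothetical feature list


def to_one_hot(feature, all_features):
    """Expand a single feature name into one boolean column per feature."""
    if feature not in all_features:
        raise ValueError(f"unknown feature: {feature!r}")
    return {name: name == feature for name in all_features}


def to_single_string(flags):
    """Collapse one-hot columns back into the name of the single active feature."""
    active = [name for name, is_on in flags.items() if is_on]
    if len(active) != 1:
        raise ValueError("expected exactly one active feature")
    return active[0]


one_hot = to_one_hot("meta_feature_1", ALL_FEATURES)
print(one_hot)
print(to_single_string(one_hot))
```

Since the conversion is lossless in both directions under that assumption, the choice can be made on filtering convenience alone and reversed later at export time.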

Thanks!

@heidi-humansignal
Collaborator

Hello,

Yes, that is one of the ways to do it. It just depends on whether you want easier accessibility to the metadata. If there are quite a few metadata fields, you can also try a nested approach. Something like this:

"data": { "my_text": "Opossums are great", "ref_id": 456, "meta_info": { "timestamp": "2020-03-09 18:15:28.212882", "location": "North Pole" } },

It groups the related categories together, and filtering should still work.
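As an illustration of the nested layout, the following Python sketch moves selected flat keys into a `meta_info` sub-dictionary. `nest_metadata` is a hypothetical helper, not part of Label Studio, and the sample values are the ones from the snippet above.

```python
def nest_metadata(data, meta_keys, group="meta_info"):
    """Return a copy of `data` with the keys in `meta_keys` moved under `group`."""
    nested = {key: value for key, value in data.items() if key not in meta_keys}
    nested[group] = {key: data[key] for key in meta_keys if key in data}
    return nested


flat = {
    "my_text": "Opossums are great",
    "ref_id": 456,
    "timestamp": "2020-03-09 18:15:28.212882",
    "location": "North Pole",
}

print(nest_metadata(flat, ["timestamp", "location"]))
```

The input dictionary is left unchanged, so the flat and nested versions can be compared side by side before committing to one.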

Thank you,
Abu

Comment by Abubakar Saad

@pdlje82
Author

pdlje82 commented Dec 17, 2024

Thanks, I will try

@pdlje82 pdlje82 closed this as completed Dec 17, 2024
@pdlje82 pdlje82 reopened this Dec 17, 2024
@heidi-humansignal
Collaborator

Great, please let us know how that goes.

Thank you,
Abu

Comment by Abubakar Saad

@pdlje82
Author

pdlje82 commented Dec 25, 2024

Hi @heidi-humansignal,

I wonder whether Label Studio can work with Booleans?

[ { "data": { "image": "s3:/...1.jpg", "meta_feature_1": True, ...

The "True" in the metadata appears as a string in Label Studio, with no option to switch to Boolean. To filter properly, I changed the value to an integer:

[ { "data": { "image": "s3:/...1.jpg", "meta_feature_1": 1, ...

With this, I can set the data type to "int" in Label Studio and filter properly. But this is a bit painful, because later in my workflow, when exporting the annotations, I have to convert the int in task.data.meta_feature_1 back to a Boolean.

So, does Label Studio somehow allow importing Booleans?
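Until that question is answered, the int-to-Boolean conversion on export can at least be kept in one place. The Python sketch below is one possible way to do it; `restore_booleans` is a hypothetical helper, the image path is a placeholder, and the sketch assumes the exported task keeps the same `data` keys that were imported.

```python
def restore_booleans(task, bool_keys):
    """Return a copy of `task` whose 0/1 flags in "data" are converted to bool."""
    data = dict(task["data"])
    for key in bool_keys:
        if key not in data:
            continue
        value = data[key]
        if value not in (0, 1):
            raise ValueError(f"{key} is not a 0/1 flag: {value!r}")
        data[key] = bool(value)
    return {**task, "data": data}


exported = {
    "data": {"image": "s3://example-bucket/1.jpg", "meta_feature_1": 1},  # placeholder path
    "annotations": [],
}

restored = restore_booleans(exported, ["meta_feature_1"])
print(restored["data"]["meta_feature_1"])
```

Listing the flag keys explicitly keeps genuine integer fields from being converted by accident.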


2 participants