[Feature]: Parquet import can support column-based import like numpy #38740

Open
1 task done
zhuwenxing opened this issue Dec 25, 2024 · 1 comment
Labels
kind/feature Issues related to feature request from users

Comments

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe.

I have multiple collections that need data imported, and they all share the same vector field. I don't want to store that vector field in multiple files.

collection 1

id.parquet
text.parquet
emb.parquet

collection 2

id.parquet
word.parquet
emb.parquet

Each Parquet file has only one column, and the column name is the file name.
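
To make the layout concrete, here is a minimal sketch of how a column-based import could map files to fields, assuming the rule above (one column per file, column name equals file name). The helper `columns_from_files` and the directory names are hypothetical illustrations, not part of Milvus:

```python
from pathlib import PurePosixPath


def columns_from_files(paths):
    """Map each single-column Parquet file to the field it would provide.

    Assumes the convention described in this issue: the column name is the
    file name without the .parquet extension. Hypothetical helper, not a
    Milvus API.
    """
    columns = {}
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix != ".parquet":
            raise ValueError(f"not a parquet file: {p}")
        if path.stem in columns:
            raise ValueError(f"duplicate column: {path.stem}")
        columns[path.stem] = p
    return columns


# Hypothetical directory layout: each collection has its own id and scalar
# files, while both reference the same emb.parquet.
collection_1 = columns_from_files(
    ["c1/id.parquet", "c1/text.parquet", "shared/emb.parquet"]
)
collection_2 = columns_from_files(
    ["c2/id.parquet", "c2/word.parquet", "shared/emb.parquet"]
)

# Files referenced by both collections only need to be stored once.
shared = sorted(set(collection_1.values()) & set(collection_2.values()))
print(shared)
```

With this layout, only `shared/emb.parquet` is referenced by both collections, so the vector data exists on disk once.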

Under the current format requirements, (id, text, emb) must be saved as one file and (id, word, emb) as another, so the emb column is stored twice and takes up additional storage.
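
A rough estimate of that overhead, assuming uncompressed float32 vectors. The row count and dimension below are made-up illustrative numbers, not figures from my actual data:

```python
def embedding_bytes(num_rows, dim, bytes_per_value=4):
    """Raw size of a dense float32 vector column.

    Ignores Parquet encoding, compression, and file metadata.
    """
    return num_rows * dim * bytes_per_value


# Hypothetical workload: 1,000,000 rows of 768-dimensional float32 vectors.
one_copy = embedding_bytes(1_000_000, 768)
num_collections = 2

row_based = one_copy * num_collections  # emb repeated in every combined file
column_based = one_copy                 # emb.parquet stored once and shared

print(f"one copy of emb: {one_copy / 1e9:.2f} GB")
print(f"extra storage with combined files: {(row_based - column_based) / 1e9:.2f} GB")
```

Under these assumptions, every additional collection that reuses the same vectors costs one more full copy of the emb column.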

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

@zhuwenxing zhuwenxing added the kind/feature Issues related to feature request from users label Dec 25, 2024
@xiaofan-luan
Collaborator

Parquet is already a columnar storage format.
Usually people store all the columns in one file rather than separating them?


2 participants