Inconsistent Query Results Based on Output Fields Selection in Milvus Dashboard #28668
-
Hello Milvus Team, I have been experiencing an issue where the search results from the Milvus dashboard change depending on whether certain output fields are selected.
Details: Interestingly, when I perform the vector search with the embedding vector field excluded from the output fields, the results differ from the same search with it included. I have attached two screenshots showing the difference in the results based on the output fields selected.
Language/SDK Used:
from llama_index.vector_stores import MilvusVectorStore
from llama_index import ServiceContext, VectorStoreIndex
# Some other lines..
# Setup for MilvusVectorStore and query execution
vector_store = MilvusVectorStore(
    uri=MILVUS_URI,
    token=MILVUS_API_KEY,
    collection_name=collection_name,
    embedding_field='embeddings_vector',
    doc_id_field='chunk_id',
    similarity_metric='IP',
    text_key='chunk_text',
)
embed_model = get_embeddings()
service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
query_engine = index.as_query_engine(similarity_top_k=5, streaming=True)
rag_result = query_engine.query(prompt)
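For completeness, the comparison can also be reproduced outside the dashboard with pymilvus, searching once with and once without the vector field in the output fields. This is a minimal sketch, not my exact code: the field and collection names are taken from the snippet above, query_vec is a placeholder embedding, and returning vector fields in output_fields assumes a Milvus version that supports it.

from pymilvus import connections, Collection

# Same URI/token/collection as above; query_vec is a placeholder embedding.
connections.connect(uri=MILVUS_URI, token=MILVUS_API_KEY)
collection = Collection(collection_name)
collection.load()

search_params = {"metric_type": "IP", "params": {}}

# Search once including the embedding vector in the output fields...
with_vector = collection.search(
    data=[query_vec],
    anns_field="embeddings_vector",
    param=search_params,
    limit=5,
    output_fields=["chunk_id", "chunk_text", "embeddings_vector"],
)

# ...and once without it, then compare the returned IDs and distances.
without_vector = collection.search(
    data=[query_vec],
    anns_field="embeddings_vector",
    param=search_params,
    limit=5,
    output_fields=["chunk_id", "chunk_text"],
)

for name, res in [("with vector", with_vector), ("without vector", without_vector)]:
    print(name, [(hit.id, hit.distance) for hit in res[0]])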
-
The "good" result:
The "bad" result:
Seems there are duplicate IDs among the 8000 entities. The rank-1 hits of the "good" and "bad" results are two different entities with the same chunk_id=0 and id=0; they have the same embedding (the scores are equal) but different text.
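One way to confirm this is to query the collection for the suspect primary key and see whether more than one entity comes back. Milvus does not enforce uniqueness on primary keys, so duplicates inserted by the application are kept. A minimal pymilvus sketch, assuming the connection details and field names from the post above:

from pymilvus import connections, Collection

connections.connect(uri=MILVUS_URI, token=MILVUS_API_KEY)
collection = Collection(collection_name)
collection.load()

# If primary keys were unique, at most one entity would match this expression.
hits = collection.query(
    expr="id == 0",
    output_fields=["id", "chunk_id", "chunk_text"],
)
print(len(hits))  # > 1 indicates duplicate primary keys
for h in hits:
    print(h["id"], h["chunk_id"], h["chunk_text"][:80])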
-
Hi, thank you for the prompt reply. You were absolutely right. I checked the database carefully, and it turned out the primary IDs were not assigned in a way that avoids collisions. Re-defining the IDs fixed the issue. So the database probably returned the closest "good" embedding vector but not the corresponding text because of the duplicated IDs, and the "wrong" texts were somehow filtered out when the embedding vector was unselected from the output fields. Thank you so much for your help.
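For anyone hitting the same thing: since Milvus does not enforce primary-key uniqueness, manually assigned IDs that collide silently create duplicates. Letting Milvus generate the primary key avoids this. A minimal schema sketch with auto-generated IDs; the field names and dimension below are illustrative, not the exact schema from this thread:

from pymilvus import Collection, CollectionSchema, FieldSchema, DataType

fields = [
    # auto_id=True lets Milvus assign unique primary keys on insert,
    # so application code cannot accidentally reuse an id.
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="chunk_id", dtype=DataType.VARCHAR, max_length=64),
    FieldSchema(name="chunk_text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="embeddings_vector", dtype=DataType.FLOAT_VECTOR, dim=1536),  # dim is an assumption
]
schema = CollectionSchema(fields, description="chunks with auto-generated primary keys")
collection = Collection(name=collection_name, schema=schema)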
The "good" result:
The "bad" result:
Seems there are duplicate IDs in the 8000 entities.
The rank 1 of "good" and "bad" are two entities with the same chunk_id=0 and id=0, and they have the same embedding(because the score is equal), but they have different text.