Inconsistent Query Results Based on Output Fields Selection in Milvus Dashboard #28668
-
Hello Milvus Team, I have been experiencing an issue where the search results from the Milvus dashboard change depending on whether certain output fields are selected.
Details: Interestingly, when I perform the vector search with the embedding vector field excluded from the output fields, the results differ from the same search with it included. I have attached two screenshots showing the difference in the results based on the output fields selected.
Language/SDK Used:
from llama_index.vector_stores import MilvusVectorStore
from llama_index import ServiceContext, VectorStoreIndex
# Some other lines..
# Setup for MilvusVectorStore and query execution
vector_store = MilvusVectorStore(
    uri=MILVUS_URI,
    token=MILVUS_API_KEY,
    collection_name=collection_name,
    embedding_field='embeddings_vector',
    doc_id_field='chunk_id',
    similarity_metric='IP',
    text_key='chunk_text',
)
embed_model = get_embeddings()
service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
query_engine = index.as_query_engine(similarity_top_k=5, streaming=True)
rag_result = query_engine.query(prompt)
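For completeness, the comparison can also be reproduced outside the dashboard with pymilvus, searching once with and once without the vector field in the output fields. This is a minimal sketch, not my exact code: the field and collection names are taken from the snippet above, query_vec is a placeholder embedding, and returning vector fields in output_fields assumes a Milvus version that supports it.

from pymilvus import connections, Collection

# Same URI/token/collection as above; query_vec is a placeholder embedding.
connections.connect(uri=MILVUS_URI, token=MILVUS_API_KEY)
collection = Collection(collection_name)
collection.load()

search_params = {"metric_type": "IP", "params": {}}

# Search once including the embedding vector in the output fields...
with_vector = collection.search(
    data=[query_vec],
    anns_field="embeddings_vector",
    param=search_params,
    limit=5,
    output_fields=["chunk_id", "chunk_text", "embeddings_vector"],
)

# ...and once without it, then compare the returned IDs and distances.
without_vector = collection.search(
    data=[query_vec],
    anns_field="embeddings_vector",
    param=search_params,
    limit=5,
    output_fields=["chunk_id", "chunk_text"],
)

for name, res in [("with vector", with_vector), ("without vector", without_vector)]:
    print(name, [(hit.id, hit.distance) for hit in res[0]])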
-
The "good" result:
The "bad" result:
Seems there are duplicate IDs among the 8000 entities. The rank-1 hits of the "good" and "bad" results are two different entities with the same chunk_id=0 and id=0; they have the same embedding (the scores are equal) but different text.
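One way to confirm this is to query the collection for the suspect primary key and see whether more than one entity comes back. Milvus does not enforce uniqueness on primary keys, so duplicates inserted by the application are kept. A minimal pymilvus sketch, assuming the connection details and field names from the post above:

from pymilvus import connections, Collection

connections.connect(uri=MILVUS_URI, token=MILVUS_API_KEY)
collection = Collection(collection_name)
collection.load()

# If primary keys were unique, at most one entity would match this expression.
hits = collection.query(
    expr="id == 0",
    output_fields=["id", "chunk_id", "chunk_text"],
)
print(len(hits))  # > 1 indicates duplicate primary keys
for h in hits:
    print(h["id"], h["chunk_id"], h["chunk_text"][:80])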
-
Hi, thank you for the prompt reply. You were absolutely right. I checked the database carefully, and it turned out the primary IDs were not assigned in a way that avoids collisions. Re-defining the IDs fixed the issue. So the database probably returned the closest "good" embedding vector but not the corresponding text because of the duplicated IDs, and the "wrong" texts were somehow filtered out when the embedding vector was unselected from the output fields. Thank you so much for your help.
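For anyone hitting the same thing: since Milvus does not enforce primary-key uniqueness, manually assigned IDs that collide silently create duplicates. Letting Milvus generate the primary key avoids this. A minimal schema sketch with auto-generated IDs; the field names and dimension below are illustrative, not the exact schema from this thread:

from pymilvus import Collection, CollectionSchema, FieldSchema, DataType

fields = [
    # auto_id=True lets Milvus assign unique primary keys on insert,
    # so application code cannot accidentally reuse an id.
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="chunk_id", dtype=DataType.VARCHAR, max_length=64),
    FieldSchema(name="chunk_text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="embeddings_vector", dtype=DataType.FLOAT_VECTOR, dim=1536),  # dim is an assumption
]
schema = CollectionSchema(fields, description="chunks with auto-generated primary keys")
collection = Collection(name=collection_name, schema=schema)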
The "good" result:
The "bad" result:
Seems there are duplicate IDs in the 8000 entities.
The rank 1 of "good" and "bad" are two entities with the same chunk_id=0 and id=0, and they have the same embedding(because the score is equal), but they have different text.