The Q&A mode always fails. #12142

Open
5 tasks done
gongshaojie12 opened this issue Dec 27, 2024 · 13 comments
Labels
🐞 bug Something isn't working

Comments

@gongshaojie12

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.14.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

The configuration after I upload files to the knowledge base is as follows.
[two screenshots of the knowledge base configuration]

When I click "Save and Process," the following issue occurs.

2024-12-27 02:26:50,448.448 ERROR [Thread-1123 (_format_qa_document)] [http_request.py:181] - Request: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation failed, status: 429, message: Requests rate limit exceeded, please try again later.
2024-12-27 02:26:50,450.450 ERROR [Thread-1123 (_format_qa_document)] [qa_index_processor.py:159] - Failed to format qa document
Traceback (most recent call last):
  File "/app/api/core/model_runtime/model_providers/__base/large_language_model.py", line 110, in invoke
    result = self._invoke(
             ^^^^^^^^^^^^^
  File "/app/api/core/model_runtime/model_providers/tongyi/llm/llm.py", line 84, in _invoke
    return self._generate(model, credentials, prompt_messages, model_parameters, tools, stop, stream, user)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/model_runtime/model_providers/tongyi/llm/llm.py", line 205, in _generate
    return self._handle_generate_response(model, credentials, response, prompt_messages)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/model_runtime/model_providers/tongyi/llm/llm.py", line 220, in _handle_generate_response
    raise ServiceUnavailableError(response.message)
dashscope.common.error.ServiceUnavailableError: Requests rate limit exceeded, please try again later.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/api/core/rag/index_processor/processor/qa_index_processor.py", line 146, in _format_qa_document
    response = LLMGenerator.generate_qa_document(tenant_id, document_node.page_content, document_language)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/llm_generator/llm_generator.py", line 306, in generate_qa_document
    response = model_instance.invoke_llm(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/model_manager.py", line 127, in invoke_llm
    return self._round_robin_invoke(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/model_manager.py", line 310, in _round_robin_invoke
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/core/model_runtime/model_providers/__base/large_language_model.py", line 134, in invoke
    raise self._transform_invoke_error(e)
core.model_runtime.errors.invoke.InvokeServerUnavailableError: [tongyi] Server Unavailable Error, Requests rate limit exceeded, please try again later.

How can I resolve this issue?

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

@dosubot dosubot bot added the 🐞 bug Something isn't working label Dec 27, 2024

dosubot bot commented Dec 27, 2024

To resolve the "Requests rate limit exceeded" error in your self-hosted Dify setup, you can try the following solutions:

  1. Reduce Concurrent Processing: Limit the number of files processed simultaneously or break them into smaller batches to reduce the load on the embedding service.

  2. Implement Retry Logic: Use retry logic with exponential backoff when rate limit errors occur to manage the request flow more effectively. Dify implements exponential backoff by using a backoff factor and retrying the request with increasing delays [1].

  3. Use Multiple API Keys: Consider using multiple API keys to distribute the load across different keys, which can help in managing the rate limits better.

  4. Upgrade Machine Resources: Increase your machine's memory capacity to handle more processes without hitting rate limits.

  5. Use a Different Embedding Model Provider: Consider switching to an embedding model provider that offers higher rate limits to accommodate your processing needs.

  6. Check Rate Limits: Ensure that the rate limit for the Embedding model API key used has not been reached. Refer to the official documentation for details on rate limits.

These steps are based on handling similar rate limit issues in Dify version 0.13.2 and the Tongyi embedding service [2].
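
For readers asking what "retry logic with exponential backoff" looks like in practice, here is a minimal generic sketch in Python. It is not Dify's implementation; `RateLimitError`, `call_with_backoff`, and all parameter names are hypothetical and stand in for whatever 429 error your provider client raises.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def call_with_backoff(func, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel callers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting; in normal use the default `time.sleep` applies.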


@yihong0618
Contributor

will check today thanks

@gongshaojie12
Author

  1. I only uploaded one file, and the number of files is not large.
  2. How exactly should the retry logic be implemented?
  3. How should the "Use Multiple API Keys" option be configured?
  4. Currently, the machine resources are quite sufficient. It's processing just one file, and 16GB of memory should be enough, right?
  5. I tried OpenAI's text-embedding-3-large, but the same issue occurred.
  6. Regarding rate limits, I only uploaded one file, which is 27MB in size. This shouldn't trigger the limit, right?

@gongshaojie12
Author

> will check today thanks

Alright, looking forward to the results. Thank you.

@AugNSo

AugNSo commented Dec 27, 2024

According to the log, the issue is not on the embedding side but on the LLM side. Q&A mode passes your chunked document content to the LLM to generate Q&A pairs. My guess is that your document produces more chunks than the rate limit allows, which triggered your LLM provider's error.
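
To illustrate the explanation above: if each chunk triggers one LLM request and the requests run in parallel threads (the log's thread name `_format_qa_document` suggests this, though that is an inference), a burst of requests can exceed the provider's quota even for one file. One generic way to bound such a burst is a fixed-size worker pool. This is a sketch, not Dify's code; `generate_qa_for_chunks`, `generate_qa`, and `max_workers` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_qa_for_chunks(chunks, generate_qa, max_workers=2):
    """Run generate_qa(chunk) for every chunk with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        return list(pool.map(generate_qa, chunks))
```

A smaller `max_workers` lowers the peak request rate at the cost of longer total indexing time.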

@gongshaojie12
Author

I uploaded a PDF with only two pages, but the same issue occurred. The PDF is as follows.
mobile-aloha.pdf

@yihong0618
Contributor

This PDF works fine on my side.

@gongshaojie12
Author

gongshaojie12 commented Dec 27, 2024

That's strange. I only modified a few parameters in the .env and docker-compose.yaml files, as shown below.

.env

UPLOAD_FILE_SIZE_LIMIT=1024
UPLOAD_FILE_BATCH_LIMIT=500
ETL_TYPE=Unstructured
UNSTRUCTURED_API_URL=http://unstructured:8000/general/v0/general

docker-compose.yaml

UPLOAD_FILE_SIZE_LIMIT: ${UPLOAD_FILE_SIZE_LIMIT:-1024}
UPLOAD_FILE_BATCH_LIMIT: ${UPLOAD_FILE_BATCH_LIMIT:-500}
environment:
  NGINX_CLIENT_MAX_BODY_SIZE: 1024M

I haven't made any other changes. @yihong0618

@AugNSo

AugNSo commented Dec 27, 2024

I made the same changes to .env and docker-compose.yaml, but I still cannot reproduce your problem with the PDF file provided.

@gongshaojie12
Author

That's so strange.

@AugNSo

AugNSo commented Dec 27, 2024

What is your current Dify system model? Can you try changing Dify's system reasoning model to a model with a higher rate limit, or to a locally deployed one, and see whether the problem persists?

@gongshaojie12
Author

Dify's system model: qwen-vl-max. After switching to qwen-max, the file was processed normally, but ultra-long PDF files still fail to process. This is probably related to the model's rate limit. Local model deployment is not possible for me at the moment. Can Dify control the indexing speed during the indexing process?
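
The thread does not say whether Dify exposes a setting for indexing speed. As a generic illustration of what "controlling the speed" means on the client side, the sketch below spaces calls evenly to stay under a requests-per-minute budget. `MinIntervalLimiter` and its parameters are hypothetical and not part of Dify or the DashScope SDK.

```python
import threading
import time


class MinIntervalLimiter:
    """Allow at most rate_per_minute calls per minute by spacing them evenly."""

    def __init__(self, rate_per_minute, clock=time.monotonic, sleep=time.sleep):
        self._interval = 60.0 / rate_per_minute
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Block until the caller's reserved time slot arrives."""
        with self._lock:
            now = self._clock()
            wait_for = max(0.0, self._next_allowed - now)
            # Reserve the following slot one interval after this one.
            self._next_allowed = max(now, self._next_allowed) + self._interval
        if wait_for > 0:
            self._sleep(wait_for)
```

Calling `limiter.wait()` before each LLM request would keep the average rate at or below the configured budget, regardless of how many chunks the document produces.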

@AugNSo

AugNSo commented Dec 30, 2024

In my experience, you don't really need qwen-max for a task like this, but if you insist, contacting the service provider to increase the rate limit is also an option.
@yihong0618 I think the issue can be closed; it is not a bug.

3 participants