
[BUG] Summarize fails even when a model response is generated with the error "HTTP request failed: POST predict: Post "http://127.0.0.1:36333/completion": EOF" #2131

Open
SuperSonnix71 opened this issue Nov 28, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@SuperSonnix71

Pre-check

  • I have searched the existing issues and none cover this bug.

Description

While using Summarize I keep getting the error below. I had to fix summarize_service.py and ui.py to catch it properly.
Here is the returned error:
10:30:57.046 [ERROR ] private_gpt.server.recipes.summarize.summarize_service - HTTP request failed: POST predict: Post "http://127.0.0.1:36333/completion": EOF

It looked like a backend issue, but we can clearly see that the response is created correctly by the Ollama backend:


Given the information from multiple sources and not prior knowledge, answer the query.
Query: Provide a comprehensive summary of the provided context information. The summary should cover all the key points and main ideas presented in the original text, while also condensing the information into a concise and easy-to-understand format. Please ensure that the summary includes relevant details and examples that support the main ideas, while avoiding any unnecessary information or repetition.

Answer:


**Response:**
assistant: The provided context information outlines various aspects of user management, software usage tracking, and computer inventory in an IT system. Here's a comprehensive summary:

Now, I had to fix a lot in the code. I fixed all of these:

Summary of the Main Problem

The main problem involved handling asynchronous operations correctly in the summarization service and the UI. Specifically, the issues were (a minimal sketch of the resulting structure follows this list):

  1. The code was erroring out because of nested async calls, which required nest_asyncio to allow nested event loops.
  2. The _summarize method needed to be converted to an async generator to handle streaming responses correctly, and the stream_summarize and summarize methods needed to consume that async generator correctly.
  3. Proper error handling had to be added for exceptions such as asyncio.CancelledError, ResponseError, and StopAsyncIteration.
  4. The _chat method in the UI needed to handle async streaming responses correctly, making sure that stream_summarize is consumed in an async for loop in the UI.
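A minimal sketch of what that structure could look like, assuming nest_asyncio and a stand-in _query_llm streaming helper (the method names mirror this issue; the bodies are illustrative, not the project's actual code):

import asyncio
from collections.abc import AsyncGenerator

import nest_asyncio

nest_asyncio.apply()  # allow re-entering an already running event loop


class SummarizeService:
    async def _query_llm(self, text: str) -> AsyncGenerator[str, None]:
        # Stand-in for the real streaming LLM call.
        for word in text.split():
            yield word + " "

    async def _summarize(self, text: str) -> AsyncGenerator[str, None]:
        # Async generator: yield partial summaries as they arrive.
        try:
            async for chunk in self._query_llm(text):
                yield chunk
        except (asyncio.CancelledError, StopAsyncIteration):
            # Let cancellation / end-of-stream propagate instead of crashing the loop.
            raise

    async def stream_summarize(self, text: str) -> AsyncGenerator[str, None]:
        async for chunk in self._summarize(text):
            yield chunk

    def summarize(self, text: str) -> str:
        # Non-streaming variant: drain the async generator from synchronous code.
        async def _collect() -> str:
            return "".join([chunk async for chunk in self.stream_summarize(text)])

        return asyncio.get_event_loop().run_until_complete(_collect())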

As I see it, the error here comes down to one of the following, since I have ruled out server-side and networking issues and we can clearly see that a response is generated by the model (a hedged mitigation sketch follows this list):

  1. Timeout: if the server takes too long to respond, the client might close the connection, which can result in an EOF error.
  2. Incorrect query parameters: if these are incorrect or malformed, the server will close the connection.
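If point 1 is the culprit, one mitigation is to treat the failed completion call as transient and retry it instead of failing the whole summarize pass. A minimal sketch, assuming the ollama client's ResponseError mentioned above and a caller-supplied run_summary coroutine factory (both the helper and the retry policy are illustrative, not the project's code):

import asyncio
import logging

import httpx
from ollama import ResponseError

logger = logging.getLogger(__name__)


async def summarize_with_retry(run_summary, retries: int = 3, backoff: float = 2.0) -> str:
    # run_summary: zero-argument coroutine factory that performs one summarize request.
    for attempt in range(1, retries + 1):
        try:
            return await run_summary()
        except (ResponseError, httpx.HTTPStatusError, httpx.TransportError) as exc:
            logger.error("HTTP request failed (attempt %d/%d): %s", attempt, retries, exc)
            if attempt == retries:
                raise
            await asyncio.sleep(backoff * attempt)  # simple linear backoff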

Steps to Reproduce

  1. Ingest (RAG) some larger documents.
  2. Run Summarize.

Expected Behavior

No errors and correct summarization

Actual Behavior

Summarize fails even when a model response is generated, with the error "HTTP request failed: POST predict: Post "http://127.0.0.1:36333/completion": EOF"

Environment

CUDA12, Ubuntu, Ollama profile

Additional Information

No response

Version

No response

Setup Checklist

  • Confirm that you have followed the installation instructions in the project’s documentation.
  • Check that you are using the latest version of the project.
  • Verify disk space availability for model storage and data processing.
  • Ensure that you have the necessary permissions to run the project.

NVIDIA GPU Setup Checklist

  • Check that all CUDA dependencies are installed and compatible with your GPU (refer to CUDA's documentation).
  • Ensure an NVIDIA GPU is installed and recognized by the system (run nvidia-smi to verify).
  • Ensure proper permissions are set for accessing GPU resources.
  • Docker users - Verify that the NVIDIA Container Toolkit is configured correctly (e.g. run sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi)
@SuperSonnix71 SuperSonnix71 added the bug Something isn't working label Nov 28, 2024
@SuperSonnix71 SuperSonnix71 changed the title [BUG] Summarize fails even a model response is generated with the error "HTTP request failed: POST predict: Post "http://127.0.0.1:36333/completion": EOF" [BUG] Summarize fails even when a model response is generated with the error "HTTP request failed: POST predict: Post "http://127.0.0.1:36333/completion": EOF" Nov 28, 2024
@SuperSonnix71
Author

SuperSonnix71 commented Nov 28, 2024

adding my changes
changes_summarize_service.txt
changes_ui.txt

@SuperSonnix71
Author

So, the SummarizeService is retrieved from the request state using request.state.injector.get(SummarizeService), and for the streaming response it uses to_openai_stream to convert the response to an SSE stream. The issue is somewhere here.
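For reference, a rough sketch of that path, assuming a FastAPI router and a stand-in SSE helper in place of the project's to_openai_stream (the request body model, route path, and helper are illustrative; SummarizeService is the service class from this issue, sketched earlier):

import json
from collections.abc import AsyncIterator

from fastapi import APIRouter, Request
from pydantic import BaseModel
from starlette.responses import StreamingResponse

summarize_router = APIRouter()


class SummarizeBody(BaseModel):
    # Illustrative request body.
    text: str
    stream: bool = True


async def _to_sse(chunks: AsyncIterator[str]) -> AsyncIterator[str]:
    # Stand-in for to_openai_stream: wrap each chunk as a server-sent event.
    async for chunk in chunks:
        yield f"data: {json.dumps({'delta': chunk})}\n\n"
    yield "data: [DONE]\n\n"


@summarize_router.post("/v1/summarize")
def summarize_endpoint(request: Request, body: SummarizeBody) -> StreamingResponse:
    service = request.state.injector.get(SummarizeService)  # dependency injection via request state
    return StreamingResponse(
        _to_sse(service.stream_summarize(text=body.text)),
        media_type="text/event-stream",
    )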

@SuperSonnix71
Author

SuperSonnix71 commented Nov 28, 2024

I also tested increasing the timeout value here, which did not have any effect:
query_engine = summary_index.as_query_engine(
    llm=self.llm_component.llm,
    response_mode=ResponseMode.TREE_SUMMARIZE,
    streaming=stream,
    use_async=self.settings.summarize.use_async,
    timeout=360,  # <------------------ increase timeout to 360 seconds
)
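If the timeout theory above is right, the more likely place for it is the Ollama LLM client itself rather than the query engine; llama-index's Ollama wrapper exposes a request_timeout parameter. A hedged sketch (the model name and values are illustrative, and this is not the project's actual construction code):

from llama_index.llms.ollama import Ollama

# Raise the per-request timeout on the Ollama client so long tree-summarize
# completions are not cut off client-side.
llm = Ollama(
    model="llama3.1",
    base_url="http://localhost:11434",
    request_timeout=360.0,
)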
As you can see, the nested async issue is solved using my attached code changes; we are at least seeing clear logs now.

  • Software management includes three sub-tabs: Applications, Raw Usage, and Active Usage.
  • These tabs display information about software applications used by users, such as total usage, active usage, raw usage data, and more.

PART OF THE MODEL RESPONSE:


...
Overall, the system provides a comprehensive platform for managing subscriptions, uploaded files, integration platforms, user profiles, and software usage. The system's features are designed to streamline processes, provide insights into user behavior, and support various workflows.


11:37:04.888 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.888 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.889 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.889 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.889 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.889 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.890 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.891 [ERROR ] private_gpt.server.recipes.summarize.summarize_service - HTTP request failed: POST predict: Post "http://127.0.0.1:33899/completion": EOF
11:37:04.893 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.893 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.893 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.934 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.935 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.935 [INFO ] httpx - HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 500 Internal Server Error"
11:37:04.974 [INFO ] uvicorn.access - 127.0.0.1:53322 - "POST /run/predict HTTP/1.1" 200

@jaluma
Collaborator

jaluma commented Dec 2, 2024

Can you check the Ollama server logs? The problem is Ollama-related, since it's throwing 500s. Something with the context window, computer resources, etc.

2 participants