Misc. bug: Inconsistent Vulkan segfault #10528
Comments
This might be a driver bug. Can you try the latest drivers? I think there's also a chance this could be caused by the ggml-vulkan backend not destroying the VkDevice/VkInstance before the process is terminated. That's something we should look into fixing.
We may need to add a function to destroy a backend and release all of its resources; otherwise there is no way to free them before the process terminates.
IMO issue #10420 is also a question about the object model for ggml backends, i.e. whether each thread should be able to have its own VkInstance/VkDevice, and which ggml/llama object their lifetime should be tied to.
I think this is a bug I often observe on Linux, but only on Nvidia. It happens when exiting the application, so it's likely a clean-up issue. I haven't looked into it yet.
I've tried restructuring the global state in the way I think CUDA handles it in the background, but it's not done yet. It should be possible to keep the device and instance global as long as all temporary objects stay attached to the backend instance. Command buffers are probably not the only thing where that hasn't been implemented yet.
This is how the crash looks for me. It happens in some Nvidia driver thread after all the ggml code has already exited:
This is the thread:
This is the stack trace:
@jeffbolznv Do you happen to know what that driver thread does? Interestingly, @RobbyCBennett saw it in one of the other Nvidia threads. Resolving this has gotten a little more important since @LostRuins reported that, in certain cases, a crash on exit on Windows causes a system crash (BSOD). It might have the same cause.
I hacked together a commit (335f48a) where the devices and Vulkan instance get cleaned up properly (at least I think so, since the validation layers didn't print anything), but the Nvidia driver still segfaults.
They're both threads created by the driver. I'm pretty sure they should get shut down when the VkDevice is destroyed.
Is this relying on the static destructor for the vk_instance pointer? I think that may happen too late. Is there a hook where we can destroy the objects before the process is terminated?
Not at the moment. I may add destructors for the backend_device and backend_reg objects in the future, but these would still rely on a static destructor being called normally when the application exits. I understand that static destructors can be risky due to the order of destruction, but I am not sure why that should be a problem for the Vulkan driver. I would very much prefer to avoid adding a function to shut down ggml unless it is absolutely necessary.
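For illustration, here is a minimal sketch of the static-destructor pattern being discussed; the backend_registry type and release_all_devices function are hypothetical names for this sketch, not the actual ggml API.
// Minimal sketch (hypothetical names): cleanup tied to a static object's destructor.
#include <cstdio>

struct backend_registry {
    void release_all_devices() {
        // In the real backend this would destroy the VkDevice/VkInstance.
        std::puts("releasing devices");
    }
    ~backend_registry() {
        // Runs during static destruction, after main() returns. If the Vulkan
        // driver library has already been unloaded by this point, it is too
        // late to destroy the VkDevice/VkInstance safely.
        release_all_devices();
    }
};

// Destroyed in an order that is hard to control relative to other static
// objects and, on some platforms, relative to library unloads.
static backend_registry g_registry;

int main() { return 0; }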
I had some system problems when updating the driver, but I finally got some results. I still see segmentation faults with the new driver. I haven't tried commit 335f48a yet. The different kinds of segfaults I got:
Here's some more system information if that helps at all:
It's probably not absolutely necessary, since this issue only appears on Nvidia. Their driver should handle this gracefully.
@0cc4m or @RobbyCBennett, can you try adding a call into ggml-vulkan to destroy the VkDevice and VkInstance right before dlclose is called in unload_backend?
I suspect (though I'm not sure) that it's OK to invoke the cleanup from a static destructor in the main executable (or ggml.so?), as long as it runs before ggml-vulkan or the Vulkan driver libraries have been unloaded. Linux doesn't give us a good way to make this entirely self-contained in the Vulkan driver or in ggml-vulkan, so I think some kind of call from ggml is needed.
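As a rough sketch of that suggestion: the loader could look up a cleanup entry point in the backend library and call it while the driver libraries are still mapped. The symbol name ggml_backend_vk_free_instance is illustrative only, not an existing ggml function.
// Hypothetical sketch: destroy the Vulkan objects before dlclose unloads the backend.
#include <dlfcn.h>

void unload_backend(void * handle) {
    typedef void (*cleanup_fn)(void);
    // Look up a cleanup entry point exported by ggml-vulkan (illustrative name).
    cleanup_fn cleanup = (cleanup_fn) dlsym(handle, "ggml_backend_vk_free_instance");
    if (cleanup != nullptr) {
        // Destroys the VkDevice/VkInstance while the driver libraries are still loaded.
        cleanup();
    }
    dlclose(handle);
}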
I don't know this code and I haven't made any commits to this project. I don't see any VkDevice or VkInstance types. Maybe @0cc4m can make this change. |
Here's an idea for a potential workaround: provide a way to specify a preferred backend. In my case, if CUDA is available, use CUDA; otherwise use Vulkan. I don't currently see a way to specify a preferred backend. Maybe it could look like the following.
enum llama_specific_backend_type {
    LLAMA_SPECIFIC_BACKEND_TYPE_CUDA,
    LLAMA_SPECIFIC_BACKEND_TYPE_VULKAN,
    // others...
};
const llama_specific_backend_type PREFERRED_BACKENDS[] = {
    LLAMA_SPECIFIC_BACKEND_TYPE_CUDA,
    LLAMA_SPECIFIC_BACKEND_TYPE_VULKAN,
};
int main()
{
    llama_set_backend(PREFERRED_BACKENDS, sizeof(PREFERRED_BACKENDS) / sizeof(PREFERRED_BACKENDS[0]));
}
You can set the devices that you want to use in llama_model_params::devices.
I don't have any crashes on CUDA, so selecting CUDA instead of Vulkan at runtime would prevent crashing in Vulkan with Nvidia. It wouldn't actually fix Vulkan crashing. It would just be a workaround. |
If you build with GGML_BACKEND_DL, the backends are loaded dynamically at runtime, so you can choose which ones to load.
I looked into both options and went with setting the devices directly. Here's a snippet of my workaround:
// ... create the params
#ifdef __linux__
static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
if (sDevice != nullptr) {
static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
params.devices = sDevices;
}
#endif
// ... use the params
That should work, but if you don't intend to use the Vulkan backend at all, you can avoid loading it entirely by building with dynamic backend loading (GGML_BACKEND_DL) and only loading the backends you want. Eventually this will become the standard in all the llama.cpp binary distributions.
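For illustration, a rough sketch of that approach, assuming a build with GGML_BACKEND_DL=ON; the library path below is an assumption and depends on where the backend shared objects were installed.
// Sketch: with dynamically loaded backends, only the backends that are
// explicitly loaded get initialized, so the Vulkan instance is never created.
#include "ggml-backend.h"
#include <stdio.h>

int main(void) {
    // Path is an assumption; point it at the ggml CUDA backend library.
    ggml_backend_reg_t cuda = ggml_backend_load("./libggml-cuda.so");
    if (cuda == NULL) {
        puts("CUDA backend could not be loaded");
    }
    return 0;
}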
I still intend to use the Vulkan backend to support non-CUDA hardware like AMD. I'll keep that in mind. Thank you. |
I'll look into it soon; I've been busy with #10597.
I've borrowed a Linux system and reproduced this locally; I'll try to put together a fix.
Unfortunately, I've been unable to reproduce this again despite running it for the rest of the day; I only ever saw it that one time, so I'm not sure this system will be very helpful for testing. In the meantime, I looked at the destruction order on Windows. It looks like the Vulkan driver gets unloaded before any static destructors run in ggml, so by then it's too late to do any cleanup. So I don't think we can handle this automatically from, say, ~ggml_backend_registry.
@RobbyCBennett Can you try #10989? For me that fixed the segfault. |
With aa014d7 I get a consistent crash on that same Linux system whenever the Vulkan backend is available in the test program. This even happens if I only use the CUDA device. Stack trace with Vulkan (caused by the destructor):
That's concerning. Do you have example code that triggers this crash? |
Yes. Here's my original example, with the addition of switching to the CUDA device when it's available:
#include <stdio.h>
#include "llama.h"
static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}
int main(int argc, char **argv)
{
    llama_log_set(handleLog, 0);
    struct llama_model_params params = llama_model_default_params();
    // Only use CUDA if it's available
    static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
    if (sDevice != nullptr) {
        puts("Switching to CUDA");
        static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
        params.devices = sDevices;
    }
    else {
        puts("Not using CUDA");
    }
    char path[] = "/your-path-to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
    struct llama_model *model = llama_load_model_from_file(path, params);
    llama_free_model(model);
    return 0;
}
Name and Version
library 531cb1c (gguf-v0.4.0-2819-g531cb1c2)
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
No response
Problem description & steps to reproduce
The crashes below were captured with the gdb debugger. Simple program:
Shell script to run the program several times:
First Bad Commit
No response
Relevant log output
GDB output from crash caused by /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
GDB output from crash with unknown cause