Misc. bug: Inconsistent Vulkan segfault #10528

Open · RobbyCBennett opened this issue Nov 26, 2024 · 26 comments

@RobbyCBennett commented Nov 26, 2024

Name and Version

library 531cb1c (gguf-v0.4.0-2819-g531cb1c2)

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

No response

Problem description & steps to reproduce

  1. Compile the program below
  2. Run it a thousand times; it will probably hit a segmentation fault at least once. I used the gdb debugger to capture backtraces.

Simple program:

#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv)
{
  llama_log_set(handleLog, 0);

  char path[] = "/your-path-to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
  struct llama_model_params params = llama_model_default_params();
  struct llama_model *model = llama_load_model_from_file(path, params);
  llama_free_model(model);

  return 0;
}

Shell script to run the program several times:

#! /bin/sh

PROGRAM=llama-bug
LOG=debug.log
COUNT=1000

rm -f "$LOG"

for i in `seq 1 $COUNT`; do
	gdb -batch -ex run -ex bt "$PROGRAM" >> "$LOG" 2>> "$LOG"
done

First Bad Commit

No response

Relevant log output

GDB output from crash caused by /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
ggml_vulkan: Compiling shaders..............................Done!

Thread 3 "[vkrt] Analysis" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe35a8640 (LWP 1789333)]
0x00007fffeff1cb00 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
#0  0x00007fffeff1cb00 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
#1  0x00007ffff0246f1d in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
#2  0x00007fffeff1fcfa in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
#3  0x00007ffff7a1dac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#4  0x00007ffff7aaf850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

GDB output from crash with unknown cause

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
ggml_vulkan: Compiling shaders..............................Done!

Thread 3 "[vkrt] Analysis" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe35a8640 (LWP 1750868)]
0x00007fffeff1cb00 in ?? ()
#0  0x00007fffeff1cb00 in ?? ()
#1  0x000000006746139a in ?? ()
#2  0x0000000002a1b0d8 in ?? ()
#3  0x0000000067461399 in ?? ()
#4  0x00000000000e6817 in ?? ()
#5  0x00005555561076c0 in ?? ()
#6  0x00007fffeff1ef10 in ?? ()
#7  0x0000000000000000 in ?? ()
@jeffbolznv
Collaborator

This might be a driver bug. Can you try the latest drivers?

I think there's also a chance this could be caused by the ggml-vulkan backend not destroying the VkDevice/VkInstance before the process is terminated. That's something we should look into fixing.

@slaren
Collaborator

slaren commented Nov 27, 2024

I think there's also a chance this could be caused by the ggml-vulkan backend not destroying the VkDevice/VkInstance before the process is terminated.

We may need to add a function to destroy a backend and release all its resources; otherwise, calling ggml_backend_unload to unload a dynamically loaded backend may result in a leak.

@jeffbolznv
Collaborator

IMO issue #10420 is also a question about the object model for ggml backends, i.e. should it be possible for each thread to have its own VkInstance/VkDevice and what ggml/llama object should their lifetime be tied to.

@0cc4m
Collaborator

0cc4m commented Nov 27, 2024

This might be a driver bug. Can you try the latest drivers?

I think there's also a chance this could be caused by the ggml-vulkan backend not destroying the VkDevice/VkInstance before the process is terminated. That's something we should look into fixing.

I think this is a bug I often observe on Linux, but only on Nvidia. It happens when exiting the application, so it looks like a clean-up issue. I haven't looked into it yet.

IMO issue #10420 is also a question about the object model for ggml backends, i.e. should it be possible for each thread to have its own VkInstance/VkDevice and what ggml/llama object should their lifetime be tied to.

I've tried restructuring the global state in the way I think CUDA handles it in the background, but it's not done yet. It should be possible to keep the device and instance global as long as all temporary variables stay attached to the backend instance. Command buffers are probably not the only thing where that hasn't been implemented yet.

@0cc4m
Collaborator

0cc4m commented Nov 29, 2024

This is how the crash looks for me. It happens in some Nvidia driver thread after all the ggml code has already exited:

Thread 7 "[vkps] Update" received signal SIGSEGV, Segmentation fault.

This is the thread:

* 7    Thread 0x7fffd56006c0 (LWP 683442) "[vkps] Update"   0x00007fffe5401960 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01

This is the stack trace:

#0  0x00007fffe5401960 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01
#1  0x00007fffe57392b4 in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01
#2  0x00007fffe5404dfa in ?? () from /lib/x86_64-linux-gnu/libnvidia-eglcore.so.565.57.01
#3  0x00007ffff729ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#4  0x00007ffff7329c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

@jeffbolznv Do you happen to know what the [vkps] Update thread is? I don't know what is/isn't getting cleaned up in a way to cause the Nvidia driver to segfault. No other driver shows this issue.

Interestingly @RobbyCBennett saw it in one of the other Nvidia threads ([vkrt] Analysis).

Resolving this has gotten a little more important since @LostRuins reported that a crash on exit on Windows in certain cases causes a system crash (BSOD). Might be the same cause.

@0cc4m
Collaborator

0cc4m commented Nov 29, 2024

I hacked together a commit (335f48a) where the devices and Vulkan instance get cleaned up properly (at least I think so; the validation layers didn't print anything), but the Nvidia driver still segfaults.

@jeffbolznv
Collaborator

They're both threads created by the driver. I'm pretty sure they should get shut down when the VkDevice is destroyed.

I hacked together a commit (335f48a) where the devices and Vulkan instance get cleaned up properly

Is this relying on the static destructor for the vk_instance pointer? I think that may happen too late. Is there a hook where we can destroy the objects before the process is terminated?

@slaren
Collaborator

slaren commented Nov 29, 2024

Is there a hook where we can destroy the objects before the process is terminated?

Not at the moment. I may add destructors for the backend_device and backend_reg objects in the future, but these would still rely on a static destructor to be called normally when exiting the application. I understand that static destructors can be risky due to the order of destruction, but I am not sure why that should be a problem for the Vulkan driver. I would very much prefer to avoid adding a function to shut down ggml unless it is absolutely necessary.
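
As a generic C++ illustration (not ggml code) of why the order can matter: statics that live in different shared libraries have no defined destruction order relative to each other, so a static destructor that calls into another library may run after that library's state is already gone.

// Generic C++ sketch, not ggml code. "Library" stands in for a driver/loader
// with global state; "Backend" stands in for code whose static destructor
// calls into it.
#include <cstdio>

struct Library {
    bool alive = true;
    ~Library() { alive = false; }
};

struct Backend {
    Library *lib;
    ~Backend() {
        // Fine within one translation unit (reverse construction order), but
        // across shared libraries nothing guarantees the library is still
        // alive, or even still mapped, at this point.
        std::puts(lib->alive ? "clean shutdown" : "too late: library already gone");
    }
};

static Library g_lib;
static Backend g_backend{&g_lib};

int main() { return 0; }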

@RobbyCBennett
Author

I had some system problems when updating the driver, but I finally got some results. I still see segmentation faults with the new driver. I haven't tried the commit 335f48a yet.

The different types of seg faults I got:

  • Thread 5 "[vkps] Update" received signal SIGSEGV, Segmentation fault.
  • Thread 3 "[vkrt] Analysis" received signal SIGSEGV, Segmentation fault.
  • Thread 5 received signal SIGSEGV, Segmentation fault.

Here's some more system information if that helps at all:

  • Ubuntu 22 before and Ubuntu 24 now
  • NVIDIA RTX 4090

@0cc4m
Collaborator

0cc4m commented Nov 29, 2024

I would very much prefer to avoid adding a function to shut down ggml unless it is absolutely necessary.

It's probably not absolutely necessary, since this issue only appears on Nvidia. Their driver should handle this gracefully.

@jeffbolznv
Collaborator

@0cc4m or @RobbyCBennett, can you try adding a call into ggml-vulkan to destroy the VkDevice and VkInstance right before dlclose is called in unload_backend?

I may add destructors for the backend_device and backend_reg objects in the future, but these would still rely on a static destructor to be called normally when exiting the application.

I suspect (not sure) it's OK to invoke the cleanup from a static destructor in the main executable (or ggml.so?), as long as it runs before ggml-vulkan or the Vulkan driver libraries have been unloaded.

Linux doesn't give a good way to have this entirely self-contained in the Vulkan driver or in ggml-vulkan. I think some kind of call from ggml is needed.
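
As a rough sketch of the idea (the ggml_vk_cleanup name is hypothetical; no such public hook exists today), the cleanup could be driven by a guard object whose destructor runs while ggml-vulkan and the driver libraries are still loaded:

// Hypothetical sketch only: ggml_vk_cleanup() is a made-up name standing in for
// whatever hook ggml-vulkan would expose to destroy its VkDevice/VkInstance.
// The key point is that it has to run before ggml-vulkan (and the driver
// libraries it pulls in) is unloaded, e.g. right before dlclose in unload_backend.
extern "C" void ggml_vk_cleanup(void); // stubbed below so the sketch compiles

struct vk_cleanup_guard {
    // A static guard in the main executable (or in ggml.so) is destroyed before
    // the dynamic loader unloads the libraries it depends on, which is the
    // ordering this approach relies on.
    ~vk_cleanup_guard() { ggml_vk_cleanup(); }
};

static vk_cleanup_guard g_vk_cleanup_guard;

// Stub for illustration; a real implementation would live in ggml-vulkan.
extern "C" void ggml_vk_cleanup(void) { /* vkDestroyDevice / vkDestroyInstance */ }

int main() { return 0; }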

@RobbyCBennett
Author

I don't know this code and I haven't made any commits to this project. I don't see any VkDevice or VkInstance types. Maybe @0cc4m can make this change.

@RobbyCBennett
Author

Here's an idea for a potential workaround: provide a way to specify a preferred backend. For my example, if CUDA is available, use CUDA; otherwise use Vulkan. I don't currently see a way to specify a preferred backend. Maybe it could look like the following.

enum llama_specific_backend_type {
    LLAMA_SPECIFIC_BACKEND_TYPE_CUDA,
    LLAMA_SPECIFIC_BACKEND_TYPE_VULKAN,
    // others...
};

const llama_specific_backend_type PREFERRED_BACKENDS[] = {
    LLAMA_SPECIFIC_BACKEND_TYPE_CUDA,
    LLAMA_SPECIFIC_BACKEND_TYPE_VULKAN,
};

int main()
{
  llama_set_backend(PREFERRED_BACKENDS, sizeof(PREFERRED_BACKENDS) / sizeof(PREFERRED_BACKENDS[0]));
}

@slaren
Collaborator

slaren commented Dec 2, 2024

You can set the devices that you want to use in llama_model_params::devices, but I don't see how that's related to Vulkan crashing.

@RobbyCBennett
Author

I don't have any crashes on CUDA, so selecting CUDA instead of Vulkan at runtime would prevent crashing in Vulkan with Nvidia. It wouldn't actually fix Vulkan crashing. It would just be a workaround.

@slaren
Collaborator

slaren commented Dec 2, 2024

If you build with GGML_BACKEND_DL enabled, then you can also use ggml_backend_load to load only the backend that you want to use.
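
For example, a minimal sketch (assuming a build with GGML_BACKEND_DL enabled; the shared library names and paths below are assumptions that depend on how the backends were built):

#include <stdio.h>

#include "ggml-backend.h"
#include "llama.h"

int main(void)
{
  // Try the CUDA backend first; fall back to Vulkan only if CUDA is unavailable.
  // The library paths are illustrative and depend on your build layout.
  ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");
  if (reg == NULL) {
    reg = ggml_backend_load("./libggml-vulkan.so");
  }
  if (reg == NULL) {
    fputs("no GPU backend could be loaded\n", stderr);
  }

  // In a GGML_BACKEND_DL build the CPU backend is typically a separate library
  // as well and may need to be loaded the same way.

  // ... continue as before: llama_model_default_params(), llama_load_model_from_file(), ...
  return 0;
}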

@RobbyCBennett
Author

RobbyCBennett commented Dec 2, 2024

I looked into both options and llama_model_params::devices seems to be a good solution for me. Thanks for the help!

Here's a snippet of my workaround:

  // ... create the params
  #ifdef __linux__
    static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
    if (sDevice != nullptr) {
      static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
      params.devices = sDevices;
    }
  #endif
  // ... use the params

@slaren
Collaborator

slaren commented Dec 2, 2024

That should work, but if you don't intend to use the Vulkan backend at all, you can avoid loading it entirely by using GGML_BACKEND_DL and loading the backends dynamically. That should give you better compatibility and use fewer resources. Keep in mind that without it, the CUDA backend will fail to load if the driver is not installed, and that will stop your application from starting entirely.

Eventually this will become the standard in all the llama.cpp binary distributions.

@RobbyCBennett
Author

I still intend to use the Vulkan backend to support non-CUDA hardware like AMD. I'll keep that in mind. Thank you.

@0cc4m
Collaborator

0cc4m commented Dec 3, 2024

I'll look into it soon; I've been busy with #10597.

@jeffbolznv
Collaborator

I've borrowed a Linux system and reproduced this locally; I'll try to put together a fix.

@jeffbolznv
Collaborator

Unfortunately, I've been unable to reproduce this again after running for the rest of the day; I only ever saw it the one time. So I'm not sure this system will be very helpful for testing.

In the meantime, I looked at the destruction order on Windows. It looks like the Vulkan driver gets unloaded before any static destructors run in ggml, so by then it's too late to do any cleanup. So I don't think we can handle this automatically from, say, ~ggml_backend_registry.

@0cc4m
Collaborator

0cc4m commented Dec 29, 2024

@RobbyCBennett Can you try #10989? For me that fixed the segfault.

@RobbyCBennett
Author

With aa014d7 I have a consistent crash if the Vulkan backend is available in the test program on that same Linux system. This even happens if I only use the CUDA device.

Stack trace with Vulkan (caused by the destructor ~vk_instance_t):

Thread 1 "ai_test" received signal SIGSEGV, Segmentation fault.
0x00007fffb6a33de0 in ?? ()
#0  0x00007fffb6a33de0 in ?? ()
#1  0x00007ffff7e0a123 in ?? () from /lib/x86_64-linux-gnu/libvulkan.so.1
#2  0x00007fffee4e95bb in ?? () from /lib/x86_64-linux-gnu/libVkLayer_MESA_device_select.so
#3  0x00007ffff7e1dde5 in vkDestroyInstance () from /lib/x86_64-linux-gnu/libvulkan.so.1
#4  0x000055555594d850 in vk::DispatchLoaderStatic::vkDestroyInstance (pAllocator=0x0, instance=<optimized out>, this=<optimized out>) at /usr/include/vulkan/vulkan.hpp:995
#5  vk::Instance::destroy<vk::DispatchLoaderStatic> (d=..., allocator=..., this=<optimized out>) at /usr/include/vulkan/vulkan_funcs.hpp:94
#6  vk_instance_t::~vk_instance_t (this=<optimized out>, __in_chrg=<optimized out>) at /home/robby/sti/src/lib/llama/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:764
#7  std::default_delete<vk_instance_t>::operator() (this=<optimized out>, __ptr=<optimized out>) at /usr/include/c++/13/bits/unique_ptr.h:99
#8  std::default_delete<vk_instance_t>::operator() (__ptr=<optimized out>, this=<optimized out>) at /usr/include/c++/13/bits/unique_ptr.h:93
#9  std::unique_ptr<vk_instance_t, std::default_delete<vk_instance_t> >::~unique_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/unique_ptr.h:404
#10 0x00007fffee847a66 in __run_exit_handlers (status=0, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:108
#11 0x00007fffee847bae in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:138
#12 0x00007fffee82a1d1 in __libc_start_call_main (main=main@entry=0x5555555f9f20 <main(int, char**)>, argc=argc@entry=1, argv=argv@entry=0x7fffffffe448) at ../sysdeps/nptl/libc_start_call_main.h:74
#13 0x00007fffee82a28b in __libc_start_main_impl (main=0x5555555f9f20 <main(int, char**)>, argc=1, argv=0x7fffffffe448, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe438) at ../csu/libc-start.c:360
#14 0x000055555561fa15 in _start ()

@0cc4m
Collaborator

0cc4m commented Dec 30, 2024

With aa014d7 I have a consistent crash if the Vulkan backend is available in the test program on that same Linux system. This even happens if I only use the CUDA device.

That's concerning. Do you have example code that triggers this crash?

@RobbyCBennett
Author

Yes. Here's my original example with the addition of changing params.devices to use only CUDA.

#include <stdio.h>

#include "llama.h"

static void handleLog(enum ggml_log_level level, const char *text, void *user_data) {}

int main(int argc, char **argv)
{
  llama_log_set(handleLog, 0);

  struct llama_model_params params = llama_model_default_params();

  // Only use CUDA if it's available
  static ggml_backend_device *const sDevice = ggml_backend_dev_by_name("CUDA0");
  if (sDevice != nullptr) {
    puts("Switching to CUDA");
    static ggml_backend_dev_t sDevices[] = {sDevice, nullptr};
    params.devices = sDevices;
  }
  else {
    puts("Not using CUDA");
  }

  char path[] = "/your-path-to/llama.cpp/models/ggml-vocab-llama-bpe.gguf";
  struct llama_model *model = llama_load_model_from_file(path, params);
  llama_free_model(model);

  return 0;
}
