Multithreading and History #899
JosephCatrambone
started this conversation in
General
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Background:
We play pretty fast and loose with Guard.history. There are 27 direct references to Guard.history and 69 dynamic references to history in the main Guardrails repo. Other repositories also modify history directly, appending Calls and iterating over the values. Thread safety has not been a consideration thus far, and when a single guard is shared across multiple threads we run the risk of interweaving calls or injecting odd values.
What are Our Options
Option M: for Mutex. We could lift the history stack into a private method and provide accessor methods. This is good practice in some senses, but it's a significant change and it's not clear if this is even possible, given that
guardrails.Guard
is a subclass ofguardrails_api_client.Guard
(an autogenerated class). Furthermore, Guard needs to be able to be serialized, which almost certainly rules out using a mutex unless we find some kind of special ritual to strip off and re-add a mutex.Option A: for API. We could change the guardrails_api client and add mutex protections alongside the guards themselves at a top level. This is not the cleanest operation and won't even provide help if we find ourselves handing off direct references to the guards. It has the benefit of being more straightforward than Option M, but it means when people are using guards in a multithreaded context they might still encounter the problem if they don't have mutex guards in place. (Side note: If gunicorn is run with --num-workers>1 then threading is not a problem. This is "the right way" to run gunicorn. The issue only appears when --num-threads>1.)
Option Q: for queue. The word 'queue' is one voiced letter followed by four silent letters waiting their turn. We could restructure the very concept of history in a guard and split it off into something standalone to which we append. This is perhaps a more significant change than Option M, and depending on how integrated the queue is we might find it difficult to cover the myriad ways that Guardrails is used. As a matter of personal preference, I don't like it when a piece of software spins up a server on my machine, much less a library. We would have to be very deliberate in our choice or construction of the queue, making sure it had little to no resource consumption. We still want a Guard to be able to be initialized and used in an interactive way.
This list is incomplete.
Still To Do:
We should quantify the risk of multithreaded usage. How likely is it that a guard shared across two threads will have an issue with concurrent modification of history? Can we detect when a guard is used from multiple threads? Do we want to detect this and allow it?
Beta Was this translation helpful? Give feedback.
All reactions