-
From offline discussion: next steps:
-
linking this item here - #681
-
One more thing to consider for option 1 is output standardization. We've discussed output standardization in the past as something we'd like to do -- for option 1, this would require maintaining more of a mapping between LLMs and expected output types.
-
Hi @zsimjee - litellm maintainer here
What are we missing currently? We should be 1:1 compatible with openai. Happy to accelerate any pending work on this from our end.
-
Summary
This document explores problems with how guardrails currently expects LLM parameters to be passed through inputs. It suggests a strategy of standardizing and improving the way guardrails operates with LLM APIs.
Motivation
We need to standardize our approach to calling LLMs: using guardrails with different models is currently confusing, and historically we have not been able to keep up with changing interfaces across the various LLM libraries.
Interfaces
Different LLM libraries construct their clients in ways that differ from what works with guardrails, and this makes it harder to add guardrails to an existing project. An example of this is the new OpenAI 1.x standard. In this standard, users are expected to create a client, then use that client's APIs to make requests to OpenAI. The guardrails way to do this is to not initialize the client, but instead to pass the raw callable pulled off the openai import, a pattern which is not used in any OpenAI docs. The way message history is passed also differs between the two.
OpenAI Client
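For reference, a minimal sketch of the OpenAI 1.x calling convention (model name and message content are placeholders):

```python
# OpenAI 1.x: instantiate a client, then call its chat completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```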
Guardrails Client
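And a rough sketch of the guardrails-style call described above, where the raw callable is pulled off the openai import rather than off an instantiated client (the rail file is a placeholder, and the exact return shape depends on the guardrails version):

```python
# Guardrails style: pass the raw callable from the openai module, plus
# prompt/msg_history kwargs, into Guard.__call__.
import openai
import guardrails as gd

guard = gd.Guard.from_rail("my_spec.rail")  # placeholder rail spec
result = guard(
    openai.chat.completions.create,  # callable off the import, not a client instance
    msg_history=[{"role": "user", "content": "Say hello."}],  # or prompt=...
    model="gpt-3.5-turbo",
    max_tokens=256,
)
```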
Recently, litellm was added to the guardrails project as a way to help standardize this situation. While this did make it easier to deal with different LLM APIs, it still does not have parity with openai.
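For context, the litellm call shape mirrors OpenAI's; a minimal sketch (model string and message are placeholders):

```python
# litellm exposes an OpenAI-compatible completion function across providers.
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",  # the same call shape works for other providers
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```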
Standardizing inputs
The main interface differences between Guardrails and OpenAI are the use of `prompt` and `msg_history`. Guardrails tries to be flexible by allowing either of these fields to be passed to both chat endpoints and regular completion endpoints. The package requires that, in an invocation of `Guard.__call__`, one of these fields is present. While initially a good usability feature, this may now be a cause for confusion. Guardrails also uses `model`, which is fairly standard. Other inputs include `temperature` and `max_tokens`, which are also standard. See Appendix A for a breakdown of how different LLMs accept these parameters.
Detailed design
Guardrails will remove standard support for `prompt`, `msg_history`, and other LLM arguments altogether. Currently, guardrails tries to find these input arguments in whatever form the caller passes them. To do this consistently, guardrails standardizes those fields and tries to map back and forth between these param types and downstream LLM param types. This is what causes a lot of the confusion and compatibility issues.
Using this option, guardrails can avoid these issues of mapping to client-specific params (i.e. get rid of `llm_providers`). Instead, guardrails should keep track of two lists of kwargs (sketched below):
- `guardrails.__call__`-specific keywords (reask, etc.)
- LLM input keywords (`prompt`, `input`, `instruction`, `messages`, `message_history`)
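A minimal sketch of what those two lists might contain (the set names and the specific guardrails kwargs are illustrative, not actual guardrails constants):

```python
# Kwargs that guardrails itself consumes (illustrative names).
GUARDRAILS_KWARGS = {"num_reasks", "reask_prompt", "metadata"}

# Prompt/message-style kwargs that belong to the downstream LLM call.
LLM_INPUT_KWARGS = {"prompt", "input", "instruction", "messages", "message_history"}
```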
Instead of mapping parameter names back and forth, guardrails will always maintain the params passed to the `__call__` function (i.e. closest to the original API) and make requests to the LLM API callable using those values as passed. This changes the call workflow from one that translates user params into guardrails-internal fields and then into provider-specific params, to one that forwards the user's params straight through to the LLM callable.
Specifically, the workflow would follow these steps (see the sketch after this list):
1. The user invokes `__call__`, passing LLM params alongside any `guardrails`-specific args (reask, etc).
2. Guardrails separates out the `guardrails`-specific args and forwards the remaining kwargs to the LLM callable unchanged.
This strategy would confer a few key benefits.
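A rough sketch of those steps, assuming a hypothetical internal helper and the illustrative kwarg set from above:

```python
GUARDRAILS_KWARGS = {"num_reasks", "reask_prompt", "metadata"}  # illustrative, see above

def call_llm(llm_api, **kwargs):
    # Separate guardrails-specific args from LLM args.
    guard_kwargs = {k: v for k, v in kwargs.items() if k in GUARDRAILS_KWARGS}
    llm_kwargs = {k: v for k, v in kwargs.items() if k not in GUARDRAILS_KWARGS}

    # No renaming: prompt / messages / msg_history etc. reach the LLM callable
    # exactly as the caller provided them.
    response = llm_api(**llm_kwargs)
    return guard_kwargs, response
```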
Drawbacks
This project does not have significant drawbacks other than the cost of the phased implementation needed to maintain backwards compatibility. There are a few drawbacks with this specific solution though:
Alternatives
In this alternative option, guardrails standardizes on the LiteLLM style of input handling - accepting `messages` as a parameter. This negates the need for ANY mapping to occur within the guardrails package and simplifies our approach. Instead, we can route all requests directly through LiteLLM.
Pros
Cons:
Adoption Strategy
This may seem like a backwards-incompatible change, but it can be managed in phases: at first, we continue to accept `prompt`, `msg_history`, and `instructions` as input params for the prompt, while also implementing the majority of the passthrough strategy.
How we teach this
Docs must all be updated to reflect the new strategy. It's still advantageous to lean heavily on LiteLLM where we can, and we should try to educate by pointing towards openai first, then litellm for more advanced use cases.