How to trace cost for AzureChatOpenAI with a custom model #28949
-
To ensure accurate cost tracing for AzureChatOpenAI when using a custom model with structured output and a traceable decorator, mark the traced function as an LLM run (run_type="llm"), set the ls_provider and ls_model_name metadata to match the custom model you registered on LangSmith, and enable stream_usage so token counts are recorded on streamed responses.

Here's an example of how you can implement these steps:

```python
import os
from typing import List

from langchain_core.prompts import PromptTemplate
from langchain_openai import AzureChatOpenAI
from langsmith import traceable
from pydantic import BaseModel, Field


class Citation(BaseModel):
    # Define your citation fields here
    pass


class AnnotatedAnswer(BaseModel):
    # Define your structured output model here
    citations: List[Citation] = Field(default_factory=list)


def format_docs(docs):
    # Join retrieved documents into a single context string
    return "\n\n".join(doc.page_content for doc in docs)


@traceable(
    run_type="llm",
    metadata={"ls_provider": "azure", "ls_model_name": "azure_openai_4o"},
)
def annotate_answer(answer, docs) -> AnnotatedAnswer:
    llm = AzureChatOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
        streaming=True,
        stream_usage=True,  # include token usage on streamed responses
    )
    formatted_docs = format_docs(docs)
    prompt = PromptTemplate(
        template="""...""",  # your prompt template with {answer} and {context}
        input_variables=["answer", "context"],
    )
    # include_raw=True returns both the raw AIMessage and the parsed model,
    # so usage_metadata stays accessible after structured-output parsing
    chain = prompt | llm.with_structured_output(AnnotatedAnswer, include_raw=True)
    response = chain.invoke({"answer": answer, "context": formatted_docs})

    # Access usage metadata for cost tracing
    usage_metadata = response["raw"].usage_metadata
    print(f"Token usage: {usage_metadata}")
    return response["parsed"]
```

By following these steps, you should be able to trace costs accurately, considering both the function's signature and the LLM's output tokens [1].
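A quick usage sketch (the answer string and document below are placeholders for illustration): invoking the decorated function produces an LLM-typed run whose token counts LangSmith can price against the registered model.

```python
from langchain_core.documents import Document

# Placeholder inputs, for illustration only
docs = [Document(page_content="Rayleigh scattering favors shorter wavelengths.")]
result = annotate_answer("The sky appears blue because of Rayleigh scattering.", docs)
print(result.citations)
```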
-
I created a LangGraph graph containing a node that calls a function which uses AzureChatOpenAI with an OpenAI model hosted on my Azure AI Studio, in a chain as defined below:
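For illustration, a chain of that shape might look like this (the deployment name and prompt are assumptions, not the exact code):

```python
import os

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import AzureChatOpenAI

# Assumed deployment and prompt, for illustration only
llm = AzureChatOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_deployment="gpt-4o",  # hypothetical deployment name
)
prompt = ChatPromptTemplate.from_template(
    "Answer the question using the context.\n\nContext: {context}\n\nQuestion: {question}"
)
chain = prompt | llm
```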
I had been tracing my runs on LangSmith by adding LANGCHAIN_API_KEY and setting LANGCHAIN_TRACING_V2 to true, and I get full traces for my LangGraph chain.
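For reference, that tracing setup amounts to this (a minimal sketch; substitute your own API key):

```python
import os

# Enable LangSmith tracing for all LangChain/LangGraph runs
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
```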
My issue is that I don't get any information about the cost: the cost column is empty. So I did some research and added a new model on the LangSmith portal with the following params:
Then I added the traceable decorator to my function. So now my function looks like this:
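In outline (signature and body simplified here; the metadata values must match the model registered on the portal):

```python
from langsmith import traceable

@traceable(
    run_type="llm",
    metadata={"ls_provider": "azure", "ls_model_name": "azure_openai_4o"},
)
def annotate_answer(answer, docs):
    ...  # body unchanged, calls AzureChatOpenAI as before
```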
Now when I log in to LangSmith I can see the cost for the node that calls this function.
My concern is whether this is the correct way to do tracing for AzureChatOpenAI: first, because I use this chat model with structured output, and second, because I have another completion node that calls the same model but returns a dictionary. I am wondering whether the traceable decorator takes my function's signature into account, and whether it treats the function's return value as the output tokens, in which case it would calculate completion cost for the entire function output rather than for the actual LLM output.
What's the best way to do cost tracing in this case?