
TypeError: DocumentIntelligenceClientOperationsMixin.begin_analyze_document() missing 1 required positional argument: 'body' when trying to use the AzureAIDocumentIntelligenceLoader with the bytes_source parameter #28948

hiroci opened this issue Dec 28, 2024 · 0 comments
Labels: 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

The following code raises a TypeError ("missing 1 required positional argument: 'body'") when the bytes_source parameter is used:

endpoint = ""
key = ""
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, mode='single',
    bytes_source=b'%PDF-1.7\n...%',
)

loader.load()
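
One possible interim workaround (a sketch, not verified): write the bytes to a temporary file and load it through the file_path parameter, which avoids the parse_bytes code path. The temporary-file handling below is illustrative only; endpoint and key are the same placeholders as above.

import tempfile

from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

pdf_bytes = b'%PDF-1.7\n...%'

# Write the in-memory bytes to a temporary file so the loader can be
# pointed at a path instead of bytes_source.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(pdf_bytes)
    tmp_path = tmp.name

loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, mode='single',
    file_path=tmp_path,
)
documents = loader.load()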

The error seems to be in the parse_bytes function of /langchain_community/document_loaders/parsers/doc_intelligence.py, line 116.

All of the other parsers in this file pass the second argument to self.client.begin_analyze_document positionally, without naming it.

Example of a working parser:

def parse_url(self, url: str) -> Iterator[Document]:
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    poller = self.client.begin_analyze_document(
        self.api_model,
        AnalyzeDocumentRequest(url_source=url),
        # content_type="application/octet-stream",
        output_content_format="markdown" if self.mode == "markdown" else "text",
    )
    result = poller.result()
    ...

Parser that does NOT work:

def parse_bytes(self, bytes_source: bytes) -> Iterator[Document]:
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    poller = self.client.begin_analyze_document(
        self.api_model,
        analyze_request=AnalyzeDocumentRequest(bytes_source=bytes_source),
        # content_type="application/octet-stream",
        output_content_format="markdown" if self.mode == "markdown" else "text",
    )

The parse_bytes function does not work properly: the second argument should be passed as body=... instead of analyze_request=..., or simply passed positionally without a keyword.
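
For reference, a sketch of what a corrected parse_bytes could look like if the request is passed positionally, mirroring the working parse_url above (not a confirmed patch; the rest of the method is elided as in the snippets above):

def parse_bytes(self, bytes_source: bytes) -> Iterator[Document]:
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    # Passing the request positionally binds it to the SDK's required
    # second argument, the same way the working parse_url parser calls it.
    poller = self.client.begin_analyze_document(
        self.api_model,
        AnalyzeDocumentRequest(bytes_source=bytes_source),
        # content_type="application/octet-stream",
        output_content_format="markdown" if self.mode == "markdown" else "text",
    )
    result = poller.result()
    ...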

Error Message and Stack Trace (if applicable)

File "/home/projects/intelligent_chat-be/server/routers/v1/conversation/file_loader.py", line 114, in _load_azure
document = loader.load()
^^^^^^^^^^^^^
File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_core/document_loaders/base.py", line 31, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/doc_intelligence.py", line 105, in lazy_load
yield from self.parser.parse_bytes(self.bytes_source)
File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/parsers/doc_intelligence.py", line 116, in parse_bytes
poller = self.client.begin_analyze_document(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
TypeError: DocumentIntelligenceClientOperationsMixin.begin_analyze_document() missing 1 required positional argument: 'body'

Description

I'm trying to use the Azure Document Intelligence loader from LangChain to process a sequence of bytes.

System Info

System Information

OS: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python Version: 3.12.8 (main, Dec 4 2024, 08:54:12) [GCC 11.4.0]

Package Information

langchain_core: 0.3.28
langchain: 0.3.13
langchain_community: 0.3.13
langsmith: 0.2.4
langchain_openai: 0.2.14
langchain_qdrant: 0.2.0
langchain_text_splitters: 0.3.4
langgraph_sdk: 0.1.48

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.11.11
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
fastembed: Installed. No version info available.
httpx: 0.27.2
httpx-sse: 0.4.0
jsonpatch: 1.33
langsmith-pyo3: Installed. No version info available.
numpy: 2.1.2
openai: 1.58.1
orjson: 3.10.12
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.6.1
PyYAML: 6.0.2
qdrant-client: 1.12.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.36
tenacity: 9.0.0
tiktoken: 0.8.0
typing-extensions: 4.12.
