
TypeError: DocumentIntelligenceClientOperationsMixin.begin_analyze_document() missing 1 required positional argument: 'body' when trying to use the AzureAIDocumentIntelligenceLoader with the bytes_source parameter #28948

hiroci opened this issue Dec 28, 2024 · 0 comments
Labels: 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

The following code raises a TypeError ("missing 1 required positional argument: 'body'") when the bytes_source parameter is used:

endpoint = ""
key = ""
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, mode='single',
    bytes_source=b'%PDF-1.7\n...%',
)

loader.load()
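
One possible interim workaround (a sketch, not verified): write the bytes to a temporary file and load it through the file_path parameter, which avoids the parse_bytes code path. The temporary-file handling below is illustrative only; endpoint and key are the same placeholders as above.

import tempfile

from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

pdf_bytes = b'%PDF-1.7\n...%'

# Write the in-memory bytes to a temporary file so the loader can be
# pointed at a path instead of bytes_source.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(pdf_bytes)
    tmp_path = tmp.name

loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=endpoint, api_key=key, mode='single',
    file_path=tmp_path,
)
documents = loader.load()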

The error seems to be in the parse_bytes function of /langchain_community/document_loaders/parsers/doc_intelligence.py, line 116.

All of the other parsers in this file pass the second argument to self.client.begin_analyze_document positionally, without naming it.

Example of a working parser:

def parse_url(self, url: str) -> Iterator[Document]:
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    poller = self.client.begin_analyze_document(
        self.api_model,
        AnalyzeDocumentRequest(url_source=url),
        # content_type="application/octet-stream",
        output_content_format="markdown" if self.mode == "markdown" else "text",
    )
    result = poller.result()
    ...

Parser that does NOT work:

def parse_bytes(self, bytes_source: bytes) -> Iterator[Document]:
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    poller = self.client.begin_analyze_document(
        self.api_model,
        analyze_request=AnalyzeDocumentRequest(bytes_source=bytes_source),
        # content_type="application/octet-stream",
        output_content_format="markdown" if self.mode == "markdown" else "text",
    )

The parse_bytes function does not work properly: the second argument should be passed as body=... instead of analyze_request=..., or simply passed positionally without a keyword.
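
For reference, a sketch of what a corrected parse_bytes could look like if the request is passed positionally, mirroring the working parse_url above (not a confirmed patch; the rest of the method is elided as in the snippets above):

def parse_bytes(self, bytes_source: bytes) -> Iterator[Document]:
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    # Passing the request positionally binds it to the SDK's required
    # second argument, the same way the working parse_url parser calls it.
    poller = self.client.begin_analyze_document(
        self.api_model,
        AnalyzeDocumentRequest(bytes_source=bytes_source),
        # content_type="application/octet-stream",
        output_content_format="markdown" if self.mode == "markdown" else "text",
    )
    result = poller.result()
    ...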

Error Message and Stack Trace (if applicable)

File "/home/projects/intelligent_chat-be/server/routers/v1/conversation/file_loader.py", line 114, in _load_azure
document = loader.load()
^^^^^^^^^^^^^
File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_core/document_loaders/base.py", line 31, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/doc_intelligence.py", line 105, in lazy_load
yield from self.parser.parse_bytes(self.bytes_source)
File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/langchain_community/document_loaders/parsers/doc_intelligence.py", line 116, in parse_bytes
poller = self.client.begin_analyze_document(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/projects/intelligent_chat-be/.venv/lib/python3.12/site-packages/azure/core/tracing/decorator.py", line 94, in wrapper_use_tracer
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
TypeError: DocumentIntelligenceClientOperationsMixin.begin_analyze_document() missing 1 required positional argument: 'body'

Description

I'm trying to use the Azure Document Intelligence loader from LangChain to process a sequence of bytes.

System Info

System Information

OS: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python Version: 3.12.8 (main, Dec 4 2024, 08:54:12) [GCC 11.4.0]

Package Information

langchain_core: 0.3.28
langchain: 0.3.13
langchain_community: 0.3.13
langsmith: 0.2.4
langchain_openai: 0.2.14
langchain_qdrant: 0.2.0
langchain_text_splitters: 0.3.4
langgraph_sdk: 0.1.48

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.11.11
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
fastembed: Installed. No version info available.
httpx: 0.27.2
httpx-sse: 0.4.0
jsonpatch: 1.33
langsmith-pyo3: Installed. No version info available.
numpy: 2.1.2
openai: 1.58.1
orjson: 3.10.12
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.6.1
PyYAML: 6.0.2
qdrant-client: 1.12.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.36
tenacity: 9.0.0
tiktoken: 0.8.0
typing-extensions: 4.12.
