Is there any way to read from memory instead of from a file? #11946
Replies: 5 comments
-
Based on the current implementation, reading from a bytearray or an in-memory file requires extending or modifying the existing reader classes. Here's a conceptual example of how you might implement a method that reads from a bytearray in a hypothetical InMemoryReader:

    from io import BytesIO
    from typing import List, Optional

    from llama_index.core.readers.base import BaseReader
    from llama_index.core.schema import Document


    class InMemoryReader(BaseReader):
        def read_from_bytearray(
            self, data: bytearray, metadata: Optional[dict] = None
        ) -> List[Document]:
            # Convert the bytearray to a file-like object
            file_like_object = BytesIO(data)
            # Placeholder for the actual processing logic, which varies by file format;
            # here the content is assumed to be UTF-8 text
            text = file_like_object.read().decode("utf-8")
            # Create and return a Document object
            doc = Document(text=text, metadata=metadata or {})
            return [doc]


    # Example usage
    data = bytearray(b"Example content of the file in memory")
    reader = InMemoryReader()
    documents = reader.read_from_bytearray(data)

This example is a simplified illustration. The actual implementation depends on the specific file formats you need to support and on how their respective reader classes in LlamaIndex process files, so you would need to adapt the file processing logic accordingly. Please note that this is a conceptual example and might require adjustments for your use case and file formats. The source code of the relevant reader classes is a useful reference.
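The decoding step in the example above assumes a bytearray that holds valid UTF-8. As a rough sketch only, the helper below (to_text is a hypothetical name, not part of LlamaIndex) accepts raw bytes as well as binary file-like objects such as BytesIO, and replaces undecodable bytes instead of raising. It uses only the standard library; its result could be passed as the text of a Document.

```python
from io import BytesIO
from typing import BinaryIO, Union

# Hypothetical alias for the kinds of in-memory input this sketch accepts.
InMemoryData = Union[bytes, bytearray, memoryview, BinaryIO]


def to_text(data: InMemoryData, encoding: str = "utf-8") -> str:
    """Return decoded text from raw bytes or a binary file-like object."""
    if isinstance(data, (bytes, bytearray, memoryview)):
        raw = bytes(data)
    else:
        # File-like object: rewind first so an already-consumed stream is read in full.
        data.seek(0)
        raw = data.read()
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode(encoding, errors="replace")


# Example usage with an in-memory file object
text = to_text(BytesIO(b"Example content of the file in memory"))
```

Binary formats such as PDF would still need a format-specific parser; this only covers content that is already plain text.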
-
Did you figure it out? I'm trying to read and process files from the Streamlit file uploader, which gives a BytesIO object...
-
Yes, how do you use the LlamaIndex loaders with files uploaded through Streamlit's file uploader? It just returns the bytes, not the file path.
-
I worked around this by dumping uploaded files into a temporary folder that gets deleted immediately afterwards:

    import os
    import uuid


    # Create a temporary folder to store in-memory files in, which is removed after use.
    # Takes a dictionary of the form {filename: bytes} so it also works for non-Streamlit applications.
    # Note that this removes all file metadata.
    class TempDir:
        def __init__(self, files):
            # This assumes a Unix-like filesystem.
            # Consider using the tempfile module to make it cross-platform.
            self.tmpdir = os.path.join("/tmp/upload/", str(uuid.uuid4()))
            self.files = files

        def __enter__(self):
            os.makedirs(self.tmpdir)
            for filename, file_bytes in self.files.items():
                file_path = os.path.join(self.tmpdir, filename)
                with open(file_path, "wb") as f:
                    f.write(file_bytes)
            return self.tmpdir

        def __exit__(self, exc_type, exc_value, exc_traceback):
            for filename in self.files.keys():
                file_path = os.path.join(self.tmpdir, filename)
                os.remove(file_path)
            os.rmdir(self.tmpdir)
            # Returning False lets any exception from the with block propagate
            return False

Use it like this:

    from llama_index.core import SimpleDirectoryReader

    file_dict = {file.name: file.getvalue() for file in uploaded_files}
    with TempDir(file_dict) as tempdir:
        reader = SimpleDirectoryReader(input_dir=tempdir)
        # Load inside the with block, before the folder is deleted
        documents = reader.load_data()

This is probably the most painless way to go about it, but I would definitely prefer a canonical solution to this problem.
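The comment in the class above suggests the tempfile module for portability. A minimal sketch of that variant follows, assuming the same {filename: bytes} input; temp_upload_dir is a hypothetical name introduced here for illustration. It relies on tempfile.TemporaryDirectory to pick a platform-appropriate location and to clean up on exit, even if an exception is raised.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temp_upload_dir(files: Dict[str, bytes]) -> Iterator[str]:
    """Write {filename: bytes} into a fresh temporary directory and yield its path."""
    with tempfile.TemporaryDirectory() as tmpdir:
        for filename, file_bytes in files.items():
            # basename() drops any directory components from an uploaded name.
            file_path = os.path.join(tmpdir, os.path.basename(filename))
            with open(file_path, "wb") as f:
                f.write(file_bytes)
        # The directory and its contents are removed once the caller's block ends.
        yield tmpdir


# Example usage: the files exist only inside the with block.
with temp_upload_dir({"note.txt": b"hello"}) as tmpdir:
    names = sorted(os.listdir(tmpdir))
```

As with the TempDir class, the SimpleDirectoryReader(input_dir=tmpdir) call and the loading step would need to happen inside the with block, before the directory is deleted.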
-
The
-
Currently, SimpleDirectoryReader allows us to read from a local filepath.
However, in some applications you might want to take a bytearray of a file that's already in memory and have LlamaIndex read it directly. The alternative would be to save the data to a file, use SimpleDirectoryReader, then delete the file. Is there any way to achieve this?