Feat/save document converter result #157

Open: wants to merge 6 commits into `main`
README.md: 1 addition, 0 deletions

````diff
@@ -52,6 +52,7 @@ from markitdown import MarkItDown
 md = MarkItDown()
 result = md.convert("test.xlsx")
 print(result.text_content)
+result.save("test.md")
 ```
 
 To use Large Language Models for image descriptions, provide `llm_client` and `llm_model`:
````
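The README line above writes the result with the default UTF-8 encoding. The code below is a minimal sketch of the round trip with a non-default encoding. It uses a stand-in `DocumentConverterResult` class (an assumption for illustration, not imported from markitdown) whose `save` method has the same body as the one added in this PR, so it runs without the package or a `test.xlsx` file.

```python
import os
import tempfile


class DocumentConverterResult:
    """Stand-in mirroring the class in this PR; not imported from markitdown."""

    def __init__(self, title=None, text_content=""):
        self.title = title
        self.text_content = text_content

    def save(self, file_path, encoding="utf-8"):
        # Same body as the save() method added in src/markitdown/_markitdown.py
        with open(file_path, "w", encoding=encoding) as f:
            f.write(self.text_content)


result = DocumentConverterResult(text_content="| Name | Café |\n|---|---|\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.md")
    result.save(path, encoding="utf-16")  # non-default encoding
    # The file has to be read back with the encoding it was written in
    with open(path, encoding="utf-16") as f:
        round_trip = f.read()

print(round_trip == result.text_content)
```

`utf-16` is only an example value here; any codec name accepted by Python's `open()` works the same way.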
src/markitdown/__main__.py: 7 additions, 2 deletions

```diff
@@ -57,6 +57,12 @@ def main():
         "--output",
         help="Output file name. If not provided, output is written to stdout.",
     )
+    parser.add_argument(
+        "-e",
+        "--encoding",
+        help="Encoding of the output file. Defaults to utf-8.",
+        default="utf-8",
+    )
     args = parser.parse_args()
 
     if args.filename is None:
@@ -72,8 +78,7 @@ def main():
 def _handle_output(args, result: DocumentConverterResult):
     """Handle output to stdout or file"""
     if args.output:
-        with open(args.output, "w", encoding="utf-8") as f:
-            f.write(result.text_content)
+        result.save(args.output, encoding=args.encoding)
     else:
         print(result.text_content)
 
```
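To see how the new `-e`/`--encoding` option reaches `save`, here is a trimmed-down sketch of the CLI flow. It reproduces only the `-o` and `-e` options from the diff and replaces the real conversion with a hypothetical `StandInResult` object, so it is not the full `markitdown` entry point.

```python
import argparse
import os
import tempfile


class StandInResult:
    """Hypothetical stand-in for DocumentConverterResult with the PR's save()."""

    def __init__(self, text_content):
        self.text_content = text_content

    def save(self, file_path, encoding="utf-8"):
        with open(file_path, "w", encoding=encoding) as f:
            f.write(self.text_content)


def build_parser():
    # Only the two output-related options from the diff are reproduced here
    parser = argparse.ArgumentParser(prog="markitdown")
    parser.add_argument(
        "-o",
        "--output",
        help="Output file name. If not provided, output is written to stdout.",
    )
    parser.add_argument(
        "-e",
        "--encoding",
        help="Encoding of the output file. Defaults to utf-8.",
        default="utf-8",
    )
    return parser


def _handle_output(args, result):
    """Handle output to stdout or file (same logic as the patched CLI)."""
    if args.output:
        result.save(args.output, encoding=args.encoding)
    else:
        print(result.text_content)


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.md")
    args = build_parser().parse_args(["-o", out, "-e", "latin-1"])
    _handle_output(args, StandInResult("# Résumé\n"))
    # Read raw bytes to confirm the requested encoding was applied
    with open(out, "rb") as f:
        raw = f.read()

print(args.encoding, raw)
```

In this flow `--encoding` only affects file output; when `-o` is omitted, the text goes through `print()` and uses whatever encoding the interpreter's stdout is configured with.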
src/markitdown/_markitdown.py: 11 additions, 0 deletions

```diff
@@ -147,6 +147,17 @@ def __init__(self, title: Union[str, None] = None, text_content: str = ""):
         self.title: Union[str, None] = title
         self.text_content: str = text_content
 
+    def save(self, file_path: str, encoding: str = "utf-8") -> None:
+        """
+        Save the converted document result `text_content` to a file.
+
+        params:
+            file_path: The path to save the document result to.
+            encoding: The encoding to use when writing the document.
+        """
+        with open(file_path, "w", encoding=encoding) as f:
+            f.write(self.text_content)
+
```
Comment on lines +150 to +160
Contributor


It serves the same purpose as #116. Could you refactor the code to use a single function for better reusability?

Contributor Author


Sure, @l-lumin.

I think it makes more sense to put the function that saves the result to a file inside `DocumentConverterResult`.

  • I refactored the CLI to use this function.
  • I also added an `--encoding` option to the CLI arguments.

Contributor


Thank you. I think so too. Looks good.


```diff
 
 class DocumentConverter:
     """Abstract superclass of all DocumentConverters."""
```
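The thread above settles on a single `save` method that the CLI calls instead of its own inline `open`/`write`. One way to sanity-check that the refactor keeps the default CLI output unchanged is to write the same text through both paths and compare the bytes. This sketch uses a stand-in copy of the class and method (an assumption; it does not import markitdown).

```python
import os
import tempfile


class DocumentConverterResult:
    """Stand-in mirroring the class in this PR; not imported from markitdown."""

    def __init__(self, title=None, text_content=""):
        self.title = title
        self.text_content = text_content

    def save(self, file_path, encoding="utf-8"):
        # Same body as the save() method added in this PR
        with open(file_path, "w", encoding=encoding) as f:
            f.write(self.text_content)


result = DocumentConverterResult(text_content="# Título\n\nLínea con acentos\n")

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old.md")
    new_path = os.path.join(tmp, "new.md")

    # Pre-PR CLI path: inline open/write with hard-coded UTF-8
    with open(old_path, "w", encoding="utf-8") as f:
        f.write(result.text_content)

    # Post-PR CLI path: delegate to save() with the default --encoding value
    result.save(new_path, encoding="utf-8")

    with open(old_path, "rb") as f_old, open(new_path, "rb") as f_new:
        same_bytes = f_old.read() == f_new.read()

print(same_bytes)
```

Because both paths open the file in text mode with the same encoding, they also apply the same platform newline translation, which is why the bytes match.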