I wrote this application to test OCR (Optical Character Recognition) abilities of the new Anthropic Claude 3 models which are vision capable. Right now it can be used to upload and convert a handwritten or digital document in pdf format to markdown.
The Claude 3 family of models come in 3 different sizes: Opus, Sonnet, and Haiku. All of them support prompting with images and especially Sonnet and Haiku come at a very cheap pricing compared to other vision capable large language models like GPT4-Turbo. The main motivator for creating this demo was a post by Anthropic themselves: Claude 3 Haiku turns thousands of physical documents into structured data
Dependencies are managed by Poetry. See their documentation for installation.
Step 1: Clone the repository
git clone https://github.com/makefinks/note-it.git;
cd note-it
Step 2: Install dependencies and activate poetry environment
poetry install;
poetry shell
Step 3: Start the application
cd note_it/frontend/;
python -m streamlit run upload.py
The quality of the output can vary depending on the type of document. Languages other than english will work but my testing was mostly done with english documents (handwritten and digital). At a certain point when the font or characters are to small the model will refuse to output the full text or anything at all. Furthermore, while the model will try to reconstruct tables and diagrams in markdown syntax the results are often poor.
As with every LLM Approach you need to carefully check the generated output for mistakes.