---
title: QWEN2 VL
emoji: 🍍
colorFrom: blue
colorTo: yellow
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: true
license: creativeml-openrail-m
short_description: Qwen VL 2B
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Welcome to the Qwen2-VL repository, a tool for vision-language processing tasks. The project uses Qwen2-VL models to analyze images and generate descriptive text, combining computer vision and natural language processing in a single application.
Qwen2-VL provides a user-friendly interface for processing images and generating text outputs. The application supports multiple models, each tailored for specific tasks such as OCR, math parsing, and text analogy. The generated outputs can be formatted and exported into various document formats, including PDF and DOCX.
- Model Selection: Choose from a variety of models optimized for different tasks, including OCR, math parsing, and text analogy.
- Image Input: Upload images for analysis and generate descriptive text.
- Text Input: Ask questions or provide instructions related to the image.
- Output Formatting: Customize the output text, including font choice, size, line spacing, and alignment.
- Document Generation: Export the processed image and text into PDF or DOCX formats.
- Streamlined Interface: A clean and intuitive Gradio interface for easy interaction.
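Under the hood, generating text for an image with a Qwen2-VL checkpoint through the Hugging Face transformers library typically follows the pattern below. This is an illustrative sketch rather than the Space's actual app.py; the checkpoint ID `Qwen/Qwen2-VL-2B-Instruct`, the image path, and the prompt are placeholders.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-2B-Instruct"  # placeholder; the Space may load fine-tuned variants

model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

image = Image.open("example.jpg")  # placeholder image path
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the chat prompt, combine it with the image, and generate a reply.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Drop the prompt tokens before decoding so only the model's answer remains.
answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])
```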
To run the Qwen2-VL application locally, follow these steps:

1. Clone the Repository:

   ```bash
   git clone https://github.com/PRITHIVSAKTHIUR/Qwen2-VL.git
   cd Qwen2-VL
   ```

2. Install Dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the Application:

   ```bash
   python app.py
   ```

4. Access the Interface: Open your web browser and navigate to http://127.0.0.1:7860 to access the Gradio interface.
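The vision-language models load faster and respond more quickly on a GPU. Before launching the app, you can check whether PyTorch sees one; this is a minimal, optional check, assuming torch is already installed from requirements.txt.

```python
import torch

# Reports whether a CUDA GPU is visible; without one the app falls back to CPU, which is slower.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```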
Once the interface is open:
- Upload an Image: Use the file uploader to select an image for analysis.
- Select a Model: Choose the appropriate model from the dropdown menu based on your task.
- Ask a Question: Enter a question or instruction related to the image in the textbox.
- Submit: Click the "Submit" button to generate the output text.
- Customize Output: Adjust the font, size, line spacing, and alignment of the output text.
- Generate Document: Click the "Get Document" button to export the image and text into a PDF or DOCX file (see the wiring sketch after this list).
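The sketch below shows one way an interface like this can be wired in Gradio, including DOCX export with python-docx. It is a minimal illustration rather than the Space's actual app.py: `run_model` is a hypothetical stand-in for the model call, the font settings are hard-coded, and PDF export is omitted.

```python
import gradio as gr
from docx import Document
from docx.shared import Inches, Pt

def run_model(image_path, model_name, question):
    # Placeholder: call the selected vision-language model here (see the
    # transformers sketch above) and return its generated text.
    return f"[{model_name}] answer about {image_path}: {question}"

def make_docx(image_path, text, font_name, font_size):
    # Write the uploaded image and the generated text into a .docx file.
    doc = Document()
    doc.add_picture(image_path, width=Inches(4))
    para = doc.add_paragraph(text)
    run = para.runs[0]
    run.font.name = font_name
    run.font.size = Pt(font_size)
    out_path = "output.docx"
    doc.save(out_path)
    return out_path

with gr.Blocks() as demo:
    image = gr.Image(type="filepath", label="Image")
    model_name = gr.Dropdown(
        ["Qwen2VL Base", "Latex OCR", "Math Prase", "Text Analogy Ocrtest"],
        label="Model",
    )
    question = gr.Textbox(label="Question")
    answer = gr.Textbox(label="Output")
    doc_file = gr.File(label="Document")

    gr.Button("Submit").click(run_model, [image, model_name, question], answer)
    gr.Button("Get Document").click(
        lambda img, txt: make_docx(img, txt, "Arial", 12),
        [image, answer],
        doc_file,
    )

demo.launch()
```

A PDF export could follow the same pattern with a PDF library such as reportlab, and in a full version the font, size, spacing, and alignment controls would be passed into make_docx instead of hard-coded values.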
The application includes several examples to help you get started. Click on any example to load the image and pre-fill the question and model selection.
The following models are available for use; a sketch of how these names can map to checkpoints follows the list:
- Qwen2VL Base: General-purpose vision and language model.
- Latex OCR: Optimized for OCR tasks involving LaTeX content.
- Math Prase: Specialized for parsing and solving mathematical problems.
- Text Analogy Ocrtest: Designed for text analogy tasks with OCR.
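One common way to implement the model dropdown is a dictionary that maps the display names above to Hugging Face checkpoint IDs, loading the selected one on demand. This is a hedged sketch: only `Qwen/Qwen2-VL-2B-Instruct` is a known public checkpoint here, and the other repo IDs are placeholders, not necessarily the checkpoints this Space uses.

```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Display name -> checkpoint repo ID. Apart from the base model, the IDs below
# are illustrative placeholders; check app.py for the checkpoints actually used.
MODELS = {
    "Qwen2VL Base": "Qwen/Qwen2-VL-2B-Instruct",
    "Latex OCR": "your-namespace/latex-ocr-2b",           # placeholder
    "Math Prase": "your-namespace/math-prase-2b",         # placeholder
    "Text Analogy Ocrtest": "your-namespace/ocrtest-2b",  # placeholder
}

def load_model(display_name: str):
    """Load the model and processor behind a dropdown entry."""
    repo_id = MODELS[display_name]
    model = Qwen2VLForConditionalGeneration.from_pretrained(repo_id, device_map="auto")
    processor = AutoProcessor.from_pretrained(repo_id)
    return model, processor
```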
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
- Hugging Face: For providing the model collections and libraries.
- Gradio: For the easy-to-use interface framework.
For any questions or feedback, please contact Prithiv Sakthi.
- GitHub Repository: [Qwen2-VL](https://github.com/PRITHIVSAKTHIUR/Qwen2-VL)
- Hugging Face Collection: Vision-Language Models