metadata
title: PDF to Markdown Converter
emoji: π
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: false
PDF to Markdown Converter
This application converts PDF documents to Markdown format. It uses the docling
library for document conversion and provides a simple Streamlit interface.
Features
- Upload PDF files directly
- Convert PDFs from URLs
- Batch process multiple images using vLLM
- Download the resulting Markdown files
- Clean, user-friendly interface
How to Use
PDF to Markdown
- Select the "PDF to Markdown" tab
- Upload a PDF file using the file uploader or enter a URL to a PDF document
- Click the "Convert to Markdown" button
- Once conversion is complete, download the Markdown file
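The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in app.py: the function names `markdown_filename` and `convert_to_markdown` are hypothetical, and it assumes docling's `DocumentConverter` is given either a local path or a URL as its source.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def markdown_filename(source: str) -> str:
    """Derive an output name such as 'report.md' from a path or URL (hypothetical helper)."""
    # For URLs, keep only the path component so query strings are ignored.
    path = urlparse(source).path if "://" in source else source
    stem = PurePosixPath(path).stem or "document"
    return f"{stem}.md"


def convert_to_markdown(source: str) -> str:
    """Convert a PDF (local path or URL) to a Markdown string using docling."""
    # Imported lazily so the filename helper works without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

In a Streamlit app, the returned string could then be offered to the user through `st.download_button`, using `markdown_filename(source)` as the suggested file name.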
Batch Image Processing
- Select the "Batch Image Processing" tab
- Upload multiple image files (PNG, JPG, JPEG)
- Optionally customize the model path and prompt text
- Click the "Process Images" button
- Once processing is complete, download the ZIP file containing all results
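The final step, bundling all results into one ZIP download, can be sketched with the standard library alone. This is an assumption about how such a bundle might be built, not the app's actual implementation. `build_results_zip` is a hypothetical name, and the per-image outputs are assumed to be plain text keyed by file name.

```python
import io
import zipfile


def build_results_zip(results: dict[str, str]) -> bytes:
    """Pack {file name: text} pairs into an in-memory ZIP archive (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in results.items():
            # writestr stores the string as a UTF-8 encoded archive member.
            archive.writestr(name, text)
    return buffer.getvalue()
```

The returned bytes can be passed directly as the `data` argument of `st.download_button`, with `mime="application/zip"`, so nothing needs to be written to disk.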
Technical Details
Built with:
- Streamlit 1.29.0
- Docling 2.7.0
- docling_core
- vLLM (for batch processing)
- Python 3.12
Deployment
This application is deployed on Hugging Face Spaces.
To deploy this application:
- Create a new Space on Hugging Face (https://huggingface.co/spaces)
- Choose "Streamlit" as the SDK
- Upload all these files to the Space repository:
  - app.py
  - requirements.txt
  - README.md
  - runtime.txt
The application will automatically create any necessary directories when it starts.
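Creating directories at startup is usually done idempotently, so that restarting the Space does not fail when they already exist. The sketch below shows one way to do this. The function name `ensure_directories` and the directory names `uploads` and `output` are assumptions for illustration, not the names used in app.py.

```python
from pathlib import Path


def ensure_directories(base: Path, names=("uploads", "output")) -> list[Path]:
    """Create each working directory under base if it is missing (hypothetical helper)."""
    created = []
    for name in names:
        directory = base / name
        # exist_ok=True makes repeated startups safe; parents=True creates base if needed.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling it once near the top of the app, for example `ensure_directories(Path("."))`, would be enough.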
Note: The vLLM functionality requires significant computational resources (typically a GPU), so you may need to select a more powerful hardware configuration for your Space.