Turn any document into LLM-ready data in just a few lines of code!

Microsoft released MarkItDown, a lightweight Python library that converts documents in many common formats to Markdown for use with LLMs.
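
A minimal sketch of what "a few lines of code" looks like. It follows the `MarkItDown().convert(...)` / `.text_content` usage shown in the project's README. The `to_markdown` helper and its `converter` parameter are hypothetical, added only so the function can be tried with a stand-in object when the library isn't installed.

```python
from typing import Any, Optional


def to_markdown(path: str, converter: Optional[Any] = None) -> str:
    """Convert one document (PDF, DOCX, XLSX, PPTX, ...) to Markdown text.

    `to_markdown` and `converter` are hypothetical names for this sketch;
    by default the work is delegated to markitdown's MarkItDown class.
    """
    if converter is None:
        # Imported lazily so the helper can be exercised without markitdown installed.
        from markitdown import MarkItDown
        converter = MarkItDown()
    result = converter.convert(path)  # MarkItDown detects the input format itself
    return result.text_content  # the converted Markdown
```

Usage, assuming the package is installed from PyPI (recent releases use `pip install 'markitdown[all]'` to pull in the format-specific dependencies): `print(to_markdown("report.pdf"))`.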

Key Features:

• Converts PDF, Word, Excel, PowerPoint, images, and audio to Markdown
• Extracts EXIF metadata, OCR text from images, and speech transcripts from audio automatically
• Available as a CLI, a Python API, or a Docker image
• Offers LLM-based image descriptions
• Supports batch conversions
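
The batch-conversion point can be sketched as a simple loop over a folder. Treat this as an illustration under assumptions, not the library's own batch feature: `convert_folder`, its parameters, and the `<file name>.md` output naming are hypothetical, and the code relies only on `MarkItDown().convert(...)` returning an object with `text_content`. The `llm_client` / `llm_model` arguments mentioned in the comment are the ones the README uses to enable LLM-based image descriptions.

```python
from pathlib import Path
from typing import Any, Dict, Iterable, Optional


def convert_folder(
    src_dir: str,
    out_dir: str,
    extensions: Iterable[str] = (".pdf", ".docx", ".xlsx", ".pptx"),
    converter: Optional[Any] = None,
) -> Dict[str, str]:
    """Convert every matching file in src_dir to a Markdown file in out_dir.

    Hypothetical helper, not part of markitdown. Returns a mapping of
    source file name -> path of the written .md file.
    """
    if converter is None:
        from markitdown import MarkItDown
        # One instance is reused for all files. Passing llm_client=... and
        # llm_model=... here would enable LLM-generated image descriptions.
        converter = MarkItDown()
    wanted = {ext.lower() for ext in extensions}
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: Dict[str, str] = {}
    for path in sorted(Path(src_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue  # skip subfolders and extensions we did not ask for
        markdown = converter.convert(str(path)).text_content
        # Keep the original extension in the name so a.pdf and a.docx don't collide.
        target = out / (path.name + ".md")
        target.write_text(markdown, encoding="utf-8")
        written[path.name] = str(target)
    return written
```

For a single file, the README also shows a CLI form: `markitdown path-to-file.pdf > document.md`.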

The best part?

It's 100% open source.

Link to the GitHub repo in the comments!

If you're interested in ML, LLMs, RAG, and AI agents and want to receive apps and tutorials every week, subscribe to AI Engineering (for free): https://lnkd.in/d9REmcqK