Philipp Schmid

These are the best posts from Philipp Schmid.

37 viral posts with 33,985 likes, 930 comments, and 2,693 shares.
34 image posts, 0 carousel posts, 0 video posts, 3 text posts.

Best Posts by Philipp Schmid on LinkedIn

Llama 3 released! 🚨🔔 Meta just released their best open LLM! 👑🚀 Llama 3 is the next iteration of Llama, with a ~10% relative improvement over its predecessor! 🤯 Llama 3 comes in two sizes, 8B and 70B, with a new extended tokenizer and a commercially permissive license! ✅

๐—ก๐—ฒ๐˜„ ๐—ฎ๐—ป๐—ฑ ๐—ถ๐—บ๐—ฝ๐—ฟ๐—ผ๐˜ƒ๐—ฒ๐—บ๐—ฒ๐—ป๐˜๐˜€ ๐˜๐—ผ ๐˜ƒ๐Ÿฎโœจ:
๐Ÿ” ย Trained on 15T Tokens & fine-tuned on 10M human annotated samples
๐Ÿงฎย 8B & 70B versions as Instruct and Base
๐Ÿš€ย Llama 3 70B best open LLM on MMLU (> 80 ๐Ÿคฏ)
๐Ÿง‘๐Ÿปโ€๐Ÿ’ปย Instruct good at coding 8B with 62.2 and 70B 81.7 on Human Eval
โœ๐Ÿปย Tiktoken-based tokenizer with a 128k vocabulary
๐ŸชŸย 8192 default context window (can be increased)
๐Ÿง ย Used SFT, PPO & DPO for alignment.
๐Ÿ’ฐCommercial use allowed โœ…
๐Ÿค—ย Available on Hugging Face
๐Ÿคย 1-click deployments on Hugging Face, Amazon SageMaker, Google Cloud
๐Ÿ”œย more model sizes & enhanced performance

Blog: https://lnkd.in/ehXXavJ8
Models: https://lnkd.in/ek2pJviv
Chat-Demo: https://lnkd.in/eyRHH2X4
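
As a quick start, here is a minimal sketch (not from the post) of running the instruct model with transformers, assuming a recent transformers version and access to the gated meta-llama repo:

```python
# Minimal sketch (not from the post): run Llama 3 8B Instruct with transformers.
# Assumes access to the gated "meta-llama/Meta-Llama-3-8B-Instruct" repo.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [{"role": "user", "content": "Why is the sky blue?"}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # last message = assistant reply
```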

Massive kudos to Meta for continuing its commitment to open AI. Honored to partner with Joseph Spisak and team! 🤗 The gap is melting. 🧊
RAG developers, attention! 🔔 Docling is a new library from IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with LlamaIndex and LangChain.

TL;DR:
๐Ÿ—‚๏ธ Parses numerous document formats (PDF, DOCX, PPTX, and more) into Markdown & JSON.
๐Ÿ“‘ Advanced PDF processing: handles layout, reading order, and tables.
๐Ÿงฉ Unified document representation for easier processing.
๐Ÿค– integration with LlamaIndex and LangChain for RAG applications.
๐Ÿ” Includes OCR for scanned PDFs.
๐Ÿ’ป User-friendly CLI and Python API.

Docs: https://lnkd.in/ePnM84Fy
Github: https://lnkd.in/eJT46F4j
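
For a quick impression, here is a minimal sketch of the Python API based on the project's README (the exact interface may have changed since):

```python
# Minimal sketch of Docling's Python API, per the project's README.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")      # local paths and URLs both work
print(result.document.export_to_markdown())   # export_to_dict() yields JSON-ready output
```
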
Google just released Gemini. 🧠 Their most capable and general model. Gemini is a multimodal model trained across image, audio, video, and text data. Based on the technical report, the performance of the biggest model (Ultra) seems to be on par with or slightly better than GPT-4. 💪🏻

๐—ง๐—Ÿ;๐——๐—ฅ:
โœ๏ธ Support Text, Vision, and audio inputs or outputs, e.g., transcription, image generation
๐Ÿ“ Decoder architecture with 32k context length and Multi Query Attention (MQA)
๐Ÿ‘€ Visual Encoder inspired by Flamingo
๐Ÿ“š Trained on web documents, books, and code including image, audio, and video data. No details on the number of tokens.
3๏ธโƒฃ Comes in 3 sizes: Ultra, Pro, and Nano for different use cases
โšก๏ธ Trained using TPUv5e and TPUv4
๐Ÿ“ฑ Pixel 8 Pro is the first smartphone engineered to run Gemini Nano
โฌ†๏ธ Performance of Gemini Ultra on par or slightly better than GPT-4
๐Ÿ’ช Strong capabilities across reasoning, coding, language understanding
๐Ÿ” Used RLHF to fine-tune the model
โŒ No information about the Size of Ultra and Pro models
โŒ No detailed training data

Blog post: https://lnkd.in/eud5Wd74
Technical Report: https://lnkd.in/e5NvbhRv
Are vector databases here to stay? 🔍 Yes, it seems LLMs get lost in the middle and lose focus on long inputs. 🗺👁‍🗨
In the paper "Lost in the Middle: How Language Models Use Long Contexts," a group of researchers from Stanford tried to better understand how LLMs make use of the context. 📚✨ Below are some of my key takeaways: 📝

๐Ÿ” ๐—ข๐—ฏ๐—ท๐—ฒ๐—ฐ๐˜๐—ถ๐˜ƒ๐—ฒ:
- Analyze and evaluate how LLMs use the context by identifying relevant information within it.

๐Ÿ’ป ๐—œ๐—บ๐—ฝ๐—น๐—ฒ๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐˜๐—ถ๐—ผ๐—ป:
- Tested open-source (MPT-30B-Instruct, LongChat-13B(16K)) and closed-source (OpenAIโ€™s GPT-3.5-Turbo and Anthropicโ€™s Claude 1.3) models
- Multi-document question-answering where the context included multiple retrieved documents and one correct answer, which position was shuffled around
- Key-value pair retrieval to analyze if longer contexts impact performance

๐Ÿ’ก ๐—Ÿ๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด๐˜€:
- Best Performance when relevant information is at the beginning
- Performance decreases with an increase in context length
- Too many retrieved documents will harm performance
- Improving the retrieval and prompt creation step with Cross-Encoders (ranking) could potentially boost performance by up to 20%
- Extended-context models (GPT-3.5-Turbo vs. GPT-3.5-Turbo (16K)) are not better if the prompt fits the original context.

Check out the full paper here: https://lnkd.in/etxXnVyp

Combining retrieval with ranking should yield the best performance in RAG for question answering. 👨‍🔬
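
A minimal sketch of that retrieve-then-rerank step with a cross-encoder (my illustration, not the paper's code; the model id is one public example):

```python
# Sketch: rerank retrieved documents with a cross-encoder so the most relevant
# ones land at the start of the prompt, where the paper finds LLMs attend best.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model
query = "What is the capital of France?"
docs = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
    "France is known for its wine regions.",
]
scores = reranker.predict([(query, d) for d in docs])
ranked = [d for _, d in sorted(zip(scores, docs), reverse=True)]
context = "\n\n".join(ranked[:2])  # keep only the top few documents
```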

Remember that these are just my personal findings. Make sure to always conduct your own research and analysis. 🤗
New open-source LLMs! 🤯 The Falcon has landed! 🦅 TII just released two new open-source LLMs called Falcon, which come in two sizes: 7B trained on 1.5T tokens and 40B trained on 1T tokens. 🚀🔥

Falcon:
💥 outperforms comparable open-source models (e.g., MPT-7B, StableLM, RedPajama), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora
🏎️ uses FlashAttention and multi-query attention
🔠 has a 2048 context window
💰 comes with a license allowing commercial use, but with limitations. Make sure to check the license! ‼️ -> now Apache 2.0
🧠 was trained on Amazon SageMaker, on 384 A100 40GB GPUs in P4d instances
🌍 40B was trained on a multilingual dataset, including German, Spanish, and French

Models are available on Hugging Face 🤗
7B: https://lnkd.in/ejpGndA2
40B: https://lnkd.in/e6ESxVTK

Check out the official announcement:
👉 https://falconllm.tii.ae/
Llama can now code! 🤯🔔 @Meta just released CodeLlama, a foundation model for code generation. 🧑🏻‍💻 CodeLlama is the next iteration of Meta's Llama family; it comes in three sizes, 7B, 13B, and 34B, and has the same commercially friendly license as Llama 2! 😍

CodeLlama key facts ✨:
🧮 7B, 13B & 34B parameter versions
🛫 Initialized from Llama 2
🔠 Trained on 500B tokens
🐍 Python version & Instruct version
✅ Commercial use allowed
😕 Training data unknown
🪟 16384 context window
🤗 Available on Hugging Face

Models: https://lnkd.in/e6VtKqGU
Announcement: https://lnkd.in/eaSan-Tt
Paper: https://lnkd.in/e-QTjQym
Big news for any developer or AI enthusiast! 🤗 Transformers just released Agents 🤖 to easily build generative AI applications and autonomous agents using LLMs like OpenAssistant, StarCoder, OpenAI, and more.

📈 With Transformers Agents, you can remove the barrier of entry to machine learning and start building powerful agents that respond to complex queries and offer a chat mode. Plus, it's fully multimodal, allowing you to work with text, images, video, audio, and documents. 🤯

👉 https://lnkd.in/eN7HGFe5

But how does it work in practice? It's as simple as building a prompt:
1️⃣ Tell the agent what it aims to do.
2️⃣ Give it tools.
3️⃣ Show examples.
4️⃣ Give it a task.

Transformers Agents comes with built-in tools like document QA, text QA, image QA, speech-to-text and text-to-speech, text classification, summarization, translation, download, image generation, captioning, segmentation, upscaling, and text-to-video. And it's designed to be EXTENSIBLE, meaning you can add your own tools or use community-contributed tools. 🤖 📝🎤 🎨🌅 🏷️
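
In code, this boils down to a couple of lines. A sketch using the API as it shipped in transformers v4.29 (the Agents interface has been reworked in later releases, so treat this as a historical snapshot):

```python
# Sketch of the original Agents API from transformers v4.29.
from transformers import HfAgent

# Use a hosted StarCoder inference endpoint as the agent's LLM.
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

# The agent writes and runs code that calls its built-in tools,
# here the text-to-speech tool.
audio = agent.run("Read the following text out loud", text="Transformers Agents is here!")
```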

Ready to get started now or check out the example notebook: https://lnkd.in/e3XMBBCq
Llama 3 extended to almost 100,000-token context! ✅ By combining PoSE with continued pretraining of the Llama 3 8B base model on 300M tokens, the community (Wing Lian) managed to extend the context from 8k to 64k. 🚀 Applying RoPE scaling afterward led to a supported context window of close to 100,000 tokens with perfect recall. 🤯🚀

PoSE can extend the context window of LLMs by simulating long inputs using a fixed context window during training. It chunks the document into smaller pieces and simulates them as "long" versions, which significantly reduces memory and time overhead while maintaining performance.

๐—œ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
๐Ÿšซ Don't increase rope_theta during pertaining
๐Ÿš€ Rank-stabilized LoRA converged much quicker than regular LoRA
โฌ†๏ธ Increased the RoPE theta to extend the context to ~90k
โž• Adapters can be merged with any Llama 3 model to extend the context
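
As a rough illustration of the RoPE-theta step, scaling can be applied when loading the model in transformers; a sketch, where the checkpoint id and scaling factor are illustrative assumptions rather than the exact recipe from the thread:

```python
# Rough sketch: extend a long-context checkpoint further via linear RoPE
# scaling at load time. Checkpoint id and factor are illustrative assumptions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "winglian/Llama-3-8b-64k-PoSE",                  # assumed community 64k checkpoint
    rope_scaling={"type": "linear", "factor": 1.5},  # ~64k -> ~96k supported context
    device_map="auto",
)
```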

Llama 3 8B 64k: https://lnkd.in/d6cprxvT
Original Thread: https://lnkd.in/dnVn8vKu
PoSE Paper: https://lnkd.in/dmtDNwwe
Excited to announce "llm-sagemaker", a new Terraform module to easily deploy open LLMs from Hugging Face to Amazon Web Services (AWS) SageMaker real-time endpoints! 👀 Infrastructure as Code (IaC) tools are crucial for moving your AI applications from notebooks into production! 🚀

TL;DR:
🚀 New HashiCorp Terraform module simplifies LLM deployment to Amazon SageMaker
🤖 Support for popular models like Meta Llama 3, Mistral AI, Mixtral, and Cohere Command
🛠️ Handles IAM roles, SageMaker Model, Endpoint Configuration, and Autoscaling
⚡ Example deploys Llama 3.1 8B Instruct in just minutes
🔧 Customizable configurations for TGI
💻 Easy integration with the AWS SDK for inference (sketch below)
✅ Includes integration tests using Gruntwork Terratest

Blog: https://lnkd.in/eQM5KtSD
Module: https://lnkd.in/e8THb3Rd
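
Once the module has created an endpoint, invoking it from Python takes a few lines with boto3; a sketch, where the endpoint name is an assumption and the payload follows TGI's JSON schema:

```python
# Sketch: invoke a SageMaker real-time endpoint created by the module.
# The endpoint name is an assumption; the payload follows TGI's JSON schema.
import json
import boto3

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="llama-3-1-8b-instruct",  # whatever name your module config used
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "What is open-source AI?",
        "parameters": {"max_new_tokens": 128},
    }),
)
print(json.loads(response["Body"].read())[0]["generated_text"])
```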

If you have feature requests, please open an issue. 🤗
Coming in hot! 🔥 Another open-source LLM just got released, by Stability AI. 🖼 The first release of StableLM includes a 3B and a 7B parameter model, with 15-65B models to follow. 🤯
Models are released under the CC BY-SA license. 📄

Models are available on Hugging Face! 🤗


Model: https://lnkd.in/ehDJ-H5T
RLHF Demo: https://lnkd.in/eYCAzyDY
DoRA: a new, better, and faster LoRA? 🤔 DoRA, or Weight-Decomposed Low-Rank Adaptation, is a new parameter-efficient fine-tuning technique that claims to enhance the learning capacity and training stability of LoRA while avoiding any additional overhead. 🤯

DoRA decomposes the weights into two components, magnitude and direction. DoRA adjusts how important each part of the data is (magnitude) and how the model should focus on learning (direction). It's like fine-tuning the details of a picture without needing to redraw the whole thing. 🖼️

๐—œ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
๐Ÿ… DoRA consistently outperforms LoRA
๐Ÿค— Supported in Hugging Face PEFT
โ™ป๏ธ Trained Adapters can be merged back into the model
๐Ÿ“ˆ +3.4% on Llama 7B and +1.0% on Llama 13B compared to LoRA on common reasoning
๐Ÿ” Improved training stability compared to LoRA
โŒย In PEFT, DoRA only supports linear layers at the moment.

Paper: https://lnkd.in/eiQ-ZVTW
PEFT documentation: https://lnkd.in/eMYWeGip
Gemma 4 is here! 4️⃣ Our most capable, agentic open model, built on the same research as Gemini 3. Reasoning. Multimodal. Four sizes (2B to 31B). Base + Instruct. ✨

Released under Apache 2.0. Runs on your phone, laptop, or servers. All you need to know about Gemma 4:

4๏ธโƒฃ 4 sizes (E2B, E4B, 26B4A, 31B)
๐ŸชŸ Up to 256K context window
๐Ÿ› ๏ธ Native function-calling, structured JSON output
๐Ÿ‘๏ธ + audio on edge models (E2B/E4B)
๐ŸŒ Trained on 140+ languages
๐Ÿ† 31B ranks #3 open model on Arena AI
๐Ÿชช Apache 2.0 license
1๏ธโƒฃ Fits on a single GPU
๐Ÿš€ Gemma E4B == Gemma 3 27B

All versions support native function-calling and structured JSON output to build agents that can run locally. The small models (E2B, E4B) can run entirely offline on mobile, supporting vision and audio fully on-device.

Start building with Gemma 4 now.

Try in Google AI Studio → https://lnkd.in/ddQhAsCs
Hugging Face → https://lnkd.in/dhNBspd5
Kaggle → https://lnkd.in/dffxshtG
or in your favorite ecosystem tool!

Blog → https://lnkd.in/d_EXTGCn
What if one embedding model could understand text, images, video, audio, and PDFs all at once? Excited to share Gemini Embedding 2, our first fully multimodal embedding model.

๐Ÿ–ผ๏ธ 5 modalities in a single unified embedding space
๐ŸŒ Supports up to 8,192 input tokens, 100+ languages
๐ŸŽง Embeds audio natively, no transcription step needed
๐Ÿ“ Flexible output dimensions: 3,072 / 1,536 / 768 via MRL
๐Ÿ“Žย Up to 6 images, 120s video, and 6-page PDFs per request

Now in Public Preview via Gemini API & Vertex AI.
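
A minimal sketch with the google-genai Python SDK; the model id below is my placeholder (check the docs for the official one), and multimodal inputs may use a different request shape:

```python
# Sketch: request an embedding via the google-genai SDK. The model id is a
# placeholder assumption; multimodal inputs may use a different shape.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment
result = client.models.embed_content(
    model="gemini-embedding-002",  # hypothetical id for Gemini Embedding 2
    contents="A sunny beach with palm trees",
    config=types.EmbedContentConfig(output_dimensionality=1536),  # MRL truncation
)
print(len(result.embeddings[0].values))
```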

Docs: https://lnkd.in/dutRFSqH
Blog: https://lnkd.in/d_YpkZq5
Interested in how open models work? Read this visual guide to Gemma 4 that explains how it all works, with diagrams.

Covers how the model handles images, audio, and text, why only a fraction of parameters run during inference (MoE), and how tiny models (2B) fit on a phone using a clever embedding trick.

Worth the read even if you never plan to fine-tune anything, just to understand what's actually inside these models.

👉 https://lnkd.in/ds4zmhHP

Big kudos to Maarten Grootendorst! LFG! 🚀
What comes after Transformers? Neural memory and test-time training. Google Research presented 2 new papers at NeurIPS with architectures that actively learn and update their own parameters during inference, acting as a "long-term neural memory" rather than a static context window.

Implementation:
1๏ธโƒฃ Titans replaces the fixed-state memory of linear RNNs with a deep Multi-Layer Perceptron (MLP) memory module.
2๏ธโƒฃ The model updates this memory at test-time by calculating a "surprise metric" based on the gradient of the input data.
3๏ธโƒฃ MIRAS framework generalizes this by treating memory as an optimization problem with customizable loss functions and regularization.
4๏ธโƒฃ Training is parallelized by chunking sequences, using linear operations within chunks and non-linear updates across chunks.
5๏ธโƒฃ Models incorporate "Persistent Memory" (fixed learnable weights) alongside the dynamic "Contextual Memory" to store task-specific knowledge.

Insights:
- 💡 Attention mechanisms are excellent at short-term memory but fail at efficient long-term storage due to quadratic costs.
- 📈 Deep memory structures (MLPs) significantly outperform the vector/matrix-based compression used in Mamba and other linear RNNs.
- 🛠️ Memory updates are effective when driven by "surprise": high gradients indicate unexpected, memorable data.
- 📚 Forgetting mechanisms in recurrent models are mathematically equivalent to retention regularization (weight decay).
- 📉 Standard L2 (mean squared error) objectives make memory sensitive to outliers; L1 or Huber loss provides better stability.
- 🧠 Titans outperforms GPT-4 on "Needle in a Haystack" tasks with 2M+ token contexts despite having fewer parameters.
- ⚡ Deep memory modules exhibit a trade-off where increased depth improves perplexity but slightly reduces training throughput.

Titans and MIRAS show potential to replace, or at least augment, pure Transformer architectures. The hybrid approach (using attention for the immediate context and Neural Memory for the deep history) suggests the future might be a convergence of RNN efficiency and Transformer performance.

Blog: https://lnkd.in/eXEEwd_t
Titans Paper: https://lnkd.in/ejxAWBJD
MIRAS Paper: https://lnkd.in/e27DWiyr
"LLMs are ghosts of internet-scale code" is the recent analogy from Andrej Karpathy on the Dwarkesh Patel podcast perfectly captures the current state of AI in coding.

What does this means for us as developers:

👻 LLMs are brilliant at re-creating what's common. They can instantly generate boilerplate code, scaffold a standard CRUD app, or write a routine unit test. They are channeling the "ghost" of millions of existing codebases. We should absolutely leverage this for speed. This is a massive productivity boost for repetitive tasks.

🧠 They get stuck on what's new. Ask an LLM to design a novel algorithm, implement a custom architecture, or work within a unique, non-standard codebase, and the "ghost" gets confused. It will often try to "correct" your innovative solution by forcing a familiar pattern it already knows.

AI doesn't replace fundamentals yet; it magnifies them. ✨

Deep knowledge of system design, architecture, and first-principles thinking makes you 10x more productive.

For now: keep learning how to code, keep building!
Super excited for this! 🚀 We're rolling out a Gemini-powered personal health coach inside Fitbit (now part of Google). It uses a deep-agent architecture to orchestrate between conversational, data science, and domain-expert sub-agents.

📊 Performs complex numerical reasoning on physiological time-series data.
📱 Available to eligible U.S. Android Fitbit Premium users, with iOS expanding soon.
✅ Validated via 1 million+ human annotations and 100k+ hours of evaluation.
💬 Personalized guidance via a 5-10 minute interactive text or voice conversation.
🎯 Adaptive plans based on individual health metrics and goals.
🔬 Grounded in behavioral science and guided by a Consumer Health Advisory Panel.

Learn more: https://lnkd.in/eafGYwV8
We just launched Gemini 3.1 Flash Live! Our fastest, most natural real-time voice AI model for building Agents.

- Scores 90.8% on ComplexFuncBench Audio for tool use.
- 70 languages, Video streaming, Audio transcriptions, 128k context
- Comes with Agent Skill for building live voice agents.
- All generated audio is watermarked with SynthID.

Blog: https://lnkd.in/de-j3xCT
Skill: https://lnkd.in/dtdKiuRx
Docs: https://lnkd.in/d9Wu8PjA
Massive update to Google TPU inference for open models like Gemma! `tpu-inference` is a new vLLM backend that delivers up to 5x the performance of previous prototypes.

- ๐Ÿค Unifies PyTorch and JAX under a single JAX-to-XLA lowering path.
- ๐Ÿš€ 2x-5x higher performance than the February 2025 prototype.
- ๐Ÿ“ˆ 20% higher throughput without model code changes.
- โšก Ragged Paged Attention V3 increases throughput by ~10% on Trillium (v6e).
- ๐Ÿ› ๏ธ Single Program, Multi-Data (SPMD) as the default
- โ˜๏ธ Support for Trillium (v6e) and v5e TPU

Learn More: https://lnkd.in/d6mCNaQf
5 practical tips for Context Engineering, which apply to Google DeepMind Gemini as well:

1๏ธโƒฃ Context Ordering Matters:ย Try to use "append-only" context, adding new information to the end. This maximizes cache hits reducing cost (4x) and latency.
2๏ธโƒฃ Manage Tools Statically:ย Avoid changing tool order or availability mid-task, if not explicitly needed. This might break context caching and can/will confuse models if used tools in the history are no longer defined.
3๏ธโƒฃ Use External Memory:ย Write explicitly or implicitly context/goals to external storage to. Preventing information loss. A typical task in Manus requires around 50 tool calls on average.
4๏ธโƒฃ Recite Goals to not get lost:ย Prevent the model from "getting lost" by having it periodically restate its objectives. This keeps the primary goal in its recent attention span.
5๏ธโƒฃ Embrace Errors:ย Keep error messages in the context. This allows the model to learn from its mistakes and avoid repeating them.

Learn from it: https://lnkd.in/e6n5ej7t
Excited to share a system instruction for Gemini 3 Pro that improved performance on several agentic benchmarks by around 5%. We collaborated with the Google DeepMind post-training research team to include some tips in our docs. 🚀

To maximize reliability in multi-step workflows, you should craft instructions that explicitly control how the model reasons and plans. While Gemini provides strong general reasoning, complex agents benefit from prompts that enforce specific behaviors like persistence in the face of issues, risk assessment, and proactive planning.
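
For illustration, this is roughly what passing a system instruction looks like with the google-genai SDK; the instruction text is my paraphrase of the themes above (the tuned SI is in the link below), and the model id is an assumption:

```python
# Illustration only: the instruction text paraphrases the themes above, and
# the model id is an assumption; the actual tuned SI is in the linked doc.
from google import genai
from google.genai import types

client = genai.Client()
resp = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model id
    contents="Fix the failing build.",
    config=types.GenerateContentConfig(
        system_instruction=(
            "You are an autonomous agent. Persist until the task is fully "
            "resolved, assess risk before destructive actions, and plan "
            "multi-step work before executing it."
        ),
    ),
)
print(resp.text)
```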

SI: https://lnkd.in/eFYG_GsK
Docs: https://lnkd.in/eT7gFr-m
18,000 tokens/sec, even for Llama 3.1 8B, is ridiculous. Even a "dumb" model like Llama 3.1 would be incredibly useful at this speed.

It works by merging storage and compute, permanently etching the model parameters directly into the physical transistors of the chip.

Demo link: https://chatjimmy.ai/
Today, we're launching the File Search Tool, a fully managed RAG system built directly into the Gemini API that abstracts away the retrieval pipeline so you can focus on building. File Search provides a simple, integrated and scalable way to ground Gemini with your data, delivering responses that are more accurate, relevant and verifiable.

To make File Search simple and affordable for all developers, we're making storage and embedding generation at query time free of charge. You only pay for creating embeddings when you first index your files, at a fixed rate of $0.15 per 1 million tokens (or whatever the applicable embedding model cost is, in this case gemini-embedding-001). This new billing paradigm makes the File Search Tool both significantly easier and more cost-effective to build and scale with.

Here are the highlights:
🆓 Free storage and free embedding generation at query time.
💰 Pay a fixed rate of $0.15 per 1M tokens only for initial indexing.
🔍 Leverage advanced vector search via the Gemini Embedding model.
🧾 Receive automatic citations to verify source document usage.
📂 Ground models with PDFs, DOCX, TXT, JSON, and code formats.
⚡ Combine parallel query results in under 2 seconds.
🛠️ Integrate directly into the existing generateContent API.

Get started with the docs: https://lnkd.in/eqhdzH79
๐Ÿ—ƒ๏ธ Context Caching Update! The Gemini API implicit caching now with 90% cost savings when your requests hit the cache!ย 
This means if you send a request to Gemini models with a common prefix as one of previous requests, it might be cached. No code changes needed.

Price details (normal input / cached input) per 1M tokens:
- Gemini 2.5 Pro: $1.25 / $0.125
- Gemini 2.5 Flash: $0.30 / $0.03
- Gemini 2.5 Flash-Lite: $0.10 / $0.01

All prices: https://lnkd.in/eZNzzBny

Enjoy :)
Nano Banana Pro (Gemini 3 Pro Image) is here 🍌 It "thinks" through a prompt and can retrieve real-time data, such as weather forecasts or stock charts, using Google Search grounding before generating high-fidelity images, starting at $0.13 per image.

- $0.134 per 2K image / $0.24 per 4K image.
- Supports 10 different aspect ratios and upscaling to 2K and 4K.
- Integrated with Google Search to access real-time data.
- Allows up to 14 reference input images for compositing.
- Highly accurate and clear text rendering.
- Now available in Google AI Studio and the Gemini API.

Testing this felt magical. My creativity is more limiting than the model's capabilities. The potential impact is massive.

Blog: https://lnkd.in/epa2BEKp
Dev Guide: https://lnkd.in/efhAnhMP
Docs: https://lnkd.in/e5tPsVCa

Image Prompt (with Google Search Grounding): Generate an infographic of the pizza per capita.
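
That prompt maps to a short google-genai call; a sketch, where the model id is my best guess at the preview id and the grounding config may differ from the final docs:

```python
# Sketch: grounded image generation via google-genai. The model id is my best
# guess at the preview id; the grounding config may differ from the docs.
from google import genai
from google.genai import types

client = genai.Client()
resp = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Generate an infographic of the pizza per capita.",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        tools=[{"google_search": {}}],  # lets the model fetch real-time data
    ),
)
for part in resp.candidates[0].content.parts:
    if part.inline_data:  # image bytes
        with open("infographic.png", "wb") as f:
            f.write(part.inline_data.data)
```
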
New guide! ✨ Learn how to deploy n8n on Google Cloud Run and build your first AI agent with Gemini 2.5.

โ˜๏ธ Serverless Deployment: Run open-source n8n on Google Cloud Run for auto-scaling and reduced operational overhead.
๐Ÿง  Gemini 2.5 Integration: Native support for Google's latest models to power advanced AI agents.
๐Ÿ”Œ MCP Support: Connect your LLMs to external tools and data sources instantly using the Model Context Protocol standard.
๐Ÿ’พ Robust Persistence: detailed setup for Cloud SQL (PostgreSQL) ensuring your workflows never lose data, using cost-effective micro tiers.
๐Ÿ“š Community Power: Leverage over 600+ pre-built Gemini automation workflows to get started immediately.

Implementation:
1. Install the Google Cloud CLI and authenticate with your GCP account.
2. Create a new GCP project, enable billing, and activate necessary APIs for Cloud Run, SQL, and Secret Manager.
3. Provision a Cloud SQL PostgreSQL instance for persistent storage.
4. Secure database credentials and encryption keys using Google Secret Manager.
5. Create a service account with permissions to access Cloud SQL and the stored secrets.
6. Deploy the n8n container to Cloud Run, ensuring CPU throttling is disabled to support background workflows.
7. Access the deployed n8n instance and initialize the owner account.
8. Configure Google Gemini API credentials within n8n.
9. Build a new workflow using the "AI Agent" node connected to the Gemini Chat Model.
10. Extend the agent's capabilities by adding tools, such as an MCP Client for external data access.

Blog: https://lnkd.in/eeJjUagN
Here is how Spotify uses coding background agents for thousands of code migrations and what they learned:

- Define desired verifiable end states explicitly, not strict todo steps.
- Code examples improved output reliability.
- Agent has access to 3 tools, verify, git and bash.
- โ€œverifyโ€ tool runs formatters, linters, and tests > AGENTS md.

https://lnkd.in/dsth8x4w
We created a GitHub repo for all MCP at Google.

Get info on our remote managed MCP servers, open source MCP servers, examples, and learning resources.

github.com/google/mcp
Excited to introduce WeatherNext 2 🌦️ A new AI model from Google DeepMind and Google Research delivering faster, higher-resolution global weather predictions.

- Generates forecasts 8x faster, requiring under one minute on a single TPU.
- Surpasses prior models on 99.9% of variables across 0-15 day lead times.
- Simulates hundreds of possible weather scenarios from a single data input.
- Delivers high-resolution global forecasts down to specific 1-hour intervals.

Learn More: https://lnkd.in/e8NvFUpj

Should we add this to the Gemini API and Google AI Studio? 🤔
Nano Banana 2 is here! Pro-level image generation and editing at Flash speed, with images from web search.

💰 $0.045 per 512x image, $0.067 per 1024x image.
🌍 Improved image quality, consistency & i18n text rendering.
🖊️ Supports Text -> Text + Image(s) and Image + Text -> Text + Image(s).
🖼️ Default 1024x output, with new 512x, 2048x, and 4096x output resolutions.
🔍 New text + image search to inform generation with both text and images.
📐 Aspect ratios: 1:1, 1:4, 4:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9, 1:8, 8:1.

Available in the Gemini API, AI Studio, the Gemini app, Antigravity, and Vertex AI!

https://lnkd.in/d9a_-KfU
If 2025 was the beginning of agents, 2026 will be around Agent Harnesses. An Agent Harness is the infrastructure that wraps around an AI model to manage long-running tasks. It is not the agent itself. It operates at a higher level than agent frameworks. The harness provides prompt presets, opinionated handling for tool calls (Human in the loop), lifecycle hooks or ready-to-use capabilities like planning, filesystem access or sub-agent management.

As benchmarks become more complex, we need to bridge the gap between benchmark claims and user experience. An Agent Harness can be essential for three critical reasons:
- Validating Real-World Progress: Allows users to easily test and compare how the latest models perform against their use cases and constraints.
- Empowering User Experience: Without a harness, the user's experience might lag behind the model's potential.
- Hill Climbing via Real-World Feedback: A shared, stable environment (Harness) creates a feedback loop where researchers can iterate and improve ("hill climb") based on actual user adoption.

We are heading toward a convergence of training and inference environments. We see a new bottleneck emerging: context durability. The Harness will become the primary tool for solving "model drift".

Read more in my blog: https://lnkd.in/dzW2tmkh
Learn how to build your own AI agent from scratch with Gemini 3 Pro. Excited to share this practical guide designed for everyone, going from simple text generation to a functioning CLI agent.

๐Ÿ› ๏ธ Construct a working prototype in under 100 lines of code.
๐Ÿš€ Start from basic text generation using the Gemini 3 Pro model.
โš™๏ธ Define function capabilities using JSON schemas.
๐Ÿ”„ Build a logic loop to intercept and execute tool calls.
๐Ÿ’ป Assemble components into a functional multi-turn CLI application.
๐Ÿ“š Learn Best Practices for Engineering Agents

Blog: https://lnkd.in/ewKj9uzp
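
Condensed, the tool loop looks roughly like this (my compression, with an assumed model id; the full guide builds it out step by step):

```python
# Condensed sketch of the agent loop: declare a tool via a JSON schema, let
# the model request it, execute it, and feed the result back. Model id assumed.
import subprocess
from google import genai
from google.genai import types

def run_shell(command: str) -> dict:
    out = subprocess.run(command, shell=True, capture_output=True, text=True)
    return {"stdout": out.stdout, "stderr": out.stderr}

client = genai.Client()
config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[{
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "parameters": {"type": "object",
                       "properties": {"command": {"type": "string"}},
                       "required": ["command"]},
    }])],
    # disable the SDK's automatic calling so we can intercept calls ourselves
    automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
)
contents = [types.Content(role="user", parts=[types.Part(text="List the files here.")])]
while True:
    resp = client.models.generate_content(
        model="gemini-3-pro-preview", contents=contents, config=config)
    if not resp.function_calls:  # no tool requested: final answer
        print(resp.text)
        break
    contents.append(resp.candidates[0].content)  # keep the model's tool request
    for call in resp.function_calls:
        result = run_shell(**call.args)
        contents.append(types.Content(role="user", parts=[
            types.Part.from_function_response(name=call.name, response=result)]))
```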

Very proud and excited to share this! 🤗
An `AGENTS.md` (or equivalent) is the highest-leverage configuration point for agents. It's injected into every conversation. But research shows that doing it wrong actively hurts performance. Here's how to do it right, backed by data.

## Less Is More

- Auto-generated AGENTS.md files reduce success rates by ~3% while increasing inference cost by over 20%
- Human-written AGENTS.md files only marginally improve performance (~4%)
- Stronger models don't generate better context files.
- Codebase overviews in AGENTS.md don't help agents navigate faster.
- LLM-generated files are redundant with existing docs.
- Instructions ARE followed. Agents respect AGENTS.md instructions, but unnecessary requirements make tasks harder.

## What to Include

- The WHAT: Your tech stack, project structure, and what each part does. Critical for monorepos.
- The WHY: The purpose of the project and its key components. Help the agent understand intent, not just structure.
- The HOW: How to build, test, and verify changes. Include non-obvious tooling (e.g., `uv` instead of `pip`, `bun` instead of `npm`). Tools mentioned in AGENTS.md get used 160x more often than unmentioned ones.

## What NOT to Include

- Detailed codebase overviews or directory listings. The paper found these don't help agents navigate faster, and agents can discover structure themselves.
- Code style guidelines. Use linters and formatters instead, they're faster, cheaper, and deterministic.
- Task-specific instructions that only apply sometimes. Since AGENTS.md goes into every session, non-universal instructions dilute focus.
- Auto-generated content. Don't let the agent write its own AGENTS.md. The data shows this hurts more than it helps.

## How to Structure It

- Keep it short. General consensus is <300 lines; HumanLayer keeps theirs under 60 lines. Every line goes into every session, make each one count.
- Use progressive disclosure. Don't put everything in AGENTS.md. Instead, keep task-specific docs in separate files (e.g., `agent_docs/running_tests.md`, `agent_docs/database_schema.md`) and list them in AGENTS.md with brief descriptions so the agent reads them only when relevant.
- Prefer pointers over copies. Reference `file:line` locations rather than embedding code snippets that will go stale.
- Write it yourself, deliberately. A bad line in AGENTS.md cascades into bad plans, bad code, and bad results across every session.

Paper: https://lnkd.in/dh7wqU2z
Blog: https://lnkd.in/d6ih8DuM
Gemini 3.1 Flash-Lite is here! 🔦 Our fastest, most cost-efficient Gemini model, built for high-volume workloads at scale.

💰 Priced at $0.25/M input, $1.50/M output tokens
🧠 Matches 2.5 Flash quality at Flash-Lite cost
⚡ 2.5x faster time to first token and 45% faster output vs. 2.5 Flash
💽 Enables low-latency entity extraction, classification, and data processing

Now in preview on AI Studio, Gemini API & Vertex AI. 🔽

Try in AIS now: https://lnkd.in/d-n7hPDn
Blog: https://lnkd.in/dg6zAy3b
I have been using Gemini 3 Pro for a bit; here are my best practices for general usage, including principles and structural patterns that are currently working best for me.

This isn't meant to be treated as the gold standard, but rather as a starting point to help you refine your own strategies. Take what works, tweak what doesn't, and keep iterating.

https://lnkd.in/eAa9gFD9
Introducing the Gemini Docs MCP Server, a local STDIO server for searching and retrieving Google Gemini API documentation. It should help you build with the latest SDKs and model versions. 🚀

- Run the server directly via uvx without explicit installation.
- Performs full-text search across all Gemini documentation pages locally.
- Passed 114/117 tests for Python and TypeScript using the latest SDKs and models.
- 3 tools: search_documentation, get_capability_page, get_current_model.
- Utilizes a local SQLite database with FTS5 for efficient querying.
- Works with VS Code, Cursor, Gemini CLI … and every other tool supporting MCP.

GitHub repository: https://lnkd.in/eTyzmAR2

Note: This is currently a side project of mine. Should we make an official one?
More Gemma! Meet TranslateGemma, a new collection of open translation models built on Gemma 3, designed for high-performance communication.

- Available in 4B, 12B, and 27B parameter sizes.
- Evaluated on 55 languages using the WMT24++ dataset.
- 12B model outperforms the Gemma 3 27B baseline.
- 4B model optimized specifically for mobile and edge deployment.

Models: https://lnkd.in/d2KGtZiq
Technical Report: https://lnkd.in/dXMnHGrH