Philipp Schmid

These are the best posts from Philipp Schmid.

37 viral posts with 33,985 likes, 930 comments, and 2,693 shares.
34 image posts, 0 carousel posts, 0 video posts, 3 text posts.

Best Posts by Philipp Schmid on LinkedIn

Llama 3 released! 🚨🔔 Meta just released their best open LLM! 👑🚀 Llama 3 is the next iteration of Llama, with a ~10% relative improvement over its predecessor! 🤯 Llama 3 comes in two sizes, 8B and 70B, with a new extended tokenizer and a commercially permissive license! ✅

๐—ก๐—ฒ๐˜„ ๐—ฎ๐—ป๐—ฑ ๐—ถ๐—บ๐—ฝ๐—ฟ๐—ผ๐˜ƒ๐—ฒ๐—บ๐—ฒ๐—ป๐˜๐˜€ ๐˜๐—ผ ๐˜ƒ๐Ÿฎโœจ:
๐Ÿ” ย Trained on 15T Tokens & fine-tuned on 10M human annotated samples
๐Ÿงฎย 8B & 70B versions as Instruct and Base
๐Ÿš€ย Llama 3 70B best open LLM on MMLU (> 80 ๐Ÿคฏ)
๐Ÿง‘๐Ÿปโ€๐Ÿ’ปย Instruct good at coding 8B with 62.2 and 70B 81.7 on Human Eval
โœ๐Ÿปย Tiktoken-based tokenizer with a 128k vocabulary
๐ŸชŸย 8192 default context window (can be increased)
๐Ÿง ย Used SFT, PPO & DPO for alignment.
๐Ÿ’ฐCommercial use allowed โœ…
๐Ÿค—ย Available on Hugging Face
๐Ÿคย 1-click deployments on Hugging Face, Amazon SageMaker, Google Cloud
๐Ÿ”œย more model sizes & enhanced performance

Blog: https://lnkd.in/ehXXavJ8
Models: https://lnkd.in/ek2pJviv
Chat-Demo: https://lnkd.in/eyRHH2X4
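
As a quick start, here is a minimal sketch (not from the post) of running the instruct model with transformers, assuming a recent transformers version and access to the gated meta-llama repo:

```python
# Minimal sketch (not from the post): run Llama 3 8B Instruct with transformers.
# Assumes access to the gated "meta-llama/Meta-Llama-3-8B-Instruct" repo.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [{"role": "user", "content": "Why is the sky blue?"}]
out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # last message = assistant reply
```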

Massive kudos to Meta for continuing its commitment to open AI. Honored to partner with Joseph Spisak and team! 🤗 The gap is melting. 🧊
RAG developers, attention! 🔔 Docling is a new library from IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with LlamaIndex and LangChain.

TL;DR:
๐Ÿ—‚๏ธ Parses numerous document formats (PDF, DOCX, PPTX, and more) into Markdown & JSON.
๐Ÿ“‘ Advanced PDF processing: handles layout, reading order, and tables.
๐Ÿงฉ Unified document representation for easier processing.
๐Ÿค– integration with LlamaIndex and LangChain for RAG applications.
๐Ÿ” Includes OCR for scanned PDFs.
๐Ÿ’ป User-friendly CLI and Python API.

Docs: https://lnkd.in/ePnM84Fy
Github: https://lnkd.in/eJT46F4j
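
For a quick impression, here is a minimal sketch of the Python API based on the project's README (the exact interface may have changed since):

```python
# Minimal sketch of Docling's Python API, per the project's README.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")      # local paths and URLs both work
print(result.document.export_to_markdown())   # export_to_dict() yields JSON-ready output
```
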
Google just released Gemini. 🧠 Their most capable and general model. Gemini is a multimodal model trained across image, audio, video, and text data. Based on the technical report, the performance of the biggest model (Ultra) seems to be on par with or slightly better than GPT-4. 💪🏻

๐—ง๐—Ÿ;๐——๐—ฅ:
โœ๏ธ Support Text, Vision, and audio inputs or outputs, e.g., transcription, image generation
๐Ÿ“ Decoder architecture with 32k context length and Multi Query Attention (MQA)
๐Ÿ‘€ Visual Encoder inspired by Flamingo
๐Ÿ“š Trained on web documents, books, and code including image, audio, and video data. No details on the number of tokens.
3๏ธโƒฃ Comes in 3 sizes: Ultra, Pro, and Nano for different use cases
โšก๏ธ Trained using TPUv5e and TPUv4
๐Ÿ“ฑ Pixel 8 Pro is the first smartphone engineered to run Gemini Nano
โฌ†๏ธ Performance of Gemini Ultra on par or slightly better than GPT-4
๐Ÿ’ช Strong capabilities across reasoning, coding, language understanding
๐Ÿ” Used RLHF to fine-tune the model
โŒ No information about the Size of Ultra and Pro models
โŒ No detailed training data

Blog post: https://lnkd.in/eud5Wd74
Technical Report: https://lnkd.in/e5NvbhRv
Are vector databases here to stay? 🔍 Yes, it seems LLMs get lost in the middle and lose focus on long inputs. 🗺👁‍🗨
In the paper "Lost in the Middle: How Language Models Use Long Contexts," a group of researchers from Stanford tried to better understand how LLMs make use of the context. 📚✨ Below are some of my key takeaways: 📝

๐Ÿ” ๐—ข๐—ฏ๐—ท๐—ฒ๐—ฐ๐˜๐—ถ๐˜ƒ๐—ฒ:
- Analyze and evaluate how LLMs use the context by identifying relevant information within it.

๐Ÿ’ป ๐—œ๐—บ๐—ฝ๐—น๐—ฒ๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐˜๐—ถ๐—ผ๐—ป:
- Tested open-source (MPT-30B-Instruct, LongChat-13B(16K)) and closed-source (OpenAIโ€™s GPT-3.5-Turbo and Anthropicโ€™s Claude 1.3) models
- Multi-document question-answering where the context included multiple retrieved documents and one correct answer, which position was shuffled around
- Key-value pair retrieval to analyze if longer contexts impact performance

๐Ÿ’ก ๐—Ÿ๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด๐˜€:
- Best Performance when relevant information is at the beginning
- Performance decreases with an increase in context length
- Too many retrieved documents will harm performance
- Improving the retrieval and prompt creation step with Cross-Encoders (ranking) could potentially boost performance by up to 20%
- Extended-context models (GPT-3.5-Turbo vs. GPT-3.5-Turbo (16K)) are not better if the prompt fits the original context.

Check out the full paper here: https://lnkd.in/etxXnVyp

Combining retrieval with ranking should yield the best performance in RAG for question answering. 👨‍🔬
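
A minimal sketch of that retrieve-then-rerank step with a cross-encoder (my illustration, not the paper's code; the model id is one public example):

```python
# Sketch: rerank retrieved documents with a cross-encoder so the most relevant
# ones land at the start of the prompt, where the paper finds LLMs attend best.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model
query = "What is the capital of France?"
docs = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
    "France is known for its wine regions.",
]
scores = reranker.predict([(query, d) for d in docs])
ranked = [d for _, d in sorted(zip(scores, docs), reverse=True)]
context = "\n\n".join(ranked[:2])  # keep only the top few documents
```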

Remember that these are just my personal findings. Make sure to always conduct your own research and analysis. 🤗
New open-source LLMs! 🤯 The Falcon has landed! 🦅 TII just released two new open-source LLMs called Falcon, which come in two sizes: 7B trained on 1.5T tokens and 40B trained on 1T tokens. 🚀🔥

Falcon:
💥 outperforms comparable open-source models (e.g., MPT-7B, StableLM, RedPajama), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora
🏎️ uses FlashAttention and multi-query attention
🔠 has a 2048 context window
💰 comes with a license allowing commercial use, but with limitations. Make sure to check the license! ‼️ -> now Apache 2.0
🧠 was trained on Amazon SageMaker, on 384 A100 40GB GPUs in P4d instances
🌍 40B was trained on a multilingual dataset, including German, Spanish, and French

Models are available on Hugging Face 🤗
7B: https://lnkd.in/ejpGndA2
40B: https://lnkd.in/e6ESxVTK

Check out the official announcement:
👉 https://falconllm.tii.ae/
Llama can now code! 🤯🔔 @Meta just released CodeLlama, a foundation model for code generation. 🧑🏻‍💻 CodeLlama is the next iteration of Meta's Llama family; it comes in three sizes, 7B, 13B, and 34B, and has the same commercially friendly license as Llama 2! 😍

CodeLlama key facts ✨:
🧮 7B, 13B & 34B parameter versions
🛫 Initialized from Llama 2
🔠 Trained on 500B tokens
🐍 Python version & Instruct version
✅ Commercial use allowed
😕 Training data unknown
🪟 16384 context window
🤗 Available on Hugging Face

Models: https://lnkd.in/e6VtKqGU
Announcement: https://lnkd.in/eaSan-Tt
Paper: https://lnkd.in/e-QTjQym
Big news for any developer or AI enthusiast! 🤗 Transformers just released Agents 🤖 to easily build generative AI applications and autonomous agents using LLMs like OpenAssistant, StarCoder, OpenAI, and more.

📈 With Transformers Agents, you can remove the barrier of entry to machine learning and start building powerful agents that respond to complex queries and offer a chat mode. Plus, it's fully multimodal, allowing you to work with text, images, video, audio, and documents. 🤯

👉 https://lnkd.in/eN7HGFe5

But how does it work in practice? It's as simple as building a prompt:
1️⃣ Tell the agent what it aims to do.
2️⃣ Give it tools.
3️⃣ Show examples.
4️⃣ Give it a task.

Transformers Agents comes with built-in tools like document QA, text QA, image QA, speech-to-text and text-to-speech, text classification, summarization, translation, download, image generation, captioning, segmentation, upscaling, and text-to-video. And it's designed to be EXTENSIBLE, meaning you can add your own tools or use community-contributed tools. 🤖 📝🎤 🎨🌅 🏷️
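
In code, this boils down to a couple of lines. A sketch using the API as it shipped in transformers v4.29 (the Agents interface has been reworked in later releases, so treat this as a historical snapshot):

```python
# Sketch of the original Agents API from transformers v4.29.
from transformers import HfAgent

# Use a hosted StarCoder inference endpoint as the agent's LLM.
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

# The agent writes and runs code that calls its built-in tools,
# here the text-to-speech tool.
audio = agent.run("Read the following text out loud", text="Transformers Agents is here!")
```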

Ready to get started now or check out the example notebook: https://lnkd.in/e3XMBBCq
Llama 3 extended to almost 100,000-token context! ✅ By combining PoSE with continued pretraining of the Llama 3 8B base model on 300M tokens, the community (Wing Lian) managed to extend the context from 8k to 64k. 🚀 Applying RoPE scaling afterward led to a supported context window of close to 100,000 tokens with perfect recall. 🤯🚀

PoSE can extend the context window of LLMs by simulating long inputs using a fixed context window during training. It chunks the document into smaller pieces and simulates them as "long" versions, which significantly reduces memory and time overhead while maintaining performance.

๐—œ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
๐Ÿšซ Don't increase rope_theta during pertaining
๐Ÿš€ Rank-stabilized LoRA converged much quicker than regular LoRA
โฌ†๏ธ Increased the RoPE theta to extend the context to ~90k
โž• Adapters can be merged with any Llama 3 model to extend the context
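
As a rough illustration of the RoPE-theta step, scaling can be applied when loading the model in transformers; a sketch, where the checkpoint id and scaling factor are illustrative assumptions rather than the exact recipe from the thread:

```python
# Rough sketch: extend a long-context checkpoint further via linear RoPE
# scaling at load time. Checkpoint id and factor are illustrative assumptions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "winglian/Llama-3-8b-64k-PoSE",                  # assumed community 64k checkpoint
    rope_scaling={"type": "linear", "factor": 1.5},  # ~64k -> ~96k supported context
    device_map="auto",
)
```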

Llama 3 8B 64k: https://lnkd.in/d6cprxvT
Original Thread: https://lnkd.in/dnVn8vKu
PoSE Paper: https://lnkd.in/dmtDNwwe
Excited to announce "llm-sagemaker", a new Terraform module to easily deploy open LLMs from Hugging Face to Amazon Web Services (AWS) SageMaker real-time endpoints! 👀 Infrastructure as Code (IaC) tools are crucial for moving your AI applications from notebooks into production! 🚀

TL;DR:
🚀 New HashiCorp Terraform module simplifies LLM deployment to Amazon SageMaker
🤖 Support for popular models like Meta Llama 3, Mistral AI, Mixtral, and Cohere Command
🛠️ Handles IAM roles, SageMaker Model, Endpoint Configuration, and Autoscaling
⚡ Example deploys Llama 3.1 8B Instruct in just minutes
🔧 Customizable configurations for TGI
💻 Easy integration with the AWS SDK for inference (sketch below)
✅ Includes integration tests using Gruntwork Terratest

Blog: https://lnkd.in/eQM5KtSD
Module: https://lnkd.in/e8THb3Rd
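
Once the module has created an endpoint, invoking it from Python takes a few lines with boto3; a sketch, where the endpoint name is an assumption and the payload follows TGI's JSON schema:

```python
# Sketch: invoke a SageMaker real-time endpoint created by the module.
# The endpoint name is an assumption; the payload follows TGI's JSON schema.
import json
import boto3

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="llama-3-1-8b-instruct",  # whatever name your module config used
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "What is open-source AI?",
        "parameters": {"max_new_tokens": 128},
    }),
)
print(json.loads(response["Body"].read())[0]["generated_text"])
```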

If you have feature requests, please open an issue. 🤗
Coming in hot! 🔥 Another open-source LLM just got released, by Stability AI. 🖼 The first release of StableLM includes a 3B and a 7B parameter model, with 15-65B models to follow. 🤯
Models are released under the CC BY-SA license. 📄

Models are available on Hugging Face! 🤗


Model: https://lnkd.in/ehDJ-H5T
RLHF Demo: https://lnkd.in/eYCAzyDY
DoRA: a new, better, and faster LoRA? 🤔 DoRA, or Weight-Decomposed Low-Rank Adaptation, is a new parameter-efficient fine-tuning technique that claims to enhance the learning capacity and training stability of LoRA while avoiding any additional overhead. 🤯

DoRA decomposes the weights into two components, magnitude and direction. DoRA adjusts how important each part of the data is (magnitude) and how the model should focus on learning (direction). It's like fine-tuning the details of a picture without needing to redraw the whole thing. 🖼️

๐—œ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
๐Ÿ… DoRA consistently outperforms LoRA
๐Ÿค— Supported in Hugging Face PEFT
โ™ป๏ธ Trained Adapters can be merged back into the model
๐Ÿ“ˆ +3.4% on Llama 7B and +1.0% on Llama 13B compared to LoRA on common reasoning
๐Ÿ” Improved training stability compared to LoRA
โŒย In PEFT, DoRA only supports linear layers at the moment.

Paper: https://lnkd.in/eiQ-ZVTW
PEFT documentation: https://lnkd.in/eMYWeGip
Gemma 4 is here! 4️⃣ Our most capable, agentic open model, built on the same research as Gemini 3. Reasoning. Multimodal. Four sizes (2B to 31B). Base + Instruct. ✨

Released under Apache 2.0. Runs on your phone, laptop, or servers. All you need to know about Gemma 4:

4๏ธโƒฃ 4 sizes (E2B, E4B, 26B4A, 31B)
๐ŸชŸ Up to 256K context window
๐Ÿ› ๏ธ Native function-calling, structured JSON output
๐Ÿ‘๏ธ + audio on edge models (E2B/E4B)
๐ŸŒ Trained on 140+ languages
๐Ÿ† 31B ranks #3 open model on Arena AI
๐Ÿชช Apache 2.0 license
1๏ธโƒฃ Fits on a single GPU
๐Ÿš€ Gemma E4B == Gemma 3 27B

All versions support native function-calling and structured JSON output to build agents that can run locally. The small models (E2B, E4B) can run entirely offline on mobile, supporting vision and audio fully on-device.

Start building with Gemma 4 now.

Try in Google AI Studio → https://lnkd.in/ddQhAsCs
Hugging Face → https://lnkd.in/dhNBspd5
Kaggle → https://lnkd.in/dffxshtG
or in your favorite ecosystem tool!

Blog → https://lnkd.in/d_EXTGCn
What if one embedding model could understand text, images, video, audio, and PDFs all at once? Excited to share Gemini Embedding 2, our first fully multimodal embedding model.

๐Ÿ–ผ๏ธ 5 modalities in a single unified embedding space
๐ŸŒ Supports up to 8,192 input tokens, 100+ languages
๐ŸŽง Embeds audio natively, no transcription step needed
๐Ÿ“ Flexible output dimensions: 3,072 / 1,536 / 768 via MRL
๐Ÿ“Žย Up to 6 images, 120s video, and 6-page PDFs per request

Now in Public Preview via Gemini API & Vertex AI.
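
A minimal sketch with the google-genai Python SDK; the model id below is my placeholder (check the docs for the official one), and multimodal inputs may use a different request shape:

```python
# Sketch: request an embedding via the google-genai SDK. The model id is a
# placeholder assumption; multimodal inputs may use a different shape.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment
result = client.models.embed_content(
    model="gemini-embedding-002",  # hypothetical id for Gemini Embedding 2
    contents="A sunny beach with palm trees",
    config=types.EmbedContentConfig(output_dimensionality=1536),  # MRL truncation
)
print(len(result.embeddings[0].values))
```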

Docs: https://lnkd.in/dutRFSqH
Blog: https://lnkd.in/d_YpkZq5
Interested in how open models work? Read this visual guide to Gemma 4 that explains how it all works, with diagrams.

Covers how the model handles images, audio, and text, why only a fraction of parameters run during inference (MoE), and how tiny models (2B) fit on a phone using a clever embedding trick.

Worth the read even if you never plan to fine-tune anything, just to understand what's actually inside these models.

👉 https://lnkd.in/ds4zmhHP

Big kudos to Maarten Grootendorst! LFG! 🚀
What comes after Transformers? Neural memory and test-time training. Google Research presented 2 new papers at NeurIPS with architectures that actively learn and update their own parameters during inference, acting as a "long-term neural memory" rather than a static context window.

Implementation:
1๏ธโƒฃ Titans replaces the fixed-state memory of linear RNNs with a deep Multi-Layer Perceptron (MLP) memory module.
2๏ธโƒฃ The model updates this memory at test-time by calculating a "surprise metric" based on the gradient of the input data.
3๏ธโƒฃ MIRAS framework generalizes this by treating memory as an optimization problem with customizable loss functions and regularization.
4๏ธโƒฃ Training is parallelized by chunking sequences, using linear operations within chunks and non-linear updates across chunks.
5๏ธโƒฃ Models incorporate "Persistent Memory" (fixed learnable weights) alongside the dynamic "Contextual Memory" to store task-specific knowledge.

Insights:
- 💡 Attention mechanisms are excellent at short-term memory but fail at efficient long-term storage due to quadratic costs.
- 📈 Deep memory structures (MLPs) significantly outperform the vector/matrix-based compression used in Mamba and other linear RNNs.
- 🛠️ Memory updates are effective when driven by "surprise": high gradients indicate unexpected, memorable data.
- 📚 Forgetting mechanisms in recurrent models are mathematically equivalent to retention regularization (weight decay).
- 📉 Standard L2 (mean squared error) objectives make memory sensitive to outliers; L1 or Huber loss provides better stability.
- 🧠 Titans outperforms GPT-4 on "Needle in a Haystack" tasks with 2M+ token contexts despite having fewer parameters.
- ⚡ Deep memory modules exhibit a trade-off where increased depth improves perplexity but slightly reduces training throughput.

Titans and MIRAS show potential to replace, or at least augment, pure Transformer architectures. The hybrid approach (using attention for the immediate context and Neural Memory for the deep history) suggests the future might be a convergence of RNN efficiency and Transformer performance.

Blog: https://lnkd.in/eXEEwd_t
Titans Paper: https://lnkd.in/ejxAWBJD
MIRAS Paper: https://lnkd.in/e27DWiyr
"LLMs are ghosts of internet-scale code" is the recent analogy from Andrej Karpathy on the Dwarkesh Patel podcast perfectly captures the current state of AI in coding.

What does this means for us as developers:

👻 LLMs are brilliant at re-creating what's common. They can instantly generate boilerplate code, scaffold a standard CRUD app, or write a routine unit test. They are channeling the "ghost" of millions of existing codebases. We should absolutely leverage this for speed. This is a massive productivity boost for repetitive tasks.

🧠 They get stuck on what's new. Ask an LLM to design a novel algorithm, implement a custom architecture, or work within a unique, non-standard codebase, and the "ghost" gets confused. It will often try to "correct" your innovative solution by forcing a familiar pattern it already knows.

AI doesn't replace fundamentals yet; it magnifies them. ✨

Deep knowledge of system design, architecture, and first-principles thinking makes you 10x more productive.

For now: keep learning how to code, keep building!
Super excited for this! 🚀 We're rolling out a Gemini-powered personal health coach inside Fitbit (now part of Google). It uses a deep-agent architecture to orchestrate between conversational, data science, and domain-expert sub-agents.

📊 Performs complex numerical reasoning on physiological time-series data.
📱 Available to eligible U.S. Android Fitbit Premium users, with iOS expanding soon.
✅ Validated via 1 million+ human annotations and 100k+ hours of evaluation.
💬 Personalized guidance via a 5-10 minute interactive text or voice conversation.
🎯 Adaptive plans based on individual health metrics and goals.
🔬 Grounded in behavioral science and guided by a Consumer Health Advisory Panel.

Learn more: https://lnkd.in/eafGYwV8
We just launched Gemini 3.1 Flash Live! Our fastest, most natural real-time voice AI model for building Agents.

- Scores 90.8% on ComplexFuncBench Audio for tool use.
- 70 languages, Video streaming, Audio transcriptions, 128k context
- Comes with Agent Skill for building live voice agents.
- All generated audio is watermarked with SynthID.

Blog: https://lnkd.in/de-j3xCT
Skill: https://lnkd.in/dtdKiuRx
Docs: https://lnkd.in/d9Wu8PjA
Massive update to Google TPU inference for open models like Gemma! `tpu-inference` is a new vLLM backend that delivers up to 5x the performance of previous prototypes.

- ๐Ÿค Unifies PyTorch and JAX under a single JAX-to-XLA lowering path.
- ๐Ÿš€ 2x-5x higher performance than the February 2025 prototype.
- ๐Ÿ“ˆ 20% higher throughput without model code changes.
- โšก Ragged Paged Attention V3 increases throughput by ~10% on Trillium (v6e).
- ๐Ÿ› ๏ธ Single Program, Multi-Data (SPMD) as the default
- โ˜๏ธ Support for Trillium (v6e) and v5e TPU

Learn More: https://lnkd.in/d6mCNaQf
5 practical tips for Context Engineering, which apply to Google DeepMind Gemini as well:

1๏ธโƒฃ Context Ordering Matters:ย Try to use "append-only" context, adding new information to the end. This maximizes cache hits reducing cost (4x) and latency.
2๏ธโƒฃ Manage Tools Statically:ย Avoid changing tool order or availability mid-task, if not explicitly needed. This might break context caching and can/will confuse models if used tools in the history are no longer defined.
3๏ธโƒฃ Use External Memory:ย Write explicitly or implicitly context/goals to external storage to. Preventing information loss. A typical task in Manus requires around 50 tool calls on average.
4๏ธโƒฃ Recite Goals to not get lost:ย Prevent the model from "getting lost" by having it periodically restate its objectives. This keeps the primary goal in its recent attention span.
5๏ธโƒฃ Embrace Errors:ย Keep error messages in the context. This allows the model to learn from its mistakes and avoid repeating them.

Learn from it: https://lnkd.in/e6n5ej7t
Excited to share a system instruction for Gemini 3 Pro that improved performance on several agentic benchmarks by around 5%. We collaborated with the Google DeepMind post-training research team to include some tips in our docs. 🚀

To maximize reliability in multi-step workflows, you should craft instructions that explicitly control how the model reasons and plans. While Gemini provides strong general reasoning, complex agents benefit from prompts that enforce specific behaviors like persistence in the face of issues, risk assessment, and proactive planning.
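
For illustration, this is roughly what passing a system instruction looks like with the google-genai SDK; the instruction text is my paraphrase of the themes above (the tuned SI is in the link below), and the model id is an assumption:

```python
# Illustration only: the instruction text paraphrases the themes above, and
# the model id is an assumption; the actual tuned SI is in the linked doc.
from google import genai
from google.genai import types

client = genai.Client()
resp = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model id
    contents="Fix the failing build.",
    config=types.GenerateContentConfig(
        system_instruction=(
            "You are an autonomous agent. Persist until the task is fully "
            "resolved, assess risk before destructive actions, and plan "
            "multi-step work before executing it."
        ),
    ),
)
print(resp.text)
```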

SI: https://lnkd.in/eFYG_GsK
Docs: https://lnkd.in/eT7gFr-m
18,000 tokens/sec, even for Llama 3.1 8B, is ridiculous. Even a "dumb" model like Llama 3.1 would be incredibly useful at this speed.

It works by merging storage and compute, permanently etching the model parameters directly into the physical transistors of the chip.

Demo link: https://chatjimmy.ai/
Today, we're launching the File Search Tool, a fully managed RAG system built directly into the Gemini API that abstracts away the retrieval pipeline so you can focus on building. File Search provides a simple, integrated and scalable way to ground Gemini with your data, delivering responses that are more accurate, relevant and verifiable.

To make File Search simple and affordable for all developers, we're making storage and embedding generation at query time free of charge. You only pay for creating embeddings when you first index your files, at a fixed rate of $0.15 per 1 million tokens (or whatever the applicable embedding model cost is, in this case gemini-embedding-001). This new billing paradigm makes the File Search Tool both significantly easier and more cost-effective to build and scale with.

Here are the highlights:
🆓 Free storage and free embedding generation at query time.
💰 Pay a fixed rate of $0.15 per 1M tokens only for initial indexing.
🔍 Leverage advanced vector search via the Gemini Embedding model.
🧾 Receive automatic citations to verify source document usage.
📂 Ground models with PDFs, DOCX, TXT, JSON, and code formats.
⚡ Combine parallel query results in under 2 seconds.
🛠️ Integrate directly into the existing generateContent API.

Get started with the docs: https://lnkd.in/eqhdzH79
๐Ÿ—ƒ๏ธ Context Caching Update! The Gemini API implicit caching now with 90% cost savings when your requests hit the cache!ย 
This means if you send a request to Gemini models with a common prefix as one of previous requests, it might be cached. No code changes needed.

Price details (normal input / cached input) per 1M tokens:
- Gemini 2.5 Pro: $1.25 / $0.125
- Gemini 2.5 Flash: $0.30 / $0.03
- Gemini 2.5 Flash-Lite: $0.10 / $0.01

All prices: https://lnkd.in/eZNzzBny

Enjoy :)
Nano Banana Pro (Gemini 3 Pro Image) is here 🍌 It "thinks" through a prompt and can retrieve real-time data, such as weather forecasts or stock charts, using Google Search grounding before generating high-fidelity images, starting at $0.13 per image.

- $0.134 per 2K image / $0.24 per 4K image.
- Supports 10 different aspect ratios and upscaling to 2K and 4K.
- Integrated with Google Search to access real-time data.
- Allows up to 14 reference input images for compositing.
- Highly accurate and clear text rendering.
- Now available in Google AI Studio and the Gemini API.

Testing this felt magical. My creativity is more limiting than the model's capabilities. The potential impact is massive.

Blog: https://lnkd.in/epa2BEKp
Dev Guide: https://lnkd.in/efhAnhMP
Docs: https://lnkd.in/e5tPsVCa

Image Prompt (with Google Search Grounding): Generate an infographic of the pizza per capita.
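
That prompt maps to a short google-genai call; a sketch, where the model id is my best guess at the preview id and the grounding config may differ from the final docs:

```python
# Sketch: grounded image generation via google-genai. The model id is my best
# guess at the preview id; the grounding config may differ from the docs.
from google import genai
from google.genai import types

client = genai.Client()
resp = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Generate an infographic of the pizza per capita.",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        tools=[{"google_search": {}}],  # lets the model fetch real-time data
    ),
)
for part in resp.candidates[0].content.parts:
    if part.inline_data:  # image bytes
        with open("infographic.png", "wb") as f:
            f.write(part.inline_data.data)
```
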
New guide! ✨ Learn how to deploy n8n on Google Cloud Run and build your first AI agent with Gemini 2.5.

โ˜๏ธ Serverless Deployment: Run open-source n8n on Google Cloud Run for auto-scaling and reduced operational overhead.
๐Ÿง  Gemini 2.5 Integration: Native support for Google's latest models to power advanced AI agents.
๐Ÿ”Œ MCP Support: Connect your LLMs to external tools and data sources instantly using the Model Context Protocol standard.
๐Ÿ’พ Robust Persistence: detailed setup for Cloud SQL (PostgreSQL) ensuring your workflows never lose data, using cost-effective micro tiers.
๐Ÿ“š Community Power: Leverage over 600+ pre-built Gemini automation workflows to get started immediately.

Implementation:
1. Install the Google Cloud CLI and authenticate with your GCP account.
2. Create a new GCP project, enable billing, and activate necessary APIs for Cloud Run, SQL, and Secret Manager.
3. Provision a Cloud SQL PostgreSQL instance for persistent storage.
4. Secure database credentials and encryption keys using Google Secret Manager.
5. Create a service account with permissions to access Cloud SQL and the stored secrets.
6. Deploy the n8n container to Cloud Run, ensuring CPU throttling is disabled to support background workflows.
7. Access the deployed n8n instance and initialize the owner account.
8. Configure Google Gemini API credentials within n8n.
9. Build a new workflow using the "AI Agent" node connected to the Gemini Chat Model.
10. Extend the agent's capabilities by adding tools, such as an MCP Client for external data access.

Blog: https://lnkd.in/eeJjUagN
Here is how Spotify uses coding background agents for thousands of code migrations and what they learned:

- Define desired verifiable end states explicitly, not strict todo steps.
- Code examples improved output reliability.
- Agent has access to 3 tools, verify, git and bash.
- โ€œverifyโ€ tool runs formatters, linters, and tests > AGENTS md.

https://lnkd.in/dsth8x4w
We created a GitHub repo for all MCP at Google.

Get info on our remote managed MCP servers, open source MCP servers, examples, and learning resources.

github.com/google/mcp
Excited to introduce WeatherNext 2 🌦️ A new AI model from Google DeepMind and Google Research delivering faster, higher-resolution global weather predictions.

- Generates forecasts 8x faster, requiring under one minute on a single TPU.
- Surpasses prior models on 99.9% of variables across 0-15 day lead times.
- Simulates hundreds of possible weather scenarios from a single data input.
- Delivers high-resolution global forecasts down to specific 1-hour intervals.

Learn More: https://lnkd.in/e8NvFUpj

Should we add this to the Gemini API and Google AI Studio? 🤔
Nano Banana 2 is here! Pro-level image generation and editing at Flash speed, with images from web search.

💰 $0.045 per 512x image, $0.067 per 1024x image.
🌍 Improved image quality, consistency & i18n text rendering.
🖊️ Supports Text -> Text + Image(s) and Image + Text -> Text + Image(s).
🖼️ Default 1024x output, with new 512x, 2048x, and 4096x output resolutions.
🔍 New text + image search to inform generation with both text and images.
📐 Aspect ratios: 1:1, 1:4, 4:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9, 1:8, 8:1.

Available in the Gemini API, AI Studio, the Gemini app, Antigravity, and Vertex AI!

https://lnkd.in/d9a_-KfU
If 2025 was the beginning of agents, 2026 will be around Agent Harnesses. An Agent Harness is the infrastructure that wraps around an AI model to manage long-running tasks. It is not the agent itself. It operates at a higher level than agent frameworks. The harness provides prompt presets, opinionated handling for tool calls (Human in the loop), lifecycle hooks or ready-to-use capabilities like planning, filesystem access or sub-agent management.

As benchmarks become more complex, we need to bridge the gap between benchmark claims and user experience. An Agent Harness can be essential for three critical reasons:
- Validating Real-World Progress: Allows users to easily test and compare how the latest models perform against their use cases and constraints.
- Empowering User Experience: Without a harness, the user's experience might lag behind the model's potential.
- Hill Climbing via Real-World Feedback: A shared, stable environment (Harness) creates a feedback loop where researchers can iterate and improve ("hill climb") based on actual user adoption.

We are heading toward a convergence of training and inference environments. We see a new bottleneck emerging: context durability. The Harness will become the primary tool for solving "model drift".

Read more in my blog: https://lnkd.in/dzW2tmkh
Learn how to build your own AI agent from scratch with Gemini 3 Pro. Excited to share this practical guide designed for everyone, going from simple text generation to a functioning CLI agent.

๐Ÿ› ๏ธ Construct a working prototype in under 100 lines of code.
๐Ÿš€ Start from basic text generation using the Gemini 3 Pro model.
โš™๏ธ Define function capabilities using JSON schemas.
๐Ÿ”„ Build a logic loop to intercept and execute tool calls.
๐Ÿ’ป Assemble components into a functional multi-turn CLI application.
๐Ÿ“š Learn Best Practices for Engineering Agents

Blog: https://lnkd.in/ewKj9uzp
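
Condensed, the tool loop looks roughly like this (my compression, with an assumed model id; the full guide builds it out step by step):

```python
# Condensed sketch of the agent loop: declare a tool via a JSON schema, let
# the model request it, execute it, and feed the result back. Model id assumed.
import subprocess
from google import genai
from google.genai import types

def run_shell(command: str) -> dict:
    out = subprocess.run(command, shell=True, capture_output=True, text=True)
    return {"stdout": out.stdout, "stderr": out.stderr}

client = genai.Client()
config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[{
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "parameters": {"type": "object",
                       "properties": {"command": {"type": "string"}},
                       "required": ["command"]},
    }])],
    # disable the SDK's automatic calling so we can intercept calls ourselves
    automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
)
contents = [types.Content(role="user", parts=[types.Part(text="List the files here.")])]
while True:
    resp = client.models.generate_content(
        model="gemini-3-pro-preview", contents=contents, config=config)
    if not resp.function_calls:  # no tool requested: final answer
        print(resp.text)
        break
    contents.append(resp.candidates[0].content)  # keep the model's tool request
    for call in resp.function_calls:
        result = run_shell(**call.args)
        contents.append(types.Content(role="user", parts=[
            types.Part.from_function_response(name=call.name, response=result)]))
```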

Very proud and excited to share this! 🤗
An `AGENTS.md` (or equivalent) is the highest-leverage configuration point for agents. It's injected into every conversation. But research shows that doing it wrong actively hurts performance. Here's how to do it right, backed by data.

## Less Is More

- Auto-generated AGENTS.md files reduce success rates by ~3% while increasing inference cost by over 20%
- Human-written AGENTS.md files only marginally improve performance (~4%)
- Stronger models don't generate better context files.
- Codebase overviews in AGENTS.md don't help agents navigate faster.
- LLM-generated files are redundant with existing docs.
- Instructions ARE followed. Agents respect AGENTS.md instructions, but unnecessary requirements make tasks harder.

## What to Include

- The WHAT: Your tech stack, project structure, and what each part does. Critical for monorepos.
- The WHY: The purpose of the project and its key components. Help the agent understand intent, not just structure.
- The HOW: How to build, test, and verify changes. Include non-obvious tooling (e.g., `uv` instead of `pip`, `bun` instead of `npm`). Tools mentioned in AGENTS.md get used 160x more often than unmentioned ones.

## What NOT to Include

- Detailed codebase overviews or directory listings. The paper found these don't help agents navigate faster, and agents can discover structure themselves.
- Code style guidelines. Use linters and formatters instead, they're faster, cheaper, and deterministic.
- Task-specific instructions that only apply sometimes. Since AGENTS.md goes into every session, non-universal instructions dilute focus.
- Auto-generated content. Don't let the agent write its own AGENTS.md. The data shows this hurts more than it helps.

## How to Structure It

- Keep it short. General consensus is <300 lines; HumanLayer keeps theirs under 60 lines. Every line goes into every session, make each one count.
- Use progressive disclosure. Don't put everything in AGENTS.md. Instead, keep task-specific docs in separate files (e.g., `agent_docs/running_tests.md`, `agent_docs/database_schema.md`) and list them in AGENTS.md with brief descriptions so the agent reads them only when relevant.
- Prefer pointers over copies. Reference `file:line` locations rather than embedding code snippets that will go stale.
- Write it yourself, deliberately. A bad line in AGENTS.md cascades into bad plans, bad code, and bad results across every session.

Paper: https://lnkd.in/dh7wqU2z
Blog: https://lnkd.in/d6ih8DuM
Gemini 3.1 Flash-Lite is here! 🔦 Our fastest, most cost-efficient Gemini model, built for high-volume workloads at scale.

💰 Priced at $0.25/M input, $1.50/M output tokens
🧠 Matches 2.5 Flash quality at Flash-Lite cost
⚡ 2.5x faster time to first token and 45% faster output vs. 2.5 Flash
💽 Enables low-latency entity extraction, classification, and data processing

Now in preview on AI Studio, Gemini API & Vertex AI. 🔽

Try in AIS now: https://lnkd.in/d-n7hPDn
Blog: https://lnkd.in/dg6zAy3b
I have been using Gemini 3 Pro for a bit; here are my best practices for general usage, including principles and structural patterns that are currently working best for me.

This isn't meant to be treated as the gold standard, but rather as a starting point to help you refine your own strategies. Take what works, tweak what doesn't, and keep iterating.

https://lnkd.in/eAa9gFD9
Introducing the Gemini Docs MCP Server, a local STDIO server for searching and retrieving Google Gemini API documentation. It should help you build with the latest SDKs and model versions. 🚀

- Run the server directly via uvx without explicit installation.
- Performs full-text search across all Gemini documentation pages locally.
- Passed 114/117 tests for Python and TypeScript using the latest SDKs and models.
- 3 tools: search_documentation, get_capability_page, get_current_model.
- Utilizes a local SQLite database with FTS5 for efficient querying.
- Works with VS Code, Cursor, Gemini CLI … and every other tool supporting MCP.

GitHub repository: https://lnkd.in/eTyzmAR2

Note: This is currently a side project of mine. Should we make an official one?
More Gemma! Meet TranslateGemma, a new collection of open translation models built on Gemma 3, designed for high-performance communication.

- Available in 4B, 12B, and 27B parameter sizes.
- Evaluated on 55 languages using the WMT24++ dataset.
- 12B model outperforms the Gemma 3 27B baseline.
- 4B model optimized specifically for mobile and edge deployment.

Models: https://lnkd.in/d2KGtZiq
Technical Report: https://lnkd.in/dXMnHGrH