
These are the best posts from Sebastian Raschka, PhD.

5 viral posts with 13,475 likes, 318 comments, and 947 shares.
5 image posts, 0 carousel posts, 0 video posts, 0 text posts.


Best Posts by Sebastian Raschka, PhD on LinkedIn

I just put together a short Jupyter notebook with tips and tricks for reducing memory usage when loading larger and larger models (like LLMs) in PyTorch. These approaches become increasingly important as we try to make these models work on limited hardware configurations.

Here's the link to the notebook: https://lnkd.in/gt-VMGfv

By the way, the examples aren't just for LLMs. These techniques apply to any model in PyTorch.
[Post image]

If you are looking for something to read/study this weekend, I added lots of LLM-related bonus from-scratch coding resources over the last few months (from implementing Llama 3.2 to preference tuning with DPO): https://lnkd.in/gBGdi36J

I hope you find them useful! (My personal favorites are highlighted with a star.)
[Post image]

If you enjoy from-scratch implementations of self-attention and multi-head attention, I have collected and compared a few implementations for you here.
For readability, I particularly appreciate the compact one in the lower left, which features combined QKV matrices (courtesy of Md Rayed Bin Wahed).

Yes, PyTorch's scaled_dot_product_attention in the lower right, thanks to Flash Attention 2, is incredibly fast!

The code is available here on GitHub: https://lnkd.in/gcKY-yMd
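A minimal version of the combined-QKV idea (my own sketch, not the exact code from the repo) might look like this: a single fused projection produces Q, K, and V in one matmul, and F.scaled_dot_product_attention handles the rest:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Multi-head self-attention with a single fused QKV projection."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # one matmul for Q, K, V
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, d = x.shape
        # (b, t, 3, num_heads, head_dim) -> 3 tensors of (b, num_heads, t, head_dim)
        qkv = self.qkv(x).view(b, t, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        # Dispatches to Flash Attention / memory-efficient kernels when available.
        ctx = F.scaled_dot_product_attention(q, k, v)
        return self.out(ctx.transpose(1, 2).reshape(b, t, d))

x = torch.randn(2, 8, 64)
attn = MultiHeadAttention(d_model=64, num_heads=4)
print(attn(x).shape)  # torch.Size([2, 8, 64])
```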
[Post image]

It's been many months in the making, and I am excited to share that the print version of Machine Learning Q and AI is finally available as of today!
This book covers 30 topics related to machine learning and AI. It's meant for those who are already familiar with the basics and are looking for more advanced topics in a concise and focused format.
I included the table of contents below:

Introduction 
PART I: NEURAL NETWORKS AND DEEP LEARNING
Chapter 1: Embeddings, Representations, and Latent Space 
Chapter 2: Self-Supervised Learning 
Chapter 3: Few-Shot Learning
Chapter 4: The Lottery Ticket Hypothesis
Chapter 5: Reducing Overfitting with Data
Chapter 6: Reducing Overfitting with Model Modifications
Chapter 7: Multi-GPU Training Paradigms 
Chapter 8: The Keys to the Success of Transformers 
Chapter 9: Generative AI Models
Chapter 10: Sources of Randomness
PART II: COMPUTER VISION
Chapter 11: Calculating the Number of Parameters
Chapter 12: The Equivalence of Fully Connected and Convolutional Layers 
Chapter 13: Large Training Sets for Vision Transformers
PART III: NATURAL LANGUAGE PROCESSING
Chapter 14: The Distributional Hypothesis
Chapter 15: Data Augmentation for Text 
Chapter 16: “Self”-Attention 
Chapter 17: Encoder- and Decoder-Style Transformers
Chapter 18: Using and Finetuning Pretrained Transformers
Chapter 19: Evaluating Generative Large Language Models 
PART IV: PRODUCTION AND DEPLOYMENT
Chapter 20: Stateless and Stateful Training
Chapter 21: Data-Centric AI 
Chapter 22: Speeding Up Inference
Chapter 23: Data Distribution Shifts 
PART V: PREDICTIVE PERFORMANCE AND MODEL EVALUATION
Chapter 24: Poisson and Ordinal Regression
Chapter 25: Confidence Intervals 
Chapter 26: Confidence Intervals Versus Conformal Predictions
Chapter 27: Proper Metrics 
Chapter 28: The K in K-Fold Cross-Validation
Chapter 29: Training and Test Set Discordance
Chapter 30: Limited Labeled Data 
Afterword
Appendix: Answers to Exercises 
Index

I hope you'll like it. And please don't hesitate to reach out if you have any questions!
[Post image]

There's currently a lot of talk about Mistral, but have you seen the latest QA-LoRA paper?

- LoRA (low-rank adaptation) is awesome because it adapts only a small, low-rank subset of parameters of a base LLM.
- QLoRA is awesome because it lowered memory requirements even further by quantizing the base model weights.
- QA-LoRA is even more awesome as it takes QLoRA a step further and also quantizes the LoRA (adapter) weights, avoiding a costly conversion of the quantized base model weights back into 16-bit when adding the adapter weights.
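For reference, the plain LoRA idea from the first bullet fits in a few lines (a generic sketch of my own; QLoRA's and QA-LoRA's quantization is not shown):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update x @ A @ B."""

    def __init__(self, base: nn.Linear, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank adapter is trained
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        # W x + scale * (x A) B -- since B starts at zero, training begins
        # exactly at the base model's behavior.
        return self.base(x) + self.scale * (x @ self.A @ self.B)

layer = LoRALinear(nn.Linear(16, 8), rank=4)
x = torch.randn(2, 16)
print(layer(x).shape)  # torch.Size([2, 8])
```

The memory savings come from training only A and B (in_features * rank + rank * out_features parameters) instead of the full in_features * out_features weight matrix.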

This concept is summarized in the annotated figure below.

A little nitpick: Table 2 shows that QA-LoRA is about 2x faster than QLoRA for fine-tuning. However, a much smaller number of parameters was used for the adapter weights. I believe it would have been fairer to use the same number of parameters for both when comparing their speeds.

More details in the “QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models” paper: https://lnkd.in/g8VUW2uF

#LLMs #largelanguagemodels #AI
[Post image]
