I just put together a short Jupyter notebook with tips and tricks for reducing memory usage when loading ever-larger models (like LLMs) in PyTorch. These approaches become increasingly important as we try to make these models work on limited hardware configurations.
Here's the link to the notebook: https://lnkd.in/gt-VMGfv
By the way, the examples aren't just for LLMs. These techniques apply to any model in PyTorch.