These are the best posts from Jim Fan.

5 viral posts with 10,191 likes, 440 comments, and 680 shares.
3 image posts, 0 carousel posts, 2 video posts, 0 text posts.

Best Posts by Jim Fan on LinkedIn

This may be Apple's biggest move on open-source AI so far: MLX, a PyTorch-style neural network framework optimized for Apple Silicon, e.g. laptops with M-series chips. The release does an excellent job of designing an API familiar to the deep learning audience, and of showing minimal examples on the OSS models most people care about: Llama, LoRA, Stable Diffusion, and Whisper.
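A minimal sketch of what that PyTorch-style API looks like, assuming the mlx.core / mlx.nn / mlx.optimizers modules documented in the MLX repo (treat the exact names as illustrative rather than canonical):

import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# A tiny MLP with the familiar Module / __call__ pattern.
class MLP(nn.Module):
    def __init__(self, in_dims=8, hidden=32, out_dims=1):
        super().__init__()
        self.l1 = nn.Linear(in_dims, hidden)
        self.l2 = nn.Linear(hidden, out_dims)

    def __call__(self, x):
        return self.l2(nn.relu(self.l1(x)))

def loss_fn(model, x, y):
    return mx.mean((model(x) - y) ** 2)  # plain MSE

model = MLP()
optimizer = optim.SGD(learning_rate=1e-2)
x = mx.random.normal((16, 8))
y = mx.random.normal((16, 1))

# Functional-style gradients, closer to JAX than to torch.autograd.
loss_and_grad = nn.value_and_grad(model, loss_fn)
loss, grads = loss_and_grad(model, x, y)
optimizer.update(model, grads)
mx.eval(model.parameters(), optimizer.state)  # MLX is lazy; force evaluation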

I expect no less from my former colleague Awni Hannun, spearheading this effort at Apple. Thanks for the early Christmas gift! 🎄🎁

MLX source: https://lnkd.in/g5pNrZYF
Well-documented, self-contained examples: https://lnkd.in/gqp4pCbY
Post image by Jim Fan

New visual Turing Test: is this CGI or Sora or real?

Answer: I have verified from a reliable source that this is a real robot demo! There are quite a few in-the-wild videos taken from different angles by a random crowd in Shenzhen, and even a pic of the robot falling and engineers rescuing it (see the reply). The controller is a neural net trained in the Isaac simulator with reinforcement learning, then transferred zero-shot to the physical robot. Reward engineering is all you need.
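To make "reward engineering is all you need" a bit more concrete, here's a hedged sketch of the kind of locomotion reward commonly used in Isaac-style RL pipelines: a weighted sum of velocity tracking, upright posture, and energy/smoothness penalties. The terms and weights are illustrative assumptions, not the actual reward used for this robot.

import numpy as np

def locomotion_reward(base_lin_vel, cmd_vel, base_up_vec, joint_torques, action, prev_action):
    # Track the commanded planar velocity (exponential kernel rewards small error).
    vel_err = np.sum((base_lin_vel[:2] - cmd_vel[:2]) ** 2)
    r_track = np.exp(-4.0 * vel_err)
    # Keep the torso upright: dot product of the body-up vector with world-up.
    r_upright = np.clip(float(base_up_vec @ np.array([0.0, 0.0, 1.0])), 0.0, 1.0)
    # Penalize energy use and jerky actions so the gait transfers better to hardware.
    p_torque = 1e-4 * np.sum(joint_torques ** 2)
    p_smooth = 1e-2 * np.sum((action - prev_action) ** 2)
    return 1.0 * r_track + 0.5 * r_upright - p_torque - p_smooth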

The walking gait's got swag, but we need these robots to go fire-fighting ASAP!

Everyone's expecting a reborn Siri at WWDC today. Well, Apple has already published a paper on it that discloses way more detail than we'd expect from Apple. It's called "Ferret-UI", a multimodal vision-language model that understands icons, widgets, and text on iOS mobile screens, and reasons about their spatial relationships and functional meanings.

Example questions you can ask Ferret-UI:
- Provide a summary of this screenshot;
- For the interactive element [bbox], provide a phrase that best describes its functionality;
- Predict whether the UI element [bbox] is tappable.

With strong screen understanding, it's not hard to add action output to the model and make it a full-fledged on-device assistant.
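The [bbox] placeholders above suggest a simple way to template these queries: pass the screenshot plus a referring question with normalized box coordinates. The helper below is a hypothetical sketch of that formatting, not the actual prompt template from the Ferret-UI paper.

def format_ui_query(task, bbox=None):
    # bbox is (x1, y1, x2, y2) in normalized [0, 1] screen coordinates (assumption).
    region = ""
    if bbox is not None:
        region = " [{:.2f}, {:.2f}, {:.2f}, {:.2f}]".format(*bbox)
    templates = {
        "summary": "Provide a summary of this screenshot.",
        "function": f"For the interactive element{region}, provide a phrase that best describes its functionality.",
        "tappable": f"Predict whether the UI element{region} is tappable.",
    }
    return templates[task]

# e.g. a question about a button near the top-right corner of the screen
print(format_ui_query("tappable", bbox=(0.80, 0.02, 0.98, 0.08)))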

The paper even goes into the details of the dataset and the iOS UI benchmark construction. Extraordinary openness from Apple! They are truly redefining their AI research branch.

The paper was silently released with little PR fanfare in April. You still have enough time to warm up before WWDC: https://lnkd.in/dzZiPN_f
Post image by Jim Fan

Today is the 2-year birthday of InstructGPT, the mother of all modern LLMs. The AI circle has a time dilation effect - I can't believe it's been 2 years! InstructGPT laid out the canonical recipe of pre-training -> supervised finetuning -> RLHF, a strategy that everyone still follows to this day (with a few variations like DPO).
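For reference, DPO is one of those variations: it drops the explicit RL step and optimizes the policy directly on (chosen, rejected) preference pairs. A minimal PyTorch-style sketch of the DPO loss, assuming you already have per-sequence log-probs from the trainable policy and a frozen reference (SFT) model:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratios of policy vs. reference for the chosen and rejected responses.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected, scaled by beta; minimize the NLL.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()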

InstructGPT was likely OpenAI's last paper that detailed how they train their frontier models. Looking back, I think it marked the watershed moment when LLMs finally went from an academic curiosity (GPT-3) to an impactful product (ChatGPT).

Some fun facts:
- InstructGPT didn't invent RLHF. In fact, the blog linked to the OG RLHF work, also done by an OpenAI team in 2017. It was conceived to solve hard-to-specify tasks in simulated robotics: a human annotator gave 900 binary preferences, which helped a simple "hopper" robot learn backflips in sim (see the sketch after this list): https://lnkd.in/gNKgkYfg
- InstructGPT was published at NeurIPS 2022 in New Orleans! I was presenting MineDojo at the conference and was quite surprised to see the OpenAI poster there.
- The models came in 3 sizes: 1.3B, 6B, 175B. The labelers strongly preferred Instruct-1.3B to the old, prompt-engineered GPT-3-175B. Microsoft Phi-1, one of the best-known small LMs, was also 1.3B.
- InstructGPT is a master class in how to present your research. The 3-step figure is crystal clear and has become one of the most iconic visuals in AI. The intro section has no BS and gives 8 take-home messages in bold. The discussions of limitations and bias are grounded and honest.
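As referenced in the RLHF bullet above, here's a minimal sketch of the pairwise preference objective behind that 2017 work: fit a reward model so the clip the human preferred gets higher total reward under a Bradley-Terry model. Names and shapes are illustrative assumptions.

import torch.nn.functional as F

def preference_loss(reward_model, preferred_segment, rejected_segment):
    # Each segment is a (timesteps, obs_dim) tensor of observations; reward_model
    # maps observations to per-step scalar rewards (shapes are assumptions).
    r_pref = reward_model(preferred_segment).sum()  # total reward of the preferred clip
    r_rej = reward_model(rejected_segment).sum()    # total reward of the rejected clip
    # The annotator's binary choice supervises P(preferred > rejected) = sigmoid(r_pref - r_rej).
    return -F.logsigmoid(r_pref - r_rej)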

Your weekend read: https://lnkd.in/g7tq_zEJ
Post image by Jim Fan

This sakura video is fully described by 262 characters, implemented as shader code. A text2video model that achieves the maximum possible compression would be able to approximately recover this program in its weights, synthesized through denoising and gradient descent.

It's fascinating to think that Sora is learning to do graphics coding in the latent space of a transformer, though it's still far from optimal.

Source code:
vec3 p,q=vec3(-.1,.65,-.6);for(float j,i,e,v,u;i++<130.;o+=.007/exp(3e3/(v*vec4(9,5,4,4)+e*4e6))){p=q+=vec3((FC.xy-.5*r)/r.y,1)*e;for(j=e=v=7.;j++<21.;e=min(e,max(length(p.xz=abs(p.xz*rotate2D(j+sin(1./u+t)/v))-.53)-.02/u,p.y=1.8-p.y)/v))v/=u=dot(p,p),p/=u+.01;}

From: https://lnkd.in/giQU5V6G
