Tags → #ml-ai

29 May 2026
Why VGGT Stays Stable Under Sparsity Unlike Later Models
An optimization-first analysis of why VGGT's simpler architecture produces more geometrically consistent 3D reconstructions from few images than its higher-capacity successors — examining training dynamics, basin formation, gradient alignment, and the cost of abandoning a shared backbone.
3 Mar 2025
Notes from Intro to RL
A summary of my understanding of reinforcement learning, based on the book Reinforcement Learning: An Introduction by Sutton and Barto, and supplemented with the YouTube series.
25 Feb 2025
How FPGA lost the AI race
A post on how FPGA lost to NVIDIA. Not written by me.
23 Feb 2025
Flash Attention on Tenstorrent Hardware
Trying to understand how Flash Attention works on Tenstorrent and how it compares to CUDA
7 Sept 2024
Understanding AdaNorm
Understanding Adaptive Layer Normalization, first introduced in the DiT paper.
31 Jul 2024
Understanding Squared attention
A brief explanation of how the attention mechanism works and of its quadratic scaling with sequence length.
2 Jun 2024
Data Corpus of GPT-3 Training
Understanding the Text Corpus and Training Datasets of GPT-3
1 May 2024
Layernorm
Layer normalization in GPT, as covered by Andrej Karpathy
1 May 2024
Layernorm - Karpathy
Layer normalization in GPT, as covered by Andrej Karpathy
30 Mar 2024
Decoder Transformer
How I understand the Decoder Transformer in Generative Text Models