Today I learned

ai

A collection of 2 posts

Fast & Furious Tensor Parallelism: GPU Heist Gone Wrong

Splitting a model across 4 H200 GPUs was expected to quadruple throughput, but instead produced 2.8x higher latency and 35% lower throughput. Without NVLink, tensor parallelism adds more communication overhead than it saves in compute, so sometimes 1 GPU outperforms 4.
02 Nov 2025 17 min read

Honey, I Shrunk the Model: When Quantizing 70B Parameters Broke Everything

I tried to shrink a 70B model from FP16 to FP8 to fit in my 141GB of VRAM. Spoiler: it broke everything. After testing 6 models and 3 quantization formats, I discovered that a 30B model in full precision outperformed every quantized 70B. Turns out precision matters more than parameter count.
01 Nov 2025 9 min read
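
The memory arithmetic behind that excerpt can be sketched in a few lines. This is a rough back-of-the-envelope estimate, not the post's own method: it counts weights only, assumes 2 bytes per parameter for FP16 and 1 byte for FP8, and assumes "full precision" for the 30B model means 16-bit weights. The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    1e9 parameters * bytes per parameter / 1e9 bytes per GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * bytes_per_param


VRAM_GB = 141  # VRAM figure quoted in the excerpt

# (label, parameters in billions, assumed bytes per parameter)
configs = [
    ("70B FP16", 70, 2),
    ("70B FP8", 70, 1),
    ("30B FP16", 30, 2),  # assumption: "full precision" here means 16-bit
]

for label, params_b, bytes_pp in configs:
    need = weight_memory_gb(params_b, bytes_pp)
    headroom = VRAM_GB - need
    print(f"{label}: ~{need:.0f} GB of weights, ~{headroom:.0f} GB left over")
```

Under these assumptions the 70B FP16 weights alone take roughly 140 GB, leaving almost nothing of the 141 GB for KV cache and activations, which is why dropping to FP8 looks attractive in the first place.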