From r/MachineLearning

[R] Depth-first pruning transfers: GPT-2 → TinyLlama with stable gains and minimal loss

TL;DR:
Removing the right layers (instead of shrinking all layers) makes transformer models ~8–12% smaller with only ~6–8% quality loss, and this now works across architectures (GPT-2 + TinyLlama) with near-zero variance.

I’ve been experimenting with depth-first pruning — removing entire layers based on sensitivity rather than shrinking model width.

Started on GPT-2…
Just validated it on TinyLlama 1.1B with full 3-seed replication.
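To make the "sensitivity" idea concrete, here's a minimal sketch of leave-one-out layer scoring. This is a toy illustration, not the actual pipeline: the "layers" are scalar callables standing in for transformer blocks, and leave-one-out loss is one simple sensitivity criterion among several.

```python
# Toy sketch: score each layer by how much the loss rises when that
# single layer is skipped. Low score => cheap removal candidate.

def forward(layers, x):
    """Run input through a stack of layers (here: simple callables)."""
    for layer in layers:
        x = layer(x)
    return x

def layer_sensitivity(layers, x, target, loss_fn):
    """Leave-one-out scores: loss increase from skipping each layer."""
    base = loss_fn(forward(layers, x), target)
    scores = []
    for i in range(len(layers)):
        pruned = layers[:i] + layers[i + 1:]
        scores.append(loss_fn(forward(pruned, x), target) - base)
    return scores

# Demo: three toy layers, squared-error loss. The third layer is a
# near-no-op, so it should score lowest.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v + 0.001]
scores = layer_sensitivity(layers, 1.0, 4.0, lambda y, t: (y - t) ** 2)
least_important = min(range(len(scores)), key=scores.__getitem__)
```

On a real model you'd measure validation perplexity with each layer bypassed, but the ranking logic is the same.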

🧠 Results (TinyLlama 1.1B)

**Depth-First Pruning (3 seeds)**

| Config                | Layers | Reduction | Test PPL    | Ratio |
|-----------------------|--------|-----------|-------------|-------|
| Baseline (22L)        | 22     | 0%        | 9.19        | 1.000 |
| 20L (remove L4 + L11) | 20     | 8.0%      | 9.72 ± 0.01 | 1.057 |
| 19L (staged pruning)  | 19     | 12.0%     | 9.94 ± 0.01 | 1.081 |

⚡ What’s interesting

  • Extremely stable → ±0.01 PPL across seeds
  • Transfers across GPT-2 and Llama-family models
  • Keeps quality within ~6–8% while reducing size
  • Produces real inference speedups, not just parameter savings
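Why the speedups are "real": a removed layer costs exactly zero at inference, so depth pruning shows up directly in wall-clock latency, with no sparse kernels or structured-sparsity support required. A toy timing sketch (plain Python callables, not actual transformer blocks, so the numbers are only illustrative):

```python
import time

def bench(layers, x, iters=50000):
    """Wall-clock time for `iters` forward passes through a layer stack."""
    start = time.perf_counter()
    for _ in range(iters):
        y = x
        for layer in layers:
            y = layer(y)
    return time.perf_counter() - start

# 22 toy "layers" vs. a 19-layer pruned stack: the three dropped layers
# simply never execute, which is where the latency win comes from.
full = [(lambda v: v * 1.0001) for _ in range(22)]
pruned_stack = full[:19]
t_full, t_pruned = bench(full, 1.0), bench(pruned_stack, 1.0)
```

Width pruning, by contrast, often needs hardware-friendly sparsity patterns before parameter savings translate into speed.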

🧠 Key insight

Not all transformer layers matter equally.

Removing the least important layers:

  • preserves useful structure
  • avoids degrading all layers
  • beats uniform width pruning

🔥 Takeaway

👉 Structure > uniform scaling

Instead of:
“make every layer smaller”

Do:
👉 “remove the layers that matter least”
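As a sketch of what "staged pruning" (the 19L config) could look like: remove one layer, re-score the survivors, repeat. Again a toy version with scalar callables; the greedy remove-and-rescore loop is my assumed reading of "staged", not a verbatim copy of the actual code.

```python
# Greedy staged pruning: repeatedly drop the layer whose removal
# hurts a toy squared-error "loss" the least.

def evaluate(layers, x, target):
    """Squared error of the stack's output against a target value."""
    y = x
    for layer in layers:
        y = layer(y)
    return (y - target) ** 2

def staged_prune(layers, x, target, n_remove):
    layers = list(layers)
    for _ in range(n_remove):
        # Re-score every remaining layer, then permanently remove
        # the cheapest one before the next stage.
        losses = [evaluate(layers[:i] + layers[i + 1:], x, target)
                  for i in range(len(layers))]
        del layers[min(range(len(losses)), key=losses.__getitem__)]
    return layers

# Demo: two near-no-op layers get pruned; the two layers that actually
# shape the output survive.
stack = [lambda v: v + 1, lambda v: v * 2,
         lambda v: v + 1e-4, lambda v: v - 1e-4]
kept = staged_prune(stack, x=1.0, target=4.0, n_remove=2)
```

Re-scoring between removals matters because layer importance shifts once a neighbor is gone.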

⚠️ Notes

  • Not a new architecture
  • Not claiming SOTA
  • Just a clean, reproducible efficiency method

🧠 Bigger picture

This is part of a broader direction I’m exploring:

  • Seed → architecture discovery (finds efficient models)
  • Magnus → memory-first reasoning system

Goal:

👉 smaller, structured systems instead of bigger models

submitted by /u/califalcon


Tagged with

#Depth-first pruning
#GPT-2
#TinyLlama
#transformer models
#quality loss
#architectures
#sensitivity
#inference speedups
#layers reduction
#uniform width pruning
#architecture discovery
#stability
#structured systems