Marshall's Personal Library · Google Stadia 2019–2023
REST IN PEACE STADIA. THE HATERS KILLED YOU.
Every game cover in this library was identified using Meta's Llama 3.2 11B Vision model running entirely on local hardware. A custom Python script processed all 314 .webp cover images one by one, asking the model to return the game title and release year as structured JSON.
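For illustration, here is a minimal sketch of how a reply could be turned into a title and year. The prompt wording, the field names `title` and `year`, and the helper `parse_cover_reply` are all assumptions; the original script is not shown here.

```python
import json
import re
from typing import Optional

# Hypothetical prompt: the wording used by the original script is not known.
PROMPT = (
    "Identify the video game shown on this cover. "
    'Reply with JSON only: {"title": "<game title>", "year": <release year>}'
)


def parse_cover_reply(reply: str) -> Optional[dict]:
    """Extract {"title", "year"} from a model reply, or return None on failure."""
    # Take the span from the first "{" to the last "}" so that any chatter
    # around the JSON object is ignored.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or "title" not in data:
        return None
    year = data.get("year")
    # Keep the year only when it is an integer; anything else becomes None.
    return {
        "title": str(data["title"]).strip(),
        "year": year if isinstance(year, int) else None,
    }
```

Returning `None` instead of raising lets a batch loop log the failed cover and move on to the next image.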
To fit the 11-billion-parameter model onto a 12 GB GPU, it was loaded with 4-bit NF4 quantisation via the BitsAndBytes integration in HuggingFace Transformers. This shrank VRAM usage from ~22 GB to ~6.7 GB, leaving headroom for the vision encoder. The RTX 3060 ran at 100% utilisation for the entire batch.
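The memory figures can be sanity-checked with a little arithmetic, and the loading step can be sketched as below. This is an assumption-laden sketch, not the original script: the model ID, the float16 compute dtype, and the helper names `weight_gb`, `nf4_settings` and `load_model` are all illustrative. The weights-only estimate comes to 5.5 GB at 4 bits; the text does not break down the gap to ~6.7 GB, which presumably covers layers left unquantised and other overhead.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring all overhead."""
    return params * bits_per_weight / 8 / 1e9


def nf4_settings() -> dict:
    """Keyword arguments for transformers.BitsAndBytesConfig (4-bit NF4)."""
    return {
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
    }


def load_model(model_id: str = "meta-llama/Llama-3.2-11B-Vision-Instruct"):
    """Load the model in 4-bit NF4. The model ID is an assumption.

    Needs torch, transformers, bitsandbytes and a CUDA GPU, so the heavy
    imports are kept inside the function.
    """
    import torch
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        MllamaForConditionalGeneration,
    )

    quant = BitsAndBytesConfig(
        bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
        **nf4_settings(),
    )
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=quant,
        device_map="auto",
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor


if __name__ == "__main__":
    print(f"16-bit weights: ~{weight_gb(11e9, 16):.1f} GB")
    print(f" 4-bit weights: ~{weight_gb(11e9, 4):.1f} GB")
```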
Each image is sliced into 560×560 px tiles by the vision encoder (a ViT), encoded into ~1,600 image tokens, then fed alongside the text prompt to the language model. The model outputs roughly 30 tokens in ~4 seconds (~8 tok/s) per cover, so all 314 images completed in about 21 minutes. Nearly every title was identified with high confidence.
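The numbers above hang together, as a quick back-of-the-envelope check shows. The 14 px patch size is an assumption (it is not stated in the text, but it is what makes a 560 px tile come out at about 1,600 tokens), and the function names are illustrative.

```python
TILE_PX = 560   # tile edge length from the text
PATCH_PX = 14   # assumed ViT patch size; not stated in the text


def patches_per_tile(tile_px: int = TILE_PX, patch_px: int = PATCH_PX) -> int:
    """Number of square patches (image tokens) in one square tile."""
    side = tile_px // patch_px
    return side * side


def tokens_per_second(tokens: float, seconds: float) -> float:
    """Average generation speed."""
    return tokens / seconds


def batch_minutes(images: int, seconds_per_image: float) -> float:
    """Total wall-clock time for the batch, in minutes."""
    return images * seconds_per_image / 60


if __name__ == "__main__":
    print(patches_per_tile())        # patches in one 560x560 tile
    print(tokens_per_second(30, 4))  # generation speed per cover
    print(batch_minutes(314, 4))     # total minutes for 314 covers
```

With these inputs the tile yields 1,600 patches, generation runs at 7.5 tok/s (the "~8 tok/s" above), and the batch takes just under 21 minutes.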
Llama 3.2 11B Vision · 4-bit NF4 quant · BitsAndBytes · HuggingFace Transformers · PyTorch + CUDA 12.4 · NVIDIA RTX 3060 12 GB · Python