My photography gallery has 370 images spanning a decade of shooting — landscapes, urban decay, wildlife, portraits, and everything in between. For years they sat in a flat grid with no way to filter or browse by subject. I wanted category tags, but manually tagging 370 photos sounded like a weekend I'd never get back.
Enter OpenCLIP — an open-source implementation of OpenAI's CLIP model that maps images and text into the same embedding space. The idea: describe each category in plain English, encode both the descriptions and the photos, then match them by cosine similarity.
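The matching step boils down to a dot product between normalized vectors. A minimal sketch of the idea, using NumPy toy vectors in place of real CLIP embeddings (the numbers here are made up purely to illustrate the geometry):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; for unit-length vectors this reduces to a dot product."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for one image embedding and two text-prompt embeddings.
image_vec = np.array([0.9, 0.1, 0.0])
text_wildlife = np.array([0.8, 0.2, 0.1])
text_urban = np.array([0.0, 0.3, 0.9])

# The image matches whichever description lies closest in the shared space.
print(cosine_similarity(image_vec, text_wildlife) > cosine_similarity(image_vec, text_urban))
```

Because CLIP places images and text in the same space, "which category fits this photo" becomes a nearest-neighbour question, no classifier training required.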
I defined 10 categories, each with 3-4 descriptive text prompts.
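The category names and prompt wordings below are illustrative, not my exact prompts, but the shape is this: a dict mapping each category to a handful of phrasings.

```python
# Illustrative categories and prompts; the real script defines 10 categories.
CATEGORY_PROMPTS = {
    "wildlife": [
        "a photo of a wild animal",
        "an animal in its natural habitat",
        "wildlife photography",
    ],
    "urban": [
        "a photo of a city street",
        "urban architecture and buildings",
        "street photography in a city",
    ],
    "bw": [
        "a black and white photograph",
        "a monochrome photo",
        "black and white photography",
    ],
}
```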
Using multiple prompts per category improves accuracy: the text embeddings for each category are averaged, giving a more robust representation than a single phrase.
The ViT-B-32 model processed all 370 images in 44 seconds on CPU. No GPU needed, no API calls, zero cost.
Each image gets multi-label tags (a photo can be both "landscape" and "winter"), with per-category similarity thresholds tuned to get roughly 1.3 tags per image on average. The output is a simple JSON mapping filename to tag array:
{
  "1": ["nature", "silhouette"],
  "15": ["abandoned", "bw"],
  "50": ["architecture", "bw", "urban"],
  "300": ["wildlife"]
}
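The per-category thresholding is straightforward once the similarity scores exist. A sketch of that step (the scores and threshold values here are made up; the real thresholds were tuned to land near 1.3 tags per image):

```python
import json

# Hypothetical per-category thresholds; the real values are tuned per category.
THRESHOLDS = {"wildlife": 0.24, "urban": 0.22, "bw": 0.20}

def tags_for_image(scores: dict[str, float]) -> list[str]:
    """Keep every category whose similarity clears that category's threshold."""
    return sorted(cat for cat, s in scores.items() if s >= THRESHOLDS[cat])

# Made-up cosine similarities for one photo.
scores = {"wildlife": 0.31, "urban": 0.10, "bw": 0.05}
tags = {"300": tags_for_image(scores)}
print(json.dumps(tags))  # {"300": ["wildlife"]}
```

Because each category gets its own cutoff, a "grabby" category with generally high similarities can be reined in without starving a conservative one.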
Spot-checking against my own judgement, I'd estimate 85-90% accuracy. The model handles clear cases well: a sheep in a field is "wildlife", a monochrome street scene is "bw" + "urban". The main failure mode is edge cases, like a painted traffic cone on a street getting tagged "architecture".
The beauty of the approach is that re-running is trivial. Tweak a threshold, adjust a prompt, run again in under a minute.
The tags JSON feeds into the gallery page via JavaScript. Filter buttons with emoji and counts sit above the photo grid — click "🐾 Wildlife (78)" and the gallery filters instantly. The existing Isotope layout handles the animation.
The tagging script lives in the repo at scripts/tag_photos.py — if I add new photos, I just re-run it.
The full code is on GitHub, and you can see the result on the Photography page.