Pinterest cut AI costs 90% by gutting a frontier model's vision layer

Key Highlight

At 620 million monthly users, calling a frontier model for every image recommendation isn't a strategy; it's a bill.

Pinterest CTO Matt Madrigal solved it by gutting Qwen3-VL's vision layer and rebuilding it with proprietary embeddings, cutting costs 90% and boosting accuracy 30%.

Madrigal’s team has been heavily investing in customizing open-source models “foundationally in-house.”

“If you've got really unique data that you can then fine-tune an open source model with, data quality will, frankly, outweigh or overcome model size,” Madrigal explained in a recent VB Beyond the Pilot podcast.

How Pinterest customized Qwen for visual discovery

Pinterest, which has around 620 million monthly active users, has long applied open source models for visual search and discovery, going back to Google’s BERT and OpenAI’s CLIP.

The company fine-tuned its own Pin CLIP on the latter, incorporating proprietary visual embeddings and image metadata. Pinterest’s conversational shopping assistant, Navigator 1, was built on Qwen3-VL and customized in “pretty significant” ways.

Madrigal’s team essentially “ripped out” Qwen’s vision encoder layer and fine-tuned the model on proprietary multimodal embeddings.
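The article does not say how Pinterest wired its embeddings into the model, so the following is only a toy sketch of one common pattern for this kind of swap: a small adapter projects a precomputed image embedding into a handful of "soft tokens" in the language model's input space, which take the place of the tokens a vision encoder would otherwise produce from pixels. Every name, dimension, and the random weights here are hypothetical; in practice the adapter would be learned during fine-tuning.

```python
import random

def make_projection(d_in, d_out, seed=0):
    """Random d_out x d_in matrix standing in for a learned adapter (assumption)."""
    rng = random.Random(seed)
    scale = d_in ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]

def project(weights, vector):
    """Plain matrix-vector product: maps an embedding into another space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def embedding_to_soft_tokens(pin_embedding, adapters):
    """Turn one precomputed image embedding into model-sized soft tokens,
    using one hypothetical adapter matrix per soft token."""
    return [project(weights, pin_embedding) for weights in adapters]

def build_input_sequence(pin_embedding, text_token_embeddings, adapters):
    """Prepend the image soft tokens to the text token embeddings.
    No pixels and no vision encoder are involved at this step."""
    return embedding_to_soft_tokens(pin_embedding, adapters) + text_token_embeddings

# Toy sizes, chosen only for illustration.
D_PIN, D_MODEL, N_SOFT = 8, 16, 4

adapters = [make_projection(D_PIN, D_MODEL, seed=i) for i in range(N_SOFT)]
pin_embedding = [0.1 * i for i in range(D_PIN)]      # stand-in for a stored embedding
text_tokens = [[0.0] * D_MODEL for _ in range(3)]    # stand-in for embedded text tokens

sequence = build_input_sequence(pin_embedding, text_tokens, adapters)
print(len(sequence), len(sequence[0]))  # 7 16
```

The cost argument follows from the shape of this pipeline: the image embedding can be computed once and stored, so each request pays only for a small projection and the language model's forward pass, not for running a vision encoder over the image.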
