Swipe to see the full story...
At 620 million monthly users, calling a frontier model for every image recommendation isn't a strategy — it's a bill..
Pinterest CTO Matt Madrigal solved it by gutting Qwen3-VL's vision layer and rebuilding it with proprietary embeddings, cutting costs 90% and boosting accuracy 30%.Madrigal’s team has been heavily investing in customizing open-source models “foundationally in-house.” “If you've got really unique data that you can then fine-tune an open source model with, data quality will, frankly, outweigh or overcome model size,” Madrigal explained in a recent VB Beyond the Pilot podcast. How Pinterest customized Qwen for visual discoveryPinterest, which has around 620 million monthly active users, has long applied open source models for visual search and discovery, going back to Google’s BERT and OpenAI’s CLIP..
The company fine-tuned its own Pin CLIP on the latter, incorporating proprietary visual embeddings and image metadata. Pinterest’s conversational shopping assistant, Navigator 1, was built on Qwen3-VL and customized in “pretty significant” ways..
Madrigal’s team essentially “ripped out” Qwen’s vision encoder layer and fine-tuned the model on proprietary multimodal embeddings..
Detailed coverage and expert insights available on our main news hub.
Read Full Article