DeepSeek's new V3.2X model introduces a 'Sparse Attention' system that cuts the cost of running long AI tasks by up to 50%, while OpenAI's Sora app gains massive user adoption, prompting new monetization strategies such as revenue sharing for copyrighted content.
Takeaways
• DeepSeek's V3.2X model uses 'Sparse Attention' to cut AI task costs by up to 50%.
• OpenAI's Sora achieves massive user adoption, leading to planned revenue-sharing with rights holders.
• Mira Murati's Tinker and IBM's Granite 4.0 offer advanced, cost-efficient AI solutions.
DeepSeek has re-emerged with its V3.2X model, featuring a 'Sparse Attention' system that promises to halve the cost of complex AI tasks, demonstrating untapped potential in the transformer architecture for leaner operation. Meanwhile, OpenAI's Sora video creation app has achieved rapid user adoption, leading Sam Altman to plan a revenue-sharing model with rights holders to cover ballooning serving costs and incentivize content creation. Additionally, Mira Murati's new startup, Thinking Machines, launched Tinker, a developer-grade platform for fine-tuning LLMs, and IBM introduced Granite 4.0, a hybrid model family that dramatically cuts memory usage while maintaining strong performance.
DeepSeek's Cost Reduction
• 00:01:15 DeepSeek has launched an experimental model, V3.2X, which claims to reduce the cost of running long, complex AI tasks by as much as 50% through its 'Sparse Attention' system. The system pairs a 'lightning indexer', which identifies important text sections, with a 'fine-grained token selection system' that pinpoints key details within those sections, so irrelevant tokens are never processed. Early tests show a 50% cost reduction for API calls dealing with long context windows, and the model is openly available for community testing.
• 00:02:46 The primary financial burden in AI is often not model training but daily operation, especially with the increasingly large context windows that consume significant cash. DeepSeek's approach demonstrates that the existing transformer architecture still holds potential for more efficient operation, which could be a significant game-changer for scaling AI economically. This development highlights a shift towards optimizing runtime costs rather than just focusing on training large models.
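The two-stage idea described above can be sketched in a few lines of NumPy. This is a toy illustration, not DeepSeek's implementation: the real 'lightning indexer' is a small learned component, while here the cheap stage-one score is just a dot product used to shortlist the top-k tokens before running exact attention on that subset.

```python
import numpy as np

def sparse_attention(q, K, V, k_top=8):
    """Toy two-stage sparse attention for a single query vector q."""
    # Stage 1 ('lightning indexer' stand-in): cheap relevance scores
    # select the k_top most promising key positions.
    index_scores = K @ q
    selected = np.argsort(index_scores)[-k_top:]

    # Stage 2: exact softmax attention restricted to the shortlist,
    # so the cost scales with k_top rather than full sequence length.
    logits = K[selected] @ q / np.sqrt(q.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V[selected]

rng = np.random.default_rng(0)
seq_len, d = 64, 16
q = rng.normal(size=d)
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))
out = sparse_attention(q, K, V, k_top=8)  # attends to 8 of 64 tokens
```

Because stage two only touches the selected tokens, the expensive part of attention no longer grows with the full context window, which is where the claimed runtime savings for long contexts come from.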
OpenAI Sora Adoption
• 00:03:23 OpenAI's AI video creation app, Sora, rapidly climbed to number one on the U.S. App Store, accumulating 164,000 installs in its first two days despite an invite-only launch. This debut surpassed the initial adoption rates of other major AI apps like Anthropic's Claude and Microsoft's Copilot. Sam Altman announced plans for monetization, including a revenue-sharing system with rights holders whose copyrighted characters or likenesses are used, alongside tools for granular control over content generation, framed as 'interactive fan fiction' to engage audiences and generate income.
Thinking Machines' Tinker
• 00:05:15 Mira Murati, former OpenAI CTO, has launched Thinking Machines and its first product, Tinker, a Python-based platform designed to simplify fine-tuning large language models while offering researchers granular control. Tinker offloads the compute to Thinking Machines' infrastructure, allowing developers to retain algorithmic control without the complexities of managing GPUs or multi-node orchestration. This has led to impressive results, such as Princeton's Godel team achieving a 90.4% pass rate on miniF2F for theorem-proving models with self-correction, and Stanford's Rotskoff lab boosting chemical reasoning accuracy from 15% to 50% on IUPAC-to-formula conversion.
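The division of labor Tinker promises can be sketched as follows. Everything here is a hypothetical stand-in for illustration, not Tinker's actual API: the `TrainingClient` class and its method names are invented to show the pattern where the developer writes the training loop locally while each call would execute on managed GPU infrastructure.

```python
class TrainingClient:
    """Stand-in for a remote fine-tuning service. In the pattern the
    article describes, each method would be a call to managed GPU
    infrastructure; here it just fakes a decreasing loss."""

    def __init__(self, base_model: str):
        self.base_model = base_model
        self.steps = 0

    def forward_backward(self, batch):
        # Remotely: run the model on `batch` and accumulate gradients.
        return {"loss": 1.0 / (self.steps + 1)}  # fake loss for the sketch

    def optim_step(self):
        # Remotely: apply the accumulated gradients.
        self.steps += 1

client = TrainingClient("example-base-model")  # hypothetical model name
dataset = [["batch 1"], ["batch 2"], ["batch 3"]]

losses = []
for batch in dataset:                        # the developer owns this loop,
    out = client.forward_backward(batch)     # but compute runs on the service
    client.optim_step()
    losses.append(out["loss"])
```

The point of the pattern is that the loop body stays ordinary Python the researcher can modify freely, while GPU provisioning and multi-node orchestration live behind the client object.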
IBM Granite 4.0
• 00:07:36 IBM has introduced Granite 4.0, a new family of open AI models built on a hybrid design mixing Mamba-2 and Transformer blocks in a roughly 9:1 ratio, which reduces memory usage by over 70%. This innovation allows companies to run heavy tasks with fewer expensive GPUs, translating directly into significant cost savings. The models, available in various parameter sizes, are built to handle large workloads up to 128,000 tokens and demonstrate strong performance on reasoning and function-calling benchmarks, often outperforming other open models while being more cost-effective to operate. IBM also emphasizes trust, with Granite 4.0 being the first open model family with ISO certification for AI management and cryptographically signed releases, accessible across multiple platforms.
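The 9:1 hybrid layout above can be illustrated with a simple layer schedule. The group count and exact interleaving order here are assumptions for illustration, not Granite 4.0's published layer map; the sketch only shows why such a mix saves memory.

```python
def hybrid_schedule(n_groups: int, mamba_per_group: int = 9):
    """Build a layer-type list: `mamba_per_group` Mamba-2 blocks
    followed by one attention block, repeated `n_groups` times."""
    layers = []
    for _ in range(n_groups):
        layers += ["mamba2"] * mamba_per_group + ["attention"]
    return layers

layers = hybrid_schedule(n_groups=4)
# Mamba-2 blocks keep a fixed-size recurrent state regardless of context
# length, while attention blocks need a KV cache that grows with the
# context. With only 1 in 10 layers paying that growing cost, total
# memory at 128,000-token contexts drops sharply.
```

This is the core of the cost argument: memory that scales with context length is confined to a small minority of layers.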