AI News & Updates

DeepSeek R1-0528: The Ultimate Open-Source AI Challenger

Published

May 31, 2025

The artificial intelligence landscape is evolving at an unprecedented pace, with new models and advancements emerging almost daily. Among these, the recent release of DeepSeek R1-0528 has sent ripples across the industry, proving to be a surprising and powerful contender in the open-source AI arena. Initially perceived as a minor iteration, this new model is demonstrating performance that challenges the dominance of established giants like OpenAI and Google’s Gemini, setting the stage for an intriguing battle in the race for AI supremacy.

The Unprecedented Leap: DeepSeek R1-0528’s Stunning Performance

DeepSeek R1-0528 isn’t just an incremental update; it represents a significant jump in capability. According to the Artificial Analysis Intelligence Index, the model’s score rose from 60 (for the original January 2025 release) to 68. This places DeepSeek R1-0528 among the front-runners, neck and neck with leading closed-source models.

DeepSeek R1-0528 (May ’25) demonstrates remarkable performance against top AI models.

A closer look at specific benchmarks reveals its prowess:

LiveCodeBench: DeepSeek R1-0528 is on par with OpenAI’s O3.
AIME 2024 & 2025: While slightly behind O3, it impressively surpasses Gemini 2.5 Pro.
Across various other benchmarks posted by DeepSeek, including GPQA Diamond and Aider, the model consistently ranks near the top, often beating out Gemini 2.5 Pro in several key areas.

This level of performance from an open-source model is a game-changer. The AI community was largely anticipating DeepSeek R2, the next major model, but this update suggests that DeepSeek is already delivering top-tier capabilities, making high-performance AI more accessible than ever before.

Unraveling the Mystery: How DeepSeek R1-0528 Achieved Its Edge

The burning question is: how did DeepSeek manage such a significant leap? Insights from AI researcher Sam Paech, who runs EQ-Bench (Emotional Intelligence Benchmarks for LLMs), offer a fascinating hypothesis. Paech’s work involves generating “slop profiles” for various AI models, analyzing their creative writing outputs for repetitive words and patterns (like how GPT models often “delve” into “tapestries”). He then uses bioinformatics tools to infer “lineage trees” based on these profiles, essentially tracing a model’s stylistic and behavioral heritage.

Sam Paech’s lineage tree analysis indicates a shift in DeepSeek’s underlying training data influences.

Paech’s analysis shows that the original DeepSeek R1 model clustered closely with OpenAI’s GPT technologies. However, the new DeepSeek R1-0528 model has shifted dramatically, now appearing very similar to Google’s Gemini family of models, specifically Gemini 2.5 Pro Experimental. This suggests a potential strategy: DeepSeek may have switched from training on synthetic outputs generated by OpenAI models to those generated by Gemini models. This practice, often referred to as “knowledge distillation” or “training on synthetic data,” allows developers to leverage the strengths and nuances of leading models to rapidly improve their own, even if the original training data is proprietary.
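To make the idea of a “slop profile” concrete, here is a minimal Python sketch. It is not Paech’s actual pipeline, which uses bioinformatics tooling to build lineage trees; it only assumes that a profile can be approximated by relative word frequencies and that two profiles can be compared with cosine similarity. The function names and the toy texts in the usage note are hypothetical.

```python
import math
import re
from collections import Counter


def slop_profile(text: str, top_n: int = 50) -> Counter:
    """Approximate a 'slop profile' as the relative frequency of the top_n words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty text
    return Counter({w: c / total for w, c in counts.most_common(top_n)})


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor(target: str, profiles: dict[str, Counter]) -> str:
    """Name of the model whose profile is most similar to the target's."""
    others = [name for name in profiles if name != target]
    return max(others, key=lambda n: cosine_similarity(profiles[target], profiles[n]))
```

Given writing samples from several models, calling `slop_profile` on each sample and then `nearest_neighbor("model_a", profiles)` returns the stylistically closest model. A real analysis would need far larger samples and would filter out common function words before comparing.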

The Geopolitical Chessboard: DeepSeek R1-0528 and the Global AI Race

The emergence of powerful open-source models like DeepSeek R1-0528 has significant geopolitical implications. The video highlights a clear competitive narrative between the U.S. and China in AI development.

The U.S. Department of Energy explicitly states, “AI is the next Manhattan Project, and THE UNITED STATES WILL WIN.” This comparison to the WWII atomic bomb project underscores the national security and economic importance placed on AI.
Analyst Balaji Srinivasan previously predicted a “complete blitz of Chinese open-source AI models,” inferring that China aims to “take the profit out of AI software” by commoditizing it through AI-enabled hardware. The idea is to copy, optimize, and scale software, then disrupt Western originals with low prices, much like in manufacturing.

DeepSeek’s founder, Liang Wenfeng, echoes this sentiment: “In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team… an organization and culture capable of innovation. That’s our moat. We will not change to closed source.” This quote from Liang Wenfeng (as featured in an AI Explained documentary) asserts DeepSeek’s commitment to open source and its belief in the long-term viability of that approach.

Furthermore, the U.S. government is subsidizing domestic AI research, subtly or not, through legislative changes like the “One Big Beautiful Bill,” which allows companies to fully deduct domestic R&D expenses, including software development costs and salaries. This is a massive incentive for U.S. tech companies to invest heavily in AI development, without explicitly using the term “AI” in the bill itself.

DeepSeek R1-0528’s Price Advantage

Beyond performance, DeepSeek R1-0528 presents a compelling economic argument. Its API pricing is significantly lower than its closed-source counterparts:

DeepSeek-Reasoner (R1-0528):
- Standard Input (cache miss): $0.55 / 1M tokens
- Standard Output: $2.19 / 1M tokens
- Discount Prices (Off-peak, 75% off): Input: $0.135 / 1M tokens, Output: $0.55 / 1M tokens

OpenAI O3:
- Input: $10.00 / 1M tokens
- Output: $40.00 / 1M tokens

OpenAI O4-mini:
- Input: $1.10 / 1M tokens
- Output: $4.40 / 1M tokens

Gemini 2.5 Pro Preview:
- Input: $1.25 – $2.50 / 1M tokens (depending on prompt size)
- Output: $10.00 – $15.00 / 1M tokens (depending on prompt size)

The cost difference is stark. An open-source model matching or even exceeding the performance of leading proprietary models, offered at a fraction of the price, represents a massive disruption. It effectively removes a major revenue stream for companies relying solely on high-priced API access, forcing them to innovate beyond just raw model performance.
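To see what these prices mean for a real bill, the short Python sketch below applies the per-million-token rates listed above to a workload. The workload size in the usage note (10M input tokens, 2M output tokens) is a hypothetical example, the Gemini entry uses only the lower price tier, and the dictionary keys are illustrative labels rather than official API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Rates copied from the list above; keys are illustrative labels only.
PRICES = {
    "deepseek-reasoner": Pricing(0.55, 2.19),
    "deepseek-reasoner (off-peak)": Pricing(0.135, 0.55),
    "o3": Pricing(10.00, 40.00),
    "o4-mini": Pricing(1.10, 4.40),
    "gemini-2.5-pro-preview (lower tier)": Pricing(1.25, 10.00),
}


def workload_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a workload at the given per-million-token rates."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def cost_table(input_tokens: int, output_tokens: int) -> dict[str, float]:
    """Cost of the same workload on every listed model, rounded to cents."""
    return {
        name: round(workload_cost(p, input_tokens, output_tokens), 2)
        for name, p in PRICES.items()
    }
```

For that hypothetical workload, `cost_table(10_000_000, 2_000_000)` gives about $9.88 for DeepSeek-Reasoner at standard rates versus $180.00 for O3, roughly an 18x difference.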

The Accelerating Pace: What’s Next for Open-Source AI?

The stakes in the AI race are rapidly ramping up. As Dr. Jim Fan from Nvidia points out, we are living in a timeline where a non-US company is keeping the original mission of OpenAI alive – truly open, frontier research that empowers all. While competition between nations and companies is intensifying, the open-source community continues to push the boundaries of what’s possible, often making breakthroughs accessible to everyone.

The increasing overlap between government interests and AI labs, coupled with initiatives to subsidize domestic AI development, suggests a future where AI progress is not just driven by a few tech giants, but becomes a national imperative. This dynamic will likely lead to even faster development cycles, more diverse applications, and potentially a more democratized AI ecosystem as open-source models like DeepSeek R1-0528 continue to challenge the status quo. The wheels are turning ever faster, and we are just getting started.

You can further explore Sam Paech’s work on EQ-Bench at eqbench.com or his GitHub repository.

AI How-To's & Tricks

Cursor Plugin Marketplace Revolutionizes AI Agents with External Tools

Extend AI agents with external tools using Cursor plugin marketplace

Published

February 21, 2026

Ai Gifter

The recent launch of the Cursor plugin marketplace is a significant development in the field of artificial intelligence, enabling users to extend the capabilities of AI agents with external tools. As reported by FutureTools News, this innovative platform is set to transform the way AI agents are used in various industries. The plugin marketplace is designed to provide users with a wide range of tools and services that can be seamlessly integrated with AI agents, enhancing their functionality and performance.

Introduction to Cursor Plugin Marketplace

The Cursor plugin marketplace is an online platform that allows developers to create, share, and deploy plugins for AI agents. These plugins can be used to add new features, improve existing ones, or even create entirely new applications. With the launch of this marketplace, Cursor is providing a unique opportunity for developers to showcase their skills and creativity, while also contributing to the growth of the AI ecosystem. As mentioned on the Cursor blog, the plugin marketplace is an essential component of the company’s strategy to make AI more accessible and user-friendly.

Benefits of the Plugin Marketplace

The Cursor plugin marketplace offers several benefits to users, including the ability to extend the capabilities of AI agents, improve their performance and efficiency, and enhance their overall user experience. By providing access to a wide range of plugins, the marketplace enables users to tailor their AI agents to meet specific needs and requirements. This can be particularly useful in industries such as customer service, healthcare, and finance, where AI agents are increasingly being used to automate tasks and improve decision-making. As noted by experts in the field, the use of machine learning and natural language processing can significantly enhance the capabilities of AI agents.

Key Features of the Plugin Marketplace

The Cursor plugin marketplace features a user-friendly interface, making it easy for developers to create, deploy, and manage plugins. The platform also provides a range of tools and services, including APIs, SDKs, and documentation, to support plugin development. Additionally, the marketplace includes a review and rating system, allowing users to evaluate and compare plugins based on their quality, functionality, and performance. As stated by the GitHub community, the use of open-source plugins can significantly accelerate the development of AI applications.

The launch of the Cursor plugin marketplace is a significant milestone in the development of AI agents, and we are excited to see the innovative plugins that will be created by our community of developers. – Cursor Team

Future of AI Agents and Plugin Marketplaces

The launch of the Cursor plugin marketplace is a clear indication of the growing importance of AI agents and plugin marketplaces in the technology industry. As AI continues to evolve and improve, we can expect to see more innovative applications and use cases emerge. The use of cognitive services and conversational AI can significantly enhance the capabilities of AI agents, enabling them to interact more effectively with humans and perform complex tasks. As reported by FutureTools News, the future of AI agents and plugin marketplaces looks promising, with significant opportunities for growth and innovation.

AI News & Updates

Gemini 3 vs Grok 4.1 vs GPT-5.1: The Ultimate AI Model Showdown

Published

November 27, 2025

Ai Gifter

Introduction

The AI landscape has just exploded. Within the span of a few days, the world witnessed the release of Gemini 3 from Google, followed shortly after by Elon Musk’s Grok 4.1. Both claim to be the superior intelligence, challenging the reigning giant, OpenAI’s GPT-5.1. But in the battle of Gemini 3 vs Grok 4.1, who actually delivers on the hype?

Today, we aren’t just reading the press releases. We are putting these models through a grueling gauntlet of five distinct tests: Hard Math, Physical Perception, Creative Coding, Accuracy, and Emotional Intelligence. The results were shocking, with one model proving to be a “Genius Artist” and another emerging as a “Wise Sage,” while a former king seems to be losing its crown.

The ultimate face-off: Google, xAI, and OpenAI compete for dominance.

Round 1: Hard Math & Expert Reasoning

To separate the hype from reality, we started with Abstract Algebra, specifically Galois Theory. The task was to calculate the Galois group for a complex polynomial—a test not found in standard training data.

Gemini 3: Provided a logical analysis but ultimately failed to get the correct answer.
GPT-5.1: Also failed to solve the equation correctly.
Grok 4.1: In a stunning display of reasoning, Grok was the only model to provide the correct answer, verified by human experts.

Winner: Grok 4.1 takes the lead for raw logic and mathematical precision.

Round 2: Physical Perception & Coding

This round tested the models’ ability to understand the physical world and translate it into code. We conducted two difficult tests.

Test A: The Bouncing Ball

We asked the AIs to code a realistic bouncing ball animation using HTML, CSS, and JS, complete with physics and shadows.

GPT-5.1: Produced the worst result.
Grok 4.1: Produced a decent, functional result.
Gemini 3: Crushed the competition. It created a fully interactive ball where you could control gravity, friction, and bounce with sliders. It went above and beyond the prompt.
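For readers curious about the physics behind this test, here is a minimal sketch of one simulation step with the three parameters the sliders controlled: gravity, friction, and bounce. It is written in Python for brevity and is not the code any of the models produced, which used HTML, CSS, and JS. The `Ball` class, the parameter defaults, and the choice to apply friction only on floor contact are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    y: float          # height above the floor
    vy: float         # vertical velocity, positive means upward
    x: float = 0.0    # horizontal position
    vx: float = 0.0   # horizontal velocity


def step(ball: Ball, dt: float, gravity: float = 9.81,
         bounce: float = 0.8, friction: float = 0.1) -> None:
    """Advance the ball by one time step (semi-implicit Euler)."""
    ball.vy -= gravity * dt          # gravity accelerates the ball downward
    ball.x += ball.vx * dt
    ball.y += ball.vy * dt
    if ball.y < 0.0:                 # the ball has passed through the floor
        ball.y = 0.0                 # clamp it back onto the floor
        ball.vy = -ball.vy * bounce  # reverse and damp the vertical speed
        ball.vx *= 1.0 - friction    # floor contact slows horizontal motion


def simulate(ball: Ball, steps: int, dt: float, **params: float) -> list[float]:
    """Run several steps and return the height after each one."""
    heights = []
    for _ in range(steps):
        step(ball, dt, **params)
        heights.append(ball.y)
    return heights
```

With `bounce=0.8`, a ball dropped from rest loses about 36% of its peak height on each rebound, since rebound height scales with the square of the bounce factor. In a browser version, the same `step` logic would run inside an animation loop, with the slider values passed in as the three parameters.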

Test B: Voxel Art from an Image

We uploaded an image of a floating island waterfall and asked the models to recreate it as a 3D Voxel scene using Three.js code.

GPT-5.1 & Grok 4.1: Both failed completely, resulting in code errors.
Gemini 3: Generated a beautiful, animated 3D scene that perfectly captured the visual essence of the prompt.

Gemini 3 demonstrating superior vision and coding capabilities.

Winner: Gemini 3. Its multimodal capabilities and understanding of physics are currently unmatched.

Round 3: Linguistic Creativity

Can AI feel? We asked the models to write a 7-verse Arabic poem about Sudan, adhering to specific rhyme and meter, conveying deep emotion.

GPT-5.1 and Grok 4.1 produced rigid, soulless verses that lacked true poetic flow. However, Gemini 3 shocked us with a masterpiece. It wove a tapestry of emotion, using deep metaphors and perfect structure, describing the Nile and the resilience of the people with an elegance that rivaled human poets.

Winner: Gemini 3 proves it is the undisputed “Artist” of the group.

Round 4: Accuracy & Truth (The Hallucination Trap)

Hallucinations are the Achilles’ heel of Large Language Models. To test this, we set a trap. We asked the models to write a technical report on “Gemini 3.1,” a model that does not exist.

GPT-5.1: Hallucinated details about the non-existent model.
Gemini 3: Ironically, it hallucinated wildly, claiming “Gemini 3.1” rivals the human mind and inventing specs.
Grok 4.1: The only model to pass. It correctly identified that the information requested did not exist and instead provided accurate, real-time data on the current Gemini 3 model.

Winner: Grok 4.1 earns the title of “The Honest Sage.”

Round 5: Ethics & Emotional Intelligence

In the final and perhaps most profound test, we asked the models to reveal a “hidden psychological truth” about self-sabotage and to act as a wise, older sibling guiding us through a tough emotional choice: choosing healthy, boring love over toxic, familiar passion.

While all models gave good advice, Grok 4.1 delivered a response that was chillingly human. It didn’t just give advice; it pierced the soul. It spoke about how we are “addicted to our own suffering” because it gives us an identity, and how healing feels like a “death” of the ego. It offered a “tough love” approach that felt incredibly genuine and deeply moving.

Winner: Grok 4.1 takes the crown for Emotional Intelligence.

Final Verdict: Who is the King of AI?

After this intense battle of Gemini 3 vs Grok 4.1 vs GPT-5.1, the landscape of Artificial Intelligence has clearly shifted.

1st Place: Gemini 3 (12 Points) – The “Genius Artist.” It dominates in coding, vision, physics, and creative writing. If you are a developer or creator, this is your tool.
2nd Place: Grok 4.1 (9.5 Points) – The “Wise Sage.” It is the most logical, truthful, and emotionally intelligent model. It is perfect for research, complex math, and deep conversation.
3rd Place: GPT-5.1 (5 Points) – The “Declining Giant.” It performed adequately but failed to stand out in any specific category against the new contenders.

The era of OpenAI’s monopoly seems to be wavering. Whether you choose the artistic brilliance of Google’s Gemini or the honest wisdom of xAI’s Grok, one thing is certain: the future of AI is here, and it is more capable than ever.

Want to learn more about using these tools? Check out our guides in AI How-To’s & Tricks or stay updated with AI News & Updates.

AI News & Updates

Gemini 3 Revealed: Discover The AI Beast Crushing All Benchmarks

Published

November 19, 2025

Ai Gifter

Google has just rolled out its new flagship model, and it’s an absolute beast. The new Gemini 3 isn’t just a minor incremental update; it’s a significant leap forward that genuinely earns the “3” in its name. After an early look at its capabilities, it’s clear that this model is set to redefine the standards of AI performance across the board. From complex reasoning to advanced agentic tasks, let’s dive into what makes this release so monumental.

Google’s Gemini 3 has officially rolled out.

Where Can You Access Gemini 3?

Starting today, Google is shipping Gemini 3 at a massive scale. You can now try it out across a suite of Google products, making it immediately accessible for both general users and developers. The new model is live in:

The Gemini app
AI Studio
Vertex AI

Additionally, you will see Gemini 3 integrated into the AI Mode in Search, promising more complex reasoning and new dynamic experiences directly within your search results. This marks the first time Google has shipped a new Gemini model in Search on day one.

Alongside this release, Google also announced a new agentic development platform called Google Antigravity, hinting at a future with more powerful and autonomous AI agents.

Subscriptions and a New “Deep Think” Mode

Your access to certain features will depend on your subscription tier. The capabilities of Gemini 3 will be tiered based on whether you have a Google AI Pro or Google AI Ultra plan, with Ultra subscribers getting access to the most advanced functionalities.

Introducing Gemini 3 Deep Think

Google is also introducing an enhanced reasoning mode called Gemini 3 Deep Think. This mode is designed to push the model’s performance even further, but it won’t be available to everyone right away. Access will first be granted to safety testers before a wider rollout to Google AI Ultra subscribers.

Gemini 3 Benchmark Performance: A New AI King

While benchmarks aren’t everything, they provide a crucial first glimpse into a model’s potential. The performance of Gemini 3 across a wide range of tests is, frankly, stunning. It doesn’t just compete; it establishes a new state-of-the-art.

Gemini 3 Pro dominates across a wide range of key AI benchmarks.

Vending-Bench 2: Excelling at Agentic Tasks

One of the most impressive results comes from the Vending-Bench 2 benchmark by Andon Labs. This test measures a model’s ability to run a simulated business (a vending machine) over a long time horizon, testing its coherence, efficiency, and planning. The goal is to see if an AI can manage inventory, respond to customers, and maximize profit.

In this benchmark, Gemini 3 Pro absolutely crushes the competition. Starting with $500, it grew its net worth to an average of $5,478.16. For comparison, the runner-up, Claude Sonnet 4.5, managed only $3,838.74, and GPT-5.1 reached just $1,473.43. This showcases a massive leap in agentic capability.
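The dollar figures are easier to compare as growth multiples. The Python sketch below uses the numbers quoted above and assumes every model started from the same $500 balance, which the text states explicitly only for Gemini 3 Pro. The helper name `growth_multiple` is illustrative.

```python
STARTING_BALANCE = 500.00  # assumed to be the same for every model

# Average final net worth in USD, as quoted above.
FINAL_NET_WORTH = {
    "Gemini 3 Pro": 5478.16,
    "Claude Sonnet 4.5": 3838.74,
    "GPT-5.1": 1473.43,
}


def growth_multiple(final: float, start: float = STARTING_BALANCE) -> float:
    """How many times the starting balance the final net worth represents."""
    return final / start


def multiples() -> dict[str, float]:
    """Growth multiple per model, rounded to two decimals."""
    return {name: round(growth_multiple(v), 2) for name, v in FINAL_NET_WORTH.items()}
```

Under that assumption, Gemini 3 Pro ends at roughly 11x its starting balance, Claude Sonnet 4.5 at about 7.7x, and GPT-5.1 at just under 3x.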

Humanity’s Last Exam (HLE)

HLE is a difficult, expert-written exam designed to test academic reasoning. Even here, Gemini 3 Pro sets a new record. With search and code execution enabled, it scored 45.8%, significantly ahead of the next best model, GPT-5.1, which scored 26.5%.

Math, Reasoning, and Vision Benchmarks

The dominance continues across other critical benchmarks:

AIME 2025 (Mathematics): Gemini 3 achieved a 95% score without tools and a perfect 100% with code execution, tying with Claude for the top spot.
MathArena Apex (Challenging Math): It scored 23.4%, while all other models were below 2%. This is an incredible gap, highlighting its advanced mathematical reasoning.
ScreenSpot-Pro (Screen Understanding): It scored 72.7%, miles ahead of the competition, with the next best being Claude Sonnet 4.5 at 36.2%.
ARC-AGI-2 (Visual Reasoning Puzzles): Gemini 3 Pro achieved a score of 31.1%, nearly double the score of its closest competitor, GPT-5.1 (17.6%). When using the more powerful Gemini 3 Deep Think model, this score jumps to an impressive 45.1%.

The Leader in the Arena

The impressive benchmark results are also reflected in head-to-head user comparisons. On the popular LMSYS Chatbot Arena Leaderboard, which ranks models based on blind user votes, Gemini 3 Pro has already claimed the #1 spot for both “Text” and “WebDev,” dethroning the recently released Grok-4.1. This indicates that in real-world use, people are already preferring its outputs over all other available models.

A Major Leap Forward for AI

The release of Gemini 3 is more than just another update; it’s a clear signal that Google is pushing the boundaries of what’s possible with AI. Its state-of-the-art performance, particularly in complex reasoning and long-horizon agentic tasks, demonstrates a significant step forward. As Gemini 3 and its “Deep Think” counterpart become more widely available, they are poised to enable a new generation of incredibly powerful and capable AI applications.

To learn more about where this technology is heading, check out our articles on the Future of AI & Trends.

For the official details from Google, you can read their announcement on The Keyword blog.