The AI Landscape in January 2025

It has been a busy start to the new year. Here are the major developments from this month that I think are worth mentioning.

  1. rStar-Math
    • This is Microsoft’s new method for using Monte Carlo Tree Search (MCTS) to fine-tune small language models (SLMs) to achieve state-of-the-art (SOTA) performance on math problems; see the MCTS sketch after this list.
  2. Deepseek-R1
    • This new LLM has over 600B parameters and was trained almost exclusively with reinforcement learning. Its performance is directly comparable to o1 at an order of magnitude lower cost per million tokens; a sketch of the rule-based rewards this kind of RL can optimize appears after the list.
  3. Sky-T1
    • The Berkeley Sky Computing Lab released this incredible research showing that you can train your own o1-preview-level reasoning model with only $405 in computing costs.
  4. Coconut
    • Research from Meta that introduces a way for Chain-of-Thought (CoT) reasoning to occur in the latent space before being translated back into natural-language tokens; a latent-reasoning sketch follows the list. There are pros and cons to this approach; the biggest con is that it is now even less explainable/interpretable than before.
  5. Reddit Thread on Deepseek-R1-Zero
    • “The AI world is losing its mind over DeepSeek-R1-Zero, a model that skipped supervised fine-tuning (SFT) entirely and learned purely through reinforcement learning (RL). Unlike its sibling R1—which uses some SFT data to stay “human-readable”—R1-Zero’s training mirrors AlphaZero’s trial-and-error self-play. The result? Jaw-dropping performance (AIME math scores jumped from 15.6% → 86.7%) paired with bizarre, uninterpretable reasoning. Researchers observed “aha moments” where it autonomously rechecked flawed logic mid-process and allocated more compute to harder problems—without human guidance. But here’s the kicker: its outputs are riddled with garbled language mixes (e.g., Chinese/English spaghetti code) and logic leaps that even its creators can’t fully explain.”
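
To make the rStar-Math item more concrete, here is a minimal sketch of the generic Monte Carlo Tree Search loop (selection, expansion, simulation, backpropagation) on a toy sequence-search problem. This is not rStar-Math itself: the real system searches over step-by-step math solutions and scores them with a trained reward model, while the toy target, actions, and random rollout policy below are purely illustrative assumptions.

```python
import math
import random

# Toy problem: pick a sequence of +1/-1 steps whose sum hits TARGET.
# Only the generic MCTS loop is the point here, not the task.
TARGET, DEPTH = 4, 8
ACTIONS = (+1, -1)

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # tuple of actions taken so far
        self.parent = parent
        self.children = {}          # action -> Node
        self.visits = 0
        self.value = 0.0

    def ucb(self, c=1.4):
        # Upper Confidence Bound used to balance exploitation vs. exploration.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def rollout(state):
    # Random playout to a terminal state, then score it.
    while len(state) < DEPTH:
        state = state + (random.choice(ACTIONS),)
    return 1.0 if sum(state) == TARGET else 0.0

def mcts(iterations=2000):
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB while all children exist.
        while len(node.children) == len(ACTIONS) and len(node.state) < DEPTH:
            node = max(node.children.values(), key=Node.ucb)
        # 2. Expansion: add one untried child if not terminal.
        if len(node.state) < DEPTH:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            node.children[action] = Node(node.state + (action,), parent=node)
            node = node.children[action]
        # 3. Simulation: random rollout from the new node.
        reward = rollout(node.state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Follow the most-visited path from the root.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda n: n.visits)
    return node.state

if __name__ == "__main__":
    print(mcts())   # e.g. a sequence of +1/-1 steps summing to TARGET
```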
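
For the DeepSeek-R1 item, the key ingredient is reinforcement learning against rewards that can be checked automatically. The paper describes rule-based accuracy and format rewards for R1-Zero rather than a learned reward model; the sketch below shows that general idea, with tag names and weights that are my own illustrative assumptions, not DeepSeek's exact configuration.

```python
import re

# Rule-based reward for RL on verifiable tasks: string checks only,
# no learned reward model. Tags and weights are illustrative assumptions.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def reward(completion: str, gold_answer: str) -> float:
    score = 0.0
    # Format reward: the model is asked to wrap reasoning and answer in tags.
    if THINK_RE.search(completion) and ANSWER_RE.search(completion):
        score += 0.1
    # Accuracy reward: exact match on the extracted final answer.
    match = ANSWER_RE.search(completion)
    if match and match.group(1).strip() == gold_answer.strip():
        score += 1.0
    return score

if __name__ == "__main__":
    sample = "<think>3 * 4 = 12, plus 5 is 17</think><answer>17</answer>"
    print(reward(sample, "17"))   # 1.1 (format + correct answer)
    print(reward("17", "17"))     # 0.0 (no tags, answer not extractable)
```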
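
Finally, for Coconut, the core mechanism is feeding the model's last hidden state back in as the next input for a few "latent thought" steps instead of decoding tokens, and only decoding text afterwards. The sketch below illustrates that loop with a small GRU standing in for the transformer; the dimensions, the fixed number of latent steps, the greedy decoding, and the untrained weights are all assumptions for illustration, not Meta's architecture or training setup.

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    """Structural demo of latent-space reasoning with a GRU stand-in."""

    def __init__(self, vocab_size=100, dim=64, latent_steps=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRUCell(dim, dim)
        self.to_logits = nn.Linear(dim, vocab_size)
        self.latent_steps = latent_steps

    def forward(self, prompt_ids, answer_len=5):
        h = torch.zeros(prompt_ids.size(0), self.rnn.hidden_size)
        # 1. Encode the prompt token by token.
        for t in range(prompt_ids.size(1)):
            h = self.rnn(self.embed(prompt_ids[:, t]), h)
        # 2. Latent chain of thought: feed the hidden state back as the
        #    next input instead of sampling a token (no text is produced).
        thought = h
        for _ in range(self.latent_steps):
            h = self.rnn(thought, h)
            thought = h
        # 3. Decode the final answer as ordinary tokens (greedy argmax).
        outputs = []
        inp = thought
        for _ in range(answer_len):
            h = self.rnn(inp, h)
            next_id = self.to_logits(h).argmax(dim=-1)
            outputs.append(next_id)
            inp = self.embed(next_id)
        return torch.stack(outputs, dim=1)

if __name__ == "__main__":
    model = LatentReasoner()
    prompt = torch.randint(0, 100, (2, 7))   # batch of 2 random "prompts"
    print(model(prompt).shape)               # torch.Size([2, 5])
```

Because the intermediate "thoughts" never pass through the vocabulary, there is nothing human-readable to inspect at those steps, which is exactly the interpretability trade-off noted in the list item above.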