Wobble
Feb 2025 – Mar 2025

A reinforcement learning bot that solves Wordle, with a web frontend for users to test and compete against the bot.

Python · Astro · FastAPI · React · Vercel

Overview

Wobble – RL Wordle™ Solver is a side project where we trained a reinforcement learning agent to play Wordle-style puzzles. The idea was simple: instead of hardcoding how to play, let the bot learn how to guess by playing thousands of games and seeing what works.

It consists of a Python backend (training + API) and a lightweight Astro frontend where you can either play manually or watch the bot make decisions in real time.


Problem

Wordle looks trivial at first, but playing it well involves making good decisions with incomplete information. Most solutions rely on fixed heuristics like letter frequency or elimination rules.

We wanted to explore a different angle:

  • Can an agent learn how to play purely from feedback?
  • How far can you push basic reinforcement learning without jumping into deep learning?

Solution

We built a Q-learning agent that plays repeated Wordle games and gradually improves based on rewards.

Instead of making the agent pick from thousands of possible words directly, we simplified the problem:

  • The agent chooses how to guess (strategy), not the exact word
  • It gets feedback (greens and yellows) after each guess
  • It updates its estimate of how good each strategy is based on what worked and what didn’t
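The feedback step can be sketched as a small scoring function. This is an illustrative sketch, not the project's actual code: the name `score_guess` and the `G`/`Y`/`-` pattern encoding are assumptions, and repeated letters are handled the way standard Wordle handles them (a letter only turns yellow as many times as it remains unmatched in the answer).

```python
from collections import Counter


def score_guess(guess: str, answer: str) -> str:
    """Return a pattern string: G = green, Y = yellow, - = grey.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    if len(guess) != len(answer):
        raise ValueError("guess and answer must have the same length")

    result = ["-"] * len(guess)

    # Answer letters that were NOT matched exactly stay available for yellows.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    # First pass: exact position matches become green.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"

    # Second pass: misplaced letters become yellow while copies remain.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g != a and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1

    return "".join(result)


if __name__ == "__main__":
    print(score_guess("allee", "apple"))
```

The two-pass structure matters: marking greens first prevents a repeated letter from being counted as yellow when its only copy in the answer is already matched in place.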

Over time, it shifts from random guessing to favouring strategies that lead to better outcomes.
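A minimal sketch of what tabular Q-learning over strategies can look like is shown here. Everything named in it is an assumption for illustration: the strategy names, the hyperparameter values, the class `StrategyQAgent`, and the idea that a state is any hashable summary of the game (for example, the turn number plus a rough count of remaining candidate words). The update rule itself is the standard Q-learning update, and the decaying epsilon is one common way to get the shift from random guessing to preferring strategies that paid off.

```python
import random
from collections import defaultdict

# Hypothetical strategy names; the project's real action set may differ.
STRATEGIES = ["explore_new_letters", "exploit_known_letters", "random_candidate"]


class StrategyQAgent:
    """Tabular Q-learning agent whose actions are guessing strategies."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=1.0,
                 epsilon_min=0.05, epsilon_decay=0.999, seed=None):
        self.actions = list(actions)
        self.alpha = alpha                  # learning rate
        self.gamma = gamma                  # discount factor
        self.epsilon = epsilon              # current exploration probability
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay
        self.rng = random.Random(seed)
        self.q = defaultdict(float)         # (state, action) -> estimated value

    def choose(self, state):
        """Epsilon-greedy: explore with probability epsilon, else act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        """Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in self.actions
        )
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def end_episode(self):
        """Decay exploration after each game, never going below epsilon_min."""
        self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)
```

In a training loop, each game would call `choose` before every guess, `update` after the feedback arrives, and `end_episode` once the game is won or lost. Because the action set has only a handful of entries, the Q-table stays small even though the word list is large.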

The backend handles training and exposes an API, while the frontend provides a simple interface to interact with the system and observe the agent’s behaviour.


Developer Notes

This project was built collaboratively with Shreenithi, and most of the work ended up being less about writing code and more about figuring out how to make RL behave in a controlled way.

A few things that stood out:

  • A lot of trial and error
    • The first few versions either didn’t learn anything or learned the wrong things
    • Small changes in reward logic or exploration settings completely changed behaviour
  • RL is not straightforward to debug
    • There’s no clear failure signal
    • You end up watching trends like win rate and guessing patterns to understand what’s happening
  • We had to simplify aggressively
    • The original idea was more ambitious, but the state space quickly became unmanageable
    • Reducing the problem to “pick a strategy, not a word” made it workable
  • Progress wasn’t linear
    • Sometimes the agent improved, then got worse after more training
    • Tuning it felt more like experimentation than engineering
  • Most of the learning came from breaking things
    • Reward systems that seemed reasonable led to weird behaviour
    • Fixing those issues was where most insights came from
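Since trends like win rate were the main debugging signal, one simple way to watch them is a rolling window over recent games. The sketch below is an assumption about how that could be done, not a description of the project's tooling; the class name `RollingWinRate` and the default window size are made up for illustration.

```python
from collections import deque


class RollingWinRate:
    """Track the win rate over the most recent `window` games."""

    def __init__(self, window: int = 500):
        if window <= 0:
            raise ValueError("window must be positive")
        # deque with maxlen drops the oldest result automatically.
        self.results = deque(maxlen=window)

    def record(self, won: bool) -> float:
        """Store one game outcome and return the updated rolling win rate."""
        self.results.append(1 if won else 0)
        return self.rate()

    def rate(self) -> float:
        """Fraction of wins in the current window (0.0 if no games yet)."""
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)


if __name__ == "__main__":
    tracker = RollingWinRate(window=3)
    for outcome in [True, True, False, False]:
        print(round(tracker.record(outcome), 3))
```

Logging this value every few hundred games makes regressions visible, such as the agent improving and then getting worse with more training, which a single cumulative win rate would smooth over.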

Overall, this project was less about building a perfect solver and more about understanding how reinforcement learning behaves in practice, especially in a small but non-trivial problem space.