Feb 2025 – Mar 2025

Wobble

Developed a reinforcement learning-based Wordle solver with a competitive web interface for benchmarking human and agent performance.

Python · Astro · FastAPI · React · Vercel


Overview

Wobble – RL Wordle™ Solver is a side project where we trained a reinforcement learning agent to play Wordle-style puzzles. The idea was simple: instead of hardcoding strategies, let the bot learn how to guess words by playing thousands of games and learning from the results.

It consists of a Python backend (training + API) and a lightweight Astro frontend where you can either play manually or watch the bot make decisions in real time.

Problem

Wordle looks trivial at first, but playing it well means making good decisions with incomplete information. Many existing solvers rely on fixed heuristics such as letter frequency or elimination rules.
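For context, a fixed letter-frequency heuristic of the kind mentioned above might look like the sketch below. This is only an illustration of the baseline approach, not code from the project; the function name `letter_frequency_guess` and the tiny word list are made up for the example.

```python
from collections import Counter

def letter_frequency_guess(candidates):
    """Pick the candidate whose distinct letters are most common across all candidates."""
    # Count how many candidate words contain each letter (once per word).
    counts = Counter(letter for word in candidates for letter in set(word))
    # Score a word by summing the counts of its distinct letters.
    return max(candidates, key=lambda word: sum(counts[c] for c in set(word)))

# Illustrative word list, not the project's dictionary.
words = ["crane", "slate", "mummy", "fuzzy", "spilt"]
print(letter_frequency_guess(words))  # slate
```

A rule like this never changes, no matter how many games it plays, which is the contrast with a learning agent.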

We wanted to explore a different angle:

- Can an agent learn how to play purely from feedback?
- How far can you push basic reinforcement learning without jumping into deep learning?

Solution

We built a Q-learning agent that plays repeated Wordle games and gradually improves based on rewards.

Instead of making the agent pick from thousands of possible words directly, we simplified the problem:

- The agent chooses a guessing strategy, not the exact word.
- It gets feedback (greens and yellows) after each guess.
- It updates its decisions based on what worked and what didn’t.

Over time, it shifts from random guessing to favouring strategies that lead to better outcomes.
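A minimal sketch of tabular Q-learning over strategies, with epsilon-greedy exploration, is shown below. The strategy names, the state encoding, and the hyperparameter values are all hypothetical placeholders; only the update rule itself is the standard Q-learning formula.

```python
import random
from collections import defaultdict

# Hypothetical strategy names; the project's actual action set may differ.
STRATEGIES = ["explore_new_letters", "exploit_known_letters", "random_candidate"]

class StrategyQAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.2, seed=None):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # probability of taking a random (exploratory) action
        self.rng = random.Random(seed)
        # Q-table: state -> {strategy: estimated value}, all starting at 0.0.
        self.q = defaultdict(lambda: {s: 0.0 for s in STRATEGIES})

    def choose(self, state):
        """Epsilon-greedy: explore with probability epsilon, else pick the best-known strategy."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(STRATEGIES)
        values = self.q[state]
        return max(values, key=values.get)

    def update(self, state, action, reward, next_state, done):
        """Move Q(state, action) toward reward + gamma * max over next-state values."""
        future = 0.0 if done else max(self.q[next_state].values())
        target = reward + self.gamma * future
        self.q[state][action] += self.alpha * (target - self.q[state][action])

# Usage with an assumed state of (guess number, greens so far).
agent = StrategyQAgent(epsilon=0.0, seed=0)
state = (1, 0)
agent.update(state, "exploit_known_letters", reward=1.0, next_state=(2, 2), done=False)
print(agent.choose(state))  # exploit_known_letters
```

With `epsilon` above zero the agent still tries other strategies occasionally, which is what lets it move away from an early, possibly misleading, preference.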

The backend handles training and exposes an API, while the frontend provides a simple interface to interact with the system and observe the agent’s behaviour.

Developer Notes

This project was built collaboratively with Shreenithi, and most of the work ended up being less about writing code and more about figuring out how to make RL behave in a controlled way.

A few things that stood out:

- A lot of trial and error
  - The first few versions either didn’t learn anything or learned the wrong things
  - Small changes in reward logic or exploration settings completely changed behaviour
- RL is not straightforward to debug
  - There’s no clear failure signal
  - You end up watching trends like win rate and guessing patterns to understand what’s happening
- We had to simplify aggressively
  - The original idea was more ambitious, but the state space quickly became unmanageable
  - Reducing the problem to “pick a strategy, not a word” made it workable
- Progress wasn’t linear
  - Sometimes the agent improved, then got worse after more training
  - Tuning it felt more like experimentation than engineering
- Most of the learning came from breaking things
  - Reward systems that seemed reasonable led to weird behaviour
  - Fixing those issues was where most insights came from
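Since win-rate trends were the main debugging signal, a small rolling tracker along these lines is one way to watch them during training. This is a sketch under assumptions: the class name `RollingWinRate` and the window size are illustrative, not taken from the project.

```python
from collections import deque

class RollingWinRate:
    """Track the win rate over the most recent `window` games."""

    def __init__(self, window=500):
        # A bounded deque drops the oldest result automatically once full.
        self.results = deque(maxlen=window)

    def record(self, won):
        self.results.append(1 if won else 0)

    def rate(self):
        # Return 0.0 before any games have been recorded.
        return sum(self.results) / len(self.results) if self.results else 0.0

tracker = RollingWinRate(window=3)
for won in [True, False, True, True]:
    tracker.record(won)
print(round(tracker.rate(), 3))  # 0.667
```

A rolling window, rather than a cumulative average, makes it easier to spot the case where the agent improves and then gets worse after more training.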

Overall, this project was less about building a perfect solver and more about understanding how reinforcement learning behaves in practice, especially in a small but non-trivial problem space.