Nov 2025

Order of Markov

Built an experimental framework for estimating the minimum Markov chain order required to model sequential prediction tasks.

React, TypeScript, Python, FastAPI, Tailwind CSS, Vercel

Overview

Order of Markov started as a slightly absurd idea that refused to go away. The premise is simple: take a question that a human could answer with a bit of thought (what Markov order does this sequential prediction task need?) and instead route it through a large language model to produce a single number between 1 and 5. That number represents the “correct” Markov order. The result is a compact, almost deadpan system that wraps a heavyweight model around a trivial decision, just to see what happens.

Behind the scenes, this became less about the answer and more about observing behaviour. The project sits somewhere between a toy and a probe. It is intentionally overbuilt for what it does, which made it a useful playground for experimenting with how LLMs respond to constrained outputs.

Problem

Most LLM applications aim to solve complex problems or automate meaningful tasks. This project deliberately does the opposite. It asks whether a model can consistently handle a tightly bounded, almost simplistic decision problem without overthinking it.

The “problem” here is less about user need and more about developer curiosity. Can a model reliably pick a number from a fixed range when given a vague prompt? How often does it contradict itself? How much infrastructure is required to produce something that should take a human a few seconds?

In practice, the friction came from the unpredictability. Early iterations produced wildly inconsistent outputs. Sometimes the model would refuse to commit. Other times it would confidently justify completely different answers for nearly identical inputs. That inconsistency became the core motivation to keep refining the system.
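
One way to put a number on that inconsistency is to send the same input several times and check how often the answers agree. The sketch below is illustrative only: `agreement_rate` and the sample answers are hypothetical and not taken from the project's code.

```python
from collections import Counter


def agreement_rate(answers: list[int]) -> tuple[int, float]:
    """Return the most common answer and the fraction of runs that gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    # most_common(1) yields a one-element list of (value, count).
    value, count = Counter(answers).most_common(1)[0]
    return value, count / len(answers)


# Hypothetical answers from five runs on the same problem description.
runs = [3, 3, 2, 3, 4]
modal, rate = agreement_rate(runs)
print(f"modal order={modal}, agreement={rate:.0%}")
```

A low agreement rate on identical input is a sign that the prompt, rather than the problem, is driving the answer.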

Solution

The solution is intentionally minimal on the surface. A small frontend built with Vite, React, and TypeScript sends a short problem description to a FastAPI backend. The backend forwards that input to an LLM and expects a structured response: an integer (1–5), a short justification, and a confidence score.
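
The expected response shape can be sketched as a small validated record. This is a minimal standard-library sketch, not the project's actual schema: the class name `OrderEstimate`, the field names, and the 0 to 1 range for confidence are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEstimate:
    order: int            # Markov order, 1 to 5 inclusive
    justification: str    # short free-text reasoning
    confidence: float     # assumed range 0.0 to 1.0 (not stated in the write-up)

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.order, bool) or not isinstance(self.order, int):
            raise TypeError(f"order must be an int, got {type(self.order).__name__}")
        if not 1 <= self.order <= 5:
            raise ValueError(f"order must be between 1 and 5, got {self.order}")
        if not self.justification.strip():
            raise ValueError("justification must not be empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
```

In a FastAPI backend the same constraints would more likely live in a Pydantic model; the dataclass keeps the example free of dependencies.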

Most of the effort went into making that interaction feel stable despite the underlying unpredictability. Prompt design went through multiple iterations. At one point, the model was returning paragraphs instead of JSON. Another version would comply with the format but sneak in extra commentary. Tightening that loop was less about strict validation and more about nudging the model into behaving.
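
A prompt that pushes toward JSON-only output might look something like the following. The wording, the key names, and the `build_prompt` helper are hypothetical; the project's real prompt is not reproduced here.

```python
# Hypothetical prompt template; the key names are assumptions.
PROMPT_TEMPLATE = """\
You estimate the Markov order needed to model a sequential prediction task.

Task description:
{problem}

Respond with a single JSON object and nothing else. No prose before or after,
no Markdown fences. Use exactly these keys:
  "order": an integer from 1 to 5
  "justification": one or two sentences
  "confidence": a number from 0 to 1
"""


def build_prompt(problem: str) -> str:
    """Fill the template with a trimmed, non-empty problem description."""
    problem = problem.strip()
    if not problem:
        raise ValueError("problem description must not be empty")
    return PROMPT_TEMPLATE.format(problem=problem)
```

Spelling out both the allowed keys and the forbidden extras targets the two failure modes described above: paragraphs instead of JSON, and valid JSON wrapped in commentary.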

The UI reflects the same philosophy. It is intentionally lightweight and fast, with small touches like keyboard shortcuts and theme awareness. The goal was to keep the interface out of the way so the oddity of the responses takes centre stage.

Architecturally, it is straightforward, but the interesting part is how much “invisible” effort goes into making a simple pipeline feel reliable. There is a thin line between a playful experiment and something that feels broken, and most of the design decisions were about staying on the right side of that line.

Developer Notes

Setting this up locally is simple, but getting it to behave consistently took more trial and error than expected. The backend runs on FastAPI with a single endpoint, and the frontend is a standard Vite setup. On paper, it is a quick start. In practice, most of the debugging time went into the interaction with the LLM.

One recurring issue was malformed responses. Even with strict instructions, the model occasionally drifted. Instead of overengineering a parser, the approach was to surface detailed errors and make failures obvious. This made debugging faster and kept the system honest.
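
The "fail loudly" approach can be sketched as a strict parser that makes no attempt to repair a reply and instead reports exactly what went wrong. The names `MalformedResponseError` and `parse_reply`, and the key names, are assumptions made for illustration.

```python
import json


class MalformedResponseError(ValueError):
    """Raised when the model's reply does not match the expected shape."""


def parse_reply(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Include the position and a snippet so the failure is obvious in logs.
        raise MalformedResponseError(
            f"not valid JSON at char {exc.pos}: {exc.msg}; reply began {raw[:80]!r}"
        ) from exc
    if not isinstance(data, dict):
        raise MalformedResponseError(
            f"expected a JSON object, got {type(data).__name__}"
        )
    missing = {"order", "justification", "confidence"} - data.keys()
    if missing:
        raise MalformedResponseError(f"missing keys: {sorted(missing)}")
    order = data["order"]
    if isinstance(order, bool) or not isinstance(order, int) or not 1 <= order <= 5:
        raise MalformedResponseError(f"order must be an integer 1-5, got {order!r}")
    return data
```

A reply such as `Sure! The order is 3.` fails at the first step, and the error message carries the start of the offending text rather than a generic "bad response".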

Testing across devices was uneventful from a UI perspective, but latency differences were noticeable. On slower connections, the weight of calling an LLM for such a trivial task becomes very apparent. That contrast is part of the experience rather than something to optimise away.

Extending the project is straightforward. The backend can swap out models or providers with minimal changes, and the frontend can be adapted to test different prompt styles or output formats. The real flexibility lies in experimenting with how the question is framed and observing how the model reacts.
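
Swapping providers stays cheap if the backend depends only on a narrow interface. The sketch below shows one possible shape using `typing.Protocol`; `LLMProvider`, `CannedProvider`, and `estimate_order` are hypothetical names, and the project may be structured differently.

```python
import json
from typing import Protocol


class LLMProvider(Protocol):
    """Anything with a complete() method that maps a prompt to reply text."""

    def complete(self, prompt: str) -> str: ...


class CannedProvider:
    """Stand-in provider that returns a fixed reply; useful for local tests."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def estimate_order(provider: LLMProvider, problem: str) -> int:
    # The calling code never needs to know which provider it was given.
    raw = provider.complete(f"Estimate the Markov order (1-5) for: {problem}")
    return int(json.loads(raw)["order"])


provider = CannedProvider('{"order": 2, "justification": "stub", "confidence": 0.5}')
print(estimate_order(provider, "predict the next token"))
```

A real provider would wrap a vendor SDK behind the same `complete` method, so trying a different model means adding one small class rather than touching the endpoint.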

The main takeaway from building this was not technical complexity but behavioural nuance. Even a constrained system can produce surprising results when the core decision-maker is probabilistic. The project leans into that unpredictability rather than trying to eliminate it.