Colordle is a color-guessing game. A secret color is chosen from a database of 29,979 named colors. You guess colors. After each guess, the game tells you how close you were — a score from 0 to 100. Before reading further, go play a few rounds — it builds intuition for what follows.
If you precompute a table of every color's score against a few fixed probes, you can look up the answer with a KD-tree instantly. That gives you 100%. But you've just memorized the database. The game becomes trivial. Not interesting.
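For the record, the memorization strategy really is only a few lines. A minimal sketch, with a toy database, made-up probe positions, and Euclidean distance standing in for the real scoring:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Toy stand-ins: 500 random "named colors" and 2 fixed probe positions.
colors = rng.uniform([0, -80, -80], [100, 80, 80], size=(500, 3))
probes = np.array([[50.0, 0.0, 0.0], [20.0, 40.0, -40.0]])

# Precompute each color's distance to every probe (Euclidean stand-in
# for CIEDE2000; score = 100 - distance, so the two are equivalent keys).
table = np.linalg.norm(colors[:, None, :] - probes[None, :, :], axis=2)
tree = cKDTree(table)

def lookup(scores):
    """Given the observed distances, return the index of the matching color."""
    _, idx = tree.query(scores)
    return idx

secret = 123
observed = np.linalg.norm(colors[secret] - probes, axis=1)
```

The KD-tree rather than an exact hash is what makes this robust: a nearest-neighbor query in score-space still finds the right row even if the scores come back rounded.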
The question that kept me going: what can you figure out honestly? No database. Just the public distance formula and a few measurements.
Here is what a game looks like.
Each score is a distance measurement: score = 100 − distance. Three distances from three known points in a 3D space. That sounds like enough to pin down a location.
It almost is.
Colors live in CIELAB space — a 3D coordinate system where L is lightness (0 = black, 100 = white), a is the green–red axis, and b is the blue–yellow axis. Below is a sample of the 29,979 database colors, plotted at their true coordinates and rendered in their actual color.
Step through each probe guess and watch the candidate set shrink.
But before you do — what you're about to look at is a lie. It is, in fact, the most common lie told about color: plotting points at their raw CIELAB coordinates and calling it a color space. Every color picker, every textbook diagram, every visualization you've seen of "colors in 3D" does this. It's a lie only in the sense that it is faithful to the numbers the computer stores, not to the distances your eyes perceive. Two colors that look far apart here might be nearly indistinguishable in person, and two that sit close together might look obviously different. The space is warped — and later, I will unwarp it. But first, the raw coordinates.
After one probe, the secret could be any of 486 colors — all sitting at the same perceptual distance from the first guess. After two probes, just 12. After three, only 2 candidates remain.
Two. Not one. I will come back to this.
The game's creator chose CIEDE2000 — a perceptual distance metric, engineered over decades to match how humans actually see color difference. Euclidean distance in LAB is simple but wrong: two colors can be far apart numerically yet look identical, or close numerically yet obviously different.
CIEDE2000 corrects for this with position-dependent weighting. It operates in chroma–hue–lightness coordinates rather than raw cartesian LAB. The full formula, for two colors \((L_1, a_1, b_1)\) and \((L_2, a_2, b_2)\):
$$\Delta E_{00} = \sqrt{ \left(\frac{\Delta L'}{S_L}\right)^2 + \left(\frac{\Delta C'}{S_C}\right)^2 + \left(\frac{\Delta H'}{S_H}\right)^2 + R_T \frac{\Delta C'}{S_C}\frac{\Delta H'}{S_H} }$$
where \(\Delta L'\), \(\Delta C'\), \(\Delta H'\) are differences in lightness, chroma, and hue — but in modified coordinates that depend on both colors. The weighting functions \(S_L\), \(S_C\), \(S_H\) vary with position: lightness is weighted differently near black than near white, hue matters more at high saturation. \(R_T\) is a rotation correction that kicks in for blue hues, involving a Gaussian and an arctangent.
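For reference, here is the formula as executable code: a straight transcription of the terms above, with the hue wrap-around bookkeeping following the standard reference implementation (Sharma et al.).

```python
import math

def ciede2000(lab1, lab2):
    """CIEDE2000 color difference between two (L, a, b) triples."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    C1, C2 = math.hypot(a1, b1), math.hypot(a2, b2)
    Cbar = 0.5 * (C1 + C2)
    G = 0.5 * (1 - math.sqrt(Cbar**7 / (Cbar**7 + 25**7)))
    a1p, a2p = (1 + G) * a1, (1 + G) * a2          # chroma-corrected a*
    C1p, C2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360 if (a1p or b1) else 0.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360 if (a2p or b2) else 0.0
    dLp = L2 - L1
    dCp = C2p - C1p
    dh = h2p - h1p                                  # wrap hue difference to ±180
    if C1p * C2p == 0:
        dh = 0.0
    elif dh > 180:
        dh -= 360
    elif dh < -180:
        dh += 360
    dHp = 2 * math.sqrt(C1p * C2p) * math.sin(math.radians(dh) / 2)
    Lbp = 0.5 * (L1 + L2)
    Cbp = 0.5 * (C1p + C2p)
    if C1p * C2p == 0:                              # mean hue, also wrapped
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hbp = 0.5 * (h1p + h2p)
    elif h1p + h2p < 360:
        hbp = 0.5 * (h1p + h2p) + 180
    else:
        hbp = 0.5 * (h1p + h2p) - 180
    T = (1 - 0.17 * math.cos(math.radians(hbp - 30))
           + 0.24 * math.cos(math.radians(2 * hbp))
           + 0.32 * math.cos(math.radians(3 * hbp + 6))
           - 0.20 * math.cos(math.radians(4 * hbp - 63)))
    d_theta = 30 * math.exp(-(((hbp - 275) / 25) ** 2))  # the blue-region Gaussian
    RC = 2 * math.sqrt(Cbp**7 / (Cbp**7 + 25**7))
    SL = 1 + 0.015 * (Lbp - 50) ** 2 / math.sqrt(20 + (Lbp - 50) ** 2)
    SC = 1 + 0.045 * Cbp
    SH = 1 + 0.015 * Cbp * T
    RT = -math.sin(math.radians(2 * d_theta)) * RC       # rotation correction
    return math.sqrt((dLp / SL) ** 2 + (dCp / SC) ** 2 + (dHp / SH) ** 2
                     + RT * (dCp / SC) * (dHp / SH))
```

Every piece of the prose description is visible here: \(S_L\) weighting lightness differently away from mid-gray, \(T\) modulating hue sensitivity with chroma, and the Gaussian-plus-rotation \(R_T\) term that only wakes up near hue 275° (blue).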
If you want to understand what it means to modify the Pythagorean theorem like this — to have a distance function that warps depending on where you are in the space (a Riemannian manifold) — this video about metric tensors is a beautiful introduction.
Below: the ratio of CIEDE2000 to Euclidean distance across the a*–b* plane at L*=50, with neutral gray at center.
The ratio varies from 0.62 to 0.99 — nearly a 40% warp. A "sphere" of equal CIEDE2000 distance is a warped blob in LAB coordinates. Any solver needs to account for this.
Trilateration — finding a point from its distances to known references — is a solved problem. GPS does it. Indoor positioning does it. It works cleanly when one thing is true: the space is Euclidean. The Pythagorean theorem holds. Distances add up the way you expect.
CIELAB with CIEDE2000 is not that space. The distance between two points depends on where those points are — not just how far apart they sit. A step in the blue region costs differently than the same step in the red region. The Pythagorean theorem does not hold. And trilateration, in every form, assumes that it does.
So every approach to this problem is really a strategy for dealing with curved space. Some try to learn the curvature. Some try to flatten it. Some try to ignore it entirely and just solve the raw equations. Each one makes a different bet about what matters.
The first two approaches share a plan: learn a small model of the curvature, then do standard trilateration in the corrected space. Both train on random LAB pairs using the public CIEDE2000 formula — no database.
At any point in color space, distance can be locally approximated by a metric tensor — a 3×3 matrix that says how much each direction costs:
$$d(a, b) \approx \sqrt{(a-b)^T \cdot G(\text{midpoint}) \cdot (a-b)}$$
I model \(G\) as varying linearly with position: four symmetric 3×3 matrices, 36 parameters total.
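A minimal sketch of that parameterization, with toy random values standing in for the fitted ones (the real method fits the blocks to CIEDE2000 evaluated on random LAB pairs):

```python
import numpy as np

rng = np.random.default_rng(0)
# Four 3x3 parameter blocks, symmetrized so each is a valid metric component.
# Toy values here; in the real method these are learned.
blocks = [0.001 * rng.standard_normal((3, 3)) for _ in range(4)]
blocks = [0.5 * (B + B.T) for B in blocks]

def metric_at(x):
    """Metric tensor G(x), varying linearly with position around the identity."""
    L, a, b = x
    return np.eye(3) + blocks[0] + L * blocks[1] + a * blocks[2] + b * blocks[3]

def local_distance(p, q):
    """Distance under G evaluated at the midpoint, per the formula above."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    d = p - q
    G = metric_at(0.5 * (p + q))
    return float(np.sqrt(d @ G @ d))
```

With the blocks zeroed out this reduces to plain Euclidean distance; the learned values are what bend the unit ball into a position-dependent ellipsoid.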
Alternatively, learn a small neural network that flattens the curved space — map LAB to a new 3D space where Euclidean distance approximates CIEDE2000:
```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def embed(lab):                # W1: (3, 16), W2: (16, 3), learned weights
    h = relu(lab @ W1 + b1)    # (3) → (16), nonlinearity
    return h @ W2 + b2         # (16) → (3)
```
115 parameters. In this flattened space, trilateration works better. With 4 probes (5 guesses total), both methods reach 100%. With 3 probes (4 guesses), about 67%.
Respectable. But the models are learning an approximation of the distance function, then trilaterating in the approximate space. What if I used the exact formula instead?
I know the CIEDE2000 formula. I know the three probe positions. After probing, I have three numbers. So I can write down the system directly:
$$\Delta E_{00}(p_1, x) = d_1, \quad \Delta E_{00}(p_2, x) = d_2, \quad \Delta E_{00}(p_3, x) = d_3$$
Three equations, three unknowns \((L, a, b)\). No approximation. No learned parameters. Just hand the exact equations to a numerical root-finder and let it converge.
This works. scipy.optimize.least_squares converges in about 10 iterations, with residuals below \(10^{-28}\). Zero learnable parameters. 63% win rate.
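The whole solver fits in a few lines. A self-contained sketch with hypothetical probe positions and Euclidean distance standing in for \(\Delta E_{00}\), so it runs without the full formula; the real version plugs CIEDE2000 into the same residual:

```python
import numpy as np
from scipy.optimize import least_squares

# Made-up probe positions and secret, purely for illustration.
probes = np.array([[30.0, 20.0, -10.0], [70.0, -30.0, 25.0], [50.0, 5.0, 60.0]])
secret = np.array([42.0, -13.0, 8.0])

def dist(x, p):
    return np.linalg.norm(x - p)   # stand-in for ciede2000(x, p)

measured = np.array([dist(secret, p) for p in probes])  # the three scores

def residuals(x):
    # One equation per probe: d(probe_i, x) - measured_i = 0
    return np.array([dist(x, p) for p in probes]) - measured

sol = least_squares(residuals, x0=np.array([50.0, 0.0, 0.0]))
```

The solver drives all three residuals to zero, but which of the valid roots it lands on depends entirely on the starting point.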
Wait — only 63%? The solver converges perfectly every time. Every solution it finds satisfies all three equations exactly. So why does it get the wrong color 37% of the time?
Because three distance equations in three dimensions don't have one solution. They have two. This is the fundamental geometry of trilateration.
| Probes | Constraint geometry | Degrees of freedom |
|---|---|---|
| 0 | All of 3D space | 3 |
| 1 | A surface (warped sphere) | 2 |
| 2 | A curve (sphere intersection) | 1 |
| 3 | Two points | 0 + 1 bit of ambiguity |
Two spheres intersect in a circle. A third sphere cuts that circle at two points. This is basic geometry — and it's the same reason GPS needs four satellites, not three. The fourth breaks the Earth-surface/outer-space ambiguity.
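You can construct the second solution directly: any three probes span a plane, and reflecting the secret across that plane preserves all three distances. (Exactly so for Euclidean distance; under CIEDE2000 the two solutions are no longer perfect mirror images, but the count is the same.) A sketch with made-up positions:

```python
import numpy as np

# Hypothetical probes and secret.
p1, p2, p3 = np.array([[30.0, 20.0, -10.0], [70.0, -30.0, 25.0], [50.0, 5.0, 60.0]])
x = np.array([42.0, -13.0, 8.0])

# Unit normal of the plane through the three probes.
n = np.cross(p2 - p1, p3 - p1)
n /= np.linalg.norm(n)

# Reflect the secret across that plane: a different point, same three distances.
x_mirror = x - 2 * np.dot(x - p1, n) * n

d = lambda a, b: np.linalg.norm(a - b)
```

Reflection is an isometry that fixes the probe plane, so every probe sees the true secret and its mirror image at identical distances. No fourth measurement from within that plane can ever tell them apart.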
I confirmed this isn't a CIEDE2000 quirk by running the same experiment with plain Euclidean distance:
| Metric | 1 solution | 2 solutions | 3+ | Honest win rate |
|---|---|---|---|---|
| Euclidean | 46% | 54% | 0% | 75% |
| CIEDE2000 | 32% | 59% | 9% | 58% |
Same two-point ambiguity. The warped metric was never the bottleneck. The bottleneck is the geometry of three spheres in three dimensions.
Every method so far has been fighting the same enemy: the space is curved, and my tools assume it isn't. The metric tensor tries to describe the curvature locally. The neural network tries to flatten it globally. The root-finder sidesteps the question entirely and just solves the raw equations — but still lands on two solutions because the geometry of three spheres in three dimensions doesn't care whether those spheres are round or warped.
So what does the curvature actually look like? Below is every color from the earlier visualization, shown at its CIELAB coordinates. Click "Perceptual coordinates" and watch them move. Each point slides to a new position where the Euclidean distance between any two points matches their CIEDE2000 perceptual distance — computed via multidimensional scaling and Procrustes-aligned to the original so you can see the deformation, not just an arbitrary rotation.
This was the part of Colordle that kept me up at night. I could live with the metric tensor being a locally valid fib. I could not live with pretending that a distance function which in places violates the triangle inequality had a flat embedding somewhere, and nobody else was going to hunt that embedding down for me.
The deformation is subtle in some regions and dramatic in others — particularly in the blues, where CIEDE2000's rotation term \(R_T\) kicks in. Points that looked evenly spaced in LAB bunch together or spread apart as the perceptual geometry reshapes around them. This is the lie from earlier, replaced with a different, less bad lie. The raw coordinates were faithful to the numbers the computer stores. The perceptual coordinates are faithful to what your eyes actually see — as far as a flat three-dimensional space can be.
Here's the part I find funny. The exact solution — classical multidimensional scaling on the full database — is out of reach on my machine. SMACOF needs the entire pairwise distance matrix in memory, and the matrix is \(O(n^2)\). At 20,000 colors it fits and the embedding converges; at 30,000 the kernel reaches for 48 GB of RAM and the OOM killer shoots it. The exact method fails on the exact problem, and the consolation prize is a smaller lie.
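Classical MDS itself is only a few lines, which is what makes the memory wall so galling. A sketch on toy data: double-center the squared distances, eigendecompose, keep the top three eigenpairs. Note that \(B\) below is a dense \(n \times n\) matrix; that is exactly the \(O(n^2)\) that reaches for 48 GB at 30,000 points.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(200, 3))                   # ground-truth coordinates
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)  # full pairwise matrix

# Classical MDS: double-center the squared distances, then eigendecompose.
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
B = -0.5 * J @ (D ** 2) @ J               # dense n x n Gram matrix: the memory hog
w, V = np.linalg.eigh(B)
idx = np.argsort(w)[::-1][:3]             # top 3 eigenpairs
emb = V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

D_emb = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
```

Because these toy distances really are Euclidean distances of 3D points, the embedding reproduces them exactly (up to rotation). Feed it CIEDE2000 distances instead and the residual stress is precisely the curvature that refuses to flatten.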
So I reached for stochastic gradient descent. A tiny multilayer perceptron — three hidden layers, fifty thousand weights, almost embarrassingly small — trained to minimize stress on randomly sampled pairs of colors, with CIEDE2000 computed on the fly. The network learns a continuous map \(f: \text{LAB} \to \mathbb{R}^3\) that covers all 29,979 colors without ever materializing a distance matrix. Memory is \(O(1)\) in the dataset size. It converges in minutes on CPU, warm-started from a classical landmark-MDS solution on 600 pivot points so the optimization doesn't fall into a local minimum. The network becomes the lookup: every LAB coordinate I need in perceptual space, I get by a single forward pass through it.
That same map is what bends the shells. I run marching cubes on the analytical CIEDE2000 field in raw LAB — clean, smooth, high-resolution — and then push every mesh vertex through the network into perceptual space. The shell topology is inherited from the LAB side; the curvature you see is what the network has learned to do with it. Turn the shells on. They should be perfect spheres in this space. They are clearly less distorted than the CIEDE2000 potatoes in raw LAB, but they are also clearly not round. The residual lumpiness is how much curvature the embedding couldn't iron out — how much the lie still leaks through.
| | Strategy | Guesses | Params | Win % | Honest? |
|---|---|---|---|---|---|
| — | Score-table lookup (KD-tree) | 3 | 60K floats | 100% | No — memorizes DB |
| — | Method C + DB disambiguation | 4 | 0 | 97.6% | No — uses DB to pick basin |
| A | Metric tensor (4 probes) | 5 | 36 | 100% | Yes, but 5 guesses |
| B | Learned embedding (3 probes) | 4 | 115 | 67% | Yes |
| C | Numerical inversion (3 probes) | 4 | 0 | ~63% | Yes |
"Honest" = no database access except one final snap-to-nearest-color. Training uses random LAB samples + the public CIEDE2000 formula only.
Three honest methods. Three different philosophies — physics, deep learning, optimization. They all hit the same ceiling. The gap between ~65% and 100% is not algorithmic headroom. It is the two-solution degeneracy: a geometric fact about three distance equations in three dimensions.
That should be the end of the story. The theory says I've extracted almost all the information. The geometry says there's an irreducible ambiguity. "Roll credits."
Except the theory is making assumptions that the real game doesn't quite satisfy.
To identify one color among 29,979, I need:
$$H(\text{secret}) = \log_2(29{,}979) = 14.87 \text{ bits}$$
Each distance measurement returns a score with 0.01 precision across a ~100-point range — up to \(\log_2(10{,}000) \approx 13.3\) bits per guess. How much of that is new information?
| Probes used | Cumulative info | New info | Candidates left |
|---|---|---|---|
| 1st probe | 12.77 bits (85.9%) | 12.77 bits | 486 |
| 2nd probe | 14.87 bits (99.99%) | 2.10 bits | 12 |
| 3rd probe | 14.87 bits (100%) | 0.002 bits | 2 |
The first probe does most of the work. The third adds a mere 0.002 bits — but those bits narrow 12 candidates down to 2.
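Since the score is a deterministic function of the secret, the information a single probe yields is just the entropy of the score distribution over the database. A toy version of that accounting, with random colors and Euclidean scoring standing in for the real database and \(\Delta E_{00}\):

```python
import numpy as np

rng = np.random.default_rng(0)
# 5,000 random stand-in colors; the real accounting uses the 29,979 named ones.
colors = rng.uniform([0, -80, -80], [100, 80, 80], size=(5000, 3))
probe = np.array([50.0, 0.0, 0.0])

# Score = 100 - distance, rounded to two decimals like the game does.
scores = np.round(100 - np.linalg.norm(colors - probe, axis=1), 2)

# Entropy of the score distribution = information gained from this probe.
_, counts = np.unique(scores, return_counts=True)
p = counts / counts.sum()
info_bits = -np.sum(p * np.log2(p))
total_bits = np.log2(len(colors))   # log2(5000) ≈ 12.3 bits needed in this toy
```

One probe on well-spread data captures nearly all of `total_bits`; the interesting quantity is the gap, which is what the second and third probes have to close.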
The score triples \((s_1, s_2, s_3)\) are unique for all 29,979 colors. The information is there. But having 14.87 bits encoded in three numbers is not the same as being able to decode them without a lookup table. The two-solution degeneracy means that without consulting the database, you cannot extract the last bit.
That's the clean version. Here's what actually happens when you run these methods against the real game.
The game rounds scores to two decimal places. That rounding destroys information — two colors that would have been distinguishable at full precision become identical after rounding. It's a small effect, but it's not zero, and it means the information-theoretic accounting above is slightly optimistic. The triples aren't quite as unique as they look on paper.
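The effect is easy to quantify on toy data: a continuous distance takes as many distinct values as there are colors, but two-decimal rounding caps the distinct scores at roughly one per 0.01 of score range. (Euclidean stand-in again; the sizes are made up.)

```python
import numpy as np

rng = np.random.default_rng(1)
colors = rng.uniform([0, -80, -80], [100, 80, 80], size=(20000, 3))  # toy database
probe = np.array([50.0, 0.0, 0.0])

d = np.linalg.norm(colors - probe, axis=1)    # Euclidean stand-in for the score
n_exact = len(np.unique(d))                   # continuous: every value distinct
n_rounded = len(np.unique(np.round(d, 2)))    # two-decimal scores must collide
bits_exact = np.log2(n_exact)                 # log2(20000) ≈ 14.3
bits_rounded = np.log2(n_rounded)             # strictly less
```

Here the score range spans only about 125 units, so there are at most ~12,500 distinct rounded values for 20,000 colors: collisions are forced by pigeonhole, and every collision is a pair of secrets the probe can no longer separate.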
And the dataset itself is not uniform. Named colors cluster — there are far more muted earth tones than there are saturated cyans. That clustering means some regions of color space are crowded and ambiguous while others are sparse and easy. A strategy that treats all regions equally is leaving something on the table. A strategy that learns where the colors actually live — that treats this as a data science problem, not just a geometry problem — might do better.
These are cracks. Small ones. But they're the kind of cracks that make you wonder whether the ~65% ceiling is truly fundamental, or whether it's an artifact of methods that are too principled to exploit the messiness of the real game.
I don't have a proof. I don't have a strategy that always wins in four guesses, and I don't have a proof that no such strategy exists. What I have is a strong hint from three independent methods that all land in the same place.
What I want to do next is make the distance calculation fast enough to run exhaustive games — every possible secret color, every possible probe configuration. Either there exists a set of probes and a decision rule that resolves the ambiguity for all 29,979 colors in four guesses total, or there doesn't. If it doesn't exist, I want to know the tightest bound: what is the minimum number of games that must be lost? Is the irreducible ambiguity one bit, or is it something stranger — something that depends on the dataset distribution, on the rounding, on the geometry of where the two solutions happen to fall relative to the nearest named color?
There's also the question of whether the dataset bias can be turned into an advantage. The methods above were deliberately honest — they refused to look at the database. But what if you allowed yourself a prior over where colors are likely to be? Not a lookup table, but a learned density. I'm not ready to call that honest. Maybe semi-honest — a relaxation of the rules where you can learn the shape of the distribution without memorizing individual colors. It might be the thing that breaks through the ceiling, or it might just be a more sophisticated way of cheating. I don't know yet. I'd want to understand whether the problem is genuinely unsolvable under the strict rules before I start loosening them.
I'll come back to this.
This story must have felt very nonlinear. I kept it that way on purpose.
I started with neural networks — autoregressors, embeddings, residual corrections, Fourier features. All of them achieved 0% win rate when trained honestly. I tried distillation from the database, which worked but was cheating. I tried 2-probe approaches and spent days wondering why they failed, before realizing that two distances in three dimensions is fundamentally underdetermined. The breakthrough came from asking: what if I skip the neural network entirely and just solve the equations numerically? That gave me 63% immediately — better than any learned model — and the failure analysis revealed the two-solution degeneracy, which turned out to be the same geometric fact that makes GPS need four satellites.
The truth is it was even more chaotic than what you just read. I was learning things in real time — reading about metric tensors, looking up multidimensional scaling, running experiments that failed for reasons I didn't understand until days later. Going back. Going forward. Revisiting results. Double-checking things I was sure about and finding out I was wrong. That's the process. Discovery is not a three-act play. You don't get to know which detour was the important one until after you've taken all of them.
One method came from physics. One came from deep learning. One came from optimization theory. None of them solved the problem completely, but together they drew a circle around where the answer has to be. When you're looking for something, you look wherever you need to — you just have to be careful when you're using someone else's tools in a place that's sufficiently different from where those tools were built.
"Sometimes you want certainty when life gives you stats. Some people will tell you the right answer is to press stop, but sometimes the right answer is actually to press pause and hit publish."
* Four guesses, most of the time.
Built with Three.js, Plotly, and KaTeX. Data from 29,979 colors in CIELAB space.