Machine Learning Magic

Harnessing AI to evolve solid MtG decks from cards you already own

Tell me if this sounds familiar. You’ve got a box (indeed, boxes!) full of Magic: The Gathering cards, nearly all from boosters you either bought because cracking boosters is fun, or (more likely) that you’ve accumulated from limited events. Most are bulk commons, but there’s definitely a few really interesting cards in there — but because they’re singles or doubles, and you don’t really play that color, they’re just gathering dust.

You’re thinking: The meta-game needs a bit of a shakeup, but everyone’s so damned focused on Karn, Scion of Urza and Walking Ballista. You’re sure you’ve got something in your collection that can bring an element of surprise, and maybe even a solid win or two.

But where to start?

Scryfall is an amazing resource, but its core use case doesn’t cover our issue. It’s geared to helping you when you already know what you want, and you already know what you own (and don’t own).

Wouldn’t it be great if there were a tool that could pluck interesting and novel card combos from your existing collection? Wouldn’t it be even better if that tool could craft a solid deck around those core cards, that you could start playing with today?

Maybe I’m alone in this, but this is certainly how I feel. So I started building just such a tool. Read along to find out how it works, how you can start using it, and better yet, how you can help make it awesome.

In this article, I want to outline at a high level how I approached this challenge, how my approach works, what went well, and what didn’t. If you just want to jump directly to the code, or a deep-dive into the technical nitty-gritty for yourself, you can dig deeper on GitHub.

In particular, I want to talk about an approach to deck building using a technique called genetic algorithms or GAs. GAs allow us to literally evolve good decks using artificial selection. Ooooooooh! Yes, it is as exciting as it sounds!

Spoiler alert: I brought a deck generated by this tool to the February 2018 Dutch Standard Open, and went 0–7, coming in dead last. But I won a couple of games, and surprised almost everyone there, with a deck built around Mechanized Production and Barricade Breaker. It was…interesting. There’s obviously a lot of work to be done! But it’s been pretty fun so far, and that’s what matters the most to me.

Genetic Algorithms

So, what’s a genetic algorithm (GA)? And how can they help us?

GAs are designed to find good-enough solutions to optimization problems in a reasonable amount of time. An optimization problem is one in which there is a range of possible solutions, too large to search one by one, and some solutions are better than others. The classic example is the knapsack problem: Given a set of items of particular size, shape, and weight, and a knapsack to store them in, what’s the best arrangement of them to get as many as possible packed?

Deck-building is an optimization problem, too. Given that there are something like 20,000 distinct cards in print, and that we might individually own a collection of several thousand, with multiple copies of many cards, we want to identify the 60 cards from our collection that, as a deck, give us the greatest chance of winning a game of Magic.
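To get a feel for how large that search space is, here is a quick back-of-the-envelope calculation in Python. The collection size of 2,000 is a made-up round number, and the count ignores copy limits and format legality, so treat it as an order-of-magnitude illustration only.

```python
import math

# Hypothetical numbers: a 2,000-card collection and a 60-card deck.
collection_size = 2_000
deck_size = 60

# Number of distinct 60-card subsets, ignoring copy limits and card order.
candidate_decks = math.comb(collection_size, deck_size)

# Report the order of magnitude rather than the full integer.
magnitude = len(str(candidate_decks)) - 1
print(f"roughly 10^{magnitude} possible decks")
```

Checking each candidate one by one is clearly off the table, which is why a search heuristic is attractive here.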

When we can express the solution to an optimization problem as a sequence of discrete items, like genes in a chromosome, genetic algorithms provide an interesting way to find good enough solutions by evolving them (I’ll discuss how below). And guess what? A decklist just is a sequence of discrete items, making deck building an ideal problem to approach with a GA.


The way GAs work is through a process called evolution by artificial selection, quite explicitly modeled after the biological process of evolution by natural selection. As you can guess, the terminology for this algorithm is all borrowed from evolutionary biology.

To begin, we randomly generate an initial population of several thousand individual solutions (that is, in our case, decks comprising only cards that I already own). Then, each member of the population is evaluated on its optimality (its “fitness”). Then, individuals are selected from the population in pairs: the greater their fitness, the more likely they are to be selected. Those two individuals are then crossed (the sequences of cards chopped up and recombined) to create new individuals for the next generation. Sometimes, individuals will receive a random mutation, just to keep things fresh. This process is repeated many times, and what comes out of the other end is considerably more optimal than the random individuals that started the process.

It’s almost…magical. 🤩
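In code, that loop might look something like the sketch below. This is a minimal illustration in Python, not the code from the repository: the function name evolve, its parameters, and the choice of single-point crossover with fitness-proportional selection are all assumptions made for the example.

```python
import random


def evolve(collection, fitness, deck_size=36, pop_size=200,
           generations=100, mutation_rate=0.05, rng=None):
    """Evolve a deck (a list of cards) from `collection` using `fitness`."""
    rng = rng or random.Random()

    # Initial population: random decks drawn from the cards we own.
    population = [rng.sample(collection, deck_size) for _ in range(pop_size)]

    for _ in range(generations):
        scores = [fitness(deck) for deck in population]

        # Shift scores so every selection weight is strictly positive.
        low = min(scores)
        weights = [score - low + 1e-9 for score in scores]

        next_generation = []
        while len(next_generation) < pop_size:
            # Selection: fitter decks are more likely to become parents.
            mum, dad = rng.choices(population, weights=weights, k=2)

            # Crossover: cut both parents at the same point and recombine.
            cut = rng.randrange(1, deck_size)
            child = mum[:cut] + dad[cut:]

            # Mutation: occasionally swap one slot for a random owned card.
            if rng.random() < mutation_rate:
                child[rng.randrange(deck_size)] = rng.choice(collection)

            next_generation.append(child)

        population = next_generation

    return max(population, key=fitness)
```

Note that crossover and mutation can leave a child holding the same card twice. This sketch does nothing about that; it is left to the fitness rules to deal with.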

But what drives the whole thing, what everything depends upon, is how we evaluate optimality, the fitness function. Essentially, we need to tell the GA how good a deck is. For some problems, the fitness function is straightforward. For example, in the knapsack problem, we can ask: How many items did we manage to stuff into the knapsack? But with Magic decks, this question is a lot harder to answer.

The Fitness of a Magic Deck

We’re thinking of Magic decks as sequences of cards that can be sliced, diced, and rearranged to make new decks. That’s the easy part. But how do we evaluate whether what we have is a good deck?

We’re going to work by assigning points to a deck, depending on how closely it meets a set of rules. Some rules assign points to the deck as a gestalt, and some assign points to individual cards, which are then tallied up. The more points, the better the deck.

For my tool, I set some basic assumptions — these are far from universal, but the more constraints on what makes a good deck, the better a GA often performs, especially in a problem space as large as ours.

1. No individual card can appear in the deck twice (because this is physically impossible) and the deck may not contain more than 4 copies of any card (because rules). Such cards are removed from the deck.
2. Decks should consist of exactly N colors, where N defaults to 2, but can be specified. Cards not of the N top colors are removed from the deck.
3. Decks contain 36 non-land cards. (We don’t need the GA wasting cycles adding land to our deck, we can do that ourselves manually at very little mental cost later.) Decks lose points if they are too small, and they will be too small if rule 1 or 2 is broken.
4. Decks should approximate a mana curve of:
   • 9 One-Drops
   • 13 Two-Drops
   • 9 Three-Drops
   • 3 Four-Drops
   (Borrowed from this excellent analysis on Channel Fireball.) We calculate the Euclidean distance from our actual deck to this ideal, and assign points based on how close the deck is to the ideal.
5. Decks should roughly consist of:
   • 16 Creatures
   • 8 Artifacts
   • 3 Enchantments
   • 1 Planeswalker
   • 3 Instants
   • 3 Sorceries
   (Totally made up, and driven by my love of the artifact-heavy Kaladesh block.) We again calculate the Euclidean distance to this ideal, and assign points based on how close the deck is to the ideal.
6. The greater the average power and average toughness of the creatures, the better. Better power and better toughness get more points.

(At the moment, the GA doesn’t check to see if the deck is legal for any particular format, but that wouldn’t be hard to add.)
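The first three rules (remove impossible or illegal duplicates, trim to the top N colors, penalize undersized decks) could be sketched roughly as follows. Everything here is illustrative: the Card record, the function names, and the one-point-per-missing-card penalty are assumptions, and each card is simplified to a single color, which ignores multicolor and colorless cards.

```python
from collections import Counter
from typing import NamedTuple


class Card(NamedTuple):
    card_id: int  # hypothetical ID for one physical card in the collection
    name: str
    color: str    # simplification: exactly one color per card


def repair(deck, n_colors=2, max_copies=4):
    """Drop duplicate physical cards, extra copies, and off-color cards."""
    seen_ids = set()
    copies = Counter()
    kept = []
    for card in deck:
        # The same physical card twice, or a fifth copy of a name, is removed.
        if card.card_id in seen_ids or copies[card.name] >= max_copies:
            continue
        seen_ids.add(card.card_id)
        copies[card.name] += 1
        kept.append(card)

    # Keep only cards in the N most common colors of what is left.
    color_counts = Counter(card.color for card in kept)
    top_colors = {color for color, _ in color_counts.most_common(n_colors)}
    return [card for card in kept if card.color in top_colors]


def size_penalty(deck, target=36, points_per_missing_card=1.0):
    """Negative points for every card short of the target size."""
    return -points_per_missing_card * max(0, target - len(deck))
```

Because removed cards are not replaced, a deck that breaks the first two rules shrinks and then loses points for its size, which is the pressure that pushes the population toward valid decks.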

The idea is that decks don’t have to conform exactly to these criteria, but that we can grade them on how closely they meet each criterion. Granted, these constraints are arbitrary (they are totally configurable in the code, of course!), and subject to taste. They’re maybe not even the most important considerations, but they are fundamental, and we can at least demonstrate that the GA is working if the resulting decks have the right number of colors, the specified mana curve, and the specified type distribution.
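Grading by closeness, as in the mana curve and card type rules, can be sketched like this. The ideal curve is the one listed above; the function names and the way a distance is turned into points (max_points divided by one plus the distance) are assumptions for illustration, and the real scoring may well differ.

```python
import math
from collections import Counter

# Ideal mana curve from the rules above: mana value -> number of cards.
IDEAL_CURVE = {1: 9, 2: 13, 3: 9, 4: 3}


def distance(actual, ideal):
    """Euclidean distance between two count tables (missing keys count as 0)."""
    keys = set(actual) | set(ideal)
    return math.sqrt(sum((actual.get(k, 0) - ideal.get(k, 0)) ** 2 for k in keys))


def closeness_points(actual, ideal, max_points=100.0):
    """Assumed scoring: full points at distance 0, falling off as distance grows."""
    return max_points / (1.0 + distance(actual, ideal))


# Example: a deck that is light on one-drops and heavy on four-drops.
mana_values = [1] * 6 + [2] * 13 + [3] * 9 + [4] * 7
curve = Counter(mana_values)
print(distance(curve, IDEAL_CURVE))          # 5.0
print(closeness_points(curve, IDEAL_CURVE))  # well below the 100-point maximum
```

The same two functions work for the card type distribution by swapping in a table keyed on type names instead of mana values. In this sketch, cards costing five or more simply count as deviations from an ideal of zero.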

And, in fact, at this point we can demonstrably use a GA to generate decks that meet these criteria reasonably closely. These decks will be legal, and to a certain degree playable, but they won’t be very interesting, which is a key part of our problem.

To wit, here’s an example deck built with only these constraints in the fitness function: An Evolved Pile of Cards. As you can see, it meets all the stipulated requirements: It has a solid mana curve, it’s got a reasonable distribution of card types, it’s (mostly) legal. It’s playable. And of course it consists of cards that I already own. That’s pretty amazing considering the GA began by just randomly selecting 36 cards! But it’s far from great.

Almost the definition of a pile of cards. But at least it’s Standard legal? Just pretend you didn’t see that Attune with Aether. https://deckstats.net/decks/108990/1006014-an-evolved-pile-of-cards

Two observations:

Interestingly, at this point, most decks produced are Red/Green, probably because of rule number 6 above (maximize average power and toughness). Also, most of the creatures are further up the curve, for undoubtedly the same reason, leaving instants, sorceries, enchantments and artifacts to take up the 1-drop and 2-drop slots. Clearly this is far from optimal.
Of course there is absolutely no synergy among the cards. Yawn. Let’s do something about that.

Card abilities and interactions

So, the obvious next step, assuming we don’t find the prior assumptions about good decks horribly offensive (we can tweak them later, at least we know they are working), is to tell the fitness function something about how cards interact with each other.

At this stage, I tried to describe my cards in a way that I could use in the fitness function.

I hand-tagged each of my cards (trying my best to use community jargon, though no doubt failing as often as not) with the various abilities they have, such as flying or card drawing. For example, I tagged Heroic Intervention as “granting hexproof” and “granting indestructible”.
I also hand-tagged each of my cards based on the triggers for triggered abilities, for example, enters-the-battlefield, or…well, actually, that’s as far as I got, because the range of possible triggers is rather overwhelming. For example, I tagged Felidar Guardian as having an enters-the-battlefield trigger.
I then hand-tagged each of my cards if they have an interesting interaction with cards with specific abilities, triggers, or card types. I called these (a mistake!) “affinities”. For example, Embraal Gear-Smasher has an affinity for artifacts.

You can find all of these annotations in data/annotations.json in the code. There’s a lot going on there!

Then, I also created a list of more general interactions — both positive and negative. For example, flying creatures (to my mind!) get a little bit better when there are more of them, but they don’t pair well with cards that grant flying (because that’s redundant). Cards that produce energy (this is Kaladesh, after all!) pair very nicely with cards that consume energy, but don’t interact at all with cards that don’t use the energy mechanic.

Creating this list was a thankless task, so I left it quite short, focused on interactions that I was particularly interested in (especially energy!). You can find the list of interactions I documented in data/interactions.json.

Then, in the fitness function, I added a few more rules. The fitness function examines each pair of cards in the deck:

Card pairs with abilities that interact well are given points. Card pairs that interact negatively lose points.
Card pairs where one has an affinity for the other (using my own terminology described above) are given points.

Now, for example, when the fitness function sees two flying creatures, it nods in approval. When it sees Embraal Gear-Smasher paired with an artifact, it nods in approval. When it sees Aetherstorm Roc paired with Mighty Leap, it gets shirty.
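A pairwise scoring pass along these lines could look like the sketch below. The tag names, the point values, the card helper, and the layout of the INTERACTIONS table are all invented for the example; they are not the schema of data/annotations.json or data/interactions.json.

```python
from itertools import combinations

# Hypothetical interaction table: unordered pair of abilities -> points.
INTERACTIONS = {
    frozenset(["flying"]): 1,                              # two flyers
    frozenset(["flying", "grants-flying"]): -1,            # redundant
    frozenset(["produces-energy", "consumes-energy"]): 2,  # energy engine
}


def card(name, types=(), abilities=(), affinities=()):
    """Build a tagged card record (illustrative structure only)."""
    return {"name": name, "types": set(types),
            "abilities": set(abilities), "affinities": set(affinities)}


def pair_points(a, b, affinity_points=1):
    """Score one pair of cards from ability interactions and affinities."""
    points = 0
    for x in a["abilities"]:
        for y in b["abilities"]:
            points += INTERACTIONS.get(frozenset((x, y)), 0)
    # An affinity is satisfied by the other card's types or abilities.
    for first, second in ((a, b), (b, a)):
        if first["affinities"] & (second["types"] | second["abilities"]):
            points += affinity_points
    return points


def synergy_points(deck):
    """Sum the pair scores over every pair of cards in the deck."""
    return sum(pair_points(a, b) for a, b in combinations(deck, 2))


roc = card("Aetherstorm Roc", types=["creature"], abilities=["flying"])
leap = card("Mighty Leap", types=["instant"], abilities=["grants-flying"])
smasher = card("Embraal Gear-Smasher", types=["artifact", "creature"],
               affinities=["artifact"])
ballista = card("Walking Ballista", types=["artifact", "creature"])

print(pair_points(roc, leap))          # -1: flying plus grants-flying
print(pair_points(smasher, ballista))  # 1: affinity for artifacts
```

Checking every pair costs 630 comparisons for a 36-card deck, which is cheap per deck but adds up across thousands of decks and many generations.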

The Results

So: A good deck is legal, has a good mana curve and distribution of card types, and is rich in card affinities and interactions. How does the GA perform with this measure of fitness? Have a look at Evolved Energy, and see what you think.

More than a pile, less than GP-winning. https://deckstats.net/decks/108990/1006019-evolved-energy

Not freaking bad, considering we’re measuring only a small subset of abilities and potential interactions (with a heavy emphasis on energy and enters-the-battlefield effects), and starting from randomly generated decks, and we’re constrained to my rather limited collection!

But it could be a hell of a lot better. Clearly, we’re being hamstrung by the rules-based approach of identifying and making explicit each and every ability and interaction. I am pretty sure the next step will involve moving to a more statistically-oriented system, rather than trying to make every possible interaction explicit.

If you want to give it a spin for yourself, the code is open source and available on GitHub. The README contains building and running instructions.

Future Work

The mechanism for tracking abilities, interactions, and so forth is incredibly cumbersome. For one thing, it requires manually curating each card. For another, I don’t think my system of labeling abilities is especially good. It’s also needlessly complex.

Better would be a system that looked at what successful decks actually look like in the wild and generalized from what’s worked for others. What could that look like?

The next step, as I see it, is twofold. First, I want to deploy some natural language processing techniques like this fun experiment with deck archetypes to build a set of card archetypes, based on the rules text. I’ve already started on this and am seeing some promising early results that I hope to share soon.

But the card tags won’t be helpful by themselves. So the second step is to start pulling down top-tier deck lists, and examining how the card archetypes generated in the previous step interact with each other in the wild.

From this, we can build a statistical model that we can feed into the fitness function. We can ask: Here is a card, what are its archetypes? Do these archetypes hang together with the other archetypes seen in the deck, based on what we’ve seen in the wild?
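One simple form such a model could take is a co-occurrence score between archetypes. The sketch below uses pointwise mutual information (PMI) over observed decklists; this is a technique picked for illustration, not the method the project has settled on, and the archetype labels are invented.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_model(decks):
    """Build a PMI scorer from decks, each given as a set of archetype labels."""
    n = len(decks)
    single = Counter()
    pair = Counter()
    for archetypes in decks:
        single.update(archetypes)
        pair.update(frozenset(p) for p in combinations(sorted(archetypes), 2))

    def pmi(a, b):
        """log( P(a, b) / (P(a) * P(b)) ); -inf if the pair was never seen."""
        joint = pair[frozenset((a, b))]
        if joint == 0:
            return float("-inf")
        return math.log((joint * n) / (single[a] * single[b]))

    return pmi


# Hypothetical observations: the archetypes present in four top-tier decks.
observed = [
    {"energy-producer", "energy-consumer"},
    {"energy-producer", "energy-consumer"},
    {"flyer", "removal"},
    {"flyer", "energy-producer"},
]
pmi = cooccurrence_model(observed)
print(pmi("energy-producer", "energy-consumer"))  # positive: seen together often
print(pmi("flyer", "energy-consumer"))            # -inf: never seen together
```

A fitness function could then reward decks whose archetype pairs score well. In practice the never-seen case would need smoothing rather than negative infinity, since a small sample of decks will miss many perfectly good pairings.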

This is, I think, where this project is heading. I’ll have a new progress report soon!

How You Can Help

I would love some help with this. Let’s be honest, though: The code is a mess. Basically, I just threw code together until it worked, with only a modicum of sensible code structure and commentary. It’s not very clear exactly how the code goes about its business. That’s on me.

But if you really do want to jump in, and you’re willing to be patient with me as I document things, well, just start digging in on anything you see that’s missing that you can fix. The codebase and structure could be cleaner, and it could be more user friendly. Just have a look at the GitHub issues around the project.

I’ll be opening up additional projects in the near future for building card archetypes, as well as analyzing the archetypes of successful decks, if you’re interested in helping push that forward.

And if you just want to show your appreciation, you can always use my Cardmarket affiliate link when shopping for your next deck, or just hit the applaud button below!