An AI that doesn't Make Pokemon (DevLog)
I made an AI that makes Pokemon. Sort of. Here are some learnings.
I’ve been a huge Pokémon fan since I first discovered the franchise at age 7. When I was younger I’d fill notebooks with doodles of made-up Pokémon.
A few years ago I saw someone’s project on Twitter where they trained a Generative Adversarial Network (GAN) to make Pokémon. After losing track of it, I figured I would make one myself. I am a self-proclaimed Pokémon master, but my machine learning experience pretty much ends at the UC Berkeley ML course CS189. So I guess that meant I was 50% of the way there. Armed with my ChatGPT subscription and a dream, I set out to make unlimited Pokémon.1
Okay, fine, you can laugh. So… I didn’t really land this project where I wanted to. That said, I found a lot of inspiration while working on this, and now I just want to move on to new things. I’m posting this because there’s something nice about donating ideas and learnings once I’m finished with them.
The outputs from my model aren’t, in my opinion, totally un-Pokémon-like though. I mean, the quality of my images is obviously worse, but the silhouettes don’t always seem too Farfetch’d.

Training A Model
Getting started with ChatGPT and online resources was quick and easy. I put my request into ChatGPT and it told me to use a GAN. Since I didn’t really know what that was, I asked, and it told me. Let’s start there:
A Generative Adversarial Network is a machine learning framework composed of two networks, a Generator and a Discriminator, set up in an evolutionary arms race. My Discriminator was fed a mix of real Pokémon images and the Generator’s outputs, and its job was to decide whether each image was real or generated. The Generator’s goal was to fool the Discriminator, and it learned whether its outputs were on the right track based on which ones the Discriminator gave the OK to. For an example of what a GAN is capable of, check out Artbreeder.
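To make that concrete, here’s roughly what the two networks look like. This is a minimal DCGAN-style sketch in PyTorch for 64×64 sprites; the architecture and layer sizes are illustrative, not exactly what I trained.

```python
import torch
import torch.nn as nn

# Minimal DCGAN-style sketch (illustrative layer sizes, not my exact model).
# The Generator maps a noise vector to a 64x64 RGB image; the Discriminator
# maps an image to a single real-vs-generated logit.

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # -> 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),    # -> 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),      # -> 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),       # -> 32x32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                            # -> 64x64
        )

    def forward(self, z):  # z: (batch, z_dim)
        return self.net(z.view(z.size(0), -1, 1, 1))


class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),     # 64x64 -> 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),   # -> 16x16
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),  # -> 8x8
            nn.Conv2d(256, 1, 8, 1, 0),                       # -> 1x1 logit
        )

    def forward(self, x):  # x: (batch, 3, 64, 64)
        return self.net(x).view(-1)
```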
It turned out to be pretty easy to find images of Pokémon too; there was a GitHub repo full of ‘em.
A variety of handy tutorials and ChatGPT got me up and running with a training loop. From there, I spent an embarrassingly long time on Google Colab, basically fiddling around with the parameters until I got a model that actually started to produce blobs instead of vaguely colorful noise.
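For reference, the standard loop from those tutorials alternates one Discriminator update with one Generator update per batch. A sketch, assuming the networks above, plus a `dataloader` of sprite batches and a `num_epochs` defined elsewhere:

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
G, D = Generator().to(device), Discriminator().to(device)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

for epoch in range(num_epochs):          # num_epochs / dataloader assumed defined
    for real in dataloader:              # real: (batch, 3, 64, 64), scaled to [-1, 1]
        real = real.to(device)
        ones = torch.ones(real.size(0), device=device)
        zeros = torch.zeros(real.size(0), device=device)
        fake = G(torch.randn(real.size(0), 100, device=device))

        # Discriminator step: push real images toward 1, generated toward 0.
        # detach() so this step doesn't backprop into the Generator.
        d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
                  + F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: try to make the Discriminator output 1 on fakes.
        g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```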
One huge issue was the Discriminator becoming much stronger than the Generator. This would result in divergence, with outputs essentially going to mush.
Diagnostic graphs were helpful in understanding when the model started to diverge.
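A typical diagnostic here is just plotting both losses over training steps; a minimal sketch, assuming `d_losses` and `g_losses` are lists appended to inside the loop above:

```python
import matplotlib.pyplot as plt

# d_losses / g_losses: per-step loss values collected during training (assumed)
plt.plot(d_losses, label="Discriminator loss")
plt.plot(g_losses, label="Generator loss")
plt.xlabel("Training step")
plt.ylabel("Loss")
plt.legend()
plt.show()
# A Discriminator loss collapsing toward zero while the Generator loss climbs
# is the classic signature of the divergence described above.
```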
A Laundry List of Learnings:
Add features: line detection as a feature helps. I should’ve also added eye detection as a feature and checked whether that helped. This is in contrast to leaving the training data un-augmented. (A sketch of the line-detection channel follows this list.)
I think some post-processing (e.g. de-noising the outputs) could be helpful too.
Hyperparameters all seem to have a sweet spot: scaling up the batch size, for example, stopped improving my results past a point.
Knowledge about the training data helps: why was my Generator cranking out butterflies? Because the training data unintentionally had a dozen images of the Pokémon Vivillon, a butterfly with a bunch of regional forms.
Annealing: using a scheduler, I could tune down the rate at which the model learns, effectively letting it converge at the pace I want. This also helps balance learning rates between the Discriminator and Generator (i.e. it stops one from getting much stronger than the other); I had to tune down the Discriminator’s learning rate much faster than the Generator’s. (A scheduler sketch follows this list.)
Curriculum: for a given epoch, it can be helpful to cut off training early when the Generator is struggling, to give it time to ‘catch up.’ (I think this is a variation on curriculum learning.)
Playing around develops intuition: when I first started this project, I didn’t really know what any of the hyperparameters did. I now have a much stronger intuition about the large variety of knobs I can turn, and I’m sure I’ll keep developing it in future ML projects. Practically, though, building this intuition is really annoying in ML, because it involves months of tweaking parameters or model architectures and then waiting for training to complete. There are existing frameworks that automatically tune these hyperparameters; I’ll be using those in the future.
Colab is really annoying: Google Colab, which I had to keep feeding $10 increments for GPU access while training my models, doesn’t really enjoy running longer than a few hours without interrupting to ask if I’m still there. I would not use it again for an ML project with a longer training time.
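On the ‘add features’ point: here’s roughly what bolting a line-detection map onto the input looks like. A sketch assuming OpenCV’s Canny edge detector and a Discriminator whose first conv layer takes 4 channels instead of 3; `with_edge_channel` is a hypothetical helper name, not something from my repo.

```python
import cv2
import numpy as np
import torch

def with_edge_channel(img_rgb_uint8):
    """Append a Canny edge map to an (H, W, 3) uint8 image as a 4th channel,
    returning a (4, H, W) float tensor scaled to [-1, 1]."""
    gray = cv2.cvtColor(img_rgb_uint8, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 100, 200)                    # (H, W) uint8 edge map
    stacked = np.concatenate([img_rgb_uint8, edges[..., None]], axis=-1)
    return torch.from_numpy(stacked).permute(2, 0, 1).float() / 127.5 - 1.0
```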
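And on annealing: the scheduler setup is one line per optimizer. A sketch using PyTorch’s ExponentialLR, decaying the Discriminator’s learning rate faster than the Generator’s; the gamma values are illustrative, not my tuned ones.

```python
from torch.optim.lr_scheduler import ExponentialLR

# Decay the Discriminator's learning rate faster so it can't run away.
sched_d = ExponentialLR(opt_d, gamma=0.95)  # illustrative decay rates,
sched_g = ExponentialLR(opt_g, gamma=0.99)  # not my tuned values

for epoch in range(num_epochs):
    ...  # per-batch training steps as in the loop above
    sched_d.step()  # step the schedulers once per epoch
    sched_g.step()
```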
Extensions
I imagined that a working AI could be applied in a ROM hack to make a Pokémon video game using entirely AI-generated Pokémon. This could be augmented with more AIs that generate:
Evolutions of a Pokémon from its sprite
Back sprites from front sprites
Movesets & typings from sprites
Alas, a GAN built from scratch doesn’t really seem like the way to approach the above use-cases. I think a jailbroken GPT image generator would probably do a radically better job of producing something Pokémon-like.
GitHub Repository: https://github.com/mohakjain/pokemon-gan
As of the time of writing there are 1025 unique Pokémon species. I’m not sure an AI to make more is totally necessary, but, well, you know.