I'm playing around with Bomberland, which was on HN a while ago.
My goal is to have the AI discover new multi-unit strategies using self-play and reinforcement learning. But so far, every day, I'm failing in new ways because this environment is really tough. Typical action sequences span 50 time steps, so the chance of discovering anything useful through random exploration is effectively 0. Hierarchical models don't work because multiple long-term actions overlap. And the state space is too large for MCTS unless you're willing to burn millions in compute.
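To give a sense of scale, here's a quick back-of-envelope on the exploration claim. The numbers are my assumptions, not from the game docs: roughly 6 actions per unit (4 moves, place bomb, detonate) and 3 units per player.

```python
import math

# Assumed (not from the Bomberland docs): ~6 actions per unit,
# 3 units controlled per player.
actions_per_unit = 6
units = 3
horizon = 50  # typical useful action sequence length

# Joint-action branching factor per time step
branching = actions_per_unit ** units  # 216

# Probability a uniform-random policy emits one specific
# 50-step joint-action sequence
p = branching ** -horizon
print(f"log10(p) ≈ {math.log10(p):.0f}")
```

That works out to on the order of 10^-117 for hitting any one specific trajectory, which is why undirected exploration never stumbles onto a coordinated multi-unit play.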
But precisely because it's so difficult, every tiny bit of progress feels rewarding. Plus I'm positively surprised that there's still so much unexplored territory in DL RL land.