In our latest online programming competition, Hypersonic, the two winners each chose to use one of these algorithms. As most players probably recognized, Hypersonic was strongly inspired by the game Bomberman. Each player had to implement an Artificial Intelligence, a “bot”, to move a character on a grid and place bombs to destroy crates and possibly eliminate other players. The goal was to survive and destroy the most crates.
A Good Old Depth-First Search Algorithm
That’s what Izanexis, the winner of the contest, used.
“The crucial thing in Hypersonic is to destroy boxes faster than your opponents do.
Let’s first imagine we don’t have any opponent and we just want to destroy all crates as fast as we can. I didn’t try to invent any heuristics or implement a self-educating system. Just a good old Depth-First Search! On every depth level of this recursive search, I tried to perform at most 10 types of actions:
- place a bomb + stay where I am
- do not place a bomb + stay where I am
- place a bomb + move left
- do not place a bomb + move left
- place a bomb + move right
If an action happens to be valid, I simulate the environment and apply changes to it. The process recursively continues until some depth level D is reached. At the level D of the search, I estimate a score for the situation I’ve reached and remember the global best score along with the sequence of actions leading to this situation.
My initial implementation was fairly straightforward and allowed me to achieve fixed D=5 on any test-case within 20ms (with #pragma GCC optimize(“O2”)).
This was basically the extent of what I did the first day of the contest. After that, I spent 4 days debugging, improving and speeding up the code.
Since different paths in the DFS tree often lead to the same (or very much alike) situations, we can prune away those branches that lead to the situations where we have already been. In the Hypersonic game, this has improved my depth level up to 6-7.
b. Dynamic Depth
It becomes evident that in some situations we have enough time to run 10-12 steps deep when searching, while other situations branch very quickly, leading to a much lower level of recursion. In order to fully use the provided resources (100ms), we try to go as deeper as possible until we hit the time limit
Let me just show you the C++-based pseudocode that makes use of Caching + Dynamic Depth:
The Dynamic Depth approach has increased my depth level to 6-15, depending on the initial situation.
I’m convinced that any required operation such as moving a player, exploding bombs, removing crates or bonuses can be done in O(1) or almost O(1) time using bit masks in Hypersonic. However, I did not achieve such state-of-the-art implementation. I just replaced some commonly used operations with bitwise operations for speed and storage efficiency.
The Hypersonic grid contains 11*13 cells, where 5*6 cells are always immovable and irreplaceable stones, so we have 11*13 – 5*6 = 113 playable cells. Luckily, 113 is less than 128, so we can store all boxes on a field within a GCC’s unsigned __int128 type. If GCC wouldn’t support this type, we would have used two unsigned long long for the same purpose. After that, we can remove a box from a known location using a simple XOR operation. Most importantly, we can easily calculate the number of crates of a field using this code:
__builtin_popcountll would compile into a single popcnt instruction if the CPU architecture supports it, and if GCC feels like doing so (it usually does not, unless provided a -mpopcnt) option. Otherwise, it will compile into most efficient code possible to calculate those bits. From my experience, the resulting code is fast enough to be considered as a constant-time operation.
Same logic applies to storing and also manipulating bombs, players and bonuses.
These improvements allow to drastically speed up the score and hash evaluation in some cases.
Estimating the Score
No rocket science here. The score of the state consists of the following:
- Number of crates that I’ve destroyed before I reached this state + the number of crates I will destroy in the next 8 moves if I don’t move at all
- Number of bonuses that I’ve collected, the score decreases if I already have enough bombs and a respectable fire range
- If I’m dead by this time, I strongly decrease the score
- If I can escape bombs’ fire within next 8 moves, I strongly increase the score
- In the case an enemy can make such 2 moves, that I’m dead or deadly stuck after them, the score is strongly decreased
- If I can make such two moves, that an enemy is definitely dead or deadly stuck, no matter what he does in the future, the score is a little increased. (I removed this in my very final submission)
From my experience, the information provided above is enough to rank in the top 10 of the Hypersonic competition.
The Overwhelming DFS replacement
It soon becomes obvious, that DFS search visits lots and lots of useless states. E.g., if you manage to destroy 3 crates within next 12 moves, then another search path, where you do not destroy any (within next 12 moves), is most probably useless and should not be evaluated further. However, in the DFS search, we know nothing about neighboring states on the same level of depth, so we can’t estimate whether the situation is worth expanding.
The trivial solution to the problem is to replace the DFS with a Breadth-First Search algorithm (BFS) and expand all situations on the same depth level at once. On the other hand, BFS requires to simultaneously store all different states from the same depth level, which may require too much memory. The solution to the problem is to leave N (100, 300, 500) best states from the single depth level and to prune away less promising ones.
Such approach allows iterating 30-50+ levels deep within 100ms. And thanks to this I was able to rank 1st in Hypersonic. Here goes the C++-based pseudocode:
In order to better understand whether one solution is better than another, I wrote a sandbox, that constantly clashed different strategies with each other. And only after a few thousands of battles, you can conclude that solution A is 5.2% better, than solution B. Otherwise, it’s sometimes very hard to say for sure, whether some improvement you’ve made improves anything.”
Breadth-First Search Use
That’s the algorithm Romka, who ranked 2nd, used.
“At first I thought that brute-force will perform poorly in this task. The reason was that you need to have a depth of at least 8 in order to see what happens after you put a bomb and it explodes. So my initial approach was the following.
For each cell on the field
- Find some optimal path to this cell to pick up a lot of bonuses (this path could visit each cell at least once)
- Find some optimal bomb placement along this path
- Calculate the score for this path based on found bombs placement and bonuses collected
- Select plan with the best score
It was enough to be top-1 for several days, but this approach has obvious limitations. Then I decided to do Breadth-First Search on the graph of all states. I picked BFS because I could naturally limit its depth only by the time limit, which is harder to do with DFS. I decided not to pay any attention to my opponents at first.
One node of the graph was describing the state of the world after some steps. The data structure included:
- Vector of bombs that were placed either at the beginning or during the BFS
- Mask of exploded boxes (64-bit int)
- Mask of exploded bonuses (64-bit int)
- My position, bombs, range
While the maximum number of bombs was 65, I allowed collisions for them to store in a 64-bit mask. It’s quite obvious that box number will quickly drop below 64 after the first 8-12 turns of the game.
When performing BFS, for each leaf node extracted from the queue, I did the following:
- Calculate possible moves before the explosion of bombs
- Perform bombs explosion
For each possible move:
- Check if I will not trap myself (that is by far the most time-consuming part)
- Create new state after that move and put it to the queue
When I have reached the time limit for the move, I performed a move that leads to a branch with the state with maximum score. I calculated the score for each state as a linear combination of values of exploded boxes and picked bonuses.
When considering boxes, I assign higher values to boxes with bonuses than to empty boxes. Besides that, I reduce box value by (some coefficient) * (steps before the explosion of the box) in order to explode boxes faster.
When considering bonuses, I assign a value to it depending on how much of this bonus I’ve already picked up. I decided that I do not need more than 5 bombs and more than 8 explosion range. I preferred the bonuses increasing bomb count compared to the ones that increase range.
That was quite good and I decided to improve it. I created “list of forbidden initial moves”: at the beginning of each turn, I did brute-force search of depth 2 for me and every opponent. If I see that after some move, the opponent has a sequence of two moves so that I get trapped, I added this move to the list of forbidden moves. This gave me much better survivability. I decided not to spend my time writing code for attacking opponents because I found it very hard to achieve: it is very difficult to trap an opponent that performs at least 2-ply DFS in order to escape (as I do).
This was not yet as good as I wanted. I had quite ooptimizedcode and still could get only to depth 8-9 at the beginning of the game and to depth 4-5 in the middle when the field is almost empty. I decided then to cut off some branches of the BFS tree.
My first cut was not to put a bomb if it will not explode any boxes. While it is not optimal in some situations, it is a very rational decision in general. That increased my depth by 1-3 in some cases.
But the main improvement that allowed me to secure my rank in the top-3 was converting BFS to a Beam Search. On each tree level, I decided to leave only some hundreds of nodes that were “most promising”. I tried to do this in the Smash The Code contest too, but there were too many combos and it was very hard to decide which nodes were more promising than others.
Here in Hypersonic, it was very, very natural. I selected nodes according to the same scoring function described above. It turned out to be a blast 🙂
With this improvement, I managed to easily reach a search depth of 30-40 during all the game. It allowed my bot to see way more ahead than other players.
So the secret was just in the Beam Search, not in the scoring function, as some people assumed in the chat :)”
Kimiyuki used this. He ranked 3rd in the contest.
“I did a simple Beam Search: generate 10 (MOVE/BOMB and 5 directions) next states for each current state and select good ones. Depth was 7,8 turns.
To me, the main difficulty of this game is the distance of time between the action (e.g. putting a bomb) and the effect (e.g. explosion), so I tried to solve this.
I think the important points of my implementation are the following:
- In the evaluation function, I estimate the boxes which will be broken in the future.
- For each state in the beam search, I roughly calculate whether it is a survivable state or not.
- For diversity of states, I restrict the state sets to having only a few states for the same player position.
I didn’t implement anything to take other players into account. So my AI was often killed by others.
I suspect that the one step of beam search should be one event (e.g. getting item), instead of one turn.”
Thank you all of you for sharing with us and congratulations again! You can find strategies and feedbacks from a lot of other players in this forum topic.
We will release the game in the bot programming section of the platform really soon, stay tuned!