THE ENTIRE SERIES
More to come…
Note: None of the code for any of these proc-gen tutorials is highly optimised. My goal was to get the concepts across as simply as possible, rather than to make them run as fast as possible. Use the following as a springboard for further learning and experimentation, rather than as the best solution possible.
Also, if you are a complete beginner, it’s probably best to start at the first entry and work your way through, as further into the series we get, the more knowledge from previous entries is assumed.
A* pathfinding. You will never find a more wretched hive of scum and villainy. Or perhaps you will, it is a Wednesday after all. In any case, this is probably the most ubiquitous “advanced” algorithm discussed on the internet. There’s a huge amount of scientific literature on it and a ton of opinions from experts and non-experts alike. The different ways of optimising this bad boy are almost limitless. There’s much, much more to say than can be said in this little blog post, but I’m going to close my eyes, think of England and give it an honest try.
As an aside, if you’re comfortable reading other languages and figuring out how to implement algorithms from pseudocode, I highly recommend reading through RedBlobGames’ write-up on A* here: https://www.redblobgames.com/pathfinding/a-star/introduction.html. Not only am I simply going to be regurgitating a lot of what is said there with my own stupid spin on it, but there’s in-depth animations and explanations that go way beyond what I’m going to. The whole site is a treasure trove of excellent advice and implementations of a ton of different things to do with procedural generation.
We’re going to implement a relatively vanilla interpretation of A*, with only a few levels of optimisation that are pretty easy to tweak on your own. However, I must warn the novice programmer, A* is the most complicated algorithm we have approached so far and if you haven’t already, I definitely recommend doing the series from the start, as we are building upon skills learnt previously. GML as a language is definitely not the best language to approach A* with either, so that’s something that we kind of just have to recognise and move on from. If you want 1000 units pathfinding around your 10 000 x 10 000 map from end to end, you’ll either have to offload the actual pathfinding to an extension written in another language or you’ll have to become a big-brained professor and implement a bunch of optimisation techniques not discussed here (and even then…). However, A* in GML is more than capable for making sure that routes exist between key places in your procedurally generated maps and that’s what we’re going to be using it for after we’ve implemented our room perturbated dungeons in the next entry. Tally ho!
Contents
PRAY WE DON’T ALTER IT FURTHER
A STAR-T
Let’s begin with a little discussion about some other pathfinding algorithms that are not A*. In our previous entry, we implemented a flood-fill algorithm. This is better known amongst the hoity-toity as a “breadth-first” fill algorithm. Something that might be surprising to you if the last entry was your first contact with flood filling is that it can be considered a pathfinding algorithm with only one very small tweak: keeping track of the direction from the cell “currently being filled” to the cell that is “doing the filling” (or, said another way, keeping track of the “came from” direction):


All we need to do is keep track of the direction of the white arrow for each cell and we turn flood fill from a “find all points” algorithm into a “find a path from anywhere to a point” algorithm.
Let’s say we have point A and point B on a map and we want to find out how to get from A to B. We can start flood filling outwards from A, keeping track of the directions like above, and once we hit B with the flood fill we stop. Then starting from point B, all we do is add the current cell to an array (so the current cell would be point B to begin with), look at the direction that we kept track of for that cell and then move to the cell that direction is pointing towards. We then add that second cell to the array and check it’s direction and so on. The beauty of this is that as long as we follow the directions, we will end up back at point A, with an array full of the cells that lead from point B to point A in the shortest path. Reverse that array and you have your path from point A to point B.
This is known as “breadth-first-search” (BFS) and, even without keeping track of the “came from” directions, it is a powerful and very useful algorithm. There’s one big problem though. BFS searches in all directions equally, so it will search in the opposite direction of the target just as much as it searches in the direction of the target. This makes it pretty sub-optimal for pathfinding from one point to another (don’t be fooled though, it has many, many uses where it is the perfect algorithm, this just isn’t one of them).
In any case, we didn’t come here to learn BFS, we came here to learn about A*! So let’s learn about Dijkstra’s (gottem!). Dijkstra’s (or Uniform Cost Search) is useful because it lets us add “costs” to cells. What are costs? The simplest example is movement in the Civilisation games. Trying to move over jungle terrain costs more movement points than trying to move along a road, for example. We have no way to implement this in vanilla BFS, so we turn to Dijkstra’s.
The idea behind Dijkstra’s is similar to BFS, except we keep track of the total “cost” of movement as well as the direction. In the most naive implementation, we consider the cost to simply be 1 per cell:

Here, we can see that the path with the least total cost of movement before hitting the goal is the middle path, with a total cost of 3. To keep track of the cost, we simply use a priority queue. Add each cell around the current cell to a priority queue, with the priority value being the total cost of moving to that cell. Then when we need to move to a new cell, we simply pull the lowest priority entry out of the priority queue, which corresponds to the cell with the lowest cost that we have checked so far.
So we have an algorithm that searches and finds the least costly path to the target. There’s still a slight problem though. If all the cells around are the same total cost of movement, Dijkstra’s will check them all, which means that it, in essence, turns back into BFS. It’s only when Dijkstra’s encounters cells with differing costs that it’s powers are unlocked, however, it doesn’t help us actually get to the target quicker.
Ok, so BFS and Dijkstra’s are both good, but not exactly what we are looking for. What if we just directed the algorithm to move towards the target. After all, we know where the target is right? So just make the algorithm move in that direction…
This is known as Greedy Best-First-Search and anyone who has done any motion planning in games will have an understanding of why/how this might fail. Let’s compare two situations:

In the above situation, Greedy Best-First works fine. We just make it move towards the goal. It goes cell by cell and eventually reaches the goal, without much searching at all (well, I didn’t show it, but it searches the 4 squares around each cell it checks, but that’s all, much more efficient than Dijkstra’s or BFS)! But what about this situation?

While Greedy Best-First will eventually get to the goal, we can see that it follows a circuitous route that definitely isn’t the most optimal. If only we could follow the path that has the least cost associated with it, while also trying to head towards the goal consistently…
Enter our heroic champion, A*! Finally, we’ve worked our way to it. A* takes the best of both worlds from Dijkstra and Greedy Best-First Search. It does this by using a priority queue, like Dijkstra, but instead of simply using the total cost for the priority, it uses the cost of the cell plus the distance from the cell being checked to the goal. If it is moving away from the goal, the distance will increase, meaning that even if the cell cost is low, the increased distance cost will counteract that and the cell further away will be checked later than cells nearby the target (remember, we pick the lowest priority to search first and with the priority being cost+distance
, then if either the distance or the cost increases, that cell will get a higher priority and will be checked later than cells with a lower cost+distance
).
That’s a lot of theory to absorb. I’d definitely recommend having a look at some of the interactive diagrams here: https://www.redblobgames.com/pathfinding/a-star/introduction.html, if you really want to experiment and understand how each of these different pathfinding algorithms work. Anyway, this is about A* in GML so let’s get started looking at how to actually code this thing.
ILLUMINATING THE PATH
Firstly, we create our function and then do a little bit of setup:
///@func AStar(start,goal,map);
///@desc The A* pathfinding algorithm
///@param {array} start An array containing the x and y positions of the starting point
///@param {array} goal An array containing the x and y positions of the goal
///@param {array} map A 2D array containing our map data
function AStar(start,goal,map) {
#macro X 0
#macro Y 1
static map_width = array_length(map);
static map_height = array_length(map[X]);
static frontier_queue = ds_priority_create();
static came_from = array_create(map_width*map_height,-100);
static cost_so_far = array_create(map_width*map_height,-100);
static location = [[-1,0],[0,1],[0,-1],[1,0]];
Remember that the static
makes sure that the variables (and methods) are only created once, rather than being created in every instance that runs the function. So let’s have a look at what we’ve done.
Firstly, we’ve got two macros. These aren’t a part of the A* algorithm, they’re just something that I find useful to do when dealing with arrays holding positions (which we will be doing a lot of here). We set X
to 0 and Y
to 1 and that way, if we store a position in an array like this: my_pos = [x,y]
then we can access x
like this: my_pos[X]
and y
like this my_pos[Y]
. It’s a bit easier for me to read instead of my_pos[0]
and my_pos[1]
.
We figure out the width and height of the provided map, using the same technique as we learnt about in the last entry.
Then we create a ds_priority_queue
called frontier
. The way that frontier
works is each time we are working with a cell, we check it’s surrounding neighbours, calculate the cost of moving to each of them and the distance from each of them to the goal, then add the cost+distance
together and add the neighbour to the frontier
priority queue using cost+distance
as the priority. This lets us pick the optimal cell to examine next from the priority queue, with the lowest priority entry being that optimal cell. Pretty much what I explained in the previous section.
After that, we create two arrays, came_from
and cost_so_far
. We initialise them to a size of map_width*map_height, which means we have one entry in the array for each cell of the map (in other words, the area of the map is map_width*map_height and we want to be able to encompass the entire area in a single array). These arrays are how we keep track of the direction to the previous cell as well as how much the cost is to move to that cell, as explained in the previous section.
Finally, we have the location
. This is a fancy way of figuring out the coordinates to the neighbour of each cell without having to use any maths. The array is all tucked up so it looks kind of complicated, but if you were to visualise it, it would look like this:
-1 | 0 |
0 | 1 |
0 | -1 |
1 | 0 |
The left column is the change in the x
we need to get to the neighbour cell and the right column is the change in y
we need to get to the neighbour cell.
Let’s assume we are at cell x: 10, y: 5. We want to get to the neighbour on the left. We can guess that that neighbour has the coordinates x: 9, y: 5. We pick the first row because we can see it has a -1
in the x
column and that is what we need to change x: 10 to x: 9. Apply that column to the two variables:
x: 10-1
= 9
y: 5-0
= 5
If we wanted to check the neighbour above the cell, we would pick row 3 (as that has a -1
in the y
column) and it would look like this:
x: 10-0
= 10
y: 5-1
= 4
In other words, location
is holding a matrix, and we use that matrix to transform the coordinates.
Ok, so that’s all good, let’s move on with the code a little more.
///@func Pos(cell,map_height);
///@desc Returns the position of the cell as a single integer
///@param {array} cell An array containing the coordinates of the cell we are looking at
///@param {int} _map_height The height of the map (unfortunately we have to pass this in as it is a static variable)
static Pos = function(cell,_map_height) {
return (cell[X]*_map_height+cell[Y]);
}
///@func OutOfBounds(cell,map_width,map_height);
///@desc Checks whether cell coordinates are out of bounds
///@param {array} cell An array containing the coordinates of the cell we are looking at
///@param {int} map_width The width of the map
///@param {int} map_height The height of the map
static OutOfBounds = function(cell,_map_width,_map_height) {
if (cell[X] < 0 || cell[Y] < 0 || cell[X] >= _map_width || cell[Y] >= _map_height || map[cell[X]][cell[Y]] == SOLID) {
return true;
}
return false;
}
We’ve got two methods here. One called Pos()
, which takes an 1D array called cell
which holds the x
and y
position for the cell we are looking at like this my_cell = [x,y]
, and we have to also give it the map_height
because when we created map_height
we made it a static
variable. Making map_height
a static
variable helped save on memory and loose instance variables floating around, but static
variables are outside of the scope of method functions, which, a little bit unfortunately, means that we have to manually hand it into the method. We have to do this for a most of the static
variables at some point or another in the following functions. Anyway, what Pos()
is doing is a fancy little trick to let us store both of a cells coordinates (x
and y
) inside of a single unique integer.
Huh? What in the world does that mean? Let’s use some real numbers to get a handle on it. Imagine we have a 3×3 map (so it has a map_width of 3 and a map_height of 3) like this:
x: 0, y: 0 | x: 1, y: 0 | x: 2, y: 0 |
x: 0, y: 1 | x: 1, y: 1 | x: 2, y: 1 |
x: 0, y: 2 | x: 1, y: 2 | x: 2, y: 2 |
We can see the coordinates for each cell there. Lets take the middle cell x: 1, y: 1 and apply our Pos()
transformation to it: x*map_height+y
= 1*3+1
= 4
. Wow, we turned two numbers into one. Let’s do that for all the cells:
0*3+0 = 0 | 1*3+0 = 3 | 2*3+0 = 6 |
0*3+1 = 1 | 1*3+1 = 4 | 2*3+1 = 7 |
0*3+2 = 2 | 1*3+2 = 5 | 2*3+2 = 8 |
We can see that we’ve neatly turned all our coordinates into single integers and there are no repeats, meaning each integer is unique. This is perfect for being able to store each cell in a unique position in a 1D array. If we wanted to save the top left cell into an array, we would simply go my_array[0] = some_data_here
because that cells coordinates collapse into 0. The top right cell would be my_array[6] = some_other_data
because that cells coordinates collapse into 6.
After that we’ve got the OutOfBounds()
function, taking the cell
array again, the map_width
and the map_height
. The way this works has been explained in previous entries. We’re just making sure the coordinates aren’t outside the map size, however, we have one new check, which is to check if we are looking at a SOLID
tile on the map. If the tile is SOLID
, we want to completely skip it, so we act as though it is out of bounds.
Some more code:
///@func Cost(cell,cost_scale);
///@desc Returns the "movement cost" of the cell we are checking, with a scaler option for optimisation purposes
///@param {array} cell An array containing the coordinates of the cell we are looking at
///@param {real} cost_scale The scaler for the cost, this lets us optimise how much the algorithm will try to avoid costly areas (unfortunately we have to pass this in as it is a static variable)
static Cost = function(cell,_cost_scale) {
var tile = map[cell[X]][cell[Y]];
var cost = 0;
switch (tile) {
case TREE:
cost = 10;
break;
case ROAD:
cost = 1;
break;
case EMPTY:
cost = 3;
break;
case WATER:
cost = 5;
break;
}
return 1+_cost_scale*(cost-1);
}
///@func Manhattan(cell,cell2,heuristic_scale);
///@desc This returns the manhattan distance between cell and cell2, with a scaler option for optimisation purposes
///@param {array} cell An array containing the coordinates for the first cell
///@param {array} cell2 An array containing the coordinates for the second cell
///@param {real} heuristic_scale This lets us influence the A* to act more like Greedy Best-First or Dijkstras, useful for optimisation purposes
static Manhattan = function(cell,cell2,_heuristic_scale) {
return _heuristic_scale*(abs(cell[X]-cell2[X])+abs(cell[Y]-cell2[Y]));
}
Ok, another two methods, Cost()
and Manhattan()
. Cost()
takes the cell
array and the cost_scale
(cost_scale
is an optimisation technique that I’ll explain in a later section). If we remember the way we worked with our map in the Cellular Automata entry, we created two macros, EMPTY
and SOLID
, and we stored those macros in our map to tell us what each tile was. While it’s not shown in this tutorial, when we created our map, we also created some macros, EMPTY
, TREE
, ROAD
, WATER
and SOLID
and stored them appropriately in our map. In the Cost()
function, we are reading what macro is inside of a particular cell on the map.
We grab the correct tile by reading the x
and y
values from our cell
array using our macros we created at the start, X
and Y
like this: map[ cell[X] ][ cell[Y] ]
. It might seem a little confusing because we have arrays inside of arrays, but cell[X]
and cell[Y]
both just hold a number, so the computer ends up parsing it as map[5][10]
or whatever values cell[X]
and cell[Y]
holds.
Beginners Note: You might notice theswitch()
statement in the code block above. Aswitch()
statement is basically another way of formatting a series of if...else if...else conditionals. To begin with, we put the variable we want to check inside of the round brackets in the switch:switch(my_var)
, then we use thecase:
statements to check if the variable equals a value. So if we wanted something to happen whenmy_var
was equal to 1, we would do this:switch (my_var) { case 1: // Do something break; }
That is equivalent to this:if (my_var == 1) { // Do something }
switch()
is useful when we have a number of exact values we want to check for (often, it will be either enums or macros). For each new value you want to check for, add a new case. We can also add adefault:
case which will run in the event that the variable doesn't equal any of the values we are checking for. Let's do another and check for 1, 2, 3 and add a default case:switch (my_var) { case 1: // This triggers if my_var equals 1 break;
case 2: //
This triggers if my_var equals 2
break;case 3: //
This triggers if my_var equals 3
break;default: //
Always add your default case last. Switches are more organised to view and add entries to than if/else chains.This triggers if my_var equals anything but 1, 2 or 3
break; }
We run the macro value of the tile
variable through the switch()
statement and depending on what tile
is holding, it will add a certain amount to the cost
variable. This number is the “movement cost” that we associate with that cell, like in Civilisation. Then we do a little optimisation maths with cost
(which, as I said, I will explain in a later section) and return what the value is. Generally, you can consider the value being returned to be whatever cost
got from the switch()
statement and you can ignore the maths bit at the end as it only comes into play if we actively decide we need to optimise the pathfinding algorithm.
Then we have the Manhattan()
method. The Manhattan distance is a particular way of measuring distance. If we imagine we are in a city (Manhattan, for instance) and we wanted to get from a particular building to another building somewhere else in the city, we couldn’t just walk across the buildings. We would have to walk along the blocks following the street. You could consider each block in the city as a cell in a grid and if you followed the path in such a way where you made exactly one turn (so, for instance if the building is up and to the right, you would walk up until you are level with the building and then make a right turn and walk right until you got to the building), then you would have walked the Manhattan distance:

In the above example, we would need to walk 3 cells up and 4 cells right to get from cell
to cell2
. This means the Manhattan distance would be 3+4
= 7
.
In the actual code: _heuristic_scale*(abs(cell[X]-cell2[X])+abs(cell[Y]-cell2[Y]))
we can ignore the _heuristic_scale
for now (it’s another optimisation technique that we will discuss later) and look at the rest:
abs(cell[X]-cell2[X])+abs(cell[Y]-cell2[Y])
We’re using abs()
because we don’t care if the value is negative or positive (negative values just mean we are moving left or up), we just care what the total would be as a positive number (we’ve covered what abs()
does in a previous entry), so we just figure out what the difference is between cell[X]
and cell2[X]
and then cell[Y]
and cell2[Y]
and add them together to get the Manhattan distance and then we return that value. These two functions are the secret sauce of A*. We were discussing before how cost+distance
tells us what priority a cell has, and this is how we get the cost
and the distance
.
That’s all the method functions covered, so now we’re going to be moving on to the actual A* code and how it works. Let’s look at the first bit:
if (map[start[X]][start[Y]] == SOLID || map[goal[X]][goal[Y]] == SOLID) {
return true_path = [];
}
ds_priority_clear(frontier_queue);
for (var i=0;i<map_width*map_height;i++) {
came_from[i] = -100;
cost_so_far[i] = -100;
}
var start_key = Pos(start,map_height);
ds_priority_add(frontier_queue,start,0);
came_from[start_key] = start;
cost_so_far[start_key] = 0;
The first thing we need to do is check to see if the start
or the goal
is on a SOLID
map tile, in which case we know definitely a path cannot be made, so we can exit out of the function early, returning an empty array. Then we need to clear out any values from the last pathfinding call we made, so we run ds_priority_clear(frontier_queue)
and then we loop through the came_from
and cost_so_far
arrays and reset each entry to -100
. -100
in this case is a completely arbitrary value, but it’s important that the value is less than 0. This is because we will be checking the arrays and filling them with values and all those values will be equal to or above 0. When we’re checking them, we’ll be looking to see if the entry is equal to -100
, which tells us we haven’t visited that cell yet. If you changed the -100
to -1
, we would simply check for -1
instead, but if you changed it to 2
or something positive, we’d be checking to see if an entry is equal to 2
to see if it is hasn’t been visited, but there’s also a chance that we could then fill the entry with 2
, so we wouldn’t know we had already filled it if we checked it again. A bit confusing, but basically positive values are verboten in this for loop.
Then we have the first method call var key = Pos(start,map_height)
. We are sending the start
cell array to the Pos()
function, which will return a unique integer for that cell, and we’ll use that integer to both store data and read data from the cost_so_far
and came_from
array. So if it returned 3, we would know that if we wanted to check the direction of start
cell, we would read the came_from[3]
entry. If we wanted to know the total cost we’ve discovered of getting to the start
cell, we would read cost_so_far[3]
. Of course, neither of those two things really make sense for the start
cell, as there hasn’t been any movement from somewhere else for a “direction” to exist for that cell, neither do we associate a cost with the start
cell (we could if we wanted to, but it would be useless as the start
cell is only visited once at the very start of the pathfinding). We also throw in the static map_height
variable because scoping sucks and we need to do that in order to read it in the Pos()
method.
We then go on to add the start
cell array to the frontier
queue with a priority of 0, save the start
cell into the came_from
array at the position we got from the Pos()
function and save 0 cost to the cost_so_far
array at the same position. This is basically getting everything ready for the loop that will run until we hit the target.
The general way that the loop works is that it grabs the lowest priority cell from the frontier
priority queue. It then checks the cardinal directions around that cell. For each of the neighbouring cardinal directions, it figures out the new cost for that neighbour cell by adding the cost stored in the cost_so_far
array associated with the current cell and the tile cost for the neighbouring cell. It then checks to see if an entry for that neighbour cell already exists in the cost_so_far
array. If there is no entry associated with the neighbour cell or there is an entry, but the total cost value stored in the entry is higher than the total cost we just calculated (as in, the newly calculated cost is lower than the stored cost), we either add or update the cost in the cost_so_far
entry to the new total cost, then we figure out the priority by adding the new total cost and the distance from the neighbour cell to the goal cell together and we add the neighbour cell to the frontier
priority queue, with the priority we just calculated. Finally, we store the current cell in the entry associated with the neighbour cell in the came_from
array. This is done for each neighbouring cell, after which we return to the start of the loop, change the current cell to the cell stored in the lowest priority of the frontier
priority queue and repeat everything. This continues until we either reach the target, or there are no more entries in the frontier
priority queue, which tells us that the target is unreachable. Let’s go over that again.
- We have a current cell, which we get by retrieving the lowest priority entry from the
frontier
priority queue, and we start checking the neighbours of this current cell. - We have a total cost already stored in the
cost_so_far
array for the current cell, which we retrieve. - We add the neighbour cells tile cost to that retrieved total cost to find the “new” cost for the neighbour cell.
- We check the
cost_so_far
array to see if an entry already exists for the neighbour cell. - If an entry doesn’t exist, we skip step 6 and continue on with 7.
- If an entry does exist, we compare the entry to the new cost we calculated, if the new cost is lower, we proceed with step 7, otherwise nothing happens and we move on to step 11.
- We add the new cost to the
cost_so_far
array in the entry associated with the neighbour cell (either creating the entry if it doesn’t exist, or updating it if it does). - We then figure out the priority of the neighbour cell by adding the new cost and the Manhattan distance from the neighbour cell to the goal.
- We add the neighbour cell to the
frontier
priority queue, using the priority calculated in step 8. - Finally, we add or update the entry associated with the neighbour cell in the
came_from
array, where we store the current cell, so when we check the entry incame_from
associated with the neighbour cell later on, it gives us the current cell, letting us know the “direction” of the neighbour cell is towards the current cell. - Repeat for each neighbour cell.
- Return back to the start of the loop and change the current cell to the lowest priority entry in the
frontier
queue, rinse and repeat. - We quit the loop once either the current cell is equal to the goal cell, or there are no entries left in the
frontier
queue (which lets us know the target was unreachable).
Now let’s see the code for all of this:
var target_reached = false; // Set target_reached to false to begin with so we can either update it later if we find the target or leave it as false if we don't
while (ds_priority_size(frontier_queue) > 0) {
// The loop continues as long as frontier_queue has more than 0 entries
var current_cell = ds_priority_delete_min(frontier_queue);
// Grab the current cell from the lowest priority position of the priority cell
var current_key = Pos(current_cell,map_height);
// Get the "key" which is the single integer we condense the x and y coordinates of the cell too
if (array_equals(current_cell,goal)) {
// Break out of the loop if the current cell is equal to the goal cell, marking target_reached as true, so we can handle the "unreachable target" scenario
target_reached = true;
break;
}
// Now we start looping through the cardinal directions and grabbing the neighbour cells
for (var i=0;i<4;i++) {
var nx = current_cell[X]+location[i][X];
var ny = current_cell[Y]+location[i][Y];
// We use the location matrix and the i variable of the for loop to find the neighbouring cells coordinates, using the current cells coordinates
if (OutOfBounds(nx,ny,map_width,map_height)) {
// If the neighbouring cell is out of bounds, we increment the for loop to move on to the next neighbour, ignoring the rest of the code with a continue
continue;
}
var neighbour_cell = [nx,ny];
// We store the neighbour cell coordinates in an array we call next
var new_cost = cost_so_far[current_key]+Cost(neighbour_cell,cost_scale);
// We calculate the new cost for the neighbouring cell
var neighbour_key = Pos(neighbour_cell,map_height);
// We get the neighbouring cells condensed integer to be used in the came_from and cost_so_far array and store it in a variable called new_key
if (cost_so_far[neighbour_key] == -100 || new_cost < cost_so_far[neighbour_key]) {
// Check to see if we need to update the stored data for the neighbouring cell
cost_so_far[neighbour_key] = new_cost;
// If we do need to update, first we update the entry in cost_so_far with the new cost
var priority = new_cost+Manhattan(neighbour_cell,goal,heuristic_scale);
// Then we figure out the priority of the neighbouring cell storing it in a variable named "priority"
ds_priority_add(frontier_queue,neighbour_cell,priority);
// Add the neighbouring cell to the frontier priority queue, with the priority we just calculated
came_from[neighbour_key] = current_cell;
// Finally, update the neighbouring cells entry in the came_from array to point towards the current cell
}
}
}
I think a lot of this has already been explained in the previous paragraphs, but I will pull out a few bits and pieces to explain in a bit more detail.
var current = ds_priority_delete_min(frontier_queue);
It might seem strange that we are using a function called ds_priority_delete_min()
here, but if you read the manual entry, it first returns the minimum priority entry and then removes that entry from the priority queue. We need to delete it in order to not revisit the same cell over and over again, which is why we use the delete function and not just the read function (which would be ds_priority_find_min()
).
var nx = current[X]+location[i][X];
var ny = current[Y]+location[i][Y];
Here, we are using the location
matrix we created at the start. As the for loop runs, i
will go from 0 to 3, being incremented each time the loop resets. This gives us a handy way to check each “direction” stored in the location
array. If we scroll up and look at the way the data is stored in the location
array, we can see there are four rows, starting from 0 and going to 3 (remember, computers count from 0, not from 1). This neatly corresponds to our loop and it means that each cycle of the for loop, we will be checking the “next” row in the location
array. The first cycle will add -1
to current[X]
and 0
to current[Y]
, the next cycle will add 0
to current[X]
and 1
to current[Y]
, the cycle after that will add 0
to current[X]
and -1
to current[Y]
and finally, the last cycle will add 1
to current[X]
and 0
to current[Y]
. So by the end of the loop, we will have checked all four cardinal directions.
Finally, I just want to go over the came_from
and cost_so_far
arrays, and how we are storing the data in them, as I think it might be the most potentially confusing part. When we get the “key” using:
var neighbour_key = Pos(neighbour_cell,map_height);
We are collapsing the position of the cell we provide as an argument (in this case, the neighbour_cell
) as we explained a fair bit further up. This means that a neighbour_cell
at position x: 2, y: 3 with a map_height
of 6 (for example) ends up with a neighbour_key
of x*map_height+y
= 2*6+3
= 15
. Then we store the new_cost
that we calculated for that neighbour_cell
in the cost_so_far
array at position 15
(cost_so_far[15] = new_cost
). If later on we want to find what the total cost for that cell is, we do the maths again to get the key of 15
, and we check cost_so_far
at position 15
(cost_so_far[15]
), and we will be returned the total_cost
that we stored earlier. The same goes for the came_from
array. That’s why we keep referencing the neighbour_key
and current_key
. It’s how we keep track of the position in the two arrays that we are interested in.
Once that loop has run, we’ve finished with “proper” pathfinding part of A* and we need to run through our path from the goal
to the start
cells, and store each cell we visit on the way in a new array:
if (target_reached) {
var reversed_path = [];
var current_cell = goal;
while (!array_equals(current_cell,start)) {
array_push(reversed_path,current_cell);
var current_key = Pos(current_cell);
var current_cell = came_from[current_key];
}
array_push(reversed_path,start);
First we check if the target has actually been reached. If it has, we create an array called reversed_path
(reversed because we are starting at the goal
and moving towards the start
, so if we read the array from the first entry to the last entry in order, we will be going backwards along the path). We’ll fix this reversedness shortly. As you can see, we have another while()
loop, which runs as long as the current_cell
is not equal to the start
cell. As we loop, we first push the current_cell
into the reversed_path
array, which is storing each cell we visit, and then we get the current_key
for the current_cell
.
As explained in the previous paragraph, we can use this key to find the entry in the came_from
array that corresponds with the current_cell
. What did we store in the came_from
array? The previous cell which “lead on” to the current_cell
(we called this the “direction” a few times before, but we’re really just storing the previous cell itself).
This gives us a neat path where each cell we visit leads to the next cell in the “best” path from the goal
to the start
. We overwrite our current_cell
variable with this new cell and the loop repeats again, running this over and over again until the current_cell
equals the start
cell. Once that happens, we know we have all the cells we need to pathfind from the start
to the goal
stored in the reversed_path
array (we do end up pushing the start
cell into the reversed_path
array after the loop has finished, but we can decide not to do this and it won’t matter).
We have one last little thing to take care of, correcting the reversed nature of the path.
var true_path = [];
var len = array_length(reversed_path)-1;
for (var i=len;i>=0;i--) {
true_path[len-i] = reversed_path[i];
}
}
else {
return true_path = [];
}
return true_path;
}
We create another array called true_path
and we find out the total length of the reversed_path
array. We’re going to do something slightly strange here. We run through reversed_path
from the end towards the start (as you can see in the for() loop, we actually start at the end of the reversed_path
array and decrement i
, instead of incrementing it, until i
gets down to 0
, which is the “first” entry in reversed_path
). As we are counting down from the end of reversed_path
, we are inversely counting up from 0
for the true_path
. Let’s use some numbers to make it make sense.
We’ll pretend that reversed_path
has 3 cells in stored in, so it’s length is 3
. The entries in reversed_path
correspond to position 0, position 1 and position 2. You’ll notice that despite the fact that we have 3 entries, the last entry is actually in position 2. We have to account for this, which is why we use var len = array_length(reversed_path)-1
. len
is 3-1
or 2
. If we didn’t subtract 1
from the array_length()
, the game would crash as it would try to read position 3
in the array, which doesn’t exist.
Anyway, we start with both i
and len
equaling 2
. If we look at the position we are accessing in true_path
, it is len-i
, and we are setting it equal to reversed_path[i]
. Because both len
and i
are 2
, len-i
equals 0
, so on the first loop, the line would read like this: true_path[0] = reversed_path[2]
. The loop cycles and now i
equals 1
. Because len
equals 2
and i
now equals 1
, len-i
equals 1
instead of 0
. So the line would now read true_path[1] = reversed_path[1]
. Finally, i
decrements once more and now equals 0
, so len-i = 2
and i = 0
which means in the final cycle, the line would read true_path[2] = reversed_path[0]
. As you can see, we neatly read through reversed_path
in reverse order, while adding to true_path
in an incremented order. We have successfully unreversed reversed_path
and we have the properly ordered path stored in true_path
.
Finally, we have the else conditional which gets triggered if target_found
equals false. As we know, this means the goal
cell was unreachable. If we wanted to be really fancy here, we could keep track of whatever cell we visited had the lowest Manhattan distance to the goal
cell and do the reversed_path
> true_path
thing for that cell, like we did in the if (target_found)
conditional. This would mean that if the goal
couldn’t be reached, we would still be able to return a path to the nearest position to the target, but we won’t do that here.
Instead, we’ll just create and return an empty true_path
array if the goal
can’t be reached (the same as if the start
or goal
is on a SOLID
tile) or finally return the filled true_path
array if the goal could be reached. This is the end of the A* process. We’ll catch the returned path whenever we use the A* function and then we can use whatever movement system we decide on to move from cell to cell along that path.
Let’s have a look at our total A* pathfinding function.
///@func AStar(start,goal);
///@desc The A* pathfinding algorithm
///@param {array} start An array containing the x and y positions of the starting point
///@param {array} goal An array containing the x and y positions of the goal
///@param {array} map A 2D array containing our map data
function AStar(start,goal,map) {
#macro X 0
#macro Y 1
static map_width = array_length(map);
static map_height = array_length(map[X]);
static frontier_queue = ds_priority_create();
static came_from = array_create(map_width*map_height,-100);
static cost_so_far = array_create(map_width*map_height,-100);
static location = [[-1,0],[0,1],[0,-1],[1,0]];
static cost_scale = 1;
static heuristic_scale = 1;
///@func Pos(cell,map_height);
///@desc Returns the position of the cell as a single integer
///@param {array} cell An array containing the coordinates of the cell we are looking at
///@param {int} _map_height The height of the map (unfortunately we have to pass this in as it is a static variable)
static Pos = function(cell,_map_height) {
return (cell[X]*_map_height+cell[Y]);
}
///@func OutOfBounds(cell,map_width,map_height);
///@desc Checks whether cell coordinates are out of bounds
///@param {array} cell An array containing the coordinates of the cell we are looking at
///@param {int} map_width The width of the map
///@param {int} map_height The height of the map
static OutOfBounds = function(cell,_map_width,_map_height) {
if (cell[X] < 0 || cell[Y] < 0 || cell[X] >= _map_width || cell[Y] >= _map_height || map[cell[X]][cell[Y]] == SOLID) {
return true;
}
return false;
}
///@func Cost(cell,cost_scale);
///@desc Returns the "movement cost" of the cell we are checking, with a scaler option for optimisation purposes
///@param {array} cell An array containing the coordinates of the cell we are looking at
///@param {real} cost_scale The scaler for the cost, this lets us optimise how much the algorithm will try to avoid costly areas (unfortunately we have to pass this in as it is a static variable)
static Cost = function(cell,_cost_scale) {
var tile = map[cell[X]][cell[Y]];
var cost = 0;
switch (tile) {
case TREE:
cost = 10;
break;
case ROAD:
cost = 1;
break;
case EMPTY:
cost = 3;
break;
case WATER:
cost = 5;
break;
}
return 1+_cost_scale*(cost-1);
}
///@func Manhattan(cell,cell2,heuristic_scale);
///@desc This returns the manhattan distance between cell and cell2, with a scaler option for optimisation purposes
///@param {array} cell An array containing the coordinates for the first cell
///@param {array} cell2 An array containing the coordinates for the second cell
///@param {real} heuristic_scale This lets us influence the A* to act more like Greedy Best-First or Dijkstras, useful for optimisation purposes
static Manhattan = function(cell,cell2,_heuristic_scale) {
return _heuristic_scale*(abs(cell[X]-cell2[X])+abs(cell[Y]-cell2[Y]));
}
if (map[start[X]][start[Y]] == SOLID || map[goal[X]][goal[Y]] == SOLID) {
return true_path = [];
}
ds_priority_clear(frontier_queue);
for (var i=0;i<map_width*map_height;i++) {
came_from[i] = -100;
cost_so_far[i] = -100;
}
var start_key = Pos(start,map_height);
ds_priority_add(frontier_queue,start,0);
came_from[start_key] = start;
cost_so_far[start_key] = 0;
var target_reached = false; // Set target_reached to false to begin with so we can either update it later if we find the target or leave it as false if we don't
while (ds_priority_size(frontier_queue) > 0) {
// The loop continues as long as frontier_queue has more than 0 entries
var current_cell = ds_priority_delete_min(frontier_queue);
// Grab the current cell from the lowest priority position of the priority cell
var current_key = Pos(current_cell,map_height);
// Get the "key" which is the single integer we condense the x and y coordinates of the cell too
if (array_equals(current_cell,goal)) {
// Break out of the loop if the current cell is equal to the goal cell, marking target_reached as true, so we can handle the "unreachable target" scenario
target_reached = true;
break;
}
// Now we start looping through the cardinal directions and grabbing the neighbour cells
for (var i=0;i<4;i++) {
var nx = current_cell[X]+location[i][X];
var ny = current_cell[Y]+location[i][Y];
// We use the location matrix and the i variable of the for loop to find the neighbouring cells coordinates, using the current cells coordinates
if (OutOfBounds(nx,ny,map_width,map_height)) {
// If the neighbouring cell is out of bounds, we increment the for loop to move on to the next neighbour, ignoring the rest of the code with a continue
continue;
}
var neighbour_cell = [nx,ny];
// We store the neighbour cell coordinates in an array we call next
var new_cost = cost_so_far[current_key]+Cost(neighbour_cell,cost_scale);
// We calculate the new cost for the neighbouring cell
var neighbour_key = Pos(neighbour_cell,map_height);
// We get the neighbouring cells condensed integer to be used in the came_from and cost_so_far array and store it in a variable called new_key
if (cost_so_far[neighbour_key] == -100 || new_cost < cost_so_far[neighbour_key]) {
// Check to see if we need to update the stored data for the neighbouring cell
cost_so_far[neighbour_key] = new_cost;
// If we do need to update, first we update the entry in cost_so_far with the new cost
var priority = new_cost+Manhattan(neighbour_cell,goal,heuristic_scale);
// Then we figure out the priority of the neighbouring cell storing it in a variable named "priority"
ds_priority_add(frontier_queue,neighbour_cell,priority);
// Add the neighbouring cell to the frontier priority queue, with the priority we just calculated
came_from[neighbour_key] = current_cell;
// Finally, update the neighbouring cells entry in the came_from array to point towards the current cell
}
}
}
if (target_reached) {
var reversed_path = [];
var current_cell = goal;
while (!array_equals(current_cell,start)) {
array_push(reversed_path,current_cell);
var current_key = Pos(current_cell);
var current_cell = came_from[current_key];
}
array_push(reversed_path,start);
var true_path = [];
var len = array_length(reversed_path)-1;
for (var i=len;i>=0;i--) {
true_path[len-i] = reversed_path[i];
}
}
else {
return true_path = [];
}
return true_path;
}
If we wanted to actually run the function from inside an instance, where we wanted to find a path from the instances position to a position where the mouse was clicked. Firstly we would need to have a variable pointing to the map array, and then we would do this:
if (mouse_check_button_released(mb_left) {
var x1 = x div CELLSIZE;
var y1 = y div CELLSIZE;
var x2 = mouse_x div CELLSIZE;
var y2 = mouse_y div CELLSIZE;
my_path = AStar([x1,y1],[x2,y2],map);
}
There’s a few things to note here. We’re converting the pixel coordinates of the instance and mouse to grid coordinates by using the keyword div
(which means divide by but ignores the remainder, unlike /
) followed by the CELLSIZE
macro for our map grid. Then we also need each set of coordinates to be inside an array, which is why we have [x1,y1]
and [x2,y2]
, instead of just the plain coordinates. After A* has done it’s thing, my_path will be an array holding the grid coordinates for the path from x1,y1 to x2,y2 (unless a path couldn’t be found, in which case it would hold an empty array).
Now let’s move on to a discussion on the optimisation techniques that we have been ignoring so far.
PRAY WE DON’T ALTER IT FURTHER
We have altered the vanilla A* recipe with two new additions. These additions revolve around the Manhattan()
method (usually known as the “heuristic” within A*) and the Cost()
method, which returns the movement cost of our tiles. Let’s look over the Cost()
method first.
///@func Cost(cell,cost_scale);
///@desc Returns the "movement cost" of the cell we are checking, with a scaler option for optimisation purposes
///@param {array} cell An array containing the coordinates of the cell we are looking at
///@param {real} cost_scale The scaler for the cost, this lets us optimise how much the algorithm will try to avoid costly areas (unfortunately we have to pass this in as it is a static variable)
static Cost = function(cell,_cost_scale) {
var tile = map[cell[X]][cell[Y]];
var cost = 0;
switch (tile) {
case TREE:
cost = 10;
break;
case ROAD:
cost = 1;
break;
case EMPTY:
cost = 3;
break;
case WATER:
cost = 5;
break;
}
return 1+_cost_scale*(cost-1);
}
The part we ignored previously was the _cost_scale
variable. Before we begin, I will let you know, this variable should be clamped between 0 and 1, as we can think of it as a kind of like a percentage. Let’s play around with some maths and see what this does to the cost of our tiles. For this example, our tile will be equal to the WATER
macro, so it has a cost of 5
and we will set _cost_scale
to 1
:
1+1*(5-1) = 5
Ok, so we can see that the _cost_scale
will not alter our base tile cost of 5
if we have it set to 1
. Let’s try setting it to 0.5
and see what happens:
1+0.5*(5-1) = 3
Reducing the _cost_scale
reduces the value of our tile cost, but it’s not an exact percentage, otherwise it would return 2.5
. Interesting. Let’s try it with a different tile cost by using our EMPTY
tile, which is equal to 3
.
1+0.5*(3-1) = 2
Hmm, it seems as though the lower the tile cost, the less it gets reduced by. Lets try our largest cost tile, the TREE
tile, with a cost of 10
.
1+0.5*(10-1) = 5.5
It’s starting to make sense now. This kind of “flattens” the cost, so as we reduce _cost_scale
, the costs for each tile get closer to each other, rather than getting reduced by a fixed amount. What this means for the A* algorithm is that it won’t try as hard to get around costly tiles.
In the normal A* algorithm, the cost of the tile is added to the manhattan distance, which tells us the priority of the cell. By reducing the cost per tile, we increase the amount of influence the manhattan distance has over the priority, meaning that costly tiles that are closer to the goal are more likely to be checked and picked to path over.
If we imagine a huge mountain range that is very costly to move over, then we can surmise that A* would spend a long time searching around the base of the mountain on the much less costly grass tiles, looking to see if it can find a way around the mountains to avoid their cost. By using _cost_scale
to reduce the cost of the mountains (and correspondingly not reducing the cost of the grass tiles by as much, because they have a much lower cost and we can see that costs scale down less when they are lower), we can make it more likely that A* will give up searching around the mountains quicker and instead head over them.
This reduces the number of cells that A* will search overall and leads to the algorithm running faster. Once you have an idea of your general costs and the way your map is built, you can play around with different values for _cost_scale
between 0 and 1 and it will influence how much time A* spends trying to find the lowest cost path. Finding a nice value for _cost_scale
will give you paths that are only a little bit different from paths that follow the absolute lowest cost, while reducing the number of cells searched by a lot.
We’ve got the _cost_scale
covered, so let’s look at the Manhattan()
method.
///@func Manhattan(cell,cell2,heuristic_scale);
///@desc This returns the manhattan distance between cell and cell2, with a scaler option for optimisation purposes
///@param {array} cell An array containing the coordinates for the first cell
///@param {array} cell2 An array containing the coordinates for the second cell
///@param {real} heuristic_scale This lets us influence the A* to act more like Greedy Best-First or Dijkstras, useful for optimisation purposes
static Manhattan = function(cell,cell2,_heuristic_scale) {
return _heuristic_scale*(abs(cell[0]-cell2[0])+abs(cell[1]-cell2[1]));
}
The optimisation we ignored here is the _heuristic_scale
. In much the same way that reducing the _cost_scale
of the Cost()
method reduces the influence cost has on A*’s searching patterns, reducing or increasing the _heuristic_scale
does the same thing from the opposite end. However, the _heuristic_scale
is a little bit different. It doesn’t need to be set between 0 and 1, you can give it any value (within reason).
Any value of 1 or lower will yield what is called an “admissable” A* searching pattern, but will increase the number of cells that A* will check. Admissable means that A* will find the perfect path. Increasing it beyond 1 (to 2, or 5, etc) will make an “inadmissable” A* searching pattern. Inadmissable paths won’t find the perfect path, but instead will find a range of paths from “pretty good” to “pretty bad” depending on how high you set it.
Often when people are talking about A*, they assume that inadmissable paths are a bad thing, but for 99% of games, perfect paths are overkill, as they require searching a lot of cells for only small differences in how the path gets constructed. Perhaps if you are working with some high tech AI or an self-driving car A* algorithm, you might want to always find the perfect path, but for games, “pretty good” paths serve us well enough and the reduced CPU load is a godsend.
Anyway, let’s try to understand how this scale works. Once again, let’s remember how the priority of a cell is calculated by taking the cost of the cell and the Manhattan distance and adding them together. The lower the cost and the lower the distance, the more likely that cell is to end up being picked next in the priority queue. If we set the _heuristic_scale
to 2
, for instance, we will effectively double the distance that the computer thinks the cell is from the goal
.
What this does is decrease the importance of cost and increase the importance of distance to the algorithm. If we pick a random cell and look at it’s neighbours, it’s as though the neighbours further away from the goal
have twice as high a priority and so will be picked much later whereas the neighbours closer to the goal will have twice as low priority and will be much more likely to by picked sooner. This means that if you keep increasing _heuristic_scale
, you will eventually end up at a point where A* simply searches directly along the axes of the Manhattan distance, which will be super quick, but the pathing will look ugly.
Another way of thinking of the _heuristic_scale
is that it morphs A* into either Greedy Best-First-Search or Dijkstra’s. The higher the _heuristic_scale
, the more A* acts like Greedy Best-First-Search and just tries to move directly towards the goal. The lower the _heuristic_scale
, the more A* acts like Dijkstra’s and eventually ends up as a Breadth-First-Search, where it expands in all directions and doesn’t care about targeted expansion at all.
By tweaking the value of _heuristic_scale
, you can decide how targeted A*’s searching is and thus how many cells it will check. Higher means less cells checked but worse paths and the algorithm runs faster. Lower means more cells checked but better paths and the algorithm runs slower. Once again, it’s up to you to play around and decide on a value you think is a good compromise between speed and accuracy.
If you do want to mess around with these, the place to change the values for _cost_scale
and _heuristic_scale
is when we are declaring all our variables at the start of the AStar()
function, here:
function AStar(start,goal,map) {
#macro X 0
#macro Y 1
static map_width = array_length(map);
static map_height = array_length(map[X]);
static frontier_queue = ds_priority_create();
static came_from = array_create(map_width*map_height,-100);
static cost_so_far = array_create(map_width*map_height,-100);
static location = [[-1,0],[0,1],[0,-1],[1,0]];
static cost_scale = 1; // Change them here, not in the methods when they are called
static heuristic_scale = 1;
One of the cooler things about the scalers is that you can adjust them on the fly if you want.
You can check how many spare CPU cycles you have after each step and reduce or increase the values to get better pathing when you have free cycles. Or you could dynamically adjust the values when a path is chosen that is really long, so you sacrifice accuracy for speed when having to calculate a long path. There’s plenty of options here to do optimisations with and as I said at the start, this is only the very tip of the iceberg when it comes to optimisation in A*.
TRY BEFORE YOU BUY
Now, I’ve created a GMS project that will let you see how A* works in action and also lets you alter the heuristic_scale
and the cost_scale
to see how different values effect the pathing you get. Here’s the download:
https://refreshertowelgames.files.wordpress.com/2021/01/a-star.zip
As the A* function runs, you can see what cells are waiting in the frontier
queue as an aqua blue rectangle that is a border only. The cells that have already been checked are marked as yellow rectangles and the current cell being checked is marked as an fuchsia (purple-pinkish) rectangle. I should not that the A* algorithm is constrained by an alarm here, to let you visibly see what is going on. If you implement the A* as it is written throughout this tutorial, the whole “finding the points and creating the path” process should happen pretty much instantaneously (depending on how far apart your start and end points are and what settings you have the heuristic_scale
and cost_scale
set to).
After a path has been found, it will show up on the little dialogue box (that also shows you the values you have for heuristic_scale
and cost_scale
). You can mouse over this path text to have the corresponding path highlighted on the map. This lets you see specific paths if multiple paths are overlapping. You can also drag this dialogue box around by holding the left mouse button down after you click on it and dragging it.
Left click to place a start position and then left click again to place an end position.
Right click to clear away the start and end positions and the rectangles that are drawn for the queue and stuff.
Press R to completely reset the paths.
W / S keys change the heuristic_scale
E / D keys change the cost_scale
You can add or remove the trees, road or walls by opening the room and deleting or placing the obj_tree
, obj_road
and obj_wall
objects.
Play around and try different scalings and different paths. Compare the same path multiple times with different scalings to see how the paths change. Try selecting a goal inside the boxed in wall segment to see how A* handles places it can’t reach, etc, etc. Have fun and learn.
THE NEXT STEP
Wow, what a mammoth post this was. It took me a day or two longer than I expected it to, but I finally finished it. Next post we’ll be going over room perturbation for dungeon creation and then we’ll be combining our A*, flood fill and room perturbation to create a full proc-gen dungeon!
Anyway, I’m gonna go collapse into my bed and sleep the deep sleep of the wretched. My fingers are getting sore. Ciao!
Read on with #7: To Perturb A Room (Dungeon Generation)
If you’ve got any comments, criticisms or advice don’t hesitate to comment and let me know, I’m always happy to hear feedback from readers. And if you like what I’m doing, consider supporting me through either a donation at the bottom of the page or buying my game Spaceslingers!
Sign up to get notified whenever I upload a blog post, so you don’t miss any of the next installments!
3 replies on “Procedural Generation in GMS #6: A* Is Born (Pathfinding’s Greatest Hits)”
Loving these tutorials! Just spent a couple hours reading parts 1-5 and really appreciate how in depth it is. Thank you!
LikeLiked by 1 person
No worries! Glad they were helpful!
LikeLike
It took a few days to get through this one, but I feel like I understand the algorithm now. Thank you for taking the time to write it all down and making the example project.
LikeLike