Advent of Code - Day 12

Adrien Foucart

2024-12-12

@=programming @=python @=adventofcode

[Go to index] [Advent of Code website] [My code on Gitlab]

Today is a puzzle about fencing. Not the swordfight kind of fencing though: the fence kind of fencing.

We have a map such as this one where each cell is marked by a letter:

RRRRIICCFF
RRRRIICCCF
VVRRRCCFFF
VVRCCCJFFF
VVVVCJJCFE
VVIVCCJJEE
VVIIICJJEE
MIIIIIJJEE
MIIISIJEEE
MMMISSJEEE

A region in that map is a set of positions which are connected (not diagonally) and have the same letter.

So for instance, the C in the example has two distinct regions:

......CC..
......CCC.
.....CC...
...CCC....
....C.....
....CC....
.....C....
..........
..........
..........

and

..........
..........
..........
..........
.......C..
..........
..........
..........
..........
..........

Part 1 - walking the perimeter

For the first part, we have to compute for each region its area (i.e. number of cells it covers) and its perimeter (i.e. number of cell borders which touch a cell from another region). To get the puzzle answer, we need to multiply for each region the area and the perimeter, then sum the results of all regions.

Labeling connected components and computing characteristics is a very common task in image analysis, and I immediately thought about just using scikit-image with label and regionprops. But in the spirit of the rules I set for myself, I will keep to vanilla Python solutions.

First thing first, let’s parse the inputs to get the grid as a dictionary mapping position to value (here: the letter). I also store the gridsize. I keep positions as namedtuple, as I’ve done in previous puzzles.

Position = namedtuple("Position", ["row", "column"])

def parse_input() -> tuple[dict[Position, str], 
                           Position]:
    with open("inputs/day12", 'r') as fp:
        data = fp.readlines()
    data [[c for c in row.strip()] for row in data if len(row) != 0]

    n_rows = len(data)
    n_cols = len(data[0])
    
    grid = {}

    for r, row in enumerate(data):
        for c, v in enumerate(row):
            p = Position(r, c)
            grid[p] = v
            
    return grid, Position(n_rows, n_cols)

Then we have to find the connected components. This is not complicated per se, but I still ended up with some spaghetti code that I’m not particularly proud of.

The gist of it is to give a uid to each distinct region:

  1. Go through the whole grid from top to bottom and left to right.
  2. For each cell:
    1. If we have a cell with the same value above or to the left, we copy their uid.
    2. If neither the cell above nor to the left have the same value as the current cell, we generate a new uid.
    3. If both the cell above and to the left have the same value as our cell but they are different values, we use the uid from the left and we merge the two conflicting uids.

At the end of the process, we have a new grid with uids of connected regions instead of the original letters. We also keep track of a list of positions for each uid, so it will be easier to browser per region later.

To make our grid navigation a bit easier, I also added a few helper lambda functions.

up = lambda p : Position(p.row-1, p.column)
down = lambda p : Position(p.row+1, p.column)
left = lambda p : Position(p.row, p.column-1)
right = lambda p : Position(p.row, p.column+1)

def connected_components(grid: dict[Position, str],
                         gridsize: Position) -> tuple[dict[Position, int],
                                                      dict[int, list[Position]]]:
    uids = {}
    pos_per_uids = defaultdict(list)
    uid_gen = uid_generator()

    for r in range(gridsize.row):
        for c in range(gridsize.column):
            p = Position(r, c)
            # v = letter of the cell
            v: str = grid[p]

            # since we go top->down and left->right, we only worry about what's left and up
            uid_up = uid_left = None
            if (n := grid.get(up(p))) is not None and n == v:
                uid_up = uids[up(p)]
            if (n := grid.get(left(p))) is not None and n == v:
                uid_left = uids[left(p)]
            
            # nothing up or left
            if (uid_up is None and uid_left is None):
                uids[p] = next(uid_gen)
                pos_per_uids[uids[p]].append(p)
            else:
                # up or left are both with the same letter as our cell...
                if uid_up is not None and uid_left is not None:
                    # ...and they have the same uid: great!
                    if uid_up == uid_left:
                        uids[p] = uid_up
                        pos_per_uids[uid_up].append(p)
                    else:
                        # ...they were given different uid, so we MERGE them.
                        uids[p] = uid_up
                        pos_per_uids[uid_up].append(p)
                        while len(pos_per_uids[uid_left]) > 0:
                            n = pos_per_uids[uid_left].pop()
                            uids[n] = uid_up
                            pos_per_uids[uid_up].append(n)
                        del pos_per_uids[uid_left]
                # only up is set
                elif uid_up is not None:
                    uids[p] = uid_up
                    pos_per_uids[uid_up].append(p)
                # only left is set
                else:
                    uids[p] = uid_left
                    pos_per_uids[uid_left].append(p)
    
    return uids, pos_per_uids

Phew, that’s a big function. Too big for sure, but I don’t have the time or energy to refactor: it works fine.

With our list of positions per uid, computing areas and perimeters is easy enough: the area is the length of the list, and the perimeter is the sum, for all positions in the list, of the number of neighbours that are not in the list.

neighbours = lambda p : [up(p), right(p), down(p), left(p)]

def get_areas_and_perimeters(pos_per_uids: dict[int, list[Position]]) -> list[tuple[int, int]]:
    areas_and_perimeters = []
    for _, positions in pos_per_uids.items():
        area = len(positions)
        perimeter = 0
        for p in positions:
            perimeter += sum(n not in positions for n in neighbours(p))
        areas_and_perimeters.append((area, perimeter))
    return areas_and_perimeters

Putting all together, we get our first answer:

grid, gridsize = parse_input()
uids, pos_per_uids = connected_components(grid, gridsize)
areas_and_perimeters = get_areas_and_perimeters(pos_per_uids)
print(f"Part 1 answer: {sum(a*p for a, p in areas_and_perimeters)}")

Part 2 - what makes a side a side?

The twist is a small but important change in the definition of a border: we no longer consider the borders per-cell, but the sides per-region. So if we have an object like this:

....
.OO..
..O..
..O..
.....

In part one we would have counted 10 borders, but now we need to count 6 sides.

So… what makes a side a side?

I considered a few options, such as walking the perimeter and counting changes of direction, or defining a set of corner patterns and counting corners… but I ultimately decided on a method that I found easier to implement.

A “left side” is an uninterrupted vertical section where the left neighbour is not in the object. So, for a given column, we can extract all the “cells on a left side” from the list of positions in a region with:

left_sides = [p for p in positions if p.column == col and left(p) not in positions]

And then, assuming the positions are sorted top-down and left-right (this is foreshadowing), we can count the number of uninterrupted sections:

def count_sides_with_gaps(items: list[Position]) -> int:
    if len(items) == 0:
        return 0
    if len(items) == 1:
        return 1
    sides = 1
    for p in items[:-1]:
        if down(p) not in items: # there is a gap -> there is a side
            sides += 1
    return sides

Now we have to do that for every column of the object for the left and right sides, and for every row for the tops and bottoms:

def count_sides_with_gaps(items: list[Position], direction: Callable) -> int:
    if len(items) == 0:
        return 0
    if len(items) == 1:
        return 1
    sides = 1
    for p in items[:-1]:
        if direction(p) not in items: # there is a gap -> there is a side
            sides += 1
    return sides

def get_areas_and_sides(pos_per_uids: dict[int, list[Position]]) -> list[tuple[int, int]]:
    areas_and_sides = []
    for _, positions in pos_per_uids.items():
        area = len(positions)
        positions = sorted(positions)   # make sure we go top->down left->right
        # find minrow, mincol, maxrow, maxcol of region
        minrow = mincol = 10000
        maxrow = maxcol = 0
        for p in positions:
            minrow = min(minrow, p.row)
            maxrow = max(maxrow, p.row)
            mincol = min(mincol, p.column)
            maxcol = max(maxcol, p.column)
        sides = 0
        # count vertical sides
        for col in range(mincol, maxcol+1):
            left_sides = [p for p in positions if p.column == col and left(p) not in positions]
            sides += count_sides_with_gaps(left_sides, down)
            right_sides = [p for p in positions if p.column == col and right(p) not in positions]
            sides += count_sides_with_gaps(right_sides, down)
        # count horizontal sides
        for row in range(minrow, maxrow+1):
            top_sides = [p for p in positions if p.row == row and up(p) not in positions]
            sides += count_sides_with_gaps(top_sides, right)
            bottom_sides = [p for p in positions if p.row == row and down(p) not in positions]
            sides += count_sides_with_gaps(bottom_sides, right)
        areas_and_sides.append((area, sides))

    return areas_and_sides

I initially didn’t put the positions = sorted(positions) in there, because I had computed the list of positions in the right order in connected_components – or so I thought. I didn’t immediately realize that it wasn’t necessarily the case when I had merged two connected regions. I had to fire up the debugger to get to the bottom of that one. Fortunately it was an easy fix, as tuples are sorted by their first value, then second, etc. by default, which is exactly what we need here.

With all of that, we can finally get our part 2 answer:

areas_and_sides = get_areas_and_sides(pos_per_uids)
print(f"Part 2 answer: {sum(a*s for a, s in areas_and_sides)}")

Done!

Conclusions

The puzzles are clearly becoming more complex. Which means they can no longer be solved in a short half-hour morning or evening session as I tended to do in the beginning. So I’m not sure that I’ll keep doing all of them going forward, as I occasionally like to sleep (when the 7 month-olds allows it). We’ll see!

Bonus: connected components with scikit-image

scikit-image is a great library for image processing. I knew I could do the “connected components” part of the puzzle very easily if I used it. How easily? Here’s the code to parse the input data and return a connected components array of uids:

from skimage.measure import label

with open("inputs/day12", 'r') as fp:
    data = fp.readlines()
data = np.array([[c for c in row.strip()] for row in data if len(row) != 0])
keys = np.unique(data)
grid = np.zeros(data.shape, dtype=int)
for key, v in zip(keys, range(1, len(keys)+1)):
    grid[data==key] = v
labels = label(grid)

The part that does everything in my connected_components method is the last line. It is a little bit shorter, I’ll admit.