Interlude 1 - Small steps

Advent of Code 2015

Adrien Foucart

2026-02-26

After day 1 and day 2, I’m already feeling a lot more comfortable with the language. Before going further, however, I think it’s worth tidying things up a little and starting to build my “toolbox” for solving the rest of the puzzles. The main thing I’m looking at is the “parsing” functions. There’s a lot of parsing going on in Advent of Code puzzles, so I want to extract the parsing methods into a parser.c file. It’ll be a good opportunity to test that I really understand how strtok works, to avoid any mistakes in my string handling.

Parse lines

Let’s start from the parse_lines method:

str_list_t* parse_lines(char* str){
    str_list_t *gifts = malloc(sizeof(str_list_t));
    gifts->n = strcount(str, '\n');
    gifts->str_list = malloc(sizeof(char*)*gifts->n);

    char *buffer = strtok(str, "\n");
    int i = 0;
    while(buffer){
        gifts->str_list[i] = malloc(sizeof(char)*strlen(buffer)+1);
        strcpy(gifts->str_list[i], buffer);
        buffer = strtok(NULL, "\n");
        i++;
    }

    return gifts;
}

I can change the gifts variable name to something more generic and put the function in parser.c, alongside strcount, the function which counts the number of occurrences of a character in a string. Meanwhile, parser.h will hold the typedef for str_list_t.

// parser.h
typedef struct {
    size_t n;
    char** str_list;
} str_list_t;

str_list_t* parse_lines(char* str);

// parser.c
#include <stdlib.h>
#include <string.h>
#include "parser.h"

int strcount(char* str, char c){
    int n = 0;
    for (int i=0; i < strlen(str); i++)
        if (str[i] == c)
            n++;
    return n;
}

str_list_t* parse_lines(char* str){
    str_list_t *str_list = malloc(sizeof(str_list_t));
    str_list->n = strcount(str, '\n');
    str_list->str_list = malloc(sizeof(char*)*str_list->n);

    char *buffer = strtok(str, "\n");
    int i = 0;
    while(buffer){
        str_list->str_list[i] = malloc(sizeof(char)*strlen(buffer)+1);
        strcpy(str_list->str_list[i], buffer);
        buffer = strtok(NULL, "\n");
        i++;
    }

    return str_list;
}

Now in day2.c I can include parser.h and remove those functions, so that I only keep the more puzzle-specific stuff inside.

There is one thing that I wasn’t fully understanding while writing this code, and that is the need for a +1 in the malloc here:

str_list->str_list[i] = malloc(sizeof(char)*strlen(buffer)+1);
strcpy(str_list->str_list[i], buffer);

I thought that buffer, being filled by strtok, would include a terminating null character \0. It does. What I was missing is that strlen, logically, does not include the \0 in its count. So if I want the null character in str_list[i], I do need to allocate strlen(buffer)+1 characters.
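A minimal sketch of that rule, using a hypothetical helper copy_string that is not part of parser.c. It also sidesteps a precedence trap: sizeof(char)*strlen(buffer)+1 parses as (sizeof(char)*strlen(buffer))+1, which only gives the right size because sizeof(char) is always 1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in parser.c): returns a heap copy of src,
 * or NULL if the allocation fails. The caller frees the result. */
char *copy_string(const char *src){
    assert(src != NULL);
    size_t len = strlen(src);      /* characters before the '\0' */
    char *dst = malloc(len + 1);   /* +1 leaves room for the '\0' itself */
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);     /* copies the characters and the '\0' */
    return dst;
}
```

For "2x3x4", strlen reports 5, so the copy needs 6 bytes: five characters plus the terminator.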

Parse ints

Now I would very much like to do the same with my parse_dimensions method, which could be generalized to parse a string into ints using a separating character:

gift_t* parse_dimensions(char* str){
    char *tok_start = str;
    char *buffer = strtok(tok_start, "x");
    int ndim = 0;
    int dims[3];
    while (buffer){
        dims[ndim++] = atoi(buffer);
        buffer = strtok(NULL, "x");
    }

    gift_t* gift = malloc(sizeof(gift_t));
    gift->l = dims[0];
    gift->w = dims[1];
    gift->h = dims[2];

    return gift;
}

This requires a bit more change, though, as we also need to generalize the return type to any number of ints. We can use the same format as for the str_list_t and, in parser.h, write:

// parser.h
typedef struct {
    int* int_list;
    int n;
} int_list_t;

Now we can write a parse_ints method from parse_dimensions. strtok requires a null-terminated string for the “separating character”, but our strcount function requires a char. It makes more sense to me to keep a char for parse_ints, at least at the moment, so I’ll create a temporary null-terminated string with char *sep_s = "" + sep to use in strtok.

// parser.c
int_list_t* parse_ints(char* str, char sep){
    char *tok_start = str;
    char *sep_s = "" + sep;

    int_list_t* int_list = malloc(sizeof(int_list_t));
    int_list->n = strcount(str, sep);
    int_list->int_list = malloc(sizeof(int)*int_list->n);

    int i = 0;
    char *buffer = strtok(tok_start, sep_s);
    while (buffer){
        int_list->int_list[i++] = atoi(buffer);
        buffer = strtok(NULL, sep_s);
    }

    return int_list;
}

In day2.c, we can try to use this new method in parse_dimensions. We should get 3 ints that we can convert to our l, w, h dimensions:

// day2.c
gift_t* parse_dimensions(char* str){
    int_list_t* int_list = parse_ints(str, 'x');
    if (int_list->n != 3){
        printf("ERROR: parse_ints should return 3 values, received %d!\n", int_list->n);
    }
    gift_t* gift = malloc(sizeof(gift_t));
    gift->l = int_list->int_list[0];
    gift->w = int_list->int_list[1];
    gift->h = int_list->int_list[2];

    return gift;
}

Everything compiles fine, but the dimensions are incorrect: we only receive 2 values. That’s weird, I didn’t think I had changed anything in the parsing itself.

It’s time to start writing tests, because I don’t like the way I’m currently debugging.

I create a test_parser.c file with a first bit of code to test parse_lines, which seems to be working as intended – but maybe the test will say otherwise.

// test_parser.c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "parser.h"

bool test_parse_lines(){
    char *test_s3 = "this text\nshould be parse\ninto 3 lines";
    char *test_s4 = "this text\nshould be parse\ninto 4 lines\n";

    str_list_t* str_list3 = parse_lines(test_s3);
    str_list_t* str_list4 = parse_lines(test_s4);

    return (str_list3->n == 3 && str_list4->n == 4);
}

int main(){
    printf("Starting tests for parser.c\n");
    if (test_parse_lines())
        printf("test_parse_lines OK\n");
    else
        printf("test_parse_lines FAIL\n");

    exit(0);
}

Running it and…

Starting tests for parser.c
Segmentation fault

Yay, first Segmentation fault! What did I mess up? First, let’s split the test just to see if both calls to parse_lines do it and to be more unit-test-like:

// test_parser.c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "parser.h"

bool test_parse_lines_normal(){
    char *test_s3 = "this text\nshould be parse\ninto 3 lines";
    str_list_t* str_list3 = parse_lines(test_s3);
    return (str_list3->n == 3);
}

bool test_parse_lines_empty_line(){
    char *test_s4 = "this text\nshould be parse\ninto 4 lines\n";
    str_list_t* str_list4 = parse_lines(test_s4);
    return (str_list4->n == 4);
}

int main(){
    printf("Starting tests for parser.c\n");
    if (test_parse_lines_normal())
        printf("test_parse_lines_normal OK\n");
    else
        printf("test_parse_lines_normal FAIL\n");

    if (test_parse_lines_empty_line())
        printf("test_parse_lines_empty_line OK\n");
    else
        printf("test_parse_lines_empty_line FAIL\n");

    exit(0);
}

Yes, still Segmentation fault.

After some printf debugging, I get two things: first, I forgot that my parse_lines method ignores the last line, because the input data always has an extra line. And I guess that must be what’s happening in the parse_ints method as well, but now I don’t understand why it worked before, so I’ll still need to investigate a little bit.

Second: it’s the first call to strtok that triggers the segmentation fault. Does the initialisation with char *str = "some string" not produce a null-terminated string?

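Thanks to StackOverflow, I realize that it does, and that null-termination is not the problem. A string literal such as "some string" may live in read-only memory, and strtok modifies its argument in place, writing a \0 over each separator it finds. Writing to a literal is undefined behaviour, which here shows up as a segmentation fault. I need to do char str[] = "some string" instead, which copies the literal into a modifiable array. Noted.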
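To see why strtok needs a buffer it is allowed to write to, here is a small sketch with a hypothetical helper count_tokens (not in parser.c). strtok replaces each separator it finds with '\0', so the input has to be a modifiable array rather than a string literal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in buf. buf must be writable,
 * because strtok overwrites each separator it finds with '\0'. */
int count_tokens(char *buf, const char *sep){
    assert(buf != NULL && sep != NULL);
    int n = 0;
    for (char *tok = strtok(buf, sep); tok != NULL; tok = strtok(NULL, sep))
        n++;
    return n;
}
```

With char lines[] = "a\nbb\nccc", the call count_tokens(lines, "\n") returns 3, and afterwards strlen(lines) is 1 because the first '\n' has been overwritten. Passing a char* that points at a literal would attempt the same write on memory that may be read-only.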

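I also change the test so that it captures the current behaviour of ignoring the last line, and now it works. For parse_ints, we now also need to change the initialization of the string separator: in C, "" + sep is pointer arithmetic, not concatenation, so sep_s ended up pointing sep bytes past the start of an empty literal instead of at a one-character string. This time it’s W3Schools helping me out with a good way of doing it: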

// parser.c
int_list_t* parse_ints(char* str, char sep){
    char *tok_start = str;
    char sep_s[] = {sep, '\0'};
    // ...
}

Now adding tests for parse_ints. This time I want to capture the intended behaviour, which I know will fail to begin with:

// test_parser.c
bool test_parse_ints_returns_n(){
    char test[] = "1x12x6";
    int_list_t* int_list = parse_ints(test, 'x');
    return (int_list->n == 3);
}

bool test_parse_ints_returns_correct_ints(){
    char test[] = "1x12x6";
    int_list_t* int_list = parse_ints(test, 'x');
    
    return (int_list->int_list[0] == 1 && 
            int_list->int_list[1] == 12 && 
            int_list->int_list[2] == 6);
}

int main(){
    printf("Starting tests for parser.c\n");
    // ...

    if (test_parse_ints_returns_n()){
        printf("test_parse_ints_returns_n OK\n");
        if (test_parse_ints_returns_correct_ints())
            printf("test_parse_ints_returns_correct_ints OK\n");
        else
            printf("test_parse_ints_returns_correct_ints FAIL\n");
    }
    else
        printf("test_parse_ints_returns_n FAIL\n");
    exit(0);
}

As expected, the first test fails: I only get two integers instead of three. As a sanity test before moving on, I copy my original parse_dimensions into the test file and check that I get the right result with my test string: it passes. So what’s different?

Let’s put the two versions side by side.

gift_t* parse_dimensions(char* str){
    char *tok_start = str;
    char *buffer = strtok(tok_start, "x");
    int ndim = 0;
    int dims[3];
    while (buffer){
        dims[ndim++] = atoi(buffer);
        buffer = strtok(NULL, "x");
    }

    gift_t* gift = malloc(sizeof(gift_t));
    gift->l = dims[0];
    gift->w = dims[1];
    gift->h = dims[2];

    return gift;
}

int_list_t* parse_ints(char* str, char sep){
    char *tok_start = str;
    char sep_s[] = {sep, '\0'};

    int_list_t* int_list = malloc(sizeof(int_list_t));
    int_list->n = strcount(str, sep);
    int_list->int_list = malloc(sizeof(int)*int_list->n);

    int i = 0;
    char *buffer = strtok(tok_start, sep_s);
    while (buffer){
        int_list->int_list[i++] = atoi(buffer);
        buffer = strtok(NULL, sep_s);
    }

    return int_list;
}

Wait, I just screwed up the int_list->n computation: unlike parse_lines, I am now keeping the last value, so the count is the number of separators + 1.
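A sketch of what the corrected function could look like. The name parse_ints_sketch and the helper count_char are hypothetical, and the use of calloc is my own assumption rather than the code from the repository.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int* int_list;
    int n;
} int_list_t;

/* Same idea as strcount in parser.c, repeated here so the sketch stands alone. */
static int count_char(const char* str, char c){
    int n = 0;
    for (; *str; str++)
        if (*str == c)
            n++;
    return n;
}

/* parse_ints_sketch is a hypothetical name, not the code from the repository. */
int_list_t* parse_ints_sketch(char* str, char sep){
    assert(str != NULL);
    char sep_s[] = {sep, '\0'};

    int_list_t* int_list = malloc(sizeof(int_list_t));
    if (int_list == NULL)
        return NULL;
    /* k separators delimit k+1 values; count before strtok edits str. */
    int_list->n = count_char(str, sep) + 1;
    /* calloc zero-fills, so a slot strtok never reaches reads as 0. */
    int_list->int_list = calloc((size_t)int_list->n, sizeof(int));
    if (int_list->int_list == NULL){
        free(int_list);
        return NULL;
    }

    int i = 0;
    for (char* tok = strtok(str, sep_s); tok != NULL; tok = strtok(NULL, sep_s))
        int_list->int_list[i++] = atoi(tok);

    return int_list;
}
```

For "1x12x6" there are two separators, so n is 3 and the three slots hold 1, 12 and 6.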

test_parse_ints_returns_n OK
test_parse_ints_returns_correct_ints OK

Great.

Recompiling day 2 and running it gives me the correct results.

From parse_lines to parse_strings

I realize now that I’m not actually ignoring the last line in parse_lines. I’m storing the empty line, but I’m just not counting it in str_list->n. I should. And while I’m there, I should generalize this parse_lines into a parse_strings with the same behaviour as parse_ints, so that I can later split by something other than a \n.

str_list_t* parse_strings(char* str, char sep){
    char *tok_start = str;
    char sep_s[] = {sep, '\0'};
    
    str_list_t *str_list = malloc(sizeof(str_list_t));
    str_list->n = strcount(str, sep)+1;
    str_list->str_list = malloc(sizeof(char*)*str_list->n);

    char *buffer = strtok(str, sep_s);
    int i = 0;
    while(buffer){
        str_list->str_list[i] = malloc(sizeof(char)*strlen(buffer)+1);
        strcpy(str_list->str_list[i], buffer);
        buffer = strtok(NULL, sep_s);
        i++;
    }

    return str_list;
}

And to begin with, I’ll change parse_lines to use that function to check that I haven’t broken anything. Don’t forget to remove 1 from the count:

str_list_t* parse_lines(char* str){
    str_list_t *str_list = parse_strings(str, '\n');
    str_list->n--;

    return str_list;
}

Running the tests, everything is fine. Running day2.c as well. Now let’s make parse_strings available in parser.h and remove parse_lines.

I need to change my tests to use the new function, and to change the intended behaviour: now I want the correct line count!

After also changing the call in day2, I now get a Segmentation fault, presumably because I try to parse an empty line. I don’t want to assume that there will be one, so let’s add a check. Problem: strlen also breaks here. I’m not certain what’s in that last str_list[i]. Is it just the \0 character, and is such a string not valid with strlen?

Creating a new test to understand what’s going on, with some debugging info:

bool test_parse_strings_empty_line_is_null(){
    char test[] = "last empty line\n";
    str_list_t* str_list = parse_strings(test, '\n');
    printf("First character: %c\n", str_list->str_list[1][0]);
    printf("Number of characters: %zu\n", strlen(str_list->str_list[1])); // strlen returns a size_t, hence %zu
    printf("Full string: %s\n", str_list->str_list[1]);

    return (str_list->str_list[1][0] == '\0');
}

The test fails. I get:

First character: 0
Number of characters: 6
Full string: 0p#BÛ☺

Yet when I add some debug info to the parse_strings function to see what is being allocated, I only get one malloc for the 16 characters of the first line "last empty line", and no further passage through the while(buffer) loop. Uh. Also, here it doesn’t crash in my test.

Let’s try to formulate a hypothesis of what’s going on: parse_strings counts the number of \n and allocates a char* for the empty line, so in my test str_list->n is 2 and str_list->str_list is an array of char* of size 2. Then strtok ignores the last empty line: it never returns empty tokens, so after the first line it returns NULL and the while(buffer) loop ends. No memory is therefore allocated for that last line, and str_list->str_list[1] is left as an uninitialized pointer. Reading through it is undefined behaviour: in my test file I get a random mix of characters, and somehow there’s always a valid 6-character string in there that strlen and printf("%s") can understand. In my day2.c file, which executes more things, accessing that part of the memory causes a crash.
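The mismatch can be measured directly. The sketch below uses a hypothetical diagnostic, unfilled_slots, which compares the "separators + 1" slot count with the number of tokens strtok actually returns. It works on a private copy so the caller's string is left untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical diagnostic: how many slots would "separators + 1" allocate
 * that strtok never fills? Returns -1 if the working copy cannot be allocated. */
int unfilled_slots(const char *text, char sep){
    assert(text != NULL);
    char sep_s[] = {sep, '\0'};

    int slots = 1;
    for (const char *p = text; *p; p++)
        if (*p == sep)
            slots++;

    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, text);

    /* strtok skips runs of separators and never returns an empty token. */
    int tokens = 0;
    for (char *tok = strtok(copy, sep_s); tok != NULL; tok = strtok(NULL, sep_s))
        tokens++;
    free(copy);

    return slots - tokens;
}
```

For "last empty line\n" the result is 1: two slots, one token. The same gap appears for "a\n\nb", where the empty line sits in the middle.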

Now how can I get a predictable behaviour? I can try adding something to parse_strings so that, if I haven’t filled the whole str_list array, I fill it with empty strings:

// parser.c
str_list_t* parse_strings(char* str, char sep){
    // ...
    int i = 0;
    while(buffer){
        str_list->str_list[i] = malloc(sizeof(char)*(strlen(buffer)+1));
        strcpy(str_list->str_list[i], buffer);
        buffer = strtok(NULL, sep_s);
        i++;
    }
    while (i < str_list->n){
        str_list->str_list[i] = malloc(sizeof(char)*1);
        strcpy(str_list->str_list[i], "");
        i++;
    }

    return str_list;
}

After that, my new test succeeds, which is a good sign. And day2.c fails, but in a good way, with the message ERROR: parse_ints should return 3 values, received 1! So now I can add the strlen check.

// day2.c
int main(){
    // ...
    for (int i=0; i < gifts->n; i++){
        if (strlen(gifts->str_list[i]) == 0)
            continue;
        // ...
    }
    // ...
}

And we are finally good again.

I strongly suspect that I’m going to run into problems if I get an input with an empty line somewhere in the middle, but I’m going to wait until that happens to deal with it.
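For reference, here is one possible way around that, as a sketch only: a hypothetical split_keep_empty that walks the string with strchr instead of strtok, so every separator produces a field, including empty ones. Nothing here comes from the repository, and it is not necessarily how the problem will end up being solved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t n;
    char** str_list;
} str_list_t;

/* Hypothetical alternative to parse_strings: splits on every sep and keeps
 * empty fields, so "a\n\nb" yields "a", "" and "b". Does not modify str. */
str_list_t* split_keep_empty(const char* str, char sep){
    assert(str != NULL);
    str_list_t *out = malloc(sizeof(str_list_t));
    if (out == NULL)
        return NULL;

    out->n = 1;                               /* k separators -> k+1 fields */
    for (const char *p = str; *p; p++)
        if (*p == sep)
            out->n++;

    out->str_list = malloc(sizeof(char*) * out->n);
    if (out->str_list == NULL){
        free(out);
        return NULL;
    }

    const char *start = str;
    for (size_t i = 0; i < out->n; i++){
        const char *end = strchr(start, sep); /* NULL on the last field */
        size_t len = end ? (size_t)(end - start) : strlen(start);
        out->str_list[i] = malloc(len + 1);
        if (out->str_list[i] == NULL){        /* undo earlier allocations */
            while (i > 0)
                free(out->str_list[--i]);
            free(out->str_list);
            free(out);
            return NULL;
        }
        memcpy(out->str_list[i], start, len);
        out->str_list[i][len] = '\0';
        start = end ? end + 1 : start + len;
    }
    return out;
}
```

With this approach every slot is filled, so the padding loop and the uninitialized pointer both disappear. The caller still has to decide what an empty field means.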

Conclusions

This was a lot of work to not solve any new puzzle. But I understand the code a lot better, and my toolbox is getting better, so it was certainly not wasted time.

I’ve been really enjoying Ron Jeffries’ blog lately, and his small-steps, test-driven approach to coding and refactoring. I’m starting to really look at it as a sort of antidote to vibe coding. It pushes you towards developing a much deeper understanding of what your code is doing and why, and that in turn makes the code a lot easier to maintain and grow in the long term.

Hopefully.

Anyway, there will be more refactoring to come, but let’s maybe try the next puzzle now!

Code after first interlude