Advent of Code 2015
2026-02-26
After day 1 and day 2, I’m already feeling a lot more comfortable with the language.
Before getting further, however, I think it’s worth making sure that I
tidy up things a little bit and start building my “toolbox” for solving
the rest of the puzzles. The main thing that I’m looking at is the
“parsing” functions. There’s a lot of parsing going on in Advent of Code
puzzles, so I think I want to extract parsing methods into a
parser.c file. It’ll be a good opportunity to test that I
really understand how strtok
works, to avoid any mistakes in my string-handling.
Let’s start from the parse_lines method:
str_list_t* parse_lines(char* str){
str_list_t *gifts = malloc(sizeof(str_list_t));
gifts->n = strcount(str, '\n');
gifts->str_list = malloc(sizeof(char*)*gifts->n);
char *buffer = strtok(str, "\n");
int i = 0;
while(buffer){
gifts->str_list[i] = malloc(sizeof(char)*strlen(buffer)+1);
strcpy(gifts->str_list[i], buffer);
buffer = strtok(NULL, "\n");
i++;
}
return gifts;
}
I can change the gifts variable name to something more
generic and put it in parser.c, alongside
strcount, the function which counts the number of
occurrences of a character in a string. Meanwhile, parser.h
will hold the typedef for str_list_t.
// parser.h
typedef struct {
size_t n;
char** str_list;
} str_list_t;
str_list_t* parse_lines(char* str);
// parser.c
#include <stdlib.h>
#include <string.h>
#include "parser.h"
int strcount(char* str, char c){
int n = 0;
for (int i=0; i < strlen(str); i++)
if (str[i] == c)
n++;
return n;
}
str_list_t* parse_lines(char* str){
str_list_t *str_list = malloc(sizeof(str_list_t));
str_list->n = strcount(str, '\n');
str_list->str_list = malloc(sizeof(char*)*str_list->n);
char *buffer = strtok(str, "\n");
int i = 0;
while(buffer){
str_list->str_list[i] = malloc(sizeof(char)*strlen(buffer)+1);
strcpy(str_list->str_list[i], buffer);
buffer = strtok(NULL, "\n");
i++;
}
return str_list;
}
Now in day2.c I can include parser.h and
remove those functions, so that I only keep the more puzzle-specific
stuff inside.
There is one thing that I wasn’t fully understanding while writing
this code, and that is the need for a +1 in the
malloc here:
str_list->str_list[i] = malloc(sizeof(char)*strlen(buffer)+1);
strcpy(str_list->str_list[i], buffer);
I thought that buffer, being filled by
strtok, would include a terminating null character
\0. It does. What I was missing is that strlen,
logically, does not include the \0 in its count.
So if I want the null character in str_list[i], I do need
to allocate strlen(buffer)+1 characters.
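To make the +1 concrete, here is a minimal sketch. The copy_string helper is hypothetical (it is not part of parser.c); it only illustrates that strlen reports the characters before the \0, so a copy needs one more byte than strlen returns.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only: returns a heap-allocated
 * copy of src, or NULL if the allocation fails. */
char *copy_string(const char *src) {
    size_t len = strlen(src);     /* 5 for "2x3x4": the '\0' is not counted */
    char *dst = malloc(len + 1);  /* +1 so the terminator fits as well */
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);    /* copies the 5 characters and the '\0' */
    return dst;
}
```

For "2x3x4", strlen gives 5 while sizeof("2x3x4") is 6, and that difference of one byte is exactly the terminator. The caller is expected to free the returned pointer.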
Now I would very much like to do the same with my
parse_dimensions method, which could be generalized to
parse a string into ints using a separating character:
gift_t* parse_dimensions(char* str){
char *tok_start = str;
char *buffer = strtok(tok_start, "x");
int ndim = 0;
int dims[3];
while (buffer){
dims[ndim++] = atoi(buffer);
buffer = strtok(NULL, "x");
}
gift_t* gift = malloc(sizeof(gift_t));
gift->l = dims[0];
gift->w = dims[1];
gift->h = dims[2];
return gift;
}
This requires a bit more change, though, as we also need to generalize
the return type to any number of ints. We can use the same
format as for the str_list_t and, in parser.h,
write:
// parser.h
typedef struct {
int* int_list;
int n;
} int_list_t;
Now we can write a parse_ints method from
parse_dimensions. strtok requires a
null-terminated string for the “separating character”, but our
strcount function requires a char. It makes
more sense to me to keep a char for
parse_ints, at least at the moment, so I’ll create a
temporary null-terminated string with
char *sep_s = "" + sep to use in strtok.
// parser.c
int_list_t* parse_ints(char* str, char sep){
char *tok_start = str;
char *sep_s = "" + sep;
int_list_t* int_list = malloc(sizeof(int_list_t));
int_list->n = strcount(str, sep);
int_list->int_list = malloc(sizeof(int)*int_list->n);
int i = 0;
char *buffer = strtok(tok_start, sep_s);
while (buffer){
int_list->int_list[i++] = atoi(buffer);
buffer = strtok(NULL, sep_s);
}
return int_list;
}
In day2.c, we can try to use this new method in
parse_dimensions. We should get 3 ints that we
can convert to our l, w, h dimensions:
// day2.c
gift_t* parse_dimensions(char* str){
int_list_t* int_list = parse_ints(str, 'x');
if (int_list->n != 3){
printf("ERROR: parse_ints should return 3 values, received %d!\n", int_list->n);
}
gift_t* gift = malloc(sizeof(gift_t));
gift->l = int_list->int_list[0];
gift->w = int_list->int_list[1];
gift->h = int_list->int_list[2];
return gift;
}
Everything compiles fine, but the dimensions are incorrect: we only receive 2 values. That’s weird, I didn’t think I had changed anything in the parsing itself.
It’s time to start writing tests, because I don’t like the way I’m currently debugging.
I create a test_parser.c file with a first bit of code to test
parse_lines, which seems to be working as intended – but
maybe the test will say otherwise.
// test_parser.c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "parser.h"
bool test_parse_lines(){
char *test_s3 = "this text\nshould be parse\ninto 3 lines";
char *test_s4 = "this text\nshould be parse\ninto 4 lines\n";
str_list_t* str_list3 = parse_lines(test_s3);
str_list_t* str_list4 = parse_lines(test_s4);
return (str_list3->n == 3 && str_list4->n == 4);
}
int main(){
printf("Starting tests for parser.c\n");
if (test_parse_lines())
printf("test_parse_lines OK\n");
else
printf("test_parse_lines FAIL\n");
exit(0);
}
Running it and…
Starting tests for parser.c
Segmentation fault
Yay, first Segmentation fault! What did I mess up? First,
let’s split the test just to see if both calls to
parse_lines do it and to be more unit-test-like:
// test_parser.c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "parser.h"
bool test_parse_lines_normal(){
char *test_s3 = "this text\nshould be parse\ninto 3 lines";
str_list_t* str_list3 = parse_lines(test_s3);
return (str_list3->n == 3);
}
bool test_parse_lines_empty_line(){
char *test_s4 = "this text\nshould be parse\ninto 4 lines\n";
str_list_t* str_list4 = parse_lines(test_s4);
return (str_list4->n == 4);
}
int main(){
printf("Starting tests for parser.c\n");
if (test_parse_lines_normal())
printf("test_parse_lines_normal OK\n");
else
printf("test_parse_lines_normal FAIL\n");
if (test_parse_lines_empty_line())
printf("test_parse_lines_empty_line OK\n");
else
printf("test_parse_lines_empty_line FAIL\n");
exit(0);
}
Yes, still Segmentation fault.
After some printf debugging, I get two things: first, I
forgot that my parse_lines method ignores the last line,
because the input data always has an extra line. And I guess that must
be what’s happening in the parse_ints method as well, but
now I don’t understand why it worked before, so I’ll still need to
investigate a little bit.
Second: it’s the first call to strtok that triggers the
segmentation fault. Does the initialisation with
char *str = "some string" not produce a null-terminated
string?
Thanks to StackOverflow,
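I realize that it does, but that is not the problem: a string literal must not be modified, and strtok writes \0 characters into the string it splits. I need to do
char str[] = "some string" instead, which gives me a writable copy. Noted.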
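A small sketch of what strtok does to its argument may help here. The function name is hypothetical and exists only for this illustration; the point is that strtok overwrites the separator it finds, which is fine for an array but undefined behaviour (typically a segmentation fault) for a string literal.

```c
#include <string.h>

/* Hypothetical demo: returns 1 if strtok replaced the first separator
 * in the buffer with '\0', and 0 otherwise. */
int strtok_modifies_buffer(void) {
    /* An array initialised from a literal is a private, writable copy. */
    char text[] = "2x3x4";
    char *first = strtok(text, "x");
    /* first points at text[0], and text[1], which used to be 'x',
     * has been overwritten with the terminator of the first token. */
    return first == text && text[1] == '\0' && strcmp(first, "2") == 0;
}
```

With char *text = "2x3x4" the same call would try to write into the literal itself, which is what crashed the tests.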
I also change the test so that it captures the current behaviour of
ignoring the last line, and we have it working. For
parse_ints, we’ll also need to change the
initialization of the separator string: in C, "" + sep is pointer
arithmetic on the literal, not concatenation. This time it’s W3Schools helping
me out with a good way of doing it:
// parser.c
int_list_t* parse_ints(char* str, char sep){
char *tok_start = str;
char sep_s[] = {sep, '\0'};
// ...
}
Now adding tests for parse_ints. This time I want to
capture the intended behaviour, which I know will fail to begin
with:
// test_parser.c
bool test_parse_ints_returns_n(){
char test[] = "1x12x6";
int_list_t* int_list = parse_ints(test, 'x');
return (int_list->n == 3);
}
bool test_parse_ints_returns_correct_ints(){
char test[] = "1x12x6";
int_list_t* int_list = parse_ints(test, 'x');
return (int_list->int_list[0] == 1 &&
int_list->int_list[1] == 12 &&
int_list->int_list[2] == 6);
}
int main(){
printf("Starting tests for parser.c\n");
// ...
if (test_parse_ints_returns_n()){
printf("test_parse_ints_returns_n OK\n");
if (test_parse_ints_returns_correct_ints())
printf("test_parse_ints_returns_correct_ints OK\n");
else
printf("test_parse_ints_returns_correct_ints FAIL\n");
}
else
printf("test_parse_ints_returns_n FAIL\n");
exit(0);
}
As expected, the first test fails: I only get two integers instead of
three. As a sanity test before moving on, I copy my original
parse_dimensions into the test file and check that I get
the right result with my test string: it passes. So what’s
different?
Let’s put the two versions side by side.
gift_t* parse_dimensions(char* str){
char *tok_start = str;
char *buffer = strtok(tok_start, "x");
int ndim = 0;
int dims[3];
while (buffer){
dims[ndim++] = atoi(buffer);
buffer = strtok(NULL, "x");
}
gift_t* gift = malloc(sizeof(gift_t));
gift->l = dims[0];
gift->w = dims[1];
gift->h = dims[2];
return gift;
}
int_list_t* parse_ints(char* str, char sep){
char *tok_start = str;
char sep_s[] = {sep, '\0'};
int_list_t* int_list = malloc(sizeof(int_list_t));
int_list->n = strcount(str, sep);
int_list->int_list = malloc(sizeof(int)*int_list->n);
int i = 0;
char *buffer = strtok(tok_start, sep_s);
while (buffer){
int_list->int_list[i++] = atoi(buffer);
buffer = strtok(NULL, sep_s);
}
return int_list;
}
Wait, I just screwed up the int_list->n computation:
unlike parse_lines, I am now keeping the last value as well, so n should be
the number of separators +1.
test_parse_ints_returns_n OK
test_parse_ints_returns_correct_ints OK
Great.
Recompiling day 2 and running it gives me the correct results.
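The post does not show the corrected function, so the following is only a sketch of what it might look like, assuming the int_list_t struct from parser.h above. The calloc call and the NULL checks are my additions, not something from the original code: calloc zero-fills the array, so a slot that strtok never fills (for example with two separators in a row) holds 0 instead of garbage.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *int_list;
    int n;
} int_list_t;

/* Counts the occurrences of c in str. */
static int strcount(const char *str, char c) {
    int n = 0;
    for (; *str != '\0'; str++)
        if (*str == c)
            n++;
    return n;
}

/* Sketch of parse_ints with the corrected count. Modifies str, because
 * strtok does, so str must be a writable buffer. */
int_list_t *parse_ints(char *str, char sep) {
    char sep_s[] = {sep, '\0'};
    int_list_t *int_list = malloc(sizeof(int_list_t));
    if (int_list == NULL)
        return NULL;
    /* k separators delimit k + 1 values: "1x12x6" has 2 'x' and 3 ints. */
    int_list->n = strcount(str, sep) + 1;
    /* Assumption: zero-fill so that unfilled slots are well defined. */
    int_list->int_list = calloc(int_list->n, sizeof(int));
    if (int_list->int_list == NULL) {
        free(int_list);
        return NULL;
    }
    int i = 0;
    for (char *tok = strtok(str, sep_s); tok != NULL; tok = strtok(NULL, sep_s))
        int_list->int_list[i++] = atoi(tok);
    return int_list;
}
```

The count has to be taken before the first strtok call, since strtok replaces the separators with \0 as it goes.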
parse_lines to parse_strings
I realize now that I’m not actually ignoring the last line
in parse_lines. I’m storing the empty line, but I’m just
not counting it in str_list->n. I should. And while I’m
there, I should generalize this parse_lines into a
parse_strings with the same behaviour as
parse_ints, so that I can later split by something else
than a \n.
str_list_t* parse_strings(char* str, char sep){
char *tok_start = str;
char sep_s[] = {sep, '\0'};
str_list_t *str_list = malloc(sizeof(str_list_t));
str_list->n = strcount(str, sep)+1;
str_list->str_list = malloc(sizeof(char*)*str_list->n);
char *buffer = strtok(str, sep_s);
int i = 0;
while(buffer){
str_list->str_list[i] = malloc(sizeof(char)*strlen(buffer)+1);
strcpy(str_list->str_list[i], buffer);
buffer = strtok(NULL, sep_s);
i++;
}
return str_list;
}
And to begin with, I’ll change parse_lines to use that
function to check that I haven’t broken anything. Don’t forget to remove
1 from the count:
str_list_t* parse_lines(char* str){
str_list_t *str_list = parse_strings(str, '\n');
str_list->n--;
return str_list;
}
Running the tests, everything is fine. Running day2.c as
well. Now let’s make parse_strings available in
parser.h and remove parse_lines.
I need to change my tests to use the new function, and to change the intended behaviour: now I want the correct line count!
Also changing the call in day2, I now get a
Segmentation fault, I’m guessing because I try to parse an
empty line. I don’t want to assume that there will be one, so let’s add
a check. Problem: strlen also breaks here. I’m not certain
what’s in that last str_list[i]. Is it just the
\0 character, and is such a string not valid with
strlen?
Creating a new test to understand what’s going on, with some debugging info:
bool test_parse_strings_empty_line_is_null(){
char test[] = "last empty line\n";
str_list_t* str_list = parse_strings(test, '\n');
printf("First character: %c\n", str_list->str_list[1][0]);
printf("Number of characters: %zu\n", strlen(str_list->str_list[1]));
printf("Full string: %s\n", str_list->str_list[1]);
return (str_list->str_list[1][0] == '\0');
}
The test fails. I get:
First character: 0
Number of characters: 6
Full string: 0p#BÛ☺
Yet when I add some debug info to the parse_strings
function to see what is being allocated, I only get one
malloc for the 16 characters of the first line
"last empty line", and no further passage through the
while(buffer) loop. Uh. Also, here it doesn’t
crash in my test.
Let’s try to formulate a hypothesis of what’s going on:
parse_strings counts the number of \n and
allocates a char* for the empty line, so in my test
str_list->n is 2 and
str_list->str_list is an array of char* of
size 2. Then strtok ignores the last empty
line: it never returns an empty token, so it returns NULL and the
while(buffer) loop ends. No memory is therefore allocated for that last line. When I try
to read it, in my test file I always get a random mix of characters, but
somehow there’s always a valid 6 characters string in there that
strlen and printf("%s") can understand. In my
day2.c file, which executes more things, accessing that
part of the memory causes a crash.
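This hypothesis can be checked in isolation. The helper below is hypothetical (it is not in parser.c): it compares the number of slots that parse_strings would allocate, separators + 1, with the number of tokens strtok actually returns for the same input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: how many of the (separators + 1) slots would
 * strtok leave unfilled for this input? Returns -1 if malloc fails. */
int unfilled_slots(const char *text, char sep) {
    char sep_s[] = {sep, '\0'};

    int slots = 1;
    for (const char *p = text; *p != '\0'; p++)
        if (*p == sep)
            slots++;

    /* strtok modifies its argument, so work on a private copy. */
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, text);

    int tokens = 0;
    for (char *tok = strtok(copy, sep_s); tok != NULL; tok = strtok(NULL, sep_s))
        tokens++;

    free(copy);
    return slots - tokens;
}
```

For "last empty line\n" with '\n' as separator this gives 2 slots, 1 token and therefore 1 unfilled slot, while "1x12x6" with 'x' leaves none. An empty field in the middle, as in "a\n\nb", also leaves one slot unfilled, because strtok treats consecutive separators as a single one.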
Now how can I get a predictable behaviour? I can try adding something
to parse_strings so that, if I haven’t filled the whole
str_list array, I fill it with empty strings:
// parser.c
str_list_t* parse_strings(char* str, char sep){
// ...
int i = 0;
while(buffer){
str_list->str_list[i] = malloc(sizeof(char)*(strlen(buffer)+1));
strcpy(str_list->str_list[i], buffer);
buffer = strtok(NULL, sep_s);
i++;
}
while (i < str_list->n){
str_list->str_list[i] = malloc(sizeof(char)*1);
strcpy(str_list->str_list[i], "");
i++;
}
return str_list;
}
After that, my new test succeeds, which is a good sign. And
day2.c fails, but in a good way:
ERROR: parse_ints should return 3 values, received 1!. So
now I can add the strlen check.
// day2.c
int main(){
// ...
for (int i=0; i < gifts->n; i++){
if (strlen(gifts->str_list[i]) == 0)
continue;
// ...
}
// ...
}
And we are finally good again.
I strongly suspect that I’m going to run into problems if I get an input with an empty line somewhere in the middle, but I’m going to wait until that happens to deal with it.
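For whenever that day comes, here is one possible direction, written as a sketch rather than as the planned fix. It swaps strtok for strchr, which does not skip consecutive separators, so empty fields keep their position. The split_keep_empty name is hypothetical, and the str_list_t struct is the one from parser.h above.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t n;
    char **str_list;
} str_list_t;

/* Hypothetical alternative to parse_strings that keeps empty fields.
 * Does not modify str. Returns NULL if an allocation fails. */
str_list_t *split_keep_empty(const char *str, char sep) {
    str_list_t *out = malloc(sizeof(str_list_t));
    if (out == NULL)
        return NULL;

    /* k separators delimit k + 1 fields, empty ones included. */
    out->n = 1;
    for (const char *p = str; *p != '\0'; p++)
        if (*p == sep)
            out->n++;

    out->str_list = calloc(out->n, sizeof(char *));
    if (out->str_list == NULL) {
        free(out);
        return NULL;
    }

    const char *start = str;
    for (size_t i = 0; i < out->n; i++) {
        /* The field ends at the next separator or at the end of str. */
        const char *end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);

        out->str_list[i] = malloc(len + 1);
        if (out->str_list[i] == NULL) {
            while (i > 0)
                free(out->str_list[--i]);
            free(out->str_list);
            free(out);
            return NULL;
        }
        memcpy(out->str_list[i], start, len);
        out->str_list[i][len] = '\0';

        /* Step over the separator, or stay on the final '\0'. */
        start = end ? end + 1 : start + len;
    }
    return out;
}
```

With "2x3x4\n\n1x1x10\n" and '\n' this returns four strings: "2x3x4", "", "1x1x10" and "". Because it takes a const char*, it can also be called directly on a string literal.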
This was a lot of work to not solve any new puzzle. But I understand the code a lot better, and my toolbox is getting better, so it was certainly not wasted time.
I’ve been really enjoying Ron Jeffries’ blog lately, and his small-steps, test-driven approach to coding and refactoring. I’m starting to really look at it as a sort of antidote to vibe coding. It pushes you towards developing a much deeper understanding of what your code is doing and why, and that in turn makes the code a lot easier to maintain and grow in the long term.
Hopefully.
Anyway, there will be more refactoring to come, but let’s maybe try the next puzzle now!