Day 8 - Counting characters

Advent of Code 2015

Adrien Foucart

2026-03-14

[Back to index]

Puzzle: https://adventofcode.com/2015/day/8

My solution: https://codeberg.org/adfoucart/aocode15/src/branch/main/src/day8.c

Another string parsing puzzle, seemingly. We have a bunch of strings, and we need to count the difference between the number of characters “in code” and the number of characters “in memory”, with the examples:

"": 2 characters in code (the two quotes), 0 characters in memory (empty string)
"abc": 5 chars in code, 3 in memory
"aaa\"aaa": 10 in code, 7 in memory (six a, one escaped quote)
"\x27": 6 in code, 1 in memory (hexadecimal-encoded single-quote)

The value for this test input would be 23-11 = 12. Let’s put that in a failing test to get started:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stdbool.h>
#include "futils.h"
#include "parser.h"
#include "testutils.h"

#define TEST_INPUT_FILE "./data/day8.sample.txt"
#define INPUT_FILE "./data/day8.txt"

int process_input(char* input){
    return 0;
}

void run_test(){
    char *input = freadall(TEST_INPUT_FILE);
    assert(process_input(input)==12, "test input works");
}

int main(){
    run_test();
}

We’re in business, let’s get parsing.

Counting characters

The “number of characters in code” is just the number of chars in the string, aka the result of strlen. So we have:

int process_input(char* input){
    str_list_t* list = parse_strings(input, '\n');
    printf("%d lines in input\n", list->n);

    int in_code = 0;
    for (int i=0; i < list->n; i++){
        in_code += strlen(list->str_list[i]);
    }
    printf("Total in code: %d\n", in_code);
    return 0;
}

Output: Total in code: 23.

The sequences to escape are:

We also have to always remove the two surrounding quotes.

So I think the best strategy is simply to count the number of occurrences of \\ and \" and remove one for each; of \x and remove three for each, and then remove two per line for the surrounding quotes, and we should be good. I think.

Let’s add a “count number of occurrences of a substring in a string” function in parser:

// parser.c
int strcountseq(char* str, char* seq){
    int i = 0;
    char* nxt = strstr(str, seq);
    while (nxt != NULL){
        i++;
        nxt = strstr(nxt+1, seq);
    }
    return i;
}

A test never hurts:

// test_parser.c
bool test_count_str(){
    char test[] = "how many :x: are there here? There should be two :x:s!";
    if (strcountseq(test, ":x:") != 2)
        return false;

    char test2[] = "how many are there here? none!";
    if (strcountseq(test2, ":x:") != 0)
        return false;

    return true;
}

It passes. Now we can use it in process_input. First attempt. The difficult part is to correctly escape the characters in my search sequence:

int process_input(char* input){
    str_list_t* list = parse_strings(input, '\n');
    printf("%d lines in input\n", list->n);

    int in_code = 0;
    int to_remove = 0;
    for (int i=0; i < list->n; i++){
        int l = strlen(list->str_list[i]);
        if (l == 0)
            continue;
        in_code += l;
        to_remove += 2 + strcountseq(list->str_list[i], "\\\\")
                    + strcountseq(list->str_list[i], "\\\"")
                    + (strcountseq(list->str_list[i], "\\x")*3);

    }
    
    printf("Total in code: %d\n", in_code);
    printf("Total to remove: %d\n", to_remove);
    return in_code-to_remove;
}

Gets me 12 to remove instead of 11, because I didn’t return the right thing. The “total to remove” is what I’m supposed to return, so I don’t even have to compute the total “in code”. Simplify:

int process_input(char* input){
    str_list_t* list = parse_strings(input, '\n');
    printf("%d lines in input\n", list->n);

    int to_remove = 0;
    for (int i=0; i < list->n; i++){
        if (strlen(list->str_list[i]) == 0)
            continue;
        int dbl_bck = strcountseq(list->str_list[i], "\\\\");
        int bck_quo = strcountseq(list->str_list[i], "\\\"");
        int hex = strcountseq(list->str_list[i], "\\x");
        to_remove += 2 + dbl_bck + bck_quo + (hex*3);

    }
    
    return to_remove;
}

Test input works. The real thing?

Bug fixing

No, wrong answer. The test input didn’t have a \\ sequence, so let’s add one to check if we don’t have a problem there.

""
"abc"
"aaa\"aaa"
"\x27"
"\\a"

Now I should get 2+2+3+5+3=15. I do. Let me look at the real input.

Ah, I see, I think. I have for instance this sequence:

"...\\\xa6..."

Where I will count two \\ instead of one, because I only offset my strstr by one after finding an instance, which is wrong. I should offset by strlen(seq). Fix, try again, and… still too high.

Printing all the numbers for the different sequences found, I don’t see anything wrong. What did I misread again?

Wait, no, there is another edge case:

"...\\x\\..."

Since I’m looking at each sequence separately, both \\ will be found, and \x will be found as well. I had ignored the “hexadecimal” part to make things easier, but I can’t. \x\\ shouldn’t count. And in this case even if it was hexadecimal, I shouldn’t count it:

"...\\x1e..."

The first \ escapes the second \, but the x isn’t escaped. I need a new strategy:

This is too puzzle-specific to get in the parser.c, so I’ll put it in day8 directly. Since I’m looking for a single char, I should use strchr instead of strstr this time.

The process is a tiny bit more hacky:

int process_string(char* str){
    int count = 0;
    char* end = str + strlen(str);
    char* nxt = strchr(str, '\\');
    while (nxt != NULL){
        if (nxt+1 >= end)
            break;
        if (*(nxt+1) == '\\' || *(nxt+1) == '\"'){
            count++;
            nxt += 2;
        }
        else if (*(nxt+1) == 'x'){
            if (nxt+3 >= end)
                break;
            if (((*(nxt+2) >= '0' && *(nxt+2) <= '9') 
              || (*(nxt+2) >= 'a' && *(nxt+2) <= 'f')) && 
               ((*(nxt+3) >= '0' && *(nxt+3) <= '9') 
              || (*(nxt+3) >= 'a' && *(nxt+3) <= 'f'))){
                count += 3;
                nxt += 4;
              }
        }
        else {
            nxt++;
        }

        nxt = strchr(nxt, '\\');
    }

    return count+2;
}

But it works, both on the test and on the real puzzle input.

Part two - reverse!

Part two is the reverse problem: find how many characters to add in order to encode the string. So "aaa\"aaa" should become "\"aaa\\\"aaa\"".

This is actually fairly easy, and we don’t need to compute the actual encoded string. The rules are simply:

Since we already detect all those cases, we just have to add an element of response. Let’s create a response type:

typedef struct {
    int to_remove;
    int to_add;
} response_t;

response_t* response_t_create(int to_remove, int to_add){
    response_t* res = malloc(sizeof(response_t));
    res->to_remove = to_remove;
    res->to_add = to_add;
    return res;
}

And modify the process:


response_t* process_string(char* str){
    int to_remove = 0;
    int to_add = 0;
    // ...
    while (nxt != NULL){
        if (*(nxt+1) == '\\' || *(nxt+1) == '\"'){
            to_remove++;
            to_add += 2;
            nxt += 2;
        }
        else if (*(nxt+1) == 'x'){
            // ...
                to_remove += 3;
                to_add += 1;
                nxt += 4;
        // ...
    }

    return response_t_create(to_remove+2, to_add+4);
}

And modify the rest to use the new type. Success! ⭐⭐