Joel Eriksson
CEO/Founder of ClevCode. Vulnerability researcher, exploit developer and reverse-engineer. Previous CTO and co-founder of Bitsec, which was acquired by Nixu, and Cycura which was acquired by WELL Technologies. Have spoken at BlackHat, DefCon and the RSA conference. CTF player. Puzzle solver (Cicada 3301, Boxen) / GCHQ Challenge Solution – Stage 1

I heard about “the code” at during the first friday of december (2011-12-02), and cracked the final stage on sunday two days later. The reason for not cracking it all during friday evening was, unfortunately, not because it presented much of a challenge but because I was out partying pretty much 24/7 during friday and saturday (including after parties until dawn both nights). :)

I didn’t want to publish any writeup until the challenge was over, even though I heard that there were others who didn’t bother waiting. As a man who loves challenges of all kinds, I would hate to be the one spoiling it for someone who actually wanted to give it a try though.

The first part of the code was this picture:

I recognized it as x86-assembler at the first glance, because of the “eb 04” (jmp $+6), the multiple “31 cX” (xor reg, reg), the “90”:s in the end (NOPs) and last but not least the “cd 80” (int 0x80) to mention a few of the immediately recognizable opcodes. The int 0x80 also tells me that this is most likely a Linux-based payload, since 0x80 is the interrupt used for making system calls in Linux. Looking at the code, I could see that it actually just calls exit(0) after the relevant code has been executed.

By looking at the code in a disassembler I could see that it has a decode-loop, that looks very much like the RC4-cipher (initializing a 256 byte buffer with values 0 to 255, swapping values around based on what corresponds to the RC4-key (0xdeadbeef) followed by a decryption loop. Not that it really matters which crypto is being used, since the key is embedded into the code.

Looking closer at the decryption loop we can see that it is actually slightly different from the original RC4 cipher, even though the same key scheduling is used.

Original RC4 can be implemented like this:

i = j = 0;
ptr = buf;
while (len-- > 0) {
        i = (i + 1) & 0xff;
        j = (j + state[i]) & 0xff;
        temp = state[i];
        state[i] = state[j];
        state[j] = temp;
        *ptr++ ^= state[(state[i] + state[j]) & 0xff];

This implementation changes j to the xor-byte in each iteration:

i = j = 0;
ptr = buf;
while (len-- > 0) {
        i = (i + 1) & 0xff;
        j = (j + state[i]) & 0xff;
        temp = state[i];
        state[i] = state[j];
        state[j] = temp;
        j = state[(state[i] + state[j]) & 0xff];
        *ptr++ ^= j;

This is actually pretty funny, considering that GCHQ themselves have published an (very brief) explanation of the challenge now after it’s over, which describes the crypto as RC4. So from what I can tell, this is an unintentional bug rather than by design. A mistake that is easily made when you’re hand coding assembler, and not being careful of which registers are used for what, but a rather silly mistake nonetheless.

Anyway. Analyzing the code further we can see that the payload seems to be missing some important parts, like the buffer that is to be decrypted for instance. This, on the other hand, is definitely by design. :)

See the “AAAA” (41 41 41 41) at the end of the payload? That’s a tag that is checked by this piece of code:

00000042 58          pop eax             ; Read last four bytes of original payload
00000043 3D41414141  cmp eax,0x41414141  ; Compare it to 0x41414141 ("AAAA")
00000048 7543        jnz 0x8d            ; Jump to exit(0) code if no match

So far so good. If the tag isn’t there, it jumps ahead to the part of the code that calls exit(0):

0000008D 31DB        xor ebx,ebx         ; ebx = System call argument 1 = 0
0000008F 89D8        mov eax,ebx         ; eax = System call number = 0
00000091 FEC0        inc al              ; eax = eax + 1 = 1 = SYS_exit
00000093 CD80        int 0x80            ; Do system call: exit(0)

Then it continues with checking for another tag (“BBBB” = 42 42 42 42), this tag is _not_ included in the payload:

0000004A 58          pop eax             ; Read four bytes more, beyond the original payload
0000004B 3D42424242  cmp eax,0x42424242  ; Compare it to 0x42424242 ("BBBB")
00000050 753B        jnz 0x8d            ; Jump to exit(0) code if no match

Of course, we can just append this tag to the code to bypass it. This doesn’t really help us much though, since the code then continues with reading a byte count to decrypt, followed by the actual encrypted data:

00000052 5A          pop edx             ; Get size of encrypted buffer
00000053 89D1        mov ecx,edx         ; Set ecx = Size of encrypted buffer
00000055 89E6        mov esi,esp         ; Set esi = Start of encrypted buffer
00000057 89DF        mov edi,ebx         ; Set edi = Pointer to the not-quite-RC4 state buffer
00000059 29CF        sub edi,ecx         ; Allocate space before the not-quite-RC4 state buffer
0000005B F3A4        rep movsb           ; Copy the encrypted buffer to the allocated space

After this follows the code that actually performs the decryption, and finally it calls exit(0) just like when the tag on the stack is not found. The exit(0) could be replaced by code that writes the decrypted data to stdout, or we can simply set a breakpoint there (or manually insert a 0xcc = int3 there instead) so we can read the decrypted data in a debugger.

So, where is the missing data? My first thought was that the data is probably either hidden in the HTML code of the page, or that it is hidden within the image of the payload. Hiding it within the image of the payload seemed pretty likely, since it actually was a bit odd that they used an image at all instead of plain text that would allow for copy & paste instead of manually writing the payload down from the image.

There are several ways to hide data within an image, some more refined than others. The easiest way to do it is to simply inject the data to the hide into some form of meta data. Using “exiftool” we can extract the following text field (although I first noticed it with a simple “strings”):

je@isis:~$ exiftool cyber.png | grep Comment
Comment : QkJCQjIAAACR2PFtcCA6q2eaC8SR+8dmD/zNzLQC+td3tFQ4qx8O447TDeuZw5P+0SsbEcYR.78jKLw==

This is quite obviously base64-encoded data, judging from the “==” at the end and the range of characters being used. The following command line reveals its decoded contents:

je@isis:~$ exiftool cyber.png \
| grep Comment | awk '{ print $3 }' \
| perl -MMIME::Base64 -e 'print decode_base64(<>)' > cyber.bin
je@isis:~$ xxd -g1 cyber.bin
0000000: 42 42 42 42 32 00 00 00 91 d8 f1 6d 70 20 3a ab :.
0000010: 67 9a 0b c4 91 fb c7 66 0f fc cd cc b4 02 fa d7 g......f........
0000020: 77 b4 54 38 ab 1f 0e e3 8e d3 0d eb 99 c3 93 fe w.T8............
0000030: d1 2b 1b 11 c6 11 ef c8 ca 2f                   .+......./

As you can see, it starts with the “BBBB” that the payload looked for after the end of the payload, followed by an 32-bit LSB integer (32 00 00 00 = 0x00000032 = 50 = The size of the encrypted buffer), and finally the 50 bytes of encrypted data.

In other words, the full payload is as follows:

eb 04 af c2 bf a3 81 ec 00 01 00 00 31 c9 88 0c
0c fe c1 75 f9 31 c0 ba ef be ad de 02 04 0c 00
d0 c1 ca 08 8a 1c 0c 8a 3c 04 88 1c 04 88 3c 0c
fe c1 75 e8 e9 5c 00 00 00 89 e3 81 c3 04 00 00
00 5c 58 3d 41 41 41 41 75 43 58 3d 42 42 42 42
75 3b 5a 89 d1 89 e6 89 df 29 cf f3 a4 89 de 89
d1 89 df 29 cf 31 c0 31 db 31 d2 fe c0 02 1c 06
8a 14 06 8a 34 1e 88 34 06 88 14 1e 00 f2 30 f6
8a 1c 16 8a 17 30 da 88 17 47 49 75 de 31 db 89
d8 fe c0 cd 80 90 90 e8 9d ff ff ff 41 41 41 41
42 42 42 42 32 00 00 00 91 d8 f1 6d 70 20 3a ab
67 9a 0b c4 91 fb c7 66 0f fc cd cc b4 02 fa d7
77 b4 54 38 ab 1f 0e e3 8e d3 0d eb 99 c3 93 fe
d1 2b 1b 11 c6 11 ef c8 ca 2f

By embedding this payload into a small C program (available here), and injecting a breakpoint at the exit(0) call by manually changing the “cd” in “cd 80” (int 0x80, the system call interrupt) to “cc” (int3) lets me read the decrypted string in a debugger.

je@isis:~$ gcc -o x x.c -m32; execstack -s x
je@isis:~$ gdb -q x
Reading symbols from /home/je/x...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/je/x

Program received signal SIGTRAP, Trace/breakpoint trap.
0xffffd2b6 in ?? ()
(gdb) x/s $edi-50
0xffffd0da: "GET /15b436de1f9107A3778aad525e5d0b20.js HTTP/1.1"

This gives us the URL for the next stage of the challenge (available here).