I heard about “the code” at www.canyoucrackit.co.uk during the first friday of december (2011-12-02), and cracked the final stage on sunday two days later. The reason for not cracking it all during friday evening was, unfortunately, not because it presented much of a challenge but because I was out partying pretty much 24/7 during friday and saturday (including after parties until dawn both nights). :)
I didn’t want to publish any writeup until the challenge was over, even though I heard that there were others who didn’t bother waiting. As a man who loves challenges of all kinds, I would hate to be the one spoiling it for someone who actually wanted to give it a try though.
The first part of the code was this picture:
I recognized it as x86-assembler at the first glance, because of the “eb 04” (jmp $+6), the multiple “31 cX” (xor reg, reg), the “90”:s in the end (NOPs) and last but not least the “cd 80” (int 0x80) to mention a few of the immediately recognizable opcodes. The int 0x80 also tells me that this is most likely a Linux-based payload, since 0x80 is the interrupt used for making system calls in Linux. Looking at the code, I could see that it actually just calls exit(0) after the relevant code has been executed.
By looking at the code in a disassembler I could see that it has a decode-loop, that looks very much like the RC4-cipher (initializing a 256 byte buffer with values 0 to 255, swapping values around based on what corresponds to the RC4-key (0xdeadbeef) followed by a decryption loop. Not that it really matters which crypto is being used, since the key is embedded into the code.
Looking closer at the decryption loop we can see that it is actually slightly different from the original RC4 cipher, even though the same key scheduling is used.
Original RC4 can be implemented like this:
i = j = 0; ptr = buf; while (len-- > 0) { i = (i + 1) & 0xff; j = (j + state[i]) & 0xff; temp = state[i]; state[i] = state[j]; state[j] = temp; *ptr++ ^= state[(state[i] + state[j]) & 0xff]; }
This implementation changes j to the xor-byte in each iteration:
i = j = 0; ptr = buf; while (len-- > 0) { i = (i + 1) & 0xff; j = (j + state[i]) & 0xff; temp = state[i]; state[i] = state[j]; state[j] = temp; j = state[(state[i] + state[j]) & 0xff]; *ptr++ ^= j; }
This is actually pretty funny, considering that GCHQ themselves have published an (very brief) explanation of the challenge now after it’s over, which describes the crypto as RC4. So from what I can tell, this is an unintentional bug rather than by design. A mistake that is easily made when you’re hand coding assembler, and not being careful of which registers are used for what, but a rather silly mistake nonetheless.
Anyway. Analyzing the code further we can see that the payload seems to be missing some important parts, like the buffer that is to be decrypted for instance. This, on the other hand, is definitely by design. :)
See the “AAAA” (41 41 41 41) at the end of the payload? That’s a tag that is checked by this piece of code:
00000042 58 pop eax ; Read last four bytes of original payload 00000043 3D41414141 cmp eax,0x41414141 ; Compare it to 0x41414141 ("AAAA") 00000048 7543 jnz 0x8d ; Jump to exit(0) code if no match
So far so good. If the tag isn’t there, it jumps ahead to the part of the code that calls exit(0):
0000008D 31DB xor ebx,ebx ; ebx = System call argument 1 = 0 0000008F 89D8 mov eax,ebx ; eax = System call number = 0 00000091 FEC0 inc al ; eax = eax + 1 = 1 = SYS_exit 00000093 CD80 int 0x80 ; Do system call: exit(0)
Then it continues with checking for another tag (“BBBB” = 42 42 42 42), this tag is _not_ included in the payload:
0000004A 58 pop eax ; Read four bytes more, beyond the original payload 0000004B 3D42424242 cmp eax,0x42424242 ; Compare it to 0x42424242 ("BBBB") 00000050 753B jnz 0x8d ; Jump to exit(0) code if no match
Of course, we can just append this tag to the code to bypass it. This doesn’t really help us much though, since the code then continues with reading a byte count to decrypt, followed by the actual encrypted data:
00000052 5A pop edx ; Get size of encrypted buffer 00000053 89D1 mov ecx,edx ; Set ecx = Size of encrypted buffer 00000055 89E6 mov esi,esp ; Set esi = Start of encrypted buffer 00000057 89DF mov edi,ebx ; Set edi = Pointer to the not-quite-RC4 state buffer 00000059 29CF sub edi,ecx ; Allocate space before the not-quite-RC4 state buffer 0000005B F3A4 rep movsb ; Copy the encrypted buffer to the allocated space
After this follows the code that actually performs the decryption, and finally it calls exit(0) just like when the tag on the stack is not found. The exit(0) could be replaced by code that writes the decrypted data to stdout, or we can simply set a breakpoint there (or manually insert a 0xcc = int3 there instead) so we can read the decrypted data in a debugger.
So, where is the missing data? My first thought was that the data is probably either hidden in the HTML code of the www.canyoucrackit.co.uk page, or that it is hidden within the image of the payload. Hiding it within the image of the payload seemed pretty likely, since it actually was a bit odd that they used an image at all instead of plain text that would allow for copy & paste instead of manually writing the payload down from the image.
There are several ways to hide data within an image, some more refined than others. The easiest way to do it is to simply inject the data to the hide into some form of meta data. Using “exiftool” we can extract the following text field (although I first noticed it with a simple “strings”):
je@isis:~$ exiftool cyber.png | grep Comment Comment : QkJCQjIAAACR2PFtcCA6q2eaC8SR+8dmD/zNzLQC+td3tFQ4qx8O447TDeuZw5P+0SsbEcYR.78jKLw==
This is quite obviously base64-encoded data, judging from the “==” at the end and the range of characters being used. The following command line reveals its decoded contents:
je@isis:~$ exiftool cyber.png \ | grep Comment | awk '{ print $3 }' \ | perl -MMIME::Base64 -e 'print decode_base64(<>)' > cyber.bin je@isis:~$ xxd -g1 cyber.bin 0000000: 42 42 42 42 32 00 00 00 91 d8 f1 6d 70 20 3a ab BBBB2......mp :. 0000010: 67 9a 0b c4 91 fb c7 66 0f fc cd cc b4 02 fa d7 g......f........ 0000020: 77 b4 54 38 ab 1f 0e e3 8e d3 0d eb 99 c3 93 fe w.T8............ 0000030: d1 2b 1b 11 c6 11 ef c8 ca 2f .+......./
As you can see, it starts with the “BBBB” that the payload looked for after the end of the payload, followed by an 32-bit LSB integer (32 00 00 00 = 0x00000032 = 50 = The size of the encrypted buffer), and finally the 50 bytes of encrypted data.
In other words, the full payload is as follows:
eb 04 af c2 bf a3 81 ec 00 01 00 00 31 c9 88 0c 0c fe c1 75 f9 31 c0 ba ef be ad de 02 04 0c 00 d0 c1 ca 08 8a 1c 0c 8a 3c 04 88 1c 04 88 3c 0c fe c1 75 e8 e9 5c 00 00 00 89 e3 81 c3 04 00 00 00 5c 58 3d 41 41 41 41 75 43 58 3d 42 42 42 42 75 3b 5a 89 d1 89 e6 89 df 29 cf f3 a4 89 de 89 d1 89 df 29 cf 31 c0 31 db 31 d2 fe c0 02 1c 06 8a 14 06 8a 34 1e 88 34 06 88 14 1e 00 f2 30 f6 8a 1c 16 8a 17 30 da 88 17 47 49 75 de 31 db 89 d8 fe c0 cd 80 90 90 e8 9d ff ff ff 41 41 41 41 42 42 42 42 32 00 00 00 91 d8 f1 6d 70 20 3a ab 67 9a 0b c4 91 fb c7 66 0f fc cd cc b4 02 fa d7 77 b4 54 38 ab 1f 0e e3 8e d3 0d eb 99 c3 93 fe d1 2b 1b 11 c6 11 ef c8 ca 2f
By embedding this payload into a small C program (available here), and injecting a breakpoint at the exit(0) call by manually changing the “cd” in “cd 80” (int 0x80, the system call interrupt) to “cc” (int3) lets me read the decrypted string in a debugger.
je@isis:~$ gcc -o x x.c -m32; execstack -s x je@isis:~$ gdb -q x Reading symbols from /home/je/x...(no debugging symbols found)...done. (gdb) r Starting program: /home/je/x Program received signal SIGTRAP, Trace/breakpoint trap. 0xffffd2b6 in ?? () (gdb) x/s $edi-50 0xffffd0da: "GET /15b436de1f9107A3778aad525e5d0b20.js HTTP/1.1"
This gives us the URL for the next stage of the challenge (available here).