After cracking stage 1 of the GCHQ challenge, we get the URL to the Javascript code available here.
It defines a virtual machine with four general registers, two segment registers, one flag register and of course the instruction pointer. It also includes an array of two values named “firmware”, which seems rather strange. These values are not used, nor mentioned in the definition of the virtual machine that is included in a comment further down in the javascript code. The use for these values will actually become apparent in the third and final stage.
Besides specifying the register values and an array with the current memory contents, there is a 54 line comment describing the virtual machine architecture, the instruction encoding and the instruction set:
exec: function() { // virtual machine architecture // ++++++++++++++++++++++++++++ // // segmented memory model with 16-byte segment size (notation seg:offset) // // 4 general-purpose registers (r0-r3) // 2 segment registers (cs, ds equiv. to r4, r5) // 1 flags register (fl) // // instruction encoding // ++++++++++++++++++++ // // byte 1 byte 2 (optional) // bits [ 7 6 5 4 3 2 1 0 ] [ 7 6 5 4 3 2 1 0 ] // opcode - - - // mod - // operand1 - - - - // operand2 - - - - - - - - // // operand1 is always a register index // operand2 is optional, depending upon the instruction set specified below // the value of mod alters the meaning of any operand2 // 0: operand2 = reg ix // 1: operand2 = fixed immediate value or target segment (depending on instruction) // // instruction set // +++++++++++++++ // // Notes: // * r1, r2 => operand 1 is register 1, operand 2 is register 2 // * movr r1, r2 => move contents of register r2 into register r1 // // opcode | instruction | operands (mod 0) | operands (mod 1) // -------+-------------+------------------+----------------- // 0x00 | jmp | r1 | r2:r1 // 0x01 | movr | r1, r2 | rx, imm // 0x02 | movm | r1, [ds:r2] | [ds:r1], r2 // 0x03 | add | r1, r2 | r1, imm // 0x04 | xor | r1, r2 | r1, imm // 0x05 | cmp | r1, r2 | r1, imm // 0x06 | jmpe | r1 | r2:r1 // 0x07 | hlt | N/A | N/A // // flags // +++++ // // cmp r1, r2 instruction results in: // r1 == r2 => fl = 0 // r1 < r2 => fl = 0xff // r1 > r2 => fl = 1 // // jmpe r1 // => if (fl == 0) jmp r1 // else nop throw "VM.exec not yet implemented"; }
As you can see, it then throws an exception saying that VM.exec is not yet implemented. Our job is to implement this function, based on the information provided in the comment, and executing the embedded code in our virtual machine to see what it reveals. Although I don’t really see how this would provide a demonstration of anything but quite basic skills, I decided to play along and continue with the challenge. I decided to make a C implementation instead though.
First, I decided to only implement a disassembler and reverse-engineer the code by hand instead of actually implementing the virtual machine. After finding out that the initial piece of code actually decrypts the next part of the code, and that the decrypted code decrypts a string, I decided to actually implement the VM instead of just a disassembler.
0000: movr r1, 0x4 ; r1 = 4 = Start of loop 0002: movr r3, 0xaa ; r3 = 170 = Value to XOR with 0004: movm r0, [ds:r2] ; Get encrypted byte 0006: xor r0, r3 ; Use XOR with r3 to decrypt 0008: movm [ds:r2], r0 ; Write decrypted byte back to memory 000a: add r2, 0x1 ; Increment pointer to next encrypted byte 000c: add r3, 0x1 ; Increment value to XOR with 000e: cmp r2, 0x50 ; Compare pointer to end of encrypted data 0010: movr r0, 0x14 ; Point r0 to instruction after decryption loop 0012: jmpe r0 ; Jump to r0 if decryption is completed (r2 == 0x50) 0013: jmp r1 ; Jump to r1 (start of loop = 0004) 0014: xor r0, r0 ; Decryption completed. Set r0 = 0 0016: jmp 0x10:r0 ; Jump to 0x10:0 = 0x0100 = The buffer that was just decrypted
Corresponding C-code:
key = 0xaa; for (i = 0; i < 0x50; i++) buf[i] ^= key++;
The decrypted code is as follows, and quite similar to the previous code:
0100: movr r2, 0x0 ; r2 = 0 0102: add ds, 0xc ; ds = ds + 0xc = 0x1C 0104: movr r1, 0x8 ; r1 = 8 = Start of loop 0106: movr r3, 0x32 ; r3 = 50 = Value to XOR with 0108: movm r0, [ds:r2] ; Get encrypted byte 010a: xor r0, r3 ; Use XOR with r3 to decrypt 010c: movm [ds:r2], r0 ; Write decrypted byte back to memory 010e: add r2, 0x1 ; Increment pointer to next encrypted byte 0110: add r3, 0x3 ; Increment value to XOR with by three 0112: cmp r2, 0x0 ; Compare pointer with 0, indicating that it has wrapped (it's only a byte) 0114: jmpe r3 ; Jump to r3 if it has wrapped. This will never happen... 0115: cmp r0, 0x0 ; Compare decrypted byte with 0 0117: movr r0, 0x1b ; r0 = 0x1b 0119: jmpe r0 ; If decrypted byte was 0, jump to 0x011b = Instruction after the loop 011a: jmp r1 ; Jump to r1 (start of loop = 0x0108) 011b: hlt ; Halt execution
Corresponding C-code (sort of):
key = 0x32; do { buf[i] ^= key; key += 3; } while (buf[i]);
The instructions at 0x0112-0x0114 are not necessary, and could either be there for misdirection or possibly to indicate a hidden challenge. Further signs of this lies in the fact that there are a couple of instructions further down that are never executed:
0132: add ds, 0x10 0134: jmp r1
There is also a 0xcc at 0x0140, that isn't even a valid a valid instruction in the instruction set for the virtual machine, but corresponds to a breakpoint in x86 assembler. Last but not least there are large chunks of memory that are neither read, written or executed. First of all, there are some zero bytes that are obviously just there for filling. What is more suspicious are these:
0150: 7D 1F 15 60 4D 4D 52 7D 0E 27 6D 10 6D 5A 06 56 }..`MMR}.'m.mZ.V 0160: 47 14 42 0E B6 B2 B2 E6 EB B4 83 8E D7 E5 D4 D9 G.B............. 0170: C3 F0 80 95 F1 82 82 9A BD 95 A4 8D 9A 2B 30 69 .............+0i 0180: 4A 69 65 55 1C 7B 69 1C 6E 04 74 35 21 26 2F 60 JieU.{i.n.t5!&/` 0190: 03 4E 37 1E 33 54 39 E6 BA B4 A2 AD A4 C5 95 C8 .N7.3T9......... 01a0: C1 E4 8A EC E7 92 8B E8 81 F0 AD 98 A4 D0 C0 8D ................ 01b0: AC 22 52 65 7E 27 2B 5A 12 61 0A 01 7A 6B 1D 67 ."Re~'+Z.a..zk.g ... 0200: 37 7A 07 11 1F 1D 68 25 32 77 1E 62 23 5B 47 55 7z....h%2w.b#[GU 0210: 53 30 11 42 F6 F1 B1 E6 C3 CC F8 C5 E4 CC C0 D3 S0.B............ 0220: 85 FD 9A E3 E6 81 B5 BB D7 CD 87 A3 D3 6B 36 6F .............k6o 0230: 6F 66 55 30 16 45 5E 09 74 5C 3F 29 2B 66 3D 0D ofU0.E^.t\?)+f=. 0240: 02 30 28 35 15 09 15 DD EC B8 E2 FB D8 CB D8 D1 .0(5............ 0250: 8B D5 82 D9 9A F1 92 AB E8 A6 D6 D0 8C AA D2 94 ................ 0260: CF 45 46 67 20 7D 44 14 6B 45 6D 54 03 17 60 62 .EFg }D.kEmT..`b 0270: 55 5A 4A 66 61 11 57 68 75 05 62 36 7D 02 10 4B UZJfa.Whu.b6}..K 0280: 08 22 42 32 BA E2 B9 E2 D6 B9 FF C3 E9 8A 8F C1 ."B2............ 0290: 8F E1 B8 A4 96 F1 8F 81 B1 8D 89 CC D4 78 76 61 .............xva 02a0: 72 3E 37 23 56 73 71 79 63 7C 08 11 20 69 7A 14 r>7#Vsqyc|.. iz. 02b0: 68 05 21 1E 32 27 59 B7 CF AB DD D5 CC 97 93 F2 h.!.2'Y......... 02c0: E7 C0 EB FF E9 A3 BF A1 AB 8B BB 9E 9E 8C A0 C1 ................ 02d0: 9B 5A 2F 2F 4E 4E 00 00 00 00 00 00 00 00 00 00 .Z//NN..........
There are some interesting patterns in this data, especially when interpreting it as three 112 bytes chunks. Each chunk consists of 20 bytes < 0x80, 25 bytes >= 0x80, 26 bytes < 0x80, 26 bytes >= 0x80 and finally 15 bytes < 0x80. In other words, there is a pattern in which bytes have the most significant bit set in each of these chunks. This pattern could be the result of each chunk being XOR-encoded using the same algorithm as has been used in the previous code, with some specific start and step values. I decided not to waste any more time on this track unless the challenge seemed to lead to a dead end though. After the second decryption loop it will reveal the following string at 0x01b0:
GET /da75370fe15c4148bd4ceec861fbdaa5.exe HTTP/1.0
Yet another URL, this time to an x86 Windows/cygwin executable.
To get this string I could just have dumped the virtual machine memory after execution, since I had already reverse-engineered the code I decided to get each character after it has been decoded by reading r0 at ip = 0x010c (0x010c == 268), using the code available here.
Note that I have not bothered to implement any bounds checking when reading values from memory, so running the VM on arbitrary/random code will most likely result in a crash. :) In this case I had already seen that the code was “well behaved” and just wanted to go on to the next stage of the challenge though.