Category: Uncategorized

ScummVM – Comprehend

I’ve been working on porting my implementation of the Comprehend adventure game engine (https://github.com/RyanMallon/recomprehend) to ScummVM. The Comprehend engine was developed by Polarware and used to create a number of graphical/text adventure games including Transylvania, Crimson Crown and Oo-Topos. The game Transylvania holds a bit of nostalgia for me as it was the very first adventure game I ever played way back on the Apple ][. The games were later ported to a variety of platforms including DOS, which is the version I reverse engineered.

dosbox.png

The DOS version of Transylvania

Currently the only game supported is Transylvania, and though there is still a bunch of work to support the engine properly a full play through of the game is possible.

I’m not a C++ developer, so I’ve been stumbling my way through the porting work. I have found the ScummVM source base very cleanly designed and easy to work with though.

If you want to give it a try (you will need a copy of the DOS version of Transylvania):

git clone https://github.com/RyanMallon/scummvm.git
git checkout comprehend
./configure
make
./scummvm
scummvm

Transylvania running in ScummVM

There are a bunch of things that still need to be done:

  • The code is still a bit of a mess. I’ll need to tidy it up before attempting to get this merged.
  • Only Transylvania is currently supported (and only one of its handful of DOS variants). My Re-Comprehend implementation supports Crimson Crown and Oo-Topos also, so those games will eventually be added.
  • The title, before game and ending screens are not implemented for Transylvania.
  • The graphics don’t emulate the CGA correctly. The DOS version of the game use a 128 colour palette which maps to dithering fills for the 4 colour CGA mode. I simplified my implementation of the graphics code by just using solid colours. Some of the colour choices need tweaking.
  • The graphics bleed in a few places. This is probably caused by the line algorithm provided by ScummVM not exactly matching the one used by the original Comprehend games.
  • The sentence parser is very basic. It does not yet handle word pairs (the original engine can convert some pairs of words into a single token) or conjunctions.
  • Pressing enter to flip between full text and graphics modes doesn’t work as it did in the original games.

Finally, a massive thanks to Mark Pelczarksi, one of the developers of the original Comprehend games. I contacted him after doing the reverse engineering work, and he took the time to dig out some old material and give me some insight on the design of the original engine.

Old School Blizzard: Sprites, Maps and Palettes

I’ve been reverse engineering two of Blizzard’s early DOS games, The Lost Vikings and Blackthorne. I’ve previously written about the virtual machine that The Lost Vikings uses for implementing game objects here and here. Blizzard has now made both The Lost Vikings and Blackthorne freely available on Battle.net. Note that Blizzard went by the name Silicon and Synapse for their early games including The Lost Vikings. Blackthorne was the first game released under the Blizzard brand.

This post is about how the sprite formats and tile engine are implemented for the games. The two games use very similar engines, with Blackthorne having a few improvements over The Lost Vikings. I’ll refer to the game engine for both games as the “Blizzard engine”.

The games store all of their data in a single pack file called DATA.DAT. The pack file is an archive which contains chunks for sprites, levels, sounds etc. Most of the chunks in the pack file are compressed using a variant of the LZSS compression algorithm.

I’ve developed a set of basic tools which can be used to view sprites, levels, etc from The Lost Vikings and Blackthorne. You can download the tools from GitHub. I’ve included the command invocations to view the various examples throughout this blog post. All of the screenshots in this post were captured using these tools.

splash

./sprite_view DATA.DAT -fraw -s -u -w344 -p 0x17b 0x17c
./level_view --blackthorne DATA.DAT 1 -h0x6e

Graphics Mode

Both games use the popular “Mode X” VGA mode. Mode X is a 256 colour, 320×240 planar graphics mode. Planar graphics means that, rather than pixels being arranged linearly as they are in the 320×200 VGA Mode 13h they are divided into a set of planes. Planar graphics were originally developed to boost graphics throughput by allowing multiple memory chips to store independent planes and deliver them in parallel. This Shikadi.net article gives a good explanation of how planar graphics work.

Mode X uses four planes. The first plane stores pixels 0, 4, 8, etc. The second stores pixels 1, 5, 9, etc. So for an 8×2 sprite rather than the pixels being stored linearly as:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

Planar Mode X stores them as:

0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15

I did most of the reverse engineering of the sprite formats by looking at the data in the pack file chunks rather than looking at the rendering code in IDA. I have a very rudimentary understanding of DOS and VGA programming and rendering code for old games tends use every optimization trick in the book and clever handcoded assembler, which is difficult (for me) to understand.

Raw Sprite Format

The first sprite format used by the games is just raw encoded planar data. Raw sprites can have any width and height that is divisible by four. Drawing raw sprites is simple:

for (plane = 0; plane < 4; plane++) {
    for (i = 0; i < (sprite_width * sprite_height) / 4; i++) {
        offset = (i * 4) + plane;
        y = offset / sprite_width;
        x = offset % sprite_width;

        pixel = *sprite++;
        dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel;
    }
}

Optionally a colour index could be chosen as an transparent colour. Raw sprites of a given width and height always have the same data size, but they are not particularly space efficient. This is especially true for the Blizzard game engine because sprites use only sixteen colours (always values 0x0 to 0xf) meaning that four bytes per pixel are wasted. The reason for using only sixteen colours per sprite is to enable some clever palette tricks which are explained later.

Unpacked/Masked Sprite Format

The second sprite format allows for transparency without sacrificing a colour index at the cost of a more complex drawing algorithm and slightly larger data size. In the tools I developed I’ve called this format “unpacked”, but it “masked” might be a better term.

In this sprite format, each set of eight pixels is preceeded by a mask byte which specifies which pixels to are to be drawn. The transparent pixels are still encoded with the value 0x0 but are skipped over when drawing. This allows the colour index 0x0 to be used as an additional colour.

The algorithm for drawing these sprites looks like this:

for (plane = 0; plane < 4; plane++) {
    x = plane;
    y = 0;
    for (i = 0; i < (sprite_width * sprite_height) / 4 / 8; i++) {
        mask = *sprite++;
        for (bit = 7; bit >= 0; bit--) {
            pixel = *sprite++;

            if (mask & (1 << bit))
                dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel;

            x += 4;
            if (x >= sprite_width) {
                y++;
                x = plane;
            }
        }
    }
}

The various collectible items in The Lost Vikings are 16×16 unpacked/masked sprites. They can be viewed with:

./sprite_view DATA.DAT -l2 -funpacked -w16 -h16 0x12f

tlv_items

Packed 32×32 Sprites

The final sprite format is only used in The Lost Vikings and is optimized for drawing 32×32 sprites. Like the unpacked/masked format each set of pixels is preceeded by a mask byte indicating which pixels to draw. However the packed format does not store the transparent pixels and packs two pixels into each byte since only sixteen colours are used. If the number of pixels to be drawn is odd, then the final half byte is zero.

This format is more space efficient the other two formats, but has a more complex drawing algorithm and variable length sprites. Pack file chunks which contain packed format sprites start with a header of 16-bit values indicating the starting offset of each sprite.

The algorithm for drawing the packed sprites is as follows:

for (plane = 0; plane < 4; plane++) {
    for (y = 0; y < 32; y++) {
        num_pixels = 0;
        mask = *sprite++;
        for (bit = 7; bit >= 0; bit--) {
            if (mask & (1 << bit)) {
                pixel = *sprite;
                if (num_pixels & 1)
                    sprite++;
                else
                    pixel >>= 4;
                pixel &= 0xf;

                x = ((7 - bit) * 4) + plane;
                dst[((dst_y + y) * dst_width) + (dst_x + x)] = pixel;

                num_pixels++;
            }
        }
        if (num_pixels & 1)
            sprite++;
    }
}

While writing this blog post I found this sprite in the header for The Lost Vikings second level which uses the packed 32×32 format, however I don’t remember ever seeing it in the game. Does anybody know if it is a secret or an unused leftover asset? View it with:

./sprite_view DATA.DAT -l2 -fpacked32 0xec -b0x10

tlv_secret_guy

Why Multiple Sprite Formats?

This is a question I can’t fully answer. I am by no means an expert in the intricacies of optimized VGA programming. It’s possible that the different formats are used to improve rendering speed for sprites that need to be updated frequently vs infrequently. Tiles for the maps are always raw format. The raw format is also used for some HUD (head-up dislay) sprites. Sprites using the packed 32×32 format including the vikings and many enemies. The unpacked sprite format is used for some less animated enemies like the gun turret, and other objects like switches and elevators.

Blackthorne doesn’t appear to use the 32×32 packed format at all, probably due to the fact that many of Blackthorne’s sprites are larger than 32×32. Most of the highly animated sprites use the raw format. Blackthorne does not have scrolling levels (it is screen to screen like the original Prince of Persia), which will make rendering quicker.

One thing I don’t yet fully understand is how the Blizzard engine knows which sprite format and size to draw for each object. There is some information in the level headers which corresponds to this, but it isn’t enough to be able to draw all objects correctly. I suspect that parts of the rendering information are managed by the virtual machine.

Sprite Layouts

As mentioned previously, Blackthorne uses larger sprites than The Lost Vikings. Although an algorithm for rendering arbitrarily sized raw sprites can be written, the Blizzard engine takes a different approach. Larger sprites are rendered by composing multiple smaller sprites in a fixed layout. For example, the titular Blackthorne character uses 32×48 sprites, which consist of two 16×16 sprites for the head, and a single 32×32 sprite for the body:

bt_layout_guy

Note that the background colour used here is actually the index 0 colour for the Blackthorne sprites, which is used as the transparent colour in game. You can view the full set of the Blackthorne player sprites with:

./sprite_view --blackthorne DATA.DAT -fraw -w32 -h48 -l2 -b0x80 0x42

It possible that this approach was taken because Blizzard already had written optimized rendering loops for 16×16 and 32×32 sprites, making it faster to draw large sprites as set of smaller sprites rather than one large sprite.

Tilemap Levels

Both The Lost Vikings and Blackthorne are use tilemap levels with 16×16 tiles (though I think the art in both games does a really good job of making it look like this isn’t the case). The sprites used for the tiles are actually 8×8. Each tile has a structure, which I called a “prefab”, that details how a set of 8×8 sprites from a 16×16 tile. Each component piece can be vertically and/or horizontally flipped, allowing the component tiles to be reused.

For example in the full tileset for first level of The Lost Vikings you can see several tiles like the ladders which have mirrored versions, but also a number of tiles which reuse a single corner piece like the riveted blue panels.

tlv_level1_tileset.png

You can view this tileset with:

./tileset_view DATA.DAT 1

The pack file contains a chunk for each tileset describing the prefabs. Each prefab is 8-bytes long, with each tile encoded as a 16-bit value. For each 16-bit value, the sprite index is encoded as 10-bit value, with the remaining 6 bits used for flags. The flags indicate if the sprite is horizontally and/or vertically flipped, and whether this component of the tile is foreground or background. Encoding a foreground/background bit into the prefabs allows the Blizzard engine to use a single tile map to render both layers.

Blackthone extends the engine the prefabs in two ways. First the three unused flag bits are used as a colour base. The Lost Vikings only allocates 16 colours for all the map tiles. By adding the colour base bits Blackthorne is able to use 128 colours for the tiles, though each individual component sprite of the tiles is still limited to 16 colours.

Blackthorne also adds an additional background map layer. This allows the game to have more detailed backgrounds and skies against the jagged foreground tiles. For simplicitly I’ll refer to the secondary background layer as the sky layer, and the main tilemap as the foreground and background layers. The sky layer does not use the foreground/background flag bits, so can be considered as a single
layer.

The following image shows how the tilemap for the first level of Blackthorne is composited:

bt_level.png

You can view this with (note the level number 3 is used, because levels 1 and 2 are the intro cutscene and practice stage):

./level_view --blackthorne DATA.DAT 3

There are some interesting things to note here:

  • Some parts of the sky layer are completely obscured by the foreground layers. This might be due to Blizzard’s development process or tools.
  • The waterfalls are partially drawn on the sky layer and partially on the background. For example, in the bottom right corner, the base of the waterfall is drawn on the sky layer. This is because there is a rock covering the waterfall in the foreground layer. Because of the shape of the rock, this cannot be drawn on using a single map with correct foreground/background handling.
  • There are some areas of the map the appear to be missing. For example the waterfall in the upper middle abruptly ends. This is because Blackthorne uses static rooms instead of scrolling. The empty areas of the map are never visible when playing the game. On some levels this is quite wasteful as entire screens are unused. Blizzard probably chose to implement the screens on a single tilemap even though the game doesn’t scroll so that they could reuse existing code from The Lost Vikings engine.

Everything is a Level

One clever trick Blizzard has used is that almost everything in the game, splash screens, cutscenes, menus, is a level. This undoubtedly saved on development time. Because all the hard work of implementing code for game levels with animation and moving objects is already done, it makes sense to reuse that same code to handle cutscenes and menus. I believe the game engine also uses the virtual machine (see my previous blog post) to implement animation in the cutscene levels. One notable exception to this is The Lost Vikings splash screen at the beginning of this blog, which is encoded as a large, uncompressed raw sprite.

For example, when the game starts up a number of splash screens are shown. Two of these are encoded into a single level (as two rooms). The left of the image below shows the tileset for the level, while the right hand side shows the level itself. In the game, the silhoutte is shown first (which glows, more on this later). The Blackthorne logo is displayed above the main menu.

bt_splash.png

I’m not sure why there are two copies of the Blackthorne logo tiles. Note that they are slighly different colourisation. The TM is black and inside the logo in the top set, and white and outside the logo in the bottom set for example. The tileset may be reused for a different splash screen at the end of the game?

You can view the tileset and level for this splash screen with:

./tileset_view --blackthorne DATA.DAT 1 -c 0x6e 
./level_view --blackthorne DATA.DAT 1 -h0x6e

Palette Management

Earlier I talked about how sprites only use sixteen colours each, always encoding the values as 0x0 – 0xf. This is used to allow each level to define its own palette while allowing sprites to be reused across levels.

In The Lost Vikings there are several worlds including a space world, prehistoric world, and a wacky candy world. Each has its own colour scheme and enemies. The player controlled vikings only use two sixteen colour palette sets (Olaf and Baleog share a green and yellow colour scheme, Erik has a red and blue scheme). The HUD uses another sixteen colour set, and one is also used for items such as keys and health pickups that are used in all of the worlds. This leaves a sizable portion of the 256 colour palette to be defined by the level. Most of the levels load an 128 colour base palette, followed by a set of eight 16 colour palettes.

The per-level palettes also allow sprites to be reused with a different colour scheme. This was a popular trick in era of 8-bit colour and limited storage space. The Mortal Kombat games famously have a number of palette swapped ninja characters. The following image shows the same dinosaur sprites with the palette setting from two different levels:

tlv_dinosaur

To view the two different versions of the dinosaur sprite:

./sprite_view DATA.DAT -fpacked32 -b0xc0 -l5  0xf8
./sprite_view DATA.DAT -fpacked32 -b0xc0 -l10 0xf8

Note that the level number (-l) and base palette offset (-b) arguments are passed to the sprite viewer program. The level number is used to determine which palette chunks to load by parsing the level header chunk. The base palette index has been determined experimentally. Again, this is something I don’t fully understand in the Blizzard engine, and again I suspect that the base palette index may be specified by instructions in the object’s virtual machine program.

Palette Animations

Another popular trick from the DOS days is animating graphics by changing colours in the palette. Animating an object by redrawing the pixels is quite expensive, especially for something large than needs to be updated frequently like the background waterfalls in Blackthorne. Instead of modifying the pixels themselves it is much less expensive to modify colours in the palette, which is immediately updates all pixels of the corresponding colour. This technique is mostly useful for simple cyclical animations like waterfalls and the blinking lights in The Lost Vikings spaceship levels. As mentioned previosly, palette animations are used in Blackthorne to make the silhoutte glow in the startup splash screen.

The Blizzard engine implements palette animations in the level header. Each palette animator has a 8-bit speed value and two 8-bit colour index values. If the two indexes are not equal then the animation cycles the colour values between the two indexes. This format is good for animating things like the waterfalls and blinking lights which appear to move in a pattern.

If the two indexes are equal, then the palette animation is for a single colour and the level header specifies a list of 16-bit values for the animation. The 16-bit values each encode a colour value in RGB-555 format (meaning 5 bits per colour, so one bit wasted). The normal VGA palette, and the Blizzard engine is capable of 6 bits per colour. The palette animations lose the least significant bit by shifting each of the colour values left by one. This palette animation format is useful for animating something like a pulsing light.

You can press ‘A’ in the level viewer to show the palette animations.

Game Over

That’s all for now:

./level_view DATA.DAT 48
./level_view --blackthorne DATA.DAT 23

game_over.png

Recompiling the Lost Vikings

In my previous blog post I talked about reverse engineering the virtual machine used to implement objects in the Lost Vikings. The Lost Vikings was released in 1993 by Silicon and Synapse, now better know as Blizzard.

At the time I empirically tested some opcodes by manually patching the original data chunks with a hex editor. This works, but it fairly tedious, and doesn’t scale well to experimenting with larger programs, or programs that contain loops or jumps. In the first blog post I suggested that creating a simple language and compiler would be useful for further reverse engineering the virtual machine. So, I did.

Building a Compiler

To assist reverse engineering the virtual machine, and also allow for easy creation of new programs I decided to build a compiler. To keep things simple I opted for a single pass, recursive descent parser.

The compiler design is very much based on the PL/0 type compilers often taught in university compiler courses (see PL/0). I won’t go into all the details of writing a basic compiler as there are tons of resources available which already cover this. Basically the compiler takes a source program, performs lexical analysis to create a stream of tokens, and then parses the program, generating code as it goes.

Lost Vikings C

I created a very simple C-like language for implementing object programs in. One advantage of using a C-like language is being able to reuse the existing C preprocessor to support defining constants and macros. The language does not allow for defining variables, you are restricted to the object fields and globals defined by the virtual machine. Built in functions provide support for generating opcodes such yield and spawn object operations. The language makes a few minor deviations from C just to keep the lexer and parser simple.

Sticking with the gun turret object from the first level that I modified in the first blog post, a turret which shoots arrows can be implemented in Lost Vikings C as follows:

#include "vikings.h"

#define timer      field_30
#define DELAY_TIME 40

function main
{
    call set_gfx_prog(0x37bc);

    this.timer = 0;
    while (this.timer != 0xffff) {
        call update_obj();
        call yield();

        if (this.timer != 0) {
            this.timer = this.timer - 1;

        } else {
            // Calculate the offset of the projectile based on the
            // turret's direction.
            if (this.flags & OBJ_FLAG_FLIP_HORIZ) {
                g_tmp_a = this.x - 12;
            } else {
                g_tmp_a = this.x + 12;
            }
            g_tmp_b = this.y - 12;

            // Update the turret animation
            call set_gfx_prog(0x37c8);

            // Spawn an arrow projectile
            call spawn_obj(g_tmp_a, g_tmp_b, 0, 0, 7);

            // Delay for next shot
            this.timer = DELAY_TIME;
        }
    }
}

Each object needs a main loop, which should never terminate. The main loop must call the yield() function to allow the virtual machine processing loop to exit. Failing to do this will cause the game engine to hang. The update_obj() call does basically what it says.

Many of the object fields uses are left up to the specific object. In this case I’ve used field_30 as a timer to control how quickly the turret shoots. When the timer hits zero, the turret fires, and then rearms the timer.

The only thing left to do now is generate code for this program.

Code Generation

The Lost Vikings virtual machine is not an ideal target for a simple compiler. PL/0 type compilers typically generate code for an abstract stack machine, so a simple statement like:

this.x = this.y + 1;

Would generate code like:

push this.y    ; push this.y onto the stack
push 1         ; push 1 onto the stack
add            ; pop the top two values, add them and push the result
pop this.x     ; pop the result into this.x

The Lost Vikings Virtual Machine however does not have a stack. It has a single temporary register, object fields and globals. The above program could be translated as follows for the Lost Vikings virtual machine:

52 20          ; var = this.y
56 1e          ; this.x = var
51 0001        ; var = 0x0001
59 1e          ; this.y += var

This is much more difficult to generate code for when compiling from a general purpose language. It would probably require generating an intermediate representation in the first pass, and then re-ordering instructions, etc in a second pass to generate the final code.

Abstract Machine

The solution I came up with was to build an abstract stack machine, which could be easily compiled for on top of the Lost Vikings virtual machine. This is made possible by the opcodes for loading and storing globals. These opcodes take an offset in the games data segment (DS) as their operand. This allows the opcodes to be used to load or store any arbitrary address in DS.

The compiler generates code by placing a fake stack at a high location (0xf004) in the data segment. Two locations at the base of the fake stack are reserved for special register: 0xf000 is a zero register, and 0xf002 is a flag register used for comparisons. The above statement can now be compiled as:

52 20        ; var = this.x
57 f004      ; ds[f004] = var
51 0001      ; var = 0x0001
57 f006      ; ds[f006] = var
53 f006      ; var = ds[f006]
5a f004      ; ds[f004] += var
53 f004      ; var = ds[f004]
56 1e        ; this.y = var

While not particularly high performance this code is easy to generate. My goal is to build a compiler for testing small programs out to determine what opcodes or combinations of opcodes do. I don’t care too much for efficiency, either in time or code size, for the moment.

A second problem in crafting a language for the original virtual machine is that many of the opcodes conditionally call a function. For example opcode 0x1a checks if the current object has collided with a viking, and if so emits a call instruction. It is more desirable to let the programmer decide whether to use a call or a jump (e.g. an if statement).

I implemented this by emitting code for a generic helper function which sets a flag register and returns. When generating code for an opcode which makes a conditional call the flag register is first cleared. If the opcode makes the call then the flag register will be set. The flag can then be tested to emit a jump call. So, the following code:

if (call collided_with_viking(0x01)) {
    this.x = 1;
}

Generates the following code:

[c009] 51 0000      ; var = 0x0000
[c00c] 57 f002      ; ds[f002] = var, clear the flag register
[c00f] 1a 01 c0a5   ; if (collided_with_viking(0x01)) call c0a5
[c013] 53 f002      ; var = ds[f002]
[c016] 74 f000 c026 ; if (var == ds[f000]) goto c026
[c01b] 51 0001      ; var = 0x0001
[c01e] 57 f004      ; ds[f004] = var
[c021] 53 f004      ; var = ds[f004]
[c024] 56 1e        ; this.field_x = var
[c026] ...          ; next instruction after the if block

; Set flag helper function
[c0a5] 51 0001      ; var = 0x0001
[c0a8] 57 f002      ; ds[f002] = var, set the flag register
[c0ab] 06           ; return

The zero register is used here to compare the flag value against. The compiler emits instructions at the beginning of each program to clear the zero register.

Patching Programs

The virtual machine programs for a single world in the game are all packed into a single chunk in the data file. The compiler takes the simple approach of appending the generated code to the end of the chunk and then modifying the object’s program header to point to the patched in program.

This all works in theory, but there is one slight problem. The game stores the virtual machine programs in the extra segment (ES) at a location which is allocated using the DOS memory allocation API. The memory allocation function looks like this:

alloc_mem_func

The call to allocate memory for the virtual machine programs looks like this:

call_alloc_mem

So 0xc000 bytes (48K) is allocated for the programs. The problem is that the chunk for the space world programs is already 48972 bytes in size, leaving only 180 bytes at the end to patch a new program in. Not much when dealing with a compiler which generates extremely verbose code.

I discovered this problem when attempting to compile a larger program. The game started behaving erratically, either hanging or the gun turret object would randomly vanish. These sorts of issues can be difficult to chase down because it could be a bug in my compiler or generated code, a misunderstanding of how some opcode is meant to work, or a bug or limitation in the original game.

The quick solution is to patch the game binary to extend the size of the program allocation. Raising the allocation size to 0xd000 bytes (52k) works fine, and should be plenty of extra space for experimenting with simple programs. I’ve included instructions for this with the compiler.

Empirical Reversing

The goal in building a compiler was to make it easier to reverse the virtual machine. Testing a new opcode involves adding an entry to a dictionary of functions in the code generator class and providing a function for emitting instructions for the opcode. For example, one of the first opcodes I experimented with was 0x41, which appears in a few disassembled programs like the button/question box object.

It’s dictionary entry in the compiler initially looked like:

"vm_func_41" : (False, 4, emit_builtin_vm_func_41),

The tuple specifies whether the function returns a value, the number of arguments it takes and the code emitter function. Functions are marked as returning a value if the opcode conditionally makes a call or jump.

I knew from looking at the opcode in IDA that opcode 0x41 was unconditionally executed and took four arguments. Its arguments use the variable type encoding that I discussed in the previous blog post. Its emit function looks like this:

def emit_builtin_vm_func_41(self, reg_list):
    operands = self.pack_args(reg_list, 4)
    self.emit(0x41, *operands)

The pack_args helper function handles the packing of variable type arguments.

It can now be called in a quick test program. For testing I use the collided_with_viking function (opcode 0x1a) so that the opcode I am testing is only triggered when a viking touches the turret. Note that the collided_with_viking function takes a single byte operand which I don’t know the use of, but the value 0x01 seems to work. I also use field_32 of the object as a one-time trigger so that the opcode I am testing is only run the first time a viking touches the object.

My test program looks like this:

function main
{
    call set_gfx_prog(0x37bc);

    this.field_32 = 0;
    while (this.field_32 != 0xffff) {
        call update_obj();
        call yield();

        if (call collided_with_viking(0x01)) {
            if (this.field_32 == 0) {
                call vm_func_41(1, 1, 1, 1);
                this.field_32 = 1;
            }
        }
    }
}

This results in the following graphical corruption in game:

dialog_corrupt_gfx.png

It appears the opcode is used to display a dialog box, but it is unclear why the graphical corruption is occurring. Looking at some of the disassembled programs from the original game shows that opcode 0x41 is usually followed by the unknown opcodes 0xcb and 0x42, neither of which take any arguments. Adding builtins for those opcodes to the compiler and re-running results in a the game showing the dialog box, waiting for a keypress, and then the removing the dialog box. So, opcode 0x41 is show dialog, opcode 0xcb is wait for key, and opcode 0x42 is clear dialog.

Further experimentation with opcode 0x41 shows that its first argument is the index of the string for the dialog. The strings are, somewhat unfortunately for modding, stored in the game binary. The second argument is still unknown, and the third and fourth arguments control the x and y offset, relative to the object to display the dialog at.

A good example of why this type of experimental reversing can be useful is to look at the opcode 0xcb in IDA:

func_cb.png

Although this function is small, and what it does is clear (it modifies some globals), it isn’t obvious what effect that has on the game engine. Looking at the cross references for each of the globals in IDA isn’t particularly helpful either. Each of the globals is used in a number of places, none of which make their use immediately obvious. I could have spent a long time in IDA attempting to figure out that setting those globals tells the game engine to wait for a keypress on the next game loop. Empirical testing allowed me to figure it out very quickly. The downside is that I still don’t actually know the exact purpose of each of the globals in the function.

Interesting Fields

One of the interesting things I discovered by experimenting with the compiler is how objects can reference each other. I already knew from my initial reversing that there was some support for this, but the compiler allowed me to quickly work out more of the details.

Field 0x3c in each object specifies a current target object. Some opcodes, such as the collided_with_viking (0x1a) opcode will automatically set this field, but it can also be set manually. The vikings are always objects 0 (Baleog), 1 (Erik) and 2 (Olaf). One a target object is set it can be manipulated via the target fields. Each opcode has variants for modifying the current object field, or the target field. For example opcode 0x59 adds the temporary register to a field in the current object, whereas opcode 0x5b adds the temporary register to a field in the target object.

Other objects can manipulate fields in the viking objects to control their behaviour. For example field 0x32 for the vikings is the amount of pending damage. An object can damage a viking by adding to that field. The vikings themselves are partially implemented by programs in the virtual machine. The next time the viking’s program runs it will check the pending damage field apply damage accordingly.

Other fields of interest are 0x12 and 0x14, which are the x and y velocity for all objects. Prior to writing the compiler I had thought that objects were moved either by adding or subtracting from their x and y offset fields (0x1e and 0x20 respectively) or possibly using a function type opcode. Instead the game uses a pair of velocity fields. Some objects, such as the vikings, will automatically modify their own velocity. So, for example, if an object sets the a vikings velocity to propel it upwards, then on subsequent game loops the viking will adjust its own velocity to cause it to fall back down again.

It seems that there weren’t enough fields in the viking objects to store everything, so each of the vikings four inventory slots are globals. An object can test if an inventory slot is zero by testing if the corresponding global is zero, and give an item to a viking by directly assigning to an inventory slot global.

Advanced Game Hacking

To experiment with the virtual machine, and demonstrate how flexible it is I wrote a slightly more complex program for the gun turret in the first level. The turret checks which viking has touched it and reacts differently. If Erik touches the turret it gives him items. If Olaf touches it he gets bounced into the air. When Baleog touches it the turret flips around. It looks like this in action:

It’s pretty impressive to see this level of flexibility in a game developed in the early nineties. The source code for this program is included with the compiler in the examples directory.

Future Work

There is still a large number of opcodes, fields and globals that I do not yet understand the function of. The compiler has lots of missing features, for example it currently only supports the == and != comparison operators, and lacks most of the bitwise operators. It’s also a bit cumbersome to use, since it requires separately unpacking the original program chunk from the game data file, patching a new program in, and repacking the data file.

The tools are all open source/public domain, and available on github at: https://github.com/RyanMallon/TheLostVikingsTools. You can also follow me on twitter @ryiron.

Finding The Lost Vikings – Reversing a Virtual Machine

After having fun reverse engineering the Comprehend Adventure game (Recomprehend) I was looking for a new DOS game reverse engineering project. Over the years various people have reversed a lot of the old popular games and published specifications and tools for them. The website shikadi.net for example has a of load of information about games I played as a kid.

I discovered that the Blizzard (then Silicon and Synapse) game The Lost Vikings didn’t seem to have had any serious reverse engineering attempts. It was released in 1993 at the tail end of the DOS era, and was a game I really enjoyed playing when I was younger. The Lost Vikings is a puzzle/platformer where the player controls three vikings with different skills. The vikings need to combine their skills to solve their way through a number of themed worlds including a spaceship, a prehistoric world and an Egyptian world. The following image shows the first level of the game (source: Strategy Wiki):

tlv_map01

The game seemed like it would be relatively simple to reverse. The levels are tile map based and contain some simple puzzles such as buttons that activate or disable objects, movable boxes, and a crane that can pick up objects. Indeed, much of the reverse engineering project was straight forward. There is a single data pack file, which contains compressed file chunks. The chunks encode various game assets such as sprites, maps, sounds, etc. I’ve written a collection of tools which can be used to view the game assets: The Lost Vikings Tools.

The Virtual Machine

One interesting aspect of how the game engine works is that the objects in the game use a template class system. There is a chunk in the data file for each world that defines a set of object classes. For example, in the first level image above both of the hazard doors are class type 0x4f. Reverse engineering the object class code lead to this function:

func_caller

Ignoring for now the alternative path into location 0x142a2, this code is using si as an index into the object classes array structure (each class template is 0x15 bytes). The word at offset 0x3 in this structure is used as an address in the ES segment. The ES segment actually contains the entire object class template chunk data. The code then moves into the loop at address 0x142a6. This loop fetches the next byte from the ES segment and uses it as an index to a function table. At each step in the loop bx contains the address in the ES segment, and si contains the current opcode.

An interesting thing to note here is that the loop is using an unconditional jump and therefore will never terminate. At least, not normally. It appears that each object class has some form of opcode based program associated with it, which is interpreted by this function. My initial approach was to just start looking at some of the functions in the call table. The first one looks like this in IDA’s graph mode:

vm_func_01_graph

IDA’s stack analysis has failed here, and with good reason. Popping ax as the first instruction in a function is pretty weird. The x86 call instruction pushes the instruction pointer (ip) onto the stack, and the ret instruction pops it to return to the calling function. The pop instruction here is effectively discarding the return address, which means the next ret instruction will jump back two stack frames instead of one. This opcode will exit the infinite interpreter loop.

To see how the opcode is really implemented we need to switch to the linear view in IDA:

vm_func_01_linear

I’ve annotated the names of the functions with their corresponding opcode numbers to make this code section clearer. There is a little bit of clever code reuse going on here. It’s easiest to start by looking at opcode 0x03. Recall the the loop which is handling opcodes has the object classes chunk data loaded in the ES segment, and bx is being used as the VM instruction pointer. Opcode 0x03 is therefore an unconditional jump instruction. It loads the word at the current instruction pointer, sets the instruction pointer to it, and then returns to the opcode handler loop.

Working backwards, opcode 0x05 skips the next word operand and stores the next address into an array. Reversing other functions shows that word_28522 is the index of the current object instance, so this opcode can store a single address for each object in the level. It then restores the value of bx and falls through into the code for opcode 0x03 (jump). So this opcode appears to be a very limited call instruction with only a single level of stack.

Opcode 0x00 is storing the current instruction pointer in a global array. Note that this is different to opcode 0x05 which is storing the address after the call address operand. It then falls into the call opcode handler. The instruction however won’t actually be executed because of the initial pop instruction. Instead the calling loop will exit. The address saved by this instruction is used as the alternate entry path to the function calling loop that I ignore earlier. Opcode 0x00 is there acting as a sort of co-routine style exit. It stores the current position in the program, and exits the opcode handler loop, allowing the game engine to do other things such as update graphic, receive input, etc before returning where the virtual machine program left of. This allows a virtual machine program to perform complex tasks without stalling the rest of the game engine.

Disassembling Opcodes

At this point I had a general feel for how the virtual machine opcode handlers worked. Looking at the function call table there are around 215 implemented opcodes, so rather than just reversing them in order I started writing a simple script to decompile a program for a given object class so that I could focus on just the opcodes being called by objects in the first level.

At this point I am mostly trying to determine how many operands an opcode has, and what its basic functionality is. If an opcode handler is a short then I will generally try to fully understand what it does. If it is more than a few blocks of code then I will just try to figure out how many operands it has and whether or not it does a conditional jump or call. The reason for this is that sometimes an opcode’s function because apparent from surrounding context in the disassembled program, which is often quicker than reversing the code.

The following is a simple opcode:

vm_func_51

This opcode take a word operand and stores it in a global. Reversing other functions shows that this global is a general purpose temporary storage or register. The virtual machine only has one such register, but it also has this opcode:

vm_func_57

This opcode stores the current value of the general purpose register at the address in the data segment (DS) specified by the word operand. The programs in the game’s data file use this opcode to write to specific parts of the game data such as the inventory slots for the vikings. A few DS offsets are also used as additional temporary values when more than one is required. Offsets 0x0206 and 0x0208 are often used as temporary x and y coordinates of an object for example.

What makes this opcode and its counterpart opcode 0x53 which reads from DS into the general purpose register interesting is that it can read and write any game data in DS, not just what was intended by the original game creators. It makes extending the original game an interesting prospect.

A more complex opcode is 0x14:

vm_func_14

At first glance this looks straight forward to reverse. It fetches a byte operand, and calls a pair of sub-functions. It then fetches another byte operand and calls the same sub-functions again. Looking at the first sub-function (sub_15473) though things get a bit more complicated:

vm_func_14_internal

IDA’s stack analysis has failed again, and it shows four exits from this block which I have cropped out because they are all incorrect. What is actually happening is that the first byte operand is passed into this function in ax, which is then used as an index into a jump table. Although ax is and’ed with 7, there are actually only 4 valid entries in the jump table. Looking at type 0 in the jump table we see:

vm_func_14_jump_00

Which loads an immediate word operand. This function snippet does a retn, which returns from the top-level opcode 0x14 handler. It’s not hard to see why IDA is struggling with disassembling this code correctly.

The other entries in the jump table load values in different ways, and then each also do a retn. Type 1 takes a byte operand which is used an object field index. Object has fields for things like the x and y offset, flags, etc. Type 2 takes a word operand and loads a value from DS (like opcode 0x53 above). Type 3 takes a byte operand and loads the field of a target object for actions such as collision detection.

Moving back to the top-level opcode 0x14 handler the next thing which is called is sub_15470. This is only three bytes before the first sub-function which was called. Again it is a clever code optimisation. This sub-function simply shifts ax right by 3 and then falls into the code we just reversed above. So the first byte operand of opcode 0x14 is used to encode how two further arguments are fetched. Reversing the handling of the second byte operand shows that it functions the same way.

After fetching all its (variable length) operands the handler for opcode 0x14 then calls sub_13809. I won’t paste the code here because the function is somewhat large, and calls a bunch of other sub-functions, all of which are also large. This is one of those instances where I just didn’t bother reversing what the function did because I was later able to guess it from context.

CISC Virtual Machine

The instructions in The Lost Vikings virtual machine are variable length. In the case of opcode 0x14, and other similar opcodes, determining the length requires decoding some of the operands. Having variable length instructions means that programs need to be iteratively disassembled, determining at least the number of operands for each new opcode in the program. If the opcodes where all fixed length it would be possible to skip opcodes that were unknown.

There are also instructions which perform common operations that many programs use. For example many programs add or subtract from the current position of the object based on its direction. This could be encoded as several instructions, but opcode 0x91 takes a single word operand as an offset in DS and then adds or subtracts the current temporary register value from it based on the current objects horizontal direction (flag 0x0040). The following pseudo code illustrates the operation of opcode 0x90.

if (this.flags & 0x0040)
    DS[operand] -= var;
else
    DS[operand] += var;

This results in far fewer bytes in the encoding, especially since this operation is performed by several of the programs. Back in the DOS days when disk space and computation time where at a premium this makes perfect sense. For the modern day reverser it simply adds a bunch more opcodes that need to be understood.

Disassembly Tool

I wrote a simple Python script which can disassemble the virtual machine programs into something that looks vaguely like C. It works by linearly decoding instructions in the program, storing jump and call addresses. When it hits a function which terminates or redirects code-flow, such as a return or unconditional jump, it stops and checks to see if there are any unvisited addresses in the program.

The developers have used similar optimisation tricks in the virtual machine programs to the ones used in game binary. For example, some exit paths of subroutines in the virtual machine programs are reused by jumping to them from other subroutines. The disassembler works around this by decoding the instructions twice, so two subroutines may have code with the same addresses.

The disassembler is able to squash accesses to the temporary register to make the resulting code more readable. For example the virtual machine programs often contain sequences like:

var = 0x1000;
obj->field_10 = var;

Which the disassembler will refactor to:

obj->field_10 = 0x1000;

The disassembler does not attempt to refactor code blocks into constructs like if or while blocks, so the resulting programs are spaghetti code. Some of the larger programs become a reversing project in their own right.

A complete program

I started with the program for the gun turret in the top-left of the first (object class 0x04). It seemed like it should be a fairly simple program. The gun turret shoots bullets at a fixed interval, and it cannot be destroyed by the vikings. The following shows the output of my dissassembler. Above each opcode is a comment showing the address in square brackets, the opcode in parens and then the operands.

The turret program looks like this:

main_374d:
{
    // [374d] (19) 37bc
    this.field_21 = 0x37bc;
    this.field_22 = 0x0001;

    // [3750] (97) 00 01
    if (0x0001 & (1 << 0))
        var = 0x0001;

    // [3753] (00)
    this.save_state();

    // [3754] (9c) 18 08
    if (!(var))
        this.db_flags &= ~(1 << 12);

label_3757:
    // [3757] (2f)
    vm_func_2f();

    // [3758] (00)
    this.save_state();

    // [3759] (01)
    nop();

    // [375a] (51) 0000
    // [375d] (8c) 30 3798
    if (this.field_30 != 0x0000)
        call sub_3798;

    // [3761] (52) 1c
    // [3763] (77) 0000 3757
    if (this.update_time != 0x0000)
        jump label_3757;

    // [3768] (19) 37c8
    this.field_21 = 0x37c8;
    this.field_22 = 0x0001;

    // [376b] (52) 1e
    // [376d] (57) 0206
    g_tmp_a = this.xoff;

    // [3770] (51) 0004
    // [3773] (91) 0206
    if (this.flags & 0x0040)
        g_tmp_a -= 0x0004;
    else
        g_tmp_a += 0x0004;

    // [3776] (52) 20
    // [3778] (57) 0208
    g_tmp_b = this.yoff;

    // [377b] (51) fff8
    // [377e] (5a) 0208
    g_tmp_b += 0xfff8;

    // [3781] (51) 001e
    // [3784] (56) 1c
    this.update_time = 0x001e;

    // [3786] (02) 3030
    vm_func_02(0x3030);

    // [3789] (14) 12 0206 0208 00 0000 0000 05
    new.x = g_tmp_a;
    new.y = g_tmp_b;
    new.unknown_a = 0x0000;
    new.unknown_b = 0x0000;
    new.unknown_c &= 0x801;
    new.type = 0x05;
    spawn_obj(new);

    // [3795] (03) 3757
    jump label_3757;

}

sub_3798:
{
    // [3798] (51) 000a
    // [379b] (73) 30 37a5
    if (this.field_30 == 0x000a)
        jump label_37a5;

    // [379f] (51) 0000
    // [37a2] (56) 30
    this.field_30 = 0x0000;

    // [37a4] (06)
    return;

label_37a5:
    // [37a5] (52) 32
    // [37a7] (69) 0a 37ba
    if (this.argument < this.field_32)
        jump label_37ba;

    // [37ab] (52) 32
    // [37ad] (5c) 0a
    this.argument -= this.field_32;

    // [37af] (51) 0000
    // [37b2] (56) 30
    this.field_30 = 0x0000;

    // [37b4] (51) 0000
    // [37b7] (56) 32
    this.field_32 = 0x0000;

    // [37b9] (06)
    return;

label_37ba:
    // [37ba] (0e)
    vm_func_0e();

    // [37bb] (10)
    this.update();
}

There are a few things to note here:

  • The this.save_state() function is opcode 0x00. It acts a yield allowing the program to save where it was and return to the game engine code.
  • My dissasembler names some of the object fields that I had worked out. In this program you can see the xoff, yoff, flags and update_time are named.
  • Similarly DS offsets that are known are also named. g_tmp_a and g_tmp_b are DS offsets 0x0206 and 0x0208.
  • Opcode 0x14 is used here. I’ve named it spawn_obj(), because from context it is used to spawn a new bullet object each time the turret fires.
  • Opcode 0x19 is assigning to object fields 0x21 and 0x22. Notably field 0x21 is being assigned a value that looks like a program address. From reading the code/experimentation this address is actually an offset for instructions in a smaller secondary virtual machine which controls how the objects graphics are displayed. I’ve not yet reversed this code.

Refactoring the program makes it easier to understand what is going on:

main_374d:
{
    /* Turret initialisation */
    this.set_graphics_prog(0x37bc);

    if (0x0001 & (1 << 0))
        var = 0x0001;

    this.save_state();

    if (!(var))
        this.db_flags &= ~(1 << 12);

label_3757:
    /*
     * Main loop for the turret. This yields to the main
     * game engine on each iteration.
     */
    while (1) {
        vm_func_2f();
        this.save_state();

        /*
         * The functionality of this is unknown. The this.argument
         * field can be set for an object instance in the level
         * header. It may be used to make some turrets behave
         * slightly differently.
         */
        if (this.field_30 != 0x0000) {
            if (this.field_30 != 0x000a) {
                this.field_30 = 0x0000;
            } else {
                if (this.argument < this.field_32) {
                     vm_func_0e();
                     this.update();
                } else {
                    this.argument -= this.field_32;
                    this.field_30 = 0x0000;
                    this.field_32 = 0x0000;
                }
            }
        }

        /* If the update time reaches zero then fire a bullet */
        if (this.update_time == 0x0000) {
            this.set_graphics_prog(0x37c8);
             
            /* Reset the update time */
            this.update_time = 0x001e;
        
            vm_func_02(0x3030);
                          
            /*
             * Spawn a bullet object 4 pixels in front of the
             * turret and 8 pixels above its center line.
             */
            if (this.flags & OBJ_FLAG_FLIP_HORIZONTAL)
                new.x -= 4;
            else
                new.x += 4;
            new->y = this.yoff - 8;
            new->unknown_a = 0x0000;
            new->unknown_b = 0x0000;
            new->unknown_c &= 0x801;
            new->type_d = 0x05;
            spawn_obj(new);
        }
    }
}

Note that there are still a few sections of the program and some opcodes that I have not identified the function of, but the behaviour of the turret can be easily seen. The update_time object field is used to encode the rate the turret fires at. The program itself does not (at least obviously) decrement this value. Possibly one of the unknown opcodes performs this operation, or some part of the main game engine may do it.

Game Hacking

Being able to view the original programs used by the game is interesting, but the cool part about having a virtual machine implementing the object behaviour is the possibility of modifying or creating new programs. This would be much easier if I were to write a compiler (todo list), but for now this requires poking values manually in a hex editor.

Again I chose to modify the gun turret program since it is quite simple. My first idea was to change the object type that the turret shoots. I had mixed success with this. Attempting to make it shoot some objects, like the green alien, will just result in the game crashing or freezing. Other object types just result in the turret shooting nothing. I was able to make it shoot fire arrows (these are normally a power-up that one of the vikings can use). It is also easy to make the turret shoot more quickly, by lowering the value of the update_time field.

The second modification I made a bit more ambitious: making the turret walk instead of shoot. The opcode (0x14) for spawning the bullet is quite long:

14 12 0206 0208 00 0000 0000 05

The virtual machine (quite generously) provides a single byte nop instruction (0x01), which can be used to remove this instruction. The instructions above have already set DS[0x0206] to the current x position of the turret plus or minus 4 based on its direction, so I just needed to add instructions to set the current x position to that value. This requires only two instructions:

53 0206   // var = DS[0x0206];
56 1e     // this.x = var;

With this change the turret will walk, but will not be obstructed by anything. I did attempt to make it do collision detection and turn around, but I couldn’t get it work properly. More reversing required.

I put together a small video showing the modifications:

I’ve also made my dissassembler tool available:

https://github.com/RyanMallon/TheLostVikingsTools/blob/master/dissassembler/lv_vm_disasm.py