Reverse Engineering Stranded

It’s time to start reversing another text adventure. For this one I wanted to go back to one I remember playing when I was a sprog: Stranded, published on the BBC Micro by Superior Software and by English Software on other platforms.

It isn’t the best adventure, but I remember it, mainly because it was one of the few BBC text adventures with some graphics.

I’m starting this from the beginning to demonstrate exactly how I go about reversing adventures.

Starting

The first step is to look at the files and gain an idea of how the data is structured. This involved a short play through to remind myself of the game and check for the complexity of the parser and size of descriptions etc.

Looking at the files, the disk image contains two real files:

  • Strand, the loader which loads in at &2500 and draws the intro image before *RUNing Strand1
  • STR1_2, which contains everything for the adventure

The adventure is unusual because it runs in screen MODE 1, this is a rarely used screen mode, because it provides a high (for the 1980s) resolution of 320 x 256; which, although is not the highest the BBC can handle, is higher than the C64 (320 x 200) or the Spectrum (256 x 192). There are two downsides to MODE 1: it only allows 4 colours and it uses a massive 20 KB of memory just for the screen.

This means that once the screen has been taken into account, there is only 12 KB of memory left on the Beeb. That’s 12 KB for the whole game, including graphics, code, descriptions and all the gubbins that the OS needs!

The file is close to that size: being &2c28 bytes in size: which is a tad over 11 KB. It loads into &1100, which is about as low as you can go with DFS. This causes some over-writing of screen memory. It then executes code from &3d00 (which is in screen memory) to move everything from &1100 to &400.

File analysis

I normally start this by looking from the hex dump of the file and looking for text, verbs etc.

At &380 are the OSCLI commands for saving the game. The save is useful because it tells us where the game flags and inventory will be in memory:

*SAVE"ADVDATA"B2F B38

So the memory between &B2F and &B38 (or &72F to &738 in the file). This is rather small for an adventure, so if probably just the object locations and therefore the game may not have any flags. Using a debugger on a BBC emulator shows that this memory space does indeed store the object locations, with the default locations stored in &738

I can see verbs and nouns starting at &500:

Hex dump showing verbs and nouns
Hexdump of STR1_2

The list is unusual as it looks like whole commands are defined, with both words lock to four characters. We can see:

"GO NORT"
"PICKLOCK"
"INVE "

This may make it easier later to work out actions.

These are immediately followed, at &660, by the list of objects with NOUN names:

PARAFUELLASELOCKKEY SUITCRYS

Once again, these are four characters in length.

After this, at &688 are the long forms of the object names, each separated by a &FF byte. Followed at &6E0 by the names of the directions (probably used for displaying).

The system messages and intro text can be found at &1580.

System Messages

We’re missing one important thing: the description locations, which means they’re compressed or obfuscated. Normally I’d expect to see a dictionary of words, but can’t. There is one clue though, at &760, for &40 bytes there is the string:

" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12345'!()."

This looks like a lookup character table, it’s &40 bytes long, so this means that you only need 6 bits for each character.

At this point we can search through the text for known text patterns. We know that the description of the starting room is:

"You are standing on a barren plateau. Exits lead SOUTH EAST WEST"

If we assume that the string at &760 is a lookup table, we can manually plot out the start of the string to:

Y = &19
o = &29
u = &2f

A hex search for these bytes finds them at &f68, with a &FF separater; looks like this has legs! We can look for words, if we process the whole file through the lookup table, using a simple bit of python:

from future import print_function
dictionary=" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12345'!().\n"

fin=open("STR1_2","rb")
fin.seek(0x900)
data=fin.read()

for x in data:
  if ord(x) == 0xff:
  print()
  char=ord(x) & 0x3f
  print(dictionary[char], end='')
  if (ord(x) & 0x40):
    print(" ", end='')

I’ve tuned this to start at &900, which is the start of the descriptions:

C:\Users\dave\Desktop\Stranded>decode.py | more
A B C D E barren plateau.
A B D E rocky mountain path.
A B D F top G E steep H.
I space shuttle J visible below.
A B D E ledge halfway down F
H face.
[…]
You
are
standing
on
a
the

And here we can sell the room descriptions and messages. Note the A B C D E in the first description, this matches with the words in the room description. So we can assume the strings at 0xf68 are a dictionary of words for compression.

Looking at &900 in a hex editor, we can see those A B C D E all have bit 7 set:

C1 C2 C3 C4 C5 1C

So we can make an assumption of how the strings are created:

for byte in data:
  if byte == 0xff:
    # print the sentence
    print(sentence)
    sentence=""
  else if byte & 0x80:
    # it's a dictionary word
    sentence+=dict[byte & 0x3f]
  else if byte & 0x40:
    # add a space to the end
    sentence+=" "
  else
    # read from the table
    sentence+=table[byte & 0x3f]

A quick bit of python allows the extraction of all these strings, which show them to be not just the room descriptions, but also the messages.

C:\Users\dave\Desktop\Stranded>decode.py
1: You are standing on a barren plateau.
2: You are on a rocky mountain path.
3: You are on the top of a steep cliff.
A space shuttle is visible below.
4: You are on a ledge halfway down the
cliff face.
5: You are at the base of the cliff.
6: You hit the ground with a thud.

There are also some terrible spelling errors, including “vegitation” and “does’nt”.

An assumption must be made: as the descriptions are just messages, then somewhere there needs to be a matching of rooms to description. This will probably have the exits as well. The game supports six directions: N, S, E, W, U, D.

This means we’re looking for some data that will say room 1 has a description string of 1 and exits that match the following (I’m making an assumption that rooms start from 1):

  • SOUTH to room 3
  • EAST to room 2
  • WEST to room 42 (or a room with description 42)

And there’s a possibility at the start of the file:

01 00 00 03 03 02 02 03 04 3D FF 02 01 00 03 05
01 06 06 FF 03 02 00 03 01 01 C1 07 04 81 07 07

This springs out as:

01 description
00 picture
00
03 Flags, 0x2 = CLS
03 02 EAST to room 02
02 03 SOUTH to room 03
04 3D WEST to room 61
02 description
01 picture
00 03 unknown
05 01 UP to room 01
06 06 DOWN to room 06

This mostly works, but there’re some strange extra bytes on some entries, e.g. room 3 has:

03 02 00 03 01 01 C1 07 04 81 07 07

That C1 07 04 81 07 07 doesn’t match to anything and there are similar in other rooms. Messing with the value in an emulator doesn’t do anything exciting, so this may have to be something we do when we bring out the disassembler.

I’d recently moved to Ghidra as my primary disassembler, and as I was trying to get used to it, I used this to try and work out how stuff worked. So first off I loaded the STR1_2 binary into Ghidra and set it load where it was relocated (&0400) then I set a variable Start to the execution address at &3100 and then let Ghidra analyse from there.

At that point it took many hours of stepping through the code and trying to understand what the code meant.

The easiest way of doing this was to search for the OS calls, as they are easy to find. I spent a while entering the labels and then following references. The majority of OS calls are OSWRCH (OS Write Character) and OSASCI (OS ASCII, which is basically an OSWRCH that performs a linefeed if it sees a CR. There are a lot of them and most will be around messages.

It was easier to look for some of the other calls, especially OSWORD, this performs multiple tasks from a defined block, one of these tasks is to read keyboard data into a buffer. A reference search of OSWORD leads to two sources, where we have:

        1442 a2 50           LDX        #0x50
        1444 a0 00           LDY        #0x0
        1446 a9 00           LDA        #0x0
        1448 20 f1 ff        JSR        OSWORD

And

        1cf0 a9 09           LDA        #0x9
        1cf2 a2 51           LDX        #0x51
        1cf4 a0 00           LDY        #0x0
        1cf6 20 f1 ff        JSR        OSWORD

So OSWORD takes the type of operation in A (in this case 0x00 and 0x09) and a block defined at the address pointed to by the X and Y registers (in this case 0x5000 and 0x5100). OSWORD 0x09 is read pixel value, so is likely used in the floodfill. OSWORD 0x00 is read line – this is useful.

Then this continues in this route, but finding calls and then working out what the call does and working out the code. My disassembled version can be found on github at https://github.com/tautology0/textadventurestuff/tree/master/Stranded

Tracing this through allowed me to work out the data stored for the rooms. Each room consists of the following:

uchar description,
uchar image,
uchar unknown=0,
uchar flags=0 // 2 == room

Then there are a selection of destinations:

[verb] [room/message]

These can be preceded by a byte with bit 7 set which indicates an condition in bits 6 and 5:

  • bits 6 and 5 set: object not in room
  • bit 6 set: object in room
  • bit 5 set: object in inventory
  • bits 6 and 5 clear: object not in inventory

bits 0 – 4 contain an index into an object.

This is followed by the two bytes of [verb] [room/message]

To break down an example, at &0414 we have:

03 02 00 03 01 01 c1 07 04 81 07 07 ff

Which breaks into:

Message 03 "You are on the top of a steep cliff. A space shuttle is visible below."
Image 02
Padding 00
Flags 03
Actions
VERB 01 (NORTH)
GOTO 01
As C1 has bit 7 and 6 set, but bit 5 clear which is in room
VERB 07 (JUMP)
OBJECT 01 (Parachute) inroom
GOTO 04 "You are on a ledge halfway down the cliff face."
As 81 has bit 7 set, but bit 5 and 6 clear then not inventory
VERB 07 (JUMP)
OBJECT 01 (Parachute) not in inv
GOTO 07 "You are lost in the forest."

The final bit is the graphics. Playing through we can see that the graphics are drawn lines which are then filled. On the BBC drawing lines can be done through PLOT codes. The plot codes are sent through VDU codes, which are values less than &20 sent to OSWRCH.

Some mucking around lead to this code. I’ve renamed the branches to be sensible:

                       DoGraphicsCommand                                
        1c30 48              PHA
        1c31 29 07           AND        #0x7
        1c33 aa              TAX
        1c34 68              PLA
        1c35 e0 04           CPX        #0x4
        1c37 b0 13           BCS        imageXGreater4
        1c39 29 20           AND        #0x20
        1c3b c9 00           CMP        #0x0
        1c3d f0 03           BEQ        graphicsCommand
        1c3f 20 4b 1d        JSR        SetGraphicsBGColour
                         graphicsCommand                                  
        1c42 b1 70           LDA        (LookupVector),Y 
        1c44 e0 00           CPX        #0x0
        1c46 f0 0c           BEQ        graphicsDrawCommand
        1c48 e0 02           CPX        #0x2
        1c4a f0 0e           BEQ        graphicsMoveCommand
                         imageXGreater4
        1c4c e0 05           CPX        #0x5
        1c4e f0 23           BEQ        graphicsFloodFillCommand
        1c50 e0 07           CPX        #0x7
        1c52 f0 0c           BEQ        graphicsTextCommand

So, basically there’s each image has colours set. The background colour is set at the start, then there’s a selection of commands defined by the value of a byte, which works out as:

00 draw a line from current cursor to value
02 move to point
05 floodfill
07 plot a graphics character (defined separately)

The top two bits has the colour (0 – 3) for the line. As an experiment I wrote a simple BBC basic file to plot the graphics (minus the special graphics characters).

After working all this out I wrote a basic Python script to decode it to a JSON file. If theory this looks like the whole database, although I haven’t tried to “run” it in anger. This script along with my annotated disassembly can be found on my github.