It’s time to start reversing another text adventure. For this one I wanted to go back to one I remember playing when I was a sprog: Stranded, published on the BBC Micro by Superior Software and by English Software on other platforms.
It isn’t the best adventure, but I remember it, mainly because it was one of the few BBC text adventures with some graphics.
I’m starting this from the beginning to demonstrate exactly how I go about reversing adventures.
Starting
The first step is to look at the files and gain an idea of how the data is structured. This involved a short play through to remind myself of the game and check for the complexity of the parser and size of descriptions etc.
Looking at the files, the disk image contains two real files:
- Strand, the loader which loads in at &2500 and draws the intro image before *RUNing Strand1
- STR1_2, which contains everything for the adventure
The adventure is unusual because it runs in screen MODE 1, this is a rarely used screen mode, because it provides a high (for the 1980s) resolution of 320 x 256; which, although is not the highest the BBC can handle, is higher than the C64 (320 x 200) or the Spectrum (256 x 192). There are two downsides to MODE 1: it only allows 4 colours and it uses a massive 20 KB of memory just for the screen.
This means that once the screen has been taken into account, there is only 12 KB of memory left on the Beeb. That’s 12 KB for the whole game, including graphics, code, descriptions and all the gubbins that the OS needs!
The file is close to that size: being &2c28 bytes in size: which is a tad over 11 KB. It loads into &1100, which is about as low as you can go with DFS. This causes some over-writing of screen memory. It then executes code from &3d00 (which is in screen memory) to move everything from &1100 to &400.
File analysis
I normally start this by looking from the hex dump of the file and looking for text, verbs etc.
At &380 are the OSCLI commands for saving the game. The save is useful because it tells us where the game flags and inventory will be in memory:
*SAVE"ADVDATA"B2F B38
So the memory between &B2F and &B38 (or &72F to &738 in the file). This is rather small for an adventure, so if probably just the object locations and therefore the game may not have any flags. Using a debugger on a BBC emulator shows that this memory space does indeed store the object locations, with the default locations stored in &738
I can see verbs and nouns starting at &500:
The list is unusual as it looks like whole commands are defined, with both words lock to four characters. We can see:
"GO NORT"
"PICKLOCK"
"INVE "
This may make it easier later to work out actions.
These are immediately followed, at &660, by the list of objects with NOUN names:
PARAFUELLASELOCKKEY SUITCRYS
Once again, these are four characters in length.
After this, at &688 are the long forms of the object names, each separated by a &FF byte. Followed at &6E0 by the names of the directions (probably used for displaying).
The system messages and intro text can be found at &1580.
We’re missing one important thing: the description locations, which means they’re compressed or obfuscated. Normally I’d expect to see a dictionary of words, but can’t. There is one clue though, at &760, for &40 bytes there is the string:
" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12345'!()."
This looks like a lookup character table, it’s &40 bytes long, so this means that you only need 6 bits for each character.
At this point we can search through the text for known text patterns. We know that the description of the starting room is:
"You are standing on a barren plateau. Exits lead SOUTH EAST WEST"
If we assume that the string at &760 is a lookup table, we can manually plot out the start of the string to:
Y = &19
o = &29
u = &2f
A hex search for these bytes finds them at &f68, with a &FF separater; looks like this has legs! We can look for words, if we process the whole file through the lookup table, using a simple bit of python:
from future import print_function dictionary=" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12345'!().\n" fin=open("STR1_2","rb") fin.seek(0x900) data=fin.read() for x in data: if ord(x) == 0xff: print() char=ord(x) & 0x3f print(dictionary[char], end='') if (ord(x) & 0x40): print(" ", end='')
I’ve tuned this to start at &900, which is the start of the descriptions:
C:\Users\dave\Desktop\Stranded>decode.py | more A B C D E barren plateau. A B D E rocky mountain path. A B D F top G E steep H. I space shuttle J visible below. A B D E ledge halfway down F H face. […] You are standing on a the
And here we can sell the room descriptions and messages. Note the A B C D E in the first description, this matches with the words in the room description. So we can assume the strings at 0xf68 are a dictionary of words for compression.
Looking at &900 in a hex editor, we can see those A B C D E all have bit 7 set:
C1 C2 C3 C4 C5 1C
So we can make an assumption of how the strings are created:
for byte in data: if byte == 0xff: # print the sentence print(sentence) sentence="" else if byte & 0x80: # it's a dictionary word sentence+=dict[byte & 0x3f] else if byte & 0x40: # add a space to the end sentence+=" " else # read from the table sentence+=table[byte & 0x3f]
A quick bit of python allows the extraction of all these strings, which show them to be not just the room descriptions, but also the messages.
C:\Users\dave\Desktop\Stranded>decode.py 1: You are standing on a barren plateau. 2: You are on a rocky mountain path. 3: You are on the top of a steep cliff. A space shuttle is visible below. 4: You are on a ledge halfway down the cliff face. 5: You are at the base of the cliff. 6: You hit the ground with a thud.
There are also some terrible spelling errors, including “vegitation” and “does’nt”.
An assumption must be made: as the descriptions are just messages, then somewhere there needs to be a matching of rooms to description. This will probably have the exits as well. The game supports six directions: N, S, E, W, U, D.
This means we’re looking for some data that will say room 1 has a description string of 1 and exits that match the following (I’m making an assumption that rooms start from 1):
- SOUTH to room 3
- EAST to room 2
- WEST to room 42 (or a room with description 42)
And there’s a possibility at the start of the file:
01 00 00 03 03 02 02 03 04 3D FF 02 01 00 03 05
01 06 06 FF 03 02 00 03 01 01 C1 07 04 81 07 07
This springs out as:
01 description
00 picture
00
03 Flags, 0x2 = CLS
03 02 EAST to room 02
02 03 SOUTH to room 03
04 3D WEST to room 61
02 description
01 picture
00 03 unknown
05 01 UP to room 01
06 06 DOWN to room 06
This mostly works, but there’re some strange extra bytes on some entries, e.g. room 3 has:
03 02 00 03 01 01 C1 07 04 81 07 07
That C1 07 04 81 07 07 doesn’t match to anything and there are similar in other rooms. Messing with the value in an emulator doesn’t do anything exciting, so this may have to be something we do when we bring out the disassembler.
I’d recently moved to Ghidra as my primary disassembler, and as I was trying to get used to it, I used this to try and work out how stuff worked. So first off I loaded the STR1_2 binary into Ghidra and set it load where it was relocated (&0400) then I set a variable Start to the execution address at &3100 and then let Ghidra analyse from there.
At that point it took many hours of stepping through the code and trying to understand what the code meant.
The easiest way of doing this was to search for the OS calls, as they are easy to find. I spent a while entering the labels and then following references. The majority of OS calls are OSWRCH (OS Write Character) and OSASCI (OS ASCII, which is basically an OSWRCH that performs a linefeed if it sees a CR. There are a lot of them and most will be around messages.
It was easier to look for some of the other calls, especially OSWORD, this performs multiple tasks from a defined block, one of these tasks is to read keyboard data into a buffer. A reference search of OSWORD leads to two sources, where we have:
1442 a2 50 LDX #0x50
1444 a0 00 LDY #0x0
1446 a9 00 LDA #0x0
1448 20 f1 ff JSR OSWORD
And
1cf0 a9 09 LDA #0x9
1cf2 a2 51 LDX #0x51
1cf4 a0 00 LDY #0x0
1cf6 20 f1 ff JSR OSWORD
So OSWORD takes the type of operation in A (in this case 0x00 and 0x09) and a block defined at the address pointed to by the X and Y registers (in this case 0x5000 and 0x5100). OSWORD 0x09 is read pixel value, so is likely used in the floodfill. OSWORD 0x00 is read line – this is useful.
Then this continues in this route, but finding calls and then working out what the call does and working out the code. My disassembled version can be found on github at https://github.com/tautology0/textadventurestuff/tree/master/Stranded
Tracing this through allowed me to work out the data stored for the rooms. Each room consists of the following:
uchar description,
uchar image,
uchar unknown=0,
uchar flags=0 // 2 == room
Then there are a selection of destinations:
[verb] [room/message]
These can be preceded by a byte with bit 7 set which indicates an condition in bits 6 and 5:
- bits 6 and 5 set: object not in room
- bit 6 set: object in room
- bit 5 set: object in inventory
- bits 6 and 5 clear: object not in inventory
bits 0 – 4 contain an index into an object.
This is followed by the two bytes of [verb] [room/message]
To break down an example, at &0414 we have:
03 02 00 03 01 01 c1 07 04 81 07 07 ff
Which breaks into:
Message 03 "You are on the top of a steep cliff. A space shuttle is visible below."
Image 02
Padding 00
Flags 03
Actions
VERB 01 (NORTH)
GOTO 01
As C1 has bit 7 and 6 set, but bit 5 clear which is in room
VERB 07 (JUMP)
OBJECT 01 (Parachute) inroom
GOTO 04 "You are on a ledge halfway down the cliff face."
As 81 has bit 7 set, but bit 5 and 6 clear then not inventory
VERB 07 (JUMP)
OBJECT 01 (Parachute) not in inv
GOTO 07 "You are lost in the forest."
The final bit is the graphics. Playing through we can see that the graphics are drawn lines which are then filled. On the BBC drawing lines can be done through PLOT codes. The plot codes are sent through VDU codes, which are values less than &20 sent to OSWRCH.
Some mucking around lead to this code. I’ve renamed the branches to be sensible:
DoGraphicsCommand
1c30 48 PHA
1c31 29 07 AND #0x7
1c33 aa TAX
1c34 68 PLA
1c35 e0 04 CPX #0x4
1c37 b0 13 BCS imageXGreater4
1c39 29 20 AND #0x20
1c3b c9 00 CMP #0x0
1c3d f0 03 BEQ graphicsCommand
1c3f 20 4b 1d JSR SetGraphicsBGColour
graphicsCommand
1c42 b1 70 LDA (LookupVector),Y
1c44 e0 00 CPX #0x0
1c46 f0 0c BEQ graphicsDrawCommand
1c48 e0 02 CPX #0x2
1c4a f0 0e BEQ graphicsMoveCommand
imageXGreater4
1c4c e0 05 CPX #0x5
1c4e f0 23 BEQ graphicsFloodFillCommand
1c50 e0 07 CPX #0x7
1c52 f0 0c BEQ graphicsTextCommand
So, basically there’s each image has colours set. The background colour is set at the start, then there’s a selection of commands defined by the value of a byte, which works out as:
00 draw a line from current cursor to value
02 move to point
05 floodfill
07 plot a graphics character (defined separately)
The top two bits has the colour (0 – 3) for the line. As an experiment I wrote a simple BBC basic file to plot the graphics (minus the special graphics characters).
After working all this out I wrote a basic Python script to decode it to a JSON file. If theory this looks like the whole database, although I haven’t tried to “run” it in anger. This script along with my annotated disassembly can be found on my github.