Something a bit different this time: breaking some code protection. Many, many years ago my parents bought our first computer, a Sharp MZ-700. This was initially a disappointment to me as it had no decent games, but the side effect was that to get something out of it I had to teach myself how it worked. I taught myself how to program and even started learning Z80 assembler before we got a different computer with better games.
The platform, with all its weird quirks, has always held a special meaning for me; I even have one in a waterproof box in the garage.
As it’s a relatively rare system, there is only limited information about the platform on the Internet, and quite a bit of software is still missing in action. One of these missing games was recently found, and a high-quality recording and cleaned-up image were made by the Twitter user @sharpworksMZ.
The game was a text adventure called Nuclear War Games from Softworx, which appeared to be a port of the Sega SC-3000 game known as Thermonuclear War Games and unofficially based on the film War Games.
What’s more interesting is that it encodes the main program so it can’t easily be viewed or copied as files that work with the emulators. Attempting to extract the program reveals some interesting features of how the tape files work.
Tapes?
Back in the day, software was sold on tape cassettes – typically the same tapes that would be used for audio, albeit shorter. Although this did make it easy to copy games if you had two tape decks, it did mean that it took ages to load something, as there was lots of redundancy to cope with flaws in the media.
If we load the wav file into Audacity we can see how the system differentiates between a 1 and a 0. As can be seen below, the length of a “pulse” makes the difference, with a short pulse (shown in green) being a 0 and a long pulse (shown in red) being a 1.
Wav files hold a digital representation of the sound wave, produced by sampling the wave many times a second and storing each value. The original wav I received was recorded at 44100 Hz with 8-bit values, i.e. 44100 samples (and bytes) represent 1 second of real time. The Python wave library lets me treat a wav file as a file stream and handle each sample individually.
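As a rough illustration of reading such a recording with the wave library, here is a minimal sketch. The function names `load_samples` and `make_test_wav` are made up for this example, and it assumes the recording is mono, which the figures above imply but do not state outright.

```python
import io
import wave


def load_samples(source):
    """Return (sample_rate, samples) for an 8-bit mono wav.

    `source` may be a filename or a binary file-like object.
    Assumption: the recording is mono with one byte per sample.
    """
    with wave.open(source, "rb") as wfile:
        if wfile.getsampwidth() != 1 or wfile.getnchannels() != 1:
            raise ValueError("expected an 8-bit mono recording")
        rate = wfile.getframerate()
        samples = wfile.readframes(wfile.getnframes())
    return rate, samples


def make_test_wav(samples, rate=44100):
    """Build a small in-memory wav (hypothetical helper for trying things out)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wfile:
        wfile.setnchannels(1)
        wfile.setsampwidth(1)   # 8-bit samples are unsigned, centred on 128
        wfile.setframerate(rate)
        wfile.writeframes(samples)
    buf.seek(0)
    return buf


rate, samples = load_samples(make_test_wav(bytes([128, 200, 50])))
microseconds_per_sample = 1_000_000 / rate   # about 22.7 at 44100 Hz
```

Because each sample is a single unsigned byte, indexing into `samples` gives the level directly as a number between 0 and 255.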
The best reference for the tape format is at the Sharp MZ site. If you scroll down that site, you’ll see a section about short and long pulse timings. For the MZ-700 (and both the MZ-80K and MZ-80A) the level is read 368 μs after a rising edge. If this level is high, then it is a long pulse (which translates as a bit value of 1); if it is low then it is a short pulse (which translates as a bit value of 0).
A sampling rate of 44100 Hz means that each sample covers ~22.7 μs, so we need to take a sample roughly 16 samples after we detect a rising edge. This leads to an easy algorithm: scan the values, and when one is greater than 128 plus a noise threshold we have a rising edge, so skip the next 15 samples. If the next one is > 128 (again allowing for noise) then it’s a 1; if not then it is a 0.
This leads to some very simple Python which returns 0 or 1, or -1 for an error.
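def read_pulse(wfile):
    """Return the next bit (0 or 1) from an 8-bit wav stream, or -1 at end of stream."""
    value = 0
    try:
        # Skip whatever is left of the previous pulse's high level
        read_edge = wfile.readframes(1)[0]
        while read_edge > 138:
            read_edge = wfile.readframes(1)[0]
        # Skip the low level until the signal rises again
        read_edge = wfile.readframes(1)[0]
        while read_edge < 118:
            read_edge = wfile.readframes(1)[0]
        # We have a rising edge, now get to sample point
        wfile.readframes(15)
        read_point = wfile.readframes(1)
        if read_point[0] > 148:
            value = 1
    except IndexError:
        # end of stream: readframes() returned no data
        value = -1
    return value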
I could just read this as bytes, but I wanted to abstract tape reading from converting the bits, so I decided to read the whole tape in one go and convert it to a really long bitstream. This allows manipulation where the data isn’t on a byte boundary.
Data on the tape
Now that we can convert the tape sound to a bitstream, we can interpret the data on it. The linked document above shows that each file on the tape has a header and a file body. A file header has the following components:
- LGAP (long gap), which is 22000 short pulses
- LTM (long tape mark) which is 40 long pulses followed by 40 short pulses
- L (long) which is 1 long pulse
- HDR the actual header
- CHKH, a 16-bit checksum for the header (count of the number of bits set to 1 in the header)
- L again
Then, there’s a duplication, which is only loaded if the header fails its checksum; this wasn’t present on this tape:
- 256S, or 256 short pulses
- HDRC, a copy of the header
- CHKH, a 16-bit checksum of the header copy
- L again
A file structure is very similar, except the LGAP and LTM are SGAPs and STMs:
- SGAP (short gap), which is 11000 short pulses
- STM (short tape mark) which is 20 long pulses followed by 20 short pulses
- L (long) which is 1 long pulse
- FILE the actual file
- CHKF, a 16-bit checksum for the file (count of the number of bits set to 1 in the file)
- L again
Then, there’s a duplication, which is only loaded if the file fails its checksum; this wasn’t present on this tape:
- 256S, or 256 short pulses
- FILEC, a copy of the file
- CHKF, a 16-bit checksum of the file copy
- L again
This seems strange, but makes sense when you bear in mind how bad tape players could be in the early 80s. It is a simple way of differentiating the file data from the file header and of providing duplicate data to cover any tape errors. The file section is just the raw bytes for the file, whether it’s BASIC or machine code.
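Two pieces of this layout translate directly into code: spotting a tape mark in the bitstream and computing the checksum. The following is a minimal sketch, not the code used for this extraction; `find_tape_mark` and `checksum` are names invented here, and the bitstream is assumed to be a plain Python list of 0s and 1s.

```python
def find_tape_mark(bits, run_length, start=0):
    """Return the index just past a tape mark, or -1 if none is found.

    A mark is `run_length` long pulses (1s), then `run_length` short
    pulses (0s), then the single long pulse that precedes the data.
    Use run_length=40 for a header mark and 20 for a file mark.
    """
    pattern = [1] * run_length + [0] * run_length + [1]
    width = len(pattern)
    for pos in range(start, len(bits) - width + 1):
        if bits[pos:pos + width] == pattern:
            return pos + width
    return -1


def checksum(data):
    """Count the bits set to 1 in `data`, kept to 16 bits."""
    return sum(bin(byte).count("1") for byte in data) & 0xFFFF


stream = [0] * 100 + [1] * 40 + [0] * 40 + [1] + [1, 0, 1]
data_start = find_tape_mark(stream, 40)
```

One thing to watch: a 20-pulse search also matches inside a 40-pulse mark, so when looking for the file body it is safer to pass `start` as a position after the header has been read.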
The HDR section needs some further definition. Using an RFC-style ASCII diagram, with one cell per byte, it looks like this:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T|              NAME               | S | L | E |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
|                                                               |
+                                                               +
|                            COMMENT                            |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
With the following fields:
- T is file type, from $01 for a machine code program to $05 for an MZ-700 BASIC program
- NAME is the file name, terminated by a $0D
- S is the file size
- L is the file load address
- E is the file execution address
- COMMENT is a comment
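The field list maps neatly onto Python's struct module. The sketch below is an illustration rather than the script used here: `parse_header` and `TapeHeader` are invented names, the field widths (17-byte name, 104-byte comment, 128 bytes in total) are read off the diagram, and the 16-bit fields are assumed to be little-endian as is usual on the Z80. Decoding the name as ASCII is also a simplification, since the machine has its own character set.

```python
import struct
from typing import NamedTuple

# T (1 byte), NAME (17), S, L, E (2 bytes each), COMMENT (104) = 128 bytes
HEADER_STRUCT = struct.Struct("<B17sHHH104s")


class TapeHeader(NamedTuple):
    file_type: int
    name: str
    size: int
    load: int
    exec_addr: int
    comment: bytes


def parse_header(raw):
    """Split a 128-byte HDR block into its fields."""
    if len(raw) != HEADER_STRUCT.size:
        raise ValueError("a header block must be exactly 128 bytes")
    file_type, raw_name, size, load, exec_addr, comment = HEADER_STRUCT.unpack(raw)
    # The name is terminated by $0D; anything after that is padding
    name = raw_name.split(b"\r", 1)[0].decode("ascii", errors="replace")
    return TapeHeader(file_type, name, size, load, exec_addr, comment)


# A made-up header block for demonstration
raw = (bytes([0x01]) + b"EXAMPLE\r".ljust(17, b"\r")
       + struct.pack("<HHH", 0x0100, 0x1200, 0x1200) + bytes(104))
header = parse_header(raw)
print(f"{header.name}: size ${header.size:04x}, load ${header.load:04x}")
```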
How the game stores its program
Now that we understand how the data is stored, I wrote a simple bit of Python to read all the various parts. The first file is a BASIC file called NWG-INTRO of size $0ffc. I wrote a detokeniser for MZ-700 BASIC many years ago; running the extracted file through it reveals the intro, which ends with this line of BASIC:
300 PRINT"1.PRESS RESET2.TYPE L'3.PRESS "
This tells us that to play the game we have to load BASIC (more on this later), run the intro, then press the reset button and use the monitor’s L command to load the next file. This is common with machine code games on the MZ-700, but why have we wasted our time loading BASIC? For context, the MZ-700 was unusual in that it was a “clean computer”: it didn’t have BASIC in ROM. All it had was a simple BIOS, known as the monitor, so to program in BASIC or load a BASIC program you first had to load the BASIC interpreter, which took about 6 minutes.
Anyway, let’s look at the next file. It’s a tiny ($41 bytes) machine code file, with a rather strange comment field. What’s more, this is the last file header on the tape; it is followed by another file containing strange, random-looking data but no header. Curious!
Type: Machine code program
Filename: NWG-MAIN
Size: $0041
Load: $6b7f
Exec: $6b7f
Comment: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00=\x80\xcfk\x00\x00MAIN\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00CONTACT:":PRINT:PRINT:PRINTSPC(3);:PRINT[2, ]"AUSTRALIA"')
00000000: 00 00 00 00 00 00 00 00 00 00 3D 80 CF 6B 00 00 ..........=..k..
00000010: 4D 41 49 4E 00 00 00 00 00 00 00 00 00 00 00 00 MAIN............
00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030: 43 4F 4E 54 41 43 54 3A 22 3A 50 52 49 4E 54 3A CONTACT:":PRINT:
00000040: 50 52 49 4E 54 3A 50 52 49 4E 54 53 50 43 28 33 PRINT:PRINTSPC(3
00000050: 29 3B 3A 50 52 49 4E 54 5B 32 2C 20 5D 22 41 55 );:PRINT[2, ]"AU
00000060: 53 54 52 41 4C 49 41 22 STRALIA"
Let’s have a look at the machine code in the file. Loading it into Ghidra and adding manual comments reveals two sections. The first one loads the following file into memory and then calls the second section, which I’ve called DecodeProgram.
6b7f 2abf6a LD HL,(BasicProgramStart) ;6ABF = start of BASIC RAM
6b82 ed4b1211 LD BC,(COMNT+12) ;taken from COMNT header = 803d - size
;
6b86 d3e0 OUT (MemLowDRAM),A ;Set $0000-$0FFF to D-RAM
6b88 d3e1 OUT (MemHighDRAM),A ;Set $D000-$FFFF to D-RAM
6b8a cd2a00 CALL RDDAT ;Read $803d bytes from tape to the address held at $6abf
6b8d daa56b JP C,LoadFailed ;Reset if load fails
6b90 cdaa6b CALL DecodeProgram ;undefined DecodeProgram()
6b93 ed7bb96a LD SP,(BasicStackPointer) ;Set up Basic stack
6b97 2abf6a LD HL,(BasicProgramStart) ;Set up the program memory
6b9a ed5b1211 LD DE,(COMNT+12)
6b9e 19 ADD HL,DE
6b9f 22b36a LD (BasicProgramEnd),HL ;Set end of program
6ba2 c3711c JP BasicCommandRun ;RUN the program
LoadFailed:
6ba5 d3e4 OUT (MemReset),A ;Return to power on state
6ba7 c30701 JP ?ER ;Loading error monitor routine
DecodeProgram loops through the loaded data and subtracts a value from each byte. The value starts at $BD (the two bytes of the size added together, $3D + $80) and increases by 1 for each byte. The result is a BASIC program, which is the actual game. To support this process, the size information is stored in that weird comment field.
DecodeProgram()
6baa 2a1211 LD HL,(COMNT+12) ;Size of data
6bad 7d LD A,L
6bae 84 ADD A,H ;Add the bytes of size together
6baf 57 LD D,A ;Put this in D
6bb0 e5 PUSH HL
6bb1 c1 POP BC ;Set BC to the value of HL
6bb2 2abf6a LD HL,(BasicProgramStart) ;Set HL to the start of data
DecodeLoop:
6bb5 7e LD A,(HL) ;Load A with the byte at HL
6bb6 92 SUB D ;Subtract D from it.
6bb7 77 LD (HL),A ;Put it back
6bb8 23 INC HL
6bb9 14 INC D
6bba 0b DEC BC ;Change counters
6bbb 78 LD A,B
6bbc b1 OR C
6bbd 20f6 JR NZ,DecodeLoop ;If BC > 0 loop
6bbf c9 RET
When reading the code above it’s useful to remember that the convention for Z80 register use is that HL is the source, DE is the destination and BC is the count for any operation.
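The decode loop is short enough to mirror in Python. This is a minimal sketch of the same arithmetic, not necessarily the script used for the extraction; `initial_key`, `decode_program` and `SIZE_OFFSET` are names chosen here. `SIZE_OFFSET = 10` is taken from where the bytes 3D 80 sit in the comment dump shown earlier, and should be treated as an assumption about that particular dump.

```python
import struct

SIZE_OFFSET = 10  # assumed: position of the bytes 3D 80 in the comment dump


def initial_key(size):
    """Add the low and high bytes of the size, wrapping at 8 bits like ADD A,H."""
    return ((size & 0xFF) + (size >> 8)) & 0xFF


def decode_program(data, size):
    """Subtract a rolling 8-bit key from each of the first `size` bytes."""
    key = initial_key(size)
    out = bytearray(data[:size])
    for i in range(len(out)):
        out[i] = (out[i] - key) & 0xFF   # SUB D
        key = (key + 1) & 0xFF           # INC D, wrapping at $FF
    return bytes(out)


# The first 14 bytes of the comment field from the dump
comment = bytes(10) + bytes([0x3D, 0x80, 0xCF, 0x6B])
size = struct.unpack_from("<H", comment, SIZE_OFFSET)[0]
print(f"size ${size:04x}, first key ${initial_key(size):02x}")
```

Because the key is only 8 bits wide it wraps round every 256 bytes, so the encoding is an obstacle to casual copying rather than serious protection.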
But why BASIC?
So why did it load BASIC first? This is to initialise RAM so that it is ready to run an encoded BASIC program. The way that the MZ-700 manages RAM is interesting. The MZ-700 has 64 kB of memory, but it also has separate video RAM and a token amount of ROM (the monitor referred to above).
It has a basic method of paging out memory, using the Z80 OUT instruction to manage which bits of memory are where. This is used by MZ-700 BASIC, which swaps out the ROM portion of memory to RAM. To help BASIC with the standard hardware routines it implements a version of the ROM monitor in RAM. Pressing the reset button also does not wipe memory.
So the process is roughly:
- User loads BASIC, which means that BASIC and the RAM monitor are in memory
- User loads NWG-INTRO
- User resets the MZ-700; low memory is now mapped to the ROM monitor, while the RAM monitor and BASIC are still in memory but cannot be accessed
- User types L (load command), which loads the loader code
- Loader code swaps all possible RAM in and then uses the RAM monitor to load the encoded BASIC game
- Loader code then calls the BASIC function for RUN
So it’s complex, but still relatively easy to implement, and easier to undo with some simple Python.