Cracking Code Protection on Tape

Something a little different on breaking code protection this time. Many, many years ago my parents bought our first computer, a Sharp MZ-700. This was initially a disappointment to me as it had no decent games, but the side effect was that to get anything out of it I had to teach myself how it worked. I taught myself how to program and even started learning Z80 assembler before we got a different computer with better games.

The platform, with all its weird quirks, has always held a special meaning for me; I even have one in a waterproof box in the garage.

As it’s a relatively rare system, there is only limited information about the platform on the Internet, and quite a bit of software is still missing in action. One of these games was recently found, and a high-quality recording and cleaned-up image were taken by the Twitter user @sharpworksMZ.

The game was a text adventure called Nuclear War Games from Softworx, which appeared to be a port of the Sega SC-3000 game known as Thermonuclear War Games and was unofficially based on the film War Games.

What’s more interesting is that it encodes the main program so it can’t easily be viewed or copied as files that work with the emulators. Attempting to extract the program revealed some interesting features of how the tape files work.

Tapes?

Back in the day, software was sold on tape cassettes – typically the same tapes that would be used for audio, albeit shorter. Although this did make it easy to copy games if you had two tape decks, it did mean that it took ages to load something as there was lots of redundancy to cope with flaws in the media.

If we load the wav file into Audacity we can see how the system differentiates between a 1 and a 0. As can be seen below, the length of a “pulse” makes the difference, with a short pulse (shown in green) being a 0 and a long pulse (shown in red) being a 1.

Wav files consist of a digital representation of the sound wave, produced by sampling the wave many times a second and storing the current value. The original wav I received was recorded at 44100 Hz with 8-bit values, i.e. 44100 samples (and bytes) represent 1 second of real time. The Python wave library treats a wav file as a file stream, which allows me to handle each sample individually.
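As a minimal sketch of that sample-by-sample access (not the code used for this tape), the snippet below reads an 8-bit mono wav with the standard wave module. The names read_samples and make_wav are made up for illustration, and make_wav only exists to build a tiny in-memory recording to try it on.

```python
import io
import wave


def make_wav(samples, rate=44100):
    """Build an in-memory 8-bit mono wav from sample values (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)       # 8-bit samples are unsigned and centred on 128
        w.setframerate(rate)
        w.writeframes(bytes(samples))
    buf.seek(0)
    return buf


def read_samples(source):
    """Yield each sample of an 8-bit mono wav as an int between 0 and 255."""
    with wave.open(source, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 1:
            raise ValueError("this sketch only handles 8-bit mono recordings")
        while True:
            frame = w.readframes(1)
            if not frame:       # an empty result means end of stream
                break
            yield frame[0]


print(list(read_samples(make_wav([128, 200, 50, 255]))))
```

In practice the source would be a file name rather than an in-memory buffer; wave.open accepts either.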

The best reference for the tape format is at the Sharp MZ site. If you scroll down that site, you’ll see a section about short and long pulse timings. For the MZ-700 (and both the MZ-80K and MZ-80A), the level is read 368 μs after a rising edge. If this level is high, then it is a long pulse (which translates as a bit value of 1); if it is low then it is a short pulse (which translates as a bit value of 0).

A sampling rate of 44100 Hz means that every sample lasts ~22.7 μs, so we need to take a reading roughly 16 samples after we detect a rising edge. This leads to an easy algorithm: scan the values until one rises above 128 plus a noise threshold, which marks a rising edge, then skip the next 15 samples. If the next one is still high (above 128, plus a margin for noise) then it’s a 1; if not then it is a 0.
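The arithmetic behind the "16 samples" figure can be checked in a few lines. This is only a worked calculation from the two numbers quoted above; the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 44100
READ_DELAY_US = 368          # delay after the rising edge, from the tape format reference

sample_period_us = 1_000_000 / SAMPLE_RATE_HZ
samples_to_wait = READ_DELAY_US / sample_period_us

print(f"{sample_period_us:.2f} us per sample")           # 22.68 us per sample
print(f"{samples_to_wait:.1f} samples after the edge")   # 16.2 samples after the edge
```

A recording made at a different sample rate would need a different skip count, so it may be worth deriving the number rather than hard-coding it.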

This leads to some very simple Python, which returns 0 or 1 for a bit and -1 for an error.

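def read_pulse(wfile):
    """Return the next bit (0 or 1) from an 8-bit wave file, or -1 at end of stream."""
    value = 0
    try:
        # Wait for the signal to drop out of the high region...
        read_edge = wfile.readframes(1)[0]
        while read_edge > 138:
            read_edge = wfile.readframes(1)[0]
        # ...then wait for it to climb out of the low region
        read_edge = wfile.readframes(1)[0]
        while read_edge < 118:
            read_edge = wfile.readframes(1)[0]

        # We have a rising edge, now get to sample point
        wfile.readframes(15)
        read_point = wfile.readframes(1)
        if read_point[0] > 148:
            value = 1
    except IndexError:
        # readframes() returned no data: end of stream
        value = -1

    return value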

I could just read this as bytes, but I wanted to abstract tape reading from converting the bits, so I decided to read the whole tape in one go and convert it to a really long bitstream. This allows manipulation where the data isn’t on a byte boundary.
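A rough sketch of that idea follows. collect_bits takes any zero-argument function that returns 0, 1 or -1 (for example a lambda wrapping read_pulse), and bits_to_int pulls a value out of the stream at any offset. Both names are hypothetical, and reading the first bit as the most significant is an assumption made here for illustration, not something stated above.

```python
def collect_bits(read_one):
    """Call read_one() until it returns -1 and return all the bits as a list."""
    bits = []
    while True:
        bit = read_one()
        if bit == -1:           # -1 marks end of stream
            break
        bits.append(bit)
    return bits


def bits_to_int(bits, start, count):
    """Read bits[start:start + count] as an unsigned integer.

    Assumption: the first bit is the most significant one.
    """
    value = 0
    for bit in bits[start:start + count]:
        value = (value << 1) | bit
    return value


# Stand-in for the tape: an iterator that ends with -1
fake_tape = iter([0, 1, 0, 1, 1, -1])
stream = collect_bits(lambda: next(fake_tape))
print(stream)                       # [0, 1, 0, 1, 1]
print(bits_to_int(stream, 1, 4))    # 11 (binary 1011)
```

With the real tape the call would look something like collect_bits(lambda: read_pulse(wfile)).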

Data on the tape

Now that we can convert the tape sound into a bitstream, we can interpret the data on it. The document linked above shows that each file on the tape has a header and a file body. A file header has the following components:

  • LGAP (long gap), which is 22000 short pulses
  • LTM (long tape mark) which is 40 long pulses followed by 40 short pulses
  • L (long) which is 1 long pulse
  • HDR the actual header
  • CHKH, a 16-bit checksum for the header (count of the number of bits set to 1 in the header)
  • L again

Then, there’s a duplication, which is only loaded if the header fails its checksum; this wasn’t present on this tape:

  • 256S, or 256 short pulses
  • HDRC, a copy of the header
  • CHKH, a 16-bit checksum of the header copy
  • L again
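The checksum described above is just a count of set bits, which can be sketched in a couple of lines. The function name is made up, and masking the result to 16 bits is an assumption based on the checksum being described as 16-bit.

```python
def ones_checksum(data):
    """Count the bits set to 1 across all bytes of data, kept to 16 bits."""
    return sum(bin(byte).count("1") for byte in data) & 0xFFFF


# $FF has eight bits set, $01 has one, $00 has none
print(ones_checksum(bytes([0xFF, 0x01, 0x00])))   # 9
```

Comparing this against the CHKH value read from the tape shows whether a header was read cleanly.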

The file body structure is very similar, except that the LGAP and LTM are replaced by an SGAP and an STM:

  • SGAP (short gap), which is 11000 short pulses
  • STM (short tape mark) which is 20 long pulses followed by 20 short pulses
  • L (long) which is 1 long pulse
  • FILE the actual file
  • CHKF, a 16-bit checksum for the file (count of the number of bits set to 1 in the file)
  • L again

Then, there’s a duplication, which is only loaded if the file fails its checksum; this wasn’t present on this tape:

  • 256S, or 256 short pulses
  • FILEC, a copy of the file
  • CHKF, a 16-bit checksum of the file copy
  • L again

This seems strange, but makes sense when you bear in mind how bad tape players could be in the early 80s. It’s a simple way of differentiating the file data from the file header and providing duplicate data to cover any tape errors. The file section is just the raw bytes for the file, whether it’s BASIC or machine code.
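One way to locate the start of a header or file body in the bitstream is to search for the tape mark itself: a run of long pulses (1s) followed by the same number of short pulses (0s). The sketch below is a plain linear search rather than how the monitor does it, and find_tape_mark is a made-up name. Because it only looks for the mark, it does not depend on which pulse type the gap before it consists of.

```python
def find_tape_mark(bits, count, start=0):
    """Return the index just after `count` 1s followed by `count` 0s, or -1.

    Use count=40 for the long tape mark and count=20 for the short one.
    """
    pattern = [1] * count + [0] * count
    width = len(pattern)
    for i in range(start, len(bits) - width + 1):
        if bits[i:i + width] == pattern:
            return i + width
    return -1


# Toy stream: some gap, a long tape mark, the single long pulse, then data
toy = [0] * 100 + [1] * 40 + [0] * 40 + [1] + [0, 1, 1, 0]
after_mark = find_tape_mark(toy, 40)
print(after_mark)         # 180
print(toy[after_mark])    # 1, the single long pulse that follows the mark
```

The slice comparison is slow on a stream of a million or more bits, but it only has to run a handful of times per tape.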

The HDR section needs some further definition. Using an RFC-style ASCII diagram, in which each column is one byte rather than one bit, it looks like this:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T|               NAME              | S | L | E |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
|                                                               |
+                                                               +
|                            COMMENT                            |
+                                                               +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

With the following fields:

  • T is file type, from $01 for a machine code program to $05 for an MZ-700 BASIC program
  • NAME is the file name, terminated by a $0D
  • S is the file size
  • L is the file load address
  • E is the file execution address
  • COMMENT is a comment
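Read as bytes, the diagram gives a 128-byte header: 1 byte of type, 17 of name, three 2-byte values and 104 bytes of comment. A minimal parsing sketch under those widths is shown below. The function name and dictionary keys are made up, the 16-bit values are assumed to be little-endian (as is normal on the Z80), and the name is decoded as plain ASCII, which will not be right for any MZ-specific characters.

```python
import struct

HEADER_SIZE = 128


def parse_header(hdr):
    """Split a 128-byte tape header into its fields."""
    if len(hdr) != HEADER_SIZE:
        raise ValueError("header must be exactly 128 bytes")
    # NAME occupies bytes 1-17 and is terminated by $0D
    raw_name = hdr[1:18].split(b"\x0d", 1)[0]
    # S, L and E are assumed to be little-endian 16-bit values at bytes 18-23
    size, load, exec_addr = struct.unpack_from("<HHH", hdr, 18)
    return {
        "type": hdr[0],
        "name": raw_name.decode("ascii", errors="replace"),
        "size": size,
        "load": load,
        "exec": exec_addr,
        "comment": hdr[24:128],
    }


# A hand-built header for illustration, with an empty comment
example = (bytes([0x01]) + b"NWG-MAIN\x0d".ljust(17, b"\x00")
           + bytes.fromhex("41007f6b7f6b") + bytes(104))
fields = parse_header(example)
print(fields["name"], hex(fields["size"]), hex(fields["load"]))
```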

How the game stores its program

Now that we understand how the data is stored, I wrote a simple bit of Python to read all the various parts. The first file is a BASIC file called NWG-INTRO of size $0ffc. I wrote a detokeniser for MZ-700 BASIC many years ago, and running the extracted file through it reveals the intro, which ends with this line of BASIC:

300 PRINT"1.PRESS RESET2.TYPE L'3.PRESS "

This tells us that to play the game we have to load BASIC (more on this later), run the intro, press the reset button and then use the monitor’s L command to load the next file. This is common with machine code games on the MZ-700, but why have we wasted our time loading BASIC? To give context, the MZ-700 was unusual in that it was a “clean computer”: it didn’t have BASIC in ROM, only a simple BIOS known as the monitor. If you wanted to program in BASIC or load a BASIC program you first had to load the BASIC interpreter, which took about 6 minutes.

Anyway, let’s look at the next file. It’s a tiny ($41 bytes) machine code file with a rather strange comment field. What’s more, this is the last file header on the tape; following it is another file containing strange, random-looking data but no header. Curious!

Type:     Machine code program
Filename: NWG-MAIN
Size:     $0041
Load:     $6b7f
Exec:     $6b7f
Comment:  bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00=\x80\xcfk\x00\x00MAIN\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00CONTACT:":PRINT:PRINT:PRINTSPC(3);:PRINT[2, ]"AUSTRALIA"')

00000000: 00 00 00 00 00 00 00 00  00 00 3D 80 CF 6B 00 00  ..........=..k..
00000010: 4D 41 49 4E 00 00 00 00  00 00 00 00 00 00 00 00  MAIN............
00000020: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
00000030: 43 4F 4E 54 41 43 54 3A  22 3A 50 52 49 4E 54 3A  CONTACT:":PRINT:
00000040: 50 52 49 4E 54 3A 50 52  49 4E 54 53 50 43 28 33  PRINT:PRINTSPC(3
00000050: 29 3B 3A 50 52 49 4E 54  5B 32 2C 20 5D 22 41 55  );:PRINT[2, ]"AU
00000060: 53 54 52 41 4C 49 41 22                           STRALIA"

Let’s have a look at the machine code in the file. Loading it into Ghidra and adding manual comments reveals two sections. The first loads the following file into memory and then calls the second, which I’ve called DecodeProgram.

6b7f    2abf6a          LD          HL,(BasicProgramStart)     ;6ABF = start of BASIC RAM
6b82    ed4b1211        LD          BC,(COMNT+12)              ;taken from COMNT header = 803d - size
                                                               ;
6b86    d3e0            OUT         (MemLowDRAM),A             ;Set $0000-$0FFF to D-RAM
6b88    d3e1            OUT         (MemHighDRAM),A            ;Set $D000-$FFFF to D-RAM
6b8a    cd2a00          CALL        RDDAT                      ;Read $803d bytes from tape into $6abf
6b8d    daa56b          JP          C,LoadFailed               ;Reset if load fails
6b90    cdaa6b          CALL        DecodeProgram              ;undefined DecodeProgram()
6b93    ed7bb96a        LD          SP,(BasicStackPointer)     ;Set up Basic stack
6b97    2abf6a          LD          HL,(BasicProgramStart)     ;Set up the program memory
6b9a    ed5b1211        LD          DE,(COMNT+12)              
6b9e    19              ADD         HL,DE                      
6b9f    22b36a          LD          (BasicProgramEnd),HL       ;Set end of program
6ba2    c3711c          JP          BasicCommandRun            ;RUN the program
                    LoadFailed: 
6ba5    d3e4            OUT         (MemReset),A               ;Return to power on state
6ba7    c30701          JP          ?ER                        ;Loading error monitor routine

DecodeProgram, listed below, loops through the loaded data and subtracts a value from each byte; the value starts at $BD (the sum of the two size bytes, $3D + $80) and increases by 1 each time. The result is a BASIC program, which is the actual game. To help this process the loader stores the size information in that weird comment field.

                    DecodeProgram()
6baa    2a1211          LD          HL,(COMNT+12)              ;Size of data
6bad    7d              LD          A,L                        
6bae    84              ADD         A,H                        ;Add the bytes of size together
6baf    57              LD          D,A                        ;Put this in D
6bb0    e5              PUSH        HL                         
6bb1    c1              POP         BC                         ;Set BC to the value of HL
6bb2    2abf6a          LD          HL,(BasicProgramStart)     ;Set HL to the start of data
                    DecodeLoop:
6bb5    7e              LD          A,(HL)                     ;Load A with the byte at HL
6bb6    92              SUB         D                          ;Subtract D from it.
6bb7    77              LD          (HL),A                     ;Put it back
6bb8    23              INC         HL                         
6bb9    14              INC         D                          
6bba    0b              DEC         BC                         ;Change counters
6bbb    78              LD          A,B                        
6bbc    b1              OR          C                          
6bbd    20f6            JR          NZ,DecodeLoop              ;If BC > 0 loop
6bbf    c9              RET

When reading the code above it’s useful to remember that the usual convention for Z80 register use (as in block instructions such as LDIR) is that HL is the source, DE is the destination and BC is the count.
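The decode loop translates into Python quite directly. The sketch below follows the assembly listing step by step; the function names are made up. The size is read from offset 10 of the comment field, which is where the hex dump above shows the bytes 3D 80.

```python
def initial_key(size):
    """Starting value of D: the low and high bytes of the size added together."""
    return ((size & 0xFF) + (size >> 8)) & 0xFF


def decode_program(data, size):
    """Subtract an incrementing 8-bit key from each of the first `size` bytes."""
    key = initial_key(size)
    out = bytearray(data[:size])
    for i in range(len(out)):
        out[i] = (out[i] - key) & 0xFF    # SUB D, wrapping like an 8-bit register
        key = (key + 1) & 0xFF            # INC D
    return bytes(out)


# First 14 bytes of the comment field, copied from the hex dump
comment_start = bytes(10) + bytes.fromhex("3d80cf6b")
size = int.from_bytes(comment_start[10:12], "little")
print(hex(size), hex(initial_key(size)))               # 0x803d 0xbd
print(decode_program(bytes([0xBD, 0xBF]), size))       # b'\x00\x01'
```

Feeding the headerless block from the tape through decode_program with this size should give the tokenised BASIC program, ready for a detokeniser.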

But why BASIC?

So why did it load BASIC first? This is to initialise RAM so that it is ready to run an encoded BASIC program. The way that the MZ-700 manages RAM is interesting. The MZ-700 has 64 kB of memory, but it also has separate video RAM and a token amount of ROM (the monitor referred to above).

It has a basic method of paging out memory, using the Z80 OUT instruction to control which blocks of memory are mapped where. This is used by MZ-700 BASIC, which swaps out the ROM portion of memory for RAM. To help BASIC with the standard hardware routines it implements a version of the ROM monitor in RAM. Pressing the reset button also does not wipe memory.

So the process is roughly:

  1. User loads BASIC; this means that BASIC and the RAM monitor are in memory
  2. User loads NWG-INTRO
  3. User resets the MZ-700; low memory is now mapped to the ROM monitor, and the RAM monitor and BASIC are still in memory but cannot be accessed
  4. User types L (load command), which loads the loader code
  5. Loader code swaps all possible RAM in and then uses the RAM monitor to load the encoded BASIC game
  6. Loader code then calls the BASIC function for RUN

So it’s complex, but still relatively easy to implement, and easier to undo with some simple Python.