Deciphering GW-BASIC / BASICA protected programs

Christophe Lenclud

Purpose

We describe the deciphering algorithm needed to unprotect source programs saved in protected form by GW-BASIC / BASICA.COM.

History

Microsoft BASIC interpreters were a common way of programming early IBM PC and compatible computers (1981 onwards). Several versions of Microsoft BASIC were used. The first IBM PC came with “Cassette BASIC” in ROM, while “Disk BASIC” (BASIC.COM) and “Advanced BASIC” (BASICA.COM) shipped on DOS floppies. IBM-compatible computers ran the equivalent GW-BASIC, bundled with MS-DOS. These interpreters were eventually superseded by the QBasic interpreter (bundled from MS-DOS 5.0 onwards) and by the QuickBASIC compiler.

Source programs used by the BASICA and GW-BASIC interpreters could be saved in three different formats:
  • The command SAVE “PROGRAM.BAS”, A stores an ASCII version of the program, which can be read by other BASIC dialects or exported to them.
  • The command SAVE “PROGRAM.BAS” stores the program as binary tokens to save scarce memory and disk space. Although Microsoft never officially published it, the format of these “tokenized” programs has been comprehensively described by Norman De Forest and Dan Tobias. Detokenizers for Microsoft BASICs have been written to convert these programs back to ASCII.
  • The command SAVE “PROGRAM.BAS”, P is intended to protect the program by additionally enciphering this tokenized file. Such protected programs can no longer be listed or modified under GW-BASIC / BASICA.

Unprotecting enciphered BASIC source programs

To unprotect programs saved with the P option, we studied the code of GWBASIC.EXE (version 3.22, file size 80502 bytes, date/time 24 July 1987 00:00:02, MD5 = AB25516575579185CCA865D89E3E1A31). This particular file shipped with the Wyse Technology OEM release of Microsoft MS-DOS 3.30. The deciphering code embedded in GW-BASIC is equivalent to the following assembly-language listing:
Code:
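; Input:    DS:SI -> Data to be deciphered (starts at the second byte of the file).
;           DS:DX -> After end of data.
;           ES must equal DS: the data is deciphered in place through STOSB (ES:DI).
;           Key1 and Key2 must be addressable through DS.
; Destroys: AX, BX, CX, SI, DI.
Decipher_GWBASIC proc near
    mov    cx, 0D0Bh          ; CH = 13 (Key1 index), CL = 11 (Key2 index)
    mov    di,si              ; output overwrites input
    mov    bh,0               ; BX is used as a byte index, so BH = 0
    cld                       ; LODSB/STOSB move forward
@@LoopDecipher:
    cmp    si,dx
    jz    short @@EndOfFile
    ; Decipher one byte: AL = ((AL - CL) xor Key1[CH-1] xor Key2[CL-1]) + CH
    mov    bl,ch
    lodsb
    sub    al,cl
    xor    al, [bx + offset Key1 - 1]
    mov    bl,cl
    xor    al, [bx + offset Key2 - 1]
    add    al,ch
    stosb
    ; Next byte: CL cycles 11..1, CH cycles 13..1
    dec    cl
    jnz    short @@NotZ1
    mov    cl,0Bh
@@NotZ1:
    dec    ch
    jnz    @@LoopDecipher
    mov    ch,0Dh
    jmp    @@LoopDecipher
@@EndOfFile:
    ret
Decipher_GWBASIC endp

Key1    db    9Ah, 0F7h, 19h, 83h,  24h, 63h, 43h, 83h,  75h, 0CDh, 8Dh, 84h, 0A9h  ; 13 bytes, indexed by CH
Key2    db    7Ch,  88h, 59h, 74h, 0E0h, 97h, 26h, 77h, 0C4h,  1Dh, 1Eh              ; 11 bytes, indexed by CL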
This procedure is 45 bytes long, excluding the two key tables. The actual code from Microsoft is 50% longer but performs exactly the same operation. Deciphering must begin with the second byte of the file. The very first byte must instead be changed from FEh (signature of a protected file) to FFh (signature of an unprotected tokenized file). The last significant byte of the file is set to 1Ah, but early versions of BASIC seem to leave some junk bytes as slack space at the end of the file.
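For experimenting outside DOS, here is a minimal Python sketch that mirrors the listing instruction by instruction. The function names `decipher` and `unprotect` are hypothetical. `unprotect` only replaces the FEh header with FFh and deciphers everything after it. It does not try to locate the final 1Ah or to discard the slack bytes that some early versions leave behind.

```python
# Key tables copied from the listing above.
KEY1 = bytes([0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43, 0x83,
              0x75, 0xCD, 0x8D, 0x84, 0xA9])   # 13 bytes, indexed by CH
KEY2 = bytes([0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97, 0x26, 0x77,
              0xC4, 0x1D, 0x1E])               # 11 bytes, indexed by CL


def decipher(data: bytes) -> bytes:
    """Python port of Decipher_GWBASIC (hypothetical name)."""
    out = bytearray()
    ch, cl = 13, 11                    # mov cx,0D0Bh
    for b in data:
        b = (b - cl) & 0xFF            # sub al,cl (8-bit wraparound)
        b ^= KEY1[ch - 1]              # xor al,[bx + offset Key1 - 1], BL = CH
        b ^= KEY2[cl - 1]              # xor al,[bx + offset Key2 - 1], BL = CL
        b = (b + ch) & 0xFF            # add al,ch
        out.append(b)                  # stosb
        cl = cl - 1 if cl > 1 else 11  # dec cl; reload 0Bh when it hits 0
        ch = ch - 1 if ch > 1 else 13  # dec ch; reload 0Dh when it hits 0
    return bytes(out)


def unprotect(file_bytes: bytes) -> bytes:
    """Protected image (FEh header) -> plain tokenized image (FFh header)."""
    if not file_bytes or file_bytes[0] != 0xFE:
        raise ValueError("not a protected GW-BASIC/BASICA file (no FEh header)")
    return b"\xff" + decipher(file_bytes[1:])
```

Applied to the full contents of a protected file, `unprotect(data)` should yield a tokenized image that a detokenizer can read.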

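The core of the computation XORs the file with two keys embedded in GWBASIC.EXE. The counters that are subtracted before the XORs and added after them cycle in step with those keys. The keys are 13 and 11 bytes long, and 13 and 11 have no common factor, so the whole transformation is equivalent to a single key repeated every 13 × 11 = 143 bytes.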
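The 143-byte period can be checked directly. In the sketch below, `counters(i)` is a hypothetical helper that gives the values of CH and CL in force for the i-th byte after the header. Because 13 and 11 are coprime, the 143 pairs are all distinct and then repeat. The resulting `schedule` lists three values for each position in one period: the value subtracted (CL), the combined XOR byte Key1[CH-1] xor Key2[CL-1], and the value added (CH). Together these act as the single 143-byte key.

```python
KEY1 = bytes.fromhex("9A F7 19 83 24 63 43 83 75 CD 8D 84 A9")  # 13 bytes
KEY2 = bytes.fromhex("7C 88 59 74 E0 97 26 77 C4 1D 1E")        # 11 bytes


def counters(i: int) -> tuple[int, int]:
    """(CH, CL) used for the i-th deciphered byte (i = 0 is file offset 1)."""
    return 13 - i % 13, 11 - i % 11


# (value subtracted, combined XOR byte, value added) for one full period.
schedule = []
for i in range(143):
    ch, cl = counters(i)
    schedule.append((cl, KEY1[ch - 1] ^ KEY2[cl - 1], ch))

# 13 and 11 are coprime, so all 143 (CH, CL) pairs are distinct...
assert len({counters(i) for i in range(143)}) == 143
# ...and the whole schedule repeats exactly every 143 bytes.
assert all(counters(i) == counters(i + 143) for i in range(1000))
```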

Because the key is embedded in the interpreter itself, the scheme is only as secure as its own secrecy, which violates Kerckhoffs's principle. In practice this is source-code obfuscation rather than a truly strong cipher.
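Because the key ships with every copy of the interpreter, anyone can also run the transformation in the other direction. The sketch below inverts the deciphering listing step by step: it undoes ADD CH, then both XORs, then SUB CL. It is derived from the deciphering loop, not read from GW-BASIC's SAVE code. It also assumes that SAVE ,P applies exactly this inverse, which seems necessary for LOAD to read such files back. `encipher` and `protect` are hypothetical names.

```python
KEY1 = bytes.fromhex("9A F7 19 83 24 63 43 83 75 CD 8D 84 A9")  # 13 bytes
KEY2 = bytes.fromhex("7C 88 59 74 E0 97 26 77 C4 1D 1E")        # 11 bytes


def encipher(data: bytes) -> bytes:
    """Inverse of the deciphering loop (derived, not taken from GW-BASIC)."""
    out = bytearray()
    for i, b in enumerate(data):
        ch, cl = 13 - i % 13, 11 - i % 11   # same counters as the decipherer
        b = (b - ch) & 0xFF                 # undo: add al,ch
        b ^= KEY1[ch - 1] ^ KEY2[cl - 1]    # undo both XORs
        out.append((b + cl) & 0xFF)         # undo: sub al,cl
    return bytes(out)


def protect(tokenized: bytes) -> bytes:
    """Tokenized image (FFh header) -> protected image (FEh header)."""
    if not tokenized or tokenized[0] != 0xFF:
        raise ValueError("expected a tokenized BASIC image (FFh header)")
    return b"\xfe" + encipher(tokenized[1:])
```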

Once the deciphering algorithm is known, writing a program to unprotect such files is easy. This is exactly what “Unprotect Basic Version 1.10” does, a 535-byte program by John Thomason (21 December 1990).

Which versions of Microsoft BASIC used this cipher?

By searching for the byte signatures of the two keys in an incomplete collection of DOS versions and IBM PC ROM images, we found them in several versions of Microsoft BASIC. This suggests that all of these versions support the same cipher:
  • BASICA.EXE (size 54272 bytes, 13 May 1983 12:00:00, MD5 = 28E22CAA7EC534A78D37AA3314690758) from "The COMPAQ Personal Computer DOS, Version 1.11" Rev E.
  • GWBASIC.EXE (size 59728 bytes, 05 June 1984 01:25:00, MD5 = 2FB3EB25944C27267626836435DE7369) "BASIC Interpreter - Version 1.12.03 - Copyright (C) 1984 Corona Data Systems, Inc" from MS-DOS 1.25.
  • Floppy disk images of Compaq MS-DOS 1.10, 1.11, 1.12, 3.00, 3.31.
  • Floppy disk images of MS-DOS 1.25, 2.11, 3.10, 3.30.
  • and many others.
No version of BASIC.COM or BASICA.COM was found to contain the keys, but several IBM ROMs embed them: IBM BASIC 1.00 and 1.10 (1981), and the ROMs of IBM computers 5160, 6162, 5170 and PCjr. These ROM BASICs not only ran at power-up when no operating system disk was present, but were also called by BASIC.COM and BASICA.COM running under DOS.
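The signature search described above can be sketched as follows. This is a hypothetical scanner, not the tool used for the survey. It reports the offsets of the two key tables in each file named on the command line, such as an executable or a ROM dump. It only finds the tables if they are stored as contiguous bytes, as in the listing.

```python
import sys

KEY1 = bytes.fromhex("9A F7 19 83 24 63 43 83 75 CD 8D 84 A9")  # 13 bytes
KEY2 = bytes.fromhex("7C 88 59 74 E0 97 26 77 C4 1D 1E")        # 11 bytes


def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Offsets of every occurrence of needle in haystack."""
    hits, pos = [], haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


def scan(image: bytes) -> dict[str, list[int]]:
    """Where each key table occurs in a binary image."""
    return {"Key1": find_all(image, KEY1), "Key2": find_all(image, KEY2)}


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            hits = scan(f.read())
        if all(hits.values()):
            verdict = "both keys found"
        elif any(hits.values()):
            verdict = "only one key found"
        else:
            verdict = "no keys found"
        print(f"{path}: {verdict} {hits}")
```

Run it as `python scan_keys.py FILE...`. The script name is arbitrary.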
 