Purpose

We describe the deciphering algorithm needed to unprotect source programs stored by GW-BASIC / BASICA.COM.

History

Microsoft BASIC interpreters were a common way of programming early IBM PC and compatible computers (1981 onwards). Several versions of Microsoft BASIC were used: the first IBM PC came with “Cassette BASIC” in ROM, while “Disk BASIC” (BASIC.COM) and “Advanced BASIC” (BASICA.COM) shipped on the DOS floppies. IBM compatible computers ran the equivalent GW-BASIC bundled with MS-DOS (online manual). They were eventually replaced by the QBasic interpreter with MS-DOS 5.0, or by the QuickBASIC compiler.

Source programs used by the BASICA and GW-BASIC interpreters could be saved in three different formats:
  • The command SAVE “PROGRAM.BAS”, A stores an ASCII version of the program which can easily be read by a text editor or exported to other BASIC dialects.
  • The command SAVE “PROGRAM.BAS” stores the program as binary tokens to spare scarce memory and disk space, and to speed up loading from old floppy drives. Although not officially published by Microsoft, the complete format of these “tokenized” programs has been described by Norman De Forest and Dan Tobias. Detokenizers for Microsoft BASICs have been written to convert these programs back to ASCII.
  • The command SAVE “PROGRAM.BAS”, P was meant to protect the program by additionally enciphering the tokenized file. Protected programs can no longer be listed or modified under GW-BASIC / BASICA.

Unprotecting enciphered BASIC source programs

To unprotect the programs saved with the P option, we studied the code of GWBASIC.EXE (version 3.22, file size 80502 bytes, date/time 24 July 1987 00:00:02, MD5 = AB25516575579185CCA865D89E3E1A31). This particular file was shipped with Microsoft MS-DOS 3.30 for Wyse Technology OEM. The deciphering code embedded in GW-BASIC is equivalent to the following assembly language listing:
Code:
; Input: DS:SI -> Data to be deciphered.
;        DS:DX -> After end of data.
;        ES = DS (STOSB writes the result to ES:DI, in place).
Decipher_GWBASIC proc near
    mov    cx, 0D0Bh
    mov    di,si
    mov    bh,0
    cld
@@LoopDecipher:
    cmp    si,dx
    jz     short @@EndOfFile
    ; Decipher one byte...
    mov    bl,ch
    lodsb
    sub    al,cl
    xor    al, [bx + offset Key1 - 1]
    mov    bl,cl
    xor    al, [bx + offset Key2 - 1]
    add    al,ch
    stosb
    ; Next byte...
    dec    cl
    jnz    short @@NotZ1
    mov    cl,0Bh
@@NotZ1:
    dec    ch
    jnz    @@LoopDecipher
    mov    ch,0Dh
    jmp    @@LoopDecipher
@@EndOfFile:
    ret
Decipher_GWBASIC endp

Key1    db    9Ah, 0F7h, 19h, 83h,  24h, 63h, 43h, 83h,  75h, 0CDh, 8Dh, 84h, 0A9h
Key2    db    7Ch,  88h, 59h, 74h, 0E0h, 97h, 26h, 77h, 0C4h,  1Dh, 1Eh

This procedure is 45 bytes long. The actual code from Microsoft is 50% longer, but performs exactly the same operation. Deciphering must begin at the second byte of the file. The very first byte has to be changed from FEh (signature of a protected file) to FFh (signature of an unprotected tokenized file). The last significant byte of the file is 1Ah (the DOS end-of-file marker), but early versions of BASIC seem to leave some junk bytes as slack space at the end of the file.
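As a cross-check of the listing above, here is a minimal Python sketch of the same per-byte computation. The function name decipher_body is our own (hypothetical) and the sketch only handles the bytes that follow the FEh signature; it is a transcription of the assembly loop, not Microsoft's code.

```python
# Keys copied from the assembly listing (Key1: 13 bytes, Key2: 11 bytes).
KEY1 = bytes.fromhex("9A F7 19 83 24 63 43 83 75 CD 8D 84 A9")
KEY2 = bytes.fromhex("7C 88 59 74 E0 97 26 77 C4 1D 1E")

def decipher_body(data: bytes) -> bytes:
    """Decipher the bytes that follow the FEh signature (hypothetical helper)."""
    out = bytearray()
    cl, ch = 11, 13                  # same start values as MOV CX,0D0Bh
    for b in data:
        b = (b - cl) & 0xFF          # SUB AL,CL
        b ^= KEY1[ch - 1]            # XOR AL,[BX + Key1 - 1] with BL = CH
        b ^= KEY2[cl - 1]            # XOR AL,[BX + Key2 - 1] with BL = CL
        b = (b + ch) & 0xFF          # ADD AL,CH
        out.append(b)
        cl = cl - 1 or 11            # DEC CL, reload with 0Bh at zero
        ch = ch - 1 or 13            # DEC CH, reload with 0Dh at zero
    return bytes(out)

# A zero byte in first position deciphers to 4Fh.
print(hex(decipher_body(b"\x00")[0]))   # 0x4f
```

The masks with 0xFF reproduce the 8-bit wrap-around of the AL register.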

The core of the deciphering computation XORs each byte of the file with one byte from each of two keys embedded in GWBASIC.EXE. Because the key lengths, 13 and 11, are coprime, the combined key stream only repeats every 13 * 11 = 143 bytes. The scheme is therefore a polyalphabetic substitution cipher with period 143.
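The period of 143 can be checked with a few lines of Python. This is only an illustrative sketch: the helper name counter_states is ours, and it simulates nothing more than the two countdown counters (CL from 11, CH from 13) of the assembly listing.

```python
from math import gcd

N1, N2 = 11, 13   # lengths of Key2 and Key1, as loaded into CL and CH

def counter_states(count):
    """Return the successive (cl, ch) pairs seen by the deciphering loop."""
    cl, ch = N1, N2
    states = []
    for _ in range(count):
        states.append((cl, ch))
        cl = cl - 1 or N1     # reload when the counter reaches zero
        ch = ch - 1 or N2
    return states

states = counter_states(2 * 143)
period = N1 * N2 // gcd(N1, N2)        # least common multiple
print(period)                          # 143
print(len(set(states[:143])))          # 143 (every pair occurs exactly once)
print(states[:143] == states[143:])    # True (the sequence then repeats)
```

Since each byte is also adjusted by the counters themselves (SUB AL,CL and ADD AL,CH), each of the 143 positions applies its own byte substitution rather than a plain XOR.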

Using a constant key embedded in the program violates Kerckhoffs's principle, since the secrecy then rests entirely on the program itself, so this is source code obfuscation rather than a strong cipher.

Knowing this deciphering algorithm, it is easy to write a program to unprotect the files.
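For example, a whole-file converter could look like the following Python sketch. The names unprotect and unprotect_file are hypothetical. We also assume that a trailing 1Ah byte is the end-of-file marker and must be copied unchanged; that choice is ours for this sketch, and files with junk slack bytes after the marker are not handled.

```python
from pathlib import Path

KEY1 = bytes.fromhex("9A F7 19 83 24 63 43 83 75 CD 8D 84 A9")
KEY2 = bytes.fromhex("7C 88 59 74 E0 97 26 77 C4 1D 1E")

def unprotect(blob: bytes) -> bytes:
    """Turn a protected (FEh) image into a plain tokenized (FFh) image."""
    if not blob or blob[0] != 0xFE:
        raise ValueError("not a protected file: first byte is not FEh")
    body, tail = blob[1:], b""
    if body.endswith(b"\x1a"):       # assumption: keep the EOF marker as-is
        body, tail = body[:-1], b"\x1a"
    out = bytearray(b"\xff")         # signature of an unprotected file
    for i, c in enumerate(body):
        cl = 11 - i % 11             # counter running 11, 10, ..., 1
        ch = 13 - i % 13             # counter running 13, 12, ..., 1
        c = (c - cl) & 0xFF
        c ^= KEY1[ch - 1] ^ KEY2[cl - 1]
        out.append((c + ch) & 0xFF)
    return bytes(out) + tail

def unprotect_file(src: str, dst: str) -> None:
    """Read src, decipher it and write the result to dst."""
    Path(dst).write_bytes(unprotect(Path(src).read_bytes()))
```

Calling unprotect_file("PROTECTED.BAS", "PLAIN.BAS") (both file names are placeholders) would then produce a file that a detokenizer or the interpreter itself can read.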

Analysis of the Source Code of the GW-BASIC Interpreter

In May 2020, Microsoft released the GW-BASIC Interpreter Source Code as read-only open source software under the MIT License. The released version reflects the state of the GW-BASIC interpreter source code as it was in 1983, and is labelled "(C) Copyright Microsoft 1982". Michal Necasek attempted to figure out the exact date of this code and guessed it would be roughly from mid-1983.

The file GWMAIN.ASM contains this notice:
Code:
--------- ---- -- ---- ----- --- ---- -----
COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
--------- ---- -- ---- ----- --- ---- -----

ORIGINALLY WRITTEN ON THE PDP-10 FROM
FEBRUARY 9 TO  APRIL 9 1975

BILL GATES WROTE A LOT OF STUFF.
PAUL ALLEN WROTE A LOT OF OTHER STUFF AND FAST CODE.
MONTE DAVIDOFF WROTE THE MATH PACKAGE (F4I.MAC).

GW-BASIC source code consists of 40 files of 8086 assembly language. Each source file contains a header stating "This translation created 10-Feb-83 by Version 4.3", as Microsoft generated the 8086 assembly language code from the sources of a master implementation. Comments left in the files refer to the LXI instruction or to the "[B,C]", "[D,E]" and "[H,L]" registers, which suggests that the master could be 8080/8085 source code.

The deciphering of a protected BASIC program is done by the procedure PDECOD in source file GIODSK.ASM:
Code:
    .RADIX    10
RET12:    RET
    N1=11D            ;Number of bytes to use from ATNCON
    N2=13D            ;Number of bytes to use from SINCON

PDECOD:    MOV    CX,OFFSET N1+N2*256    ;Initialize both counters
    MOV    BX,WORD PTR TXTTAB    ;Starting point
    MOV    DX,BX        ;Into [D,E]
DECDBL:
    MOV    BX,WORD PTR VARTAB    ;At end?
    CMP    BX,DX        ;Test
    JZ     RET12        ;Yes
    MOV    BX,OFFSET $LOGP
    MOV    AL,CH
    CBW
    ADD    BX,AX
    MOV    SI,DX
    CLD              ;Set Post-Increment mode
    LODSB            ;[AL]=byte from program
    SUB    AL,CL        ;Subtract counter for randomness
    XOR    AL,BYTE PTR CS:0[BX]    ;XOR on this one too
    PUSH   AX        ;Save result
    MOV    BX,OFFSET $EXPCN
    MOV    AL,CL        ;Use [CL] to index into it
    CBW
    ADD    BX,AX
    POP    AX           ;Get back current byte
    XOR    AL,BYTE PTR CS:0[BX]    ;XOR entry
    ADD    AL,CH        ;Add counter for no reason
    MOV    DI,DX
    CLD              ;Set Post-Increment mode
    STOSB            ;store [AL] back in program
    INC    DX        ;Increment pointer
    DEC    CL        ;decrment first table index
    JNZ    CNTZR2        ;Still non-Zero
    MOV    CL,LOW OFFSET N1    ;Re-initialize counter 1
CNTZR2:    DEC    CH
    JNZ    DECDBL        ;Decrement counter-2, Still non-zero, go for more
    MOV    CH,LOW OFFSET N2    ;Re-initialize counter 2
    JMP    SHORT DECDBL    ;Keep going until done

The keys are parts of constant tables used by other mathematical routines of GW-BASIC in file MATH1.ASM (these numbers are in octal notation, therefore octal 232 is hexadecimal 9A and octal 174 is hexadecimal 7C):
Code:
    .RADIX    8        ; To be safe
    ; ...

;**********************************************************
;FOR LOG CALCULATIONS HART ALGORITHM 2524 WILL BE USED
;IN THIS ALGORITHM WE WILL CALCULATE BASE 2 LOG AS FOLLOWS
;LOG(X)=P(X)/Q(X)
;***************************************************************
$LOGP:    DB    4
    DB    232        ;4.8114746
    DB    367
    DB    031
    DB    203
    DB    044        ;6.105852
    DB    143
    DB    103
    DB    203
    DB    165        ;-8.86266
    DB    315
    DB    215
    DB    204
    DB    251        ;-2.054667
    DB    177
    DB    203
    DB    202

    ; ...

;*********************************************************
;$EXPCN CONTAINS THE COEFFICIENTS FOR POLYNOMIAL EVALUATION
;OF LOG BASE 2 OF X WHERE .5.LE.X.LE.1
;THE COEFFICIENTS ARE FROM HART #1302
;***********************************************************
$EXPCN: DB    7        ;DEGREE + 1
    DB    174        ;.00020745577403-
    DB    210
    DB    131
    DB    164
    DB    340        ;.00127100574569-
    DB    227
    DB    046
    DB    167
    DB    304        ;.00965065093202+
    DB    035
    DB    036
    DB    172
    DB    136        ;.05549656508324+
    DB    120
    DB    143
    DB    174
    DB    032        ;.24022713817633-
    DB    376
    DB    165
    DB    176
    DB    030        ;.69314717213716+
    DB    162
    DB    061
    DB    200
    DB    000        ;1.0
    DB    000
    DB    000
    DB    201
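The correspondence between these octal tables and the two keys can be verified with a short Python sketch. The octal values are copied from the listings above; only the first 13 entries of $EXPCN are reproduced, which is more than the cipher uses. PDECOD indexes the tables with counters running from N2 (or N1) down to 1, so the leading degree byte at offset 0 is never used.

```python
# Octal bytes copied from MATH1.ASM, degree byte first.
LOGP_OCTAL = "4 232 367 031 203 044 143 103 203 165 315 215 204 251 177 203 202"
EXPCN_OCTAL = "7 174 210 131 164 340 227 046 167 304 035 036 172"  # truncated

def parse_octal(text: str) -> bytes:
    """Convert whitespace-separated octal numbers to bytes."""
    return bytes(int(token, 8) for token in text.split())

logp = parse_octal(LOGP_OCTAL)
expcn = parse_octal(EXPCN_OCTAL)

key1 = logp[1:1 + 13]     # offsets 1..13 of $LOGP  (N2 = 13)
key2 = expcn[1:1 + 11]    # offsets 1..11 of $EXPCN (N1 = 11)

print(key1.hex(" ").upper())   # 9A F7 19 83 24 63 43 83 75 CD 8D 84 A9
print(key2.hex(" ").upper())   # 7C 88 59 74 E0 97 26 77 C4 1D 1E
```

Both results match the Key1 and Key2 tables of the disassembled GWBASIC.EXE 3.22.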


Which versions of Microsoft BASIC used this cipher?

Searching files for the byte sequences of the two keys reveals possible users of this cipher. However, as these byte sequences are also used by the BASIC mathematical routines, a match is only a clue, not proof. By searching for the two keys in an incomplete collection of DOS versions and IBM PC ROM images, we found them in several versions of Microsoft BASIC:
  • BASICA.EXE (size 54272 bytes, 13 May 1983 12:00:00, MD5 = 28E22CAA7EC534A78D37AA3314690758) from "The COMPAQ Personal Computer DOS, Version 1.11" Rev E.
  • GWBASIC.EXE (size 59728 bytes, 05 June 1984 01:25:00, MD5 = 2FB3EB25944C27267626836435DE7369) "BASIC Interpreter - Version 1.12.03 - Copyright (C) 1984 Corona Data Systems, Inc" from MS-DOS 1.25.
  • Floppy disk images of Compaq MS-DOS 1.10, 1.11, 1.12, 3.00, 3.31.
  • Floppy disk images of MS-DOS 1.25, 2.11, 3.10, 3.30.
  • and many others...
No version of BASIC.COM or BASICA.COM was found to contain the keys, but several ROMs from IBM computers embed them: IBM BASIC 1.00 and 1.10 (1981), IBM computers 5160, 6162, 5170, PCjr. These ROM BASICs were not only run at power-up when no operating system disk was present, but were also called by BASIC.COM and BASICA.COM running from DOS.
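Such a key search can be sketched in Python as follows. The helper names find_keys and scan_files are hypothetical, and the sketch assumes each key is stored as one contiguous run of bytes in the file being examined.

```python
from pathlib import Path

KEY1 = bytes.fromhex("9A F7 19 83 24 63 43 83 75 CD 8D 84 A9")
KEY2 = bytes.fromhex("7C 88 59 74 E0 97 26 77 C4 1D 1E")

def find_keys(blob: bytes) -> dict:
    """Return the offset of each key in blob, or -1 when it is absent."""
    return {"Key1": blob.find(KEY1), "Key2": blob.find(KEY2)}

def scan_files(paths):
    """Yield (path, offsets) for every file that contains both keys."""
    for path in paths:
        offsets = find_keys(Path(path).read_bytes())
        if min(offsets.values()) >= 0:
            yield path, offsets

# Small self-check on synthetic data: degree byte, Key1, filler, Key2.
sample = b"\x00\x04" + KEY1 + b"junk" + KEY2
print(find_keys(sample))   # {'Key1': 2, 'Key2': 19}
```

As noted above, a hit only shows that the math tables are present; it does not prove that the binary actually implements the cipher.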

The signature of the deciphering procedure (not the keys) was found in the following binaries:
  • GW-BASIC 3.22, (C) Copyright Microsoft 1983,1984,1985,1986,1987 (US version)
  • GW-BASIC 3.22, (C) Copyright Microsoft 1983,1984,1985,1986,1987 (French version)
  • GW-BASIC 3.20, (C) Copyright Microsoft 1983,1984,1985,1986,1987
  • GW-BASIC Version 1.12.03, Copyright (C) 1984 Corona Data Systems, Inc.
  • GW-BASIC 2.01, Rev. 1.02, Development Rev. 1.0, for Olivetti Personal Computer, Copyright (C) by Olivetti, 1984
  • GW-BASIC 3.22, Rev. 3.29 for Olivetti Personal Computer, Copyright (C) by Olivetti, 1987

How to execute GW-BASIC files nowadays?

PC-BASIC is a free, cross-platform interpreter for GW-BASIC, Advanced BASIC (BASICA), PCjr Cartridge Basic and Tandy 1000 GWBASIC. It interprets these BASIC dialects with a high degree of accuracy, aiming for bug-for-bug compatibility. PC-BASIC emulates the most common video and audio hardware on which these BASICs used to run. PC-BASIC runs plain-text, tokenised and protected .BAS files. It implements floating-point arithmetic in the Microsoft Binary Format (MBF) and can therefore read and write binary data files created by GW-BASIC.

Other independent work on this topic

John Thomason released “Unprotect Basic Version 1.10” (UNPBASIC.COM, a 535-byte DOS program dated 21 December 1990) to decipher GW-BASIC files. The download link was alive in 2018 but appeared to be dead in 2020.

The protection flag PROFLG, defined in file GWDATA.ASM and checked by PROCHK in file GIODSK.ASM, governs source file protection. When PROFLG is non-zero, several BASIC direct statements are disabled to prevent access to the protected program: SAVE without the P option is prohibited; LIST and LLIST are prohibited; PEEK, POKE, BSAVE and BLOAD are disabled; and CHAIN cannot include the MERGE option. Norman L. De Forest experimentally recovered PROFLG offsets across several versions of GW-BASIC and devised a complicated exploit to reset this flag from within the GW-BASIC interpreter. However, deciphering the file using the native algorithm is easier and more general.

The American Cryptogram Association (ACA) publishes a bimonthly journal called The Cryptogram. In its Computer Supplement #19 of Summer 1994, Paul C. Kocher published BASCRACK, a C program to decipher GW-BASIC protected files:
C:
/* BASCRACK.C

GW-BASIC for MS-DOS appears to encrypt a program using a substitution cipher
with period 143.  There is an 11-byte key and a 13-byte key that are used to
"protect" the program. Running this program will attempt to crack that protection.
*/

#include <stdio.h>
#include <stdlib.h>  /* for exit() */
int main(int argc, char **argv) {
    unsigned char key1[13]={
        0xA9,0x84,0x8D,0xCD,0x75,0x83,0x43,0x63,0x24,0x83,0x19,0xF7,0x9A};
    unsigned char key2[11]={
        0x1E,0x1D,0xC4,0x77,0x26,0x97,0xE0,0x74,0x59,0x88,0x7C};
    int nextbyte, index;
    unsigned char c;
    FILE *infile, *outfile;

    if (argc != 3) {
        printf("Utility to decrypt GWBASIC/BASICA files saved with \",p\"\n\n"
               "Copyright 1992 by Paul C. Kocher. All rights reserved.\n\n"
               "Usage: BASCRACK encrypted.bas outfile.bas\n");
        exit(1);
    }
    if ((infile=fopen(argv[1],"rb"))==NULL || (outfile=fopen(argv[2],"wb"))==NULL) {
        printf("Error opening file.\n");
        exit(1);
    }
    if (fgetc(infile) == 0xFE) { fputc(0xFF, outfile); }
    else { printf("Not an encrypted BASIC file\n");
        exit(1);
    }
    index = 0;
    nextbyte=fgetc(infile);
    while (c=nextbyte, (nextbyte=fgetc(infile)) != EOF) {
        c -= 11 - (index % 11);
        c ^= key1[ index % 13 ];
        c ^= key2[ index % 11 ];
        c += 13 - (index % 13);
        fputc(c, outfile);
        index = (index+1) % (13*11);
    }
    fputc(c, outfile); /* Don't decrypt the EOF character */
    return 0;
}
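BASCRACK stores the two keys in reverse order and uses a single ascending index, whereas the assembly listing uses two countdown counters. The following Python sketch checks that both formulations give the same result for every byte value at every position of the 143-byte period. The function names are ours and the code is only a verification aid.

```python
# Keys in the order of the assembly listing.
KEY1 = bytes.fromhex("9A F7 19 83 24 63 43 83 75 CD 8D 84 A9")
KEY2 = bytes.fromhex("7C 88 59 74 E0 97 26 77 C4 1D 1E")
# Tables exactly as they appear in BASCRACK.C.
BASCRACK_KEY1 = bytes.fromhex("A9 84 8D CD 75 83 43 63 24 83 19 F7 9A")
BASCRACK_KEY2 = bytes.fromhex("1E 1D C4 77 26 97 E0 74 59 88 7C")

def by_counters(c, cl, ch):
    """One byte, assembly style: cl and ch count down from 11 and 13."""
    return ((((c - cl) & 0xFF) ^ KEY1[ch - 1] ^ KEY2[cl - 1]) + ch) & 0xFF

def by_index(c, index):
    """One byte, BASCRACK style: a single ascending index."""
    c = (c - (11 - index % 11)) & 0xFF
    c ^= BASCRACK_KEY1[index % 13]
    c ^= BASCRACK_KEY2[index % 11]
    return (c + (13 - index % 13)) & 0xFF

def forms_agree():
    """Compare both styles for all 256 byte values at all 143 positions."""
    cl, ch = 11, 13
    for index in range(143):
        for c in range(256):
            if by_counters(c, cl, ch) != by_index(c, index):
                return False
        cl = cl - 1 or 11
        ch = ch - 1 or 13
    return True

print(BASCRACK_KEY1 == KEY1[::-1], BASCRACK_KEY2 == KEY2[::-1], forms_agree())
# True True True
```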

In the Computer Supplement #21 of Spring 1996, under "Basic Peeks, Pokes and Subroutines", Mike Todd published what he described as a way to unprotect a BASIC program that was saved with ",P":
First you must create a file to overlay the ,P setting. From the DOS prompt start up BASICA or BASIC and enter the BASIC command BSAVE "UN.P",1124,1. This will create a file on your default drive named UN.P.
Next LOAD your program that had been saved using ",P". If it was named MYPROG.BAS the BASIC command would be LOAD "MYPROG".
Now to use the UN.P file to overlay the protection setting use the command BLOAD "UN.P",1124.
You may now use the LIST, EDIT and SAVE commands as usual.
I did not check this trick under BASICA, but I tested it under several versions of GW-BASIC, and all of them refused to run the BLOAD statement, since it is disabled once a protected program has been loaded:
Interpreter | DOS Version | File | Size | MD5 | Result
GW-BASIC version 3.22 | - | - | - | - | Failure
GW-BASIC 2.01 for Olivetti Computer (1983, 1984) | - | - | - | - | Failure
BASIC Interpreter - Version 1.12.03 - Copyright (C) 1984 Corona Data Systems, Inc | MS-DOS 1.25 | GWBASIC.EXE (05 June 1984 01:25:00) | 59728 | 2FB3EB25944C27267626836435DE7369 | Failure
 