Purpose
We describe the deciphering algorithm needed to unprotect source programs stored by GW-BASIC / BASICA.COM.
History
Microsoft BASIC interpreters were a common way of programming early IBM PC and compatible computers (1981 onwards). Several versions of Microsoft BASIC were used: the first IBM PC came with “Cassette BASIC” in ROM, while “Disk BASIC” (BASIC.COM) and “Advanced BASIC” (BASICA.COM) shipped on DOS floppies. IBM-compatible computers ran the equivalent GW-BASIC bundled with MS-DOS (online manual). They were eventually replaced by the QBasic interpreter with MS-DOS 5.0, or by the QuickBASIC compiler.
Source programs used by the BASICA and GW-BASIC interpreters could be saved in three different formats:
- The command SAVE “PROGRAM.BAS”, A stores an ASCII version of the program which can easily be read by a text editor or exported to other BASIC dialects.
- The command SAVE “PROGRAM.BAS” stores the program as binary tokens to spare scarce memory and disk space, and to speed up loading from old floppy drives. Although not officially published by Microsoft, the format of these “tokenized” programs has been described in detail by Norman De Forest and Dan Tobias. Detokenizers for Microsoft BASICs have been written to convert these programs back to ASCII.
- The command SAVE “PROGRAM.BAS”, P was aimed at protecting the program by additionally enciphering this tokenized file. Such protected programs can no longer be listed or modified under GW-BASIC / BASICA.
Unprotecting enciphered BASIC source programs
To unprotect the programs saved with the P option, we studied the code of GWBASIC.EXE (version 3.22, file size 80502 bytes, date/time 24 July 1987 00:00:02, MD5 = AB25516575579185CCA865D89E3E1A31). This particular file was shipped with Microsoft MS-DOS 3.30 for Wyse Technology OEM. The deciphering code embedded in GW-BASIC is equivalent to the following assembly language listing:
Code:
; Input: DS:SI -> Data to be deciphered.
; DS:DX -> After end of data.
Decipher_GWBASIC proc near
mov cx, 0D0Bh
mov di,si
mov bh,0
cld
@@LoopDecipher:
cmp si,dx
jz short @@EndOfFile
; Decipher one byte...
mov bl,ch
lodsb
sub al,cl
xor al, [bx + offset Key1 - 1]
mov bl,cl
xor al, [bx + offset Key2 - 1]
add al,ch
stosb
; Next byte...
dec cl
jnz short @@NotZ1
mov cl,0Bh
@@NotZ1:
dec ch
jnz @@LoopDecipher
mov ch,0Dh
jmp @@LoopDecipher
@@EndOfFile:
ret
Decipher_GWBASIC endp
Key1 db 9Ah, 0F7h, 19h, 83h, 24h, 63h, 43h, 83h, 75h, 0CDh, 8Dh, 84h, 0A9h
Key2 db 7Ch, 88h, 59h, 74h, 0E0h, 97h, 26h, 77h, 0C4h, 1Dh, 1Eh
This procedure is 45 bytes long. The actual code from Microsoft is 50% longer, but performs exactly the same operation. The deciphering computation has to begin with the second byte of the file. The very first byte has to be changed from FEh (signature of a protected file) to FFh (signature of an unprotected tokenized file). The last significant byte of the file will be set to 1Ah, but early versions of BASIC seem to leave some junk bytes as slack space at the end of the file.
The main deciphering computation applies XOR between the file and two keys embedded in GWBASIC.EXE. As these keys are 13 and 11 bytes long, and 13 and 11 are coprime, this is equivalent to a single key repeated every 13 * 11 = 143 bytes, in other words a polyalphabetic substitution cipher with period 143.
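The 143-byte period can be checked with a short C sketch that replays only the two countdown counters of the listing above (CH starting at 13, CL starting at 11) and counts the steps until both are back at their initial values. The function name keystream_period is ours and is not part of GW-BASIC.

```c
#include <assert.h>

/* Count how many bytes are processed before the (CH, CL) counter pair
   of the deciphering loop returns to its initial value (13, 11). */
unsigned keystream_period(void)
{
    unsigned ch = 13, cl = 11, steps = 0;

    do {
        if (--cl == 0) cl = 11;      /* same wrap-around as "mov cl,0Bh" */
        if (--ch == 0) ch = 13;      /* same wrap-around as "mov ch,0Dh" */
        steps++;
        assert(steps <= 13u * 11u);  /* sanity bound: cannot exceed 13 * 11 */
    } while (ch != 13 || cl != 11);

    return steps;
}
```

Calling keystream_period() returns 143, the least common multiple of 13 and 11.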
Using a constant key embedded in the program violates Kerckhoffs's principle, so this is source code obfuscation rather than a strong cipher.
Knowing this deciphering algorithm, it is easy to write a program to unprotect the files.
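As an illustration, here is a minimal C sketch of such a program, working on a file that has already been loaded into memory. It is a transcription of the assembly listing above, not Microsoft's code. The names gwbasic_decipher and gwbasic_unprotect are hypothetical, and leaving the final byte untouched (on the assumption that it is the 1Ah end-of-file marker) may not suit files that carry trailing junk bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Keys in the same order as Key1 / Key2 in the assembly listing. */
static const unsigned char KEY1[13] = {
    0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43, 0x83, 0x75, 0xCD, 0x8D, 0x84, 0xA9
};
static const unsigned char KEY2[11] = {
    0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97, 0x26, 0x77, 0xC4, 0x1D, 0x1E
};

/* Decipher len bytes in place, mirroring the CH/CL countdown of the listing. */
void gwbasic_decipher(unsigned char *data, size_t len)
{
    unsigned ch = 13, cl = 11;
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char c = data[i];
        c = (unsigned char)(c - cl);   /* sub al,cl */
        c ^= KEY1[ch - 1];             /* xor al,[bx + Key1 - 1], bx = ch */
        c ^= KEY2[cl - 1];             /* xor al,[bx + Key2 - 1], bx = cl */
        c = (unsigned char)(c + ch);   /* add al,ch */
        data[i] = c;
        if (--cl == 0) cl = 11;
        if (--ch == 0) ch = 13;
    }
}

/* Hypothetical wrapper for a whole file image: checks the FEh signature,
   rewrites it to FFh, and deciphers everything between the signature and
   the last byte (assumed to be the 1Ah end-of-file marker).
   Returns 0 on success, -1 if the buffer is not a protected file. */
int gwbasic_unprotect(unsigned char *file, size_t len)
{
    if (file == NULL || len < 2 || file[0] != 0xFE)
        return -1;
    file[0] = 0xFF;
    gwbasic_decipher(file + 1, len - 2);
    return 0;
}
```

A caller would read the whole .BAS file into a buffer, call gwbasic_unprotect(buffer, size), and write the buffer back out if the return value is 0.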
Analysis of the Source Code of the GW-BASIC Interpreter
In May 2020, Microsoft released the GW-BASIC Interpreter Source Code as read-only open source software under the MIT License. The released version reflects the state of the GW-BASIC interpreter source code as it was in 1983, and is labelled "(C) Copyright Microsoft 1982". Michal Necasek attempted to figure out the exact date of this code and guessed it would be roughly from mid-1983.
The file GWMAIN.ASM contains this notice:
Code:
--------- ---- -- ---- ----- --- ---- -----
COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
--------- ---- -- ---- ----- --- ---- -----
ORIGINALLY WRITTEN ON THE PDP-10 FROM
FEBRUARY 9 TO APRIL 9 1975
BILL GATES WROTE A LOT OF STUFF.
PAUL ALLEN WROTE A LOT OF OTHER STUFF AND FAST CODE.
MONTE DAVIDOFF WROTE THE MATH PACKAGE (F4I.MAC).
GW-BASIC source code consists of 40 files of 8086 assembly language. Each source file contains a header stating "This translation created 10-Feb-83 by Version 4.3", as Microsoft generated the 8086 assembly language code from the sources of a master implementation. Comments left in the files refer to the LXI instruction or to the "[B,C]", "[D,E]" and "[H,L]" registers, suggesting that the master could be 8080/8085 source code.
The deciphering of a protected BASIC program is done by the procedure PDECOD in source file GIODSK.ASM:
Code:
.RADIX 10
RET12: RET
N1=11D ;Number of bytes to use from ATNCON
N2=13D ;Number of bytes to use from SINCON
PDECOD: MOV CX,OFFSET N1+N2*256 ;Initialize both counters
MOV BX,WORD PTR TXTTAB ;Starting point
MOV DX,BX ;Into [D,E]
DECDBL:
MOV BX,WORD PTR VARTAB ;At end?
CMP BX,DX ;Test
JZ RET12 ;Yes
MOV BX,OFFSET $LOGP
MOV AL,CH
CBW
ADD BX,AX
MOV SI,DX
CLD ;Set Post-Increment mode
LODSB ;[AL]=byte from program
SUB AL,CL ;Subtract counter for randomness
XOR AL,BYTE PTR CS:0[BX] ;XOR on this one too
PUSH AX ;Save result
MOV BX,OFFSET $EXPCN
MOV AL,CL ;Use [CL] to index into it
CBW
ADD BX,AX
POP AX ;Get back current byte
XOR AL,BYTE PTR CS:0[BX] ;XOR entry
ADD AL,CH ;Add counter for no reason
MOV DI,DX
CLD ;Set Post-Increment mode
STOSB ;store [AL] back in program
INC DX ;Increment pointer
DEC CL ;decrment first table index
JNZ CNTZR2 ;Still non-Zero
MOV CL,LOW OFFSET N1 ;Re-initialize counter 1
CNTZR2: DEC CH
JNZ DECDBL ;Decrement counter-2, Still non-zero, go for more
MOV CH,LOW OFFSET N2 ;Re-initialize counter 2
JMP SHORT DECDBL ;Keep going until done
The keys are parts of constant tables used by other mathematical routines of GW-BASIC in file MATH1.ASM (these numbers are in octal notation, therefore octal 232 is hexadecimal 9A and octal 174 is hexadecimal 7C):
Code:
.RADIX 8 ; To be safe
; ...
;**********************************************************
;FOR LOG CALCULATIONS HART ALGORITHM 2524 WILL BE USED
;IN THIS ALGORITHM WE WILL CALCULATE BASE 2 LOG AS FOLLOWS
;LOG(X)=P(X)/Q(X)
;***************************************************************
$LOGP: DB 4
DB 232 ;4.8114746
DB 367
DB 031
DB 203
DB 044 ;6.105852
DB 143
DB 103
DB 203
DB 165 ;-8.86266
DB 315
DB 215
DB 204
DB 251 ;-2.054667
DB 177
DB 203
DB 202
; ...
;*********************************************************
;$EXPCN CONTAINS THE COEFFICIENTS FOR POLYNOMIAL EVALUATION
;OF LOG BASE 2 OF X WHERE .5.LE.X.LE.1
;THE COEFFICIENTS ARE FROM HART #1302
;***********************************************************
$EXPCN: DB 7 ;DEGREE + 1
DB 174 ;.00020745577403-
DB 210
DB 131
DB 164
DB 340 ;.00127100574569-
DB 227
DB 046
DB 167
DB 304 ;.00965065093202+
DB 035
DB 036
DB 172
DB 136 ;.05549656508324+
DB 120
DB 143
DB 174
DB 032 ;.24022713817633-
DB 376
DB 165
DB 176
DB 030 ;.69314717213716+
DB 162
DB 061
DB 200
DB 000 ;1.0
DB 000
DB 000
DB 201
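The octal-to-hexadecimal correspondence can be checked mechanically. The following C sketch copies the first bytes of $LOGP and $EXPCN from the listing above as C octal literals (leading 0) and compares them, skipping the leading count byte of each table, with the hexadecimal keys of the assembly listing. The array and function names are ours.

```c
#include <assert.h>
#include <string.h>

/* Leading count byte plus the next 13 bytes of $LOGP, in octal as in MATH1.ASM. */
static const unsigned char LOGP[14] = {
    04, 0232, 0367, 0031, 0203, 0044, 0143, 0103,
    0203, 0165, 0315, 0215, 0204, 0251
};
/* Leading count byte plus the next 11 bytes of $EXPCN, in octal as in MATH1.ASM. */
static const unsigned char EXPCN[12] = {
    07, 0174, 0210, 0131, 0164, 0340, 0227, 0046,
    0167, 0304, 0035, 0036
};

/* Keys as given in hexadecimal in the disassembly of GWBASIC.EXE. */
static const unsigned char KEY1[13] = {
    0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43, 0x83, 0x75, 0xCD, 0x8D, 0x84, 0xA9
};
static const unsigned char KEY2[11] = {
    0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97, 0x26, 0x77, 0xC4, 0x1D, 0x1E
};

/* Returns 1 if $LOGP[1..13] equals Key1 and $EXPCN[1..11] equals Key2. */
int tables_match_keys(void)
{
    assert(sizeof LOGP == sizeof KEY1 + 1 && sizeof EXPCN == sizeof KEY2 + 1);
    return memcmp(LOGP + 1, KEY1, sizeof KEY1) == 0
        && memcmp(EXPCN + 1, KEY2, sizeof KEY2) == 0;
}
```

The offset of one byte corresponds to the "- 1" in the [bx + offset Key1 - 1] addressing of the disassembly: PDECOD indexes $LOGP and $EXPCN directly with counters that run from 13 (or 11) down to 1, so the count byte at index 0 is never used.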
Which versions of Microsoft BASIC used this cipher?
Searching files for the binary string of the two keys reveals possible users of this cipher. However, as these byte sequences are also used by the BASIC mathematical routines, a match is only a clue, not proof. By searching for the binary signature of the two keys in an incomplete collection of DOS versions and IBM PC ROM images, we found them in several versions of Microsoft BASIC:
- BASICA.EXE (size 54272 bytes, 13 May 1983 12:00:00, MD5 = 28E22CAA7EC534A78D37AA3314690758) from "The COMPAQ Personal Computer DOS, Version 1.11" Rev E.
- GWBASIC.EXE (size 59728 bytes, 05 June 1984 01:25:00, MD5 = 2FB3EB25944C27267626836435DE7369) "BASIC Interpreter - Version 1.12.03 - Copyright (C) 1984 Corona Data Systems, Inc" from MS-DOS 1.25.
- Floppy disk images of Compaq MS-DOS 1.10, 1.11, 1.12, 3.00, 3.31.
- Floppy disk images of MS-DOS 1.25, 2.11, 3.10, 3.30.
- and many others...
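A search of this kind can be sketched in C as follows, assuming the executable or disk image has already been read into memory. The helper names find_bytes and contains_both_keys are hypothetical. As noted above, a positive result only shows that the math tables are present, not that the cipher is used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char KEY1[13] = {
    0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43, 0x83, 0x75, 0xCD, 0x8D, 0x84, 0xA9
};
static const unsigned char KEY2[11] = {
    0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97, 0x26, 0x77, 0xC4, 0x1D, 0x1E
};

/* Offset of the first occurrence of needle in hay, or -1 if it is absent. */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len)
{
    size_t i;

    assert(needle_len > 0);
    for (i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}

/* 1 if both key byte strings occur somewhere in the image, else 0. */
int contains_both_keys(const unsigned char *image, size_t len)
{
    return find_bytes(image, len, KEY1, sizeof KEY1) >= 0
        && find_bytes(image, len, KEY2, sizeof KEY2) >= 0;
}
```

A plain byte-by-byte scan is enough here, since the images involved are at most a few hundred kilobytes.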
The signature of the deciphering procedure (not the keys) was found in the following binaries:
- GW-BASIC 3.22, (C) Copyright Microsoft 1983,1984,1985,1986,1987 (US version)
- GW-BASIC 3.22, (C) Copyright Microsoft 1983,1984,1985,1986,1987 (french version)
- GW-BASIC 3.20, (C) Copyright Microsoft 1983,1984,1985,1986,1987
- GW-BASIC Version 1.12.03, Copyright (C) 1984 Corona Data Systems, Inc.
- GW-BASIC 2.01, Rev. 1.02, Development Rev. 1.0, for Olivetti Personal Computer, Copyright (C) by Olivetti, 1984
- GW-BASIC 3.22, Rev. 3.29 for Olivetti Personal Computer, Copyright (C) by Olivetti, 1987
How to execute GW-BASIC files nowadays?
PC-BASIC is a free, cross-platform interpreter for GW-BASIC, Advanced BASIC (BASICA), PCjr Cartridge Basic and Tandy 1000 GWBASIC. It interprets these BASIC dialects with a high degree of accuracy, aiming for bug-for-bug compatibility. PC-BASIC emulates the most common video and audio hardware on which these BASICs used to run. PC-BASIC runs plain-text, tokenised and protected .BAS files. It implements floating-point arithmetic in the Microsoft Binary Format (MBF) and can therefore read and write binary data files created by GW-BASIC.
Other independent work on this topic
John Thomason released “Unprotect Basic Version 1.10” (UNPBASIC.COM, a 535-byte DOS program dated 21 December 1990) to decipher GW-BASIC files. This link was alive in 2018 but appears to be dead in 2020.
The protection flag PROFLG, defined in file GWDATA.ASM and checked by PROCHK in file GIODSK.ASM, governs source file protection. When PROFLG is non-zero, several BASIC direct statements are disabled to prevent access to the protected program: SAVE without the P option is prohibited; LIST and LLIST are prohibited; PEEK, POKE, BSAVE and BLOAD are disabled; and CHAIN cannot include the MERGE option. Norman L. De Forest experimentally recovered PROFLG offsets across several versions of GW-BASIC and devised a complicated exploit to reset this flag from within the GW-BASIC interpreter. However, deciphering the file using the native algorithm is easier and more general.
The American Cryptogram Association (ACA) publishes a bimonthly journal called The Cryptogram. In its Computer Supplement #19 of Summer 1994, Paul C. Kocher published BASCRACK, a C program to decipher GW-BASIC protected files:
C:
/* BASCRACK.C
GW-BASIC for MS-DOS appears to encrypt a program using a substitution cipher
with period 143. There is an 11-byte key and a 13-byte key that are used to
"protect" the program. Running this program will attempt to crack that protection.
*/
#include <stdio.h>
#include <stdlib.h> /* declares exit(), which the listing calls */
int main(int argc, char **argv) {
unsigned char key1[13]={
0xA9,0x84,0x8D,0xCD,0x75,0x83,0x43,0x63,0x24,0x83,0x19,0xF7,0x9A};
unsigned char key2[11]={
0x1E,0x1D,0xC4,0x77,0x26,0x97,0xE0,0x74,0x59,0x88,0x7C};
int nextbyte, index;
unsigned char c;
FILE *infile, *outfile;
if (argc != 3) {
printf("Utility to decrypt GWBASIC/BASICA files saved with \",p\"\n\n"
"Copyright 1992 by Paul C. Kocher. All rights reserved.\n\n"
"Usage: BASCRACK encrypted.bas outfile.bas\n");
exit(1);
}
if ((infile=fopen(argv[1],"rb"))==NULL || (outfile=fopen(argv[2],"wb"))==NULL) {
printf("Error opening file.\n");
exit(1);
}
if (fgetc(infile) == 0xFE) { fputc(0xFF, outfile); }
else { printf("Not an encrypted BASIC file\n");
exit(1);
}
index = 0;
nextbyte=fgetc(infile);
while (c=nextbyte, (nextbyte=fgetc(infile)) != EOF) {
c -= 11 - (index % 11);
c ^= key1[ index % 13 ];
c ^= key2[ index % 11 ];
c += 13 - (index % 13);
fputc(c, outfile);
index = (index+1) % (13*11);
}
fputc(c, outfile); /* Don't decrypt the EOF character */
return 0;
}
In the Computer Supplement #21 of Spring 1996, under "Basic Peeks, Pokes and Subroutines", Mike Todd published what he described as a way to unprotect a BASIC program that was saved with ",P":
First you must create a file to overlay the ,P setting. From the DOS prompt start up BASICA or BASIC and enter the BASIC command BSAVE "UN.P",1124,1. This will create a file on your default drive named UN.P.
Next LOAD your program that had been saved using ",P". If it was named MYPROG.BAS the BASIC command would be LOAD "MYPROG".
Now to use the UN.P file to overlay the protection setting use the command BLOAD "UN.P",1124.
You may now use the LIST, EDIT and SAVE commands as usual.
I did not check this trick under BASICA, but I checked it under several versions of GW-BASIC, and all of them refused to run the BLOAD statement, since it is inhibited by the protected save:
Interpreter | DOS Version | File | File Size | MD5 | Result
---|---|---|---|---|---
GW-BASIC version 3.22 | | | | | Failure
GW-BASIC 2.01 for Olivetti Computer (1983, 1984) | | | | | Failure
BASIC Interpreter - Version 1.12.03 - Copyright (C) 1984 Corona Data Systems, Inc | MS-DOS 1.25 | GWBASIC.EXE (05 June 1984 01:25:00) | 59728 | 2FB3EB25944C27267626836435DE7369 | Failure