Purpose

We describe the deciphering algorithm needed to unprotect source programs saved by GW-BASIC / BASICA.COM.

History

Microsoft BASIC interpreters were a common way of programming early IBM PC and compatible computers (1981 onwards). Several versions of Microsoft BASIC were used: the first IBM PC came with “Cassette BASIC” in ROM, while “Disk BASIC” (BASIC.COM) and “Advanced BASIC” (BASICA.COM) shipped on the DOS floppies. IBM-compatible computers ran the equivalent GW-BASIC bundled with MS-DOS. These interpreters were eventually replaced by the QBasic interpreter shipped with MS-DOS 5.0, or by the QuickBASIC compiler.

Source programs used by the BASICA and GW-BASIC interpreters could be saved in three different formats:

- SAVE "PROGRAM.BAS",A stores an ASCII version of the program, which can be read by or exported to other BASIC dialects.
- SAVE "PROGRAM.BAS" stores the program as binary tokens to save scarce memory and disk space. Although never officially published by Microsoft, the full format of these “tokenized” programs has been described by Norman De Forest and Dan Tobias, and detokenizers for Microsoft BASICs have been written to convert such programs back to ASCII.
- SAVE "PROGRAM.BAS",P was intended to protect the program by additionally enciphering this tokenized file. Such protected programs can no longer be listed or modified under GW-BASIC / BASICA.

Unprotecting enciphered BASIC source programs

To unprotect programs saved with the P option, we studied the code of GWBASIC.EXE (version 3.22, file size 80502 bytes, date/time 24 July 1987 00:00:02, MD5 = AB25516575579185CCA865D89E3E1A31). This particular file was shipped with the Wyse Technology OEM release of Microsoft MS-DOS 3.30.

The deciphering code embedded in GW-BASIC is equivalent to the following assembly language listing:

```asm
; Input:  DS:SI -> Data to be deciphered.
;         DS:DX -> After end of data.
; Output: data deciphered in place. STOSB writes to ES:DI, so ES must equal DS,
;         and Key1/Key2 are read through DS as well (as in a .COM program).
Decipher_GWBASIC proc near
        mov     cx, 0D0Bh                   ; CH = 13 (Key1 counter), CL = 11 (Key2 counter)
        mov     di, si                      ; write the result back over the input
        mov     bh, 0                       ; BX will hold an 8-bit table index
        cld
@@LoopDecipher:
        cmp     si, dx
        jz      short @@EndOfFile
        ; Decipher one byte...
        mov     bl, ch
        lodsb                               ; AL = enciphered byte
        sub     al, cl
        xor     al, [bx + offset Key1 - 1]  ; XOR with Key1[CH-1]
        mov     bl, cl
        xor     al, [bx + offset Key2 - 1]  ; XOR with Key2[CL-1]
        add     al, ch
        stosb                               ; store deciphered byte
        ; Next byte...
        dec     cl                          ; CL counts 11, 10, ..., 1
        jnz     short @@NotZ1
        mov     cl, 0Bh
@@NotZ1:
        dec     ch                          ; CH counts 13, 12, ..., 1
        jnz     @@LoopDecipher
        mov     ch, 0Dh
        jmp     @@LoopDecipher
@@EndOfFile:
        ret
Decipher_GWBASIC endp

Key1    db      9Ah, 0F7h, 19h, 83h, 24h, 63h, 43h, 83h, 75h, 0CDh, 8Dh, 84h, 0A9h
Key2    db      7Ch, 88h, 59h, 74h, 0E0h, 97h, 26h, 77h, 0C4h, 1Dh, 1Eh
```

This procedure is 45 bytes long, excluding the two key tables. Microsoft's actual code is about 50% longer, but performs exactly the same operation.

Deciphering has to begin with the second byte of the file. The very first byte has to be changed from FEh (the signature of a protected file) to FFh (the signature of an unprotected tokenized file). The last significant byte of the file is then set to 1Ah, although early versions of BASIC seem to leave some junk bytes as slack space at the end of the file.

The core of the deciphering applies XOR between the file and two keys embedded in GWBASIC.EXE; each byte is also offset by the two key counters (CL is subtracted before the XORs and CH is added after them). As these keys are 13 and 11 bytes long, and 13 and 11 are coprime, the combination is equivalent to a single key repeated every 13 × 11 = 143 bytes. Embedding the key in the program violates Kerckhoffs's principle, so this is really source-code obfuscation rather than a strong cipher.

Knowing this deciphering algorithm, it is easy to write a program to unprotect the files. This is exactly what “Unprotect Basic Version 1.10”, a 535-byte program by John Thomason (21 December 1990), does.
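The deciphering routine can also be expressed in a high-level language. The following is a minimal Python sketch of the same computation, assuming the Key1/Key2 tables exactly as listed; the names `decipher`, `unprotect`, `KEY1` and `KEY2` are hypothetical and are not taken from GW-BASIC or from John Thomason's tool. It mirrors the listing: CH and CL count down from 13 and 11 and index the key tables from their last entry, and the first byte is checked for FEh and rewritten to FFh. Handling of the trailing 1Ah byte and of any slack bytes is deliberately left out: every byte after the first is deciphered, as the listing would do with DX pointing at the end of the file.

```python
# Minimal sketch of the GW-BASIC protected-file deciphering, mirroring the
# Decipher_GWBASIC listing. Function and constant names are hypothetical;
# the key bytes are copied from the Key1/Key2 tables of the listing.

KEY1 = bytes([0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43,
              0x83, 0x75, 0xCD, 0x8D, 0x84, 0xA9])   # 13 bytes, indexed by CH
KEY2 = bytes([0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97,
              0x26, 0x77, 0xC4, 0x1D, 0x1E])         # 11 bytes, indexed by CL

PROTECTED_SIGNATURE = 0xFE   # first byte of a file saved with ,P
TOKENIZED_SIGNATURE = 0xFF   # first byte of a plain tokenized file


def decipher(body: bytes) -> bytes:
    """Decipher the bytes that follow the signature byte."""
    out = bytearray()
    ch, cl = 13, 11                   # mov cx, 0D0Bh
    for c in body:
        b = (c - cl) & 0xFF           # sub al, cl (8-bit wraparound)
        b ^= KEY1[ch - 1]             # xor al, [bx + offset Key1 - 1], BL = CH
        b ^= KEY2[cl - 1]             # xor al, [bx + offset Key2 - 1], BL = CL
        b = (b + ch) & 0xFF           # add al, ch (8-bit wraparound)
        out.append(b)
        cl -= 1                       # CL counts 11..1, then reloads 0Bh
        if cl == 0:
            cl = 11
        ch -= 1                       # CH counts 13..1, then reloads 0Dh
        if ch == 0:
            ch = 13
    return bytes(out)


def unprotect(data: bytes) -> bytes:
    """Turn a protected (FEh) file image into a tokenized (FFh) one."""
    if not data or data[0] != PROTECTED_SIGNATURE:
        raise ValueError("not a protected GW-BASIC file (first byte is not FEh)")
    # Every byte after the signature is deciphered; trailing 1Ah/slack
    # bytes are not treated specially in this sketch.
    return bytes([TOKENIZED_SIGNATURE]) + decipher(data[1:])
```

For example, reading a protected file with `open("PROGRAM.BAS", "rb").read()` and passing the bytes to `unprotect` should yield a tokenized image that can be written back to disk or fed to a detokenizer. Because CH and CL only return to 13 and 11 together every 143 bytes, deciphering a run of identical bytes produces an output pattern with exactly that period.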
Which versions of Microsoft BASIC used this cipher?

By searching for the binary signature of the two keys in an (incomplete) collection of DOS versions and IBM PC ROM images, we found them in several versions of Microsoft BASIC, suggesting that they all supported the same cipher:

- BASICA.EXE (size 54272 bytes, 13 May 1983 12:00:00, MD5 = 28E22CAA7EC534A78D37AA3314690758) from "The COMPAQ Personal Computer DOS, Version 1.11" Rev E.
- GWBASIC.EXE (size 59728 bytes, 05 June 1984 01:25:00, MD5 = 2FB3EB25944C27267626836435DE7369), "BASIC Interpreter - Version 1.12.03 - Copyright (C) 1984 Corona Data Systems, Inc", from MS-DOS 1.25.
- Floppy disk images of Compaq MS-DOS 1.10, 1.11, 1.12, 3.00, 3.31.
- Floppy disk images of MS-DOS 1.25, 2.11, 3.10, 3.30.
- And many others.

No version of BASIC.COM or BASICA.COM was found to contain the keys, but several IBM ROMs embed them: IBM BASIC 1.00 and 1.10 (1981), and the ROMs of IBM computers 5160, 6162, 5170 and PCjr. These ROM BASICs were not only run at power-up when no operating system disk was present, but were also called by BASIC.COM and BASICA.COM running from DOS.
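The key search described above can be approximated with a short Python scan. This sketch assumes the keys appear in a binary exactly as the Key1 and Key2 byte sequences of the deciphering listing (same byte order, stored contiguously); the helper names `find_keys` and `has_cipher` are hypothetical. An image that stores the tables differently, for example reordered or split, would not be detected by this exact-match search.

```python
from __future__ import annotations

# Key tables as listed for Decipher_GWBASIC (assumed to be stored verbatim
# in the binaries being searched).
KEY1 = bytes([0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43,
              0x83, 0x75, 0xCD, 0x8D, 0x84, 0xA9])
KEY2 = bytes([0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97,
              0x26, 0x77, 0xC4, 0x1D, 0x1E])


def find_keys(image: bytes) -> dict[str, list[int]]:
    """Return every offset at which each key table occurs in a binary image."""
    found: dict[str, list[int]] = {}
    for name, key in (("Key1", KEY1), ("Key2", KEY2)):
        offsets = []
        pos = image.find(key)
        while pos != -1:
            offsets.append(pos)
            pos = image.find(key, pos + 1)   # keep scanning past this match
        found[name] = offsets
    return found


def has_cipher(image: bytes) -> bool:
    """True when both key tables are present somewhere in the image."""
    found = find_keys(image)
    return bool(found["Key1"]) and bool(found["Key2"])
```

Passing the contents of an executable or ROM dump, read with `open(path, "rb").read()`, to `has_cipher` reproduces the kind of check described above. As with the survey itself, a match only suggests the same cipher: it shows that the tables are present, not that the code using them is identical.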