Deciphering GW-BASIC / BASICA protected programs

Discussion in 'Papers' started by Christophe Lenclud, Apr 3, 2018.


    Purpose

    We describe the deciphering algorithm needed to unprotect source programs stored by GW-BASIC / BASICA.COM.

    History

    Microsoft BASIC interpreters were a common way of programming the early IBM PC and compatible computers (1981 onwards). Several versions of Microsoft BASIC were used: the first IBM PC came with “Cassette BASIC” in ROM, while “Disk BASIC” (BASIC.COM) and “Advanced BASIC” (BASICA.COM) shipped on DOS floppies. IBM-compatible computers ran the equivalent GW-BASIC, bundled with MS-DOS. These interpreters were eventually replaced by the QBasic interpreter in MS-DOS 5.0, or by the QuickBASIC compiler.

    Source programs used by the BASICA and GW-BASIC interpreters could be saved in three different formats:
    • The command SAVE “PROGRAM.BAS”, A stores an ASCII version of the program, which can be read by or exported to other BASIC dialects.
    • The command SAVE “PROGRAM.BAS” stores the program as binary tokens to spare scarce memory and disk space. Although not officially published by Microsoft, the format of these “tokenized” programs has been comprehensively described by Norman De Forest and Dan Tobias. Detokenizers for Microsoft BASICs have been written to convert these programs back to ASCII.
    • The command SAVE “PROGRAM.BAS”, P was aimed at protecting the program by additionally enciphering this tokenized file. Such protected programs can no longer be listed or modified under GW-BASIC / BASICA.

    Unprotecting enciphered BASIC source programs

    To unprotect programs saved with the P option, we studied the code of GWBASIC.EXE (version 3.22, file size 80502 bytes, date/time 24 July 1987 00:00:02, MD5 = AB25516575579185CCA865D89E3E1A31). This particular file was shipped with Microsoft MS-DOS 3.30 for Wyse Technology OEM. The deciphering code embedded in GW-BASIC is equivalent to the following assembly language listing:
    Code (ASM):
    ; Input: DS:SI -> Data to be deciphered.
    ;        DS:DX -> After end of data.
    Decipher_GWBASIC proc near
        mov    cx, 0D0Bh        ; CH = 13 (Key1 index), CL = 11 (Key2 index)
        mov    di,si            ; Decipher in place (STOSB writes ES:DI, so ES = DS is assumed)
        mov    bh,0             ; BX = BL, used as an index into the keys
        cld
    @@LoopDecipher:
        cmp    si,dx
        jz    short @@EndOfFile
        ; Decipher one byte...
        mov    bl,ch
        lodsb
        sub    al,cl
        xor    al, [bx + offset Key1 - 1]
        mov    bl,cl
        xor    al, [bx + offset Key2 - 1]
        add    al,ch
        stosb
        ; Next byte...
        dec    cl               ; CL cycles 11, 10, ..., 1
        jnz    short @@NotZ1
        mov    cl,0Bh
    @@NotZ1:
        dec    ch               ; CH cycles 13, 12, ..., 1
        jnz    @@LoopDecipher
        mov    ch,0Dh
        jmp    @@LoopDecipher
    @@EndOfFile:
        ret
    Decipher_GWBASIC endp

    Key1    db    9Ah, 0F7h, 19h, 83h,  24h, 63h, 43h, 83h,  75h, 0CDh, 8Dh, 84h, 0A9h
    Key2    db    7Ch,  88h, 59h, 74h, 0E0h, 97h, 26h, 77h, 0C4h,  1Dh, 1Eh
    This procedure is 45 bytes long. The actual code from Microsoft is 50% longer, but performs exactly the same operation. The deciphering computation has to begin with the second byte of the file. The very first byte is instead changed from FEh (signature of a protected file) to FFh (signature of an unprotected tokenized file). The last significant byte of the file will be set to 1Ah, but early versions of BASIC seem to leave some junk bytes as slack space at the end of the file.
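
    As an illustration, the listing above can be transcribed into Python roughly as follows. This is a minimal sketch, not the code from GW-BASIC itself: the names `decipher_gwbasic` and `unprotect` are hypothetical, and the sketch assumes that every byte after the signature is enciphered, leaving the handling of the trailing 1Ah and of any slack bytes to the caller.

```python
# Keys in the same byte order as Key1 / Key2 in the assembly listing.
KEY1 = bytes([0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43,
              0x83, 0x75, 0xCD, 0x8D, 0x84, 0xA9])
KEY2 = bytes([0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97,
              0x26, 0x77, 0xC4, 0x1D, 0x1E])


def decipher_gwbasic(data: bytes) -> bytes:
    """Apply the per-byte transformation of Decipher_GWBASIC to `data`."""
    out = bytearray()
    ch, cl = 13, 11                  # mov cx, 0D0Bh
    for b in data:
        b = (b - cl) & 0xFF          # sub al, cl
        b ^= KEY1[ch - 1]            # xor al, [bx + offset Key1 - 1], BL = CH
        b ^= KEY2[cl - 1]            # xor al, [bx + offset Key2 - 1], BL = CL
        b = (b + ch) & 0xFF          # add al, ch
        out.append(b)
        cl = (cl - 1) or 11          # dec cl, reload 0Bh when it reaches zero
        ch = (ch - 1) or 13          # dec ch, reload 0Dh when it reaches zero
    return bytes(out)


def unprotect(protected: bytes) -> bytes:
    """Hypothetical helper: swap the FEh signature for FFh and decipher the rest.

    Assumption: everything after the first byte is passed through the cipher;
    the caller decides what to do with the final 1Ah and any slack bytes.
    """
    if not protected or protected[0] != 0xFE:
        raise ValueError("first byte is not FEh: not a protected file")
    return b"\xFF" + decipher_gwbasic(protected[1:])
```

    The two counters are kept as separate variables, as in the assembly, rather than derived from the byte offset, so that each line can be checked against the corresponding instruction.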

    The main deciphering computation applies XOR between the file and two keys embedded in GWBASIC.EXE. As these keys are 13 and 11 bytes long, and 13 and 11 are coprime, this is equivalent to a key repeated every 13 * 11 = 143 bytes.
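
    The 143-byte period can be checked with a short Python sketch. The function `step_parameters` is a hypothetical name introduced here; it tabulates, for each byte offset, the value subtracted, the combined XOR byte and the value added, as read from the assembly listing.

```python
from math import gcd

# Keys in the same byte order as Key1 / Key2 in the assembly listing.
KEY1 = bytes([0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43,
              0x83, 0x75, 0xCD, 0x8D, 0x84, 0xA9])
KEY2 = bytes([0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97,
              0x26, 0x77, 0xC4, 0x1D, 0x1E])


def step_parameters(i):
    """Return (subtracted value, XOR byte, added value) for byte offset i."""
    cl = 11 - i % 11                 # Key2 counter: 11, 10, ..., 1, 11, ...
    ch = 13 - i % 13                 # Key1 counter: 13, 12, ..., 1, 13, ...
    return cl, KEY1[ch - 1] ^ KEY2[cl - 1], ch


# Least common multiple of the two key lengths: 13 * 11 / gcd(13, 11) = 143.
PERIOD = len(KEY1) * len(KEY2) // gcd(len(KEY1), len(KEY2))

# One full period of the combined transformation.
table = [step_parameters(i) for i in range(PERIOD)]
```

    Because the subtracted and added values cycle along with the two XOR keys, the whole per-byte transformation, and not only the XOR part, repeats after `PERIOD` bytes.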

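    Because the keys are fixed and embedded in the interpreter itself, the protection relies entirely on the algorithm staying secret, which runs against Kerckhoffs's principle. In practice this is source code obfuscation rather than a strong cipher.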

    Knowing this deciphering algorithm, it is easy to write a program to unprotect the files. This is exactly what “Unprotect Basic Version 1.10” does, a 535-byte program by John Thomason (21 December 1990).

    Which versions of Microsoft BASIC used this cipher?

    By searching for the binary signature of the two keys in an incomplete collection of DOS versions and IBM PC ROM images, we found them in several versions of Microsoft BASIC, suggesting that they all supported the same cipher:
    • BASICA.EXE (size 54272 bytes, 13 May 1983 12:00:00, MD5 = 28E22CAA7EC534A78D37AA3314690758) from "The COMPAQ Personal Computer DOS, Version 1.11" Rev E.
    • GWBASIC.EXE (size 59728 bytes, 05 June 1984 01:25:00, MD5 = 2FB3EB25944C27267626836435DE7369) "BASIC Interpreter - Version 1.12.03 - Copyright (C) 1984 Corona Data Systems, Inc" from MS-DOS 1.25.
    • Floppy disk images of Compaq MS-DOS 1.10, 1.11, 1.12, 3.00, 3.31.
    • Floppy disk images of MS-DOS 1.25, 2.11, 3.10, 3.30.
    • and many others...
    No version of BASIC.COM or BASICA.COM was found to contain the keys, but several ROMs from IBM computers embedded them: IBM BASIC 1.00 and 1.10 (1981), IBM computers 5160, 6162, 5170, PCjr. These ROM BASICs were not only run at power-up if no operating system disk was used, but were also called by BASIC.COM and BASICA.COM running from DOS.
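
    A search of this kind can be sketched in a few lines of Python. The names `find_keys` and `contains_cipher_keys` are hypothetical, and the sketch assumes that each key is stored contiguously in the same byte order as in the listing; it does not require the two keys to be adjacent.

```python
# Keys in the same byte order as Key1 / Key2 in the assembly listing.
KEY1 = bytes([0x9A, 0xF7, 0x19, 0x83, 0x24, 0x63, 0x43,
              0x83, 0x75, 0xCD, 0x8D, 0x84, 0xA9])
KEY2 = bytes([0x7C, 0x88, 0x59, 0x74, 0xE0, 0x97,
              0x26, 0x77, 0xC4, 0x1D, 0x1E])


def find_keys(blob: bytes) -> dict:
    """Return the offset of the first occurrence of each key, or -1 if absent."""
    return {"Key1": blob.find(KEY1), "Key2": blob.find(KEY2)}


def contains_cipher_keys(blob: bytes) -> bool:
    """True when both keys appear somewhere in `blob`."""
    return all(offset >= 0 for offset in find_keys(blob).values())
```

    Reading an executable, ROM dump or disk image in binary mode and passing its contents to `find_keys` gives the offsets at which the keys appear. A compressed or otherwise transformed image would hide the keys from this simple search.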
     
    Last edited: Apr 3, 2018
