Detecting a signedness bug in vintage compiled Pascal programs

The bug

We previously reported that MASM.EXE version 1.00, a program published in 1981 by Microsoft and IBM for MS-DOS, hangs or exits when run with certain precise amounts of free memory. We showed that this is due to a signedness error in the stack setup code emitted by the compiler. All other programs built by the same compiler are potentially affected by this bug.
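To make the notion of a signedness error concrete, here is a minimal C sketch of the general class of mistake. It is an illustration only and does not reproduce the startup code of MASM.EXE: the function names are hypothetical, and the idea that the faulty test compares an available size against a needed size is an assumption made for the example. On the 8086, this kind of error typically comes from using a signed conditional jump (JL/JGE) where an unsigned one (JB/JAE) is required.

```c
/* Illustration only: NOT the actual startup code of MASM.EXE.
   All names below are hypothetical. */

/* Value of a 16-bit word when read as a signed two's-complement number. */
long as_signed16(unsigned long word)
{
    word &= 0xFFFFUL;
    return (word >= 0x8000UL) ? (long)word - 0x10000L : (long)word;
}

/* Correct test: sizes are unsigned quantities. */
int enough_unsigned(unsigned long avail, unsigned long needed)
{
    return (avail & 0xFFFFUL) >= (needed & 0xFFFFUL);
}

/* Faulty test: the same two words compared as signed numbers.
   Any value of 0x8000 or more is seen as negative. */
int enough_signed(unsigned long avail, unsigned long needed)
{
    return as_signed16(avail) >= as_signed16(needed);
}
```

With avail = 0x9000 and needed = 0x1000, enough_unsigned() reports that there is enough room, while enough_signed() reports the opposite, because 0x9000 reads as a negative number. A large amount of memory is thus mistaken for too little.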

Which compiler generated IBM MASM.EXE 1.00?

Using its current signature database (file "IBM PC" dated 20 December 2014), Detect It Easy version 1.01 reports that IBM MASM.EXE version 1.00 was produced by the IBM Pascal Compiler, but misdetects the compiler version as 2.00 (1987).

IBM's programming languages for the PC were licensed from Microsoft.
However, the startup code emitted by the IBM Personal Computer Pascal Compiler, whether version 1.00 (published in 1981) or versions 2.00 and 2.02 (1984-1987), is not the same as the startup code of MASM.EXE 1.00.

According to Tim Paterson, the original author of MS-DOS, MASM 1.00 was written in Pascal by Microsoft programmer Marc McDonald:
MASM was written in Pascal by Marc McDonald (Microsoft employee #1, after Bill & Paul). Microsoft had been doing all their development on DEC computers, and the macro capability put in MASM was modeled after DEC assemblers. Because MASM was such a large macro assembler, one of the guys referred to it as "McDonald's big mac".

MASM.EXE 1.00 could have been cross-compiled on DEC computers, since the first IBM Personal Computer was only announced on August 12, 1981 and was still a prototype when Microsoft programmers were writing its software.

Interestingly, the IBM PC Pascal Compiler version 1.00 (PAS1.EXE and PAS2.EXE) has neither the same startup code as MASM.EXE nor the same bug (it has a different bug, but that is another story). Its startup code shows that the Pascal compiler could have been compiled by itself.

Hunting this bug in EXE programs

We provide below a program that detects whether an EXE file is affected by this bug and computes the amount of free DOS memory above which the bug is triggered. This utility was able to detect the buggy code sequence in the following files.

The IBM Personal Computer MACRO Assembler, Version 1.00 (1981)

All three EXE files in the distribution are affected:
  • MASM.EXE, date 12 July 1981, 67584 bytes, MD5=0C68BDE13BF46F813B41FC5B19ED56D8. Threshold: slightly more than 578448 bytes of contiguous free DOS memory.
  • ASM.EXE, date 12 July 1981, 52736 bytes, MD5=671562CB253977E074766843CA5C0024, SHA1=472F524F19EA953931C16AB214A5BC7F3AF9D7D1. Threshold: 566784 bytes.
  • CREF.EXE, date 10 May 1981, 13825 bytes, MD5=9D7C17FA6DF6698524B78BD793E19B18, SHA1=C70ABB8A702C4DE1F67C6A94BE7CE4AF3E4C5288. Threshold: 533344 bytes.
All three thresholds were computed by analysing the machine code and then confirmed experimentally.
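As a sanity check on these figures, offered as an observation rather than as the analysis used to obtain them, the thresholds can be converted to 16-byte DOS paragraphs. All three lie above 512 KiB, that is, above 0x7FFF paragraphs, the largest count that still reads as positive in a signed 16-bit word. The helper names below are hypothetical, and the exact quantity tested by the startup code is not assumed here.

```c
/* Sanity-check helpers; names are hypothetical. */

/* One DOS paragraph is 16 bytes. */
unsigned long bytes_to_paragraphs(unsigned long bytes)
{
    return bytes / 16UL;
}

/* True if the paragraph count has bit 15 set, i.e. it would read as
   negative if treated as a signed 16-bit word. */
int sign_bit_set(unsigned long paragraphs)
{
    return (paragraphs & 0x8000UL) != 0;
}
```

For the three files above, 578448 bytes is 36153 (0x8D39) paragraphs, 566784 bytes is 35424 (0x8A60) paragraphs and 533344 bytes is 33334 (0x8236) paragraphs. Each has bit 15 set, which is consistent with a signed comparison going wrong only on machines with a large amount of free memory.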

Other Microsoft languages

The Microsoft MACRO Assembler version 1.00, M.EXE (67840 bytes, MD5=82E78C6A076A3C581C5DF33BC7766DE8), has neither the same bug nor the same startup code as the version licensed to IBM. It seems to have been compiled with the IBM Pascal Compiler or another compiler using the same startup code.

Patching buggy programs

Every affected program could easily be patched by changing a single byte of code. We choose not to do this automatically, because we do not want patched files to become more widely available than the genuine historic files. Anyone who does patch a file should rename it so that it is obviously no longer the genuine EXE file, and should conscientiously keep the original file alongside the patched one, together with a text file describing the patch.

Code to detect this stack setup bug

This program was tested with MinGW / GCC 4.8.1 under 64-bit Windows and Borland Turbo C 1.0 under MS-DOS 6.22.
To be published soon...
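As a rough illustration only (this is not the detection program announced above), the sketch below shows a first step that any tool inspecting a DOS EXE file would need: reading the MZ header to find where the load image starts and how much memory the program asks for. The structure and function names are hypothetical. The header offsets follow the standard MZ layout, and the buggy code sequence itself is not searched for here.

```c
#include <stddef.h>

/* Hypothetical summary of the MZ header fields relevant to memory layout. */
struct mz_info {
    unsigned long header_bytes;  /* size of the EXE header in bytes         */
    unsigned long image_bytes;   /* size of the load image in bytes         */
    unsigned min_alloc;          /* extra paragraphs required               */
    unsigned max_alloc;          /* extra paragraphs requested              */
    unsigned init_ss;            /* initial SS, relative to the load segment */
    unsigned init_sp;            /* initial SP                              */
};

/* Read a little-endian 16-bit word from a byte buffer. */
static unsigned rd16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Parse the first 28 bytes of an EXE file held in buf.
   Returns 0 on success, -1 if the buffer is not a plausible MZ header. */
int mz_parse(const unsigned char *buf, size_t len, struct mz_info *out)
{
    unsigned last_page, pages;
    unsigned long file_bytes;

    if (len < 28 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    last_page = rd16(buf + 2);   /* bytes used in the last 512-byte page */
    pages = rd16(buf + 4);       /* number of 512-byte pages             */
    if (pages == 0 || last_page > 512)
        return -1;

    file_bytes = (unsigned long)pages * 512UL;
    if (last_page != 0)
        file_bytes -= 512UL - last_page;

    out->header_bytes = (unsigned long)rd16(buf + 8) * 16UL;
    if (file_bytes < out->header_bytes)
        return -1;

    out->image_bytes = file_bytes - out->header_bytes;
    out->min_alloc = rd16(buf + 10);
    out->max_alloc = rd16(buf + 12);
    out->init_ss = rd16(buf + 14);
    out->init_sp = rd16(buf + 16);
    return 0;
}
```

A caller would read the first 28 bytes of the file into a buffer and pass them to mz_parse(). The load image that follows the header is where a search for the startup code would then take place.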