The P5 compiler


The P5 compiler has existed for a long time as an idea. P4, the last of the Zurich series P-system compilers, left off before the ISO 7185 standard in 1982. It was not only not standard Pascal compliant, it also was only a subset, abeit a substantial one, of full revised Pascal. As an example, or "model" implementation of Pascal, it would have made sense to update the compiler to ISO 7185 status, and that was basically done as "A Model Implementation of Standard Pascal" [Welsh&Hay] before 1986. In fact, the project was designed to support the ISO 7185 project.

However, there were a few reasons that a true P5, a straightforward update of the old p4, was a good idea. First, the source code of the "Model Implementation" is not generally available. Second, the "Model Implementation" is a complete scratch rewrite of the compiler, and shares virtually nothing in common with the original P4. This was important because several books, articles and online resources exist for the P4 compiler

What I wanted for p5 was a compiler that both accepted ISO 7185 standard Pascal, and was also written in it. The compiler is an extended version of P4 and uses the same intermediate codes where possible.

P5 now accepts the full ISO 7185 language, and also has been remade as a byte oriented machine, similar to what was done for both the UCSD compiler and the "Model Implementation". This is is the key to achieving a high efficiency implementation that runs with compact code.

P5 also runs the PAT or Pascal Acceptance test, and also self compiles.

P5 correctly runs the BSI, or British Standards Institute tests.


The meaning of P5


P5 is a very important milestone for Pascal. To understand why, it is a good idea to review why P4 was important. P4 was to accomplish:

To understand why P5 is important, you must understand that P4 didn't completely accomplish the above goals. First of all, P4 was a subset of the full languge. It was never designed to run the full language, only a minimal subset that could be ported to a new machine. The idea was to finish out the full language on the target machine.

Unfortunately, that meant there was not a concrete model of some of the more advanced (and hard to implement) features of full Pascal, for example interprocedural gotos, and procedure and function parameters. These and other features of Pascal left out of P4 were often left out of target compilers, and when they were implemented, they were implemented wrong.

The other issue is that P4 was designed to be a minimal bootstrap implementation. If you examine P4, you will see that it makes little use of strings, and keeps them short. This is because it is very inefficient when it comes to storing them in memory. They are stored one character to a word (60 bits on the CDC 6000). Pascal has packing, but that is not implemented in P4.

Finally, P4 is very much oriented to the CDC 6000 that it originated on. Everything is stored in 60 bit words, and there is a packing system designed to store two instructions per word.

The reason P4 had these limitations is that memory was very limited back in the 1970s, when P4 came about. Even on the CDC 6000. The authors of P4 worked hard to get P4 down in size and memory requirements so that it would self compile.

By the time of the ISO 7185 standard, many people understood that P4 was limited for its purposes. The "Model Implementation of Standard Pascal" [Welsh & Hay] was the answer, and it contained a compiler for the full ISO 7185 Pascal language. Further, it implemented the interpreter as a byte oriented machine (sometimes called a "bytecode machine"). Unfortunately, it got sucked up into the BSI, who have effectively killed it (there appear to be no internet copies of it, and the BSI has not been forthcoming concerning it). Another issue with the "Model Implementation" (with apologies to Jim Welsh and Atholl Hay), is that the MI is written in the "self documented" form (avocated by D. E. Knuth and others) where the entire documentation exists in the same file, intermixed with the code. This is a beautiful method to present code as a finished product, but it tends to be fairly difficult to work on an change. Finally, the MI was a complete break with P4, and had nothing in common with it. This meant that MI used completely new methods and documentation, whereas P4 was already documented in the common media and well understood.

P5 is both a break with the past and an embrace of it:


The PAT and PRT


The PAT or Pascal acceptance test is a series of tests in one file that go through each feature of ISO 7185 Pascal. If a ISO 7185 Pascal implementation can compile and run this correctly, then it is substantially compliant with ISO 7185 Pascal.

There are two types of tests, the PAT and the PRT, or Pascal Rejection Test. The PAT test should compile and run correctly, and is a "positive" indication that the implementation compiles standard structures and gives standard results. The PRT is the opposite. It is designed to either fail to compile or generate runtime errors or both. It is a "negative" test that makes sure that the implementation rejects non-standard structures.

The PAT only is represented here (for now).

Relationship to the BSI test suite

The BSI test suite [covered in Wichmann&Ciechanowicz] includes both positive and negative testing, and appeared in original version in the Pascal User's Group. After a great deal of trouble I was able to OCR a copy of that test, which was published free and clear of restrictions.

However, the both the test suite and bore copyright notices at one point, and both were given to the BSI (British Standards Institute) to keep and distribute. The BSI no longer distributes either, at any price, and whether they have kept it is also in doubt. In fact, recently I have been calling them about once a month to find out any information about the pair of programs. Both of them were created at universities outside of the BSI, and both were intended to be distributed, not locked in a vault to be eventually discarded.

I don't and won't distrubute the BSI test without permission, and I don't have access to the model compiler. Even with the BSI status of "openly published, but rights kept", I don't feel comfortable putting it up on this site. However, because it was in fact openly published, I don't feel that I, as an individual, am unable to run the tests, either.

Now, the reason that all of this matters is that with P5, we have effectively replaced the material imprisoned by the BSI. P5 upgrades P4, which never bore a copyright, was public domain, and was distributed openly. I put my own work into upgrading P4 into P5, but I donate that work back to the public domain. As P4 was, P5 is free of copyright and charges. Use as you see fit.

The PAT was created entirely by me and is original work. However, I also donate this to public domain. It was created back in the early 1990's, and used to validate both mine and other Pascal compilers. The PAT effectively replaces the positive testing side of the BSI. I also intend to create a negative test, the PRT, and also make that public domain.

Further, the PAT and PRT form a collection point for tests, including test that were made in reaction to the failures seen while running the BSI tests. In other words, if the BSI test found a failure, then an equivalent test was added to the PAT (not copied from the BSI!). This is a work in progress, so not all failure points have yet been addressed.

Thus, the PAT and PRT are designed to be full replacements for the BSI tests.

Format and working of the PAT

The PAT is designed to execute a small amount of code, then print the results. Each "test point" tests one feature of ISO 7185 Pascal,and is numbered according to type and sequence. Here is an example from the test:

   write('Control6: ');

   if true then write('yes') else write('no');

   writeln(' s/b yes');

This prints:

Control 6: yes s/b yes

So you see the number and type of the test, control structures number 6, the result, 'yes', and finally what the result should be.

The PAT is designed to be verified manually, that is, you read it and check that the printed results equal the "should be" collumn. The PAT can be easily automated for regression purposes by redirecting the output to a file, then comparing a saved "gold" version of the result file to the current file.


Self compile


P5 is capable compiling itself. This takes different steps for each of the sections, pcom and pint. The resulting intermediate files are listed in the files seciton below.

pcom

I was able to get pcom.pas to self compile. This means to compile and run pcom.pas, then execute it in the simulator, pint. Then it is fed its own source, and compiles itself into intermediate code. Then this is compared to the same intermediate code for pcom as output by the regular compiler. Its a good self check, and in fact found a few bugs.

The Windows batch file to control a self compile and check is:

cpcoms.bat

What does it mean to self compile? For pcom, not much. Since it does not execute itself (pint does that), it is simply operating on the interpreter, and happens to be compiling a copy of itself.

Changes required

Pcom won't directly compile itself the way it is written. The reason is that the "prr" file, the predefined file it uses to represent it's output file is only predefined to p5 itself. The rules for ISO 7185 are that each file that is defined in the header which is not predefined, such as "input" or "output", must be also declared in a var statement. This makes sense, because if it is not a predefined file, the compiler must know what type of file (or even non-file) that is being accessed externally to the program.

Because the requirements are different from a predefined special compiler file to a file that is simply external, the source code must be different for a P5 file vs. another compiler. A regular ISO 7185 Pascal compiler isn't going to have a predefined prr file.

The change is actually quite small, and marked in the source. You simply need to remove, or comment out, the following statements:

{ !!! remove this statement for self compile }    

prr: text;                      { output code file }

and

{ !!! remove this statement for self compile }

rewrite(prr); { open output file }

In the batch file above, this modified file is represented as pcomm.pas, or "modified" pcom.pas.

All of the source code changes from pcom.pas to pcomm.pas are automated in cpints.bat.

pint

Pint is more interesting to self compile, since it is running (being interpreted) on a copy of itself. Unlike the pcom self compile, pint can run a copy of itself running a copy of itself, etc., to any depth. Of course, each time the interpreter runs on itself, it slows down orders of magnitude, so it does not take many levels to make it virtually impossible to run to completion. Ran a copy of pint running on itself, then interpreting a copy of iso7185pat. The result of the iso7185pat is then compared to the "gold" standard file.

As with pcom, pint will not self compile without modification. It has the same issue with predefined header files. Also, pint cannot run on itself unless its storage requirements are reduced. For example, if the "store" array, the byte array that is used to contain the program, constants and variables, is 1 megabyte in length, the copy of pint that is hosted on pint must have a 1 megabyte store minus all of the overhead associated with pint itself.

The windows batch file required to self compile pint is:

cpints.bat

As a result, these are the changes required in pint:

{ !!! Need to use the small size memory to self compile, otherwise, by

  definition, pint cannot fit into its own memory. }

maxstr      = 2000000;  { maximum size of addressing for program/var }

{maxstr      = 200000;}  { maximum size of addressing for program/var }

and

{ !!! remove this next statement for self compile }

prd,prr     : text;(*prd for read only, prr for write only *)

and

{ !!! remove this next statement for self compile }

reset(prd);

and

{ !!! remove this next statement for self compile }

rewrite(prr);

All these changes were made in the file pintm.pas.

Pint also has to change the way it takes in input files. It cannot read the intermediate from the input file, because that is reserved for the program to be run. Instead, it reads the intermediate from the "prd" header file. The interpreted program can also use the same prd file. The solution is to "stack up" the intermediate files. The intermediate for pint itself appears first, followed by the file that is to run under that (iso7185pat). It works because the intermediate has a command that signals the end of the intermediate file, "q". The copy of pint that is reading the intermediate code for pint stops, then the interpreted copy of pint starts and reads in the other part of the file. This could, in fact, go to any depth.

All of the source code changes from pint.pas to pintm.pas are automated in cpints.bat.

Self compiled files and sizes

The resulting sizes of the self compiled files are:

pcomm.p5

Storage areas occupied

=====================================

Program           0-114657  ( 114658)

Stack/Heap   114658-1987994 (1873337)

Constants   1987995-2000000 (  12005)

 

pintm.p5

Storage areas occupied

=====================================

Program           0- 56194  (  56195)

Stack/Heap    56195-1993985 (1937791)

Constants   1993986-2000000  (  6014)

 


Files for the P5 system


 

Versions

 

 

Version

File

1.1

p5.zip

 

Contains all the files required for the P5 implementation.

 

The zip archives here are for significant releases. Each release undergoes a testing process and is certified by running the entire test suite against it.

 

Development tree

 

P5 is also available as an GIT project at:

 

https://sourceforge.net/projects/pascalp5/

This lets you access my development link for Pascal-P5. You can use this to:

And much more.

To get a current development copy of Pascal-P5, simply have GIT installed on your system, and execute the following in the directory where you wish to install P5:

git clone ssh://samiam95124@git.code.sf.net/p/pascalp5/code pascalp5-code

This will get the entire P5 file tree and place it into the target directory "p5".

A note about versioning

My common practice is to "bump" the version numer after any changes are made to a certified release. The idea is that the new version number will be the version of the next release to come. This of course can cause confusion. However, the rule, is: if it is not in the above release list, then it is a development version.

Getting started

 

For all versions see the readme.txt file in the root directory.

 


Compiling and using P5


The P5 compiler/assembler is much easier than P4 in one respect. There are no limitations to remember verses ISO 7185 Pascal. If it is legal Standard Pascal, it will compile and run.

To run P5, use the following format:

pcom intermediate.int < source.pas

pint intermediate.int program.out

All files must be specified.

This is what the batch file p5.bat given above does.


What is.. and is not in P5


While upgrading P4 to P5, I specifically tried to avoid any temptation to "improve" the code, such as add functions or features, or reformat the code to be more presentable, etc. There is a time and place for that. I simply wanted P5 to be a full language compiler for Pascal, instead of a subset compiler. The one exception I allowed for is the addition of a routine that dumps all of the error codes that were used in the source compile along with their text equivalents. I have found this to be a great improvement on trying to search the various documents for what error code means what.

Of course, virtually all implementations improve on the original Pascal, including the original CDC 6000 compiler. The extensions consist of a combination of features best left defined to a particular implementation, and also usablity extentions to Pascal in general.

There's a lot more that can be done with P5. However, I have left that for the P6 project. P6 is the next step for the P-series, and includes a series of extentions to the base ISO 7185 language.


For more information contact: Scott A. Moore samiam@moorecad.com