December 16, 1999

Thoughtful Programming and Forth: Ch 1

by Jeff Fox

Chapter 1

Most of the Forth community has had little exposure to the evolution of Chuck Moore’s Forth over the last fifteen years and is now deeply entrenched in habits formed twenty years ago. Chuck has lamented that no one has published a book teaching people how to do Forth well. Chuck has seen how other people use Forth and is generally not impressed. On this page I will discuss aspects of the Forth language as I currently see them and lightly cover the subject of good Forth programming.

A Definition for Good in the Context of Forth Programming

What is good Forth? What makes one Forth program better than another? Well, of course, it depends on context. The first thing in that context, to me, is the computer. Real programs run on real computers. By that I mean real programs are implementations, not specifications. You can specify the design of a program in a more or less portable form, or you can specify the details of an actual implementation of that program more explicitly. In either case I am talking about two aspects of the program: the source and the object. I will discuss what I mean by good source and good object code.

Good object code is pretty straightforward. It is efficient in terms of system resources; it does not consume them excessively. The particular resources for a given system and a given program will constitute a different balance of things like memory use, speed (time use), register use, cache use, I/O device use, etc. On many architectures there is a tradeoff between code size and speed. Up to the point where the cache overflows, longer sequences of unfactored instructions execute faster, so many compilers inline code. Once the cache overflows, things can slow down by an order of magnitude, and if the program expands into virtual memory paged from disk, things will slow down by orders of magnitude.

A little smaller, a little bigger, no big deal. A little faster, a little slower, no big deal. But when the ratios become quite large you really need to pay attention to the use of resources. Because the efficiencies of the many layers in a system multiply together, if a system has ten layers that each introduce a little more fat, the final code may see only a small fraction of the total CPU power available. For example, ten layers that each run at a hypothetical 80% efficiency leave only about 11% of the machine (0.8^10 ≈ 0.107).

Programmers need to remember that on most modern machines the CPU is much faster than cache memory, cache memory is much faster than on-page DRAM access, and off-page DRAM access is much slower than on-page access. Regardless of other factors, the way a program organizes data in memory and the order in which it accesses that data can easily affect program speed by more than an order of magnitude. What is marketed as a 100 MHz PC can easily be slowed to the effective speed of a 10 MHz machine by slow memory access, depending on the program. Its effective speed can drop to almost nothing when the software unpredictably goes away for 20 seconds at a time to do some system garbage collection or other housekeeping. From the user’s point of view, for those 20 seconds the machine has 0 user MIPS. Programs slow significantly when the program or dataset is so large, and access to it so random, that the worst-case memory access time is paid again and again. This, and much worse, is what happens as programs grow and spill out of cache and out of available memory. To avoid this, keep things small.
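The claim that access order alone can change speed dramatically can be illustrated with a small sketch. It is written in C rather than Forth only so it can be compiled anywhere, and it is not from the original text: the grid size and the function names `fill_grid`, `sum_row_major`, and `sum_col_major` are hypothetical. Both functions add up exactly the same data; the only difference is whether consecutive reads are neighbors in memory or are a whole row apart. How much slower the column-order walk is depends on the machine's cache and DRAM, and an aggressive optimizing compiler may interchange the loops and hide the effect, so treat this as an illustration rather than a benchmark.

```c
#include <stddef.h>

/* Hypothetical size for illustration: 1024 x 1024 ints is 4 MB with
   4-byte ints, larger than many caches. */
#define ROWS 1024
#define COLS 1024

static int grid[ROWS][COLS];

/* Fill the grid with a simple repeating pattern. */
void fill_grid(void) {
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)((r + c) % 7);
}

/* Row-order walk: consecutive iterations touch adjacent addresses,
   so every cache line fetched is fully used before moving on. */
long long sum_row_major(void) {
    long long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-order walk over the same data: consecutive iterations are
   COLS * sizeof(int) bytes apart, so most reads land on a different
   cache line and often a different DRAM page. */
long long sum_col_major(void) {
    long long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Timing each sum (for example with `clock()` from `<time.h>`) after calling `fill_grid()` is one way to see the difference on a given machine; the results are identical, only the memory traffic differs.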

In some cases, such as scripting languages, fat is not an issue in terms of code efficiency. It remains an issue for programmer efficiency, however, because fat code is a source of bugs just as lean code is, only more so. Excessively fat programs can easily be excessively buggy and unstable because the bugs will be hard to find in all that fat. Also, even gross inefficiency at runtime may matter less than the time spent writing the program. There are many one-off applications, such as trivial scripts that only run once in a while, where big and slow is not an issue. But for system software it is very important that object code not be too inefficient, because other things are built on top of it.

Of course, some people would say: who cares? Just buy a faster, more expensive computer to make up the difference. Sometimes that makes sense. But for those of us who have worked inside BIOSes and system software and seen how bad it can get, it seems a shame to see people forced to waste 90% of their investment in hardware or software because it means someone gets to charge more money. In this sense inefficiency fuels planned obsolescence and forces people down the expensive upgrade path. It’s good for you if you own Intel or Microsoft, but otherwise it is a concern, one that has spawned the growth of free software like Linux.

Good source code is a bit more difficult to define. It should be clear, easy to read and write, and easy to debug. Again, a little smaller, a little bigger, no big deal. But computer languages differ from one another more than human languages do. When people see a language that is considerably more terse or more verbose than the computer language they are used to, their immediate reaction is usually “I can’t read that.” It’s too little, or it’s too much. To compound this variation in point of view, the visual layout of the source is a big issue. Code layout directs the reader’s attention, so it is also a big factor in how readable the code will be. If the comments are in a language that you don’t read, they don’t help. If they are in a font that is too small to see, they don’t help. If they are printed in a color that you can’t see, they don’t help. Fortunately some vision problems are correctable, but these are still real issues.

For some people the code layout must be pretty. This may be more important to some people than the code’s contents. I can’t relate to that myself. To me the layout is simply there to direct the attention of the reader. You are not trying to give them an esthetically pleasing experience so that they sigh when they look at the page and don’t bother to read the contents. If you follow code layout rules, they should be there only to make the code clearer.

Chuck has switched to color in his latest Forth as a replacement for some of the syntax and words that he had not already eliminated. The words : ; [ ] LITERAL DECIMAL HEX \ ( ) are among those that Chuck has replaced with color-change tokens. What I find most interesting about this is that, when reading the code, one part of your brain sees how the code is organized into words and what the compiler and interpreter are going to do with it, while a different part decodes the meaning of the words. It seems to free the part of the brain reading words to focus on them more clearly, because there are fewer distractions. Mostly Chuck has replaced some layout information and some Forth words with color. Besides making the Forth small and fast, as Chuck puts it, it also makes it colorful. My own experience reading his colorForth is that the code is easier to read than conventional Forth. But until I have written programs in it myself, I am not ready to make a final judgment about that.

As I have said, prettiness is more important to some people, and beauty is in the eye of the beholder. Some people think a system described clearly on a couple of pages is beautiful in itself, just as a concise equation in physics is. To another person a listing that looks like a telephone directory is beautiful. People will never agree about what looks best. Chuck has limited detail resolution in his vision and complains that he can’t see small fonts on the screen. He uses large fonts so he can see them, and as a consequence he has only short lines and small definitions. Other people use screens with 256 characters on a line and write some very long Forth definitions. Chuck complains that he can’t see those small characters and that the code should be factored into smaller pieces. (When such code is printed in a larger font, Chuck has complained that he still can’t read it, because it often begins with many words loaded from the author’s own libraries. The author needs them to write anything, but they can only be described as extensions to Forth; if you know all of that person’s extensions, you might be able to read the code.) This same author complains that he is color blind, so colorForth doesn’t work for him, and that even if he were not, its lack of layout and spelling rules would make it unreadable to him. Of course, in colorForth color takes the place of layout and of some words. Chuck feels color is a good substitute for them; other people disagree or haven’t tried it.

As I say, the layout issue is very personal. One person may have a couple of rules for layout, while someone else may have about as many rules for spelling and code layout as another person needs to define an entire Forth system. My stance is that this is a matter of taste: I have my own personal style, and I can read code at either extreme. The code with pages of layout and spelling rules looks nice, and if you cross-reference all the words that came from the author’s private libraries, the meaning is clear. I find Chuck’s colorForth very easy to read too. I think it is easier for me to read, but part of the reason is the same one that makes a 25K source easier to read than a 25M source.

Size becomes a significant factor when it comes to being clear, easy to read, easy to write and maintain, etc. When the size ratios become quite large, very small programs can be read quickly but may include subtleties that are not obvious on the surface. They may need to be read more than once, or they may require more documentation than the code itself to be clear. If code is too dense, it will look like nothing but meaningless cryptic symbols unless it is studied in great detail. If code is too verbose, it may appear perfectly clear line by line yet be impossible to take in as a whole because of its size. Yes, I can read source code, but no, I can’t read 25 megabytes of source and keep a clear picture of it all in my mind.

So my definition of good source code is: something that effectively conveys meaning to the programmer. I would say text, but it could also include graphics (as in visual programming), color, font styles, and so on.

Object Code
n. The code produced by a compiler from the source code, usually in the form of machine language that a computer can execute directly, or sometimes in assembly language.
Source Code
n. The original form of a computer program before it is converted into machine-readable code.
Truth Bleeds Red 2018