Back to the lab, gentle[wo]men, the computer architecture series we started many moons ago continues! Last time we went over Assembly programming in an effort to better understand how (relatively) normal humans like us can actually talk to computers. But to be frank, we can do much better, so the chef’s special today is compilers.
Throughout these posts, I’ll be using one central analogy in my attempt to bring everything together: the main assembly line in a car manufacturing plant, which is analogous to the processor (or CPU) that lies within a larger computer. By extension, we’ll be referring to ourselves as the car designers in charge of writing up the instruction manual for the plant to follow, which is analogous to the programmers in charge of writing code for the processor to execute.
Do Recall
If we discussed Assembly last week through rose-coloured glasses, it’s only because it was so much better than the old alternative of using a keyboard that looked like this:
Allowing the user to type what vaguely resembled words rather than arbitrary codes (Assembly Language vs. Machine Code, as shown below) was a huge jump in usability and productivity.
As we explored last time, this is analogous to the car designer being able to write his instructions in his native language (German). He’s able to work more comfortably, and then have a team of translators convert them into the correct language for the manufacturing plant (Chinese). Sure, it takes time and money to do this, but because the designer can now work much faster and much more comfortably, the end result is a better, nicer and more advanced automobile.
But We Can Do Better
Okay, so Assemblers are certainly a step in the right direction: the instructions are much more legible, and everyone’s happy that better cars are being produced as a result. But once the novelty wears off and you start getting used to your new tool, it’s clear that some annoyances still remain…
Manual space management
For one, as we’re writing the instruction manual, we might start to notice some patterns in the way the more common instructions are written. Take for example the process of attaching the hood, the trunk lid, or the doors to the car. If you recall part 3 of this series, we’re constantly writing instructions like this:
LOAD "hood" in shelf bin #101
LOAD "attachment bolts" in shelf bin #102
LOAD "attachment nuts" in shelf bin #103
ADD #101, #102, #103 to "car"
STORE result "car" for later
In other words, we’re manually controlling the plant’s storage space (as explained in part 1) by telling the workers exactly when to bring a part from the back-store to the shelf because they’ll need it, or when to put it back because we’re done with it.
This is, quite frankly, ridiculous when you think about it. That means we need to know the exact specifications of the auto plant we’re writing for, so we don’t accidentally make the workers run out of space because the shelf is smaller than we thought. That also means we’re wasting time writing very specific instructions when in reality they’re pretty obvious; we could just have someone else convert them for us, like we did with the translators last time.
So that’s exactly what we’re going to do. We’re going to hire a new team of people and call them “converters,” and they’re going to transform the written words (in German) of the designer into the more specific intermediate instructions (still in German but more verbose) which get handed off to the translators before being shipped to the plant (in Chinese).
This will allow the car designer to write at a higher level, worry less about the details of space management, and ultimately be more productive. For example, the instructions given above don’t need to be written as five individual instructions (LOAD, LOAD, LOAD, ADD, STORE); it’s just as understandable if we write something like this:
AttachToCar(Hood, using Bolts, Nuts)
All of the necessary information is contained within that one single instruction, and anything else like storing it afterwards is strongly implied (you weren’t gonna just throw out all that hard work, were you?). Even details like the shelf bin numbers don’t need to be explicitly written, because they can be figured out. This basically leaves it up to the team of converters to use the shelves as they see fit, because fundamentally it doesn’t matter which bin on the shelf is used, so long as the car actually gets built correctly. And since there’s only one set of bolts and nuts that fits the hood anyways, it’s perfectly feasible for them to work their magic by just choosing whatever makes sense. Leaving the specific details to them means we can focus on the bigger picture.
External instructions
Another problem with the old instructions we were writing is that they sometimes get incredibly repetitive. Let’s say that you as the designer are putting the same exact engine in four different models of car. Does it really make sense to be copy-pasting identical engine assembly instructions into each of the instruction manuals? Firstly, that’s an awful lot of matching text to be repeating across four different manuals, and secondly, if you make a mistake, you’ll have to fix it in four different places!
So, why not save a few trees by writing an external instruction manual for this particular engine? That way, you can just refer to it directly in each car’s manual. You as the car designer can then nonchalantly write “now just build engine #001 lol” in the middle of a car’s instruction manual. When you’re handing it off to the converters, just include a copy of the manual for building engine #001, and they’ll take care of bringing it all together into one complete set of instructions to be sent to the plant.
Instruction Analysis
Now that this conversion process is in place (thanks to the problems above) and we have a team of people carefully reading over the code, we might as well maximize our use of them. It’s really easy to make mistakes when writing instruction manuals, so they might as well be proofreading for us.
The mistakes we’re most likely to make as designers come from the category of things that “make no goddamn sense.” For example, we might write in step #12 to attach the engine, when in reality it hasn’t been built yet because that’s actually in step #15. With low-level instructions, we might assume the engine is already in shelf bin #101 and accidentally write an instruction to attach the engine like this:
ADD #101 to "car"
So, because we’re doing manual space management and accessing bin numbers directly, a worker would end up reaching into shelf bin #101, pulling out a tire, and having no idea what to do next as he wonders what kind of idiot wrote these instructions.
Now there are two advantages to having the conversion team around in this case. First, writing in a higher level language means that instead of producing the instruction above (which is unclear because it doesn’t tell us what’s actually in bin #101), we’d be writing something like this instead:
Install(Engine, using DuctTape, ChewingGum)
So, because the word engine is explicitly listed, it’s much easier to notice the mistake at a glance and prevent it from reaching the converters. But let’s say that we’re having a rough day and we don’t notice before sending it off… Were we to hand off such faulty instructions, the converters could easily proofread them and notice the mistake themselves. Even if they didn’t see it right away, because it’s now their job to figure out the best way of using the shelf space, sooner or later they’d notice that the bin was never filled; the mistake would surface and a catastrophe would be avoided.
Assemblers on Steroids
The team of converters, as we’ve outlined them above, is directly analogous to modern compilers. They essentially sit above the Assembler in the stack, as shown below.
They eliminate even more tedium from the lives of programmers so they can focus on getting things done. More specifically, let’s focus on the three big ones we covered above in Analogy-land.
Register Allocation
First, where working in Assembly requires the programmer to manually manage individual registers, compilers were built specifically to take care of this themselves with a technique called register allocation. This way, the programmer doesn’t have to worry about loading values from memory into registers or storing them back; all of that complexity is hidden away so that he (or she, come on now) can treat everything as being readily accessible. This leads to code that is much simpler, much shorter, and much easier to read and write.
Also, because this hides the specific implementation of the current computer from the programmer, it allows code to become portable across many machines. Granted, this means that you’ll need a new compiler for every different instruction set (part 3) you wish to target, but it nonetheless allows programmers to save a lot of time by not having to worry about what specific type of processor the code is going to run on.
Linkers
Second, our new best friend the compiler brings along his buddy called the linker to help in the fight against programmer tedium. While not strictly a part of the compiler, the linker takes the compiler’s low-level code output and joins it with whatever other libraries have been referenced to form a single executable file. So, just like the car designer can refer to an external engine manual in his instructions provided he attaches it, the programmer can refer to an external library so long as it’s stored somewhere nearby on the computer. The linker will grab whatever it needs from the library and smash it all together into a single executable for ease of use. This allows code to be written much more succinctly, as it eliminates the need for repetition across programs.
Code Analysis
Third, because compilers inherently need to “read” your code, another helpful feature they bring along is the ability to perform code analysis for the programmer. The easy first one is error-checking: anything that doesn’t make sense will give the compiler trouble when trying to produce machine code, so it’ll stop the compilation and report an error to the programmer so he can fix things up and have better luck next time.
But since the compiler’s getting intimate with your code anyway, it might as well perform some more advanced analysis. The next big one is optimization: the compiler can automatically remove dead code, or eliminate instructions that recalculate the same exact value over and over, for example. Furthermore, given that it’s aware of the target machine’s technical details, it can also perform platform-specific enhancements that go beyond the obvious ones outlined earlier. Granted, this means that a lot of work must go into ensuring the compiler’s correctness, but once it’s in place, free* performance improvements are nothing to sneeze at.
…And More
Finally, note that a compiler does a lot more than just the three things outlined above; you can read more about it on good old Wikipedia, or in this informative slide deck. But those three are pretty big ones, and they should give you some appreciation of the work that has gone into designing compilers over the years. Also, this post has grown into an overly long abomination, so here, let’s keep you entertained.
cat_computer.jpg
Up, Up and Away
At last, we’ve gotten to a point where written code actually resembles English, thanks to compilers, not to mention the many other improvements they bring to the table. Apart from newfangled visual programming shenanigans, compilers are effectively the top of the food chain when it comes to programming tools (note that interpreters are also very similar for the purposes of our discussion).
So, this ends our little side-quest exploring the means by which programmers tell computers what to do; it was drifting further and further from the core topic of computer architecture anyways. Next time, we’ll be back on the more traditional track, covering branch prediction.