The Limbo Programming Language

Dennis M. Ritchie

(Revised 2005 by Vita Nuova)

Limbo is a programming language intended for applications running distributed systems on small computers. It supports modular programming, strong type checking at compile- and run-time, interprocess communication over typed channels, automatic garbage collection, and simple abstract data types. It is designed for safe execution even on small machines without hardware memory protection.

In its implementation for the Inferno operating system, object programs generated by the Limbo compiler run using an interpreter for a fixed virtual machine. Inferno and its accompanying virtual machine run either stand-alone on bare hardware or as an application under conventional operating systems like Unix, Windows 2000, Linux, FreeBSD, MacOSX, and Plan 9. For most architectures, including Intel x86, ARM, PowerPC, MIPS and Sparc, Limbo object programs are transformed on-the-fly into instructions for the underlying hardware.

1. Overview and introduction

A Limbo application consists of one or more modules, each of which supplies an interface declaration and an implementation part. A module that uses another module includes its declaration part. During execution, a module dynamically attaches another module by stating the other module’s type identifier and a place from which to load the object code for its implementation.

A module declaration specifies the functions and data it will make visible, its data types, and constants. Its implementation part defines the functions and data visible at its interface and any functions associated with its data types; it may also contain definitions for functions used only internally and for data local to the module.

Here is a simple module to illustrate the flavour of the language.

1   implement Command;
2   include "sys.m";
3   include "draw.m";
4   sys:    Sys;
5   Command: module
    {
6       init: fn (ctxt: ref Draw->Context, argv: list of string);
7   };
8   # The canonical "Hello world" program, enhanced
9   init(ctxt: ref Draw->Context, argv: list of string)
10  {
11      sys = load Sys Sys->PATH;
12      sys->print("hello world\n");
13      for (; argv!=nil; argv = tl argv)
14          sys->print("%s ", hd argv);
15      sys->print("\n");
16  }

A quick glance at the program reveals that the syntax of Limbo is influenced by C in its expressions, statements, and some conventions (for example, look at lines 13-14), and also by Pascal and its successors (the declarations on lines 4, 6, 9). When executed in the Inferno environment, the program writes hello world somewhere, then echoes its arguments.

Let’s look at the program line-by-line. It begins (line 1) by saying that this is the implementation of module Command. Line 2 includes a file (found in a way analogous to C’s #include mechanism) named sys.m. This file defines the interface to module Sys; it says, in part,

Sys: module {
    PATH: con "$Sys";
    . . .
    print: fn (s: string, *): int;
    . . .
};

This declares Sys to be the type name for a module containing among other things a function named print; the first argument of print is a string. The * in the argument list specifies that further arguments, of unspecified type, may be given.

Line 3 includes draw.m; only one piece of information, mentioned below, is used from it. Line 4 declares the variable sys to be of type Sys; its name will be visible throughout the remainder of the file describing this module. It will be used later to refer to an instance of the Sys module. This declaration initializes it to nil; it still needs to be set to a useful value.

Lines 5-7 constitute the declaration of Command, the module being implemented. It contains only a function named init, with two arguments, a ref Draw->Context and a list of strings, and it doesn’t return any value. The ref Draw->Context argument would be used if the program did any graphics; it is a data type defined in draw.m and refers to the display. Since the program just writes text, it won’t be used. The init function isn’t special to the Limbo language, but it is conventional in the environment, like main in C.

In a module designed to be useful to other modules in an application, it would be wise to take the module declaration for Command out, put it in a separate file called command.m, and use include "command.m" to allow this module and others to refer to it. The Command interface is used, for example, by the program loader in the Inferno system to start the execution of applications.
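As a rough sketch of that arrangement, the declaration could be moved to its own file and included by the implementation. Only the name command.m comes from the text; the implementation file name hello.b is an assumption made for illustration.

```limbo
# --- file: command.m ---
# The interface alone; any module may include this file.
Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

# --- file: hello.b (hypothetical name for the implementation) ---
implement Command;

include "sys.m";
include "draw.m";	# must precede command.m, which mentions Draw->Context
include "command.m";

sys: Sys;

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;
	sys->print("hello world\n");
}
```

With this split, the implementation no longer repeats the declaration, and another module that wants to load it needs only the include line.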

Line 8 is a comment; everything from the # to the end of line is ignored.

Line 9 begins the definition for the init function that was promised in the module’s declaration (line 6). The argument that is a list of strings is named argv.

Line 11 connects the program being written to the Sys module. The first token after load is the target module’s name as defined by its interface (here found in the include on line 2). The next token is the place where the code for the module can be found; it is a string that usually names a file. Conventionally, in the Inferno system, each module contains a constant declaration for the name PATH as a string that names the file where the object module can be found. Loading the file is performed dynamically during execution except for a few modules built into the execution environment. (These include Sys; this accounts for the peculiar file name $Sys as the value of PATH.)

The value of load is a reference to the named module; line 11 assigns it to the variable sys for later use. The load operator dynamically loads the code for the named module if it is not already present and creates a new instance of it.
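The same pattern applies to modules other than Sys. The sketch below loads a made-up module and checks the result; the Greeter interface, its greet function, and the path /dis/greeter.dis are all hypothetical. It also assumes, from Inferno rather than from this section, that load yields nil when the module cannot be loaded and that the %r format prints the system error string.

```limbo
implement Command;

include "sys.m";
include "draw.m";

sys: Sys;

# Hypothetical interface, declared inline for illustration only.
Greeter: module
{
	PATH: con "/dis/greeter.dis";	# assumed location of the object code
	greet: fn(name: string);
};

Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	# type identifier first, then the place to load the code from
	g := load Greeter Greeter->PATH;
	if(g == nil){
		sys->print("cannot load %s: %r\n", Greeter->PATH);
		return;
	}
	g->greet("world");	# call through the module handle
}
```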

Line 12 starts the work by printing a familiar message, using the facilities provided by module Sys through its handle sys; the notation sys->print(...) means to call the print function of the module referred to by sys. The interface of Sys resembles a binding to some of the mechanisms of Unix and the ISO/ANSI C library.

The loop at lines 13-14 takes the list of string argument to init and iterates over it using the hd (head) and tl (tail) operators. When executed, this module combines the traditional ‘Hello world’ and echo.
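The same idiom works for any list. The following sketch walks argv through a second variable, so the original list is left alone, and builds a small list with the :: operator; the variable names are illustrative, and the reading of :: as "put an element on the front of a list" is supplied here rather than by this section.

```limbo
implement Command;

include "sys.m";
include "draw.m";

sys: Sys;

Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	# count the arguments without disturbing argv itself
	n := 0;
	for(l := argv; l != nil; l = tl l)
		n++;
	sys->print("%d arguments\n", n);

	# build a two-element list, then consume it with hd and tl
	words := "hello" :: "world" :: nil;
	for(; words != nil; words = tl words)
		sys->print("%s\n", hd words);
}
```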

2. Lexical conventions

There are several kinds of tokens: keywords, identifiers, constants, strings, expression operators, and other separators. White space (blanks, tabs, new-lines) is ignored except that it serves to separate tokens; sometimes it is required to separate tokens. If the input has been parsed into tokens up to a particular character, the next token is taken to include the longest string of characters that could constitute a token.
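Two consequences of these rules can be seen in a short sketch. The variable names are arbitrary, and the comments describe how the longest-match rule is expected to split the input.

```limbo
implement Command;

include "sys.m";
include "draw.m";

sys: Sys;

Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	# White space is required here: without the blank, "hdargv"
	# would be read as one identifier rather than hd followed by argv.
	if(argv != nil)
		sys->print("%s\n", hd argv);

	a := 1;
	b := 2;
	# Longest match: a+++b is split into the tokens a ++ + b,
	# because ++ is the longest operator that can start at that point.
	c := a+++b;
	sys->print("a=%d b=%d c=%d\n", a, b, c);
}
```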

The native character set of Limbo is Unicode, which is identical with the first 16-bit plane of the ISO 10646 standard. Any Unicode character may be used in comments, or in strings and character constants. The implementation assumes that source files use the UTF-8 representation, in which 16-bit Unicode characters are represented as sequences of one, two, or three bytes.

2.1. Comments

Comments begin with the # character and extend to the end of the line. Comments are ignored.

2.2. Identifiers

An identifier is a sequence of letters and digits of which the first is a letter. Letters are the Unicode characters a through z and A through Z, together with the underscore character, and all Unicode characters with encoded values greater than 160 (A0 hexadecimal, the beginning of the range corresponding to Latin-1).

Only the first 256 characters in an identifier are significant.

2.3. Keywords

The following identifiers are reserved for use as keywords, and may not be used otherwise:

    adt       alt       array      big
    break     byte      case       chan
    con       continue  cyclic     do
    else      exit      fn         for
    hd        if        implement  import
    include   int       len        list
    load      module    nil        of
    or        pick      real       ref
    return    self      spawn      string
    tagof     tl        to         type
    while

The word union is not currently used by the language.

2.4. Constants

There are several kinds of constants for denoting values of the basic types.

2.4.1. Integer constants

Integer constants have type int or big. They can be represented in several ways.

Decimal integer constants consist of a sequence of decimal digits. A constant with an explicit radix consists of a decimal radix followed by R or r followed by the digits of the number. The radix is between 2 and 36 inclusive; digits with values of 10 or more are expressed using letters A to Z or a to z. For example, 16r20 has value 32.

The type of a decimal or explicit-radix number is big if its value exceeds 2³¹−1, otherwise it is int.
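A few constants in each notation, as an illustrative sketch. The variable names are arbitrary, and the %bd format for printing a big is taken from Inferno's Sys print rather than from this section.

```limbo
implement Command;

include "sys.m";
include "draw.m";

sys: Sys;

Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	d := 32;		# decimal, type int
	h := 16r20;		# radix 16, the same value as 32
	b := 2r100000;		# radix 2, the same value as 32
	z := 36rz;		# radix 36: the digit z has value 35
	m := 2147483647;	# equal to 2^31-1, so still an int
	w := 2147483648;	# exceeds 2^31-1, so the constant is a big

	sys->print("%d %d %d %d %d %bd\n", d, h, b, z, m, w);
}
```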

Character constants consist of a single Unicode character enclosed within single-quote characters ('). Inside the quotes the following escape sequences represent special characters:

\\      backslash
\'      single quote
\"      double quote
\a      bell (BEL)
\b      backspace (BS)
\t      horizontal tabulation (HT)
\n      line feed (LF)
\v      vertical tabulation (VT)
\f      form feed (FF)
\r      carriage return (CR)
\udddd  Unicode character named by 4 hexadecimal digits
\0      NUL

Character constants have type int.
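Because character constants are ints, they can be printed either as characters or as numbers. A small sketch follows; the variable names are arbitrary, and %c is the Sys print format for a character.

```limbo
implement Command;

include "sys.m";
include "draw.m";

sys: Sys;

Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	c := 'a';		# an int holding the character's Unicode value
	e := '\u00e9';		# the character named by hexadecimal 00e9
	nl := '\n';		# line feed

	sys->print("%c has value %d\n", c, c);
	sys->print("%c has value %d\n", e, e);
	if(nl == 10)
		sys->print("line feed has value 10\n");
}
```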

2.4.2. Real constants

Real constants consist of a sequence of decimal digits containing one period . and optionally followed by e or E and then by a possibly signed integer. If there is an explicit exponent, the period is not required. Real constants have type real.
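For illustration, here are three real constants, one with a period only, one with an exponent only, and one with both. The names are arbitrary, and %g is the Sys print format used for reals.

```limbo
implement Command;

include "sys.m";
include "draw.m";

sys: Sys;

Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	x := 3.14;	# period, no exponent
	y := 1e3;	# explicit exponent, so the period is not required
	z := 2.5E-3;	# period and a signed exponent

	sys->print("%g %g %g\n", x, y, z);
}
```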

2.4.3. Strings

String constants are sequences of Unicode characters contained in double quotes. They cannot extend across source lines. The same escape sequences listed above for character constants are usable within string constants. Strings have type string.
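A short sketch of escape sequences inside a string constant. The text of the string is arbitrary; len applied to a string is assumed to count characters rather than bytes.

```limbo
implement Command;

include "sys.m";
include "draw.m";

sys: Sys;

Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	# \t, \" and \u00e9 are the same escapes used in character constants
	s := "tab\there, a quote \" and \u00e9\n";

	sys->print("%s", s);
	sys->print("%d characters\n", len s);
}
```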

2.4.4. The nil constant

The constant nil denotes a reference to nothing. It may be used where an object of a reference type is expected: values of reference type that are not otherwise initialized start off with this value, it can be assigned to reference objects, and values of reference type can be tested for equality with it. (The keyword has other uses as well.)
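The three uses of nil just described can be sketched with a list and an array, both of which are treated here as reference types. The variable names are illustrative.

```limbo
implement Command;

include "sys.m";
include "draw.m";

sys: Sys;

Command: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(ctxt: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	l: list of string;	# no initializer, so it starts off as nil
	a: array of int;	# likewise

	if(l == nil && a == nil)	# testing for equality with nil
		sys->print("both start as nil\n");

	l = "x" :: l;
	if(l != nil)
		sys->print("list now holds %s\n", hd l);

	l = nil;	# assigning nil drops the reference
}
```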

2.5. Operators and other separators

The operators are

    +   -   *   /   %   &   |   ^
    ==  <   >   <=  >=  !=  <<  >>
    &&  ||  <-  ::
    =   +=  -=  *=  /=  %=  &=  |=  ^=  <<= >>=
    :=
    ~   ++  --  !   **

The other separators are

    :   ;   (   )   {   }   [   ]
    ,   .   ->  =>

3. Syntax notation

In this manual, Limbo syntax is described by a modified BNF in which syntactic categories are named in an italic font, and literals in typewriter font. Alternative productions are listed on separate lines, and an optional symbol is indicated with the subscript ‘opt’.

4. Types and objects
