Indefinite Studies

Level Up!

with 3 comments

Today was my first day as a software engineer at Apple in Cupertino. It means that in all likelihood I will only post stuff about my cat on this blog from now on, and that it feels a bit like I am 5 and this is Christmas Day. Last year has been a heck of a ride: I graduated, got married, moved to the US, started a postdoc in Dawn Song’s group, became a father, worked on a few papers with amazing people, interviewed with Google Zurich and finally accepted Apple’s offer. As usual, there is a quote by Douglas Adams that describes this phenomenon inconspicuously well:

Life… is like a grapefruit. It’s orange and squishy, and has a few pips in it, and some folks have half a one for breakfast.

Written by dan

October 17, 2011 at 22:50

Posted in useless

A Survey of Call Graph and Points-to Algorithms in Java

leave a comment »

Disclaimer: I am not an expert on the subject, this post is a collection of my reading notes from a bunch of papers. There may be factual errors and the content is subject to change. Just because I can.

In object-oriented languages, constructing a call graph is complicated by the presence of virtual method calls. This is a problem known as the dynamic dispatch problem, and is common in many programming languages and paradigms. The problem can be formulated as follows for Java: if m is a virtual method (i.e. not private or static), then the method that o.m() resolves to depends on the run-time type of o (as opposed to its declaration type). Determining the run-time type of o requires points-to information (what object may o point to?), which in turn requires a call graph to be computed (because what values o may point to depend on what methods get called). This mutual recursion is a fundamental problem, that is not quite trivial to tackle.

A number of static analyses have been proposed to do the job, but since the problem is undecidable no algorithm can return both a precise and correct answer. As a result, we have to decide if we want fast or precise analyses, since they can not be both. Some of the characteristics that affect the precision and cost of the algorithms are:

flow sensitivity: does the order of statements matter?
context sensitivity: how to distinguish between two calls of the same procedure in different contexts?
aggregate structure modeling: how are arrays, lists, and hash tables abstracted?
heap modeling: how to differentiate the potentially unbounded number of objects that may be created at a given allocation site?

Below are some notes about flow sensitivity and context sensitivity, followed by a discussion of some of the classic algorithms for call graph construction.

Flow sensitivity

Flow-insensitive analyses are only concerned with what statements are present in the program, not with the order or the reachability of statements. They return properties of variables valid in the whole program.
Flow-sensitive analyses compute what values may flow to variables by considering valid paths in the program. They return properties of variables valid at certain program points. Therefore, they are more precise but more expensive to compute.

Example (courtesy of smcc): a parity analysis for the following code snippet yields different results depending on the flow sensitivity of the analysis. A flow-insensitive analysis would return that x could be anything, while a flow-sensitive analysis would return that x is even at line 3 and odd at line 4.

void someMethod(int y) {
   int x;
   x = 2 * y;
   x = x + 1;
}

A common way to obtain a form of flow sensitivity is to transform the program in SSA form (thereby encoding control dependencies in variable names) and then to apply flow-insensitive analyses. Algorithms for efficient translation to SSA form are given in Appel, Modern Compiler Implementation in Java. For an implementation that transforms Java bytecode to SSA form, see Soot. In SSA form, the above example would become:

void someMethod(int y) {
    int x_1 = 2 * y;
    int x_2 = x_1 + 1;
}

A flow-insensitive analysis would then be sufficient to return that x_1 is even and x_2 is odd.

Although instances of the standard dataflow frameworks (Kildall, A Unified Approach to Global Program Optimization, POPL 1973) lend themselves particularly well to flow-sensitive analyses, Nielson et al. in Principles of Program Analysis give an example of a flow-insensitive instance, by simply considering empty kill sets in the dataflow equations (i.e., it loses information about which variables are overwritten).

Context sensitivity

Context sensitivity matters for inter-procedural analyses, where a single method may be invoked in different contexts. So, "what is a context?", you may ask. Well, it is a parameter of the analysis, so the answer is "it depends". The most common abstractions seem to be:

call sites, so that the same method invoked from different line numbers would be distinct. However, if a method is invoked multiple times from the same program point (in a loop for example), then all these invocations would look exactly the same to the analysis, because they are all abstracted to come from the same place.
receiver objects, so that the same method applied to two different (abstract) objects would be distinct. Objects are commonly abstracted by their allocation site (i.e., the place where they are new'ed). The same restriction as above applies, multiple allocations in a loop will be merged. The technique is described in Parameterized Object Sensitivity for Points-to Analysis for Java (ISSTA 2002, 129 citations) by Milanova, Rounev and Ryder.

Call site sensitivity example (from Lhoták's PhD thesis): A context-insensitive points-to analysis would analyse id() once, and return that both c and d may point to b and a. A context-sensitive analysis using call sites as the context abstraction, on the other hand, would analyze id() in 2 different contexts and would properly return that c points to a and d points to b.

Object id(Object o) {
    return o;
}

void f() {
    Object a = new Object();
    Object b = new Object();
    Object c = id(a);
    Object d = id(b);
}

Object sensitivity example (from Chord's PLDI tutorial): in the following example, we want to examine whether floors.elems and events.elems may point to the same elements. An object-insensitive points-to analysis would return that they could point to the same objects, because it does not distinguish between different instances of List. An object-sensitive analysis would abstract events and floors using two different allocation sites, and would maintain separate points-to information for evens.elems and floors.elems. Therefore, it would be able to determine that these two arrays have disjoint points-to sets.

class List {
	Object[] elems;

	List() {
		elems = new Object[42];
	}
}

class Bldg {
	List events, floors;

	Bldg() {
		this.events = new List();
		this.floors = new List();
		for (int i=0; i < events.elems.length; i++)
			events.elems[i] = new Object();
		for (int i=0; i < floors.elems.length; i++)
			floors.elems[i] = new Object();
	}
}

In each case (object sensitivity and call site sensitivity), we could get further precision by examining not only the last call site/receiver object, but the list of call sites/receiver objects that lead to the current context (i.e., the call stack is the context). In general, these lists may be infinite, so a common abstraction is to bound them by a parameter k. Hence, we refer to k-object-sensitivity or k-call-site-sensitivity. However, if m in the number of abstract context abstractions (allocation sites or call sites), there are m^k potential abstract contexts to consider. It quickly becomes impractical for values of k greater than 2 or 3.

The added precision of context-sensitivity depends on the client analysis. For instance, context-sensitivity does not add much to the precision of call graph construction and virtual method resolution. In Java, 1-object-sensitivity seems to provide the best tradeoff between performance and precision for most client analyses.

Algorithms

Example:

abstract class Animal {
	public abstract void saySomething();
}

class Cat extends Animal {
	public void saySomething() {
		System.out.println("purr");
	}
}

class Dog extends Animal {
	public void saySomething() {
		System.out.println("woof");
	}
}

class Fish extends Animal {
	public void saySomething() {
		System.out.println("...");
	}
}

class Car {  // not an Animal
	public void saySomething() {
		System.out.println("honk!");
	}
}

public class Main {
	static Animal neverCalled() {
		return new Fish();
	}

	static Animal selectAnimal() {
		return new Cat();
	}

	public static void main(String[] args) {
		Animal animal = selectAnimal();
		animal.saySomething();
	}
}

There are 4 methods called saySomething in this application, and the problem is to find which one(s) may be called at run-time (the correct answer is Cat.saySomething()). A dumb and incorrect analysis would just report that any method called saySomething could be invoked, including the one in the Car class. Many algorithms have been proposed to solve this problem, here are a few examples (they are all flow-insensitive, and n stands for the program size):

CHA (Class Hierarchy Analysis)

by Dean, Grove and Chambers in Optimization of object-oriented programs using static class hierarchy analysis (ECOOP 1995, 490 citations)
complexity: O(n)?
it does the only trivial thing to do: it first builds the class hierarchy (i.e., it figures that Cat, Fish and Dog are subclasses of Animal) and determines that saySomething() could be applied to any subclass of Animal, thereby eliminating Car.

RTA (Rapid Type Analysis)

by Bacon and Sweeney in Fast Static Analysis of C++ Virtual Function Calls (OOPSLA 1996, 387 citations)
complexity: O(n)?
simply put, it prunes from the CHA call graph methods that can never be reached because their enclosing class is never instantiated. In our example, it would determine that saySomething() could be applied to a Fish or a Cat but not a Dog, since no instance of Dog is ever created. It is strictly more powerful than CHA and still very fast and simple.

VTA (Variable Type Analysis)

by Sundaresan et al. in Practical Virtual Method Call Resolution for Java (OOPSLA 2000, 206 citations)
complexity: O(n)?
it can be thought of as a refined version of RTA. When RTA collects all objects that can be created in the whole program and uses that information to prune the call graph, VTA does this for each variable instead, providing more precise information. In our example, VTA would correctly determine that saySomething() can not be applied to a Fish, because there is no corresponding edge in the type propagation graph.

k-CFA (Control Flow Analysis of order k):

by Shivers in Control-flow analysis of higher-order languages (1991 PhD thesis, 476 citations)
complexity: O(n³/log(n)) for 0CFA
initially formulated for the functional language Scheme, it has apparently evolved to mean different things in the functional and OO communities (I am still looking for a simple formulation for OO languages though). In Resolving and Exploiting the k-CFA Paradox (PLDI 2010), it is described as a "k-call-site-sensitive, ﬁeld-sensitive points-to analysis algorithm with a context-sensitive heap and with on-the-ﬂy call-graph construction". It seems that any analysis that uses k-call-strings as contexts could be called a k-CFA analysis.
in the PLDI paper, Might et al. also note that "[in the OO setting] 1- and 2-CFA analysis is considered heavy but certainly possible".
in our example, 0CFA would return that saySomething() at line 40 has to be applied to a Cat, because only a Cat can flow to the animal variable.

ZCWL

by Zhu and Calman in Symbolic pointer analysis revisited (PLDI 2004, 69 citations) and Whaley and Lam in Cloning-based context-sensitive pointer alias analysis using binary decision diagrams (PLDI 2004, 327 citations)
complexity: massive
essentially a k-CFA analysis where k is the depth of the call graph (plus some fancy-schmancy operations on strongly connected components of the call graph). Requires a context-insensitive call-graph to boot, which it then makes context-sensitive.

Based on this simple example, we see that CHA and RTA return an imprecise call graph, while VTA, 0CFA and ZCWL return a precise call graph. VTA is by far the simplest and most efficient in our case, because we only want a rather coarse-grained call graph, not variable-level points-to information that k-CFA and ZCWL provide.

The Worst Journey in the World

leave a comment »

This is the story of the 1910-1913 British expedition to the South Pole. The setting is so epic that is seems to be almost fictional. Sadly, this is a true story.

You don’t expect Antarctica to be a particularly comfortable place, but the book does a very good job at recounting the life (and death) of the expedition in excruciating details. Their boat almost sank on their way to Antarctica as they passed the Roaring Forties. They hoped that ponies could be used to lay out dépôts, but the poor beasts had the greatest pains to adapt to the cold and had to be finished with pickaxes and fed to the dogs. Their two pre-WWI motor sledges just did not work. They were to travel more than 3000 km by foot, dragging sledges with their food and fuel supplies behind them, over a period of 5 months. They had to thaw their way into their sleeping bags at night, and to cut their boots open to give them extra room for the frostbite-induced blisters. There was the constant threat of crevasses, week-long blizzards, scurvy and snow-blindness.

Yet, they were experienced, organized, fit and cheerful. Reminiscent of a huge but flat Himalayan ascent (although the Pole actually lies 2800m above sea level), they built a shelter for the winter and laid out dépôts with supplies for the return of the polar party. After a year of preparation, the polar journey was finally ready. They crossed the immensity of the Ross Ice Shelf (roughly the size of France). They ascended the 160 km long Beardmore glacier, and reached the Polar Plateau. They were ready for the final push to the Pole into uncharted territory. All was looking good.

January 4. A last note from a hopeful position. I think it’s going to be all right. We have a fine party going forward and arrangements are all going well.

– entry in the journal of Captain Scott before the final “ascent”

When they reached the Pole after months of such hardships, it was only to find a tent sporting a Norwegian flag and a note by Amundsen, whose expedition was the first to reach the Pole. Just a month earlier.

On the return journey, problems started to arise. The cold was more intense than expected at this time of year, making the sledges harder to pull on ice crystals. Hunger was persistent, as the rations proved to be insufficient (the summit rations were roughly 4900 calories a day, but with no vitamins intake). Frostbite was creeping its way on hands and feet. One man collapsed, and never woke up. Another could not go any further and suggested that they should leave him behind in his sleeping bag. Ultimately, he said “I am just going outside and may be some time”. He walked out into the blizzard never to be seen again. The three remaining explorers fought until the end, but ran out of fuel and food and died 11 miles from a dépôt with plenty. What follows are the last entries in the Captain’s journal.

Monday, March 19. Lunch. We camped with difficulties last night and were dreadfully cold till after our supper of cold pemmican and biscuit and a half pannikin of cocoa cooked over the spirit. Then, contrary to expectations, we got warm and slept well. To-day we started in the usual dragging manner. Sledge dreadfully heavy. We are 15.5 miles from the dépôt and ought to get there in three days. What progress ! We have two days’ food but barely a day’s fuel. All our feet are getting bad — Wilson’s best, my right foot worse, the left all right. There is no chance to nurse one’s feet till we can get hot food into us. Amputation is the least I can hope for now, but will the trouble spread ? That is the serious question. The weather doesn’t give us a chance — the wind from N. to N.W. and -40 degrees. temp. to-day.

Wednesday, March 21. Got within 11 miles of dépôt Monday night ; had to lay up all yesterday in severe blizzard. To-day forlorn hope, Wilson and Bowers going to dépôt for fuel.

22 and 23. Blizzard bad as ever — Wilson and Bowers unable to start — to-morrow last chance — no fuel and only one or two of food left — must be near the end. Have decided it shall be natural — we shall march for the dépôt with or without our effects and die in our tracks.

Thursday, March 29. Since the 21st we have had a continuous gale from W.S.W. and S.W. We had fuel to make two cups of tea apiece and bare food for two days on the 20th. Every day we have been ready to start for our dépôt 11 miles away, but outside the door of the tent it remains a scene of whirling drift. I do not think we can hope for any better things now. We shall stick it out to the end, but we are getting weaker, of course, and the end cannot be far.

It seems a pity, but I do not think I can write more.

Written by dan

September 26, 2011 at 23:13

Posted in useless

The Mythical Man-Month Reading Notes

with one comment

the man-month is a fallacious and dangerous myth, for it implies than men and months are interchangeable
Brooks’s Law: adding manpower to a late project makes it later (effort is needed to repartition the work, train the new people, and more time is spent on intercommunication)
“Good cooking takes time” (Restaurant Antoine’s Chef)
discipline is good for art: providing an architecture enhances the creative style of an implementing group
the purpose of organization is to reduce the amount of communication and coordination necessary
“The programmer delivers satisfaction of a user need rather than any tangible product” (Cosgrove)
all repairs tend to destroy structure, to increase the entropy of a system
“How does one project get to be a year late? One day at a time”
the first step in controlling a project on a tight schedule is to have a schedule with concrete, measurable milestones. A programmer will rarely lie about a milestone if it’s so sharp he can’t deceive himself.
chronic schedule slippage is a morale-killer: “if you miss one deadline, make sure you make the next one” (Mac Carthy)

Written by dan

September 26, 2011 at 12:04

Posted in note to self

A healthy dose of sanity

leave a comment »

It is not that often that you learn something and get a good laugh from a Ph.D. thesis. Hence, Shivers’ Control-flow analysis of higher-order languages has been a pleasant surprise.

He is at the origin of k-CFA, a class of static analyses that tackle the dynamic dispatch problem with context-sensitive data-flow analysis. Even though the analysis is targeted at functional languages, it also applies to object-oriented languages with virtual function calls (see Matt Might for a gentle introduction). From Shivers’ thesis:

So, if we wish to have a control-flow graph for a piece of Scheme code, we need to answer the following question: for every procedure call in the program, what are the possible lambda expressions that call could be a jump to? But this is a flow analysis question! So with regard to flow analysis in a [higher-order language], we are faced with the following unfortunate situation:

In order to do flow analysis, we need a control-flow graph.

In order to determine control-flow graphs, we need to do flow analysis.

The problem could be rephrased for Java as: “if we wish to have an [inter-procedural] control flow graph for a Java application, we need to answer the following question: for every instance method call, what classes could the object be an instance of?”

Written by dan

September 15, 2011 at 13:53

Posted in code analysis

Low Tech Thermostat

leave a comment »

Why do offices have to be so damn cold?

Written by dan

September 4, 2011 at 14:58

Posted in useless

Me on Dynamic Binary Instrumentation

leave a comment »

That was at Deepsec 2009 in Vienna. Slides here.

Written by dan

June 21, 2011 at 12:13

Posted in code analysis

Generic Iterative Dataflow Analysis in Python

with 2 comments

Here is an example of a remarkably simple yet powerful dataflow analysis technique. The algorithm is very generic and can be used to implement a number of forward and backward analyses such as constant propagation, reaching definitions, value-set analysis, or in my case type inference.

The algorithm, adapted from the maximal fixedpoint algorithm in the dragon book, takes a control flow graph as input and outputs IN and OUT (maps from basic block to abstract state at their entry and exit (an abstract state maps variables to abstract values)). It is parametric, you must supply it with a few functions that will determine the output of your analysis:

analysis.meet, takes two abstract states and returns an abstract state (see lattices). To guarantee that the algorithm terminates, this function should be monotone (and your lattice of abstract values of finite height).
analysis.step_forward (resp. analysis.step_backward), a function that takes an instruction and an abstract state at its entry (resp. exit) and “executes” it, transforming the abstract state. They are used to automatically compute the transfer function for each basic block in the cfg.

It looks like this:

def forward_transfer_function(analysis, bb, IN_bb):
    OUT_bb = IN_bb.copy()
    for insn in bb:
        analysis.step_forward(insn, OUT_bb)
    return OUT_bb

def backward_transfer_function(analysis, bb, OUT_bb):
    IN_bb = OUT_bb.copy()
    for insn in reversed(bb):
        analysis.step_backward(insn, IN_bb)
    return IN_bb

def update(env, bb, newval, todo_set, todo_candidates):
    if newval != env[bb]:
        print '{0} has changed, adding {1}'.format(bb, todo_candidates)
        env[bb] = newval
        todo_set |= todo_candidates

def maximal_fixed_point(analysis, cfg, init={}):
    # state at the entry and exit of each basic block
    IN, OUT = {}, {}
    for bb in cfg.nodes:
        IN[bb] = {}
        OUT[bb] = {}
    IN[cfg.entry_point] = init

    # first make a pass over each basic block
    todo_forward = cfg.nodes
    todo_backward = cfg.nodes

    while todo_backward or todo_forward:
        while todo_forward:
            bb = todo_forward.pop()

            ####
            # compute the environment at the entry of this BB
            new_IN = reduce(analysis.meet, map(OUT.get, cfg.pred[bb]), IN[bb])
            update(IN, bb, new_IN, todo_backward, cfg.pred[bb])

            ####
            # propagate information for this basic block
            new_OUT = forward_transfer_function(analysis, bb, IN[bb])
            update(OUT, bb, new_OUT, todo_forward, cfg.succ[bb])

        while todo_backward:
            bb = todo_backward.pop()

            ####
            # compute the environment at the exit of this BB
            new_OUT = reduce(analysis.meet, map(IN.get, succ[bb]), OUT[bb])
            update(OUT, bb, new_OUT, todo_forward, cfg.succ[bb])

            ####
            # propagate information for this basic block (backwards)
            new_IN = backward_transfer_function(analysis, bb, OUT[bb])
            update(IN, bb, new_IN, todo_backward, cfg.pred[bb])

    ####
    # IN and OUT have converged
    return IN, OUT

Ideally, to propagate dataflow information in one pass, you would like to have visited every predecessors of a basic block B for a forward pass before analyzing B. Unfortunately, due to irreducible flow graphs you are not guaranteed to be able to do this. Instead, this algorithm

starts with an empty state at some arbitrary basic block
makes a forward pass and a backward pass over each basic block, adding the successors/predecessors to a worklist when changes are detected
continues until the worklist is empty.

The meet function is here to “combine” information from multiple paths, for instance if B2 is reachable from B0 and B1, then IN(B2) = meet(OUT(B1), OUT(B2)). If you wanted to collect value set information and you had:

OUT(B0) = [a->{1}]
OUT(B1) = [a-> {-1}]
then meet could output IN(B2) = [a -> {1, -1}]

Depending on how meet is defined, it can look for information true for all paths coming to a basic block, or for information from at least one path.

Now, some sample code to implement a simple constant propagation analysis. It is forward only for simplicity, but the algorithm works for bidirectional analyses such as type inference.


def meet_val(lhs, rhs):
    result = None

    if lhs == 'NAC' or rhs == 'NAC':
        result = 'NAC'

    elif lhs == 'UNDEF' or rhs == 'UNDEF':
        result = 'UNDEF'

    else:
        result = 'CONST'

    return result

def meet_env(lhs, rhs):
    lhs_keys = set(lhs.keys())
    rhs_keys = set(rhs.keys())
    result = {}

    for var in lhs_keys - rhs_keys:
        result[var] = lhs[var]

    for var in rhs_keys - lhs_keys:
        result[var] = rhs[var]

    for var in lhs_keys & rhs_keys:
        result[var] = meet_val(lhs[var], rhs[var])

    return result

def abstract_value(env, expr):
    if expr.isdigit():
        return 'CONST'

    try:
        return env[expr]
    except KeyError:
        return 'UNDEF'

def step_forward(insn, env_in):
    if type(insn) == str:
        return 

    var, op, expr = insn

    # insn is var = c
    if len(expr) == 1:
        env_in[var] = abstract_value(env_in, expr)

    else:
        e1, op, e2 = expr
        val1 = abstract_value(env_in, e1)
        val2 = abstract_value(env_in, e2)
        env_in[var] = meet_val(val1, val2)

def step_backward(insn, env_in):
    pass

The function step_forward defines the abstract semantics for the statements or instructions of the language you want to analyze and for the analysis you want to implement. For instance here we only collect if a variable at some program point is constant, undefined, or not a constant (NAC). To do the actual propagation, we could also collect the allocation site of the constant.

the example program

Let’s consider a super simple language, where variables are numbers that can only be affected to or added together. The function meet_val computes the meet for two abstract values, according to this table:

        UNDEF  CONST  NAC
       -----------------
UNDEF | UNDEF  UNDEF  NAC
CONST | UNDEF  CONST  NAC
NAC   | NAC    NAC    NAC

Let’s consider a simple program in this “language” where we don’t specify the constructs for the control flow. The algorithm just assumes that every edge in the CFG is reachable. This is obviously not the case in practice, but that only means that we are going to miss some patterns (the analysis is sound but imprecise in order to terminate).

Now, we want to find if a and ret are constants. Here is the code necessary to setup and run the example (you need networkx to run it):


import networkx as nx    

class SomeObject:
    pass

def instructionify(somestr):
    toks = somestr.split()
    if '+' in somestr:
        return (toks[0], toks[1], (toks[2], toks[3], toks[4]))
    return tuple(somestr.split())

# setup the program's cfg
prog = nx.DiGraph()
s0 = ('entry'),
s1 = instructionify('b = x'),
s2 = instructionify('c = 2'),
s3 = instructionify('a = 40 + c'),
s4 = instructionify('ret = a + x'),
prog.add_edge(s0, s1)
prog.add_edge(s1, s2)
prog.add_edge(s2, s1)
prog.add_edge(s1, s3)
prog.add_edge(s3, s3)
prog.add_edge(s3, s4)

# initialize pred and succ
pred, succ = {}, {}
for bb in prog:
    pred[bb] = set(prog.predecessors(bb))
    succ[bb] = set(prog.successors(bb))

cfg             = SomeObject()
cfg.nodes       = set(prog.nodes())
cfg.pred        = pred
cfg.succ        = succ
cfg.entry_point = s0

analysis               = SomeObject()
analysis.meet          = meet_env
analysis.step_forward  = step_forward
analysis.step_backward = step_backward

# run the whole thing
IN, OUT = maximal_fixed_point(analysis, cfg)
print 'a   at program point s3 is', OUT[s3]['a']
print 'ret at program point s4 is', OUT[s4]['ret']

And the output is:

a   at program point s3 is CONST
ret at program point s4 is UNDEF

As a final note: it is possible to speed things up a bit by choosing a better ordering for basic blocks than just going randomly at first (because we initially fail to propagate lots of information). This might end up in another blog post. Cheers!

Written by dan

April 23, 2011 at 00:20

Posted in code analysis, python

How old is Dynamic Binary Instrumentation?

leave a comment »

So I was recently reading TAOCP (note for later: find a way to not be smug about it), and stumbled on this gem:

1.4.3.2. Trace Routines. When a machine is being simulated on itself (…), we have the special case of a simulator called a trace or monitor routine. Such programs are occasionally used to help in debugging, since they print out a step-by-step account of how the simulated program behaves.

This is about as exciting as it gets for somebody versed in tracing and dynamic binary instrumentation, especially since this volume of TAOCP was published in 1968. The algorithm that follows looks exactly like what you would find in today’s dynamic binary rewriters (saving and restoring application registers, decoding and executing the instruction while retaining control on jumps). There is also a mention regarding what we now call transparency:

[the program being traced] must not store anything into the locations used by the trace program

The bit that achieved to kill me was exercise 6, “design a trace routine capable of tracing itself” (I happen to be very sensitive when somebody mentions self-applying things). I spent a few years working on this, how come it never came up before? I looked again at references in the few standard papers about DBI, namely:

Luk et al., Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation, 2005
Bruening, Efﬁcient, Transparent, and Comprehensive Runtime Code Manipulation, 2004
Bala et al., Dynamo: a transparent dynamic optimization system, 2000
Nethercote and Seward, Valgrind: a framework for heavyweight dynamic binary instrumentation, 2007
Hunt and Brubacher, Detours: Binary Interception of Win32 Functions, 1999

There is no mention of Knuth, and with one exception, the oldest references date from the 90s. The exception comes from the Detours paper, with an incredible reference to Gill, The Diagnosis of Mistakes in Programmes on the EDSAC, 1951:

The second technique will be referred to as the ‘step-by-step’ technique. In this, the control unit of the machine never obeys any of the orders of the original programme directly. The machine remains under the control of the checking routine, which is so constructed that the orders of the original programme are examined one by one, and carried out in exactly the same manner and sequence as if they were being obeyed directly. If this were all that the checking routine accomplished, it would be merely a means of carrying out machine operations in slow motion-slowed down, in fact, by a factor of the order of 10. The reason for adopting this type of operation is that it is now open to the author of the checking routine to insert additional orders into it, causing it to print useful information as it proceeds. This information may be chosen so as to assist in the investigation of either order or numerical failures.

So to answer the original question in this post: software-based tracing seems to be just as old as software itself.

Indefinite Studies

Level Up!

A Survey of Call Graph and Points-to Algorithms in Java

Flow sensitivity

Context sensitivity

Algorithms

Further reading

The Worst Journey in the World

The Mythical Man-Month Reading Notes

A healthy dose of sanity

Low Tech Thermostat

Me on Dynamic Binary Instrumentation

Generic Iterative Dataflow Analysis in Python

How old is Dynamic Binary Instrumentation?