Infinite Trees

1)

Up until now, we’ve mostly looked at trees as a data structure, but we haven’t looked at them in the context of any particular problems. Trees are used in many applications in computer science, robotics, and game design. One common application is for representing the possible moves in a game or search problem. Imagine that you were exploring the space that a player (character or robot) could move through. On each move, the player would change position (and possible acquire the ability to do new tasks based on location – we’ll set that aside for now).

Imagine that you wanted to run programs to optimize the movement of your player through the space. You would need a representation of the space, a way to determine the new position after each move, and a way to estimate which moves now could yield the best payoff a few moves down the road. Trees are a useful data structure here: each node contains data about the present location (where it is, its features, etc), each type of move yields a new location (aka node). For example, consider a grid-based simple game in which a player can only move between adjacent squares (ie, forward, backward, left, right). This yields a location tree like:

     ------- <4,8>------
    /        /   \      \
<4,9>    <4,7>   <3,8>  <5,8>
                / / \ \
           ----- /   \ -------
          /     /     \      \
      <3,9>   <3,7>  <2,8>   <4,8>

(Ignore that we have <4,8> twice in this picture – we’ll return to that tomorrow.)

The problem with building this tree upfront is that set of possible game/robot configurations (or positions) tends to be large, if not (effectively) infinite. In this case, creating the tree upfront simply isn’t feasible. Instead, we need to create the tree in a demand-driven fashion: we create each node when we need it, remembering just enough information to let us create more nodes as necessary.

Our task today is to figure out how to develop infinite trees in Java. To simplify the problem, we will work simply with binary trees, each of which has an integer for its datum. The left child of each node is the node with data value minus 1; the right child is the node with data value plus 1. In other words:

       0
      / \
    -1   1
   /  \  ...
 -2    0
...    ...

Let’s start with our previous definition of binary trees and see how well it fits our new problem:

  interface INode {}
 
  class Leaf implements INode {
    Leaf(){}
  }
 
  class Node implements INode {
    int data;
    INode left;
    INode right;
 
    Node (int data, INode left, INode right) {
      super(data);
      this.left = left;
      this.right = right;
    }
  }
 
  class Examples {
    INode myTree = new Node(4,
                            new Node(3, new Leaf(), new Leaf()),
                            new Leaf());
  }

This isn’t a great representation, because a Leaf denotes the end of the tree. In our desired problem, the tree never ends, it just hasn’t necessarily been built yet. There is also no notion here that a Leaf has data, but not (yet) children. So this shape doesn’t capture the essence of our problem. Instead, we need classes that capture the “unfinished” nature of some nodes. We will omit leaves entirely.

Following conventional CS terminology, we will call un-built nodes “promises”:

  class NodePromise implements INode {
    int data;
 
    NodePromise(int data) {
      this.data = data;
    }
  }

Our prior example would be written as following using NodePromise:

  class Examples {
    INode myTree = new Node(4,
                            new NodePromise(3),
                            new NodePromise(5));
  }

That is, the tree has one completed node (with data 4), and two “pending” nodes (for 3 and 5) that we might create on demand.

To demand that a node be created, we “force” the promise of a node to be carried out (“force” is again the conventional CS term here). In particular, our NodePromise class will have a force method that builds-out one more node of the tree:

  private INode force() {
    return new Node(this.data,
                    new NodePromise(this.data - 1),
                    new NodePromise(this.data + 1));
  }

So if we were to run myTree.left.force(), we would get a Node with data 3 and NodePromise children for values 2 and 4.

To summarize, we now have the following class hierarchy for infinite trees, cleaned up with an abstract class to capture the data common to the two kinds of INode:

  interface INode {}
 
  abstract class AbsNode implements INode {
    int data;
 
    AbsNode(int data) {
      this.data = data;
    }
  }
 
  class Node extends AbsNode {
    private INode left;
    private INode right;
 
    Node (int data, INode left, INode right) {
      super(data);
      this.left = left;
      this.right = right;
    }
  }
 
  class NodePromise extends AbsNode {
    NodePromise(int data) {
      super(data);
    }
 
    private INode force() {
      return new Node(this.data,
                      new NodePromise(this.data - 1),
                      new NodePromise(this.data + 1));
    }
  }

Now that we have force, we need to decide when to use it. We said that we wanted to construct the tree “on demand”, meaning that if a user of our code wants to see a node, we should force it to exist. The user should not have to think about whether the node already exists. They should simply be able to ask for a node and get it back; it is up to our code to create that node if needed. Put differently, the user should have the illusion of having an infinite tree. That we don’t create it as such is an implementation detail of no concern to the user.

Notice that Node has fields for left and right, while NodePromise does not. This suggests that the INode interface needs to provide methods getLeft() and getRight() that will get the corresponding subtree. For Node objects, these methods just return the corresponding (existing) node. For NodePromise objects, these methods force creation of the new nodes:

  interface INode {
    INode getLeft();
    INode getRight();
  }
 
  class Node extends AbsNode {
    private INode left;
    private INode right;
 
    // constructor omitted
 
    public INode getLeft() { return this.left; }
 
    public INode getRight() { return this.right; }
  }
 
  class NodePromise extends AbsNode {
 
    // constructor omitted
 
    private INode force() {
      return new Node(this.data,
                      new NodePromise(this.data - 1),
                      new NodePromise(this.data + 1));
    }
 
    public INode getLeft() { return this.force().getLeft(); }
    public INode getRight() { return this.force().getRight(); }
  }

To convince ourselves that this does hide the details from a user, we can experiment with our interface (in DrJava, you can use the interactions window for this).

> NodePromise root = new NodePromise(0);
> root.getLeft().getData()
-1
> root.getLeft().getLeft().getData()
-2
> root.getLeft().getRight().getLeft().getData()
-1
> root.getLeft().getRight().getRight().getData()
1

This assumes that we also have an getData method in the AbsNode class that returns the data (see the final posted code file for this). We see that a user can’t tell that we are using NodePromise underneath, even though we have not created the full tree explicitly (as seen in the definition of root). We also know that force must be working, since we started with only a NodePromise. In addition, we know that we are not actually creating the infinite tree, as our test case terminated without a stack overflow error.

Of course, we might feel more confident that we aren’t constructing the whole tree if we added a method to compute its size. What if we added the following method to AbsNode?

  abstract class AbsNode implements INode {
 
    int data;
 
    // constructor omitted
 
    public int getData() { return this.data; }
 
    public int size() {
      return 1 + getLeft().size() + getRight().size();
    }
  }

If you run this method on our root example, Java eventually reports a “stack overflow” error. This means that the size computation did not terminate. Why not? If you think about our data structure carefully, it has no base case. This seems weird: a recursive data structure with no real base case. Haven’t we always needed a base case (such as leaf) in order to finish building data, or to make functions terminate? Yes. We only get away with building infinite data here because we defer building all the data via the promise.

As for termination, when you follow templates on data with a base case, your functions are guaranteed to terminate. For functions on infinite data, we need to rely on something other than the data definition for termination. For example, we can change the problem to introduce a pseudo-base case (ie, terminate on NodePromise). Or, we can do a computation that has some other argument on which we can terminate; if you expand a tree out by a fixed number of levels, as the lookAhead function will below, then the number of levels gives a way to guarantee termination. Either way, you need to make some argument about termination, rather then relying on the data structure to do that for you.

As a side note: some of you may have learned about induction in high school or other courses. Induction is how we usually prove that functions over recursively-defined data are correct. Induction, however, relies on having a base case, which isn’t true here. If you are interested in the mathematics of infinite data structures, look up co-induction, which handles reasoning on infinite data structures.

To see the “pseudo base-case” idea in action, we can choose to compute the size of the demanded portion of the tree, we can stop counting the size at promised nodes. We can achieve that by overwriting the size computation in the NodePromise class:

  public int size() { return 0; }

What you’re seeing here is a powerful programming concept known as laziness. In lazy computation, we compute only what is needed, extending the data if more cases are required. Laziness is an interesting concept that we cover in depth in the Programming Languages course (for anyone who is interested). For purposes of this class, you should understand that demand-driven computation is sometimes essential, and that you can use methods such as force to achieve it.

Stop and think about force for a moment. It is a rather different function than ones you have seen before. Most functions simply provide names for computations. When we give them arguments, they substitute those arguments into the function body and return a result. In contrast, force exists to delay a computation (namely, the construction of the tree). It doesn’t take arguments. Those of you who came through 101 saw this concept before through lambda. The core idea is important in computer science overall, however: functions do not always abstract over behavior; they also can set up a computation to be performed later.

We mentioned that a common operation on game trees is to look ahead a few steps to help determine which of several moves to take. To wrap up this example, let’s write a lookAhead function that takes an integer indicating how many levels down to expand the tree, and returns a list of the possible nodes at that depth. A real game program would then perform some computation over that list of nodes to select which move to make next.

First, let’s agree on the interface and examples:

  interface INode {
    int getData();
    INode getLeft();
    INode getRight();
    IList<INode> lookAhead (int depth);
    int size();
  }

Calling lookAhead on a node at depth 0 should return a list containing only the node (irregardless of whether that node is a NodePromise). At any depth larger than 0, lookAhead will simply append the results of calling lookAhead on each of its subtrees at depth-1.

Note that nothing in this description differentiates between Node and NodePromise, so we can simply put the lookAhead computation in the AbsNode class:

  public IList<INode> lookAhead(int depth) {
    if (depth == 0)
      return new ConsList<INode>(this, new MtList<INode>());
    else
      return this.getLeft().lookAhead(depth-1).append(this.getRight().lookAhead(depth -1));
  }

This code uses a method append that concatenates the results of two lists. We haven’t written this method on lists before, but could easily add it to our running IList implementation if we so chose.

Actually, this is a good opportunity to introduce you to Java’s built-in classes for lists. Java uses the term LinkedList for the kinds of lists you have created in Racket and Java. If you want to use the LinkedList classes, you first need the following statement at the top of your file:

  import java.util.LinkedList;

The LinkedList package has several minor and one major difference from the lists we’ve created manually. The minor differences arise in the names of operators. We’ll just show you those as we need the operators, or you can look up the Java LinkedList interface with an easy Google search. The major difference is that Java’s operators do not return a new list. Instead, they modify an existing list. We’ll get into the serious implications of this difference next week. For now, it largely means that you have to write your code somewhat differently than before. This is best seen by example. Consider the following lookAhead code as written against the built-in list library:

  public LinkedList<INode> lookAhead(int depth) {
    if (depth == 0) {
      LinkedList<INode> newlist = new LinkedList<INode>();
      newlist.add(this);
      return newlist;
    }
    else {
      LinkedList<INode> leftresult =
        this.getLeft().lookAhead(depth - 1);
      LinkedList<INode> rightresult =
        this.getRight().lookAhead(depth - 1);
      leftresult.addAll(rightresult);
      return leftresult;
    }
  }

First, we see that the class name for a built-in list is LinkedList. Now look inside the case when the depth is zero. The built-in classes do not distinguish between empty and non-empty lists as our classes do. When we ask for a new LinkedList, we simply get back a new list with no elements. To add this as an element to the list, we use the add operator:

  newlist.add(this);

After adding this, we can return newlist.

If you try to write these three lines more compactly as

  return new LinkedList<INode>().add(this);

Java gives you a type error. Our lookAhead function needs to return a list, but the add operator doesn’t return a list. Instead, it returns void, which you can think of as the “return nothing” type. The standard list operations all return either void or boolean (to indicate success or failure of the operation). As a result, you can’t compose list operations as we have done previously. Instead, you create a variable to hold the list(s), write each operation as an independent statement, then return whichever list holds your result at the end. The sequence of statements in the else case of lookAhead give you another example of this stylistic difference.

One way to test lookAhead is to make sure that we are getting the number of nodes we expect from looking ahead at various depths. The following set of test cases shows what we expect here, using Java’s built-in size method on lists to compute our answer:

  class Examples {
    NodePromise root = new NodePromise(0);
 
    boolean testLookAhead0 (Tester t) {
      return t.checkExpect(root.lookAhead(0).size(),1);
    }
 
    boolean testLookAhead1 (Tester t) {
      return t.checkExpect(root.lookAhead(1).size(),2);
    }
 
    boolean testLookAhead3 (Tester t) {
      return t.checkExpect(root.lookAhead(3).size(),8);
    }
  }

For more detailed testing, we would want to check the contents of each node. We will hold off on that for now, as we haven’t yet shown you how to iterate over Java’s built-in lists.

Actually, this example works well for illustrating some other Java-isms that you need to know coming out of this course. The final version of the code, posted to the website, illustrates three additional features/techniques that you need to understand:

  • Access modifiers: Java lets you control access to fields and methods across classes. You’ve already encountered “public” as an annotation for methods, but we’ve never discussed what it means. This code illustrates three different access modifiers: public, private, and protected.

    • Private fields and methods are only visible inside of the class (they are not accessible in subclasses); force in NodePromise is a good example of a private method: no class other than NodePromise should know about or be able to use this method. As a general rule, you should make all fields private (providing methods to access them, as we did with getData) unless you have a good reason to do otherwise. Once we get into mutation, other classes could modify your non-private data, which can become problematic.

    • Protected fields and methods are visible within the class and its subclasses. The data in AbsNode is protected to restrict its direct access to Node and NodePromise. Had it been private, neither subclass could use the expression this.data.

    • Public fields and methods are accessible to any class or object. Methods in interfaces are required to be public (that’s the spirit of being an interface).

This summary really is all you need to know about public/private/protected. We will begin to use these modifiers consistently so that you get comfortable with them and their constraints. There are any number of descriptions of these online if you feel you need more detail.

  • Wrapper classes: It is common to use a separate class to “wrap” an instance of your datastructures. This is useful when you want to give someone either an obscured or persistent reference to your data, while allowing you to modify the data behind the scenes. The GameTree class in the posted code is a wrapper for nodes. It is useful in this case because a user should not need to know that there is a difference between Node and NodePromise (i.e., that we defer the computation of some nodes is really an implementation detail). Instead, we allow a user to create a GameTree, which we in turn initialize with a NodePromise.

  • Multiple Constructors: Notice that there are two constructors in the GameTree class: one takes a value for the rootdata and one initializes the rootdata to 0. This provides flexibility to your user. In a real game, the initial data might be too complex for a user to provide, or the user might customize only part of the initial data. Multiple constructors provide flexibility when different contexts would use your program in different ways.

We’ve covered a lot of ground here. What should you understand from this lecture?

  • That we can support infinite data by defining classes and methods to build more data as it is needed.

  • How to create and augment Java’s built-in lists (iteration will come later).

  • Java’s access modifiers, and when to use each.

  • The idea of wrapper classes and multiple constructors to give users a need-to-know view of your data.


1)
lecture notes adapted and used with permission from Kathi Fisler