Differences

This shows you the differences between two versions of the page.

Link to this comparison view

people:mlsmith:infinite-trees [2020/05/01 14:13] (current)
mlsmith created
Line 1: Line 1:
 +====== Infinite Trees ======
 +
 +((lecture notes adapted and used with permission from Kathi Fisler))
 +
 +===== 1 Motivation =====
 +
 +Up until now, we’ve mostly looked at trees as a data structure, but we haven’t looked at them in the context of any particular problems. Trees are used in many applications in computer science, robotics, and game design. One common application is for representing the possible moves in a game or search problem. Imagine that you were exploring the space that a player (character or robot) could move through. On each move, the player would change position (and possible acquire the ability to do new tasks based on location – we’ll set that aside for now).
 +
 +Imagine that you wanted to run programs to optimize the movement of your player through the space. You would need a representation of the space, a way to determine the new position after each move, and a way to estimate which moves now could yield the best payoff a few moves down the road. Trees are a useful data structure here: each node contains data about the present location (where it is, its features, etc), each type of move yields a new location (aka node). For example, consider a grid-based simple game in which a player can only move between adjacent squares (ie, forward, backward, left, right). This yields a location tree like:
 +<code text>
 +     ------- <4,8>------
 +    /        /        \
 +<4,9>    <4,7>   <3,8>  <5,8>
 +                / / \ \
 +           ----- /   \ -------
 +          /     /          \
 +      <3,9>   <3,7>  <2,8>   <4,8>
 +</code>
 +(Ignore that we have <4,8> twice in this picture – we’ll return to that tomorrow.)
 +
 +The problem with building this tree upfront is that set of possible game/robot configurations (or positions) tends to be large, if not (effectively) infinite. In this case, creating the tree upfront simply isn’t feasible. Instead, we need to create the tree in a demand-driven fashion: we create each node when we need it, remembering just enough information to let us create more nodes as necessary.
 +
 +Our task today is to figure out how to develop infinite trees in Java. To simplify the problem, we will work simply with binary trees, each of which has an integer for its datum. The left child of each node is the node with data value minus 1; the right child is the node with data value plus 1. In other words:
 +<code text>
 +       0
 +      / \
 +    -1   1
 +   /  \  ...
 + -2    0
 +...    ...
 +</code>
 +
 +===== 2 Modeling Infinite Trees =====
 +
 +Let’s start with our previous definition of binary trees and see how well it fits our new problem:
 +<code java>
 +  interface INode {}
 +
 +  class Leaf implements INode {
 +    Leaf(){}
 +  }
 +
 +  class Node implements INode {
 +    int data;
 +    INode left;
 +    INode right;
 +
 +    Node (int data, INode left, INode right) {
 +      super(data);
 +      this.left = left;
 +      this.right = right;
 +    }
 +  }
 +
 +  class Examples {
 +    INode myTree = new Node(4,
 +                            new Node(3, new Leaf(), new Leaf()),
 +                            new Leaf());
 +  }
 +</code>
 +
 +This isn’t a great representation, because a ''Leaf'' denotes the end of the tree. In our desired problem, the tree never ends, it just hasn’t necessarily been built yet. There is also no notion here that a Leaf has data, but not (yet) children. So this shape doesn’t capture the essence of our problem. Instead, we need classes that capture the "unfinished" nature of some nodes. We will omit leaves entirely.
 +
 +Following conventional CS terminology, we will call un-built nodes "promises":
 +<code java>
 +  class NodePromise implements INode {
 +    int data;
 +
 +    NodePromise(int data) {
 +      this.data = data;
 +    }
 +  }
 +</code>
 +
 +Our prior example would be written as following using ''NodePromise'':
 +<code java>
 +  class Examples {
 +    INode myTree = new Node(4,
 +                            new NodePromise(3),
 +                            new NodePromise(5));
 +  }
 +</code>
 +
 +That is, the tree has one completed node (with data 4), and two "pending" nodes (for 3 and 5) that we might create on demand.
 +
 +To demand that a node be created, we "force" the promise of a node to be carried out ("force" is again the conventional CS term here). In particular, our ''NodePromise'' class will have a ''force'' method that builds-out one more node of the tree:
 +<code java>
 +  private INode force() {
 +    return new Node(this.data,
 +                    new NodePromise(this.data - 1),
 +                    new NodePromise(this.data + 1));
 +  }
 +</code>
 +
 +So if we were to run ''myTree.left.force()'', we would get a ''Node'' with ''data'' 3 and ''NodePromise'' children for values 2 and 4.
 +
 +==== 2.1 The Class Hierarchy So Far ====
 +
 +To summarize, we now have the following class hierarchy for infinite trees, cleaned up with an abstract class to capture the data common to the two kinds of ''INode'':
 +<code java>
 +  interface INode {}
 +
 +  abstract class AbsNode implements INode {
 +    int data;
 +
 +    AbsNode(int data) {
 +      this.data = data;
 +    }
 +  }
 +
 +  class Node extends AbsNode {
 +    private INode left;
 +    private INode right;
 +
 +    Node (int data, INode left, INode right) {
 +      super(data);
 +      this.left = left;
 +      this.right = right;
 +    }
 +  }
 +
 +  class NodePromise extends AbsNode {
 +    NodePromise(int data) {
 +      super(data);
 +    }
 +
 +    private INode force() {
 +      return new Node(this.data,
 +                      new NodePromise(this.data - 1),
 +                      new NodePromise(this.data + 1));
 +    }
 +  }
 +</code>
 +
 +===== 3 When to Force Node Creation =====
 +
 +Now that we have force, we need to decide when to use it. We said that we wanted to construct the tree "on demand", meaning that if a user of our code wants to see a node, we should force it to exist. The user should not have to think about whether the node already exists. They should simply be able to ask for a node and get it back; it is up to our code to create that node if needed. Put differently, the user should have the illusion of having an infinite tree. That we don’t create it as such is an implementation detail of no concern to the user.
 +
 +Notice that ''Node'' has fields for ''left'' and ''right'', while ''NodePromise'' does not. This suggests that the ''INode'' interface needs to provide methods ''getLeft()'' and ''getRight()'' that will get the corresponding subtree. For ''Node'' objects, these methods just return the corresponding (existing) node. For ''NodePromise'' objects, these methods force creation of the new nodes:
 +<code java>
 +  interface INode {
 +    INode getLeft();
 +    INode getRight();
 +  }
 +
 +  class Node extends AbsNode {
 +    private INode left;
 +    private INode right;
 +
 +    // constructor omitted
 +
 +    public INode getLeft() { return this.left; }
 +
 +    public INode getRight() { return this.right; }
 +  }
 +
 +  class NodePromise extends AbsNode {
 +
 +    // constructor omitted
 +
 +    private INode force() {
 +      return new Node(this.data,
 +                      new NodePromise(this.data - 1),
 +                      new NodePromise(this.data + 1));
 +    }
 +
 +    public INode getLeft() { return this.force().getLeft(); }
 +    public INode getRight() { return this.force().getRight(); }
 +  }
 +</code>
 +
 +To convince ourselves that this does hide the details from a user, we can experiment with our interface (in DrJava, you can use the interactions window for this).
 +<code text>
 +> NodePromise root = new NodePromise(0);
 +> root.getLeft().getData()
 +-1
 +> root.getLeft().getLeft().getData()
 +-2
 +> root.getLeft().getRight().getLeft().getData()
 +-1
 +> root.getLeft().getRight().getRight().getData()
 +1
 +</code>
 +
 +This assumes that we also have an ''getData'' method in the ''AbsNode'' class that returns the data (see the final posted code file for this). We see that a user can’t tell that we are using ''NodePromise'' underneath, even though we have not created the full tree explicitly (as seen in the definition of root). We also know that ''force'' must be working, since we started with only a ''NodePromise''. In addition, we know that we are not actually creating the infinite tree, as our test case terminated without a stack overflow error.
 +
 +===== 4 Computing the Size =====
 +
 +Of course, we might feel more confident that we aren’t constructing the whole tree if we added a method to compute its size. What if we added the following method to ''AbsNode''?
 +<code java>
 +  abstract class AbsNode implements INode {
 +
 +    int data;
 +
 +    // constructor omitted
 +
 +    public int getData() { return this.data; }
 +
 +    public int size() {
 +      return 1 + getLeft().size() + getRight().size();
 +    }
 +  }
 +</code>
 +
 +If you run this method on our root example, Java eventually reports a "stack overflow" error. This means that the ''size'' computation did not terminate. Why not? If you think about our data structure carefully, it has no base case. This seems weird: a recursive data structure with no real base case. Haven’t we always needed a base case (such as leaf) in order to finish building data, or to make functions terminate? Yes. We only get away with building infinite data here because we defer building all the data via the promise.
 +
 +As for termination, when you follow templates on data with a base case, your functions are guaranteed to terminate. For functions on infinite data, we need to rely on something other than the data definition for termination. For example, we can change the problem to introduce a pseudo-base case (ie, terminate on ''NodePromise''). Or, we can do a computation that has some other argument on which we can terminate; if you expand a tree out by a fixed number of levels, as the ''lookAhead'' function will below, then the number of levels gives a way to guarantee termination. Either way, you need to make some argument about termination, rather then relying on the data structure to do that for you.
 +
 +As a side note: some of you may have learned about induction in high school or other courses. Induction is how we usually prove that functions over recursively-defined data are correct. Induction, however, relies on having a base case, which isn’t true here. If you are interested in the mathematics of infinite data structures, look up co-induction, which handles reasoning on infinite data structures.
 +
 +To see the "pseudo base-case" idea in action, we can choose to compute the size of the //demanded// portion of the tree, we can stop counting the size at promised nodes. We can achieve that by overwriting the ''size'' computation in the ''NodePromise'' class:
 +<code java>
 +  public int size() { return 0; }
 +</code>
 +
 +===== 5 Lazy Computation and Two Kinds of Functions =====
 +
 +What you’re seeing here is a powerful programming concept known as //**laziness**//. In lazy computation, we compute only what is needed, extending the data if more cases are required. Laziness is an interesting concept that we cover in depth in the Programming Languages course (for anyone who is interested). For purposes of this class, you should understand that demand-driven computation is sometimes essential, and that you can use methods such as ''force'' to achieve it.
 +
 +Stop and think about ''force'' for a moment. It is a rather different function than ones you have seen before. Most functions simply provide names for computations. When we give them arguments, they substitute those arguments into the function body and return a result. In contrast, ''force'' exists to //delay// a computation (namely, the construction of the tree). It doesn’t take arguments. Those of you who came through 101 saw this concept before through ''lambda''. The core idea is important in computer science overall, however: functions do not always abstract over behavior; they also can set up a computation to be performed later.
 +
 +===== 6 Looking Ahead in the Tree =====
 +
 +We mentioned that a common operation on game trees is to look ahead a few steps to help determine which of several moves to take. To wrap up this example, let’s write a ''lookAhead'' function that takes an integer indicating how many levels down to expand the tree, and returns a list of the possible nodes at that depth. A real game program would then perform some computation over that list of nodes to select which move to make next.
 +
 +First, let’s agree on the interface and examples:
 +<code java>
 +  interface INode {
 +    int getData();
 +    INode getLeft();
 +    INode getRight();
 +    IList<INode> lookAhead (int depth);
 +    int size();
 +  }
 +</code>
 +
 +Calling ''lookAhead'' on a node at depth 0 should return a list containing only the node (irregardless of whether that node is a ''NodePromise''). At any depth larger than 0, ''lookAhead'' will simply append the results of calling ''lookAhead'' on each of its subtrees at depth-1.
 +
 +Note that nothing in this description differentiates between ''Node'' and ''NodePromise'', so we can simply put the ''lookAhead'' computation in the ''AbsNode'' class:
 +<code java>
 +  public IList<INode> lookAhead(int depth) {
 +    if (depth == 0)
 +      return new ConsList<INode>(this, new MtList<INode>());
 +    else
 +      return this.getLeft().lookAhead(depth-1).append(this.getRight().lookAhead(depth -1));
 +  }
 +</code>
 +
 +This code uses a method ''append'' that concatenates the results of two lists. We haven’t written this method on lists before, but could easily add it to our running ''IList'' implementation if we so chose.
 +
 +==== 6.1 Using Java’s Built-In List Classes ====
 +
 +Actually, this is a good opportunity to introduce you to Java’s built-in classes for lists. Java uses the term ''LinkedList'' for the kinds of lists you have created in Racket and Java. If you want to use the ''LinkedList'' classes, you first need the following statement at the top of your file:
 +<code java>
 +  import java.util.LinkedList;
 +</code>
 +
 +The ''LinkedList'' package has several minor and one major difference from the lists we’ve created manually. The minor differences arise in the names of operators. We’ll just show you those as we need the operators, or you can look up the Java ''LinkedList'' interface with an easy Google search. The major difference is that Java’s operators do not return a new list. Instead, they modify an existing list. We’ll get into the serious implications of this difference next week. For now, it largely means that you have to write your code somewhat differently than before. This is best seen by example. Consider the following ''lookAhead'' code as written against the built-in list library:
 +<code java>
 +  public LinkedList<INode> lookAhead(int depth) {
 +    if (depth == 0) {
 +      LinkedList<INode> newlist = new LinkedList<INode>();
 +      newlist.add(this);
 +      return newlist;
 +    }
 +    else {
 +      LinkedList<INode> leftresult =
 +        this.getLeft().lookAhead(depth - 1);
 +      LinkedList<INode> rightresult =
 +        this.getRight().lookAhead(depth - 1);
 +      leftresult.addAll(rightresult);
 +      return leftresult;
 +    }
 +  }
 +</code>
 +
 +First, we see that the class name for a built-in list is ''LinkedList''. Now look inside the case when the depth is zero. The built-in classes do not distinguish between empty and non-empty lists as our classes do. When we ask for a new ''LinkedList'', we simply get back a new list with no elements. To add this as an element to the list, we use the ''add'' operator:
 +<code java>
 +  newlist.add(this);
 +</code>
 +
 +After adding this, we can return newlist.
 +
 +If you try to write these three lines more compactly as
 +<code java>
 +  return new LinkedList<INode>().add(this);
 +</code>
 +Java gives you a type error. Our ''lookAhead'' function needs to return a list, but the ''add'' operator doesn’t return a list. Instead, it returns ''void'', which you can think of as the "return nothing" type. The standard list operations all return either ''void'' or boolean (to indicate success or failure of the operation). As a result, you can’t compose list operations as we have done previously. Instead, you create a variable to hold the list(s), write each operation as an independent statement, then return whichever list holds your result at the end. The sequence of statements in the ''else'' case of ''lookAhead'' give you another example of this stylistic difference.
 +
 +==== 6.2 Testing lookAhead ====
 +
 +One way to test ''lookAhead'' is to make sure that we are getting the number of nodes we expect from looking ahead at various depths. The following set of test cases shows what we expect here, using Java’s built-in ''size'' method on lists to compute our answer:
 +<code java>
 +  class Examples {
 +    NodePromise root = new NodePromise(0);
 +
 +    boolean testLookAhead0 (Tester t) {
 +      return t.checkExpect(root.lookAhead(0).size(),1);
 +    }
 +
 +    boolean testLookAhead1 (Tester t) {
 +      return t.checkExpect(root.lookAhead(1).size(),2);
 +    }
 +
 +    boolean testLookAhead3 (Tester t) {
 +      return t.checkExpect(root.lookAhead(3).size(),8);
 +    }
 +  }
 +</code>
 +For more detailed testing, we would want to check the contents of each node. We will hold off on that for now, as we haven’t yet shown you how to iterate over Java’s built-in lists.
 +
 +===== 7 And Some More Java Lessons... =====
 +
 +Actually, this example works well for illustrating some other Java-isms that you need to know coming out of this course. The final version of the code, posted to the website, illustrates three additional features/techniques that you need to understand:
 +
 +  * Access modifiers: Java lets you control access to fields and methods across classes. You’ve already encountered "public" as an annotation for methods, but we’ve never discussed what it means. This code illustrates three different access modifiers: public, private, and protected. \\ \\ 
 +    * //Private// fields and methods are only visible inside of the class (they are not accessible in subclasses); ''force'' in ''NodePromise'' is a good example of a private method: no class other than ''NodePromise'' should know about or be able to use this method. As a general rule, you should make all fields private (providing methods to access them, as we did with ''getData'') unless you have a good reason to do otherwise. Once we get into mutation, other classes could modify your non-private data, which can become problematic. \\ \\ 
 +    * //Protected// fields and methods are visible within the class and its subclasses. The data in AbsNode is protected to restrict its direct access to Node and NodePromise. Had it been private, neither subclass could use the expression this.data. \\ \\ 
 +    * //Public// fields and methods are accessible to any class or object. Methods in interfaces are required to be public (that’s the spirit of being an interface). \\ \\ 
 +      
 +This summary really is all you need to know about public/private/protected. We will begin to use these modifiers consistently so that you get comfortable with them and their constraints. There are any number of descriptions of these online if you feel you need more detail.
 +
 +  * Wrapper classes: It is common to use a separate class to "wrap" an instance of your datastructures. This is useful when you want to give someone either an obscured or persistent reference to your data, while allowing you to modify the data behind the scenes. The ''GameTree'' class in the posted code is a wrapper for nodes. It is useful in this case because a user should not need to know that there is a difference between ''Node'' and ''NodePromise'' (i.e., that we defer the computation of some nodes is really an implementation detail). Instead, we allow a user to create a ''GameTree'', which we in turn initialize with a ''NodePromise''. \\ \\ 
 +  * Multiple Constructors: Notice that there are two constructors in the ''GameTree'' class: one takes a value for the rootdata and one initializes the rootdata to 0. This provides flexibility to your user. In a real game, the initial data might be too complex for a user to provide, or the user might customize only part of the initial data. Multiple constructors provide flexibility when different contexts would use your program in different ways.
 +
 +===== 8 Summary =====
 +
 +We’ve covered a lot of ground here. What should you understand from this lecture?
 +  * That we can support infinite data by defining classes and methods to build more data as it is needed. \\ \\ 
 +  * How to create and augment Java’s built-in lists (iteration will come later). \\ \\ 
 +  * Java’s access modifiers, and when to use each. \\ \\ 
 +  * The idea of wrapper classes and multiple constructors to give users a need-to-know view of your data. \\ \\