Tree Types and Pattern Matching in Memphis C/C++

Domain Declarations and Match Statements

Memphis
Examples
Manuals
Distribution

Memphis is a C/C++ extension for compiler writers and other programmers having to manipulate symbolic data.

Memphis provides a new kind of type declarations to describe such data via recursive definitions in a grammatical style. It also introduces a new statement to process such structures via rules that are selected by pattern matching. These concepts are known from functional languages and modern compiler construction tools.

The new concepts are seamlessly integrated with the C/C++ programming language. One may use the new features without giving up anything of the power and flexibility of the host language. Memphis is implemented by a precompiler.

Here a is complete Memphis program; it uses a domain declaration to introduce a new data type Tree, and a match statement to process Tree values:


   // Define a data type Tree.
   // A Tree is either
   //    a node with an int field val and two subtrees left and right   
   //    denoted node(val, left, right)
   // or
   //    an empty tree
   //    denoted empty()

   domain Tree {
      node(int val, Tree left, Tree right)
      empty()
   }

   // Define a function ExampleTree that constructs a Tree
   //
   //    node
   //    |
   //    +- 11
   //    |
   //    +- empty
   //    |
   //    +- node
   //       |
   //       +- 22
   //       |
   //       +- empty
   //       |
   //       +- empty

   Tree ExampleTree ()
   {
      return node(11, empty(), node(22, empty(), empty()));
   }

   // Define a function Sum that computes the sum
   // of the values of a all nodes of a given Tree t

   int Sum (Tree t)
   {
      // inspect the structure of t

      match t {

	 rule node(v, l, r) :

	    // if t has the form node(v, l, r)
	    // the result is obtained by adding v to the
	    // recursively computed sums of l and r

	    return v + Sum(l) + Sum(r);

	 rule empty() :

	    // if t is the empty tree
	    // the result is 0

	    return 0;

      }
   }

   // The main function

   extern "C" printf(...);

   main ()
   {
      printf ("Sum of ExampleTree is %d\n", Sum(ExampleTree()));
   }

A Tree is given by a node that has an integer attribute val and two subtrees left and right. A Tree may also be the empty tree.

Here is graphical representation of a Tree that has a val field of 22 and two empty subtrees:

   node
   |
   +- 22
   |
   +- empty
   |
   +- empty

The data type Tree is given by a domain declaration as follows.

   domain Tree {
      node(int val, Tree left, Tree right)
      empty()
   }

A domain declaration specifies a type by enumerating possible alternatives how to denote values of that type. According to the above declaration, values of type Tree have two alternative forms:

   node(v, l, r)

or

   empty()

An alternative specification gives a name for the alternative and lists the types and names of its fields. The alternative

   node(int val, Tree left, Tree right)

from the above declaration defines values of kind node, they have an int field named val and two Tree fields named left and right.

Because 22 is an integer and empty() stands for the empty tree,

   node(22, empty(), empty())

stands for the Tree depicted above.

This notation can be used in an expression. The function ExampleTree returns a slightly more complex Tree:

   Tree ExampleTree ()
   {
      return node(11, empty(), node(22, empty(), empty()));
   }

The value returned may be graphically represented as

   node
   |
   +- 11
   |
   +- empty
   |
   +- node
      |
      +- 22
      |
      +- empty
      |
      +- empty

We now define a function Sum that computes the sum of the values of a all nodes of a given Tree t. This will be done by recursively following the structure of a Tree.

If t has the form

   node(v, l, r)

the result is obtained by adding v to the recursively computed sums of l and r.

If t has the form

   empty()

the result is 0.

For example

   Sum( node(22, empty(), empty()) )

is 22 + 0 + 0, i.e. 22.

The body of the function is given as a match statement.

   int Sum (Tree t)
   {
      match t {
         rule node(v, l, r) :
            return v + Sum(l) + Sum(r);
         rule empty() :
            return 0;
      }
   }

A match statement names an item the structure of which has to be inspected.

   match t {
      ...
   }

means: inspect the structure of t.

It then lists in braces a number of rules from which one is selected according to the structure of the value. The rules

   rule node(v, l, r) :
      return v + Sum(l) + Sum(r);
   rule empty() :
      return 0;

cover the two cases.

A rule gives a pattern for a value and a list of statement that are executed when the given value matches that pattern.

Let t be the tree

   node(22, empty(), empty())

and consider the rule

   rule node(v, l, r) :
      return v + Sum(l) + Sum(r);

Matching the value

   node(22, empty(), empty())

against the pattern

   node(v, l, r)

succeeds. It also defines the variables v, l, and r as 22, empty(), and empty(), respectively. These variables are implicitly declared as local to the rule.

With these values the rule body

   return v + Sum(l) + Sum(r);

is executed. It returns the value 22 + 0 + 0, i.e. 22.

The second rule would not have been applicable, since

   node(22, empty(), empty())

does not match the pattern

   empty()

Memphis allows nested patterns of arbitrary depth. However, the two rules in our example use simple pattern that correspond to the two alternatives of the domain declaration. This style is very common: The structure of algorithm mirrors the structure of data.