KeY-TechTips No. 2, December 17, 2001
-------------------------------------

The KeY-TechTips are intended for those who write code for the KeY system. They appear every two or three weeks and cover issues of how to write code for the KeY system in a comprehensive and example-oriented way, building up documentation step by step.

You are welcome to write an article for the KeY-TechTips. Mail to Richard Bubel (bubel@mail.informatik.kit.edu). There will soon be a directory of all KeY-TechTips on the KeY "internal documents" website. (If you have not received the first issue of the KeY-TechTips but want to read it, write to bubel@mail.informatik.kit.edu.)

For questions or comments concerning these tips write to Andreas Roth (aroth@mail.informatik.kit.edu).

This second issue of the KeY-TechTips covers how to test code written for the KeY system using automated JUnit tests.

--------------------------------------------------------------------------------

TESTING KeY CODE
----------------

What are automated tests?
-------------------------

Looking into some of the packages of the KeY source code, like "de.uka.ilkd.key.logicdata", you will notice several classes whose names start with "Test...", like the class "de.uka.ilkd.key.logicdata.TestNamespace". On closer inspection, all of these classes follow the same scheme:

    package de.uka.ilkd.key....;

    import junit.framework.*;

    public class Test... extends TestCase {

        public Test...(String name) {
            super(name);
        }

        public void setUp() { ... }

        public void test...() { ... }
        ...
        public void test...() { ... }
    }

All these Test... classes are automated tests using the JUnit framework. For a detailed account of JUnit have a look at "https://junit.org" or the literature listed there. These KeY-TechTips give a summary and describe how tests are used in the KeY project and what experience has been gained after almost two years of working with JUnit.

As a first example, let's imagine the following test class:

    package de.uka.ilkd.key.magic;

    import junit.framework.*;

    public class TestMagicCalculator extends TestCase {

        private MagicCalculator mc;

        public TestMagicCalculator(String name) {
            super(name);
        }

        public void setUp() {
            mc = new MagicCalculator();
        }

        public void testAdd() {
            int result = mc.add(3, 4);
            assert("Something wrong in add method of MagicCalculator",
                   result == 7);
        }

        public void testDiv() {
            int result = mc.div(42, 6);
            assert(result == 7);
        }
    }

Additionally, we have a MagicCalculator class that implements standard arithmetic operations like integer addition and division. With our test class TestMagicCalculator we can test some of the functionality of MagicCalculator.

Tests are invoked by calling the Java program "junit.textui.TestRunner", or the even nicer "junit.swingui.TestRunner". JUnit's jar file has to be part of the classpath. If you have installed the KeY source code version and call Java with the script "runJava" in the "system" directory, you don't have to care about any classpath variable. By giving the fully qualified class name you tell JUnit which test to run. In our case we have to enter:

    > runJava junit.textui.TestRunner de.uka.ilkd.key.magic.TestMagicCalculator

Internally, JUnit collects all public methods in "TestMagicCalculator" whose names start with "test" and that have no parameters - in our case the methods "testAdd()" and "testDiv()" - and runs each of them on an instance of "TestMagicCalculator". Before each test method is executed, "setUp()" is invoked, i.e., in our case an instance of "MagicCalculator" is created.
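To get a feeling for what happens behind the scenes, here is a simplified sketch of such a runner loop. This is NOT JUnit's actual source code, and the class name "SketchRunner" is made up; it merely illustrates the reflection-based discovery of test methods described above:

    import java.lang.reflect.Method;
    import de.uka.ilkd.key.magic.TestMagicCalculator;

    // Simplified, hypothetical sketch of a test runner; the real JUnit
    // runners additionally record failures/errors instead of aborting
    // at the first exception.
    public class SketchRunner {
        public static void main(String[] args) throws Exception {
            Method[] methods = TestMagicCalculator.class.getMethods();
            for (int i = 0; i < methods.length; i++) {
                Method m = methods[i];
                if (m.getName().startsWith("test")
                        && m.getParameterTypes().length == 0) {
                    // a fresh test instance and fixture per test method
                    TestMagicCalculator t =
                        new TestMagicCalculator(m.getName());
                    t.setUp();
                    m.invoke(t, new Object[0]);
                }
            }
        }
    }

Note that "setUp()" is called before EACH test method, so the test methods cannot interfere with each other via the fixture.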
When a call to the "assert" method (defined in the superclass "TestCase") is reached, the boolean value of the argument is checked, and a special exception is thrown if it evaluates to false. Suppose "MagicCalculator"'s "div()" method had a bug and returned 8 instead of 7 in the above case; JUnit would then state that a failure has occurred and deliver more information about it. If a string is provided to "assert" (as in "testAdd()"), JUnit also prints that message in case of failure. If no failure occurs and no other exception is thrown during the execution of the tests, JUnit reports success.

JUnit distinguishes between failures and errors. A failure occurs if an assertion given by "assert" (or "assertEquals", see the JUnit docs for that) is not met. If another exception is thrown while executing the involved classes, this is an error, and you see the whole exception stack trace printed.
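For example, these two hypothetical methods (they are not part of the KeY code; imagine them added to "TestMagicCalculator") would provoke one failure and one error:

    public void testShowFailure() {
        // the assertion does not hold, so JUnit reports a FAILURE
        assert("this assertion cannot hold", mc.add(2, 2) == 5);
    }

    public void testShowError() {
        // an exception other than an assertion failure escapes the
        // test method, so JUnit reports an ERROR and prints the
        // exception's stack trace
        MagicCalculator broken = null;
        broken.add(1, 1);   // throws a NullPointerException
    }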
Automated tests are reassuring, aren't they?
--------------------------------------------

From the moment JUnit lets all our tests pass, we can be happy, right? Well, it depends on the choice of the test cases! With "TestMagicCalculator" above we should not be too happy: we now know that 42/6 is calculated correctly, but important cases are missing. What happens if the two numbers are not divisible, or if the divisor is 0? It is obvious that not all cases can be covered by tests, but it's a good idea to choose representatives from each class of cases: in our example, one pair that is divisible, one that is not, one for which no division is defined, etc. It is extremely important to cover the special cases; they are the source of most errors, so identify them and write tests for them!

Often it is also important to check that something does NOT work, e.g., that an exception is thrown upon a method call. Suppose a division by zero causes the "div(...)" method to throw an ArithmeticException. Then you could test this behavior by adding this test method:

    public void testDivBy0() {
        boolean exception = false;
        try {
            mc.div(42, 0);
        } catch (ArithmeticException e) {
            exception = true;
        }
        assert("There should be an exception when dividing by zero.",
               exception);
    }

Actually, there should be tests for EVERY little task that has been completely performed. In the world of Extreme Programming these tasks are called stories. Tasks in the KeY project that are covered by tests include the matching algorithm for STSRs (in TestMatchSTSR), the application of STSRs (TestApplySTSR), the parsers (e.g. TestTermParser) and many, many more. To be honest, many more tests could and should have been written, but writing tests can be time-consuming. There are also tasks that do not need to be covered by tests (testing a class that merely consists of "get" and "set" methods is really unnecessary!). Anyway, in general it's worth it! Many bugs have been discovered by running automated tests. Whenever we changed something because we wanted to add new functionality to KeY or to refactor a portion of code, and ran all our automated tests afterwards, they made us aware of errors we would otherwise not have noticed - immediately. Having this immediate and fast feedback is important, because the bugs must have their source within the changed code. Discovering an error by chance two weeks after the change triggers a much wider search for its cause! As the KeY project grows and more and more people work on parts of the KeY system's code, this problem becomes even more critical.

In order to get immediate feedback from ALL the automated tests in the KeY system, there is a single script "runTests" in the "system" directory. AFTER you have changed something anywhere and BEFORE you check it into version control, you SHOULD CALL "runTests"! And you should only check in onto the main PRCS path if all tests pass. This helps to ensure that nobody has to debug code he or she has not written, and that there is always a working current version. Thus, simply call:

    > runTests

"runTests" calls the class "TestKey" in the system directory. This is a so-called JUnit test suite and looks like this:

    import junit.framework.*;

    public class TestKey extends TestCase {

        static Class[] testClasses = new Class[] {
            de.uka.ilkd.key.collection.TestSetAsListOfString.class,
            //... many more test classes here
            de.uka.ilkd.key.logicdata.TestCollisionResolving.class
        };

        public static Test suite() {
            TestSuite suite = new TestSuite();
            for (int i = 0; i < testClasses.length; i++) {
                suite.addTest(new TestSuite(testClasses[i]));
            }
            return suite;
        }

        public TestKey(String name) {
            super(name);
        }
    }

When you run "runTests", resp. this class, ALL the test cases in the array "testClasses" are executed by JUnit. So, adding any newly created test case class (like "TestMagicCalculator") to that array ensures that "runTests" also considers the new tests and (if all users run "runTests" before they check in) that every change to the code respects the tests you wrote.
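For instance, registering the (made-up) "TestMagicCalculator" from above would just mean extending the array:

    static Class[] testClasses = new Class[] {
        de.uka.ilkd.key.collection.TestSetAsListOfString.class,
        //... many more test classes here
        de.uka.ilkd.key.logicdata.TestCollisionResolving.class,
        // the newly registered test class:
        de.uka.ilkd.key.magic.TestMagicCalculator.class
    };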
Testing terms in KeY
--------------------

Before we leave the "TestMagicCalculator" example, let's look at how a test... method is built up in general. The observation is that, when the functionality of a method is tested, we generally compare two results against each other: first, the result that the method delivers, and second, the result we want the method to deliver. In the TestMagicCalculator.testDiv() method, the example for the first is the method call "mc.div(42,6)"; the example for the second is the literal "7". It is really easy to get the first, because it just needs a method call, but it is IN GENERAL hard to get the second. That is not the case here, because the subjects are mere ints, but the more complicated the structure of a method call's result, the more complicated it is to create the object of comparison. To illustrate this, and how the arising problems are handled in KeY, let's move on to another (fictitious) sample test case fragment:

    package de.uka.ilkd.key.magic;

    import junit.framework.*;
    import de.uka.ilkd.key.logicdata.*;

    public class TestMagicTermTransformation extends TestCase {

        private MagicTermTransformation mtt;
        private TermFactory termFactory;

        public TestMagicTermTransformation(String name) {
            super(name);
        }

        public void setUp() {
            mtt = new MagicTermTransformation();
            termFactory = TermFactory.DEFAULT;
        }

        public void testTrans1() {
            Function c = new Function(new Name("c"), Sort.FORMULA,
                                      new Sort[0]);
            Function d = new Function(new Name("d"), Sort.FORMULA,
                                      new Sort[0]);
            Term origTerm =
                termFactory.createJunctorTerm(Op.AND,
                    termFactory.createFunctionTerm(c),
                    termFactory.createFunctionTerm(d));
            Term result = mtt.doTransformation1(origTerm);
            Term check =
                termFactory.createJunctorTerm(Op.OR,
                    termFactory.createJunctorTerm(Op.IMP,
                        termFactory.createJunctorTerm(Op.AND,
                            //... ???
            assert("Transformation 1 failed.", result.equals(check));
        }
    }

The test case tests a class ("MagicTermTransformation") that somehow transforms DL-terms. Let's just assume that the results of such a transformation are really complicated and long.

What is tested in the test case's method testTrans1 is that the result of the transformation on the formula "c & d", with constant predicates c and d, equals some other (awfully long) term, called "check". To make the test work, "check" has to be created. That is no problem in principle, but it needs a lot of calls to the TermFactory (which will be covered in one of the next KeY-TechTips), building terms bottom-up. Having done this several times, you won't be able to recognize the meaning of the terms unless you have magic documenting capabilities.

To work around this recurring problem of quickly creating terms in order to have them available as "check"s, KeY provides a class "STSRForTests" in the package "de.uka.ilkd.key.logicdata". Before it can be used to produce terms, a file with the needed sorts, predicates, functions (and maybe STSRs) must be put somewhere (e.g. into the directory of the test case). Such a file can look like this (have a look at the key-files in "/system/proofExamples/" for more examples):

    sorts {
        s0;
        s1;
    }

    predicates {
        p(s0, s0);
        c;
        d;
    }

    functions {
        s0 f(s0);
    }

    false

Having called this file "testMagicTermTransformation.key", the test case can be formulated as:

    package de.uka.ilkd.key.magic;

    import junit.framework.*;
    import de.uka.ilkd.key.logicdata.*;

    public class TestMagicTermTransformation extends TestCase {

        private MagicTermTransformation mtt;

        public TestMagicTermTransformation(String name) {
            super(name);
        }

        public void setUp() {
            mtt = new MagicTermTransformation();
            STSRForTests.parse("de/uka/ilkd/key/magic"
                               + "/testMagicTermTransformation.key");
        }

        public void testTrans1() {
            Term origTerm = STSRForTests.parseTerm("c & d");
            Term result = mtt.doTransformation1(origTerm);
            Term check =
                STSRForTests.parseTerm("(((c | d) -> (c | d)) & d)");
                // or whatever term
            assert("Transformation 1 failed.", result.equals(check));
        }
    }

As can easily be seen, this way of creating terms for tests is much more efficient and readable. Moreover, the predicates c and d are guaranteed to be the same.

When should I write tests?
--------------------------

It is important that tests cover sufficiently small and (if possible) isolated tasks, i.e., the number of involved classes should be very small. If a test fails, you will be happy not to have to search 20 classes in a disgusting debugging effort; 3 or 4 classes are better. However, sometimes this goal is hard to achieve, and it is better to write such a "wide" test than to write none. As an example take the test "TestSTSRIndex" in package "de.uka.ilkd.key.proof". What is intended to be tested is that the index "STSRIndex" stores and retrieves STSRs correctly. The access methods of "STSRIndex", however, use methods that comprise the matching algorithm for STSRs. If errors occur in the matching algorithm, the tests for "STSRIndex" might fail, too, and finding the bug becomes more difficult. The "STSRIndex" had to be tested, so it is OK...

Tests can of course be written at any time, but it is much better to write them as early as possible. This is advantageous because you know much better what the critical cases are shortly after you wrote the code. If you want to build on the just-written code, you will also appreciate being sure that the basics work, and you avoid a larger debugging phase. Some people say that it's better to write tests even BEFORE you start to code the task. Try it...
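If you want to try the test-first style on the running example: a purely hypothetical sketch could start with a test for a method "mod(int, int)" that does not exist yet in MagicCalculator, forcing us to fix the critical cases before coding:

    // Hypothetical test-first sketch: MagicCalculator.mod() is not
    // written yet; this test fails (or does not even compile) until
    // the method is implemented correctly.
    public void testMod() {
        assert("7 mod 3 should be 1", mc.mod(7, 3) == 1);
        assert("6 mod 3 should be 0", mc.mod(6, 3) == 0);
        // and don't forget the special case: what should mod(7, 0) do?
    }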