3.1 Different Kinds of Tests

3.1.1 Simple Tests: T for "Test"

The most common kind of test is the simple test, an example of which is given above. It is of the form

(*$T <header>
  <statement>
  ...
*)

where each statement must be a boolean OCaml expression involving the function (or functions, as we will see when we study headers) referenced in the header. The overall test is considered successful if each statement evaluates to true. Note that the "close comment" *) must appear on a line of its own.

Tip: if a statement is a bit too long to fit on one line, it can be broken using a backslash (\), immediately followed by the line break. This also applies to randomised tests.
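As a minimal sketch, assume a hypothetical module that defines foo as a left fold taking the initial value first (this definition is an illustration and not necessarily the one used elsewhere in this manual). A simple test with one continued statement could then look as follows. Since the pragma is an ordinary OCaml comment, the file compiles unchanged when qtest is not involved.

```ocaml
(* Hypothetical function under test: a left fold taking the initial value first. *)
let rec foo x0 f = function
  | [] -> x0
  | x :: xs -> foo (f x0 x) f xs

(*$T foo
  foo 0 ( + ) [1; 2; 3] = 6
  foo 1 ( * ) [4; 5] = 20
  foo 0 ( + ) [] = 0 && \
    foo 12 ( + ) [] = 12
*)
```

The last statement spans two lines thanks to the trailing backslash, and the closing *) sits on a line of its own.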

3.1.2 Equality Tests: =

The vast majority of test cases tend to involve the equality of two expressions; using simple tests, one would write something like:

(*$T foo
  foo 1 ( * ) [4;5] = foo 3 ( * ) [1;5;2]
*)

While this certainly works, the failure report for such a test does not convey any useful information besides the simple fact that the test failed. Wouldn’t it be nice if the report also mentioned the values of the left-hand side and the right-hand side? Yes it would, and specialised equality tests provide such functionality, at the cost of a little bit of boilerplate code. The bare syntax is:

(*$= <header>
  <lhs> <rhs>
  ...
*)

However, used bare, an equality test will not provide much more information than a simple test: just a laconic “not equal”. In order for the values to be printed, a “value printer” must be specified for the test. A printer is a function of type α → string, where α is the type of the expressions on both sides of the equality. To pass the printer to the test, we use parameter injection (cf. Section 4.2.5); equality tests have an optional argument printer for this purpose. In our example, we have α = int, so the test becomes simply:

(*$= foo & ~printer:string_of_int
  (foo 1 ( * ) [4;5]) (foo 3 ( * ) [1;5;2])
*)

The failure report will now be more explicit, saying expected: 20 but got: 30.
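For types that have no ready-made printer, one can write one by hand. The following is a minimal sketch with α = int list; both string_of_int_list and double_all are hypothetical names introduced for illustration only.

```ocaml
(* Hypothetical printer of type int list -> string, used only in failure reports. *)
let string_of_int_list l =
  "[" ^ String.concat "; " (List.map string_of_int l) ^ "]"

(* Hypothetical function under test. *)
let double_all l = List.map (fun x -> 2 * x) l

(*$= double_all & ~printer:string_of_int_list
  [2; 4; 6] (double_all [1; 2; 3])
  [] (double_all [])
*)
```

Any function of type α → string will do; the printer is only called to render the two values when a statement fails.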

3.1.3 Randomized Tests: Q for "Quickcheck"

Quickcheck is a small library useful for randomized unit tests. Using it is a bit more complex, but much more rewarding than simple tests.

(*$Q <header>
  <generator> (fun <generated value> -> <statement>)
  ...
*)

Let us dive straight into an example:

(*$Q foo
  Q.small_int (fun i-> foo i (+) [1;2;3] = List.fold_left (+) i [1;2;3])
*)

The Quickcheck module is accessible simply as Q within inline tests; small_int is a generator, yielding a random, small integer. When the test is run, each statement will be evaluated for a large number of random values – 100 by default. Running this test for the above definition of foo catches the mistake easily:

law foo.ml:14::>  Q.small_int (fun i-> foo i (+) [1;2;3]
    = List.fold_left (+) i [1;2;3])
failed for 2

Note that the random value for which the test failed is provided by the error message – here it is 2. It is also possible to generate several random values simultaneously using tuples. For instance

(Q.pair Q.small_int (Q.list Q.small_int)) \
  (fun (i,l)-> foo i (+) l = List.fold_left (+) i l)

will generate both an integer and a list of small integers randomly. A failure will then look like

law foo.ml:15::>  (Q.pair Q.small_int (Q.list Q.small_int))
    (fun (i,l)-> foo i (+) l = List.fold_left (+) i l)
failed for (727, [4; 3; 6; 1; 788; 49])
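To make the behaviour described above concrete, here is a conceptual sketch in plain OCaml of what a randomized test does with one statement. It is not the actual code of qtest or Quickcheck: first_failure, the stand-in generator small_int and the buggy definition of foo are all assumptions made for the sake of illustration.

```ocaml
(* Conceptual sketch only: this is NOT the actual code of qtest or Quickcheck. *)

(* A generator is modelled here as a function producing a fresh value. *)
let small_int () = Random.int 100  (* hypothetical stand-in for Q.small_int *)

(* Evaluate [prop] on up to [count] generated values; return the first
   value for which it is false, or None if every evaluation succeeds. *)
let first_failure ?(count = 100) gen prop =
  let rec loop n =
    if n = 0 then None
    else
      let v = gen () in
      if prop v then loop (n - 1) else Some v
  in
  loop count

(* Hypothetical buggy fold: the empty case returns 0 instead of x0. *)
let rec foo x0 f = function
  | [] -> 0
  | x :: xs -> f x (foo x0 f xs)

(* The statement of the Q test, written as an ordinary predicate. *)
let prop i = foo i ( + ) [1; 2; 3] = List.fold_left ( + ) i [1; 2; 3]

let () =
  match first_failure small_int prop with
  | Some i -> Printf.printf "failed for %d\n" i
  | None -> print_endline "passed"
```

With this particular foo, the predicate only holds for i = 0, so almost any generated value serves as a counterexample.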

Available Generators:

Simple generators:
unit, bool, float, pos_float, neg_float, int, pos_int, small_int, neg_int, char, printable_char, numeral_char, string, printable_string, numeral_string
Structure generators:
list and array. They take one generator as their argument. For instance (Q.list Q.neg_int) is a generator of lists of (uniformly taken) negative integers.
Tuple generators:
pair and triple are respectively binary and ternary. See above for an example of pair.
Size-directed generators:
string, numeral_string, printable_string, list and array all have *_of_size variants that take the size of the structure as their first argument.
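As an illustration of how these generators combine, here is a short sketch that uses only generators named in the list above (bool, list and pair). The function count_true is a hypothetical example and not part of any library.

```ocaml
(* Hypothetical function under test: number of true elements in a list. *)
let count_true l = List.length (List.filter (fun b -> b) l)

(*$Q count_true
  (Q.list Q.bool) (fun l -> count_true l <= List.length l)
  (Q.pair (Q.list Q.bool) (Q.list Q.bool)) \
    (fun (a, b) -> count_true (a @ b) = count_true a + count_true b)
*)
```

The first statement draws a single list of booleans; the second draws a pair of such lists and destructures it in the function argument.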

Tips:

Duplicate Elements in Lists: When generating lists, avoid Q.list Q.int unless you have a good reason to do so. The reason is that, given the size of the Q.int space, you are unlikely to generate any duplicate elements. If you wish to test your function’s behaviour with duplicates, prefer Q.list Q.small_int.
Changing Number of Tests: If you want a specific test to execute each of its statements a specific number of times (deviating from the default of 100), you can specify it explicitly through parameter injection (cf. Section 4.2.5) using the count : int argument.
Getting a Better Counterexample: By default, a random test stops as soon as one of its generated values yields a failure. This first failure value is probably not the best possible counterexample. You can force qtest to generate and test all count random values regardless, and to display the value which is smallest with respect to a measure which you define. To this end, it suffices to use parameter injection to pass the argument small : α → β, where α is the type of generated values and β is any totally ordered set (with respect to <). Typically you will take β = int or β = float. Example:
```
let fuz x = x
let rec flu = function
  | [] -> []
  | x :: l -> if List.mem x l then flu l else x :: flu l

(*$Q fuz; flu & ~small:List.length
  (Q.list Q.small_int) (fun x -> fuz x = flu x)
*)
```
The meaning of small:List.length is therefore simply: “choose the shortest list”. For very complicated cases, you can simultaneously increase count to yield an even higher-quality counterexample.
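The two parameters discussed in these tips can be sketched as follows. The pragma shows count being injected with the same syntax as the ~printer example in Section 3.1.2. The function smallest_counterexample is a hypothetical, conceptual model of what small asks for; it is an assumption about the behaviour described above, not the actual qtest code.

```ocaml
let fuz x = x
let rec flu = function
  | [] -> []
  | x :: l -> if List.mem x l then flu l else x :: flu l

(* Parameter injection of count: run the statement 500 times instead of 100. *)
(*$Q fuz; flu & ~count:500
  (Q.list Q.small_int) (fun x -> fuz x = flu x)
*)

(* Conceptual sketch only, not the actual qtest code: test all [count]
   generated values and keep the failing one that minimises [small]. *)
let smallest_counterexample ~count ~small gen prop =
  let best = ref None in
  for _i = 1 to count do
    let v = gen () in
    if not (prop v) then
      match !best with
      | Some b when small b <= small v -> ()  (* current best is at least as small *)
      | _ -> best := Some v
  done;
  !best
```

In this model, a larger count gives more candidate failures to choose from, which is why increasing it tends to yield a smaller counterexample.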

3.1.4 Raw oUnit Tests: R for "Raw"

When more specialised test pragmas are too restrictive, for instance if the test is too complex to reasonably fit on one line, then one can use raw oUnit tests.

(*$R <header>
  <raw oUnit test>...
  ...
*)

Here is a small example, with two tests strung together:

(*$R foo
  let thing = foo  1 ( * )
  and li = [4;5] in
  assert_bool "something_witty" (thing li = 20);
  assert_bool "something_wittier" (foo 12 ( + ) [] = 12)
*)

Note that if the first assertion fails, the second will not be executed; in this respect, chaining two assertions in a raw test differs from writing them as separate statements under a T pragma, for instance.

That said, raw tests should only be used as a last resort; for instance you don’t automatically get the source file and line number when the test fails. If T and Q do not satisfy your needs, then it is probably a hint that the test is a bit complex and, maybe, belongs in a separate test suite rather than in the middle of the source code.
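A case where a raw test may be justified is a table of inputs checked in a loop. The sketch below assumes that oUnit's assert_equal (with its optional ~printer argument) is in scope inside raw tests in the same way as assert_bool; the definition of foo is hypothetical.

```ocaml
(* Hypothetical function under test: a left fold taking the initial value first. *)
let rec foo x0 f = function
  | [] -> x0
  | x :: xs -> foo (f x0 x) f xs

(* Assumes assert_equal from oUnit is in scope, like assert_bool. *)
(*$R foo
  let cases = [ ([4; 5], 20); ([], 1); ([2; 3; 4], 24) ] in
  List.iter
    (fun (li, expected) ->
      assert_equal ~printer:string_of_int expected (foo 1 ( * ) li))
    cases
*)
```

As with any raw test, the loop stops at the first failing case.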

3.1.5 Exception-Throwing Tests: E for "Exception"

... not implemented yet...

The current usage is to use (*$T and the following pattern for function foo and exception Bar:

try ignore (foo x); false with Bar -> true

If your project uses Batteries and no pattern-matching is needed, then you can also use the following, sexier pattern:

Result.(catch foo x |> is_exn Bar)
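For reference, here is the first pattern spelled out in full, as a minimal sketch. The exception Bar, the function foo and the helper raises_bar are hypothetical definitions chosen for illustration.

```ocaml
exception Bar

(* Hypothetical function under test: raises Bar on negative input. *)
let foo x = if x < 0 then raise Bar else 2 * x

(*$T foo
  try ignore (foo (-1)); false with Bar -> true
  foo 2 = 4
*)

(* The same pattern as a reusable plain OCaml helper (hypothetical name). *)
let raises_bar f x = try ignore (f x); false with Bar -> true
```

The expression evaluates to false when foo returns normally and to true only when Bar is raised; any other exception is not caught by the pattern and propagates.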