Thursday, June 20, 2013

Optimizing with the help of FsCheck

Recently I needed a function to transpose a jagged array in F#. As I knew Haskell probably had this in its standard library and I'm lazy, I hoogled “transpose” and followed the link to its source code, then translated it to F#:

module List =
    let tryHead = 
        | [] -> None
        | x::_ -> Some x

    let tryTail =
        | _::xs -> Some xs
        | _ -> None

    let rec transpose =
        | [] -> []
        | []::xs -> transpose xs
        | (x::xs)::xss -> (x::(List.choose tryHead xss))::transpose (xs::(List.choose tryTail xss))

The end.

Oh wait, I needed to process arrays, not lists. Well, I suppose we could convert the jagged array to a jagged list, and then back to an array:

let transpose x = x |> Array.toList |> Seq.toList |> List.transpose |> List.toArray |> Seq.toArray

It works, but it's a bit inefficient. For example, it does a lot of unnecessary allocations. Let's time it with a big array:

let bigArray = Array.init 3000 (fun i -> Array.init i id)
transpose bigArray |> ignore

Real: 00:00:01.369, CPU: 00:00:01.357, GC gen0: 79, gen1: 44, gen2: 1

We can do better. Since we're working with arrays, we could calculate the length of each array and preallocate it, then copy the correponding values. Problem is, that kind of code is very imperative, and tricky to get right.
Enter FsCheck. With FsCheck we can use the inefficient implementation as a model to ensure that the new, more efficient implementation is correct. It's very easy too:

let transpose2 (a: 'a[][]) = a // placeholder for the efficient implementation
FsCheck.Check.Quick ("transpose compare", fun (a: int[][]) -> transpose a = transpose2 a) 

Run this and FsCheck will generate jagged arrays to compare both implementations. As the new implementation is merely a placeholder for now, it won't take long to find a value that fails the comparison:

transpose compare-Falsifiable, after 1 test (0 shrinks) (StdGen (1547388233,295729162)):

Now that we have a simple but strong test harness we can confidently implement the optimized function. And many failed test runs later, this passes the 100 (by default) test cases generated by FsCheck:

let transpose2 (a: 'a[][]) =
    if a.Length = 0 then 
        let r = Array.zeroCreate (a |> Array.length |> Seq.max)
        for i in 0 .. r.Length-1 do
            let c = Array.zeroCreate (a |> Seq.filter (fun x -> Array.length x > i) |> Seq.length)
            let mutable k = 0
            for j in 0 .. a.Length-1 do
                if a.[j].Length > i then
                    let v = a.[j].[i]
                    c.[k] <- v
                    k <- k + 1
            r.[i] <- c

See, I told you it would be all imperative and messy! But is it more efficient?

transpose2 bigArray |> ignore

Real: 00:00:00.218, CPU: 00:00:00.218, GC gen0: 3, gen1: 2, gen2: 0

Yes it is (at least with this very unscientific benchmark).

BONUS: are you wondering why the original definition uses List.choose tryHead instead of List.head ? That is:

let rec transpose =
    | [] -> []
    | []::xs -> transpose xs
    | (x::xs)::xss -> (x::( List.head xss))::transpose (xs::( List.tail xss))

FsCheck can help with that too! Simply run this against the generated inputs and ignore the result (a trivial test):

FsCheck.Check.Quick("transpose", List.transpose >> ignore)

And watch it fail:

transpose-Falsifiable, after 1 test (6 shrinks) (StdGen (1278231111,295729178)):
[[true]; []]
with exception:
System.ArgumentException: The input list was empty.
Parameter name: list
   at Microsoft.FSharp.Collections.ListModule.Head[T](FSharpList`1 list)
   at[T,TResult](FSharpFunc`2 mapping, FSharpList`1 x)
Note how FsCheck shrinks the value to the simplest possible failing case. This usage is great to find unwanted partial functions.

One last comment: none of this is specific for F#. FsCheck has a nice C# interface! You could also use it from Fuchu.


jackfoxy said...

FsCheck is a superb testing tool, and now that it is one of the first-rank F# projects on GitHub this article is a great addition to the literature helping people gain fluency in its use. More examples are always better, and you have shown FsCheck's usefulness in at least 3 previous articles.

Maybe it is time to add FsCheck to Bug squash's label tags?

Mauricio Scheffer said...

That's a great idea, Jack! I just did this. Thanks!

David said...

Hi Mauricio,

Thanks a lot for another great article. Thanks to this discovery I managed to pull a much more elegant version. It's a little slower than yours but parallelizing it was trivial by the addition of .Parallel in one place making it faster in sizable samples.

Needless to say, I might have given up hadn't it been for FsCheck keeping tabs on my changes.

Mauricio Scheffer said...

David: it looks great!

Lo Sauer said...

Great introduction and example. I like your F# post series! Cheers.