Autodiffin'it with spire
One of the most useful features of the Ceres solver is its support for the automatic differentiation (in forward-mode) of vector cost functions of multiple vector arguments. This is entirely implemented caller-side with respect to the optimizer, through the use of a pair of templated adaptors (high and low-level ones). They work together to "convert" a Ceres cost function into a templated functor, written by the user, that the can be called with one of two different types of arguments: ordinary double-valued parameter blocks for just computing the values of the residuals, and parameter blocks whose elements are Jet-valued for computing both the residuals and their partial derivatives with respect to the parameters.
So, the optimizer only "sees" an ordinary cost function, but:
- When it calls it with a NULL jacobian matrix, the adapter chooses the double-only specialization of the user-provided cost functor, and returns the straight residuals to the optimizer.
- Otherwise, it packs the parameter values into Jet's, calls the Jet-typed specialization of the functor, and then unpacks the result into residuals and jacobians.
Unfortunately none of these caller-side goodies can directly be used in our SWIG-based port, since SWIG cannot specialize the C++ templates for as-yet undefined implementations of the cost functors, to be provided by the users (or, to repeat the usual mantra, Java generics are not the same as C++ templates). Sad!
However, we can take advantage of Ceres's clean separation of the autodiff implementation from the core optimizer code. Autodiff is math, and there are plenty of excellent numerical math libraries for the JVM. A particularly good one in the Scala world is spire, whose authors have gone through extreme lengths toward providing powerful type-safe mathematical abstractions while sacrificing little or nothing in the way of execution performance. This article and this YouTube video provide good introductions to the library.
Some time ago, in preparation for this port, I contributed to spire an implementation of the Jet type that seamlessly integrates with the rest of spire's algebraic types as well as its generic implementations of the standard math libraries. Now we get to use it.
The basic idea is to replicate in a bit of Scala code the with/without jacobian code switch described above. Let's first define a CostFunctor base class, parametrized on the argument type, which the user can later override with their particular implementation of a residual.
This defines a function representing a cost term of dimension "kNumResiduals", dependent on a variadic number of parameter blocks, each of size N(0), N(1), etc., so that the parameter space is composed of N.length parameter blocks. The parameter blocks themselves are passed to the cost evaluation method as Array's parametrized on a type T which, through the given typeclasses, allow for algebraic calculations on their components.
The conversion method "toAutodiffCostFunction" produces the actual auto-differentiable cost function. It is defined as follows:
There is quite a bit of code there, but it should be fairly easy to parse.
First I define, following Ceres, a SizedCostFunction that makes explicit the input and output dimensions of a CostFunction, then I extend it into an AutodiffCostFunction, which is constructed from a given CostFunctor (hence with known such dimensions).
The evaluate method of AutodiffCostFunction implements the CostFunction interface. Note that it consists of two main branches, respectively for the case of pure cost evaluation (no jacobians), and cost with partial derivatives. In the second case I "convert" the input parameter blocks into Jet's set up for computing the associated derivatives, and then map back the functor's output to residuals and jacobians.
A bit of extra complication, present in the Ceres C++ code as well, is due to the fact that we want to differentiate a function of several separate blocks, whereas the Jet framework is designed for computing partial derivatives with respect to variables represented in a single container. Therefore we must "unroll" the blocks prior to calling the cost functor, and manage the variable indices accordingly.
We can test all the above by re-implementing in Scala the Powell function example from the Ceres distribution:
Running it we get the same results as the C++ version:
As before, the timing does not reflect JIT optimizations.
Getting auto-differentiation was the last nontrivial step of this port. From now on it's just "plumbing". My next post will point to a first version of the entire thing.