Sunday, June 5, 2016

Checkerboards!

Two days ago I wrote the umpteenth Stack Overflow reply about detecting and matching a checkerboard-like calibration target in an image.

The reply mentioned an often-rediscovered algorithm for indexing the detected target corner features into a 2D array isomorphic to the physical checkerboard pattern, so they can be correctly addressed by the later steps of the camera calibration software.

And then I received quite a few private "pings" from people asking for references to the algorithm itself. Yet the beauty of the obvious is that one can't quite remember when and where they saw it first, and point to an original reference. I definitely can't - my own version of that algorithm was developed by myself and a colleague at a now-defunct startup back in '99, I think. I have seen similar techniques in various papers and tech reports over the years, but can't find any in a pinch.

So, for future reference, here is a link to an old report by myself and Roman Waupotitsch describing the whole thing in excruciating detail (algorithms in pseudocode! you have been warned). We even tried to submit the draft to a conference, but it wasn't deemed worthy - applications are for mechanics, I guess, etc.

At the time our implementation (in C++ on Win32) was quite robust and performant - definitely superior to what the early versions of the OpenCV code could do. In particular, it was quite robust to missing / occluded corners, and could easily be generalized to the case of multiple targets in the same image - we had an application doing just that.

Its most glaring weakness was the use of a very simple asymmetry in the pattern for resolving the absolute positions of the corners: we just used two checkerboard columns wider than the rest. Worked for us at the time because the middle section of the board was normally in view in our application, but in every successive re-implementation of the thing I have used the clever "crossratio barcode" pattern by Matsunaga and Kanatani.

Saturday, March 5, 2016

On calling Ceres from Scala - 6.0: Skeres 0.99 is released

This is the seventh installment of a series documenting my progress on porting the Ceres solver API to Scala. All my sources quoted below are available on github.

Skeres 0.99 is released


As hinted in my previous post, this one is to announce the release of a first working and nearly complete version of my SWIG wrap + port to Scala of the Ceres Solver API. The code is available in a Github repo, and has so far been only built and tested on my MacOS X notebook, though I expect minimal or no problems in running in on other UNIX-y systems.

For the build itself I went for the most straightforward solution:
  • Bash script to drive SWIG-ing and compilation of the JNI C++ wrapping code.
  • SBT for everything JVM-related thereafter.
The bash script itself fires up sbt in console mode after setting the environment variable LD_LIBRARY_PATH so the JVM knows where to find the JNI wrapper DLL's. From there you can issue the "test" command to run all the unit tests, or head to "project examples" to run the examples. For instance, we can run my port of the simple bundle adjuster from the Ceres distribution's directory of examples, on the same data as provided therein:


What's missing?

Not much. Details are in the TODO.md file in the repo. Mostly documentation, porting of more examples, and a couple of the client-side goodies that Ceres provides, e.g. grid evaluation. Stay tuned for more.

Was it hard?

Not complaining at all. Most of the work was in porting to Scala the client-side auto-differentiation adapters for vector-valued cost functions of multiple vector-valued arguments, and in writing unit tests. Another boring but necessary slog (timewise) was the porting of the client-side utilities to handle rotation matrices, and their angle-axis and quaternion representations - the latter somewhat eased by the fact that spire direcly supports quaternions on fields. 

Sunday, January 31, 2016

On calling Ceres from Scala - 5.0: Autodiffin'it with spire

This is the sixth installment of a series documenting my progress on porting the Ceres solver API to Scala. All my sources quoted below are available on github.

Autodiffin'it with spire


One of the most useful features of the Ceres solver is its support for the automatic differentiation (in forward-mode) of vector cost functions of multiple vector arguments. This is entirely implemented caller-side with respect to the optimizer, through the use of a pair of templated adaptors (high and low-level ones). They work together to "convert" a Ceres cost function into a templated functor, written by the user, that the can be called with one of two different types of arguments: ordinary double-valued parameter blocks for just computing the values of the residuals, and parameter blocks whose elements are Jet-valued for computing both the residuals and their partial derivatives with respect to the parameters. 

So, the optimizer only "sees" an ordinary cost function, but:
  • When it calls it with a NULL jacobian matrix, the adapter chooses the double-only specialization of the user-provided cost functor, and returns the straight residuals to the optimizer. 
  • Otherwise, it packs the parameter values into Jet's, calls the Jet-typed specialization of the functor, and then unpacks the result into residuals and jacobians.

Unfortunately none of these caller-side goodies can directly be used in our SWIG-based port, since SWIG cannot specialize the C++ templates for as-yet undefined implementations of the cost functors, to be provided by the users (or, to repeat the usual mantra, Java generics are not the same as C++ templates). Sad!

However, we can take advantage of Ceres's clean separation of the autodiff implementation from the core optimizer code. Autodiff is math, and there are plenty of excellent numerical math libraries for the JVM. A particularly good one in the Scala world is spire, whose authors have gone through extreme lengths toward providing powerful type-safe mathematical abstractions while sacrificing little or nothing in the way of execution performance. This article and this YouTube video provide good introductions to the library.

Some time ago, in preparation for this port, I contributed to spire an implementation of the Jet type that seamlessly integrates with the rest of spire's algebraic types as well as its generic implementations of the standard math libraries. Now we get to use it. 

The basic idea is to replicate in a bit of Scala code the with/without jacobian code switch described above. Let's first define a CostFunctor base class, parametrized on the argument type, which the user can later override with their particular implementation of a residual.


This defines a function representing a cost term of dimension "kNumResiduals", dependent on a variadic number of parameter blocks, each of size N(0), N(1), etc., so that the parameter space is composed of N.length parameter blocks. The parameter blocks themselves are passed to the cost evaluation method as Array's parametrized on a type T which, through the given typeclasses, allow for algebraic calculations on their components.

The conversion method "toAutodiffCostFunction" produces the actual auto-differentiable cost function. It is defined as follows:


There is quite a bit of code there, but it should be fairly easy to parse.

First I define, following Ceres, a SizedCostFunction that makes explicit the input and output dimensions of a CostFunction, then I extend it into an AutodiffCostFunction, which is constructed from a given CostFunctor (hence with known such dimensions).

The evaluate method of AutodiffCostFunction implements the CostFunction interface. Note that it consists of two main branches, respectively for the case of pure cost evaluation (no jacobians), and cost with partial derivatives. In the second case I "convert" the input parameter blocks into Jet's set up for computing the associated derivatives, and then map back the functor's output to residuals and jacobians.

A bit of extra complication, present in the Ceres C++ code as well, is due to the fact that we want to differentiate a function of several separate blocks, whereas the Jet framework is designed for computing partial derivatives with respect to variables represented in a single container. Therefore we must "unroll" the blocks prior to calling the cost functor, and manage the variable indices accordingly.

Testing it


We can test all the above by re-implementing in Scala the Powell function example from the Ceres distribution:


Running it we get the same results as the C++ version:

As before, the timing does not reflect JIT optimizations.

Next steps


Getting auto-differentiation was the last nontrivial step of this port. From now on it's just "plumbing". My next post will point to a first version of the entire thing.

Monday, January 18, 2016

On calling Ceres from Scala - 4.2: Hello World, a brief note on speed

This is the sixth installment of a series documenting my progress on porting the Ceres solver API to Scala. All my sources quoted below are available on github.

Hello World, A Brief Note on Speed


We all know how, in computer science, speed is often reported in terms of lies, big lies and benchmarks. However, silly benchmarks are fun, so let's write one.

On my 2.3 GHz i7 mac, the ceres-reported timings for a run of the native C++ implementation of the "helloworld" example are:


My Scala implementation based on the SWIG-generated wrap (HelloWorld.scala) instead clocks at:

This is not an apple-to-apple comparison, as the C++ version's cost function uses automatic differentiation, but the AD overhead should be in the noise.

So Scala is about 3x slower in terms of iteration time, and about 53x slower in total time, and that's OMG horrible right?

Well, not quite. The native version is statically optimized ("c++ -O3") in its entirety, whereas the port side calls JVM routines at every cost function evaluation. So, to be fair, we ought to give the JIT a chance to work its magic.

Running the optimization problem Scala-side 10 times in a row in the same process, but reinitializing all data structures at every run, should give us a better idea of the JIT-induced speedups.


Note the rather dramatic speedup after the first iteration, and the eventual convergence to essentially the same runtime as the native implementation.

This is, of course, just a toy benchmark, but the short answer to "Yo, how much slower is this port going to be compared to native?" should be (a) there is no short answer and (b) tell me more about your use case.

Sunday, January 17, 2016

On calling Ceres from Scala - 4.1: Hello World explained

This is the fifth installment of a series documenting my progress on porting the Ceres solver API to Scala. All my sources quoted below are available on github.

Hello World Explained


In this post I'll go through the nontrivial parts of my first SWIG wrap of the ceres-solver library. The code is in the sandbox section of the skeres github repo, and includes a port to pure Scala of Ceres's helloworld.cc example.

Build

The library and example are built using a simple Bourne shell script (we'll tackle sbt configuration later). The ceres-library headers of interest are swigged into a com.google.ceres Java package, and the JNI auto-generated .cxx source file is produced in the toplevel directory, as is the shared library (.dylib file in Mac OSX) they are eventually linked into. The ported example itself belongs to a separate org.somelightprojections.skeres package.

Code

The example's code closely follows the C++ original, with one basic difference: the cost function derivative is computed analytically, rather than through automatic differentiation, as the latter will be the subject of the next step in this port. Therefore the residual term code looks like this:

As you can see, it follows closely the C++ implementation (i.e. what the C++ codes would look like if it didn't use autodiff): it describes the size of its parameter and residual blocks, and then overrides the Evaluate method to actually compute the cost. Note the use of camelCase-style identifiers, in keeping with the usual Scala style conventions.

Probably the only part deserving an explanation is the manipulation of the residual array and of the parameter and jacobian matrices. This takes advantage of a very simple C++ helper declared inline in the ceres.i SWIG configuration file:

Remember the discussion in a previous post about the use of the carrays.i SWIG library. The %array_class stanza declares DoubleArray to be a class wrapping an array of double's. The inline DoubleMatrix helper exposes a null test and access to the matrix by rows. Thanks to the magic of implicit conversions, this is all we need to make these low-level classes behave in a more idiomatic Scala way.

First, for one dimensional arrays of doubles we make use of the enrichment pattern to add the usual get/set accessors, as well as conversions of DoubleArray objects to pointers and Scala arrays:


We use a similar enrichment for matrices:

Note the call to the (swig-wrapped) DoubleMatrix helper to access a matrix row. Note also that these calls involve only tiny JVM object creations, with no copies of the data arrays themselves.

Finally, we define implicit conversions to instantiate these enrichments from the swig-wrapped low-level pointers:

Odds and ends

The ceres.i configuration file only includes as many ceres headers as are needed to compile the example. In addition, however, it also exports all the ceres-predefined "loss function" classes that are used to robustify the residuals. I mention this here to highlight another feature provided by SWIG, namely ownership transfer (i.e. memory leak avoidance). The relevant stanza looks like this:

The inline-declared struct allows us to create loss function instances from Scala using calls like PredefinedLossFunctions.huberLoss(2.0).  The %newobject declarations cause the destructors for the newly created-and-wrapped C++ objects to be called whenever the wrapping Java object is garbage-collected.

A similar issue is addressed in the HelloWorld.scala source where the Problem object is declared:


Here we use the (wrapped) ProblemOptions to instruct Ceres not to take ownership of the cost and loss function objects. This is necessary because they are JVM-owned objects, and the default behavior in which the ceres Problem instance takes ownership and deletes them at in its destructor would likely cause a double-free heap corruption.

Saturday, January 9, 2016

On calling Ceres from Scala - 4.0: Hello World

This is the fourth installment of a series documenting my progress on porting the Ceres solver API to Scala. All my sources quoted below are available on github.


Hello World


We are now ready for our first-cut SWIG wrap of the ceres-solver library. The code is in the sandbox section of the skeres github repo, and includes a port to pure Scala of Ceres's helloworld.cc example.

Take a look at the sources, if you wish. I am pressed for time tonight, so will comment on them in a subsequent post. However, I want to post the output of a run just to show it's real:

On calling Ceres from Scala - 3.0: Who owns what?

This is the third installment of a series documenting my progress on porting the Ceres solver API to Scala. All my sources quoted below are available on github.


Who owns what?


The pesky issue of memory management rears its head whenever code must cross the boundary between JVM  and native code. Items to be dealt with include:
  • Are copies of the same data kept on both sides? If yes, are we wasting valuable RAM (thus handicapping our ability to scale to large problems)? If no, are we wasting computing time making repeated expensive calls across the JNI boundary?
  • If memory is heap-allocated on the native side to hold the data, is it properly reclaimed to avoid leaks - including in the case of failures on the native side that leave the JVM side running?
  • If the native side accesses memory allocated by the JVM, are the Java-side references reachable throughout the runtime of the native code? Otherwise the latter may try to access memory that's already been garbage-collected.
  • Is the JVM properly initialized? Depending on the expected data split between Java and native size of the programs, the max heap size will need to be properly sized with respect to the overall max process size.
SWIG is actually pretty good at keeping track of C++ allocations caused by Java, in that Java classes wrapping C++ ones are generated with a finalize method that calls operator delete on the pointer to the wrapped C++ instance. Therefore garbage collection will help avoid leaks (with the usual caveat that there are no guarantees on when finalizers are run). There are some subtleties to take into account, see the link above for more.

In our case, the chunks of memory of interest are:
  1. The parameter blocks, altogether comprising the state vector optimized by Ceres.
  2. Data used to compute the cost function for each residual block.
  3. The Jacobian matrices.
We really have no control on (3), short of rewriting big chunks of Ceres code, so storage for the Jacobians will be C++ owned, and we'll need to copy its content when required for the computation of the cost function derivatives JVM-side.

For (2) the design choice is a no brainer: these data are only "felt" by Ceres indirectly through the CostFunction interface, but they are not actually accessed by the library itself. Therefore there is really no reason for shifting their ownership to C++: the JVM can keep them just fine, thank you very much and see y'all later. 

We have some design freedom for (1), but it is rather easy to see that having that memory held by Java leads to significant code complexity for little gain. The parameter blocks need to be manipulated at a low level by the Ceres solver directly through pointers. If their memory were held by Java we'd need a shim of C code, including JNI env calls to GetArrayLength, GetDoubleArrayElements, to get in C++ the pointers to the Java arrays. In some JVM implementations these actually cause memory to be duplicated, but the saner ones prefer to "pin" the memory buffers to prevent the garbage collector from touching them. The logical place for such shims would be in the wrappers for Ceres's Problem::AddParameterBlock and Problem::AddResidualBlock. So we'd need to either explicit subclass Problem in C++ for the purposes of this port, or wrap it in a Decorator, or take advantage of SWIG's  %extend mechanism (which amounts to the same thing). However, we'd also need to release the Java memory from the C side every time we call the cost function, as well as when the solver is done, through a JNI env call to ReleaseDoubleArrayElements. If we don't do it, Java would either never "see" the result of the solver updates to the state (if the JVM copied the buffers), or never be able to GC the parameter blocks if they were pinned (thus leaking memory). Therefore we'd need another C shim, to be called during the optimization and at its end. The shim would need to be wrapped in Java, and there really isn't an obvious place for it. For the optimization end part, we could perhaps place it in the destructor of our Problem subclass (to be called by the finalizer of its Java wrap), but for this to work the subclass instance would need to keep track of all the parameter blocks it has handled, and this bookkeeping would further complicate the implementation of the AddParameterBloc/AddResidualBlock overloads. 

All in all, it seems much saner to let C++ own the parameter block memory, and just copy it out to Java when we are done. SWIG offers a nice helper for this approach in its carrays.i library: it allows one to allocate an array C-side and get an opaque pointer-like Java object for it, plus methods for reading from and writing to it. These methods cross the JNI boundary, so they are a not-so-cheap addition to the cost function evaluation, but this seems an acceptable price to pay at least for now.

So our design will have:
  1. Parameter blocks: C++ owned, copied to JVM upon cost function evaluation and after a solver's run.
  2. Cost function data: JVM-owned.
  3. Jacobians: C++ owned.
That's it for today. With this decision we are ready to attempt our first Ceres wrap.

Monday, December 28, 2015

On calling Ceres from Scala - 2.0: Swiggin' it

This is the second installment of a series documenting my progress on porting the Ceres solver API to Scala. All my sources quoted below are available on github.

Swiggin' callbacks


Using SWIG to build most (hopefully all) the JNI wrapping code used to call Ceres from the JVM is a choice that hardly needs justification. To wit, its advantages are:
  • Decently well-thought and documented API that tries hard to help the coder avoid common pitfalls.
  • Ogles of helper methods to address common use cases.
  • Minimal configuration required, reduces maintenance work for keeping wrappers in synch with the wrapped library.
Disadvantages include the possible generation of underperforming code. However I believe that, with due care, the wrap interfaces can be kept minimalistic enough to be essentially optimal by design (i.e. near the limitations of the JNI/JVM interface and the non-native user application code running in the JVM).

SWIG is well documented and fairly easy to use in practice. Our case, however, needs a couple of its "advanced" features, due to the need cross the JNI boundary in both directions. This need arises because we want to write our models (and cost functions) in Scala, calling Ceres in a two-step workflow:
  • First, problem specification. We want to give Ceres some Scala objects - the residual terms - that know how to compute some cost terms (and perhaps their derivatives) for some putative values of the unknowns.
  • Second, we run Ceres's C++ optimization code. This in turn will call the Scala residual term computations iteratively, until some optimum set of parameters is (hopefully) found.
So we need to cross the JNI boundary in both directions:
  • In the first step we call Scala  C++ to configure the native optimizer so it can use the Scala residual terms. 
  • Then we call Scala → C++ to start the optimizer.
  • Then the optimizer calls back C++ → Scala to compute the residuals.
  • The residual computation themselves may or may not call Scala → C++ to access some Ceres utilities.
Going from Scala to C++ with SWIG is very easy. Going the other way is easy modulo some wading trough the docs. The SWIG facility to use for this use case is directors.  Ultra-simplifying, directors make Java classes behave like subclasses of C++ classes. The subclass behavior includes, crucially, resolution of virtual methods. 

In our case, we can declare an abstract C++ class that implements the residual term virtual interface as expected by Ceres. We then run SWIG to create an appropriate wrapper/director for it, and extend it in Scala to actually implement the computation. 

In practice


To make it concrete, let's code a toy residual term and evaluator library, consisting of:

  • A ResidualTerm abstract base class that assumes residuals are evaluated by its call operator, with the result returned into its by-reference second argument.
  • A ResidualEvaluator class that holds a collection of concrete ResidualTerms, with a method to register new ones, and an "eval" method to compute the sum of all residual terms at an input point x.
Let's wrap it using the following SWIG script (see comments herein for explanation):



Running the swig(1) command on it generates both a residuals_wrap.cxx C++ interface, and Java sources to call it. The former are compiled into a DLL along with the actual sources, the latter are java-compiled into class files, ready to be used by our Scala implementation to follow:


Here I added a Test executable to exercise the whole thing. Note how the wrapped classes behave quite naturally - e.g. the evaluator's AddResidualTerm is called in a foreach callback. The only really quirky item is the use of the pointer-wrapping class "SWIGTYPE_p_double" to wrap the by-reference output argument of the cost Residual. This too could be finessed using a SWIG typemap, or even just a rename.

Running yields the expected output:

$ scala -cp classes org.somelightprojections.skeres.Test
Creating/adding residual terms
Evaluating at 10.0
Computed residuals[0]=7, total=7
Computed residuals[1]=5, total=12
Total residual = 12.0

That's it for today. All code is available in the sandbox section of the skeres github repo.

Saturday, November 28, 2015

On calling Ceres from Scala - 1.0: What / Why / Where

What?

This is the first item of a multipart series on my experience of porting the Ceres Solver Library to the Scala programming language. This is still a work in progress, and I intend to document here my work notes, design decisions, etc., in the hope that someone else may find them useful.

Here and afterwards, this "port" is meant to include at least a few separate categories of code:
  • JNI interfaces allowing the call of the Ceres library itself from the JVM. This involves the specification of the type and controlling parameters for the solver to be run, as well as the data to be used
  • Client-side Scala code called by the solver itself to evaluate cost functors and its derivatives during execution.
  • "Utility" Scala code to inspect/verify a solver specification and solution run
  • Unit and integration tests for all the above.

Why?

First, because it's fun (at least for a nerd like myself). 

Second, it's a shame that a nice programming language like Scala is not more widely used by the computer vision / photogrammetry community for use cases (and they are the majority) where speed is not of paramount concern. I consider the usual alternative for writing quick prototypes, Python, far inferior to Scala in situation where a move from prototype to production code seems likely. 

So, this work is in pursuit of a hobby, but with the ambition of making something useful for the Scala and vision communities at large.

Where?

All the code will be posted to github. Stay tuned for more