Interfacing Fortran with OCaml

While OCaml is still one of the best overall programming languages, it is syntactically slightly inconvenient and gets a bit slow for tight numerical routines, especially if they are written in a functional style. Written in imperative style, compiled Ocaml numerical code usually is 2-3 times slower than single-core C,C or Fortran, and therefore usable and much faster than Python or Cython, but optimized, CPU-tuned Fortran can be faster and uses less memory. Fortran, with its native support for arrays of any reasonable rank, built-in bounds checking, vectorized operations and rich intrinsics remains the best programming language for such code. But it remains cumbersome as a general-purpose or systems programming language.

Using CTypes it is possible to easily call Fortran routines from OCaml, provided a few compilation and linking tricks described here are known. This combination of OCaml and Fortran provides, in my opinion, the best small-to-medium scale platform for production-quality scientific computing.

While writing OCaml bindings to call C isn't too hard, it's easy to make mistakes and it's a repetitive procedure. If one wants to call Fortran code, then one would have to write C stub code to be called from OCaml, that would call Fortran. This is three languages to deal with, which becomes annoying. Using CTypes we don't have to write any stub code.

This approach can also be used to get access to LAPACK routines for which Lacaml bindings are not availble, such as dgglse .

First, any serious numerical work in OCaml (such as processing remote sensing data) requires the use of Bigarrays .

These are arbitrary rank arrays of unboxed primitive data types such as single- and double-precision real and complex numbers, or integers of 16, 32 or 64-bit precision. Indexing syntax is provided for arbitrary rank, and simplified routines are provided up to rank 3. The array layout can be Fortran-like, in which case the data is stored in column-major order and indices start at one, or C-like, in which case it's row-major and starts at zero.

Let's say we want a simple routine that takes a vector of point coordinates given as an (m,3) double array and calculates the minimum distance between any pair of points using a naive O(nĀ²) algorithm. (If we were really serious about this problem, we would use a proper algorithm such as an octree structure or one of the complicated 3D Voronoi algorithms.) Here's a Fortran 90 implementation suitable for calling from OCaml via CTypes:

module accel
  use iso_c_binding, only: c_int, c_double
  use, intrinsic :: iso_fortran_env
  use, intrinsic :: ieee_arithmetic
  implicit none
  integer, parameter :: dp = c_double
  subroutine min_dist(m,a,d_min,i_min) bind(C,name="accel_min_dist")
    integer(c_int), value :: m
    real(dp), intent(in) :: a(m,3)
    real(dp), intent(out) :: d_min
    real(dp) :: d2_min,d2
    integer(c_int), intent(out) :: i_min(2)
    integer :: i1, i2, j
    d2_min=ieee_value(d2, ieee_positive_inf)
    i_min(1:2) = -1
    do i1=1,m
       do i2=i1+1,m
          do j=1,3
          end do
          if (d2 < d2_min) then
             d2_min = d2
             i_min(1) = i1
             i_min(2) = i2
          end if
       end do
    end do
  end subroutine min_dist
end module accel

This goes into a file called accel.f90 .

The min_dist(m,a,d_min,i_min) function takes as inputs the number of points m , the coordinate array a which must have m rows and 3 columns. Note that we have used iso_fortran_env to get the proper Fortran real kind c_double that matches C doubles and therefore the Bigarray float64 kind. As outputs it puts the minimum distance into d_min and the row numbers of a pair of points having that minimum distance in i_min . It does not make use of the norm2 Fortran intrinsics.

Since Fortran, by default, passes everything by reference, you have to use the attribute value on arguments unless you want to box and debox them from your little OCaml wrapper.

To call this routine from OCaml, we define a module Accel that contains a min_dist function:

open Bigarray
open Ctypes
open Foreign

external accel_min_dist_link : unit -> unit = "accel_min_dist"

type da2 = (float, float64_elt, fortran_layout) Array2.t

let min_dist =
  let f = foreign "accel_min_dist"
    (int @-> ptr double @-> ptr double @-> ptr int @-> returning void)
  fun np (a : da2) ->
  let d_min = allocate double 0.0 in
  let i_min = allocate_n int ~count:2 in
  let a_c =
      (genarray_of_array2 a)
  let a' = bigarray_start array2 a_c in
  f np a' d_min i_min;
  !@d_min, !@ i_min, !@ (i_min +@ 1)

At module initialization, it will call Ctypes to resolve accel_min_dist and then create a function that takes the number of points np and the point array a (this way a doesn't have to be competely filled).

The type signature int @-> ptr double @-> ptr double @-> ptr int @-> returning void must match that of the Fortran subroutine.

Note that the return values are passed by reference. Therefore one double, and two integers are allocated with allocate and allocate_n . Their contents are retrieved using the !@ and +@ Ctypes operators.

Did you notice that we are not using the value accel_min_dist_link ? It is included nonetheless to prevent the linked from garbage-collecting the routine which will be dynamically resolved at run-time via CTypes. I've heard there are some linker flags to do it, but it seems that this works as well.

Another thing is that we are using Genarray.change_layout . This is becaus CTypes only knows about C-layout bigarrays. This change_layout function creates a new Bigarrray proxy sharing the data but presenting the indices in C layout, which makes CTypes happy. The Fortran code doesn't need to know about this trick, and the indices in Fortran and in OCaml are the same, 1-based, column-major Fortran indices.

Now, we need an OCamlbuild rule for compiling Fortran code. In place the following after (* OASIS_STOP *) :

open Ocamlbuild_plugin

let () =
    (fun hook ->
     dispatch_default hook;
     match hook with
     | After_rules ->
	flag ["ocaml"; "link"; "program"] & S[A"-ccopt";A"-Wl,-E"];

        rule "F90 to object"
	     (fun env _ -> Cmd(S[P"gfortran";
                                 A(env "%.f90")]));
     | _ -> ())

This adds a rule for producing .o files from .f90 files using optimization features, but with array bounds checking.

Finally, let's say your program is called foo ; then your _oasis section should look like this:

Executable foo
  Path:       .
  CompiledObject: best
  CSources:     accel.c
  BuildDepends: ctypes.foreign
  CCLib:        -lgfortran

Notice two tricks. The first one is that we are listing our Fortran with a .c suffix even if the file has a .f90 suffix. This seems to work with the current version of Oasis. The other trick is that you need Fortran's runtime libraries; in this case since we're using gfortran it's -lgfortran .

That's it.

I've been using this quite happily with both 4.01.0 and 4.05.x in production code (at GHGSat) and personal code and I haven't had issues so far.