The OCaml programming language embodies a variety of languages: a core functional language, an object-oriented language, a module language, and a formatting language. As with any self-respecting compiler, the OCaml compiler is capable of bootstrapping itself, i.e. the compiler is capable of compiling its own source. A typical user of OCaml need not worry about bootstrapping, unless, you happen to be developing the OCaml compiler. Recently, the formatting language and the bootstrap process made me scratch my head — multiple times.
Though, first a bit of background: I am currently at OCaml Labs in Cambridge working with Leo White, KC Sivaramakrishnan, Stephen Dolan, and Anil Madhavapeddy on effect typing for Multicore OCaml. This work is based on previous work by White et al. . The main goal of this work is to switch from nominal effects to structural effects. The motivation for this is worth its own post (in other words I will write about this in a future post). However, to kick things off I began by rebasing Leo's previous work on top of the Multicore OCaml master branch.
A little technical background
The following two subsections provide very brief primers to the bootstrapping process and the format language.
A hard bootstrapping pass is required whenever you add or remove a primitive in the OCaml compiler source. Usually, the bootstrapping process is rather straightforward, the OCaml Makefile even provides a simple recipe for bootstrapping the compiler
# make coreboot [old system -- you were in a stable state] # <change the source> # make clean runtime coreall # <debug your changes> # make clean runtime coreall # make coreboot [new system -- now in a stable state]
make clean runtime coreall builds the base bootstrapping compiler including the standard library. The final command
make coreboot builds the actual bootstrapping compiler and thereby lifts any changes to the source into the compiler.
Typed format expressions
OCaml has built-in support for formatting which enables OCaml to type check format expressions which in turn means that OCaml can provide type-safe format expressions, e.g.
# let say = Printf.printf "%s %s%c";; val say : string -> string -> char -> unit = <fun>
For this code OCaml has inferred, based on the string
"%s %s%c", that
say is function that takes three arguments of types
char respectively, and produces a unit value, meaning that we can say
# say "Hello" "World" '!';; Hello World!- : unit = ()
While OCaml will not let us get away with providing an integer instead of a string, i.e.
# say "Hello" 0 '!';; Error: This expression has type int but an expression was expected of type string
Internally format expressions get compiled into the datatype
format6 whose definition is
and ('a, 'b, 'c, 'd, 'e, 'f) format6 = Format of ('a, 'b, 'c, 'd, 'e, 'f) fmt * string
fmt is an algebraic datatype which has a data constructor for every possible format modifier (such as
%S for strings). The code below is part of its definition
and ('a, 'b, 'c, 'd, 'e, 'f) fmt = | String : (* %s *) ('x, string -[!p]-> 'a) padding * ('a, 'b, 'c, 'd, 'e, 'f) fmt -> ('x, 'b, 'c, 'd, 'e, 'f) fmt | Caml_string : (* %S *) ('x, string -[!p]-> 'a) padding * ('a, 'b, 'c, 'd, 'e, 'f) fmt -> ('x, 'b, 'c, 'd, 'e, 'f) fmt ...
As you can imagine the whole handling of format expressions involve a fair bit of effectful code.
An effective format
The aforementioned rebasing went rather smoothly until I got to the commit which ascribes effect types to the whole standard library. The standard library comprises the Pervasives module, List module, Array module, and so forth, including the built-in formatting library.
Rather crucically this commit changes the definition of
format6 such that it takes an additional parameter
!p which is an effect variable
and ('a, 'b, 'c, 'd, 'e, 'f, !p) format6e = Format of ('a, 'b, 'c, 'd, 'e, 'f, !p) fmt * string
Note the commits also renames the datatype to
format6e (I am guessing e for effective).
Actually, just rebasing that particular commit was rather painless as there were only a few merge conflicts which were easy to fix, but doing it right turned out to be somewhat difficult. After the initial rebase the compiler could no longer bootstrap! Yikes! Immediate loss of all self-respect! The bootstrapping process failed at the second step with the following error
File "camlinternalFormat.ml", line 1854, characters 42-70: Error: This expression has type string but an expression was expected of type ('a -[!p]-> 'b, unit, string, 'c, 'd, 'e, !q wio) CamlinternalFormatBasics.format6e
The error arose while type checking the following innocuous looking code
let invalid_box () = failwith_message "invalid box description %S" str in
failwith_message expects a format string as its argument, and based on the shape of the string it generates the corresponding format function which in this particular case a function that takes a string as its only argument.
However, the error is a red herring as the cause of the problem has little to do that function. The problem is that there is a cyclic dependency between the type checker and the format definitions in the standard library. The type checker depends on format definitions for compilation of format expressions and therefore expects those definitions to have a particular shape. In turn the format definitions depend on the type checker for type checking. So the question is which should we build first?
The problematic code is in the type checker for the core language
| Pexp_constant(Const_string (str, _) as cst) -> ( (* Terrible hack for format strings *) let ty_exp = expand_head env ty_expected in let fmt6_path = Path.(Pdot (Pident (Ident.create_persistent "CamlinternalFormatBasics"), "format6", 0)) in let is_format = match ty_exp.desc with | Tconstr(path, _, _, _) when Path.same path fmt6_path -> if !Clflags.principal && ty_exp.level <> generic_level then Location.prerr_warning loc (Warnings.Not_principal "this coercion to format6"); true | _ -> false in
The name of the
format6e in the commit) datatype is hardwired into the compiler which is used to determine whether a string literal is format expression. The rather telling comment should have alerted us of a code hazard coming up ahead.
An effective hack
I tried a variety of build strategies to get the compiler to build. My common approach was to separate out the problematic parts: the type checker and the format library. I tried carefully staging the changes to each part by building the base compiler one component at a time without the new standard library. However, this approach fails when you get to stage where you have to lift changes into the bootstrapping compiler as its construction depends on the standard library which now does not type check. I tried various variations of this strategy, but with no luck.
Meanwhile Leo had an eureka. How about we counter one hack with another hack? Leo suggested that we allow both datatypes
format6e simultaneously. Thus, I made
fmt6e_path, an alternative version
fmt6_path, which would point to the new
format6e datatype and extended the case guard accordingly. The code fragment below contains both changes (annotated with
(* change *)).
| Pexp_constant(Const_string (str, _) as cst) -> ( (* Terrible hack for format strings *) let ty_exp = expand_head env ty_expected in let fmt6_path = Path.(Pdot (Pident (Ident.create_persistent "CamlinternalFormatBasics"), "format6", 0)) in let fmt6e_path = (* change *) Path.(Pdot (Pident (Ident.create_persistent "CamlinternalFormatBasics"), "format6e", 0)) in let is_format = match ty_exp.desc with | Tconstr(path, _, _, _) when Path.same path fmt6_path || Path.same path fmt6e_path -> (* change *) if !Clflags.principal && ty_exp.level <> generic_level then Location.prerr_warning loc (Warnings.Not_principal "this coercion to format6"); true | _ -> false in
This hack allows
format6e names to coexist — this is only intended as a temporary hack while bootstrapping the compiler. Still some care is required to build the compiler. The bootstrapping process still fails if we try to apply the standard bootstrapping recipe at this stage. Rather I proceeded as follows
- Patch the type checker to include
- Build base system without effect types in the standard library via
make clean runtime coreall.
- Lift the patched typechecker into the compiler via
- Cherry pick Leo's commit which ascribes effect types to the standard library.
- Revert the type checker to my patched version.
- Build the base system with effect types in the standard library via
make clean runtime coreall.
- Patch the type checker to use the new
- Lift the changes into the compiler via
After step 8 I had a bootstrapping compiler with an effect-typed standard library which I could use to build a native version with effect typing enabled. Success!
On one hand compiler bootstrapping can be somewhat mind bending. In particular when your dependency graph has cycles. On the other hand, going through this process gave me a lot of insight into the more delicate and intricate details of compiler bootstrapping.
I am hoping to have a working prototype with structural effect typing by the end of next week.
 L. White, S. Dolan, M. Pretnar, and K. Sivaramakrishnan, “Effective programming: Bringing algebraic effects and handlers to OCaml.” Keynote at HOPE, Nara, Japan, 2016.