C++20 Compile-time Metamath Proof Verification using C'est

Published: by Creative Commons Licence

This post introduces C'est, a non-standard version of the C++20 standard library, with enhanced support for compile-time evaluation. The C'est library is also shown applied to the porting of an existing C++ Metamath database verifier for compile-time execution.

C++20 includes support for dynamic memory allocation within constant expressions. GCC 10 and Clang 10 both support the feature, originally one of seven proposals included under "Relaxations of constexpr restrictions": P1064, P1002, P1327, P1330, P1331, P1668 and P0784. Of these, P0784 (More constexpr containers), includes details on "constexpr new-expressions". The following example demonstrates basic usage, and will compile and run with the -std=c++2a flag (or /std:c++latest in MSVC):

#include <cassert>

constexpr bool all_ok()
{
  int *p = new int{42};
  bool b = 42 == *p;
  delete p;

  return b;
}

int main(int argc, char *argv[])
{
  assert(all_ok());
  static_assert(all_ok());

  return 0;
}

Of course, many parts of C++ are still unavailable within constant expressions: variables of non-literal type; static variables in constexpr functions; reinterpret_cast; an invocation of a non-constexpr function; to list just a few. Nevertheless, the inclusion of dynamic memory allocation in constant expressions fires the imagination and feels like a step change. Complex compile-time programs can be less ad-hoc, ornate and fragile. The opportunity presents itself to port substantial, pre-existing, idiomatic C++ programs for execution at compile-time.

The movement is somehow reminiscent of the introduction of NVIDIA's CUDA. In 2007 CUDA allowed developers with less focus on graphics programming and shaders to use C/C++ code on GPUs. C++20 similarly allows those averse to template metaprogramming an avenue to transport realistic C++ programs, with loops, lambdas, new-expressions, etc., to the realm of compile-time execution. There is furthermore a similar requirement for deft syntactic placement of function specifiers: just as __device__ is required with CUDA, or __restrict(amp) with C++AMP, so is constexpr required in C++20. In contrast, Khronos SYCL requires no such annotations.

As more language components are permitted within constant expressions, so support broadens within the standard library. For example, all methods of std::array are constexpr; as are most function templates within the standard algorithms library. The following C++20 program will compile in GCC/Clang:

#include <array>
#include <algorithm>
#include <functional>

int main(int argc, char *argv[])
{
  using namespace std;
  static_assert(3==*for_each_n(begin(array{3,2,1}), 0, negate{}));
  return 0;
}

Thanks to Louis Dionne's adopted proposals P0980 and P1004, C++20's support for dynamic memory allocation within constant expressions enables constexpr functionality for the venerable std::vector and std::string of the standard library. Alas, neither is yet available in libstdc++, libc++, or Microsoft's STL. Regardless, assuming a package has a dependency on the standard library, it is as likely to rely on other containers and objects; for example std::list, std::deque, std::map, std::unique_ptr and std::cout. Given the many compile-time features now available at the language level, must we wait for C++2b before experimenting with larger compile-time programs?

In considering wider compile-time support for the C++ standard library, it is worth pausing to emphasise something that will not be possible, even when P0980 and P1004 do have implementation support. The code below shows three top-level constexpr declaration attempts; each will fail to compile in C++20. P0784 allows only for transient constexpr allocations (fleeting, perhaps?): memory allocated within a constant expression must be de-allocated by the end of it. While the first of these, the double pointer p, makes this especially clear, each suffers from the same failing: constexpr memory allocated within each constant expression is not de-allocated. Values calculated using dynamic allocation can for now be saved into statically sized types, such as std::array; while the propconst qualifier of P1974 may provide a solution for C++23.

#include <vector>
#include <string>

constexpr double *p = new double{0.577215};
constexpr std::vector v{0,1,1,2,3,5,8};
constexpr std::string str = "rosebud";

A number of standard library containers must distinguish memory allocation from value construction. Helpfully, all methods of the default Allocator, std::allocator, are qualified as constexpr in C++20. Nevertheless, an additional concern for compile-time container implementations does arise. P0593 was also adopted in C++20, and introduces the notion of implicit lifetime types. Feral, idiomatic codes commonly use values of such elementary types as soon as the memory has been allocated; i.e. without a constructor call. This is now defined behaviour in C++20 runtime programs … but not within constant expressions; where it will produce a compilation error. Consequently, a call to std::construct_at (or std::allocator_traits<Alloc>::construct) is necessary; a placement new expression is not an option as only replaceable (global) allocation functions can be used within constant expressions. A destructor call (via std::destroy_at) remains optional; destruction of a value with an implicit lifetime type executes no code, but does formally end its lifetime. The following code correctly constructs the 4th array element before writing to it:

#include <memory>

constexpr void alloc_ok()
{
  std::allocator<int> alloc;
  int *p = alloc.allocate(4);

  std::construct_at(&p[3]);
  p[3] = 42;

  alloc.deallocate(p,4);
}

Many common C++ programming idioms, including those within container implementations, require equality comparison of memory addresses. Alas, GCC produces a compilation error when comparing addresses from separate allocations within a constant expression. GCC's __builtin_constant_p predicate determines if its argument is constant at compile-time. Comparing two pointers, of compatible types, within a constant expression is only problematic for GCC when they come from separate allocations. But, as they come from separate allocations, we know that their address values should not be equal; and __builtin_constant_p can spot this. Consequently, an equality function template can be defined. The operators of built-in types, including pointers, cannot be overloaded; so operator== is not an option. The C'est library does not in fact go down this road; and is consequently only compatible with Clang. The code below illustrates the idea anyway.

#include <cassert>
#include <type_traits>

template <typename T>
constexpr bool equal_to(const T& lhs, const T& rhs)
{
  if constexpr (!std::is_pointer_v<T>)
    return lhs==rhs;

  if (std::is_constant_evaluated() && !__builtin_constant_p(lhs==rhs))
    return false;

  return lhs==rhs;
}

constexpr void ptr_compare()
{
  int *p1 = new int{17}, *p2 = new int{17};

#ifdef __clang__
  assert(p1!=p2);
#else
  assert(!equal_to(p1,p2));
#endif

  delete p1, delete p2;    
}

int main(int argc, char *argv[])
{
  assert((ptr_compare(),true));
  static_assert((ptr_compare(),true));

  return 0;
}

So then, the C'est project aims to provide a non-standard version of the C++ standard library, with enhanced support for compile time evaluation, for use in research. Header files in C'est are named after those in the standard library: where there was #include <vector>, with C'est we have #include "cest/vector.hpp". Classes and objects from the C'est library reside within the cest namespace; and consequently usage such as std::deque would become cest::deque. The C'est library has incomplete support for the following class templates and containers: vector, string, forward_list, list, set, map, queue, deque, istringstream, unique_ptr and complex. Given a constexpr container, most function templates from the algorithm and numeric libraries can now also be used within a constant expression. To simplify the porting of existing codes cout, cerr and associated streaming operators are also available; currently they do nothing at compile time, while at runtime they behave as usual. Please visit the repository at https://github.com/pkeir/cest and give it a whirl.

Metamath is a formal language for the specification of mathematical proofs, accompanied by a program for their verification and analysis. The Metamath website lists over a dozen proof verifiers. The C++ version listed there, checkmm.cpp is written by Eric Schmidt, and presented a tempting and pragmatic goal as a port for compile time execution. The C++ source code is a single 1400-line C++ file, with the C++ standard library its only dependency; it uses the queue, string, set, deque, vector, pair and map containers; the algorithm library's set_intersection and find; as well as cout, cerr and operator<<. Having completed the port, the following code will compile with no error; indicating that the snippet of Metamath included, a declaration of 9 constant symbols, is valid. Removing the dollar sign from the $c constant symbol keyword is enough to introduce a compilation error.

#include "ctcheckmm.hpp"

constexpr int basic_checkmm_test()
{
  checkmm app;
  cest::string txt = R"($c 0 + = -> ( ) term wff |- $.)";
  return app.run("", txt);
}

static_assert(EXIT_SUCCESS == basic_checkmm_test());

Changes to the original C++ code naturally began with the addition of the constexpr qualifier to all functions. With static data prohibited in constexpr functions, a simple class was introduced to allow the static data to exist instead as a non-static data member; existing functions became its members. Of course, within the constant expression, input no longer comes from a Metamath file read at runtime; instead a string is initialised, and so too an istringstream, as the argument for an unchanged nexttoken(istream&) function; meanwhile at runtime, the program still constructs an ifstream. Until std::embed of P1040 is adopted, a #include directive is used to initialise a string based on the contents of the file indicated by a magic MMFILEPATH preprocessor token. Compile-time verification of larger Metamath databases can take some time, and it is simplest to set Clang's -fconstexpr-steps parameter to the maximum possible value of 2147483647. The final program is configured to compile-time verify the file specified by MMFILEPATH; while still accepting a file path at runtime via argv[1]. A suitable compilation command is: clang++ -std=c++2a -fconstexpr-steps=2147483647 -I $CEST_INCLUDE main.cpp -DMMFILEPATH=peano.mm.raw.

#include "ctcheckmm.hpp"

#define xstr(s) str(s)
#define str(s) #s

constexpr int run_checkmm()
{
  checkmm app;

#ifdef MMFILEPATH
  cest::string txt =
#include xstr(MMFILEPATH)
;
  return app.run("", txt);
#else
  return EXIT_SUCCESS;
#endif
}

int main(int argc, char ** argv)
{
  if (argc != 2)
  {
    cest::cerr << "Syntax: checkmm <filename>" << cest::endl;
    return EXIT_FAILURE;
  }

  static_assert(EXIT_SUCCESS == run_checkmm());

  checkmm app;
  int ret = app.run(argv[1]);

  return ret;
}

The repository for the C'est library is at https://github.com/pkeir/cest. For ctcheckmm, the compile-time Metamath database verifier which uses C'est, see https://github.com/pkeir/ctcheckmm. The C'est project features contributions from Andrew Gozillon of the University of the West of Scotland; and Jules Penuchot of Université Paris-Saclay. Additional help from Hossein Haeri of Université Catholique de Louvain is gratefully received.