C++20 Compile-time Metamath Proof Verification using C'est
This post introduces C'est, a non-standard version of the C++20 standard library, with enhanced support for compile-time evaluation. The C'est library is also shown applied to the porting of an existing C++ Metamath database verifier for compile-time execution.
C++20 includes support for dynamic memory allocation within constant
expressions. GCC 10 and Clang 10 both support the feature, originally one of
seven proposals included under "Relaxations of constexpr restrictions":
P1064, P1002,
P1327, P1330,
P1331, P1668 and
P0784. Of these, P0784 (More constexpr
containers), includes details on "constexpr new-expressions". The following
example demonstrates basic usage, and will compile and run with the
-std=c++2a
flag (or /std:c++latest
in MSVC):
#include <cassert>
constexpr bool all_ok()
{
int *p = new int{42};
bool b = 42 == *p;
delete p;
return b;
}
int main(int argc, char *argv[])
{
assert(all_ok());
static_assert(all_ok());
return 0;
}
Of course, many parts of C++ are still unavailable within constant expressions:
variables of non-literal type; static
variables in constexpr
functions;
reinterpret_cast
; an invocation of a non-constexpr
function; to list just a
few. Nevertheless, the inclusion of dynamic memory allocation in constant
expressions fires the imagination and feels like a step change. Complex
compile-time programs can be less ad-hoc, ornate and fragile. The opportunity
presents itself to port substantial, pre-existing, idiomatic C++ programs for
execution at compile-time.
The movement is somehow reminiscent of the introduction of NVIDIA's CUDA. In 2007 CUDA allowed developers with less focus on graphics programming and shaders to use C/C++ code on GPUs. C++20 similarly allows those averse to template metaprogramming an avenue to transport realistic C++ programs, with loops, lambdas, new-expressions, etc., to the realm of compile-time execution. There is furthermore a similar requirement for deft syntactic placement of function specifiers: just as __device__
is required with CUDA, or __restrict(amp)
with C++AMP, so is constexpr
required in C++20. In contrast, Khronos SYCL requires no such annotations.
As more language components are permitted within constant expressions, so
support broadens within the standard library. For example, all methods of
std::array
are constexpr
; as are most function templates within the
standard algorithms library. The following C++20 program will compile in GCC/Clang:
#include <array>
#include <algorithm>
#include <functional>
int main(int argc, char *argv[])
{
using namespace std;
static_assert(3==*for_each_n(begin(array{3,2,1}), 0, negate{}));
return 0;
}
Thanks to Louis Dionne's adopted proposals P0980 and
P1004, C++20's support for dynamic memory allocation
within constant expressions enables constexpr
functionality for the venerable
std::vector
and std::string
of the standard library. Alas, neither is yet
available in libstdc++, libc++, or Microsoft's STL. Regardless, assuming a
package has a dependency on the standard library, it is as likely to rely on
other containers and objects; for example std::list
, std::deque
,
std::map
, std::unique_ptr
and std::cout
. Given the many compile-time
features now available at the language level, must we wait for C++2b before
experimenting with larger compile-time programs?
In considering wider compile-time support for the C++ standard library,
it is worth pausing to emphasise something that will not be possible, even
when P0980 and P1004 do
have implementation support. The code below shows three top-level constexpr
declaration attempts; each will fail to compile in C++20.
P0784 allows only for transient constexpr
allocations (fleeting, perhaps?): memory allocated within a constant
expression must be de-allocated by the end of it. While the first of these,
the double
pointer p
, makes this especially clear, each suffers from the
same failing: constexpr
memory allocated within each constant expression is
not de-allocated. Values calculated using dynamic allocation can for now be
saved into statically sized types, such as std::array
; while the propconst
qualifier of P1974 may provide a solution for C++23.
#include <vector>
#include <string>
constexpr double *p = new double{0.577215};
constexpr std::vector v{0,1,1,2,3,5,8};
constexpr std::string str = "rosebud";
A number of standard library containers must distinguish memory
allocation from value construction. Helpfully, all methods of the default
Allocator, std::allocator
, are qualified as constexpr
in C++20.
Nevertheless, an additional concern for compile-time container implementations
does arise. P0593 was also adopted in C++20, and
introduces the notion of implicit lifetime types. Feral, idiomatic codes
commonly use values of such elementary
types as soon as the memory has been allocated; i.e. without a
constructor call. This is now defined behaviour in C++20 runtime
programs … but not within constant expressions; where it will produce a
compilation error. Consequently, a call to std::construct_at
(or
std::allocator_traits<Alloc>::construct
) is necessary; a placement
new expression
is not an option as only
replaceable
(global) allocation functions can be used within constant expressions. A
destructor call (via std::destroy_at
) remains optional; destruction of a
value with an implicit lifetime type executes no code, but does formally end
its lifetime. The following code correctly constructs the 4th array element
before writing to it:
#include <memory>
constexpr void alloc_ok()
{
std::allocator<int> alloc;
int *p = alloc.allocate(4);
std::construct_at(&p[3]);
p[3] = 42;
alloc.deallocate(p,4);
}
Many common C++ programming idioms, including those within container
implementations, require equality comparison of memory addresses. Alas, GCC
produces a compilation error when comparing addresses from separate allocations
within a constant expression. GCC's __builtin_constant_p
predicate determines
if its argument is constant at compile-time. Comparing two pointers, of
compatible types, within a constant expression is only problematic for GCC when
they come from separate allocations. But, as they come from separate
allocations, we know that their address values should not be equal; and
__builtin_constant_p
can spot this. Consequently, an equality function
template can be defined. The operators of built-in types, including pointers,
cannot be overloaded; so operator==
is not an option. The C'est library does
not in fact go down this road; and is consequently only compatible with Clang.
The code below illustrates the idea anyway.
#include <cassert>
#include <type_traits>
template <typename T>
constexpr bool equal_to(const T& lhs, const T& rhs)
{
if constexpr (!std::is_pointer_v<T>)
return lhs==rhs;
if (std::is_constant_evaluated() && !__builtin_constant_p(lhs==rhs))
return false;
return lhs==rhs;
}
constexpr void ptr_compare()
{
int *p1 = new int{17}, *p2 = new int{17};
#ifdef __clang__
assert(p1!=p2);
#else
assert(!equal_to(p1,p2));
#endif
delete p1, delete p2;
}
int main(int argc, char *argv[])
{
assert((ptr_compare(),true));
static_assert((ptr_compare(),true));
return 0;
}
So then, the C'est project aims to provide a non-standard version of the C++
standard library, with enhanced support for compile time evaluation, for use in
research. Header files in C'est are named after those in the standard library:
where there was #include <vector>
, with C'est we have #include
"cest/vector.hpp"
. Classes and objects from the C'est library reside within
the cest
namespace; and consequently usage such as std::deque
would become
cest::deque
. The C'est library has incomplete support for the following class
templates and containers: vector
, string
, forward_list
, list
, set
,
map
, queue
, deque
, istringstream
, unique_ptr
and complex
. Given a
constexpr
container, most function templates from the algorithm and numeric
libraries can now also be used within a constant expression. To simplify the
porting of existing codes cout
, cerr
and associated streaming operators are
also available; currently they do nothing at compile time, while at runtime
they behave as usual. Please visit the repository at
https://github.com/pkeir/cest and give it
a whirl.
Metamath is a formal language for the specification
of mathematical proofs, accompanied by a program for their verification and
analysis. The Metamath website lists over a dozen proof verifiers. The C++
version listed there,
checkmm.cpp is written by Eric
Schmidt, and presented a tempting and pragmatic goal as a port for compile time
execution. The C++ source code is a single 1400-line C++ file, with the C++
standard library its only dependency; it uses the queue
, string
, set
,
deque
, vector
, pair
and map
containers; the algorithm library's
set_intersection
and find
; as well as cout
, cerr
and operator<<
.
Having completed the port, the following code will compile with no error;
indicating that the snippet of Metamath included, a declaration of 9 constant
symbols, is valid. Removing the dollar sign from the $c
constant symbol
keyword is enough to introduce a compilation error.
#include "ctcheckmm.hpp"
constexpr int basic_checkmm_test()
{
checkmm app;
cest::string txt = R"($c 0 + = -> ( ) term wff |- $.)";
return app.run("", txt);
}
static_assert(EXIT_SUCCESS == basic_checkmm_test());
Changes to the original C++ code naturally began with the addition of
the constexpr
qualifier to all functions. With static
data
prohibited in constexpr
functions, a simple class was introduced to allow
the static
data to exist instead as a non-static data member; existing functions became its members. Of course,
within the constant expression, input no longer comes from a Metamath file read
at runtime; instead a string
is initialised, and so too an istringstream
, as
the argument for an unchanged nexttoken(istream&)
function;
meanwhile at runtime, the program still constructs an ifstream
. Until
std::embed
of P1040 is adopted, a #include
directive is used to initialise a string
based on the contents of
the file indicated by a magic MMFILEPATH
preprocessor token.
Compile-time verification of larger Metamath databases can take
some time, and it is simplest to set Clang's -fconstexpr-steps
parameter to the maximum possible value of 2147483647
.
The final program is configured to compile-time verify the file specified
by MMFILEPATH
; while still accepting a file path at runtime via argv[1]
.
A suitable compilation command is:
clang++ -std=c++2a -fconstexpr-steps=2147483647 -I $CEST_INCLUDE main.cpp -DMMFILEPATH=peano.mm.raw
.
#include "ctcheckmm.hpp"
#define xstr(s) str(s)
#define str(s) #s
constexpr int run_checkmm()
{
checkmm app;
#ifdef MMFILEPATH
cest::string txt =
#include xstr(MMFILEPATH)
;
return app.run("", txt);
#else
return EXIT_SUCCESS;
#endif
}
int main(int argc, char ** argv)
{
if (argc != 2)
{
cest::cerr << "Syntax: checkmm <filename>" << cest::endl;
return EXIT_FAILURE;
}
static_assert(EXIT_SUCCESS == run_checkmm());
checkmm app;
int ret = app.run(argv[1]);
return ret;
}
The repository for the C'est library is at https://github.com/pkeir/cest. For ctcheckmm, the compile-time Metamath database verifier which uses C'est, see https://github.com/pkeir/ctcheckmm. The C'est project features contributions from Andrew Gozillon of the University of the West of Scotland; and Jules Penuchot of Université Paris-Saclay. Additional help from Hossein Haeri of Université Catholique de Louvain is gratefully received.