Sparks

C++20 Coroutines: a short series

4: Implementing simple generator coroutines in C++20

Coroutines: Basics, Py->C++20, Benchmarking

This is a mini-series leading up to implementing coroutines in C++20, (“what I learnt on my holiday”). It also has comparison with other implementations, as part of my ModernCPP explorations. (first post)

In a previous post I showed some sample generators in C++ and noted that you could not know what they really did without knowing about the implementations. In the last post, I discussed the implementation in C++03, so naturally this post discusses the C++20 implementation.

Implementation

The implementation discussed here is in my “blog-extras” repo on github, and specifically the base implementation can be found here:

Basic usage of this can be found here:

And here:

Implementation Overview

This is more complex than C++03 or python generators, so I’ll break in down in parts:

  • Templated to allow changing the return type
  • An external API for using the generator
  • An internal API for the compiler’s generated code to call when managing/running the coroutine
  • An external API for use in for loops.

In more detail:

  • Templated to allow changing the return type
  • Provides an external API for things that are simple_generators to use:
    • constructors/destructors (normal, move)
    • start() - to start the generator
    • try_next() - to give the generator some CPU time
    • take() - to get the last returned value
    • running() - to confirm it’s running/runnable
  • An internal API for the compiler to call when certain things happen in the coroutine body, including:
    • How to construct it
    • What to do when the coroutine is started.
    • What to do when a co_yield statement is run.
    • What to do if you drop off the bottom of the coroutine
    • What to do if co_return is called.
  • An external API designed to be used by the ranges format of for, to allow for the Modern C++ style of iteration through a thing.

The fact that these 3 APIs are a bit intermingled can be confusing upfront - most tutorials I’ve seen don’t seem to look at it from this perspective.

Out of these:

  • The external API you provide is up to you. If you want a particularly wacky style of API, you can do this. The fact there isn’t a standard user API (like python’s) I feel is problematic. There’s one coming in C++23 apparently. I’ve not looked into details yet though because compiling C++20 can be challenging depending on the features you need, let alone C++23.

  • The internal API is well defined and fixed. You have to implement this, and it has a predetermined name - promise_type . This is incredibly badly named. A better name would be “coroutine_frame_state_machine” or similar.

  • The external API support for ranges/iterator protocol is well defined and mercifully brief.

High Level C++20 Skeleton

The high level skeleton for our code looks like this:

template<typename T>
class simple_generator {
public:
    class promise_type; // Forward decl - to be defined lower inside this class
    using handle_type = std::coroutine_handle<promise_type>;
private:
    handle_type mCoro;

public:
    // External API implementation

    class promise_type { } // Internal coroutine API
    class iterator { } // Actual iterator impl
    // iterator external API
    // TBD - see below
}

Of note:

  • We must have an internal structure called (or findable as) promise_type. If you don’t the compiler complains, exits and has an identity crisis.

  • You have to use that handle_type in lots of places. You can call it what you like, but something shorter than it’s full name is a blessing. Most people tend to use something like handle_type as far as I can see.

  • When your generator is created, via a bounce, you are given a handle to the frame for your coroutine. This is of type handle_type, and you interact with your coroutine using this. So you need to have a place to store one. Here I call it mCoro.

Constructors / etc - Basics needed to function

You don’t really have many options here, so we’ll cover them here.

    explicit simple_generator(handle_type h) : mCoro(h) {}

As noted, when someone uses your coroutine, your coroutine gets created. Because the compiler detects the co_yield, co_await or co_return keywords, it knows it needs to construct a co-routine handle (which uses your promise type) and uses that to initialise your coroutine. As a result, this is why you end up with this odd looking initialiser. (odd, because you never trigger this call this directly)

The next two are the move constructor and move assignment constructor. These are pretty standard fare at this point:

    simple_generator(simple_generator &&other_sg) noexcept : mCoro(other_sg.mCoro) {
        other_sg.mCoro = nullptr;
    }

    simple_generator &operator=(simple_generator &&other) noexcept {
        if (this != other) {
            mCoro = other.mCoro;
            other.mCoro = nullptr;
            return *this;
        }
    }

If you don’t understand what’s happening here, please go off and read up on move constructors and move assignment constructors. (We’ll be here when you’re back.) This stuff is necessary if you want to do something like move coroutines into a vector or queue for example. (think: run queue)

We want to remove copy constructors, because things will not work the way you want if you leave them there:

    simple_generator(const simple_generator &) = delete;
    simple_generator &operator=(const simple_generator &) = delete;

Lastly we can’t just use the default destructor, since we need some explicit shutdown type code:

    ~simple_generator() { if (mCoro) { mCoro.destroy(); } }

So far so good/normal.

External API - used by people using these generators

So let’s start with the external API that we want to provide. The decisions we make here will affect everything else. Bear in mind, I didn’t develop this API from this approach, but derived it from experimentation, fiddling, thinking and refactoring.

First off some design choices. When reviewing the C++20 coroutine interface, I initially started doing so thinking like a python programmer. (hence why my “old” coroutine interface copies python and uses a StopIteration exception). When I stopped and rethought about it, from a C++ perspective it made more sense. There are some platforms (not many) where exceptions are problematic.

So where, in python, you might write this…

    try:
        x = next(f)
    except StopIteration:
        print("Done")

… because this matches python’s norm of “Better to Ask forgiveness”, you would normally try to avoid forcing exceptions onto your user in C++20 - even if exception handling is the recommended approach in Modern C++. The reason for this, because in C++ the default is normally “look before you leap”. The equivalent approach is something like this:

    if (f.running() ) {
        f.try_next();
        x = f.take();
    } else {
        std::cout << "Done\n");
    }

So this gives us our basic external API:

  • void start() - Function just to make it clearer we’re starting the generator. Actually calls try_next()
  • void try_next()- resumes mCoro - which is the actual thread of control. We also rethrow any exception raised within the coroutine to avoid masking errors.
  • bool running() - returns the opposite of mCoro.done - which gets set by the compiler generated code when the coroutine can’t be resumed.
  • T take() - Returns/moves the last value yielded back to the caller.

The implementation of this external API is relatively brief.

    // Implementation of the external API called by the user to actually use the generator
    void start() {  try_next(); }
    bool running() { return not mCoro.done(); }
    void try_next() {
        mCoro.resume();
        if (mCoro.promise().m_latest_exception)
            std::rethrow_exception(mCoro.promise().m_latest_exception);
    }
    T take() { return std::move(mCoro.promise().m_current_value); }

We don’t really need a start() method, but it does make things clearer.

Internal API to manage the coroutine frame - promise_type

The questions here are:

  • What is mCoro - how is it created?
  • What’s with this promise() method?
  • Where do the attributes m_latest_exception and m_current_value get set?

Well, these are all handled by the compiler creating code that calls something you create called a promise_type object. This is badly named, but really implements functions within a simple state machine to control activities during a coroutine. The reason for this is because C++20 coroutines leave it up to you how they operate. This is both useful, but also annoying if you just want to use them.

The promise_type handles

Rather than dive into the implementation of promise_type, let’s step back to our fibonacci and see what needs handling. I’ll annotate those points here:

simple_generator<int>  // NOTE: Need to create this return object
fibs(int max)          // NOTE: Do we run immediately or wait?
{
    int a{1}, b{1}, n{0};
    for(int i=0; i< max; i++) {
        co_yield a;   // NOTE: How do we handle the yielded value?
        n = a + b;
        a = b;
        b = n;
    }
    // NOTE: What if an exception was thrown and not caught?
    // NOTE: What if they just did co_return?
    // NOTE: What if we reached the end normally?
}

Rather than provide you with default behaviours for these, you’re expected to pick an implementation and use that. So let’s provide an implementation.

Note, you can’t just ignore this - there are no defaults. If you try, the compiler whinges at you that you’ve missed off bits. (I tested this by starting with an empty promise_type and adding pieces as necessary)

The promise_type skeleton

Let’s taking this API a piece at a time. First of all the skeleton of the promise_type itself. This sits within our simple_generator class:

    // Implementation of the internal API called when co_yield/etc are triggered inside the coroutine
    class promise_type {
        T m_current_value;
        std::exception_ptr m_latest_exception;
        friend simple_generator;
    public:
        // TBD - Note - all these methods need to be public
        ...
    }

promise_type and simple_generator initialisation

First, let’s look at the how the mCoro object gets created, as part of the initialisation process.

Specifically compiler looks for WHATEVER::promise_type::get_return_object in order to create a promise_type object to manage the coroutine stack frame. In our case it looks like this:

        // NOTE: Need to create this return object
        auto get_return_object()      { // Inside promise_type
            { return simple_generator{ handle_type::from_promise(*this) } ; }

This essentially calls the simple_generator constructor:

    explicit simple_generator(handle_type h) : mCoro(h) {}

This is where our mCoro object comes from. The type of this is handle_type. That’s an alias for std::coroutine_handle<promise_type>. This allows the external API to interact with the coroutine hande.

promise_type state machine management

Now let’s step through our simple_generator the promise_type methods needed and what they relate to. All these methods sit within promise_type.

The functionality to actually drive the coroutine is relatively short, so we’ll use the comments we added to the generator and place them next to the implementation. Note that next to each I’ve also annotated reponses to the notes from the generator.

        // NOTE: Need to create this return object
        auto get_return_object()      {
            { return simple_generator{ handle_type::from_promise(*this) } ; }

        // NOTE: Do we run immediately or wait?
        //     -> We wait for a resume
        auto initial_suspend()        { return std::suspend_always{}; }

        // NOTE: How do we handle the yielded value?
        //     -> We get passed a value, so we capture it
        //     -> We then wait to resume
        auto yield_value(T some_value)
            m_current_value = some_value;
            return std::suspend_always{};

        // NOTE: What if an exception was thrown and not caught?
        //     -> Catch it to allow rethrowing to the caller
        auto unhandled_exception()    { m_latest_exception = std::current_exception(); }

        // NOTE: What if they just did co_return?
        //      -> Should not resume after co_return
        auto return_void()            { return std::suspend_never{}; }

        // NOTE: What if we reached the end normally?
        //      -> If we suspend_never, this promise gets destroyed here
        //       *and* in our simple_generator definition (segfault)
        auto final_suspend() noexcept { return std::suspend_always{}; }
}

As a result, ultimately these methods control the statemachine. Some key points:

  • The compiler generates code that calls these callbacks as noted.
  • The generated code calls our yield_value coroutine when co_yield value is called, and provides us with a value we can capture in the promise object.
  • We can do the same to capture any exception thrown from within the coroutine.
  • The promise_type object is accessible via mCoro.promise(), which gives the external API access to this.

Control flow example

  1. simple_generator<int> fib = fibbonaci() Is called, but that actually causes

    • simple_generator::promise_type::get_return_object() is to be called
    • That creates an object of type std::coroutine_handle<simple_generator::promise_type>
    • And that uses simple_generator::simple_generator(handle h) to actually create the simple_generator<int>
  2. Additionally, promise_type::initial_suspend is called - which controls what happens immediately after construction. In our generators we suspend immediately.

  3. The user calls fib.start(). This actually does this:

    • Calls try_next
    • That calls mCoro.resume() to give it some CPU. (Internally mCoro.done() may get toggled)
    • That may raise an exception which wasn’t handled. If there was it gets rethrown to the caller.
  4. Inside the resumption, inside the coroutine, the code may co_yield a to return a value to the generator consumer.

    • That triggers *yield_value(some_value)* to be called
    • Inside there, we store some_value inide the promise_type object as m_current_value,
    • If an exception was raise we store that inside the promise_type as m_latest_exception (let’s assume here that doesn’t happen)
  5. The user then calls fib.running() to see what to do next

    • This returns the logical inverse of mCoro.done()
    • Let’s assume initially this returns true.
  6. The user calls fib.try_next()

    • That calls mCoro.resume() to give it some CPU. (Internally mCoro.done() may get toggled)
    • That may raise an exception which wasn’t handled. If there was it gets rethrown to the caller.
  7. The user then calls fib.running() to see what to do next

    • This returns the logical inverse of mCoro.done()
    • Let’s assume initially this returns true. If it does, we can loop back to 4.
    • If it doesn’t we carry on.
  8. The reason we got true from mCoro.done() is because either:

    • The coroutine exited at the end of the body
    • An exception was raised
    • co_return was called. But either way, the coroutine has finished.
  9. If the co_routine finishes by co_returning a value, promise_type::return_void gets called, which renders the coroutine unrestartable.

  10. If the co_routine finishes by an unhandled_exception**, thenpromise_type::unhandled_exception` gets called, which in our case doesn’t destroy the promise

  11. Simlar for normal exit - final_suspend is called

  12. Lastly, as fib() drops out of scope, the destructor is called, which destroys the handle if it is (still) valid.

Interlude

It’s worth noting, we’re not done yet!!! There’s still the ranges/iterator support API to write. When you consider this, there’s a remarkably large amount to deal with, from hidden compiler transforms to a whole slew of other details.

Get one wrong and the house of cards falls down.

Supporting range/iterator in C++20 Coroutine based generator

This consists of two halves:

  • An internal implementation
  • An external API

We’ll start with the latter and work backwards.

External simple_generator iterator API

The API for this is short, and sits directly withing simple_generator:

    iterator begin() { return iterator{*this, false}; }
    iterator end() { return iterator{*this, true}; }

This creates begin() and end endpoints which are used to the ranges API to determine whether it’s done or not. And that’s literally the since the prototype for iterator is: the simple_generator being iterated and a flag to signify done or not.

For out iterator to function, it needs

  • to capture the owner of the iterator - the the simple_generator being iterated
  • a done flag
  • a means of driving the generator forward.

These look like this:

    class iterator {
        simple_generator<T> &owner;
        bool done;
        void iter_next() {
            owner.try_next();
            done = not owner.running();
        }
    // NOTE: TBD
    }

Once those are in place, we then need to implement the standard iterator operations: - operator != - to see if we’re done or not - operator * - to return the current value - operator ++ - to advance to the next value - A constructor

These look like this:

    public:
        bool operator != (const iterator &r) const { return done != r.done; }
        auto operator * () const { return owner.take(); }
        iterator &operator ++ () { iter_next(); return *this; }
        iterator(simple_generator<T> &o, bool d) : owner(o), done(d) { if ( not done ) iter_next(); }

Once these are in place the simple_generator can be used in a “ranges-style” for loop.

Where’s the code?

Update: 11/4/2023

Want to read and runcode, not the blog? I know the feeling. I recently created a github repo specifically for putting code from blog posts in. Though recognising that it might also have non-code stuff added at some point, I’ve called it blog-extras.

The blog-extras repo can be found here:

The code specific to this series of short posts can be found here:

That repo will get fleshed out with code/examples from other posts in this blog too.

Brief Discussion

That was an awful lot to go through, of seemingly escalating complexity. There’s a longer discussion/set of thoughts on the next/last post in this short series.

However, I think it’s useful to reflect on what this escalating complexity. has bought us. It’s a lot.

Let’s take another look at the C++03 fibonnaci generator:

class Fibonnaci: public Generator {
    int a, b, s;
public:
    Fibonnaci() : a(1), b(1), s(0)  { }
    int next() {
    GENERATOR_CODE_START
        while ( true ) {
            YIELD(a);
            s = a + b;
            a = b;
            b = s;
        };
    GENERATOR_CODE_END
    };
};

… and also the C++20 fibonnaci generator:

simple_generator<int> fibs(int max) {
    int a{1}, b{1}, n{0};
    for(int i=0; i< max; i++) {
        co_yield a;
        n = a + b;
        a = b;
        b = n;
    }
}

… versus the python original:

def fibonnaci():
    a, b = 1,1
    while True:
        yield a
        a,b = b, a+b

At the end of the day, the C++20 generator looks a lot simpler and cleaner than the C++03 version. But the effort to get there was quite substantional. That said, once the simple_generator<> class is defined, it’s simple to use - and unlike the C++03 version, you’re not having to remember to build the scaffolding each time.

But is that it?

Not really, no.

NEXT POST: I got curious about performance…

Updated: 2023/09/15 21:12:34