Overview#

An overview of the fgen universe.

Aim#

The aim of fgen is to provide Python interfaces to Fortran code. More specifically, to provide Python interfaces to Fortran derived types. We have tested a number of patterns, but it is still a work in progress. If wrapping your particular derived type does not work for some reason, please raise an issue. If you would like to see what we have already tested, please see the test cases in tests/test-data.

Crossing the Python-Fortran border#

At the moment, the generated wrappers only return copies of the relevant attribute/return type. We never return pointers. This makes life generally much simpler, because you can only change values if you do so explicitly. It does, of course, come with a memory and performance cost because you end up with multiple copies of things.
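
The copy semantics can be illustrated with a pure-Python analogy. This is a minimal sketch, not fgen-generated code: HypotheticalWrapper and its values attribute are made-up names, and copy.deepcopy stands in for the copy that is made when data crosses the Python-Fortran border.

```python
import copy


class HypotheticalWrapper:
    """Stand-in for a generated wrapper (not real fgen output)."""

    def __init__(self, values):
        # Pretend this list lives on the Fortran side
        self._fortran_side_values = list(values)

    @property
    def values(self):
        # Return a copy, never a reference to the underlying data
        return copy.deepcopy(self._fortran_side_values)

    @values.setter
    def values(self, new_values):
        # Changes only happen through an explicit assignment
        self._fortran_side_values = list(new_values)


wrapper = HypotheticalWrapper([1, 2, 3])
retrieved = wrapper.values
retrieved.append(4)  # only modifies the copy held in Python
unchanged = wrapper.values  # the wrapped data is still [1, 2, 3]

wrapper.values = [3, 2, 1]  # an explicit change does go through
changed = wrapper.values
```

The price of this safety is visible in the sketch too: every access to values allocates a new list.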

There is one exception to this, which is attributes that are themselves derived types. In these cases, the behaviour depends on how the attribute is handled. The case you want will depend on what you’re doing (it may also be that you find one of these more natural, probably depending on what you’re used to). As a basic rule of thumb, if you want to be able to retrieve the attribute, make changes to it and have those changes appear on the Fortran side, without having to create a new instance of the original object (which held the attribute in the first place), you want pointers. If, instead, you want to be able to retrieve the attribute, but don’t want any changes thereafter to affect the original Fortran object, you want allocatable attributes. Another way to think about it is, if you want your class to encapsulate its attribute, i.e. you want the class to own its data so that no-one else can change the attribute without explicitly going through the class, you want your attributes to be allocatable. If you’re not sure, use allocatables because they’re more memory safe and generally easier to reason about (for further discussion on this point, see here).

We attempt to write our type hints so that they give you some guidance (e.g. not accepting the no-setters versions of our wrappers if the underlying Fortran expects a pointer), but this is a work in progress. Having said that, we mostly don’t have to think about this stuff in Python, so this might be an entirely new concept if you’re just getting started with Fortran. If you are, we recommend asking colleagues or co-workers who are more experienced for help. If you find any good docs which explain the concept, please feel free to make a merge request adding them here!

Attributes that are allocatable, derived types#

If the attribute itself is allocatable in Fortran, the Python wrapper will respect this and also not behave like a pointer. In other words, the behaviour is like the below (for more detail, see the behaviour of tests/test-data/derived_type/derived_type_via_allocatable.f90):

derived_type_attribute = DerivedType(value=[1, 2, 3])
derived_type_user = DerivedTypeUser(allocatable_derived_type_attribute=derived_type_attribute)
retrieved_attribute = derived_type_user.allocatable_derived_type_attribute

derived_type_user.allocatable_derived_type_attribute.value
# result: [1, 2, 3]

retrieved_attribute.value = [3, 2, 1]
# result: AttributeError
# We make it so that attributes that are allocatable derived types
# have no setters, so you can't even think
# that you might be able to change the underlying Fortran.

derived_type_attribute.value
# result: [1, 2, 3]
# Unaffected by the attempted assignment to retrieved_attribute (obviously).

derived_type_user.allocatable_derived_type_attribute.value
# result: [1, 2, 3]
# Unaffected by the attempted assignment to retrieved_attribute (obviously).

Pointers to derived types#

In contrast to the previous example, if the attribute itself is a pointer, the Python wrapper will behave like a pointer. In other words, the behaviour is like the below (for more detail, see the behaviour of tests/test-data/derived_type/derived_type.f90):

derived_type_attribute = DerivedType(value=[1, 2, 3])
derived_type_user = DerivedTypeUser(pointer_to_derived_type=derived_type_attribute)
retrieved_attribute = derived_type_user.pointer_to_derived_type

derived_type_user.pointer_to_derived_type.value
# result: [1, 2, 3]

retrieved_attribute.value = [3, 2, 1]
# This assignment will affect both the original derived type
# and the derived type user's attribute,
# because the derived type is handled via a pointer.

derived_type_attribute.value
# result: [3, 2, 1]
# Even though we never directly modified derived_type_attribute.value from Python!

derived_type_user.pointer_to_derived_type.value
# result: [3, 2, 1]
# Even though we never directly modified derived_type_user.pointer_to_derived_type.value from Python!

Models, steppers and solvers#

[ TODO: move this all to libfgensolving or whatever as part of magicc/fgen#28 ]

When we talk about running models, what we usually actually mean is “integrating a model” or, even more precisely, “solving the initial-value problem defined by the ordinary differential equation (ODE) that defines our model, the inputs to the model (often described as ‘forcings’) and the initial-state of the system”. In most cases, this precision isn’t needed. However, when implementing our models in code and running them, the more precise picture is extremely helpful.

The first concept is “model”. When we use the term model, what we mean is the thing which defines the physics of the system. In most cases, this means the thing which defines how the system changes in response to its inputs, i.e. dy/dt.

The second concept is “stepper”. When solving an initial-value problem numerically, we must take “steps” to solve it (assuming that we cannot or don’t want to find a continuous solution because, if we had found a continuous solution, we wouldn’t be using numerical methods in the first place). Our steppers define how we want to step forward in time when solving the model. There are a number of different techniques to do this, with some well known ones being Euler forward integration, Euler backward integration and the Runge-Kutta family of steppers. Each technique has its own strengths and weaknesses. Unfortunately, the right choice depends on the model, the model’s inputs and the time domain of interest. Becoming familiar with the strengths and weaknesses of different steppers is a worthwhile pursuit because it can be the difference between your model running quickly and solving with minimal error and a long run-time with a terrible result. Stepper isn’t a common name for this, but we find it much easier than the more complete names we have seen, such as “methods used in temporal discretisation to find approximate solutions to equations”.

The third concept is the solver. The solver combines the model with a stepper and the inputs to run (i.e. solve) the model. To do this, the solver has to do three things:

  • hold a reference to the model

  • hold the state of the system to use when starting the next time step’s integration

  • combine the above with a stepper and the model’s input (i.e. the solver is responsible for correctly linking the model with the inputs) to run/solve the model over time, returning a result

In short:

  • model: implements the physics of the system

    • keep this as thin as possible to preserve the distinction between the physics of the system and numerical integration choices (i.e. strive for loose coupling between models and steppers)

  • stepper: defines the numerical integration scheme we want to use

  • solver: combines the model, stepper and model inputs to run/solve the model
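
The separation summarised above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation in tests/test-data/two_layer_model: DecayModel, euler_forward_step and Solver are hypothetical names, and the model is a simple decay equation, dy/dt = -y / tau + forcing, chosen only because it is easy to check by hand.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DecayModel:
    """Model: only the physics, dy/dt = -y / tau + forcing."""

    tau: float

    def dydt(self, y: float, forcing: float) -> float:
        return -y / self.tau + forcing


def euler_forward_step(dydt: Callable[[float], float], y: float, dt: float) -> float:
    """Stepper: one explicit (forward) Euler step."""
    return y + dt * dydt(y)


@dataclass
class Solver:
    """Solver: links model, stepper and inputs, and holds the state."""

    model: DecayModel
    stepper: Callable[[Callable[[float], float], float, float], float]
    state: float

    def solve(self, forcings: List[float], dt: float) -> List[float]:
        results = [self.state]
        for forcing in forcings:
            # Bind this step's forcing so the stepper only sees dy/dt as a function of y
            self.state = self.stepper(
                lambda y: self.model.dydt(y, forcing), self.state, dt
            )
            results.append(self.state)
        return results


solver = Solver(model=DecayModel(tau=2.0), stepper=euler_forward_step, state=1.0)
trajectory = solver.solve(forcings=[0.0, 0.0], dt=1.0)
# trajectory is [1.0, 0.5, 0.25]
```

Because the model knows nothing about the stepper, swapping in a different integration scheme only means passing a different stepper function to the solver.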

For an example, see tests/test-data/two_layer_model, specifically tests/test-data/two_layer_model/two_layer_solver_demo.py (which shows how to use the model, stepper and solver together), tests/test-data/two_layer_model/two_layer_model.f90 (which implements the model), and tests/test-data/two_layer_model/two_layer_model_solver.f90 (which implements the solver).

fgen’s domain model#

fgen’s domain model is the following:

  • there is some Fortran that we wish to wrap.

  • alongside this Fortran, there must also be a description of the Fortran (currently, this description must be written in yaml) that provides the key details about the modules we wish to wrap. This information is required for the wrapping to happen.

    • We choose to write this information in standalone files because it is significantly simpler than trying to parse the Fortran. It also allows us to specify the things we support explicitly (rather than having to parse the entire Fortran language domain). The trade-off is that you end up duplicating some information and only certain ways of using Fortran are supported. This is a trade-off we are so far happy with.

  • based on the descriptions alone, fgen generates wrappers which allow the Fortran to be called directly from Python.

Within this higher-level domain model, fgen uses the following data model for its descriptions of packages, Fortran modules and their Python wrappers.

Package#

At the top level, we have a Package. This is just a collection of Module’s and ModuleEnumDefining’s. It is related to the idea of a package in Python, but isn’t exactly the same thing so it is best not to think about it in exactly the same way. The Package class allows us to put the information about the package and common functionality related to the package (e.g. get_module_that_provides_values_type()) in one container.

Module defining an enum#

If you don’t use enums, you can skip this section.

If you do use enums, it is possible to specify them using fgen. Fortran doesn’t have pure enums, but the same sort of behaviour can be achieved (see e.g. the enum definition in tests/test-data/exposed_attrs/enum_options.f90). fgen will ensure that such enum-like objects in Fortran appear as actual enums in Python. It does this via ModuleEnumDefining.

Each ModuleEnumDefining corresponds to a Fortran module that defines an enum (see provides). Based on this definition, we can then generate the Python-equivalent of the enum. Unlike a module that defines a derived type (see Module), an enum is defined simply by its name and the values it can take. Enums don’t have attributes or methods, hence are much simpler to write and handle.

Currently, we assume that each enum module we wrap defines one, and only one, enum. This assumption forces developers to only have one enum definition per module, which makes it much easier to navigate the Fortran. Given Fortran modules are basically free, we think this requirement does not impose any real pain on developers, but are happy to discuss use cases we have not thought of.
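
On the Python side, the result is an ordinary enum. The sketch below is illustrative only: HypotheticalOption, its member names and its integer values are assumptions made for this example, and are not taken from enum_options.f90 or from fgen’s generated output.

```python
from enum import Enum


class HypotheticalOption(Enum):
    """Illustrative Python counterpart of a Fortran enum-like."""

    FIRST = 1
    SECOND = 2
    THIRD = 3


# A value coming back from Fortran can be mapped to an enum member
option = HypotheticalOption(2)

# And a member can be turned back into the value Fortran expects
fortran_value = HypotheticalOption.THIRD.value
```

Since an enum is fully described by its name and its values, nothing else is needed to generate a class like this.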

Module#

Each Module corresponds to a Fortran module. The module data model defines the derived type (what we would call a class in Python) it provides (see provides for further details on how to define this derived type) and the other modules this module depends on, which is required for correctly compiling/running the modules (see requirements). [TODO as part of magicc/fgen#34: fix up the truncated_name vs. short-name stuff, then explain how that works too. Even though it duplicates the docstring, I would be tempted to at least briefly discuss it here as it is the thing which has tripped me up most to date.] Currently, we assume that each module we wrap defines one, and only one, derived type. This assumption forces developers to only have one derived type per module, which makes it much easier to navigate the Fortran. Given Fortran modules are basically free, we think this requirement does not impose any real pain on developers, but are happy to discuss use cases we have not thought of.

Fortran derived types#

The derived types are represented as FortranDerivedType objects. These objects represent the derived type, its attributes (attributes) and its methods (methods). At the moment, we think we can support arbitrary derived types. However, that doesn’t necessarily mean we would update our setup to support any use case. Keeping the scope of fgen limited is key to the success of the project.

Attributes of the derived type are represented by Value’s, because each attribute is simply a value.

Values#

Values are the combination of two things. The first is a UnitlessValue, which defines common quantities about the value such as its name (name), description (description), and fortran type (fortran_type). It is vital to note that the fortran type can itself be a derived type, which is how we can wrap objects that have fortran types as attributes (this allows us to wrap objects that have an arbitrarily deep tree of fortran derived types sitting underneath them). The second part of the value is the unit (unit), which defines the value’s unit.

Units#

The unit can be defined in two ways: fixed/static or dynamic. If a value’s units are specified via fgen.data_models.value.Value.unit or fgen.data_models.multi_return.MultiReturn.unit, for example as “K” or “kg”, then the units are fixed. Whenever the value is passed to Fortran from Python, it will first be converted to the specified units. As a result, in Fortran, all calculations can be done, safe in the knowledge that the value is in the desired units. When leaving Fortran, the units are reattached to the value before the Python user receives them (allowing us to provide values with quantities in the Python world). The caveat of this approach is that the value can only have the units that have been specified. There is no way for the unit to change or be defined by the Fortran.
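
The round trip for fixed units can be sketched as follows. This is a toy illustration, not fgen’s code: fgen-wrapped libraries handle units with Pint, whereas this sketch uses a small hand-written conversion table so that it needs only the standard library. TO_KG, FIXED_UNIT, to_fortran, from_fortran and fortran_double are all hypothetical names.

```python
from typing import Tuple

# Conversion factors to kilograms (a tiny, hand-written stand-in for Pint)
TO_KG = {"kg": 1.0, "g": 1e-3, "t": 1e3}

FIXED_UNIT = "kg"  # the unit declared in the (hypothetical) description


def to_fortran(magnitude: float, unit: str) -> float:
    """Convert to the fixed unit and strip the unit before calling Fortran."""
    return magnitude * TO_KG[unit] / TO_KG[FIXED_UNIT]


def from_fortran(magnitude: float) -> Tuple[float, str]:
    """Reattach the fixed unit to a bare number returned by Fortran."""
    return (magnitude, FIXED_UNIT)


def fortran_double(mass_in_kg: float) -> float:
    """Stand-in for a Fortran routine that assumes kilograms."""
    return 2.0 * mass_in_kg


bare_input = to_fortran(2.0, "t")  # 2 tonnes enter "Fortran" as 2000.0
result = from_fortran(fortran_double(bare_input))  # (4000.0, "kg")
```

The stand-in Fortran routine never sees a unit, which is the point: the conversion happens entirely at the border.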

As a result of the limitation of fixed units, we also have dynamic units. Dynamic units are useful when describing a data structure (often via a derived type) that can hold information about different “stuff” (e.g. timeseries), where the stuff doesn’t always have the same units. If a value’s units are dynamic, this can be specified via fgen.data_models.value.Value.dynamic_unit or fgen.data_models.multi_return.MultiReturn.dynamic_unit. There are two cases for dynamic units.

The first is that the dynamic units are a string. In this case, we assume that this string tells us how to get the units when we’re in Python, i.e. the string should be valid Python code. We will simply use this string directly as Python code whenever we need to get the units of a value. One example of where this is helpful is return quantities from methods, where the returned quantity’s units are simply the same as the units of some attribute of the type. For example, the units of a method double_attribute_value might simply be equal to self.attribute_value, and this can be easily defined by setting fgen.data_models.value.Value.dynamic_unit or fgen.data_models.multi_return.MultiReturn.dynamic_unit equal to the string "self.attribute_value".
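
One way to picture “using the string directly as Python code” is a code generator that pastes the string into the source it emits. The sketch below is an assumption about the general idea rather than fgen’s actual templates: HypotheticalWrapper, attribute_unit and the string "self.attribute_unit" are invented for the example, and units are plain strings rather than Pint units.

```python
DYNAMIC_UNIT = "self.attribute_unit"  # hypothetical dynamic_unit string

# Paste the string into generated source, as a code generator might
GENERATED_SOURCE = f'''
class HypotheticalWrapper:
    def __init__(self, attribute_value, attribute_unit):
        self.attribute_value = attribute_value
        self.attribute_unit = attribute_unit

    def double_attribute_value(self):
        # The dynamic unit expression appears verbatim in the generated code
        unit = {DYNAMIC_UNIT}
        return (2 * self.attribute_value, unit)
'''

# Execute the generated source to obtain the class
namespace = {}
exec(GENERATED_SOURCE, namespace)

wrapper = namespace["HypotheticalWrapper"](attribute_value=3.0, attribute_unit="K")
doubled = wrapper.double_attribute_value()  # (6.0, "K")
```

Because the string is inserted as code, any typo in it only surfaces when the generated module is imported or the method is called.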

The second case is that the dynamic units are simply equal to True. In this case, the user must specify that an attribute of the (Fortran) derived type holds a string representation of the units. This attribute is then passed the units of quantities when going from Python to Fortran and is used to retrieve the units of quantities when going back to Python from Fortran. An attribute can be designated as the Fortran units holder via is_fortran_units_holder. Support for dynamic units allows the units of a derived type to change at runtime while still ensuring consistency between Fortran and Python. Any unit handling, e.g. conversion and consistency checks, must be handled by the Fortran container in this case.

For complete examples of how dynamic units can be used, please see the tests.

A unit is required for quantities that need units (like floats, although this is a little bit complicated [TODO resolve in magicc/fgen#33]) and not required for values without units (like strings).

Methods#

Methods of the derived type are defined as Method’s. This class defines the method’s name (name), description (description), parameters (parameters) and return value (returns). The parameters are always Value’s, and if you need more parameters, you just add them. In contrast, the return values can either be a Value or a MultiReturn. The reason for this is that Python methods/functions can only return one value, so if you want to return multiple values then they have to be returned as part of a tuple. MultiReturn allows us to handle this special case, including unit handling for the returned values.
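
The shape of this part of the data model can be sketched with plain dataclasses. These are simplified, hypothetical stand-ins (SketchValue, SketchMultiReturn, SketchMethod), not fgen’s real classes. In particular, the values field on SketchMultiReturn is an assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class SketchValue:
    name: str
    fortran_type: str
    unit: Optional[str] = None  # None for values without units


@dataclass
class SketchMultiReturn:
    values: List[SketchValue]  # assumed field, for illustration only


@dataclass
class SketchMethod:
    name: str
    description: str
    parameters: List[SketchValue]
    returns: Union[SketchValue, SketchMultiReturn]


def n_returned(method: SketchMethod) -> int:
    """Number of items the Python wrapper would hand back."""
    if isinstance(method.returns, SketchMultiReturn):
        # Multiple return values come back together as one tuple
        return len(method.returns.values)
    return 1


step = SketchMethod(
    name="step",
    description="Advance the state by one time step",
    parameters=[SketchValue(name="dt", fortran_type="real(8)", unit="yr")],
    returns=SketchMultiReturn(
        values=[
            SketchValue(name="temperature", fortran_type="real(8)", unit="K"),
            SketchValue(name="converged", fortran_type="logical"),
        ]
    ),
)
n_items = n_returned(step)  # 2
```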

Wrapping strategies#

A different strategy is needed to wrap each of the different Fortran types. These strategies are captured in fgen.wrapping_strategies, with WrappingStrategyLike defining the interface required by all strategies. Wrapper builder objects (e.g. PythonWrapperModuleBuilder) can then use these strategies and data_models to create the wrappers.

The wrapping strategies provide a clean API for the templates/builders (see Building the wrappers) to use when generating the wrapping code, independent of what is actually being wrapped. Put another way, the templates/builders can just say, “get me the wrapping strategy for this value and use it to generate the information I need to pass the value into Fortran”. wrapping_strategies then just provides the required information, without the template/builder having to know exactly what kind of value it is wrapping.

The wrapping strategies allow us to decouple our templates/builders and our supported Fortran types, making it much easier to add support for new types we need to wrap or update the templates/builders. The fact that the interface used by our templates is independent of the values being wrapped also makes it possible to handle arbitrary combinations of inputs to and outputs from Fortran. For each input, we simply get the steps required to pass it to Fortran. Then, we process the output of the called Fortran callable, without the template/builder needing to know what kind of output it actually is.
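
The decoupling described above can be sketched with a Protocol and a lookup table. Everything here is hypothetical: SketchStrategyLike, the python_to_fortran method, the two strategy classes and the strings they return are invented to show the pattern, and do not reflect the real WrappingStrategyLike interface.

```python
from typing import List, Protocol, Tuple


class SketchStrategyLike(Protocol):
    """Interface every strategy provides (method name is hypothetical)."""

    def python_to_fortran(self, name: str) -> str:
        """Return the expression used to pass `name` into Fortran."""
        ...


class FloatStrategy:
    def python_to_fortran(self, name: str) -> str:
        # Plain numbers: pass the bare magnitude
        return f"{name}.magnitude"


class DerivedTypeStrategy:
    def python_to_fortran(self, name: str) -> str:
        # Derived types: pass an integer handle instead of the object
        return f"{name}.instance_index"


STRATEGIES = {
    "real(8)": FloatStrategy(),
    "type(derived)": DerivedTypeStrategy(),
}


def build_call_args(values: List[Tuple[str, str]]) -> str:
    """Builder code: no branching on what kind of value is being handled."""
    return ", ".join(
        STRATEGIES[fortran_type].python_to_fortran(name)
        for name, fortran_type in values
    )


args = build_call_args([("mass", "real(8)"), ("state", "type(derived)")])
# args is "mass.magnitude, state.instance_index"
```

Supporting a new Fortran type in this sketch means adding one class and one dictionary entry, while build_call_args stays untouched.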

Building the wrappers#

Based on data_models and wrapping_strategies, fgen’s fgen.wrapper_building module can then generate the wrappers. It does this based on the Python classes within fgen.wrapper_building and a number of jinja2 templates. We have found that Jinja templates are very powerful when used well. However, the logic within them should be kept to an absolute minimum.

These templates work with the builders in wrapper_building’s modules to generate the wrappers. Each template takes in the builder itself as an input, so that the template does not need to hold onto any logic itself. The builder holds all the generation logic and the template simply calls the builder’s methods to generate the needed parts of the template. As stated above, the logic within the jinja templates should be kept to an absolute minimum, because the developer experience is much more difficult in jinja2 (it isn’t a full programming language).
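
The “thin template, fat builder” split can be sketched with the standard library. This is not fgen’s code: string.Template is used here as a stand-in for jinja2 so that the example has no third-party dependencies, and SketchBuilder and its methods are hypothetical. One difference to note is that a jinja2 template can call the builder’s methods itself, whereas here the builder passes the rendered pieces into the template.

```python
from string import Template

# The template holds layout only, no logic
TEMPLATE = Template("class $class_name:\n$methods\n")


class SketchBuilder:
    def __init__(self, class_name, method_names):
        self.class_name = class_name
        self.method_names = method_names

    def render_methods(self) -> str:
        # All looping and formatting logic stays here, in Python
        return "\n".join(
            f"    def {name}(self):\n        ..." for name in self.method_names
        )

    def render(self) -> str:
        return TEMPLATE.substitute(
            class_name=self.class_name, methods=self.render_methods()
        )


source = SketchBuilder("DerivedType", ["double", "reset"]).render()

# The generated text is valid Python source
compile(source, "<generated>", "exec")
```

Keeping the logic in the builder means it can be unit tested and debugged like any other Python code.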

There are three builders to be aware of, all of which are tightly coupled (i.e. changes in one will likely trigger changes in another).

The first is FortranManagerModuleBuilder. The module that this class generates manages the lifecycles of our derived types. In short, the generated module provides a way for us to ensure that Python processes get their own instances of our derived types and don’t clash with each other, for example trying to use the same instance (which would lead to all sorts of hard to diagnose bugs).

The second builder is FortranWrapperModuleBuilder. The module that this class generates provides the interface needed to expose our derived types from Fortran to Python. The basic idea is that it provides an interface that can be wrapped by numpy’s f2py. f2py has quite specific requirements. This template generates code which complies with f2py’s requirements. For example, to allow us to access derived types from Python, it passes integers across the Python-Fortran interface (which f2py can understand), which works around the fact that f2py does not natively support derived types.
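
The idea of managing instances behind integer handles can be sketched in pure Python. This is an analogy for the pattern only, not the generated Fortran: SketchManager and its method names are hypothetical, and a dict stands in for a derived type instance.

```python
class SketchManager:
    """Hands out integer indexes for instances (stand-in for a manager module)."""

    def __init__(self):
        self._instances = {}
        self._next_index = 1

    def get_free_instance(self, factory) -> int:
        # Create a fresh instance and return only its integer handle
        index = self._next_index
        self._next_index += 1
        self._instances[index] = factory()
        return index

    def get(self, index: int):
        return self._instances[index]

    def finalize(self, index: int) -> None:
        # Release the instance once the caller is done with it
        del self._instances[index]


manager = SketchManager()
first = manager.get_free_instance(dict)
second = manager.get_free_instance(dict)

# Each handle refers to its own instance, so users do not clash
manager.get(first)["value"] = [1, 2, 3]
second_is_untouched = manager.get(second) == {}

manager.finalize(first)
```

Only the integers first and second would need to cross the interface, which is the kind of plain type f2py can handle.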

The third builder is PythonWrapperModuleBuilder. The module that this class generates is the Python interface to our Fortran derived type. The derived type appears in Python with the same name as the Fortran derived type. The generated Python interface also allows us to use quantities with units on the Python side (via Pint), without requiring the Fortran side to have a Pint-equivalent. This is a key feature, because it means we can work with a unit-safe universe on the Python side even if our Fortran side does not use quantities with units (e.g. a Pint-equivalent) during calculations. The (yaml-based) descriptions of our Fortran modules allow the developers of the Fortran modules to ensure that values go into Fortran with the correct units and to specify the units of the returned values too. The generated Python API for the derived type is almost the same as the Fortran API. The exception is the initialisation pattern, which is slightly different because you have to connect/link/pair the Python instance with a Fortran instance before you can do anything. This adds an extra connection-establishing step to the initialisation that doesn’t exist in the Fortran API, although even this can be effectively abstracted away by using the Python API’s class methods (which combine the connection and creation steps into one).

Beyond the builders, there are also two more basic functions which generate wrapping-related code. The first is generate_python_init_module(). This generates a Python __init__.py file for the wrapping module if one is not already there.

The second is generate_python_enums_module(). This generates the Python enum equivalent of any Fortran enum-likes being wrapped.

The command-line interface can then be used to actually generate the wrappers based on the yaml files that describe the Fortran modules and Fortran enum modules (keeping in mind that the yaml files are simply yaml-serialised versions of Module and ModuleEnumDefining objects). (TODO: dedicated docs for CLI i.e. click docs version of fgen.commands).

fgen_runtime#

On the Python side, fgen_runtime provides runtime support for libraries wrapped with fgen. This includes typing support (fgen_runtime.units), custom exceptions (fgen_runtime.exceptions) and unit handling (fgen_runtime.units).

libfgen#

libfgen provides Fortran classes that allow us to do the wrapping. The key class is libfgen.fgen_base_finalizable.BaseFinalizable. This base class must be sub-classed by all classes that will be wrapped with fgen. It does very little, but it is vital to provide a common, stable API for our auto-generated wrappers. All it does is ensure that the class carries around its own instance index, so we can manage its lifecycle, and has a finalize method, which the auto-generated code knows is all it needs to call in order to finalise any instances of this class (e.g. deallocate memory).

libfgen currently also includes a number of things which we wish to split out into a separate repository (see https://gitlab.com/magicc/fgen/-/issues/28). We are going to separate them because we wish to wrap them with fgen too. The things we want to separate out are all our supporting classes, subroutines and functions for model solving and data handling.