<%@meta language="R-vignette" content="-------------------------------- %\VignetteIndexEntry{A Future for R: Future API Backend Specification} %\VignetteAuthor{Henrik Bengtsson} %\VignetteKeyword{R} %\VignetteKeyword{package} %\VignetteKeyword{vignette} %\VignetteKeyword{future} %\VignetteKeyword{Future API} %\VignetteKeyword{backend} %\VignetteEngine{R.rsp::rsp} %\VignetteTangle{FALSE} --------------------------------------------------------------------"%> # Future API Backend Specification Version 0.1.2 ## Introduction This document is written to serve as a reference for developers who are developing a future backend to the future framework as implemented in the [future] package for R that available on CRAN. The Future Application Programming Interface (API) has three fundamental functions at its core: * `f <- future(expr)` - create a future from an R expression (non-blocking but may be blocking) * ` r <- resolved(f)` - check whether a future is resolved or not (non-blocking) * `v <- value(f)` - retrieve the value of a future (blocking) With these three functions alone, it is possible to evaluate one or more R expressions synchronously and asynchronously. How and where these expressions are resolved depends on which "future backend" is in use. For example, one backend may evaluated the expressions sequentially (synchronously) while another may evaluated them in parallel (asynchronously). Regardless of backend, the value of a future expression is always the same. It is fundamental to the future ecosystem that all future backends conform to the Future API specification. Conformance serves as a guarantor of correctness and behavior for both the developer who use futures in their software as well as the end-user of their software. A future backend that meets the requirements can be used in any software that use futures internally. For example, the above three functions serve as building blocks in several higher-level map-reduce APIs. One example is the [future.apply] package on CRAN that provides `future_lapply()`, which is a futurized version of ` lapply()` available in the 'base' package. This function can be used to perform the lapply-like processing in parallel using a parallel backend. The implementation of the 'future.apply' package is 100% invariant to the parallel backend used. This is possible because all future backends conform to a set of rules. Rules that are documented below. A supplement to the specification herein is the 'Test Suite for Future API Backends', which consists of a set of tests that can be used to validated that a future backend meets the minimal requirements of the Future API. These tests run from the command-line, from the R prompt, or as part of the package tests of a backend package. This test suite is documented and implemented in the [future.tests] package available on CRAN. ## Feedback If you find that something in this document to be missing, unclear, or faulty, please report your feedback using the official issue tracker for the 'future' package at . If you have feedback that is specific to the test suite, please use the official issue tracker for the 'future.tests' package at ## Overview of the Future API The Future API has three fundamental functions at its core: * `f <- future(expr)` - create a future from an R expression (non-blocking but may be blocking) * ` r <- resolved(f)` - check whether a future is resolved or not (non-blocking) * `v <- value(f)` - retrieve the value of a future (blocking) The implementation of a future backend for these involves several steps. For simplicity, lets say we call our future backend 'myparallel'. In summary, a future backend needs to implement four API components: * A constructor function `myparallel()` that inherits from class `future`. This function should return a `Future` object (as defined in the [future] package) that also inherits from S3 class `MyParallelFuture` if a non-lazy future is created (the default `lazy = FALSE`). The default should be that this function calls `run()` on the `Future` object before returning, unless a _lazy_ is created. * An S3 method of `run()` for `MyParallelFuture` that starts the evaluation of the future R expression part of the `Future` object. This method is often non-blocking for parallel backends, but may be blocking if all compute resources are exhausted. It is typically blocking for sequential backends. * An S3 method of `resolved()` for `MyParallelFuture` that, in a non-blocking fashion, returns `TRUE` if the future is resolved and `FALSE` if not. * An S3 method of `result()` for `MyParallelFuture` that returns a `FutureResult` object (as defined by the [future] package) when the future is resolved or otherwise fails to resolve. If the future is not yet resolved, this method should block until the future is resolved. With this in place, the selection of using this backend as the future plan, will be done as `plan(myparallel)` with the option of specifying certain arguments to be passed to `myparallel()`. With the plan set, a call to `f <- future(expr)` will then correspond to a `f <- myparallel(expr)` call. With the defaults, `myparallel()` will then launch the evaluation of `expr` asynchronously before returning the `MyParallelFuture` object. When calling `resolved(f)` to query whether the future expression is resolved or not, the underlying S3 method for this class will then check in with the parallel worker whether the expression is resolved or not. When calling `value(f)`, the S3 method for the `Future` class calls `result(f)`, which will return the `FutureResult` object for this future. If the future is not yet resolved, this call will block until it is. If no errors occurred while resolving the future expression, then `value(f)` will return the value of the expression, which is recorded by the backend in the `FutureResult` object. If there was an evaluation error, then `value(f)` will resignal ("relayed") that error. Any captured conditions or standard output will also be relayed at this point. ## Requirements for the backend Future API This section describes in detail what the requirements of the above four components are. The requirements are given as a continuation of the above 'myparallel' example. If otherwise not specified, all functions mentioned below are from the [future] package. ### Constructor function creating a Future The constructor function `myparallel()` for creating a `Future` object must inherits from class `future` such that `inherits(myparallel, "future")` is true. The constructor function should have explicit arguments `expr`, `substitute`, `envir`, and `...`, where argument `expr` is an expression, argument `substitute` is a logical (`TRUE` or `FALSE`), and `envir` is an environment. All or parts of the `...` arguments should be passed along to the `Future()` function. If `substitute` is `TRUE` (default), then `expr` should be re-evaluated as `expr <- substitute(expr)`. Environment `envir` should default to `parent.frame()` as it is used to identify global variables. If the backend supports more than one worker, then it should also have an explicit `workers` argument. Any currying arguments that can be specified when setting the future plan must be explicit arguments of the constructor function such that they appear as named elements in `formals(myparallel)`. Currying arguments are arguments that can be "tweaked" by the end-user, e.g. `plan(myparallel, workers = 2)`. The value of the constructor function should be invisible and a `MyParallelFuture` object that inherits from `Future`. This is achieved by calling `Future()`, with all matching arguments passed along and then prepending `"MyParallelFuture"` to the class attribute of the returned `Future` object. Before returning the `Future` object, the constructor function should launch the future if, and only if, the `lazy` element of the `Future` object (e.g. `f$lazy`) is `FALSE` (default). The constructor function must never evaluate the expression `expr` if `lazy` is `TRUE`. The constructor function must not update the RNG state. ### run() method An S3 method `run()` for `MyParallelFuture` that takes a `Future` object as its first argument is required. It should accept additional arguments via `...`, which are currently not used. The `run()` method should invisibly return the `Future` object, which may be a modified version of the input `Future` object. The `run()` method is responsible for not launching the same future twice. If `run()` is called on an already launched or a resolved future, then an informative `FutureError` error constructed by the `FutureError()` function should be produced. The `run()` method is responsible for evaluation the expression returned by `getExpression()` with the `Future` object as the first argument. The evaluation of this expression should respect any global variables in the `FutureGlobals` object returned by `globals()` with the `Future` object as the first argument. The evaluation should also respect any package names returned by `packages()` with the `Future` object as the first argument. If the backend provides parallel processing, then `run()` should return the future as soon as possible and without waiting for it to be resolved. If all workers are occupied, then `run()` is responsible for waiting until a worker becomes available and then launch the future on that worker and immediatedly return the future. The `run()` method may produce `FutureError` error as created by `FutureError()` in case it fails to launch the future on the worker or the worker has terminated unexpectedly. The `run()` method must not update the RNG state. ### resolved() method An S3 method `resolved()` for `MyParallelFuture` that takes a `Future` object as its first argument and return either `TRUE` or `FALSE` is required. It should accept additional arguments via `...`, which are currently not used. The method may be called zero or more times. The method should return `FALSE` as long as the future is unresolved. It may also return `FALSE` if it fail to establish the state of the future within a reasonable time period ("timeout"). It should return `TRUE` as soon as it can be established that the future is resolved. After it has returned `TRUE` once, any succeeding calls should return `TRUE`. If `resolved()` is called on a future that yet has not been launched, it should launch the future by calling `run()`. This is the only occasion when `resolved()` may block. In all other cases, it should return promptly. The `resolved()` method may produce `FutureError` error as created by `FutureError()` in case communication with the worker has broken down or the worker has terminated unexpectedly. The `resolved()` method must not update the RNG state. ### result() method An S3 method `result()` for `MyParallelFuture` that takes a `Future` object as its first argument and return a `FutureResult` object is required. It should accept additional arguments via `...`, which are currently not used. The method may be called zero or more times. If `result()` is called on a future that yet has not been launched, it should launch the future by calling `run()`. If `result()` is called on a future that is not yet resolved, it should block until the future is resolved. The value of `result()` should be the value from evaluating the `getExpression()` expression that `run()` launched. The `result()` method may produce `FutureError` error as created by `FutureError()` in case communication with the worker has broken down or the worker has terminated unexpectedly. The `result()` method must not update the RNG state. [future]: https://cran.r-project.org/package=future [future.tests]: https://cran.r-project.org/package=future.tests