Kodi Arfer / Code

Pairmemo

Source code (GitHub)

Pairmemo is an R package for flexible on-disk function memoization. It allows you to cache function calls based on their arguments. Compared to memoise and targets, Pairmemo emphasizes manual cache management. Caching is keyed by the function's name and arguments but not by its body, so you can choose which cache entries to delete (if any) when altering the function. You can even add new parameters without invalidating the cache, if you give them default arguments. These features make Pairmemo appropriate for large projects where having to rerun code unnecessarily could take days, while Pairmemo's interface remains convenient for small tasks. Pairmemo also stores its calls on disk in a simple fashion that makes them easy to work with outside of Pairmemo, or outside of R entirely.

See pairmemo::define to get started.

clear — Delete saved calls

pairmemo::clear deletes a function's saved calls from disk (and from memory, if define received mem = T). For each call, both paired files are deleted. By default, all calls are cleared; set the filter parameter to clear a subset. The directory for the function is never deleted. If you want to get rid of the function entirely, you should do that manually.

Arguments

f The function to delete calls for.
filter A filter function, as described under metadata. It's wise to call metadata with the same filter first, to check that it selects the calls you intend, before you actually delete anything.

Value

An object of the form c("Cache entries deleted" = n), where n is an integer.

define — Set up memoization for a function

define is the main entry point for Pairmemo.

Details

pairmemo::define requires two arguments: f, the function definition, and directory, the path to a preexisting directory in which calls should be saved. Syntactically, f must be of the form IDENTIFIER <- FUNCTION. You can't use = instead of <-, or surround the whole thing in braces, or that sort of thing, because pairmemo::define needs to extract the function name from f.

Simple usage looks like this:

    pairmemo::define(
        my.cool.function <- function(foo, bar)
           {message("Calling my cool function")
            foo + bar},
        directory = "/tmp/pairmemo")

    my.cool.function(1, 2)
    my.cool.function(1, 2) # Cached, so no message is printed.

    pairmemo::clear(my.cool.function) # Clear the cache.
    my.cool.function(1, 2)

For legibility and namespace hygiene, I encourage calling Pairmemo's functions with :: instead of using library(pairmemo).

If you look in "/tmp/pairmemo", you'll see a single directory named "my.cool.function". Inside "my.cool.function" is a pair of files for each saved call (hence the name "Pairmemo"). One, storing the saved return value, is named with a hash of the arguments, with no file extension. The other file has the same name except that it ends with ".json": this is the metadata file. Inspecting the file (or calling metadata) shows you e.g. what arguments the function was called with.
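Because the value file is plain RDS by default (see formats), saved calls can be read back with base R alone. The following is a minimal sketch, not Pairmemo code: it simulates the directory layout in a temporary directory, and the hash "h0123abcd", the JSON contents, and the variable names are all made up for illustration.

```r
# Simulate the layout described above (one directory per function,
# one pair of files per saved call). The hash is invented.
root = tempfile("pairmemo-demo")
fdir = file.path(root, "my.cool.function")
dir.create(fdir, recursive = TRUE)
saveRDS(3, file.path(fdir, "h0123abcd"))
# Placeholder metadata; the real field names may differ.
writeLines('{"args": {"foo": 1, "bar": 2}}',
    file.path(fdir, "h0123abcd.json"))

# Find each metadata file, then strip ".json" to get the value file.
json.files = list.files(fdir, pattern = "\\.json$", full.names = TRUE)
value.files = sub("\\.json$", "", json.files)

# Read every saved value, keyed by its hash.
values = lapply(value.files, readRDS)
names(values) = basename(value.files)
```

Any JSON reader, in R or another language, can handle the metadata files in the same way.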

It helps to understand how Pairmemo regularizes and hashes the arguments, so you can predict what changes to your function will invalidate your caches. The arguments are put in a named list, where the names are the parameter names (in full, whether or not the call used a partial match), in alphabetical order. If an argument is set to its default for that parameter, and the default can be evaluated with eval.const, and there is no ... parameter for the function, that argument is excluded from the list: thus, you can add parameters, with the default value representing previous behavior, to an existing function while keeping your entire cache. The list is then hashed with digest and the character "h" is prepended. If you rename your function, just rename its directory to retain the cache.
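To make these rules concrete, here is a rough base-R sketch of the regularization step. It is not Pairmemo's implementation: normalize.args is a hypothetical helper, it only recognizes defaults that are literal constants (it doesn't use eval.const), it ignores the special case for functions with a ... parameter, and it leaves out the hashing step.

```r
# Hypothetical helper, for illustration only.
normalize.args = function(f, ...)
   {# match.call expands partial names to full parameter names.
    call = as.call(list(quote(f), ...))
    matched = as.list(match.call(f, call))[-1]
    defaults = as.list(formals(f))
    # An argument equal to its parameter's default is dropped.
    is.default = vapply(names(matched),
        function(name) identical(matched[name], defaults[name]),
        logical(1))
    matched = matched[!is.default]
    # Alphabetical order makes the result independent of call order.
    matched[order(names(matched))]}

old = function(foo, bar) foo + bar
# A later version with an extra parameter whose default
# represents the old behavior.
new = function(foo, bar, scale = 1) scale * (foo + bar)

names(normalize.args(new, 1, 2))  # "bar" "foo"
identical(normalize.args(old, 1, 2),
    normalize.args(new, bar = 2, fo = 1, scale = 1))  # TRUE
```

Since the two lists are identical, hashing them would give the same result, which is why a call cached under the old definition would still be found under the new one.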

In practice, a given project will probably have several different functions to be cached, which have different names but should all use the same Pairmemo directory, so it's convenient to use a wrapper function for define like so:

    pm = function(...)
        pairmemo::define(
            directory = "/home/neumanae/myproject/pairmemo",
            n.frame = 2,
            ...)

    pm(my.cool.function <- function(foo, bar)
       {message("Calling my cool function")
        foo + bar})

define has several optional arguments:

Value

The return value of the assign call is passed through.

Argument preprocessors

It can be useful to transform arguments before Pairmemo hashes them. For example, if a function has a parameter that should be an integer, applying as.integer ensures that Pairmemo treats calling the function with 2.0 the same as calling it with 2L.

The parameter ap ("argument preprocessors") to pairmemo::define names each argument to be preprocessed and provides a unary function to do the processing. Notice that f receives the preprocessed result, not the original argument; this is construed as a feature.

    pairmemo::define(directory = "…", ap = list(n = as.integer),
        f <- function(n) class(n))
    f(2.0)  # "integer"

PAIRMEMO.KV

If the first argument in a call to a memoized function is named PAIRMEMO.KV, Pairmemo will specially recognize (and remove) this argument. Then, the function will return a named list of two objects k and v, per kvs (but unlike kvs, the result is the same regardless of whether the call has already been cached). This feature is mostly useful to get the execution time of the function.

eval.const — Evaluate a class of constant expressions

eval.const evaluates a quoted expression in a limited environment that only allows things like arithmetic operators and base::list. The function exists mostly for internal use, but who knows, maybe you'll find it handy.
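The general technique of evaluating in a limited environment can be sketched in a few lines of base R. This is only an illustration of the idea: safe.eval and its whitelist are hypothetical, and the set of functions that eval.const actually allows may differ.

```r
# Hypothetical stand-in for illustration; not Pairmemo's code.
safe.eval = function(expr)
   {allowed = c("+", "-", "*", "/", "^", "(", "c", "list")
    # The parent is the empty environment, so nothing outside
    # the whitelist can be found during evaluation.
    env = new.env(parent = emptyenv())
    for (name in allowed)
        assign(name, get(name, envir = baseenv()), envir = env)
    eval(expr, env)}

ok = safe.eval(quote(list(1 + 2, 3 * 4)))
# A call to a function outside the whitelist fails before it can run.
bad = try(safe.eval(quote(system("ls"))), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```

Here ok is a list holding 3 and 12, while the system call is rejected because the function can't be looked up at all.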

References

The implementation is based on a Stack Overflow answer by Hadley Wickham.

formats — Pairmemo file formats

By default, Pairmemo saves the results of calling memoized functions as RDS objects, via saveRDS. Using the format parameter to define, you can choose one of the other built-in formats, or create your own. I recommend qs for most purposes; I only picked RDS as the default to minimize dependencies.

Details

To use a built-in format, set format to an element of names(pairmemo::builtin.formats) (or an element of pairmemo::builtin.formats itself). To use a custom format, set format to a named list with these elements:

If you change the format assigned to a function that already has cached calls, be sure to clear the cache, or manually convert all the calls. Otherwise, Pairmemo will try to read them in the wrong format.

kvs — Read a function's saved calls en masse

pairmemo::kvs is similar to metadata, but it also reads and returns the saved return values. The return value of kvs is a list of named lists; the names are k ("key", for the metadata) and v ("value").

Arguments

f The function to inspect.
filter A filter function, as described under metadata. Calls are filtered out before the call values are read from disk, so a well-chosen filter can save time and memory for a function with massive saved calls.

metadata — List a function's saved calls

pairmemo::metadata returns a list describing all currently memoized calls of a Pairmemoized function. The names of this list are the argument hashes, and the values are the deserialized JSON metadata files.

Arguments

f The function to inspect.
filter An optional function to subset the calls that are returned.

Details

Each JSON metadata file has these elements:

If a filter argument is given to pairmemo::metadata, it should take a named list representing a JSON metadata file and return a boolean value. (Returning a vector with more than one element is an error, but a zero-length vector is understood to mean "false".) The corresponding call is only returned by pairmemo::metadata if the filter returns true.
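The filter rules above can be tried out on mock data without touching a real cache. In this sketch, the metadata field name args is an assumption, the hashes are invented, and passes is a hypothetical helper that mirrors the described rules rather than Pairmemo's own code.

```r
# Mock metadata, keyed by made-up hashes. "args" is an assumed field.
calls = list(
    h1 = list(args = list(foo = 1, bar = 2)),
    h2 = list(args = list(foo = 5, bar = 2)),
    h3 = list(args = list(bar = 2)))

# A candidate filter: keep calls where foo exceeded 3.
# For h3, foo is absent, so the comparison yields a zero-length vector.
big.foo = function(m) m$args$foo > 3

# Hypothetical helper mirroring the rules described above.
passes = function(filter, m)
   {result = filter(m)
    if (length(result) > 1)
        stop("The filter must return at most one value")
    # A zero-length result counts as false.
    length(result) == 1 && isTRUE(as.logical(result))}

kept = Filter(function(m) passes(big.foo, m), calls)
names(kept)  # "h2"
```

Under the same assumption about args, a function like big.foo could be passed as the filter argument of metadata, kvs, or clear.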

path2hash — Extract the hash alone from the filepath of a JSON metadata file

This function exists mostly for internal use, but who knows, maybe you'll find it handy.