Pairmemo is an R package for flexible on-disk function memoization. It allows you to cache function calls based on their arguments. Compared to memoise
and targets
, Pairmemo emphasizes manual cache management. Caching is keyed by the function's name and arguments but not by its body, so you can choose which cache entries to delete (if any) when altering the function. You can even add new parameters without invalidating the cache, if you give them default arguments. These features make Pairmemo appropriate for large projects where having to rerun code unnecessarily could take days, while Pairmemo's interface remains convenient for small tasks. Pairmemo also stores its calls on disk in a simple fashion that makes them easy to work with outside of Pairmemo, or outside of R entirely.
See pairmemo::define
to get started.
clear
— Delete saved calls
pairmemo::clear
deletes a function's saved calls from disk (and from memory, if define
received mem = T
). For each call, both paired files are deleted. By default, all calls are cleared; set the filter
parameter to clear a subset. The directory for the function is never deleted. If you want to get rid of the function entirely, you should do that manually.
f |
The function to delete calls for. |
filter |
Per metadata. It's wise to use metadata to check that your filter works as intended before you actually delete stuff. |
An object of the form c("Cache entries deleted" = n)
, where n
is an integer.
define
— Set up memoization for a function
define
is the main entry point for Pairmemo.
pairmemo::define
requires two arguments: f
, the function definition, and directory
, the path to a preexisting directory in which calls should be saved. Syntactically, f
must be of the form IDENTIFIER <- FUNCTION
. You can't use =
instead of <-
, or surround the whole thing in braces, or that sort of thing, because pairmemo::define
needs to extract the function name from f
.
Simple usage looks like this:
    pairmemo::define(
        my.cool.function <- function(foo, bar)
           {message("Calling my cool function")
            foo + bar},
        directory = "/tmp/pairmemo")
    my.cool.function(1, 2)
    my.cool.function(1, 2)  # Cached, so no message is printed.
    pairmemo::clear(my.cool.function)  # Clear the cache.
    my.cool.function(1, 2)
For legibility and namespace hygiene, I encourage calling Pairmemo's functions with ::
instead of using library(pairmemo)
.
If you look in "/tmp/pairmemo"
, you'll see a single directory named "my.cool.function"
. Inside "my.cool.function"
is a pair of files for each saved call (hence the name "Pairmemo"). One, storing the saved return value, is named with a hash of the arguments, with no file extension. The other file has the same name except that it ends with ".json"
: this is the metadata file. Inspecting the file (or calling metadata
) shows you e.g. what arguments the function was called with.
It helps to understand how Pairmemo regularizes and hashes the arguments, so you can predict what changes to your function will invalidate your caches. The arguments are put in a named list, where the names are the parameter names (in full, whether or not the call used a partial match), in alphabetical order. If an argument is set to its default for that parameter, and the default can be evaluated with eval.const
, and there is no ...
parameter for the function, that argument is excluded from the list: thus, you can add parameters, with the default value representing previous behavior, to an existing function while keeping your entire cache. The list is then hashed with digest
and the character "h" is prepended. If you rename your function, just rename its directory to retain the cache.
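The regularization rules above can be sketched in base R. This is only an approximation for building intuition, not Pairmemo's actual code: `regularize.args` is a hypothetical helper, plain `eval` in the base environment stands in for the more restrictive constant evaluator, and the hashing step is left out.

```r
# Hypothetical helper, not part of Pairmemo's API: approximates the
# regularization rules described above using base R only.
regularize.args = function(f, ...) {
    defaults = formals(f)
    # Expand partially matched names to the full parameter names.
    matched = match.call(f, as.call(c(list(quote(f)), list(...))))
    args = as.list(matched)[-1]
    # Put the arguments in alphabetical order by parameter name.
    args = args[order(names(args))]
    # With a `...` parameter, no argument is dropped.
    if ("..." %in% names(defaults))
        return(args)
    keep = vapply(names(args), FUN.VALUE = logical(1), function(nm) {
        # A parameter with no default is always kept.
        if (identical(defaults[[nm]], quote(expr = )))
            return(TRUE)
        # Assumption: plain `eval` stands in for the constant
        # evaluator. A default that can't be evaluated is kept.
        tryCatch(
            !identical(args[[nm]], eval(defaults[[nm]], baseenv())),
            error = function(e) TRUE)
    })
    args[keep]
}

old = function(foo, bar) foo + bar
new = function(foo, bar, scale = 1) scale * (foo + bar)

# A partially matched call to the old definition and a call to the
# new definition that uses the default reduce to the same list.
a = regularize.args(old, ba = 2, fo = 1)
b = regularize.args(new, 1, 2, scale = 1)
identical(a, b)
```

Under these assumptions, adding `scale = 1` to the function leaves the regularized list unchanged, which is why the existing cache entries would still be found.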
In practice, a given project will probably have several different functions to be cached, which have different names but should all use the same Pairmemo directory, so it's convenient to use a wrapper function for define
like so:
    pm = function(...) pairmemo::define(
        directory = "/home/neumanae/myproject/pairmemo",
        n.frame = 2,
        ...)
    pm(my.cool.function <- function(foo, bar)
       {message("Calling my cool function")
        foo + bar})
define
has several optional arguments:
mem
: If true, save calls in memory, as well as on disk. Calls that are saved on disk but weren't made in this R process are only loaded into memory once they're read from disk for the first time.
format
: A string or function specifying the file format for saving calls (see Formats).
ap
: A named list of functions to use as argument preprocessors (see below).
n.frame
: Used as an argument to parent.frame
in the assign
call that pairmemo::define
makes to name the newly defined function. The default, 1, is appropriate when calling define
directly. Use 2 when writing a wrapper function for define
.
The return value of the assign
call is passed through.
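To see why a wrapper needs n.frame = 2, here is a toy model of the assignment step in base R. The names `toy.define` and `toy.wrapper` are hypothetical, and the sketch assumes only what is stated above: that the new function is bound with assign in the environment given by parent.frame.

```r
# Hypothetical stand-in for define: it only does the assignment step.
toy.define = function(name, value, n.frame = 1) {
    # n.frame = 1 targets the frame that called toy.define.
    target = parent.frame(n.frame)
    assign(name, value, envir = target)
}

# A wrapper adds one stack frame, so it asks for n.frame = 2.
toy.wrapper = function(...) toy.define(..., n.frame = 2)

demo = function() {
    toy.wrapper("answer", 42)
    # The binding lands here, in the frame that called the wrapper.
    exists("answer", envir = environment(), inherits = FALSE)
}
landed = demo()
```

With n.frame = 1, the binding would instead be created inside the wrapper's own frame and vanish when the wrapper returned.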
It can be useful to transform arguments before Pairmemo hashes them. For example, if a function has a parameter that should be an integer, applying as.integer
ensures that Pairmemo treats calling the function with 2.0
the same as calling it with 2L
.
The parameter ap
("argument preprocessors") to pairmemo::define
names each argument to be preprocessed and provides a unary function to do the processing. Notice that f
receives the preprocessed result, not the original argument; this is construed as a feature.
    pairmemo::define(
        directory = "…",
        ap = list(n = as.integer),
        f <- function(n) class(n))
    f(2.0)  # "integer"
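The effect of a preprocessor on cache keys can be illustrated without Pairmemo. In this minimal sketch, `preprocess` is a hypothetical helper that applies each function in `ap` to the argument of the same name; it is not how Pairmemo is necessarily implemented.

```r
# Hypothetical helper, not part of Pairmemo: apply each preprocessor
# to the argument of the same name, leaving other arguments alone.
preprocess = function(args, ap) {
    for (nm in intersect(names(ap), names(args)))
        args[[nm]] = ap[[nm]](args[[nm]])
    args
}

ap = list(n = as.integer)
from.double = preprocess(list(n = 2.0, label = "x"), ap)
from.integer = preprocess(list(n = 2L, label = "x"), ap)

# Without preprocessing, 2.0 and 2L are different R objects...
identical(2.0, 2L)                      # FALSE
# ...but after as.integer the two argument lists coincide.
identical(from.double, from.integer)    # TRUE
```

Since the two lists are identical after preprocessing, they would presumably hash to the same value and share one cache entry.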
If the first argument in a call to a memoized function is named PAIRMEMO.KV
, Pairmemo will specially recognize (and remove) this argument. Then, the function will return a named list of two objects k
and v
, per kvs
(but unlike kvs
, the result is the same regardless of whether the call has already been cached). This feature is mostly useful to get the execution time of the function.
eval_const
— Evaluate a class of constant expressions
eval.const
evaluates a quoted expression in a limited environment that only allows things like arithmetic operators and base::list
. The function exists mostly for internal use, but who knows, maybe you'll find it handy.
The implementation is based on a Stack Overflow answer by Hadley Wickham.
formats
— Pairmemo file formats
By default, Pairmemo saves the results of calling memoized functions as RDS objects, via saveRDS
. Using the format
parameter to define
, you can choose one of the other built-in formats, or create your own. I recommend qs
for most purposes; I only picked RDS as the default to minimize dependencies.
To use a built-in format, set format
to an element of names(pairmemo::builtin.formats)
(or an element of pairmemo::builtin.formats
itself). To use a custom format, set format
to a named list with these elements:
name
: A string. This is included in the JSON metadata paired with the saved call for the sake of human intelligibility, but is otherwise unused.
read
: A unary function that takes a file path and returns the object that was saved there.
write
: A binary function that takes an object to write followed by the path to write to.
If you change the format assigned to a function that already has cached calls, be sure to clear the cache, or manually convert all the calls. Otherwise, Pairmemo will try to read them in the wrong format.
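As an illustration of the three elements, here is a sketch of a custom format that writes uncompressed RDS files. The format name "rds-uncompressed" and the variable `rds.uncompressed` are made up for this example; only base R's saveRDS and readRDS are used.

```r
# A hypothetical custom format: uncompressed RDS.
rds.uncompressed = list(
    name = "rds-uncompressed",
    # read: unary, takes a file path and returns the saved object.
    read = function(path) readRDS(path),
    # write: binary, takes the object first and the path second.
    write = function(x, path) saveRDS(x, path, compress = FALSE))

# Round-trip check on a temporary file, independent of Pairmemo.
path = tempfile()
original = list(a = 1:3, b = "text")
rds.uncompressed$write(original, path)
restored = rds.uncompressed$read(path)
unlink(path)
identical(original, restored)
```

A list like this is what you would pass as format = rds.uncompressed. Checking the round trip by hand first is cheap insurance before caching anything expensive.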
kvs
— Read a function's saved calls en masse
pairmemo::kvs
is similar to metadata
, but it also reads and returns function values. The return value of kvs
is a list of named lists; the names are k
("key", for the metadata) and v
("value").
f |
The function to inspect. |
filter |
Per metadata. Calls are filtered out before the call values are read from disk, so a well-chosen filter can save time and memory for a function with massive saved calls. |
metadata
— List a function's saved calls
pairmemo::metadata
returns a list describing all currently memoized calls of a Pairmemoized function. The names of this list are the argument hashes, and the values are the deserialized JSON metadata files.
f |
The function to inspect. |
filter |
An optional function to subset the calls that are returned. |
Each JSON metadata file has these elements:
file_format
: A string naming the format in which the call is saved; see Formats.
time
: The time the call took, in seconds.
args
: The arguments to the function, as a named list. Since the jsonlite
package doesn't round-trip all R objects, these values won't always be equal to the real arguments. In particular, two lists of arguments can be non-equal in R terms (and have distinct hashes, hence be treated as separate calls by Pairmemo) but result in the same JSON.If a filter
argument is given to pairmemo::metadata
, it should take a named list representing a JSON metadata file and return a boolean value. (Returning a vector with more than one element is an error, but a zero-length vector is understood to mean "false".) The corresponding call is only returned by pairmemo::metadata
if the filter returns true.
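Filters are ordinary R functions, so they can be tried out on a hand-written record first. The records below are hypothetical: they follow the three elements listed above, but the values (including the format name "rds") are invented for illustration.

```r
# A hypothetical metadata record, shaped like the elements listed above.
record = list(
    file_format = "rds",
    time = 93.2,
    args = list(bar = 2, foo = 1))

# Select calls that took longer than a minute.
slow = function(m) m$time > 60
# Select calls whose `foo` argument was 1.
foo.is.1 = function(m) m$args$foo == 1

slow(record)        # TRUE
foo.is.1(record)    # TRUE

# If `foo` is absent from a record's args, the comparison is
# zero-length, which counts as "false" under the rule above.
no.foo = list(file_format = "rds", time = 0.5, args = list(bar = 2))
length(foo.is.1(no.foo))    # 0
```

The zero-length rule is what lets a filter like `foo.is.1` run safely over records that lack the argument it tests.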
path2hash
— Extract the hash alone from the filepath of a JSON metadata file
This function exists mostly for internal use, but who knows, maybe you'll find it handy.