commander.memoize: Reusing results

Perhaps the primary reason to use memoization is the convenience, during debugging, of saving computational results to disk (even though memoization is often turned off for cluster runs).

But another reason for memoization is to structure the program. If a temporary result is memoized, there’s no need to explicitly pass it around:

V = SkyObservation(name='V', source='WMAP7yr', ...)
x = compute_frobnification(V) # uses square of RMS map inside
y = compute_bartification(V) # also uses square of RMS map inside

In “traditional” programming, one should (and does, if the computation time is large) work out which temporary results the two functions share, compute them in the caller, and pass them in. But that can become unwieldy, and when it does, memoization is your friend.
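As a rough illustration of the idea (not of Commander's own machinery), the sketch below uses the standard library's functools.lru_cache as a stand-in for memoization. The function bodies and the rms_squared helper are hypothetical; only the two compute_* names are borrowed from the example above.

```python
from functools import lru_cache

calls = []  # records every time the shared temporary is really computed


@lru_cache(maxsize=None)
def rms_squared(rms):
    # Hypothetical shared temporary. The input must be hashable,
    # hence a tuple rather than a list.
    calls.append(rms)
    return tuple(v * v for v in rms)


def compute_frobnification(rms):
    # Asks for the temporary itself instead of receiving it from the caller.
    return sum(rms_squared(rms))


def compute_bartification(rms):
    # Second consumer of the same temporary; this call is a cache hit.
    return max(rms_squared(rms))


rms = (1.0, 2.0, 3.0)
x = compute_frobnification(rms)
y = compute_bartification(rms)
```

The caller never sees the squared map, yet it is computed only once, because the second function finds it in the cache.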

Note

A big part of what memoization is used for in Commander is simply reading in the input data.

Warning

Use memoization sparingly (at least until we get a better handle on it). The problem doesn’t really disappear, because one now needs to figure out when to release results from the memoization cache, which no heuristic can do perfectly (and at the time of writing we have no heuristics at all; the store just fills up forever).

Core idea

Memoized results are always associated with an explicit MemoContext (of which CommanderContext is a subclass). You annotate a function/method with @memoize/@memoize_method:

# MemoContext is assumed to be importable from commander.memoize as well
from commander.memoize import MemoContext, memoize, memoize_method

class MyContext(MemoContext):
    @memoize_method('description_of_result', tags=['disk'])
    def method(self, ctx, arg): ...

@memoize() # default name of result is 'compute_foo'
def compute_foo(ctx, arg): ...

The ctx argument is special, and is where the cached results are stored.

For this to work,

  • all input arguments must support hashing
  • the result must support being made immutable

The tags can be arbitrary strings, which are then picked up on by the memoization policies. The only one currently supported is "disk" which causes it to be stored to disk cache (if disk cache is enabled).
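To make the mechanics concrete, here is a minimal toy sketch of a decorator that stores results on the context. It is not Commander's implementation: ToyContext, toy_memoize and the store attribute are hypothetical names, and the real hashing and tag policies are assumed to be more involved.

```python
import functools

evaluations = []  # records each real (non-cached) evaluation


class ToyContext:
    """Hypothetical stand-in for MemoContext: it only holds a result store."""

    def __init__(self):
        self.store = {}


def toy_memoize(name=None, tags=()):
    def decorator(func):
        result_name = name or func.__name__  # default name is the function name

        @functools.wraps(func)
        def wrapper(ctx, *args):
            key = (result_name, args)  # lookup requires hashable arguments
            if key not in ctx.store:
                ctx.store[key] = func(ctx, *args)
            return ctx.store[key]

        wrapper.tags = tuple(tags)  # a policy could inspect these later
        return wrapper

    return decorator


@toy_memoize(tags=['disk'])
def compute_foo(ctx, arg):
    evaluations.append(arg)
    return arg * 2


ctx = ToyContext()
first = compute_foo(ctx, 21)
second = compute_foo(ctx, 21)           # served from ctx.store
other = compute_foo(ToyContext(), 21)   # a fresh context recomputes
```

Because the cache lives on the context rather than on the function, two contexts never share results, and passing an unhashable argument such as a list fails at lookup time with a TypeError.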

Note

The reason for the separate @memoize_method is that the context is the second argument rather than the first. In subclasses of MemoContext one should use @memoize_in_self.

Hashing protocol

TODO

Immutabilification

The results from a memoized function will be converted to read-only (or, if this is not possible, such as with a dict, an exception will be raised). The conversion process is recursive through lists.

If the type is not known, the memoization will try to call an as_immutable(self) method to convert the result.
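The rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual conversion code: the function name as_immutable_sketch and the Spectrum class are hypothetical, and turning lists into tuples is one possible choice of read-only form. (For NumPy arrays, a read-only flag can be set with arr.setflags(write=False); whether Commander does exactly that is not stated here.)

```python
def as_immutable_sketch(obj):
    # Values that are already immutable pass through unchanged.
    if isinstance(obj, (int, float, complex, str, bytes, type(None), frozenset)):
        return obj
    # Recurse through lists (and tuples), freezing every element.
    if isinstance(obj, (list, tuple)):
        return tuple(as_immutable_sketch(item) for item in obj)
    # A dict has no read-only form in this sketch, so refuse it.
    if isinstance(obj, dict):
        raise TypeError("dict results cannot be made read-only")
    # Unknown type: fall back to the object's own as_immutable() method.
    convert = getattr(obj, "as_immutable", None)
    if callable(convert):
        return convert()
    raise TypeError("cannot make %r immutable" % type(obj).__name__)


class Spectrum:
    """Hypothetical result type that knows how to freeze itself."""

    def __init__(self, values):
        self.values = values

    def as_immutable(self):
        return ("Spectrum", tuple(self.values))


frozen = as_immutable_sketch([1, [2.0, "a"], Spectrum([3, 4])])
```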

Comparison with joblib

The Joblib library is designed to fit into your own (presumably dynamic, interactive) workflow with a minimum of intervention. It therefore, e.g., checks whether the source code has changed (and if so invalidates the cache for the function), tries to hash almost any input (by running it through a hashing “pickler”), and so on. A typical joblib example is:

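from joblib import Memory
memory = Memory('joblib_cache') # directory used for the on-disk cache
@memory.cache
def f(arr): return fft(arr**2) # arr is a NumPy array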

On the other hand, commander.memoize is very explicit. It will not try to hash NumPy arrays, for instance; instead you typically pass descriptors, which are then used to fetch the data. Typical example:

@memoize(tags=['disk'])
def f(ctx, map_descriptor):
    arr = ctx.get_foo(map_descriptor)
    return fft(arr)
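
The descriptor pattern can be sketched as below. All names (MapDescriptor, ToyDataContext, get_map, total_power) are hypothetical and only illustrate why a small hashable descriptor is a convenient cache key where a bulky array is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapDescriptor:
    """Hypothetical descriptor: small, hashable, and names the data."""
    name: str
    source: str


class ToyDataContext:
    """Hypothetical context that resolves descriptors to bulky data."""

    def __init__(self):
        self.loads = 0
        self.results = {}

    def get_map(self, descriptor):
        self.loads += 1
        return [1.0, 2.0, 3.0]  # stand-in for reading a map from disk


def total_power(ctx, descriptor):
    key = ("total_power", descriptor)  # the descriptor is hashed, not the data
    if key not in ctx.results:
        ctx.results[key] = sum(v * v for v in ctx.get_map(descriptor))
    return ctx.results[key]


ctx = ToyDataContext()
a = total_power(ctx, MapDescriptor("V", "WMAP7yr"))
b = total_power(ctx, MapDescriptor("V", "WMAP7yr"))  # equal descriptor: cache hit
```

Two descriptors with equal fields compare and hash equal, so the second call never touches the data at all.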
