:mod:`commander.memoize`: Reusing results
=========================================

Perhaps the primary reason to use memoization is convenience during debugging: computational results can be saved to disk (even though memoization is often turned off for cluster runs).

Another reason for memoization is to structure the program. If a temporary result is memoized, there is no need to pass it around explicitly::

    V = SkyObservation(name='V', source='WMAP7yr', ...)
    x = compute_frobnification(V)  # uses the square of the RMS map internally
    y = compute_bartification(V)   # also uses the square of the RMS map internally

In "traditional" programming, one *should* (and does, if the computation time is large) figure out which temporary results the two functions share, compute those in the caller, and pass them in. That can become unwieldy, and when it does, memoization is your friend.

.. note::

    A big part of what memoization is used for in Commander is simply reading in the input data.

.. warning::

    Use memoization sparingly (at least until we get a better handle on it). The problem of managing temporary results does not really disappear: one now needs to figure out when to release results from the memoization cache, which no heuristic can do perfectly. At the time of writing there are no heuristics at all; the store simply fills up forever.

Core idea
---------

Memoized results are always associated with an explicit :class:`MemoContext` (of which :class:`CommanderContext` is a subclass). You annotate a function or method with ``@memoize`` or ``@memoize_method``::

    from commander.memoize import memoize, memoize_method

    class MyClass(object):
        # an ordinary class: the context is passed as the second argument
        @memoize_method('description_of_result', tags=['disk'])
        def method(self, ctx, arg):
            ...

    @memoize()  # default name of result is 'compute_foo'
    def compute_foo(ctx, arg):
        ...

The ``ctx`` argument is special: it is where the cached results are stored.
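
To make the core idea concrete, here is a minimal, self-contained sketch of a decorator that stores results on an explicit context object. It is an illustration under simplifying assumptions, not the ``commander.memoize`` implementation. ``SketchContext``, ``sketch_memoize`` and the ``_memo_cache`` attribute are hypothetical names. The cache key is just the result name plus the remaining positional arguments, which is why those arguments have to be hashable::

    import functools


    class SketchContext(object):
        """Hypothetical stand-in for MemoContext: it only owns a dict cache."""

        def __init__(self):
            self._memo_cache = {}


    def sketch_memoize(name=None):
        """Hypothetical decorator: cache func(ctx, *args) in ctx._memo_cache."""
        def decorator(func):
            # Default the result name to the function name, as @memoize() does.
            result_name = name if name is not None else func.__name__

            @functools.wraps(func)
            def wrapper(ctx, *args):
                # The key goes into a dict, so every argument after ctx
                # must be hashable.
                key = (result_name,) + args
                if key not in ctx._memo_cache:
                    ctx._memo_cache[key] = func(ctx, *args)
                return ctx._memo_cache[key]
            return wrapper
        return decorator


    calls = []  # records real (uncached) evaluations


    @sketch_memoize()  # result name defaults to 'compute_foo'
    def compute_foo(ctx, arg):
        calls.append(arg)
        return arg * 2


    ctx = SketchContext()
    first = compute_foo(ctx, 21)   # computed and stored on ctx
    second = compute_foo(ctx, 21)  # served from ctx's cache, no recomputation

Because the cache lives on ``ctx`` in this sketch, separate context objects never share results, and discarding a context discards its cache.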
For memoization to work:

* all input arguments must support hashing;
* the result must support being made immutable.

Tags can be arbitrary strings; they are interpreted by the memoization policies. The only one currently supported is ``"disk"``, which causes the result to be stored in the disk cache (if the disk cache is enabled).

.. note::

    The reason for the separate ``@memoize_method`` is that the context is the second argument rather than the first. In subclasses of :class:`MemoContext` one should use ``@memoize_in_self``.

Hashing protocol
----------------

TODO

Immutabilification
------------------

The results from a memoized function are converted to a read-only form. If this is not possible (as with a dict, for example), an exception is raised. The conversion recurses into lists. If the type of a result is not known, memoization will try to call an ``as_immutable(self)`` method on it to convert it.

Comparison with joblib
----------------------

Joblib is intended to fit into your own (presumably dynamic, interactive) workflow with a minimum of intervention. It therefore, e.g., checks whether a function's source code has changed (and if so invalidates that function's cache), tries to hash almost any input (by running it through a hashing "pickler"), and so on. A typical joblib example is::

    from joblib import Memory
    from numpy.fft import fft
    memory = Memory('cachedir', verbose=0)

    @memory.cache
    def f(arr):
        return fft(arr**2)  # arr is a NumPy array

``commander.memoize``, on the other hand, is very explicit. It will not, for instance, try to hash NumPy arrays. Instead, you typically pass in hashable descriptors and use them to fetch the data inside the function. A typical example::

    from numpy.fft import fft
    from commander.memoize import memoize

    @memoize(tags=['disk'])
    def f(ctx, map_descriptor):
        # the descriptor is hashed; the array itself is fetched through ctx
        arr = ctx.get_foo(map_descriptor)
        return fft(arr)
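
To illustrate why descriptors rather than arrays are passed in, here is a small standard-library sketch. A frozen dataclass is hashable, so it can serve as (part of) a cache key. A mutable container such as a list cannot; NumPy arrays likewise define no hash. ``MapDescriptor`` and its fields are hypothetical, loosely modelled on the ``SkyObservation(name='V', source='WMAP7yr', ...)`` call above, and are not part of Commander::

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class MapDescriptor:
        """Hypothetical descriptor: identifies a map without holding its pixels."""
        name: str
        source: str


    def is_hashable(obj):
        """Return True if obj can be used as a dict (cache) key."""
        try:
            hash(obj)
        except TypeError:
            return False
        return True


    v_map = MapDescriptor(name='V', source='WMAP7yr')
    pixels = [0.1, 0.2, 0.3]  # stands in for the (unhashable) map data itself

    # Equal descriptors hash equally, so they hit the same cache entry.
    cache = {v_map: 'cached result'}
    lookup = cache[MapDescriptor(name='V', source='WMAP7yr')]

The design choice is that the cheap, immutable descriptor is what gets hashed. The expensive array is only materialized inside the memoized function.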