:mod:`commander.memoize`: Reusing results
=========================================

Perhaps the primary reason to use memoization is convenience during
debugging: computational results are saved to disk and reused between
runs (even though memoization is often turned off for cluster runs).

Another reason for memoization is to structure the program. If
a temporary result is memoized, there's no need to pass it around
explicitly::

    V = SkyObservation(name='V', source='WMAP7yr', ...)
    x = compute_frobnification(V) # uses square of RMS map inside
    y = compute_bartification(V) # also uses square of RMS map inside

In "traditional" programming, one *should* (and does, if the computation
time is large) figure out which temporary results the two functions share,
compute them in the caller, and pass them in. But that can become
unwieldy, and when it does, memoization is your friend.
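
A minimal, self-contained sketch of the idea, with a plain dict standing
in for the memoization context. The names ``rms_squared``, ``frobnify``
and ``bartify`` and the dict-based cache are hypothetical illustrations,
not the ``commander.memoize`` API. Both functions ask for the shared
temporary result, yet it is computed only once::

    # Hypothetical stand-in for a memoization context: a dict keyed by
    # (result name, arguments). Not the commander.memoize API.
    calls = []  # records every real computation, to show reuse


    def rms_squared(ctx, rms_map):
        """Shared temporary result; computed at most once per argument."""
        key = ('rms_squared', rms_map)
        if key not in ctx:
            calls.append(key)
            ctx[key] = tuple(x * x for x in rms_map)
        return ctx[key]


    def frobnify(ctx, rms_map):
        return sum(rms_squared(ctx, rms_map))


    def bartify(ctx, rms_map):
        return max(rms_squared(ctx, rms_map))


    ctx = {}
    rms = (1.0, 2.0, 3.0)  # a tuple, so it can be part of a hashable key
    x = frobnify(ctx, rms)
    y = bartify(ctx, rms)

Neither caller needs to know that the two functions share
``rms_squared``; the price is that the entries in ``ctx`` live as long
as ``ctx`` does.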

.. note::

    A big part of what memoization is used for in Commander is simply
    reading in the input data.

.. warning::

    Use memoization sparingly (at least until we get a better handle on
    it). The problem of managing temporary results doesn't really
    disappear, because one now needs to figure out when to release the
    results from the memoization cache. No heuristic can do that
    perfectly, and at the time of writing we don't have any heuristics
    at all: the store simply fills up forever.


Core idea
---------

Memoized results are always associated with an explicit :class:`MemoContext`
(of which :class:`CommanderContext` is a subclass). You annotate a function/method
with ``@memoize``/``@memoize_method``::

    from commander.memoize import memoize, memoize_method

    class MyAnalysis(object):
        # ctx is the second argument here, hence @memoize_method
        @memoize_method('description_of_result', tags=['disk'])
        def method(self, ctx, arg): ...

    @memoize()  # default name of result is 'compute_foo'
    def compute_foo(ctx, arg): ...

The ``ctx`` argument is special, and is where the cached results are stored.

For this to work,

 * all input arguments must support hashing
 * the result must support being made immutable
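
To make the two requirements concrete, here is a minimal sketch of how
such a decorator *could* work. It is not the ``commander.memoize``
implementation: ``toy_memoize``, ``ToyContext``, the ``store`` attribute,
the key layout and the ``_freeze`` helper are all hypothetical. The key
is built from the result name and the arguments, so an unhashable
argument fails immediately, and the result is frozen before it is
cached::

    import functools


    class ToyContext(object):
        """Hypothetical minimal context: it only owns the result store."""

        def __init__(self):
            self.store = {}


    def _freeze(value):
        # Simplified stand-in for the real immutabilification step.
        if isinstance(value, list):
            return tuple(_freeze(v) for v in value)
        return value


    def toy_memoize(name=None, tags=()):
        def decorator(func):
            result_name = name or func.__name__  # default: function name

            @functools.wraps(func)
            def wrapper(ctx, *args, **kwargs):
                key = (result_name, args, tuple(sorted(kwargs.items())))
                hash(key)  # TypeError here if any argument is unhashable
                if key not in ctx.store:
                    ctx.store[key] = _freeze(func(ctx, *args, **kwargs))
                return ctx.store[key]

            wrapper.tags = tuple(tags)  # recorded for policies; unused here
            return wrapper

        return decorator


    n_calls = 0


    @toy_memoize()  # result name defaults to 'compute_foo'
    def compute_foo(ctx, arg):
        global n_calls
        n_calls += 1
        return [arg, arg * 2]


    ctx = ToyContext()
    first = compute_foo(ctx, 3)
    second = compute_foo(ctx, 3)  # served from ctx.store, no recomputation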

The tags can be arbitrary strings, which are then picked up by the
memoization policies. The only tag currently supported is ``"disk"``,
which causes the result to be stored in the disk cache (if the disk
cache is enabled).
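
One way a policy could act on tags, sketched under assumptions: the
``DiskPolicy`` class and its ``load``/``store`` methods are hypothetical,
and the real disk cache may use a different file layout. Only results
tagged ``"disk"`` are pickled, and nothing is read or written when the
disk cache is disabled::

    import hashlib
    import os
    import pickle
    import tempfile


    class DiskPolicy(object):
        """Hypothetical policy: persist results whose tags include 'disk'."""

        def __init__(self, directory, enabled=True):
            self.directory = directory
            self.enabled = enabled

        def _path(self, key):
            # Derive a file name from the key; repr() must be stable for this.
            digest = hashlib.sha1(repr(key).encode('utf-8')).hexdigest()
            return os.path.join(self.directory, digest + '.pickle')

        def load(self, key, tags):
            """Return (found, value) for a previously stored result."""
            path = self._path(key)
            if self.enabled and 'disk' in tags and os.path.exists(path):
                with open(path, 'rb') as f:
                    return True, pickle.load(f)
            return False, None

        def store(self, key, tags, value):
            if self.enabled and 'disk' in tags:
                with open(self._path(key), 'wb') as f:
                    pickle.dump(value, f)


    cache_dir = tempfile.mkdtemp()
    policy = DiskPolicy(cache_dir)
    key = ('compute_foo', (3,))
    policy.store(key, ['disk'], (3, 6))
    policy.store(('other', ()), [], 'not persisted')  # untagged: skipped
    found, value = policy.load(key, ['disk'])

Deriving the file name from ``repr(key)`` only works if the arguments
have a deterministic ``repr``, which is one more reason to keep memoized
arguments small and simple.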

.. note::

    The reason for the separate ``@memoize_method`` is that the context is the
    second argument rather than the first. In subclasses of :class:`MemoContext`
    one should use ``@memoize_in_self``.
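
A sketch of why separate decorators are needed. This is hypothetical
code, not the library's implementation; the ``store`` attribute and the
key layout are assumptions. The only real difference between the two
decorators is where the wrapper finds the context: in the second
positional argument, or in ``self``::

    import functools


    def _cached(store, key, compute):
        # Look up key in store, computing and saving the value on a miss.
        if key not in store:
            store[key] = compute()
        return store[key]


    def memoize_method(name):
        """Hypothetical: for ordinary methods, ctx is the 2nd argument."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, ctx, *args):
                return _cached(ctx.store, (name, self, args),
                               lambda: func(self, ctx, *args))
            return wrapper
        return decorator


    def memoize_in_self(name):
        """Hypothetical: in a context subclass, self is the context."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, *args):
                return _cached(self.store, (name, args),
                               lambda: func(self, *args))
            return wrapper
        return decorator


    class ToyContext(object):
        def __init__(self):
            self.store = {}

        @memoize_in_self('doubled')
        def doubled(self, x):
            return 2 * x


    class Analysis(object):
        # Instances are hashed by identity (the default object hash).
        @memoize_method('shifted')
        def shifted(self, ctx, x):
            return x + 1


    ctx = ToyContext()
    analysis = Analysis()
    a = ctx.doubled(4)
    b = analysis.shifted(ctx, 4)

Both results end up in the same ``ctx.store``; only the lookup of the
context differs.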


Hashing protocol
----------------

TODO

Immutabilification
------------------

The results from a memoized function will be converted to read-only
objects (or, if this is not possible, as with a dict, an exception will
be raised). The conversion is applied recursively to the elements of
lists.

If the type of the result is not known, the memoization code will try to
call its ``as_immutable(self)`` method to convert it.
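
The rules above might be implemented roughly as follows. This is a
sketch, not the library's code: ``make_immutable`` and ``Bag`` are
hypothetical names, and turning lists into tuples, calling
``setflags(write=False)`` on array-like objects, and raising
``TypeError`` are all assumptions::

    def make_immutable(value):
        """Hypothetical sketch of the conversion applied to results."""
        if isinstance(value, (int, float, complex, str, bytes, frozenset,
                              type(None))):
            return value  # already immutable
        if isinstance(value, (list, tuple)):
            # Recurse through sequences; a list becomes a tuple (assumed).
            return tuple(make_immutable(v) for v in value)
        if hasattr(value, 'setflags'):
            value.setflags(write=False)  # e.g. a NumPy array, now read-only
            return value
        if isinstance(value, dict):
            raise TypeError('cannot make a dict immutable')
        as_immutable = getattr(value, 'as_immutable', None)
        if as_immutable is None:
            raise TypeError('cannot make %r immutable' % type(value).__name__)
        return as_immutable()  # fallback for unknown types


    class Bag(object):
        """Example user type that knows how to freeze itself."""

        def __init__(self, *items):
            self.items = list(items)

        def as_immutable(self):
            return frozenset(self.items)


    frozen = make_immutable([1, [2, 3], Bag('a', 'b')])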

Comparison with joblib
----------------------

The joblib library is intended to fit into your own (presumably dynamic,
interactive) workflow with a minimum of intervention. Therefore it, e.g.,
checks whether the source code of a function has changed (and if so
invalidates the cache for that function), tries to hash almost any input
(by running it through a hashing "pickler"), and so on. A typical joblib
example is::

    memory = joblib.Memory('cachedir')

    @memory.cache
    def f(arr): return fft(arr**2)  # arr is a NumPy array

On the other hand, ``commander.memoize`` is very explicit. It will not,
for instance, try to hash NumPy arrays; instead you typically pass in
descriptors, which are then used to fetch the data. A typical example::

    @memoize(tags=['disk'])
    def f(ctx, map_descriptor):
        # only the (hashable) descriptor is part of the cache key
        arr = ctx.get_foo(map_descriptor)
        return fft(arr)
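
A self-contained sketch of the descriptor pattern, using only the
standard library. ``MapDescriptor``, ``ToyContext`` and ``get_map`` are
hypothetical names, the dict stands in for files on disk, and squaring
stands in for ``fft``. Because the descriptor is a frozen dataclass it is
hashable, so it can serve as the cache key while the (possibly large,
unhashable) data itself never needs to be hashed::

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class MapDescriptor:
        """Hypothetical descriptor: small, hashable, says which data to load."""
        name: str
        source: str


    class ToyContext(object):
        """Hypothetical context that loads data and caches results."""

        def __init__(self, data):
            self._data = data  # stands in for files on disk
            self.store = {}

        def get_map(self, desc):
            return self._data[(desc.name, desc.source)]


    def f(ctx, map_descriptor):
        key = ('f', map_descriptor)  # only the descriptor is hashed
        if key not in ctx.store:
            arr = ctx.get_map(map_descriptor)
            ctx.store[key] = tuple(x * x for x in arr)  # placeholder for fft
        return ctx.store[key]


    ctx = ToyContext({('V', 'WMAP7yr'): [1.0, 2.0]})
    desc = MapDescriptor(name='V', source='WMAP7yr')
    result = f(ctx, desc)
    same = f(ctx, MapDescriptor('V', 'WMAP7yr'))  # equal descriptor: cache hit

Two descriptors with equal fields compare and hash equal, so the second
call is served from the cache even though it passes a new object.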