Global settings¶
For fine control of the provenance tracking with Alpaca, some settings can be set globally.
Every time a tracked function runs, the current state of the setting is used. The elements in the provenance records produced retain information describing which setting value was used, when applicable.
Currently, the following settings can be defined:
- use_builtin_hash_for_module: list of str
Objects from the packages defined in the list will be hashed using the builtin hash function, instead of joblib.hash.
Alpaca uses the joblib.hash function with SHA1 algorithm to obtain hashes used to identify the objects. This may be problematic for some packages, as the objects may be subject to changes that are not useful to be tracked in detail.
One example is matplotlib.Axes objects. joblib.hash is sensitive enough to produce a different hash every time the object is modified (e.g., adding a new plot, legend, etc.). For matplotlib.Axes objects, the builtin hash function is able to disambiguate each object instance among all other matplotlib.Axes objects within the script scope, but every modification will not produce a change in the hash value. Therefore, if adding ‘matplotlib’ to the list in this global setting, the chain of changes in matplotlib.Axes objects are not going to be shown and the provenance track will be simplified.
If objects of the package are elements of a container (e.g., list or NumPy array containing matplotlib.Axes objects) a special hash will be computed for the container, using the builtin hash (i.e., hash of the tuple containing the hashes of each element, which is obtained using the builtin hash).
Default: []
- authority: str
The string defining the authority component used in the identifiers throughout Alpaca.
Data objects, files, scripts, functions, and function executions are serialized to RDF using URN identifiers. The basic form of the identifier string is urn:[authority]:alpaca:[complement], where [complement] is a string composed by specific information of each element identified.
[authority] is a string that points to the institute or organisation which has responsibility over the script execution, and is defined by this setting.
Default: “my-authority”
- store_values: list of str
The values of the objects from the types in the list will be stored together with the provenance information. Note that objects of the builtin types str, bool, int, float and complex, as well as the NumPy numeric types (e.g. numpy.float64) are stored by default. This option should be used to store values of more complex types, such as dictionaries. In this case, the list in this setting should have the builtins.dict entry. The strings are the full path to the Python object, i.e., [module].[…].[object_class].
To set/read a setting, use the function alpaca_setting().
- alpaca.alpaca_setting(name, value=None)[source]¶
Gets/sets a global Alpaca setting.
- Parameters:
- namestr
Name of the setting.
- valueAny, optional
If not None, the setting name will be defined with value. If None, the current value of setting name will be returned. Default: None
- Returns:
- The new value of setting name or its current value.
- Raises:
- ValueError
If name is not one of the global settings used by Alpaca or if the type of value is not compatible. Check the documentation for valid names and their description.