Visualize provenance data¶

The visualize_prov.py script shows the basic functionality to generate a graph file from serialized provenance data.

The script takes a file serialized in one of the RDF formats (e.g. Turtle) and writes a GEXF file.

Running the script¶

The usage is:

python visualize_prov.py [path_to_alpaca_PROV_file] [path_to_dest_GEXF_file]

Importing Alpaca and necessary objects¶

We start by importing the ProvenanceGraph object:

from alpaca import ProvenanceGraph

Selecting data to include in the visualization¶

The captured metadata within the provenance track can be extensive. By default, Alpaca captures all object attributes and, for some specific packages (e.g., Neo), additional information is captured in the form of annotations and array annotations. You can pass lists of names to Alpace to limit the information to include in the visualization graph to avoid cluttering.

Attributes¶

Attributes are the usual Python object attributes (e.g. object.name). Alpaca stores attribute values when the data objects are tracked. In the PROV files, their values are stored with the hasAttribute property.

For example, when working with NumPy arrays, it is useful to check the dimensions of the array (shape attribute), and also the data type stored (dtype attribute). To include these attributes, we need a list/tuple like:

attributes = ['shape', 'dtype', 'name']

Here we also include the value of name if any object has it defined (e.g., Neo objects).

Annotations¶

Annotations are values stored inside a dictionary accessible by the annotations or array_annotations attributes of the Python object. These values are stored by Alpaca in PROV files in the form of hasAnnotation properties. Array annotations are a special type of annotation. For a Python object that is itself an array, with multiple elements, each value in an array annotation will refer to the respective element in the Python object.

For example, the neo.Block object may have a custom field called subject_name to identify the name of the subject used in an electrophysiology recording. For a neo.Block loaded into variable block, this would be stored inside block.annotations. The dictionary would be {‘subject_name’: ‘monkey_L’}.

Additionally, for a neo.SpikeTrain, different annotations could be present, such as id for the neuron identification, and channel_id indicating the channel number from which the signal used to extract the neuron was obtained. As there are multiple spike times stored in the neo.SpikeTrain object, an array annotation will contain metadata referring to each individual spike.

To include the subject_name, id and channel_id annotation values in the visualization, we need a list/tuple like:

annotations = ['subject_name', 'id', 'channel_id']

Generating the visualization graph¶

We can generate the visualization graph by passing the file name and selected attributes/annotations to the ProvenanceGraph object:

prov_graph = ProvenanceGraph(prov_file=file_name,
                             attributes=attributes,
                             annotations=annotations)

Saving the graph file¶

To save the graph as GEXF:

prov_graph.save_gexf(output_file)

Now, output_file is a GEXF file that can be read by Gephi to visualize a graph with the provenance history, the object details, and function parameters. For the output of the simple example (run_basic.ttl), you will have a file called run_basic.gexf.