*************************
Visualize provenance data
*************************

The **visualize_prov.py** script shows the basic functionality to generate
a graph file from serialized provenance data.

The script takes a file serialized in one of the RDF formats (e.g. Turtle) and
writes a GEXF file.


Running the script
------------------

The usage is:

.. code-block:: sh

    python visualize_prov.py [path_to_alpaca_PROV_file] [path_to_dest_GEXF_file]


Importing Alpaca and necessary objects
--------------------------------------

We start by importing the **ProvenanceGraph** object:

.. code-block:: python

    from alpaca import ProvenanceGraph


Selecting data to include in the visualization
----------------------------------------------

The captured metadata within the provenance track can be extensive. By default,
Alpaca captures all object attributes and, for some specific packages (e.g.,
Neo), additional information is captured in the form of annotations and array
annotations. You can pass lists of names to Alpace to limit the information
to include in the visualization graph to avoid cluttering.


Attributes
~~~~~~~~~~

Attributes are the usual Python object attributes (e.g. `object.name`). Alpaca
stores attribute values when the data objects are tracked. In the PROV files,
their values are stored with the `hasAttribute` property.

For example, when working with NumPy arrays, it is useful to check the
dimensions of the array (`shape` attribute), and also the data type stored
(`dtype` attribute). To include these attributes, we need a list/tuple like:

.. code-block:: python

    attributes = ['shape', 'dtype', 'name']

Here we also include the value of `name` if any object has it defined (e.g.,
Neo objects).


Annotations
~~~~~~~~~~~

Annotations are values stored inside a dictionary accessible by the
`annotations` or `array_annotations` attributes of the Python object.
These values are stored by Alpaca in PROV files in the form of
`hasAnnotation` properties. Array annotations are a special type of annotation.
For a Python object that is itself an array, with multiple elements, each
value in an array annotation will refer to the respective element in the
Python object.

For example, the `neo.Block` object may have a custom field called
`subject_name` to identify the name of the subject used in an
electrophysiology recording. For a `neo.Block` loaded into variable `block`,
this would be stored inside `block.annotations`. The dictionary would be
`{'subject_name': 'monkey_L'}`.

Additionally, for a `neo.SpikeTrain`, different annotations could be present,
such as `id` for the neuron identification, and `channel_id` indicating the channel
number from which the signal used to extract the neuron was obtained.
As there are multiple spike times stored in the `neo.SpikeTrain` object, an
array annotation will contain metadata referring to each individual spike.

To include the `subject_name`, `id` and `channel_id` annotation values in the 
visualization, we need a list/tuple like:

.. code-block:: python

    annotations = ['subject_name', 'id', 'channel_id']


Generating the visualization graph
----------------------------------

We can generate the visualization graph by passing the file name and
selected attributes/annotations to the **ProvenanceGraph** object:

.. code-block:: python

    prov_graph = ProvenanceGraph(prov_file=file_name,
                                 attributes=attributes,
                                 annotations=annotations)

Saving the graph file
---------------------

To save the graph as GEXF:

.. code-block:: python

    prov_graph.save_gexf(output_file)

Now, `output_file` is a GEXF file that can be read by Gephi to visualize a
graph with the provenance history, the object details, and function parameters.
For the output of the simple example (**run_basic.ttl**), you will have a file
called **run_basic.gexf**.