Persistent Homology Tutorial 3 of 4

(See also Part 1, Part 2, and Part 4.)

Ripser on your machine

A more advanced (but very useful) step is to download Ripser to your machine and run it locally, which allows you to perform larger computations. Ripser is written in C++. You may download the code for Ripser here; the repository also contains installation instructions. Minimal installation instructions are listed below.

git clone https://github.com/Ripser/ripser.git
cd ripser
make all
./ripser examples/sphere_3_192.lower_distance_matrix

For convenience, you may want to download all of the contents of this folder, copy the executable file ripser into it, and then cd into it.

You can use the flag --format distance to specify that the input is a distance matrix, or --format point-cloud to specify that the input is a point cloud. The flag --dim k specifies that homology is computed only up to dimension k, and the flag --threshold t specifies that persistent homology is computed only up to scale parameter t. For example, we can recreate all of the examples from Part 1 with the following commands.

House example on the distance matrix:

./ripser --format distance distance_matrices/house_distances.txt

House example on the point cloud:

./ripser --format point-cloud point_clouds/house_points.txt

Torus example, up to 2-dimensional homology:

./ripser --format point-cloud --dim 2 point_clouds/torus_points.txt

Sphere example, up to scale parameter 1.2:

./ripser --format point-cloud --dim 2 --threshold 1.2 point_clouds/sphere_points.txt

Cyclooctane example. Try increasing the distance threshold gradually and see if your computer can do better than Ripser in your browser:

./ripser --format point-cloud --dim 2 --threshold 1.3 point_clouds/cyclooctane_points.txt

Optical image patch example:

./ripser --format point-cloud --dim 1 --threshold 1.3 point_clouds/optical_k300_points.txt

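Instead of just printing Ripser's output to the terminal, you can also save it to a text file. The example below prints the output to the terminal and also appends it to the text file house_points_ripser_printed.txt (the -a flag makes tee append to the file rather than overwrite it).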

./ripser --format point-cloud point_clouds/house_points.txt | tee -a house_points_ripser_printed.txt
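The same run can be driven from Python, which is convenient once you want to script many computations. The following is only a minimal sketch, assuming Python 3.5 or later and a ripser executable in the current folder; the helper names build_ripser_command and run_and_tee are hypothetical and are not part of Ripser.

```python
import subprocess
import sys


def build_ripser_command(input_path, fmt="point-cloud", dim=None, threshold=None,
                         executable="./ripser"):
    """Assemble the argument list for one Ripser run (hypothetical helper).

    Only the flags described in this tutorial are used:
    --format, --dim and --threshold.
    """
    cmd = [executable, "--format", fmt]
    if dim is not None:
        cmd += ["--dim", str(dim)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    cmd.append(input_path)
    return cmd


def run_and_tee(cmd, log_path):
    """Run cmd, echo its standard output, and append it to log_path (like `tee -a`)."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    sys.stdout.write(result.stdout)
    # Mode "a" appends, mirroring the -a flag of tee.
    with open(log_path, "a") as log_file:
        log_file.write(result.stdout)
    return result.stdout
```

For instance, run_and_tee(build_ripser_command("point_clouds/house_points.txt"), "house_points_ripser_printed.txt") would mirror the shell command above.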

Python

The remainder of this tutorial requires one to have Python and/or the Ripser software package installed.

Installing and running new code can be frustrating, especially if it is in a language (perhaps Python) that is unfamiliar to you. Nevertheless, we believe that it is extremely important for all practitioners of machine learning to have some exposure to Python. For this reason, the time you spend getting Python running on your machine is time well spent, even if it does not feel that way at first.

If you don't yet have Python and you are a PC user, we recommend installing Anaconda. If you don't yet have Python and you are a Mac user, we recommend installing Python 2.7.15 from here. If you already have a version of Python installed, we expect the code to work with it.

Ripser with Python

Melissa McGuirl has written very nice code for using Ripser with Python, which is what we will use in this section. In particular, the file house_points_ripser_printed.txt that we saved above is not in a format that is easy to work with. Melissa's code reformats Ripser output into a more convenient form.

Alternatively, there is a Cython wrapper for Ripser available which might be more efficient and better for non-Linux machines. The wrapper is available here or here.

Ensure you are in the folder of data files mentioned above, with the Ripser executable copied into that folder as suggested earlier. (Alternatively, make sure that Ripser is on your system PATH, and then in line 45 of getBarCodes.py, change ./ripser to ripser.) In your terminal, try running the following command.

python getBarCodes.py -i distance_matrices/ -o ripser_outputs/

This will take every distance matrix in the folder distance_matrices, compute the persistent homology barcodes for the Vietoris-Rips complex built on top of each metric space, and write the output barcodes to the folder ripser_outputs.
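To make the batch step concrete, here is a rough sketch of what such a loop over a folder might look like. It is not Melissa's code: the function names plan_batch and run_batch and the output naming scheme (appending _ripser.txt) are assumptions made for illustration, and Python 3 is assumed.

```python
import os
import subprocess


def plan_batch(input_dir, output_dir, executable="./ripser", fmt="distance"):
    """Pair every file in input_dir with a Ripser command and an output path.

    The "<name>_ripser.txt" naming scheme is a hypothetical choice.
    """
    jobs = []
    for name in sorted(os.listdir(input_dir)):
        in_path = os.path.join(input_dir, name)
        if not os.path.isfile(in_path):
            continue  # skip subfolders
        stem = os.path.splitext(name)[0]
        out_path = os.path.join(output_dir, stem + "_ripser.txt")
        jobs.append(([executable, "--format", fmt, in_path], out_path))
    return jobs


def run_batch(jobs):
    """Run each planned command and write its standard output to its output file."""
    for cmd, out_path in jobs:
        os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
        with open(out_path, "w") as out_file:
            subprocess.run(cmd, stdout=out_file, check=True)
```

Separating the planning from the running makes it easy to inspect the commands first, for example with plan_batch("distance_matrices/", "ripser_outputs/"), before spending time on the computations.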

The following command then splits the Ripser output into barcode intervals, grouped by homological dimension.

python separateRipser.py -i ripser_outputs/ -o barcodes/
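As an illustration of what separating by dimension involves, the sketch below parses Ripser's printed output into a dictionary of intervals. It is not the code in separateRipser.py. It assumes that each block of output starts with a line of the form "persistence intervals in dim k:" followed by one interval per line written as [birth,death), with an empty death value (as in "[0, )") for a bar that never dies; check this against your own output files before relying on it. The function name split_by_dimension is hypothetical.

```python
import re

# Assumed header line, e.g. "persistence intervals in dim 1:"
DIM_HEADER = re.compile(r"persistence intervals in dim (\d+):")
# Assumed interval line, e.g. " [0.5,1.25)" or " [0, )" for an infinite bar
INTERVAL = re.compile(r"\[\s*([^,\s]+)\s*,\s*([^)\s]*)\s*\)")


def split_by_dimension(ripser_text):
    """Return {dimension: [(birth, death), ...]} from Ripser's printed output."""
    barcodes = {}
    current_dim = None
    for line in ripser_text.splitlines():
        header = DIM_HEADER.search(line)
        if header:
            current_dim = int(header.group(1))
            barcodes[current_dim] = []
            continue
        interval = INTERVAL.search(line)
        if interval and current_dim is not None:
            birth = float(interval.group(1))
            death_text = interval.group(2)
            # An empty death value is read as a bar that never dies.
            death = float(death_text) if death_text else float("inf")
            barcodes[current_dim].append((birth, death))
    return barcodes
```

Given the text of a saved output file, split_by_dimension(text)[1] would then be the list of 1-dimensional intervals, provided the file contains a block for dimension 1.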

And the following command then plots the corresponding persistence diagrams in your current directory.

python plotpd.py -i barcodes/ -o ./
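A persistence diagram plots each interval as the point (birth, death) above the diagonal. The sketch below shows one way this could be done; it is not the code in plotpd.py. It assumes Python 3 with matplotlib installed (needed only by plot_diagram), and the convention of drawing never-dying bars slightly above all finite values is an arbitrary choice. The function names are hypothetical.

```python
import math


def diagram_points(intervals, infinity_height=None):
    """Turn (birth, death) pairs into finite (x, y) points for a persistence diagram."""
    finite_deaths = [d for _, d in intervals if math.isfinite(d)]
    births = [b for b, _ in intervals]
    if infinity_height is None:
        # Assumption: draw never-dying classes slightly above every finite value.
        infinity_height = 1.1 * max(finite_deaths + births + [1.0])
    return [(b, d if math.isfinite(d) else infinity_height) for b, d in intervals]


def plot_diagram(intervals, out_path, title="Persistence diagram"):
    """Save a scatter plot of the diagram to out_path (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    points = diagram_points(intervals)
    top = max([y for _, y in points] + [1.0])
    fig, ax = plt.subplots()
    # Dashed diagonal birth = death; points far above it are long bars.
    ax.plot([0, top], [0, top], linestyle="--", color="gray")
    if points:
        xs, ys = zip(*points)
        ax.scatter(xs, ys)
    ax.set_xlabel("birth")
    ax.set_ylabel("death")
    ax.set_title(title)
    fig.savefig(out_path)
    plt.close(fig)
```

Points far from the diagonal correspond to long bars in the barcode, which are the features usually interpreted as signal rather than noise.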

Melissa's code is written to work only with input metric space data in the form of a distance matrix, but one could edit it to also work with input metric space data in the form of a point cloud, for example.
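One simple workaround, without editing Melissa's code, is to convert a point cloud into a distance matrix first and then feed that file to the pipeline above. The following is a minimal sketch under stated assumptions: Python 3, Euclidean distance, one point per line with coordinates separated by whitespace or commas, and a comma-separated full matrix as output. Compare the result against the files already in distance_matrices to confirm the layout matches. All function names here are hypothetical.

```python
import math


def read_point_cloud(path):
    """Read one point per line; coordinates separated by whitespace or commas."""
    points = []
    with open(path) as handle:
        for line in handle:
            fields = line.replace(",", " ").split()
            if fields:  # ignore blank lines
                points.append([float(x) for x in fields])
    return points


def distance_matrix(points):
    """Full symmetric matrix of Euclidean distances between all pairs of points."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            matrix[i][j] = matrix[j][i] = d
    return matrix


def write_distance_matrix(matrix, path):
    """Write one row per line, entries separated by commas (assumed layout)."""
    with open(path, "w") as handle:
        for row in matrix:
            handle.write(",".join(repr(x) for x in row) + "\n")
```

Note that the full matrix has one entry per pair of points, so for large point clouds it is usually preferable to pass the point cloud to Ripser directly with --format point-cloud.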

