import numpy as np
import scipy as sp  # Subpackages still need to be imported individually
import matplotlib.pyplot as plt
from matplotlib import cm
import tables
import h5py
# For website
%matplotlib inline

Python: the language for effective scientific computing

Ross Barnowski rossbar@berkeley.edu | @rossbar on GitHub

DOE Python Exchange | 12/01/2021

The Question: Why Python

  • My personal take

  • Looking ahead…

A bit about me…

I began my research career in the national lab system:

  • Summer 2008: SULI internship at Argonne National Laboratory

  • Summer 2009: Summer internship with the Nuclear Data Group at LLNL

    • First exposure to Python

  • 2010-2019: Grad school -> post-doc -> research scientist & lecturer

    • UC Berkeley nuclear engineering department & the Applied Nuclear Physics group at LBNL

I’m a nuclear engineer by training, specializing in radiation instrumentation; particularly gamma-ray spectroscopy and imaging.

Open-source software, specifically scientific Python, was the key component in my research projects.

Real-time 3D gamma-ray mapping

  • Scene data fusion (SDF): computer vision + gamma-ray imaging

  • Applications in e.g. nuclear contamination remediation

Fukushima Cs-137 distribution reconstruction, rear parking lot

Right: Aerial view of measurement area. Left: 3D gamma-ray image reconstruction — Red line = path of imager through the scene, B/W dots = 3D point cloud model of the scene, Heatmap = image reconstruction; color scale corresponds to relative reconstructed intensity.

Volumetric gamma-ray imaging

  • Imaging modalities for e.g. nuclear safeguards or small-animal imaging

Nearfield tomographic reconstruction of Cs-137 line source

Demonstration of Compton imaging for applications in small animal molecular imaging. Left: A Cs-137 calibration source in the shape of a thin rod mounted on a rotating stage. Right: Three projections of reconstructed image from a tomographic measurement.

My background

  • Systems integration

    • Interfacing with hardware, data acquisition systems

    • Synchronizing & fusing data from disparate sources

  • Real-time acquisition and analysis

    • Distributed computing: performance, load-balancing

  • Workstation-scale computing

    • Mobile (hand-held or cart-based) measurement systems

    • Datasets usually O(GB) - not big data

On to the question: Why Python?

  • General-purpose language: can address a wide range of problems & computational tasks

  • Optimizes developer time

  • Effective language for the communication of scientific ideas

  • Community-based development model!

General purpose programming language

[This is] a distinguishing feature of Python for science and one of the reasons why it has been so successful in the realm of data science: instead of adding general features to a language designed for numerical and scientific computing, here scientific features are added to a general-purpose language. This broadens the scope of problems that can be addressed easily, expands the sources of data that are readily accessible and increases the size of the community that develops code for the platform.

Scipy 1.0: fundamental algorithms for scientific computing in Python

* Emphasis mine

See also: Jim Hugunin’s position paper laying out the motivation and design principles for Numeric (ancestor of NumPy) in 1995(!)

Integrating with other languages

  • Performance - e.g. writing performance-critical chunks in low-level languages

  • Extending - adding custom algorithms or data structures (e.g. ndarray)

  • Wrapping - providing convenient, high-level interfaces to existing low-level code

    • Common in e.g. scipy (BLAS, Eigen, etc.)

    • Legacy code and software from vendors, e.g. commercial hardware

A fundamental feature of Python, and many additional tools to fit various use-cases:

  • Cython, Pythran, numba, PyBind11, …

#include <Python.h>
#include "numpy/arrayobject.h" // provides the numpy C API

...  // Defining all the wrapper functions

static PyMethodDef SISMethods[] = {
    {"connectToDAQ", (PyCFunction)wrap_connectToDAQ, METH_VARARGS, "Connect to SIS3150"},
    {"configuration", (PyCFunction)wrap_configuration, METH_VARARGS, "Configure SIS3302"},
    {"startacquisition", (PyCFunction)wrap_startacquisition, METH_VARARGS, "Start SIS3302"},
    {"stopacquisition", (PyCFunction)wrap_stopacquisition, METH_VARARGS, "Stop SIS3302"},
    {"acquiredata", (PyCFunction)wrap_acquiredata, METH_VARARGS, "Acquire data from SIS3302"},
    {"acquireDataWithRaw", (PyCFunction)wrap_acquireDataWithRaw, METH_VARARGS, "Acquire edata and rdata from SIS3302"},
    {NULL,NULL} 
};

PyMODINIT_FUNC initsis(void) { // must be init<modulename> (init_cext => _cext)
    (void) Py_InitModule("sis", SISMethods);
    import_array(); // load numpy (effectively "import numpy") for use in this module 
};

Original wrapping work by Cameron Bates (@crbates), now at LANL, whom I credit with convincing me of the value of the Python C/API!

import time
from SIS import sis

class SISDAQThread:
    def __init__(self, config_file):
        self.hardware_started = False
        self.paused = True  # state controlled via GUI button press
        # Initialize hardware communication
        sis.connectToDAQ()
        sis.configuration(config_file)

    def run(self):
        if not self.hardware_started and not self.paused:
            self.start_hardware()

        if self.hardware_started and not self.paused:
            timestamps, energies, channel_nos, trigger_values = sis.acquiredata()

        # Validation + send to other processes for analysis

    def start_hardware(self):
        sis.startacquisition()
        self.start_time = time.time()
        self.hardware_started = True
  • Incorporate into scripts, local GUIs, Web-UI

  • Instantly gained platform-independence

Optimize Developer time

Syntactically, Python code looks like executable pseudo code. Program development using Python is 5-10 times faster than using C/C++… Often, [a prototype program] is sufficiently functional and performs well enough to be delivered as the final product, saving considerable development time.

GVR, Glue It All Together With Python

  • Not to mention the time it takes to become proficient in the language!

  • Powerful operations via simple, expressive syntax

  • Incremental approach to data analysis

    • Identify and address bottlenecks sequentially

    • Simple path(s) for scaling up to larger problems

Example: Digital signal processing for gamma-ray spectroscopy

with tables.open_file("_data/digitized_preamp_signals.h5") as hf:
    signal = hf.root.signals[0, ...]

fig, ax = plt.subplots()
ax.plot(signal)
ax.set_title("Digitzed raw signal from a radiation spectrometer")
ax.set_ylabel("Amplitude (ADC units [arbitrary])")
ax.set_xlabel("Sample # $10 \frac{ns}{sample}$");
_images/presentation_36_0.png
  • Signal amplitude reflects total deposited energy

Time-domain analysis for trapezoidal signal shaping

Algorithm described in Digital synthesis of pulse shapes in real time for high resolution radiation spectroscopy by Valentin Jordanov (pdf link)

  • Simple operations: multiplication, addition, accumulation, delays (Z-n)

Select values for the parameters:

k = 450  # "Peaking time", in sample units (i.e. 4.5 microseconds)
m = 60  # "Gap time", in sample units (i.e. 600 nanoseconds)
M = 3600  # Estimate of exponential decay constant in sample units

Implement signal delay with slicing:

s = signal[:-(2*k+m)]
sk = signal[k:-(m+k)]
skm = signal[k+m:-k]
s2km = signal[2*k+m:]

Apply shaper:

S1 = ((s - sk) + (s2km - skm)).astype(np.int64)
S2 = M * S1 + np.cumsum(S1)
shaped = np.cumsum(S2)

A little cleanup:

# Pad result for time-alignment with input signal
shaped = np.hstack((np.zeros(2*k+m), shaped))

# Gain compensation
shaped /= M*k

How’d we do?

fig, ax = plt.subplots()
ax.plot(signal, label="Input signal")
ax.plot(shaped, label="Shaper output")
ax.set_title("Digitzed raw signal and trapezoidal filter output")
ax.set_ylabel("Amplitude (ADC units [arbitrary])")
ax.set_xlabel("Sample # $10 \frac{ns}{sample}$")
ax.legend();
_images/presentation_48_0.png

Scaling up

Scaling the analysis up to multiple signals is straightforward thanks to broadcasting:

def trapezoidal_shaper(signals, k, m, M):
    signals = np.atleast_2d(signals)  # Each row represents a single measurement
    # Apply delays to all signals
    s = signals[..., :-(2*k+m)]
    sk = signals[..., k:-(m+k)]
    skm = signals[..., k+m:-k]
    s2km = signals[..., 2*k+m:]
    # Apply shaper operations along appropriate axis
    S1 = ((s - sk) + (s2km - skm)).astype(np.int64)
    S2 = M * S1 + np.cumsum(S1, axis=1)
    shaped = np.cumsum(S2, axis=1)
    # Time-alignment and gain correction
    shaped = np.hstack((np.zeros((signals.shape[0], 2*k+m)), shaped))
    shaped /= M * k
    return shaped
# Load
with tables.open_file("_data/digitized_preamp_signals.h5", "r") as hf:
    print(f"Total number of signals in file: {hf.root.signals.shape[0]}")
    signals = hf.root.signals.read()

# Analyze
shaped = trapezoidal_shaper(signals, k, m, M)
Total number of signals in file: 1104
# Visualize
fig, ax = plt.subplots()
ax.plot(signals[:10].T)
ax.plot(shaped[:10].T)
ax.set_title("Digitzed raw signal from a radiation spectrometer")
ax.set_ylabel("Amplitude (ADC units [arbitrary])")
ax.set_xlabel("Sample # $10 \frac{ns}{sample}$");
_images/presentation_52_0.png

Takeaways

  • Implementation of analysis is very near the original algorithm

  • Incremental approach: explore/test with subset, easily scale up

  • What about scaling further? Purpose-built tooling that interoperates with arrays!

    • Big data - consider dask.

    • Performance bottlenecks - will GPU’s help? Consider cupy.

A quick aside on interoperability

import dask.array as da

# Data still on disk
data = h5py.File("_data/digitized_preamp_signals.h5")["/signals"]

chunked_signals = da.from_array(data, chunks=(276, 4096))
chunked_signals
Array Chunk
Bytes 17.25 MiB 4.31 MiB
Shape (1104, 4096) (276, 4096)
Count 5 Tasks 4 Chunks
Type int32 numpy.ndarray
4096 1104
# Note: Calling a function full of *numpy* functions on a *dask array*
shaped = trapezoidal_shaper(chunked_signals, k, m, M)
shaped
Array Chunk
Bytes 34.50 MiB 6.60 MiB
Shape (1104, 4096) (276, 3136)
Count 98 Tasks 8 Chunks
Type float64 numpy.ndarray
4096 1104

Our analysis works on a new, non-numpy array object out-of-the-box! No changes/explicit conversions required!

This works because Dask implements protocols provided by NumPy for interoperability with array objects. For more information, see e.g. NEP 18, which describes the __array_function__ protocol.

An effective language for communicating STEM concepts

List-mode maximum likelihood expectation maximization

An equation for list-mode maximum likelihood expectation maximization (MLEM)

Or, in NumPy:

def compute_em_iteration(λ, α, s):
    term_one = 1 / (α @ λ)
    term_two = α.T @ term_one
    return (λ / s) * term_two
  • and of course, it’s executable!

Example: Gamma-ray image reconstruction

Compton imaging is a specific modality of gamma-ray imaging

  • Based on the physics of gamma-ray scattering

Some fun quirks of this modality

  • Each photon interaction results in a cone in the imaging space

  • No lensing or collimation, inherently wide field-of-view

We’ll use a simulated dataset that contains 100 photon interactions from a gamma-ray source directly in front of a Compton camera

import scipy.sparse
fname = "_data/system_matrix_100cones_4piSpace_simulated_5degConeOpenAngle.h5"
img_shape = (181, 361)  # 4-pi imaging
with tables.open_file(fname, "r") as hf:
    system_matrix = sp.sparse.csr_matrix(hf.root.sysmat.read())
system_matrix
<100x65341 sparse matrix of type '<class 'numpy.float32'>'
	with 2378244 stored elements in Compressed Sparse Row format>

Let’s take a look at a few of the cones to get a sense for what the image looks like:

subset = np.ravel(system_matrix[:10].sum(axis=0))  # Sum of 10 backprojected cones

# Visualize
fig, ax = plt.subplots()
ax.imshow(subset.reshape(img_shape), cmap=cm.plasma)
ax.set_title("Backprojection of 10 Compton cones")
ax.set_xlabel("Azimuthal angle $\phi (^{\circ})$")
ax.set_ylabel("Polar angle $\\theta (^{\circ})$");
_images/presentation_68_0.png

Now let’s try our ML-EM reconstruction technique:

# Use the backprojection to initialize the reconstruction
img = np.ravel(system_matrix.sum(axis=0))
initial_img = img.copy()  # Pre-reconstruction, for comparison
sensitivity = 1.0  # Ignore sensitivity for this simple example
n_iter = 10

for _ in range(n_iter):
    img = compute_em_iteration(img, system_matrix, sensitivity)

A quick qualitative comparison:

fig, ax = plt.subplots(1, 2, figsize=(16, 4))

for a, im, t in zip(ax, (initial_img, img), ("Backprojection", "Reconstruction")):
    a.imshow(im.reshape(img_shape), cmap=cm.plasma)
    a.set_title(t)
fig.tight_layout()
_images/presentation_72_0.png

Takeaways

  • Scientific Python is excellent for communicating scientific ideas

    • Minimal language cruft

    • “Executable pseudo-code”

  • Reproducibility and extensibility

    • Readability is important!

Community

The secret-sauce of scientific Python

  • The user-contributor model

    • User input inherently a top priority

    • Strong connection between development and usage

  • Coherence

    • Projects built from or extend from same fundamental elements (e.g. ndarray)

    • Interoperablity!

    • Similar development practices amongst projects

Scientific Python Ecosystem

Scientific Python Ecosystem

  • Projects built from or extend from same fundamental elements (e.g. ndarray)

  • Interoperablity!

  • Similar development practices

  • Sustainability?

A Healthy Ecosystem

An initiative to ensure the sustainable growth of the ecosystem moving forward:

https://scientific-python.org/

Discussion

Cross-project discussion forum: https://discuss.scientific-python.org/

Coordination

Mechanism for project coordination and loosely guiding ecosystem policy: Scientific Python Ecosystem Coordination documents (SPECs).

Growth

Getting the best tools into the hands of the most users!

The National Labs

The national labs have had a long history of developing and supporting open-source scientific Python

  • Numerical Python (aka Numeric, predecessor to NumPy) - LLNL 90’s-00’s

  • NetworkX got it’s start at LANL

  • Many, many others; including, but not limited to, Python (LFortran)

Another example: the SuperLU sparse matrix factorization package.

Benefits users and the original researchers!

Removing barriers to contributing

Engaging with the wider scientific Python community: https://discuss.scientific-python.org/

Not everything belongs in scipy:

Thank you!

Ideas? Feedback? Want to get involved in NumPy, NetworkX? Publish your own project? Please don’t hesitate to contact me: rossbar@berkeley.edu or ping me on GitHub @rossbar!