Radiopadre: remote, interactive, zero-admin visualization of data pipeline products
2020-11-09, 18:30–19:00 UTC

Modern astronomers (and not only radio astronomers) are coming to terms with being separated from their data and pipelines. Sheer data size alone dictates that reductions are hardly ever “local” in any sense: they have to run on a big node or cluster somewhere remote, with SSH gateways and network latency in between. The new work patterns of the COVID-19 pandemic only exacerbate this separation. At the same time, the complexity of new telescopes and pipelines produces a far greater volume and variety of intermediate diagnostics and final data products. The following scenario is becoming familiar: my pipeline run has finished (or crashed), having produced 300 log files, 200 intermediate plots, 50 FITS images, and a dozen HTML reports, all on a remote cluster node that does not even have a basic image viewer installed (and on which network lag would make one painful to use in any case). How do I make sense of all this without first transferring gigabytes of products to my laptop or local workstation?

Radiopadre (Python Astronomy Data Reductions Examiner, https://github.com/ratt-ru/radiopadre) provides at least part of the answer. It combines a client-side script, Docker or Singularity images, a Jupyter Notebook framework, and integrated browser-based FITS viewers (CARTA and JS9) to enable quick visualization of remote data products. Radiopadre is virtually zero-admin: it requires nothing more than a web browser and an SSH connection on the client side, and Docker or Singularity support on the remote end. It supports both interactive (exploratory) visualization via a Jupyter Notebook and the development of rich, extensive report-style notebooks tuned to the outputs of a particular pipeline.

The demo will showcase the interactive visualization capabilities of Radiopadre, using the output of various MeerKAT imaging pipelines as a working example.


Theme – Science Platforms and Data Lakes, Data Processing Pipelines and Science-Ready Data