Wolf, M. et al., 2021 IEEE International Conference on Cluster Computing

Reusability First: Toward FAIR Workflows

Matthew Wolf, Jeremy Logan, Kshitij Mehta, Daniel Jacobson, Mikaela Cashman, Angelica M. Walker, Greg Eisenhauer, Patrick Widener and Ashley Cliff
26-September-2021, 2021 IEEE International Conference on Cluster Computing (CLUSTER) Vol 2: pp. 444-455 https://doi.org/10.1109/Cluster48925.2021.00053

Abstract

The FAIR principles of open science (Findable, Accessible, Interoperable, and Reusable) have had transformative effects on modern large-scale computational science. In particular, they have encouraged more open access to and use of data, an important consideration as collaboration among teams of researchers accelerates and the use of workflows by those teams to solve problems increases. How best to apply the FAIR principles to workflows themselves, and software more generally, is not yet well understood. We argue that the software engineering concept of technical debt management provides a useful guide for application of those principles to workflows, and in particular that it implies reusability should be considered as ‘first among equals’. Moreover, our approach recognizes a continuum of reusability where we can make explicit and selectable the tradeoffs required in workflows for both their users and developers. To this end, we propose a new abstraction approach for reusable workflows, with demonstrations for both synthetic workloads and real-world computational biology workflows. Through application of novel systems and tools that are based on this abstraction, these experimental workflows are refactored to rightsize the granularity of workflow components to efficiently fill the gap between end-user simplicity and general customizability. Our work makes it easier to selectively reason about and automate the connections between trade-offs across user and developer concerns when exposing degrees of freedom for reuse. Additionally, by exposing fine-grained reusability abstractions we enable performance optimizations, as we demonstrate on both institutional-scale and leadership-class HPC resources.

Citation

M. Wolf, et al., “Reusability First: Toward FAIR Workflows,” in 2021 IEEE International Conference on Cluster Computing (CLUSTER), Portland, OR, USA, 2021 pp. 444-455. doi: 10.1109/Cluster48925.2021.00053

Outside Links

https://www.computer.org/csdl/proceedings-article/cluster/2021/966600a444/1xFuT3MJOdq