Towards Workflow Ecosystems Through Standard Representations

This page represents a Research Object containing additional materials for a paper under review at the WORKS14 workshop. The purpose of this web page is to provide a summary of the paper, supporting links, and short descriptions of the contents used as input and generated as output of the described work. A copy of the paper will be made available here once the workshop notifies the authors.

Abstract

Workflows are increasingly used to manage and share scientific computations and methods. Workflow tools can be used to design, validate, execute and visualize scientific workflows and their execution results. Other tools manage workflow libraries or mine their contents. There has been a lot of recent work on workflow system integration as well as common workflow interlinguas, but the interoperability among workflow systems remains a challenge. Ideally, these tools would form a workflow ecosystem such that it should be possible to create a workflow with a tool, execute it with another, visualize it with another, and use yet another tool to mine a repository of such workflows or their executions. In this paper, we describe our approach to create a workflow ecosystem through the use of standard models for provenance (OPM and W3C PROV) and extensions (P-PLAN and OPMW) to represent workflows. We introduce WEST (Workflow Ecosystems through STandards), a workflow ecosystem that integrates different workflow tools with diverse functions (workflow generation, execution, browsing, mining, and visualization) created by a variety of research groups. This is, to our knowledge, the first time that such a variety of functions and systems are integrated.

Description of the tools in the ecosystem

The following figure shows an overview of the WEST ecosystem:

Figure: WEST Ecosystem

The different tools that WEST integrates are represented as ellipses, while the workflow repositories are represented as rounded boxes with a thick border. The small rectangular boxes depict the converters that translate the internal vocabularies used within the individual tools into the standard representations used by WEST. Converters that are planned but not yet implemented are indicated with dashed lines. Lightweight converters that adapt a vocabulary to other extensions of the standards are shown as small rounded boxes (their respective tools are represented as dotted ellipses). The direction of the arrows between the tools indicates which tools produce and which consume the workflows exchanged across the ecosystem.

The figure also indicates all the workflow tools that we have integrated in the WEST ecosystem to date. They represent a wide variety of functions:

  • Workflow Generation: WEST integrates the WINGS workflow generation tool. Users create workflow templates, which WINGS can specialize to generate workflow instances. WINGS can submit the workflow instances for execution by different workflow mapping and execution engines.
  • Workflow Mapping and Execution: WEST includes three workflow execution engines: Pegasus, Apache OODT, and LONI Pipeline. These systems map the workflow tasks to available execution resources, and then manage their execution. It would be easy to include other workflow execution engines, since many of them use OPM and are beginning to use PROV.
  • Workflow Mining: WEST includes the FragFlow system for workflow mining, which includes several algorithms for extracting common workflow fragments from repositories of workflow templates and workflow executions.
  • Workflow Visualization: WEST includes the Prov-o-viz tool for visualizing provenance structures expressed in the W3C PROV standard.
  • Workflow Browsing: WEST uses a Workflow Explorer tool, WExp, which allows for exploring different workflow templates, their metadata and their workflow execution results.
  • Workflow Documentation: WEST includes the Organic Data Science Wiki, an extension of semantic wikis designed to develop meta-workflows that result in many workflow explorations and runs.
  • Workflow Storage and Sharing: WEST has a workflow repository that includes workflow templates, workflow instances, and workflow executions. This public repository is implemented using Virtuoso and is populated by WINGS. All workflows are Web objects that are openly accessible to any application that queries the repository.
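
As an illustration of how an application might query such a repository, the following minimal sketch builds a SPARQL request URL that asks for workflow templates. It is not taken from the paper: the endpoint address is a placeholder, the function name build_query_url is hypothetical, and the use of the opmw:WorkflowTemplate class under the http://www.opmw.org/ontology/ namespace is an assumption about how templates are typed in the repository. The sketch only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): replace with the real repository address.
ENDPOINT = "http://example.org/sparql"

# Assumed vocabulary: templates typed as opmw:WorkflowTemplate.
TEMPLATE_QUERY = """\
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?template ?label WHERE {
  ?template a opmw:WorkflowTemplate .
  OPTIONAL { ?template rdfs:label ?label }
}
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Return a GET URL that passes the query in the 'query' parameter,
    percent-encoding spaces, newlines, and reserved characters."""
    return endpoint + "?" + urlencode({"query": query})
```

Calling build_query_url(ENDPOINT, TEMPLATE_QUERY) yields a URL that any HTTP client could fetch. Keeping URL construction separate from the network call makes it easy to inspect the request before sending it.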

Abstraction heterogeneity in WEST

Different applications of the workflow environment have different needs. For example, mining and presentation applications typically care about workflow templates or workflow executions and their provenance, while execution engines need the workflow instances in order to run them. Here we provide the link to the queries that have been used to illustrate abstraction heterogeneity in WEST (available here and also as a FigShare resource). A step-by-step example explaining all the queries can be found in Section 4.4 of the paper.
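
The idea that each kind of application reads a different abstraction level can be sketched with a few hand-written triples. This is only an illustration and not one of the queries from the paper: the ex: resources, the ex:WorkflowInstance class, the VIEWS mapping, and the function names are all hypothetical, and the opmw: class names are assumed to match the vocabulary used in the repository.

```python
RDF_TYPE = "rdf:type"

# Hypothetical sample data, written by hand for this sketch.
TRIPLES = [
    ("ex:template1", RDF_TYPE, "opmw:WorkflowTemplate"),
    ("ex:instance1", RDF_TYPE, "ex:WorkflowInstance"),  # hypothetical class
    ("ex:run1", RDF_TYPE, "opmw:WorkflowExecutionAccount"),
    ("ex:run1", "opmw:correspondsToTemplate", "ex:template1"),
    ("ex:run2", RDF_TYPE, "opmw:WorkflowExecutionAccount"),
    ("ex:run2", "opmw:correspondsToTemplate", "ex:template1"),
]

# Which abstraction levels each kind of consumer reads (assumed mapping).
VIEWS = {
    "mining": ["opmw:WorkflowTemplate", "opmw:WorkflowExecutionAccount"],
    "execution": ["ex:WorkflowInstance"],
}


def resources_of_type(triples, rdf_class):
    """Return the sorted subjects that are typed with the given class."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == rdf_class)


def view_for(consumer, triples):
    """Group the resources a consumer cares about by abstraction level."""
    return {cls: resources_of_type(triples, cls) for cls in VIEWS[consumer]}
```

With this data, view_for("execution", TRIPLES) returns only the instance, while view_for("mining", TRIPLES) returns the template and both executions. In a real deployment the same selection would be expressed as queries against the repository rather than as filters over an in-memory list.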

About the authors

Daniel Garijo is a PhD student in the Ontology Engineering Group at the Artificial Intelligence Department of the Computer Science Faculty of Universidad Politécnica de Madrid. His research activities focus on e-Science and the Semantic Web, specifically on how to increase the understandability of scientific workflows using provenance, metadata, intermediate results and Linked Data.
Yolanda Gil is Director of Knowledge Technologies at the Information Sciences Institute of the University of Southern California, and Research Professor in the Computer Science Department. Her research interests include intelligent user interfaces, social knowledge collection, provenance and assessment of trust, and knowledge management in science. Her most recent work focuses on intelligent workflow systems to support collaborative data analytics at scale.
Oscar Corcho is an Associate Professor at Departamento de Inteligencia Artificial (Facultad de Informática, Universidad Politécnica de Madrid), and he belongs to the Ontology Engineering Group. His research activities are focused on Semantic e-Science and Real World Internet. In these areas, he has participated in a number of EU projects (Wf4Ever, PlanetData, SemsorGrid4Env, ADMIRE, OntoGrid, Esperonto, Knowledge Web and OntoWeb), Spanish Research and Development projects (CENITS mIO!, España Virtual and Buscamedia, myBigData, GeoBuddies), and has also participated in privately funded projects like ICPS (International Classification of Patient Safety), funded by the World Health Organisation, and HALO, funded by Vulcan Inc.