GC3Pie programming tutorials

Implementing scientific workflows with GC3Pie

This is the course material prepared for the “GC3Pie for Programmers” training, held at the University of Zurich for the first time on July 11-14, 2016. (The slides presented here are revised at each course re-run.)

The course aims to show how to implement patterns commonly seen in scientific computational workflows using Python and GC3Pie, and to give users enough knowledge of the tools available in GC3Pie to extend and adapt the examples provided.

Introduction to the training

A presentation of the training material and an outline of the course. Probably not of much use unless you’re actually sitting in class.

Overview of GC3Pie use cases

A quick overview of the kinds of computational use cases that GC3Pie can easily solve.

GC3Pie basics

The basics needed to write simple GC3Pie scripts: the minimal session-based script scaffolding, and the properties and features of the Application object.

Useful debugging commands

A recap of a few GC3Pie utilities that are especially useful when debugging code.

Customizing command-line processing

How to set up command-line argument and option processing in GC3Pie’s SessionBasedScript.

Application requirements

How to specify running requirements for Application tasks, e.g., how much memory is needed to run.

Application control and post-processing

How to check and react to the termination status of a GC3Pie Task/Application.

Introduction to workflows

A worked example of a many-step workflow.

Running tasks in a sequence

How to run tasks in sequence: basic usage of SequentialTaskCollection and StagedTaskCollection.

Running tasks in parallel

How to run independent tasks in parallel: the ParallelTaskCollection.

Automated construction of task dependency graphs

How to use the DependentTaskCollection for automated arrangement of tasks given their dependencies.
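
The idea of arranging tasks from their declared dependencies can be sketched with the Python standard library alone. This is only an illustration of the concept, not GC3Pie's API: the task names and the `execution_waves` helper are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical task names, each mapped to the set of tasks it depends on.
dependencies = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "collect": {"simulate_a", "simulate_b"},
}


def execution_waves(deps):
    """Group tasks into waves.

    All tasks in one wave have their dependencies satisfied by earlier
    waves, so they could run in parallel with each other.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking dependents
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves):
    print(f"wave {number}: {', '.join(wave)}")
```

Here the two simulation tasks end up in the same wave because both depend only on `preprocess`, while `collect` has to wait for both of them.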

Dynamic and unbounded sequences of tasks

How to construct SequentialTaskCollection classes that change the sequence of tasks while being run.
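
What "changing the sequence while being run" means can be illustrated in plain Python, without GC3Pie: after each step finishes, a decision function inspects the results so far and either supplies the next step or ends the sequence. All names below (`run_dynamic_sequence`, `make_halving_step`, `choose_next`) are hypothetical and only stand in for the pattern; they are not part of GC3Pie.

```python
def run_dynamic_sequence(first_step, choose_next, max_steps=50):
    """Run steps one at a time, deciding the next step after each one.

    `choose_next(results)` returns the next zero-argument callable,
    or None to end the sequence. `max_steps` is a safety bound.
    """
    results = []
    step = first_step
    while step is not None and len(results) < max_steps:
        results.append(step())
        step = choose_next(results)
    return results


def make_halving_step(value):
    """Build a step that halves `value` when executed."""
    return lambda: value / 2


def choose_next(results):
    """Keep halving the last result until it is no longer above 1."""
    last = results[-1]
    return make_halving_step(last) if last > 1 else None


results = run_dynamic_sequence(make_halving_step(16), choose_next)
print(results)
```

The number of steps is not known when the sequence starts: it depends on the intermediate results, which is what distinguishes this pattern from a fixed list of stages.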

A bottom-up introduction to programming with GC3Pie

This is the course material made for the GC3Pie 2012 Training event held at the University of Zurich on October 1-2, 2012.

The presentation starts with low-level concepts (e.g., the Application and how to do manual task submission) and then gradually introduces more sophisticated tools (e.g., the SessionBasedScript and workflows).

This order of introducing concepts will likely appeal most to those already familiar with batch computing and grid computing, as it provides a direct mapping from the job submission and monitoring commands to their GC3Pie equivalents.

Introduction to GC3Pie

Introduction to the software: what GC3Pie is, what it is for, and an overview of its features for writing high-throughput computing scripts.

Basic GC3Pie programming

The Application class, the smallest building block of GC3Pie. Introduction to the concept of Job, the states of an application, and the Core class.

Application requirements

How to define extra requirements for an application, such as the minimum amount of memory it will use, the number of cores needed, or the architecture of the CPUs.

Managing applications: the SessionBasedScript class

Introduction to the SessionBasedScript, the highest-level interface for building applications with GC3Pie. Information on how to create simple scripts that take care of the execution of your applications, from submission to retrieving the final results.

The GC3Utils commands

Low-level tools to help debug the scripts.

Introduction to Workflows with GC3Pie

Using a practical example (the “Warholize” Workflow Tutorial), we show how workflows are implemented with GC3Pie. The following slides cover in more detail the individual steps needed to produce a complex workflow.

ParallelTaskCollection

Description of the ParallelTaskCollection class, used to run tasks in parallel.

StagedTaskCollection

Description of the StagedTaskCollection class, used to run a sequence of a fixed number of jobs.

SequentialTaskCollection

Description of the SequentialTaskCollection class, used to run a sequence of jobs that can be altered at runtime.

The “Warholize” Workflow Tutorial

In this tutorial we show how to use the GC3Pie libraries to build a command-line script that runs a complex workflow combining tasks that execute in parallel with tasks that execute in sequence.

The tutorial itself contains the complete source code of the application (see Literate Programming on Wikipedia), so you can test and modify it. To produce a working warholize.py script, download the pylit.py script from the PyLit Homepage and run the following command on the docs/programmers/tutorials/warholize/warholize.rst file, from within the source tree of GC3Pie:

$ ./pylit warholize.rst warholize.py

Example scripts

A collection of small example scripts highlighting different features of GC3Pie is available in the source distribution, in the examples/ folder:

gdemo_simple.py

The simplest script you can create. It uses only the Application and Engine classes to create an application, submit it, check its status, and retrieve its output.

grun.py

A SessionBasedScript that executes its argument as a command. It can also run the command multiple times by wrapping it in a ParallelTaskCollection or a SequentialTaskCollection, depending on a command-line option. Useful for testing a configured resource.

gdemo_session.py

A simple SessionBasedScript that sums two values by customizing a SequentialTaskCollection.

warholize.py

An enhanced version of the warholize script presented in the “Warholize” Workflow Tutorial.