Michael R. Crusoe, Niels Drost – Common Workflow Language Project, Netherlands eScience Center
Portable and Reproducible data analysis with the Common Workflow Language
The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. CWL is designed to meet the needs of data-intensive science, such as Bioinformatics, Medical Imaging, Astronomy, Physics, and Chemistry.
In this workshop we will get to work with CWL and, during a hands-on session, build a workflow with it. We will show how to run a workflow both locally on your laptop and in an HPC environment, without any changes to the workflow description itself.
Neil Chue Hong – Software Sustainability Institute
How successful is my software?
How can we tell if research software is on the right track? What measures are good indicators for scientific software success?
For instance, is community size a key measure? If it’s always rising does this mean the project is likely to succeed? Is code complexity overrated for research software? What does the state of practice in industry tell us? What does social buzz tell us? Is a heavily tweeted piece of software more likely to be successful? And how can we use this information to help developers adopt techniques that might lead to greater impact and success?
This interactive session will help developers and users of research software understand what techniques they can use to help make their software better and more successful, how to tell if they should make course corrections, and how to communicate software success (and failure) to their stakeholders.
Anna Brown – Oxford University
An introduction to GPU optimisation using the NVIDIA visual profiler
Graphics processing units (GPUs) are a vital HPC tool due to their high computing power and good energy efficiency compared to CPUs. However, it is not straightforward to write performant GPU code due to certain requirements of the architecture, including management of the memory hierarchy, the need for high computational throughput and grouping of thread execution. To help understand these critical optimisation opportunities, this workshop will introduce the NVIDIA Visual Profiler (NVVP) and, through it, basic optimisation strategies for GPU application developers.
This workshop is aimed at developers with beginner to intermediate CUDA experience – please note basic CUDA syntax will not be taught. The exercises will be driven by NVVP: using the information it provides, attendees will make small edits to example code in one of C, C++ or Fortran to illustrate the points above. Consider this workshop if you have written basic programs in CUDA and want a deeper understanding of your program and possibilities for its optimisation.
Jason Maassen – Netherlands eScience Center
Distributed Computing with Xenon
Many scientific applications require far more computation or data storage than can be handled on a regular PC or laptop. For such applications, access to remote storage and compute facilities is essential. Unfortunately, there is not a single standardized way to access such facilities. There are many competing protocols and tools in use by the various scientific and commercial infrastructure providers.
Xenon is a library designed to solve this problem. It offers a unified way to use many remote computation and data storage facilities, and hides the complicated protocol and tool specific details from the developer.
The tutorial will focus on using Xenon’s command line interface. In addition, we will briefly cover Xenon’s Java, Python, and REST APIs at the end of the tutorial.
Peter Hill – York Plasma Institute, University of York
100% Emacs: How To Do Everything In Emacs
As RSEs, we do a wide variety of things around text — editing code, writing proposals, configuring servers — and many of us like to tinker and customise our tools to our own preferences. Emacs is the perfect text editor for us, flexible enough to use for writing code, text documents and presentations, and powerful enough to turn into a full-blown IDE when needed. In this tutorial, you will learn how to get the most out of Emacs in your day-to-day work.
Andy Turner and Luke Mason – EPCC, The University of Edinburgh and Hartree Centre, STFC
RSE support for researchers on advanced computing facilities
In this workshop we will discuss the role of RSEs in supporting researchers using advanced computing facilities both remote (e.g. ARCHER, DiRAC, EPSRC Tier-2) and local (e.g. institutional HPC) to the RSE/researcher. This discussion should be of interest to any RSEs who support researchers on shared advanced computing facilities (e.g. HPC facilities, private/public cloud).
We will also discuss the Champions initiative (ARCHER Champions and EPSRC Tier-2 Champions): how they are working to share experience and best practice; how to get involved; and how the community can get the most out of this initiative.
Discussion will be driven by the interests of the group but examples of topics may include:
* Sharing, discovering and accessing expertise outwith your local organisation
* Sustaining support within different team sizes and for the long tail of software support
* Coordinating support and technical work across different organisations and facilities
* Accessing remote facilities to help support local researchers
* Discovering what remote facilities are available and how to get access
* Identifying key future technologies and areas of interest
There will be a short initial presentation to set the scene and introduce work that is currently ongoing in the Champions initiative in this area, before moving on to a discussion of the items prioritised by the workshop attendees.
A workshop report and blog post will be produced covering the discussion and highlighting the points of interest.
The workshop will be facilitated by representatives from ARCHER, the UK national supercomputing service, the EPSRC Tier-2 HPC facilities and current Champions.
Sergi Siso – STFC – Hartree Centre
Open source tools for the performance analysis of parallel applications.
A common requirement for research software is to maximize the achieved performance and make use of the capabilities offered by modern hardware. This is important for tackling larger problems in the science domain the developers work in and/or for reducing the amount of money spent on compute resources. However, it is often difficult to know the exact causes of suboptimal performance, or even to assess whether the obtained speeds are near the attainable peak performance. It is good practice to use tools to simplify this task and get a better understanding of the research application.
This tutorial presents two open source tools to help research software developers analyse the performance characteristics of their applications. Firstly, it shows the Linux ‘perf’ tools (available on modern Linux distributions) to obtain micro-architectural characteristics like hardware counters or memory accesses. Then, the tutorial will introduce the BSC Extrae and Paraver tools to understand the parallelism of the code using MPI and OpenMP programming models.
Both tools will be used by attendees in several hands-on exercises with pre-computed traces generated on a large HPC machine. This will allow the attendees to focus on analysing the traces and spotting potential issues.
Christopher Cave-Ayland – University of Southampton
An Introduction to Sumatra: A Package for Automated Provenance Tracking of Computational Data
A broadly experienced challenge in computational research is the capture and provision of simulation metadata. This workshop will provide an overview and hands-on introduction to Sumatra (pythonhosted.org/Sumatra/), a Python tool supporting reproducible computational research through automated provenance tracking of simulation data. Sumatra is agnostic with respect to research software and provides capabilities to track the provenance of research outputs fully: from the generation of raw simulation data, through all stages of analysis and including resulting figures and documents. As well as a standard command line interface, Sumatra supports extensive customisation and adaptability through a Python API.
Tania Allard – University of Sheffield
Jupyter Notebooks for reproducible research
In recent years, the concern for reproducibility in computational science has gained traction, leading to the development of better standards for research. These standards include treating software as a central intellectual product, adding automation to the core of the data handling and analysis pipelines as well as the open sharing of the digital objects generated.
Jupyter notebooks have become an invaluable tool for many kinds of data analysis and have become one of the preferred tools of data scientists. This is due, in part, to their high versatility with the availability of over 80 kernels which provide access to different programming languages and systems. They also provide the ability to develop notebooks to a high publication standard. Thus, Jupyter notebooks constitute a bastion for reproducible research as they not only enable the display of the final results of a scientific analysis but also allow for the presentation of the analysis pipeline used to obtain these results.
This workshop will provide an introduction to some tools and techniques that we consider essential for responsible use and development of Jupyter notebooks. The workshop will be centered around the use of Jupyter notebooks for reproducible research purposes: from automated data analysis pipelines to the generation and publication of digital objects.
Critically, we will introduce two technologies recently developed as part of the EU OpenDreamKit project, nbval and nbdime, which bring unit testing and version control to the Jupyter notebook ecosystem.
Radovan Bast – Nordic e-Infrastructure Collaboration
Mixed Martial Arts with CodeRefinery
Interfacing Fortran, C, C++, and Python for Great Good!
The programming languages Fortran, C, C++, and Python each have their strengths and weaknesses and their own fan base. This workshop is for people who would like to be able to combine these languages within one code project:
1) When writing a high-level abstraction layer or interface to “bare metal” legacy software written, for instance, in Fortran or C.
2) When writing an efficient back-end to a code mainly written in a high-level language such as Python.
3) When combining modules written in different programming languages.
4) When writing a Python interface to software written in C, C++ or Fortran to leverage the wealth of libraries available in Python.
In this hands-on tutorial we will learn and exercise how to interface Fortran, C, C++, and Python using modern tools such as iso_c_binding, Python CFFI or pybind11.
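As a rough illustration of what calling C from Python looks like, here is a minimal sketch. It is not taken from the workshop material, and it uses Python's standard-library ctypes module rather than the CFFI or pybind11 tools named above, so that it runs without extra packages. It assumes a POSIX system (Linux or macOS), and the wrapper names c_strlen and c_abs are hypothetical, chosen only for this example.

```python
import ctypes

# Assumption: a POSIX system (Linux, macOS), where CDLL(None) exposes the
# symbols already loaded into the Python process, including the C library.
libc = ctypes.CDLL(None)

# Declare the C signatures so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def c_strlen(text: bytes) -> int:
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(text)


def c_abs(value: int) -> int:
    """Return the absolute value of an integer as computed by C's abs."""
    return libc.abs(value)


if __name__ == "__main__":
    print(c_strlen(b"Fortran"))  # prints 7
    print(c_abs(-42))            # prints 42
```

Declaring argtypes and restype up front is the important step: without them ctypes falls back to default conversions, which can silently truncate or misinterpret values on some platforms.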
Matt Williams – University of Bristol
Introduction to Data Analysis in Python
The workshop will give a basic introduction to the numerical and data analysis tools available in Python, particularly focusing on Pandas for time-series analysis, NumPy for fast numerical calculations on large data sets and Matplotlib for publication-quality graphs and plots. It will also introduce the Jupyter Notebook as a way of interleaving prose with your code as a form of ‘literate programming’.