DAFNI: Using Agile to create a Large Scale Research Facility
James Hannah, Mostafa Nabi, Samuel Chorlton
The Data and Analytics Facility for National Infrastructure (DAFNI) is an £8M research initiative that aims to provide a step-change for national infrastructure research. The Science and Technology Facilities Council (STFC) aims to bring researchers together on a platform where they can run their own models faster, and with simpler mechanisms, than is possible at their own institutions. To increase opportunities for collaboration and innovation, researchers will be able to share with, and access, a central model/data repository.
DAFNI is using agile development to ensure software quality and track the progress of the project, alongside modern software engineering processes to ensure robustness of delivery. Agile is a relatively new approach for research organisations managing large software projects and as such we have encountered challenges interoperating with existing management processes.
In the talk we’ll describe our experiences in using Agile methodologies, the software topology and the delivery toolkit we've chosen to help us on this journey (Atlassian, GitLab, VSCode, etc.). We'll also talk about our ideas on how to manage a large project such as this, and how we've performed so far through the requirements and design phases.
Developing Software for the Statistical Genetics Research Community
Technology for gathering human DNA data and other biological data is advancing quickly, producing an ever-increasing amount of data. This brings new opportunities to discover relationships between genetics, biological functions, human characteristics and disease, which may lead to other applications such as drug development. The amount of data can be enormous and very rich in structure, requiring new statistical analysis methods and software to exploit the data as fully as possible. This requires collaboration between statisticians and software engineers. The statistician must understand the practicalities of analysing the data, such as processing speed, memory requirements and data anomalies. The software engineer must understand the statistical tests and how to implement them correctly, as well as identifying issues and suggesting adaptations to the statistical methods where necessary. I present a summary of the software I have developed while taking on the roles of both statistician and software engineer. All the software is freely available and open source, and it is aimed at researchers who wish to analyse their genetic data but may not be experts in statistics, genetics or software development.
Testing and Auto-Tuning GPU code with Kernel Tuner
Ben van Werkhoven
(Netherlands eScience Center)
To create highly efficient GPU programs you have to select exactly the right code parameters, such as the number of threads per block, the amount of work per thread, how much shared memory to use, and many more. Finding the right values for all of these by hand is very labor-intensive. Auto-tuning automates the process of searching for the best-performing combination of all these values, and is one of the few ways to achieve performance portability. Auto-tuning has yet to become mainstream. Codes that do use auto-tuning often involve application-specific code generators and custom code for benchmarking all possible configurations.
With Kernel Tuner, programmers can create simple Python scripts that specify where the code is, how it should be called, and which tunable parameters the code supports. Kernel Tuner automates the search for the best-performing kernel configuration, with support for many search optimization methods. Kernel Tuner also lets you write tests for GPU kernels in Python through a high-level, user-friendly interface.
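The exhaustive search that auto-tuning automates can be sketched in plain Python. This is not Kernel Tuner's API: `fake_benchmark` is a hypothetical cost model standing in for compiling and timing a real GPU kernel, and the parameter names `block_size_x` and `work_per_thread` are illustrative assumptions.

```python
import itertools


def fake_benchmark(config):
    """Hypothetical cost model standing in for compiling and timing a kernel."""
    threads = config["block_size_x"]
    work = config["work_per_thread"]
    # Assumed shape: a penalty for moving away from 128 threads and 4 items per thread.
    return 1.0 + abs(threads - 128) / 128 + abs(work - 4) / 4


def tune(tune_params, benchmark):
    """Benchmark every combination of parameter values and return the fastest."""
    names = list(tune_params)
    best_config, best_time = None, float("inf")
    for values in itertools.product(*(tune_params[name] for name in names)):
        config = dict(zip(names, values))
        elapsed = benchmark(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


# The search space: every listed value of every tunable parameter.
tune_params = {
    "block_size_x": [32, 64, 128, 256, 512],
    "work_per_thread": [1, 2, 4, 8],
}
best_config, best_time = tune(tune_params, fake_benchmark)
print(best_config, best_time)
```

A real tuner replaces the cost model with measured kernel run times and, for large search spaces, replaces the exhaustive loop with a search optimization method.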
Kernel Tuner is developed as open source. Our GitHub repository has extensive documentation, many examples, and tutorials.
The FAIR data principles and their practical implementation in InterMine
Daniela Butano, Justin Clark-Casey
(University of Cambridge)
Science is generating ever more data, faster than ever before. Reliably storing and retrieving this data isn’t enough – a great portion of its value comes from integration between datasets produced by different institutions at varying times and places. The required scale and automation of integration is increasing, in part driven by new applications in fields such as machine learning. Yet datasets are often spread out over a long tail of repositories, with inconsistent description, structure and APIs.
These are the issues that the FAIR principles address. They aim to guide system design that makes data consistently Findable, Accessible, Interoperable and Reusable. The principles are gaining traction, especially in the life sciences where they have been embedded into funding programmes such as the European Open Science Cloud and the NIH Data Commons.
In this talk, we’ll introduce the FAIR principles and talk about their practical implementation and associated issues, as illustrated by our work with the InterMine life sciences data integration platform. We’ll cover topics such as the design of persistent data entity URIs, standards for embedding data descriptions into web pages, and semantic web Linked Data support.
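One widely used way to embed a data description in a web page is schema.org markup serialised as JSON-LD inside a `script` tag. The sketch below assumes that approach. The URI, the field values and the helper names are hypothetical and are not taken from InterMine.

```python
import json


def dataset_jsonld(uri, name, description):
    """Build a schema.org Dataset description keyed on a persistent URI."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": uri,
        "identifier": uri,
        "name": name,
        "description": description,
    }


def embed_in_html(metadata):
    """Wrap the JSON-LD in the script tag used for embedded structured data."""
    payload = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


record = dataset_jsonld(
    uri="https://example.org/dataset/12345",  # hypothetical persistent URI
    name="Example gene expression set",
    description="Illustrative record only.",
)
html_snippet = embed_in_html(record)
print(html_snippet)
```

Using the same persistent URI for `@id` and `identifier` means that a crawler reading the page and a client resolving the URI refer to the same entity.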
Parsl: Scripting Scalable Workflows in Python
Yadu Babuji, Kyle Chard, Ian Foster, Daniel S. Katz, Mike Wilde, Anna E. Woodard, Justin M. Wozniak
(University of Illinois Urbana-Champaign, University of Chicago, Argonne National Laboratory)
Many researchers use Python for interactive data science, machine learning, and online computing. However, computations that are simple to perform at small scales (e.g., on a laptop) can easily become prohibitively difficult as data sizes and analysis complexity grow. For example, efficient interactive analysis at scale can require real-time management of parallel and/or cloud computing resources, orchestration of remote task execution, and data staging across wide area networks. In this talk we introduce Parsl (Parallel Scripting Library), a pure Python library for orchestrating the concurrent and parallel execution of interactive and many-task workloads, and demonstrate how it integrates with the scientific Python ecosystem and how it is being used in a variety of scientific domains.
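The many-task pattern described here (submit independent tasks, receive futures, combine the results once they resolve) can be sketched with the standard library alone. This is a stand-in that uses `concurrent.futures`, not Parsl itself. The `simulate` and `combine` functions are hypothetical tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical independent task: a small deterministic computation."""
    return sum((seed * k) % 7 for k in range(1000))


def combine(values):
    """Hypothetical reduction step that depends on all earlier tasks."""
    return sum(values)


with ThreadPoolExecutor(max_workers=4) as pool:
    # Each submit returns immediately with a future; the tasks run concurrently.
    futures = [pool.submit(simulate, seed) for seed in range(8)]
    # Calling result() blocks until that task has finished.
    partial = [future.result() for future in futures]
    total = pool.submit(combine, partial).result()

print(total)
```

A workflow library adds what this sketch leaves out: running the tasks on remote or cloud resources and staging the data they need.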
Defining Policies to Turn a Team and Project Around
Jason M. Gates
(Sandia National Laboratories)
What kind of workflow should we use: master/develop, feature branches, or some combination of the two? How should we document our code: is Doxygen the best option, or should we investigate Sphinx? What about a coding style guide: CamelCase, pot_hole_case, mixedCase? How can one determine what's best when there are so many options to choose from? Unfortunately, in the realm of software engineering best practices there are no right answers to these questions. While there are some general guidelines (use some sort of version control, do some sort of testing, do some sort of documentation, use some sort of issue tracker), the only real "best" practice is that a team must clearly define its policies, whatever they may be, and then stick to them. This talk will follow one team's efforts to do just that, while coordinating multiple repositories and a distributed development team composed of subteams, and the impact it has had on the last year of software development. Don't let the plethora of options leave you stymied in indecision. It's easy to think, "We'll get to that when the code is more mature." Make choices today that will get your project and team headed in the right direction.
Where's The Friendly Manual? Experiences Implementing Software Sustainability
I just landed responsibility for this code – now what? The purpose of this talk is to offer advice and share my experiences taking over an established software project which had been run by a single developer for many years. I used the Criteria-based Software Evaluation tool from the Software Sustainability Institute to evaluate the project and improve the sustainability of both the software and the project.
This talk should give the audience an overview of the criteria for judging the sustainability of their own software projects, and an introduction to a number of tools that can help to put the best practices into place.
Supporting the community of GAP developers and package authors
(University of St Andrews)
GAP (https://www.gap-system.org/) is an open source software system for discrete computational mathematics. Its development started in 1986 at LDFM Aachen, and it has since become an international project with four official centres in St Andrews, Aachen, Braunschweig and Fort Collins, and other contributors worldwide. GAP consists of a kernel written in C, a library written in the GAP language that implements a number of mathematical algorithms, and a large collection of data libraries. GAP is redistributed with user-contributed extensions, called GAP packages, which have independent authorship and may be submitted for formal refereeing.
In my talk I will look into the challenges of community building for GAP: bringing together mathematicians and computer scientists, developers of the core system and package authors, version control experts and absolute beginners, and working towards culture changes in this community.
Refactoring and development of tools for the Surface Site Interaction Point approach to solvation
M. D. Driver, M. J. Williamson, N. De Mitri, T. Nikolov, C. A. Hunter
(Department of Chemistry, University of Cambridge)
Understanding solvation is important in many areas from drug design to shampoos. Determining solubility is generally based on an experimental approach, but as computational power increases, theoretical approaches to predict these properties are becoming more commonplace and are maturing in their ability and predictive quality.
The Surface Site Interaction Point (SSIP) approach uses critical points on the molecular electrostatic potential surface (MEPS) of a molecule to define a collection of SSIPs. One application of this model is the generation of a Functional Group Interaction Profile (FGIP) for a solvent. The FGIP shows graphically how solute-solute interactions change within a solvent.
The original computational workflow associated with the calculation was fragile and linguistically diverse. Refactoring this legacy code has provided a solid foundation for the novel functionality discussed here. Concurrent development by multiple members of the group has been possible using revision control, with unit testing frameworks and continuous integration ensuring the reliability of the new code.
Interoperable software for reproducible research
(The University of Bristol)
In most research communities there is not a single, unified software framework. Instead, researchers are presented with a collection of competing packages from which they pick and choose the functionality that they desire. Interoperability between packages is often poor, with incompatibilities between file formats, hardware, etc. This leads to brittle workflows, poor reproducibility, and lock-in to specific packages.
For the biomolecular simulation community, our solution has been the introduction of an interoperable framework that collects together the core functionality of many packages and exposes it through a simple Python API. By choosing not to reinvent the wheel, we can take advantage of all the fantastic software that has already been written, and can easily plug into new software packages as they appear.
Our software can convert between many common molecular file formats and automatically find the packages available within the environment in which it is run. I will show how this allows the user to write portable workflow components that can be run with different input, in different environments, and in completely different ways, e.g. from the command line, in a Jupyter notebook, or in KNIME.
Implementing a multi-core algorithm on a massively parallel low-power system
Andrew Gait, Andrew Rowley, Juan Yan, Michael Hopkins
(University of Manchester)
SpiNNaker is a massively parallel, low-power supercomputer designed to model large spiking neural networks in real time. It consists of roughly 1 million ARM cores, each with a small instruction memory (ITCM) limit of 32K. However, applications for the machine are not limited to neuromorphic models. For example, code that implements Markov Chain Monte Carlo (MCMC) models has been designed and, in most useful cases, found to exceed the ITCM limit of a single core by a wide margin. This has necessitated using multiple cores, each performing a different part of the algorithm, with (short) messages communicated and data shared between a subset of cores in order to perform the calculations required at each timestep. Results presented here compare two implementations of an auto-regressive (AR), moving average (MA) model using MCMC that simulates wind power: one on SpiNNaker (Python host code / C machine code) and one on a host machine running MATLAB.
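For readers unfamiliar with MCMC, the sampling loop at the heart of such a model can be sketched as a random-walk Metropolis sampler. This is not the SpiNNaker or MATLAB code. It is simplified to a single AR(1) coefficient, and it assumes unit noise variance, a flat prior on (-1, 1), and synthetic data generated inside the script.

```python
import math
import random


def simulate_ar1(phi, n, rng):
    """Generate n samples of x[t] = phi * x[t-1] + noise with unit-variance noise."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def log_likelihood(phi, x):
    """Gaussian log-likelihood of the AR(1) residuals, up to an additive constant."""
    return -0.5 * sum((x[t] - phi * x[t - 1]) ** 2 for t in range(1, len(x)))


def metropolis(x, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for phi with a flat prior on (-1, 1)."""
    phi = 0.0
    current = log_likelihood(phi, x)
    samples = []
    for _ in range(n_steps):
        proposal = phi + rng.gauss(0.0, step_size)
        if -1.0 < proposal < 1.0:  # proposals outside the prior support are rejected
            candidate = log_likelihood(proposal, x)
            # Accept with probability min(1, exp(candidate - current)).
            if math.log(1.0 - rng.random()) < candidate - current:
                phi, current = proposal, candidate
        samples.append(phi)
    return samples


rng = random.Random(0)
data = simulate_ar1(phi=0.6, n=500, rng=rng)  # synthetic data, true phi = 0.6
samples = metropolis(data, n_steps=4000, step_size=0.05, rng=rng)
kept = samples[1000:]  # discard the first 1000 samples as burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

Each iteration needs the previous state and a full pass over the data, which is why splitting the algorithm across cores requires the short messages and shared data described above.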
Leveraging graph databases for big data management
Mike Gavrielides, Wei Xing
(The Francis Crick Institute)
It is a big challenge for researchers to retrieve the most valuable and relevant datasets from big, heterogeneous data systems to answer a particular scientific question. In this talk, we present a data network solution for searching, finding and exploring large numbers of datasets by research topic. In particular, we will demonstrate how, using our data network technology and graph databases, scientists can share and access distributed big data by masking out and automating interactions with computing infrastructures such as hybrid clouds, large-scale distributed storage systems, on-premise HPC clusters, and various databases. We have designed and implemented a Big Omics Data Tracker (BODE) system using data network technology and the Neo4j graph database. Employing graph database technology, we can build various omics data networks to provide a 360-degree view of a data entity based on its biological meaning and computational features.
Implementation and performance of an AVX512 extension to OpenQCD
(Swansea Academy of Advanced Computing)
I introduce a recent extension of OpenQCD, an open source simulation program for lattice field theories, to the AVX512 instruction set. The goal of the extension is to increase the performance of the code on modern HPC systems with Intel processors. Success is limited by the memory-bound nature of the problem in question, but a significant improvement can still be obtained. I describe the implementation of the extension and its performance in different data and parameter regimes.
RSE Group Evidence Bank
Andy Turner, Scott Henwood, Sandra Gesing, Jan Dietrich, Mark Turner, Simon Hettrick
(EPCC, The University of Edinburgh, UK; CANARIE, Canada; University of Notre Dame, USA; Potsdam Institute for Climate Impact Research, Germany; University of Newcastle, UK; University of Southampton, UK)
More and more organisations are keen to set up RSE groups to pool expertise and knowledge. In order to set up such groups, people often need access to evidence (both qualitative and quantitative) to assist in writing business cases and funding requests. At the RSE International Leaders meeting at the ATI in Jan 2018 a working group was set up to investigate how to gather and tag useful resources to assist people trying to start RSE groups. This poster summarises progress with this RSE group Evidence Bank to date and showcases the type of resources available.
The Evidence Bank is envisioned as a community-driven effort, and existing RSE groups as well as any RSE or researcher can contribute material to make it as useful as possible. The poster will provide information on how you can add resources from your RSE group to help the community move forwards; we welcome input from RSE groups internationally. The Evidence Bank will also help existing RSE groups get ideas for how they can measure and demonstrate impact going forwards. We are keen to discuss with existing RSE groups how we can make the Evidence Bank as useful as possible for them. Please do come and speak to us about any of these topics!
Improving Performance and Scalability of NISMOD: National Infrastructure Systems Model
(University of Oxford)
The National Infrastructure Systems Model (NISMOD) is the first national system-of-systems long-term planning tool for infrastructure systems, including energy, transport, water, waste and digital communications. The tool has reached a stage of maturity to attract a diverse, expanding user group from researchers to government agencies. This user group is utilizing NISMOD's modelling capabilities to answer real-world questions for policy makers around planning the future of the UK’s national infrastructure.
With this ever-growing user group have come increased demands on model runtime and a need to understand the sensitivity of model results to the assumptions around key model parameters. To meet this demand, the ITRC-Mistral development team has collaborated with the Science and Technology Facilities Council to make use of their computing facility at Harwell, the Data and Analytics Facility for National Infrastructure (DAFNI).
The team developed functionality to enable NISMOD to farm out multiple model runs on the DAFNI computing cluster. This new functionality has enabled thousands of different combinations of model parameter values to be run against NISMOD to examine the sensitivity of model results to parameter uncertainty.
Time integration support for FEniCS through CVODE
(University of Cambridge)
This project builds on FEniCS, a widely used programming environment that can discretise many different types of partial differential equations. Many parabolic or hyperbolic PDEs need specialised time-stepping routines. These exist in various libraries (PETSc TS, SUNDIALS CVODE) but were not directly interfaced with FEniCS. A direct interface to CVODE was developed, making better time integration methods available. This enabled the development of open source micromagnetic codes on ARCHER, as the micromagnetics community is interested in access to this type of resource.
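To illustrate why parabolic PDEs call for implicit time-stepping, the sketch below advances the 1D heat equation with backward Euler, the first-order member of the BDF family of methods that CVODE provides in variable-order, variable-step form. It uses neither FEniCS nor CVODE. The finite-difference grid, the step sizes and the function names are assumptions made for the example.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0], upper[-1] unused)."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def backward_euler_heat(u0, dx, dt, n_steps):
    """Advance u_t = u_xx with zero Dirichlet boundaries using backward Euler."""
    n = len(u0)
    r = dt / dx**2
    lower = [-r] * n
    diag = [1.0 + 2.0 * r] * n
    upper = [-r] * n
    u = list(u0)
    for _ in range(n_steps):
        # Solve (I - dt * A) u_new = u_old, where A is the discrete Laplacian.
        u = solve_tridiagonal(lower, diag, upper, u)
    return u


n_interior = 49
dx = 1.0 / (n_interior + 1)
grid = [(i + 1) * dx for i in range(n_interior)]
u0 = [math.sin(math.pi * xi) for xi in grid]  # initial condition sin(pi * x)
dt, n_steps = 0.001, 100
u = backward_euler_heat(u0, dx, dt, n_steps)
# For this initial condition the exact peak decays as exp(-pi^2 * t).
exact_peak = math.exp(-math.pi**2 * dt * n_steps)
print(max(u), exact_peak)
```

With these settings dt/dx² is 2.5, well above the 0.5 stability limit of forward Euler for this discretisation, while the implicit step remains stable. That stability on stiff problems is the main reason to use an implicit integrator.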
FAIRsharing – making standards, data repositories and policies FAIR
Milo Thurston, Peter McQuilton, Massimiliano Izzo, Allyson L. Lister, Alejandra Gonzalez-Beltran, Philippe Rocca-Serra, Susanna-Assunta Sansone
(Oxford e-Research Centre)
FAIRsharing (https://www.fairsharing.org) is a cross-discipline, manually curated, searchable portal of three linked registries, covering standards, databases and data policies. Launched in 2011, FAIRsharing began life as BioSharing, mapping resources in the life, biomedical and environmental sciences, before expanding to cover the natural sciences and beyond. Every record is designed to be interlinked, providing a detailed description not only of the resource itself, but also of its relationships with other life science infrastructures. FAIRsharing maps the landscape and serves it up in a structured, human- and machine-readable fashion, through a web front-end, API and embeddable visualisation tools. We are implementing a custom application ontology which, combined with search tools, will allow users to easily identify those resources relevant to their research.
FAIRsharing is part of ELIXIR, the pan-European infrastructure programme, and works with a number of funders, journal publishers, data managers and researchers to ensure databases, standards and policies are Findable, Accessible, Interoperable and Re-usable, through the promotion of the FAIR principles.
TeSS – the ELIXIR Training portal
Niall Beard, Finn Bacall, Milo Thurston, Aleksandra Nenadic, Susanna-Assunta Sansone, Carole A. Goble, Teresa K. Attwood
(Oxford e-Research Centre)
Across Europe, training materials and events to help researchers in the use of new tools and technologies are freely available, yet information about such training resources is dispersed across the Internet.
TeSS, ELIXIR’s Training Portal, is attempting to address this problem by automatically aggregating training materials and events from various sources, e.g., e-learning resources, online tutorials, slides, recorded lectures, MOOCs, workshops, courses, summer schools, conferences, webinars, etc. TeSS automatically aggregates metadata about such resources and opportunities from 31 content providers with a further 16 registering manually, and presents these within a searchable portal, linked to their original sources. Subscription services allow users to keep abreast of upcoming opportunities, via e-mail notifications or import to their preferred calendar application.
To facilitate the aggregation of content, TeSS has been advocating, and helping to construct, Bioschemas training specifications. These provide lightweight, interoperable specifications to help structure website data in meaningful ways.
The Tensor Network Theory Library: Exponentially Faster Algorithms for Solving Partial Differential Equations
(Department of Physics, University of Oxford)
The Tensor Network Theory Library is an open source software library that provides a large set of highly optimized algorithms from tensor network theory with a user-friendly interface. This software library is available online.
Tensor network theory is a mathematical framework that has proven very powerful for the numerical analysis of quantum many-body systems, and in this context can be exponentially faster than other numerical approaches. This has been known for many years. Only recently was it discovered that tensor network theory can be applied to general partial differential equations and that similar speedups can be obtained in this context. In my talk, I will present our recent work [1], in which we show how multigrid methods can be exponentially faster when they are combined with tensor network theory. I will demonstrate this exponential speedup for linear Poisson and nonlinear Schrödinger equations.
[1] M. Lubasch, P. Moinier, and D. Jaksch, "Multigrid Renormalization", arXiv:1802.07259 (2018).