Workshop Programme


10:00 - 10:30 Welcome with coffee



10:30 - 11:00 Ben van Werkhoven (eScience Center) - Test-driven development of auto-tuned GPU code with Kernel Tuner

Abstract
This presentation will focus on the practical aspects of developing well-tested GPU code. We'll explain how you can use Kernel Tuner to develop tests for your GPU code using simple Python scripts, and extend that to output verification during tuning. When working with tunable GPU code you actually maintain many different versions of the same code in a single source file. However, subtle interactions between different code optimizations could lead to bugs that may appear only for certain combinations of parameter values. Output verification during tuning allows programmers to detect bugs when introducing new parameterized code optimizations to the kernel.

11:00 - 11:30 Marieke Huisman (University of Twente) - Verification of GPU kernels with VerCors

Abstract
VerCors is a tool set for practical mechanised verification of parallel and concurrent software under different concurrency models, notably heterogeneous and homogeneous threading. VerCors has verification support for multiple languages with concurrency features, including Java, OpenCL, and OpenMP for C (i.e., compiler directives). The VerCors tool set uses separation logic with permission accounting as its logical foundation. In this presentation I will discuss the VerCors tool set and its verification capabilities. I will discuss in particular how GPU kernels with barriers and atomics can be verified with VerCors.



11:30 - 12:00 Alastair Donaldson (Imperial College London) - Exposing Errors Related to Weak Memory in GPU Applications

Abstract
In this presentation, I will discuss the systematic design of a testing environment that uses stressing and fuzzing to reveal errors in GPU applications that arise due to weak memory effects. This approach is evaluated across several CUDA applications that use fine-grained concurrency, on seven GPUs spanning three Nvidia architectures. The results show that applications that rarely, or never, exhibit errors related to weak memory when executed natively can readily exhibit these errors when executed in the testing environment. The testing environment also provides a means to identify the root causes of erroneous weak effects, and automatically suggests how to insert fences that experimentally eliminate these errors. This empirical fence insertion method carries significantly lower overhead, in terms of execution time and energy consumption, than a more conservative, guaranteed-sound approach. Joint work with Alastair Donaldson.





12:00 - 14:00 Lunch



14:00 - 14:30 Dragan Bosnacki (Eindhoven University of Technology) - Probabilistic Model Checking on GPUs

Abstract
Probabilistic Model Checking (PMC) is a formal technique for establishing with mathematical certainty if a given model satisfies a given property with some predefined probability. The application of PMC range from proving correctness of probabilistic programs and randomized algorithms to analyzing stochastic phenomena in the living organisms. The algorithms behind PMC have a significant numerical component which boil down to linear algebraic operations, like sparse matrix-vector multiplications. Capitalizing on this fact, the main probabilistic algorithms can be efficiently parallelized on GUPs. The prototype implementations of these algorithms in the tool PRISM show a speedup of at least one order of magnitude compared to their sequential counterparts. In the talk we revisit these results and discuss some future directions.



14:30 - 15:00 Qixia Yuan (University of Luxembourg) - GPU-accelerated Steady-state Computation of Large Probabilistic Boolean Networks

Abstract
Computation of steady-state probabilities is an important aspect of analysing biological systems modelled as probabilistic Boolean networks (PBNs). For small PBNs, efficient numerical methods can be successfully applied to perform the computation with the use of Markov chain state transition matrix underlying the studied networks.However, for large PBNs, numerical methods suffer from the state-space explosion problem since the state-space size is exponential in the number of nodes in a~PBN. In fact, the use of statistical methods and Monte Carlo methods remain the only feasible approach to address the problem for large PBNs. Such methods usually rely on long simulations of a PBN. Since slow simulation can impede the analysis, the efficiency of the simulation procedure becomes critical.Intuitively, parallelising the simulation process can be an ideal way to accelerate the computation. Recent developments of general purpose graphics processing units (GPUs) provide possibilities to massively parallelise the simulation process. In this work, we propose a trajectory-level parallelisation framework to accelerate the computation of steady-state probabilities in large PBNs with the use of GPUs. To maximise the computation efficiency on a GPU, we develop a dynamical data arrangement mechanism for handling different size PBNs with a GPU, and a specific way of storing predictor functions of a PBN and the state of the PBN in the GPU memory. Experimental results show that our GPU-based parallelisation gains a speedup of approximately 400 times for a real-life PBN.



15:00 - 15:30 Coffee



15:30 - 16:00 Thomas Neele (Eindhoven University of Technology) - Optimizations in GPU model checking

Abstract:
Model checking is an automated technique for verifying the correctness of software by exploring its complete state space. Applying the parallel power of GPUs to this setting is a challenging task. Over the past years, we have developed the tool GPUexplore, which can offer significant speed-ups over CPU-based implementations. In this talk, I will introduce the tool and explain what we have learned during its development. The optimizations we applied were key to achieving a speed-up of two orders of magnitude.



16:00 - 16:30 Anton Wijs (Eindhoven University of Technology) - Efficient GPU algorithms for parallel decomposition of graphs into strongly connected and maximal end components

Abstract:
I will present parallel algorithms for component decomposition of graph structures on general purpose graphics processing units (GPUs). In particular, I will consider the problem of decomposing sparse graphs into strongly connected components, and decomposing graphs induced by stochastic games (such as Markov decision processes) into maximal end components. These problems are key ingredients of many (probabilistic) model-checking algorithms. I will explain the main rationales behind our GPU algorithms, and show a significant speed-up over the sequential (as well as existing parallel) counterparts in several case studies.



16:30 - 17:10 Brainstorming session: A Dutch GPU Community

17:10 Drinks, Snacks, & Networking