(HEP) Software is painful
As a C++/Python developer and former software architect of one of the four LHC experiments, I can tell you from vivid experience that software is painful to develop. One has to tame deep and complex software stacks with huge dependency lists. Each dependency comes with its own way to be configured, built and installed. Each dependency comes with its own dependencies. When you start working with one of these software stacks, installing it on your own machine is no walk in the park, even for experienced developers. These software stacks are real snowflakes: each needs its unique cocktail of dependencies, with the right version, compiler toolchain and OS, usually tightly integrated on a single development platform.
Granted, the de facto standardization on docker did help with some of these aspects, allowing projects to cleanly encapsulate their list of dependencies in a container, in a reproducible way.
Alas, this makes code easier to deploy but less portable: everything is linux/amd64 plus some arbitrary Linux distribution.
In HENP, with C++ now being the lingua franca for everything related to frameworks or infrastructure, we get unwieldy compilation times and thus a very unpleasant edit-compile-run development cycle.
C++ is a very complex language to learn, read and write, with each new revision more complex than the previous one, so it is becoming harder to bring new people on board with existing C++ projects that have accumulated a lot of technical debt over the years: there are many layers of accumulated cruft, different styles, different ways to do things, etc.
Also, HENP projects heavily rely on shared libraries: not because of security, not because they are faster at runtime (they are not), but because C++ is so slow to compile that it is more convenient not to recompile everything into a static binary.
And thus, we have to devise sophisticated deployment scenarios to deal with all these shared libraries, properly configuring -rpath, adding yet another moving piece in the machinery.
We did not have to do that in the FORTRAN days: we were building static binaries.
From a user perspective, HENP software is also - even more so - painful. One needs to deal with:
- overly complicated Object Oriented systems,
- overly complicated inheritance hierarchies,
- overly complicated template meta-programming,
and, of course, dependencies.
It’s 2018 and there is still no simple way to handle dependencies, nor a standard one that would work across operating systems, experiments or analysis groups, when one lives in a C++ world.
Finally, there is no standard way to retrieve documentation - and here we are just talking about APIs - nor a system that works across projects and across dependencies.
All of these issues might explain why many physicists are migrating to Python.
The ecosystem is much more integrated and standardized with regard to installation procedures, serving, fetching and describing dependencies and documentation tools.
Python is also simpler to learn, teach, write and read than C++.
But it is also slower.
Most physicists and analysts are willing to pay that price, trading reduced runtime efficiency for a wealth of scientific, turn-key pure-Python tools and libraries.
Other physicists strike a different compromise and are willing to trade the relatively seamless installation procedures of pure-Python software for some runtime efficiency by wrapping C/C++ libraries.
Python and C++ are no panacea when you take into account the vast diversity of programming skills in HENP, the distributed nature of scientific code development in HENP, the many different team sizes and the constraints coming from the development of scientific analyses (agility, fast edit-compile-run cycles, reproducibility, deployment, portability, …).
To add insult to injury, these languages are rather ill-equipped to cope with distributed and parallel programming: either because of a technical limitation (CPython’s Global Interpreter Lock) or because the current toolbox is too low-level or error-prone.
Are we really left with either:
- a language that is relatively fast to develop with, but slow at runtime, or
- a language that is painful to develop with but fast at runtime?
Mending software with Go
Of course, I think Go can greatly help with the general situation of software in HENP. It is not a magic wand: you still have to think and put in the work. But it is a definite, positive improvement.
Go was created to tackle all the challenges that C++ and Python couldn’t overcome.
Go was designed for “programming in the large”.
Go was designed to thrive at scale: software development at Google-like scale but also at 2-3 people scale.
But, most importantly, Go wasn’t designed to be a good programming language, it was designed for software engineering:
Software engineering is what happens to programming when you add time and other programmers.
Go is a simple language - not a simplistic language - so one can easily learn most of it in a couple of days and be proficient with it in a few weeks.
Go has built-in tools for concurrency (the famed channels) and that is what made me try it initially.
But I stayed with Go for everything else, i.e. the tooling that enables:
- code refactoring with
- code maintenance with
- code discoverability and completion with
- local documentation (go doc) and across projects (godoc.org),
- integrated, simple build system (go build) that handles dependencies (go get), without messing around with pom.xml build files: all the needed information is in the source files,
- easiest cross-compiling toolchain to date.
And all these tools are usable from every single editor or IDE.
Go compiles optimized code really quickly.
So much so that the go run foo.go command, which compiles a complete program and executes it on the fly, feels like running python foo.py, but with built-in concurrency and better runtime performance (CPU and memory).
Go produces static binaries that usually do not even require
One can take a binary compiled for linux/amd64, copy it onto a CentOS-7 machine or a Debian-8 one, and it will happily perform the requested task.
As a Gedankenexperiment, take a standard docker image from docker-hub and imagine having to build your entire experiment software stack, from the exact gcc version down to the last wagon of your analysis train.
- How much time would it take?
- How much effort of tracking dependencies and ensuring internal consistency would it take?
- How much effort would it be to deploy the binary results on another machine? on another non-Linux machine?
Now consider this script:
    #!/bin/bash

    yum install -y git mercurial curl

    mkdir /build
    cd /build

    ## install the Go toolchain
    curl -O -L https://golang.org/dl/go1.10.3.linux-amd64.tar.gz
    tar zxf go1.10.3.linux-amd64.tar.gz
    export GOROOT=`pwd`/go
    export GOPATH=/go
    export PATH=$GOPATH/bin:$GOROOT/bin:$PATH

    ## install Go-HEP and its dependencies
    go get -v go-hep.org/x/hep/...
Running this script inside said container yields:
    $> time ./install.sh
    [...]
    go-hep.org/x/hep/xrootd/cmd/xrd-ls
    go-hep.org/x/hep/xrootd/server
    go-hep.org/x/hep/xrootd/cmd/xrd-srv

    real 2m30.389s
    user 1m09.034s
    sys  0m14.015s
In less than 3 minutes, we have built a container with (almost) all the tools to perform a HENP analysis. The bulk of these 3 minutes is spent cloning repositories.
    $> GOOS=windows \
       go build go-hep.org/x/hep/rootio/cmd/root-dump
    $> file root-dump.exe
    root-dump.exe: PE32+ executable (console) x86-64 (stripped to external PDB), for MS Windows

    ## now, for windows-32b
    $> GOARCH=386 GOOS=windows \
       go build go-hep.org/x/hep/rootio/cmd/root-dump
    $> file root-dump.exe
    root-dump.exe: PE32 executable (console) Intel 80386 (stripped to external PDB), for MS Windows
Fun fact: Go-HEP was supporting Windows users wanting to read ROOT-6 files before ROOT itself (ROOT-6 support for Windows landed with
Go & Science
Most of the needed scientific tools are available in Go at gonum.org:
- network graphs,
- statistical analysis,
- linear algebra,
- numerical differentiation,
- probability functions (univariate and multivariate),
- discrete Fourier transforms
Gonum is almost at feature parity with the
Gonum is still missing some tools, like ODE solvers or more interpolation tools, but the gap is closing.
Right now, in a HENP context, it is not possible to perform an analysis in Go and insert it in an already existing C++/Python pipeline. At least not easily: while reading is possible, Go-HEP is still missing the ability to write ROOT files. This restriction should be lifted before the end of 2018.
That said, Go can already be quite useful and usable, now, in science and HENP, for data acquisition, monitoring, cloud computing, control frameworks and some physics analyses. Indeed, Go-HEP provides HEP-oriented tools such as histograms and n-tuples, Lorentz vectors, fitting, interoperability with HepMC and other Monte-Carlo programs (HepPDT, LHEF, SLHA), a toolkit for a fast detector simulation à la Delphes and libraries to interact with ROOT and XRootD.
I think building the missing scientific libraries in Go is a better investment than trying to fix the C++/Python languages and ecosystems.
Go is a better trade-off for software engineering and for science.
PS: There’s a nice discussion about this post on the Go-HEP forum.