The open subsurface stack

Two observations:

  1. Agile has been writing about open source software for geology and geophysics for several years now (for example here in 2011 and here in 2016). Progress is slow. There are lots of useful tools, but lots of gaps too. Some new tools have appeared, others have died. Conclusion: a robust and trusted open stack is not going to magically appear.

  2. People, some of them representing large corporations, are talking more than ever about industry collaboration. Open data platforms are appearing all over the place. And several times at the DigEx conference in Oslo last week I heard people talk about open source and open APIs. Some organizations, notably Equinor, seem to really mean business. Conclusion: there seems to be a renewed appetite for open source subsurface software.

A quick reminder of what ‘open’ means; paraphrasing The Open Definition and The Open Source Definition in a sentence:

Open data, content and code can be freely used, modified, and shared by anyone for any purpose.

The word ‘open’ is being punted around quite a bit recently, but you have to read the small print in our business. Just as OpenWorks is not ‘open’ by the definition above, neither is OpenSpirit (remember that?), nor the Open Earth Community. (I’m not trying to pick on Halliburton but the company does seem drawn to the word, despite clearly not quite understanding it.)

The conditions are perfect

Earlier I said that a robust and trusted ‘stack’ (a collection of software that, ideally, does all the things we need) is not going to magically appear. What do I mean by ‘robust and trusted’? It goes far beyond ‘just code’ — writing code is the easy bit. It means thoroughly tested, carefully documented, supported, and maintained. All that stuff takes work, and work takes people and time. And people and time mean money.

Two more observations:

  1. Agile has been teaching geocomputing like crazy — 377 people in the last year. In our class, the participants install a lot of Python libraries, including a few from the open subsurface stack: segyio, lasio, welly, and bruges. Conclusion: a proto-stack exists already, hundreds of users exist already, and some training and support exist already.

  2. The Software Underground has over 1200 members (you should sign up, it’s free!). That’s a lot of people that care passionately about computers and rocks. The Python and machine learning communities are especially active. Conclusion: we have a community of talented scientists and developers that want to get good science done.

So what’s missing? What’s stopping us from taking open source subsurface tech to the next level?

Nothing!

Nothing is stopping us. And I’ve reached the conclusion that we need to provide care and feeding to this proto-stack, and this needs to start now. This is what the TRANSFORM 2019 unconference is going to be about. About 40 of us (you’re invited!) will spend five days working on some key questions:

  • What libraries are in the Python ‘proto-stack’? What kind of licenses do they have? Who are the maintainers?

  • Do we need a core library for the stack? Something to manage some basic data structures, units of measure, etc.

  • What are we calling it, who cares about it, and how are we going to work together?

  • Who has the capacity to provide attention, developer time, existing code, or funds to the stack?

  • Where are the gaps in the stack, and which ones need to be filled first?
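
The first question in the list above can be partly answered by machine. As a minimal sketch, assuming the libraries are installed in the current Python environment and that their authors filled in the relevant packaging metadata (not all do), the standard library can report each package's version, declared license, and maintainer. The `inventory` helper and the package list are illustrative only, not an agreed definition of the stack.

```python
from importlib import metadata

# Illustrative list only: the four libraries named earlier in this post.
PROTO_STACK = ["segyio", "lasio", "welly", "bruges"]


def inventory(packages):
    """Return {name: info} for each package; info is None if it is not installed."""
    report = {}
    for name in packages:
        try:
            meta = metadata.metadata(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = {
            "version": meta.get("Version"),
            "license": meta.get("License"),
            # Fall back to the author field when no maintainer is declared.
            "maintainer": meta.get("Maintainer") or meta.get("Author"),
        }
    return report


if __name__ == "__main__":
    for name, info in inventory(PROTO_STACK).items():
        print(name, info if info is not None else "not installed")
```

Packaging metadata is only a starting point: a missing or vague license field still needs a human to go and read the repository.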

We won’t finish all this at the unconference. But we’ll get started. We’ll produce a lot of ideas, plans, roadmaps, GitHub issues, and new code. If that sounds like fun to you, and you can contribute something to this work — please come. We need you there! Get more info and sign up here.


Read the follow-up post >>> What’s happening at TRANSFORM?


Thumbnail photo of the Old Man of Hoy by Tom Bastin, CC-BY on Flickr.

Can you do science on a phone?

Mobile geo-computing presentation. Click the image to download the PDF (3.5M) in a new window. The PDF includes slides and notes.

Yes! Perhaps the real question should be: Would you want to? Isn't the very idea just an extension of the curse of mobility, never being away from your email, work, commitments? That's the glass half-empty view; it takes discipline to use your cellphone on your own terms, picking it up when it's convenient. And there's no doubt that sometimes it is convenient, like when your car breaks down, or you're out shopping for groceries and you can't remember if it was Winnie-the-Pooh or Disney Princess toothpaste you were supposed to get.

So smartphones are convenient. And everywhere. And most people seem to have a data plan or ready access to WiFi. And these devices are getting very powerful. So there's every reason to embrace the fact that these little computers will be around the office and lab, and get on with putting some handy, maybe even fun, geoscience on them. 

My talk, the last one of the meeting I blogged about last week, was a bit of an anomaly in the hardcore computational geophysics agenda. But maybe it was a nice digestif. You can read something resembling the talk by clicking on the image (above), or if you like, you can listen to me in this 13-minute video version:

So get involved, learn to program, or simply help and inspire a developer to build something awesome. Perhaps the next killer app for geologists, whatever that might be. What can you imagine...?

Just one small note to geoscience developers out there: we don't need any more seismographs or compass-clinometers!

More powertools, and a gobsmacking

Yesterday was the second day of the open geophysics software workshop I attended in Houston. After the first day (which I also wrote about), I already felt like there were a lot of great geophysical powertools to follow up on and ideas to chase up, but day two just kept adding to the pile. In fact, there might be two piles now.

First up, Nick Vlad from FusionGeo gave us another look at open source systems from a commercial processing shop's perspective. Along with Alex (on day 1) and Renée (later on), he gave plenty of evidence that open source is not only compatible with business, but it's good for business. FusionGeo firmly believe that no one package can support them exclusively, and showed us GeoPro, their proprietary framework for integrating SEPlib, SU, Madagascar, and CPSeis.

Yang Zhang from Stanford then showed us how reproducibility is central to SEPlib (as it is to Madagascar). When possible, researchers in the Stanford Exploration Project build figures with makefiles, which can be run by anyone to easily reproduce the figure. When this is not possible, a figure is labelled as non-reproducible; if there are some dependencies, on data for example, then it is called conditionally reproducible. (For the geeks out there, the full system for implementing this involves SEPlib, GNU make, Vplot, LaTeX, and SCons.)
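
The three labels amount to a simple decision rule. The sketch below only illustrates that rule as described here; it is not how SEPlib implements it, and the `Figure` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Reproducibility(Enum):
    REPRODUCIBLE = "reproducible"
    CONDITIONAL = "conditionally reproducible"
    NON_REPRODUCIBLE = "non-reproducible"


@dataclass
class Figure:
    name: str
    has_build_rule: bool               # can a makefile target regenerate it?
    needs_external_data: bool = False  # e.g. data not shipped with the paper


def classify(fig: Figure) -> Reproducibility:
    """Label a figure using the three categories described in the text."""
    if not fig.has_build_rule:
        return Reproducibility.NON_REPRODUCIBLE
    if fig.needs_external_data:
        return Reproducibility.CONDITIONAL
    return Reproducibility.REPRODUCIBLE


if __name__ == "__main__":
    fig = Figure("velocity-model", has_build_rule=True, needs_external_data=True)
    print(fig.name, "is", classify(fig).value)
```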

Next up was a reproducibility system with ancestry in SEPlib: Madagascar, presented by the inimitable Sergey Fomel. While casually downloading and compiling Madagascar, he described how it allows for quick regeneration of figures, even from other sources like Mathematica. There are some nice usability features of Madagascar: you can easily interface with processes using Python (as well as Java, among other languages), and tools like OpendTect and BotoSeis can even provide a semi-graphical interface. Sergey also mentioned the importance of a phenomenon called dissertation procrastination, and why grad students sometimes spend weeks writing amazing code:

"Building code gives you good feelings: you can build something powerful, and you make connections with the people who use it"

After the lunch break, Joe Dellinger from BP explained how he thought some basic interactivity could be added to Vplot, SEP's plotting utility. The goal would not be to build an all-singing, all-dancing graphics tool, but to incrementally improve Vplot to support editing labels, changing scales, and removing elements. A good goal for a 1-day hack-fest?

The show-stopper of the day was Bjorn Olofsson of SeaBird Exploration. I think it's fair to say that everyone was gobsmacked by his description of SeaSeis, a seismic processing system that he has built with his own bare hands. This was the first time he has presented the system, but he started the project in 2005 and open-sourced it about 18 months ago. Bjorn's creation stemmed from an understandable (to me) frustration with other packages' apparent complexity and unease-of-use. He has built enough geophysical algorithms for SeaBird to use the software at sea, but the real power is in his interactive viewing tools. Working in Java, Bjorn has successfully exploited all the modern GUI libraries at his disposal. Due to constraints on his time, the future is uncertain. Message of the day: Help this man!

Renée Bourque of dGB also opened a lot of eyes with her overview of OpendTect and the Open Seismic Repository. dGB's tools are modern, user-friendly, and flexible. I think many people present realized that these tools—if combined with the depth and breadth of more fundamental pieces like SU, SEPlib and Madagascar—could offer the possibility of a robust, well-supported, well-documented, and rich environment that processors can use every day, without needing a lot of systems support or hacking skills. The paradigm already exists: Madagascar has an interface in OpendTect today.

As the group began thinking about the weekend, it was left to me, Matt Hall, to see if there was any more appetite for hearing about geophysics and computers. There was! Just enough for me to tell everyone a bit about mobile devices, the Android operating system, and the App Inventor programming canvas. More on this next week!

It was an inspiring and thought-provoking workshop. Thank you to Karl Schleicher and Robert Newsham for organizing, and Cheers! to the new friends and acquaintances. My own impression was that the greatest challenge ahead for this group is not so much computational, but more about integration and consolidation. I'm looking forward to the next one!

Open seismic processing, and dolphins

Today was the first day of the Petroleum Technology Transfer Council's workshop Open software for reproducible computational geophysics, being held at the Bureau of Economic Geology's Houston Research Center and organized skillfully by Karl Schleicher of the University of Texas at Austin. It was a full day of presentations (boo!), but all the presentations had live installation demos and even live coding (yay!). It was fantastic. 

Serial entrepreneur Alex Mihai Popovici, the CEO of Z-Terra, gave a great, very practical, overview of the relative merits of three major seismic processing packages: Seismic Unix (SU), Madagascar, and SEPlib. He has a very real need: delivering leading edge seismic processing services to clients all over the world. He more or less dismissed SEPlib on the grounds of its low development rate and difficulty of installation. SU is popular (about 3300 installs) and has the best documentation, but perhaps lacks some modern imaging algorithms. Madagascar, Alex's choice, has about 1100 installs, relatively terse self-documentation (it's all on the wiki), but is the most actively developed.

The legendary Dave Hale (I think that's fair), Colorado School of Mines, gave an overview of his Mines Java Toolkit (JTK). He's one of those rare people who can explain almost anything to almost anybody, so I learned a lot about how to manage parallelization in 2D and 3D arrays of data, and how to break it. Dave is excited about the programming language Scala, a sort of Java lookalike (to me) that handles parallelization beautifully. He also digs Jython, because it has the simplicity and fun of Python, but can incorporate Java classes. You can get his library from his web pages. Installing it on my Mac was a piece of cake, needing only three terminal commands: 

  • svn co http://boole.mines.edu/jtk
  • cd jtk/trunk
  • ant

Chuck Mosher of ConocoPhillips then gave us a look at JavaSeis, an open source project that makes handling prestack seismic data easy and very, very fast. It has parallelization built into it, and is perfect for large, modern 3D datasets and multi-dimensional processing algorithms. His take on open source in commerce: corporations are struggling with the concept, but "it's in their best interests to actively participate".

Eric Jones is CEO of Enthought, the innovators behind (among other things) NumPy/SciPy and the Enthought Python Distribution (or EPD). His take on the role of Python as an integrator and facilitator, handling data traffic and improving usability for the legacy software we all deal with, was practical and refreshing. He is not at all dogmatic about doing everything in Python. He also showed a live demo of building a widget with Traits and Chaco. Awesome.
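
To make the 'Python as integrator' idea concrete, here is a minimal sketch of the general pattern: Python hands data to an existing command-line program and collects the result. It is not a reconstruction of Eric's demo. The `run_legacy` helper is hypothetical, and a one-line Python filter stands in for a real legacy executable so that the example runs anywhere.

```python
import subprocess
import sys


def run_legacy(command, payload: bytes) -> bytes:
    """Pipe bytes through an external program and return its standard output."""
    result = subprocess.run(
        command,
        input=payload,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the program fails
    )
    return result.stdout


# Stand-in for a legacy executable: a tiny filter that upper-cases its input.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    print(run_legacy(STAND_IN, b"trace header\n"))
```

Swapping `STAND_IN` for the command line of a real tool is the whole trick: the legacy code stays untouched while Python handles the data traffic around it.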

After lunch, BP's Richard Clarke told us about the history and future of FreeUSP and FreeDDS, a powerful processing system. FreeDDS is being actively developed and released gradually by BP; indeed, a new release is due in the next few days. It will eventually replace FreeUSP. Richard and others also mentioned that Randy Selzler is actively developing PSeis, the next generation of this processing system (and he's looking for sponsors!).

German Garabito of the Federal University of Pará, Brazil, generated a lot of interest in BotoSeis, the GUI he has developed to help him teach SU. It allows one to build and manage processing flows visually, in a Java-built interface inspired by Focus, ProMax and other proprietary tools. The software is named after the Amazon river dolphin, or boto (left). Dave Hale described his efforts as the perfect example of the triumph of 'scratching your own itch'.

Continuing the usability theme, Karl Schleicher followed up with a nice look at how he is building scripts to pull field data from the USGS online repository, and perform SU and Madagascar processing flows on them. He hopes he can build a library of such scripts as part of Sergey Fomel's reproducible geophysics efforts. 

Finally, Bill Menger of Global Geophysical told the group a bit about two projects he open sourced when he was at ConocoPhillips: GeoCraft and CPSeis. His insight on what was required to get them into the open was worth sharing: 

  1. Get permission, using a standard open source license (and don't let lawyers change it!)
  2. Communicate the return on investment carefully: testing, bug reporting, goodwill, leverage, etc.
  3. Know what you want to get out of it, and have a plan for how to get there
  4. Pick a platform: compiler, dependencies, queueing, etc (unless you have a lot of time for support!)
  5. Know the issues: helping users, dealing with legacy code, dependency changes, etc.

I am looking forward to another awesome-packed day tomorrow. My own talk is the wafer-thin mint at the end!

Why we should embrace openness

Openness (open ideas, open data, open teams) can help us build more competitive, higher performing, more sustainable organizations in this industry.

Last week I took this message to the annual convention of the three big applied geoscience organizations in Canada: the Canadian Society of Petroleum Geologists (CSPG), the Canadian Society of Exploration Geophysicists (CSEG), and the Canadian Well Logging Society (CWLS). Evan and I attended the conference as scientists, but also experimented a bit with live tweeting and event blogging.

The talk was a generalization of the talk I did in March about open source software in geoscience. I wasn't sure at all how it would go over, and spent most of the morning sitting in technical talks fretting about how flaky and meta my talk would sound. But it went quite well, and at least served as some light relief from the erudition in the rest of the agenda. It was certainly fun to give an opinion-filled talk, and it started plenty of conversations afterwards.

You can access a PDF of the visuals, with commentary, from the thumbnail (left).

What do you think? Is a competitive, secretive industry like oil and gas capable of seeing value in openness? Might regulators eventually force us to share more as the resources society demands become scarcer? Or are we doomed to more mistrust and secrecy as oil and gas become more expensive to produce?

← Click the image for the PDF (6.8M)

Geo-FLOSS

Newton didn't need open source, so why do you?

Free and open source software is catalyzing a revolution in subsurface science. As a key part of the growing movement to open access to data, information, and the very process of doing science, open software is not just for the geeks. It's a party we're all invited to.

I have been in California this week, attending a conference in Long Beach called Mathematical and Computational Issues in the Geosciences, organized by the Society for Industrial and Applied Mathematics. In 2009 I started being more active in my search for lectures and courses that lie outside my usual comfort zone. I have done courses in reservoir engineering and Java programming. I have heard talks on radiology and financial forecasting. It's like being back at university; I like it.

How did I end up at this conference? Last spring, I wrote a little review article about open source software (available here at dGB Earth Science's site). It was really just a copy-edited version of notes I had made whilst looking for free geoscience software and reading up on the subject for my own interest. After some brushes with open source, I was curious about the history behind the idea, how projects are built, and how they are licensed. At the same time, I also started a couple of Wikipedia articles about free software in geology and geophysics, as a place to list the projects I had come across. Kristin Flornes, of IRIS in Stavanger, Norway, saw the article and her colleagues got in touch about the conference.

The talk, which you can access via the thumbnail (left) or look at in Google Docs, is part FLOSS primer, part geo-FLOSS advert, part manifesto for a revolution of innovation. I hope the speaker notes are sufficient. 

What do you think? Is software availability or architecture capable of driving change, or is it just a tool, passive and inert?

← Click the image for the PDF (6.9M)