What is a sprint?

In October we're hosting our first 'code sprint'! What is that?

A code sprint is a type of hackathon, in which efforts are focused around a small number of open source projects. They are related to, but not really the same as, sprints in the Scrum software development framework. They are non-competitive — the only goal is to improve the software in question, whether it's adding functionality, fixing bugs, writing tests, improving documentation, or doing any of the other countless things that good software needs. 

On 13 and 14 October, we'll be hacking on 3 projects:

  • Devito: a high-level finite difference library for Python. Devito featured in three Geophysical Tutorials at the end of 2017 and beginning of 2018 (see Witte et al. for Part 3). The project needs help with code, tests, model examples, and documentation. There will be core devs from the project at the sprint. GitHub repo is here.
  • Bruges: a simple collection of Python functions representing basic geophysical equations. We built this library back in 2015, and have been chipping away ever since. It needs more equations, better docs, and better tests — and the project is basic enough for anyone to contribute to it, even a total Python newbie. GitHub repo is here.
  • G3.js: a JavaScript wrapper for D3.js, a popular plotting toolkit for web developers. When we tried to adapt D3.js to geoscience data, we found we wanted to simplify basic tasks like making vertical plots, and plotting raster-like data (e.g. seismic) with line plots on top (e.g. horizons). Experience with JavaScript is a must. GitHub repo is here.

The sprint will be at a small joint called MAZ Café Con Leche, located in Santa Ana about 10 km or 15 minutes from the Anaheim Convention Center where the SEG Annual Meeting is happening the following week.

Thank you, as ever, to our fantastic sponsors: Dell EMC and Enthought. These two companies are powered by amazing people doing amazing things. I'm very grateful to them both for being such enthusiastic champions of the change we're working for in our science and our industry. 

If you like the sound of spending the weekend coding, talking geophysics, and enjoying the best coffee in southern California, please join us at the Geophysics Sprint! Register on Eventbrite and we'll see you there.

Lots of news!

I can't believe it's been a month since my last post! But I've now recovered from the craziness of the spring — with its two hackathons, two conferences, two new experiments, as well as the usual courses and client projects — and am ready to start getting back to normal. My goal with this post is to tell you all the exciting stuff that's happened in the last few weeks.

Meet our newest team member

There's a new Agilist! Robert Leckenby is a British–Swiss geologist with technology tendencies. Rob has a PhD in Dynamic characterisation and fluid flow modelling of fractured reservoirs, and has worked in various geoscience roles in large and small oil & gas companies. We're stoked to have him in the team!

Rob lives near Geneva, Switzerland, and speaks French and several other human languages, as well as Python and JavaScript. He'll be helping us develop and teach our famous Geocomputing course, among other things. Reach him at robert@agilescientific.com.


Geocomputing Summer School

We have trained over 120 geoscientists in Python so far this year, but most of our training is in private classes. We wanted to fix that, and open the Geocomputing class up for anyone to take. Well, anyone in the Houston area :) It's called Summer School, it's happening the week of 13 August, and it's a 5-day crash course in scientific Python and the rudiments of machine learning. It's designed to get you a long way up the learning curve. Read more and enroll.


A new kind of event

We have several more events happening this year, including hackathons in Norway and in the UK. But the event in Anaheim, right before the SEG Annual Meeting, is going to be a bit different. Instead of the usual Geophysics Hackathon, we're going to try a sprint around open source projects in geophysics. The event is called the Open Geophysics Sprint, and you can find out more here on events.agilescientific.com.

That site — events.agilescientific.com — is our new events portal, and our attempt to stay on top of the community events we are running. Soon, you'll be able to sign up for events on there too (right now, most of them are still handled through Eventbrite), but for now it's at least a place to see everything that's going on. Thanks to Diego for putting it together!

Weekend worship in Salt Lake City

The Salt Lake City hackathon — only the second we've done with a strong geology theme — is a thing of history, but you can still access the event page to check out who showed up and who did what. (This events page is a new thing we launched in time for this hackathon; it will serve as a public document of what happens at our events, in addition to being a platform for people to register, sponsor, and connect around our events.) 

In true seat-of-the-pants hackathon style we managed to set up an array of webcams and microphones to record the finale. The demos are the icing on the cake. Teams were selected at random and were given 4 minutes to wow the crowd. Here is the video, followed by a summary of what each team got up to... 


Unconformist.ai

Didi Ooi (University of Bristol), Karin Maria Eres Guardia (Shell), Alana Finlayson (UK OGA), Zoe Zhang (Chevron). The team used machine learning to automate the mapping of unconformities in subsurface data. One of the trickiest parts is building up a catalog of data-model pairs for GANs to train on. Instead of relying on thousands or hundreds of thousands of human-made seismic interpretations, the team generated training images by programmatically labelling pixels on synthetic data as being either above (white) or below (black) the unconformity. Project page. Slides.


Outcrops Gee Whiz

Thomas Martin (soon Colorado School of Mines), Zane Jobe (Colorado School of Mines), Fabien Laugier (Chevron), and Ross Meyer (Colorado School of Mines). The team wrote some programs to evaluate facies variability along drone-derived digital outcrop models. They did this by processing UAV point cloud data in Python and classifying different rock facies using weathering profiles, local cliff face morphology, and rock colour variations as attributes. This research will help in the development of drone-assisted 3D scanning to automate facies boundary mapping and rock characterization. Repo. Slides.


Jet Loggers

Eirik Larsen and Dimitrios Oikonomou (Earth Science Analytics), and Steve Purves (Euclidity). This team of European geoscientists, with their circadian clocks all out of whack, investigated if a language of stratigraphy can be extracted from the rock record and, if so, if it can be used as another tool for classifying rocks. They applied natural language processing (NLP) to an alphabetic encoding of well logs as a means to assist or augment the labour-intensive tasks of classifying stratigraphic units and picking tops. Slides.
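
The team's exact encoding isn't described here, but to give a flavour of the idea, here's a minimal sketch of one way to turn a log into a string of symbols for NLP-style processing (the binning scheme and alphabet are my own assumptions, not theirs):

import numpy as np

def encode_log(values, alphabet='ABCDE'):
    """Toy encoding: bin log samples by value and map each bin to a letter."""
    n_bins = len(alphabet)
    edges = np.linspace(np.nanmin(values), np.nanmax(values), n_bins + 1)
    idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    return ''.join(alphabet[i] for i in idx)

gr = np.array([20.0, 35.0, 80.0, 120.0, 95.0, 40.0])  # made-up gamma-ray samples
print(encode_log(gr))  # prints 'AADEDB', a 'sentence' you could feed to NLP tools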

 

 


Book Cliffs Bandits

Tom Creech (ExxonMobil) and Jesse Pisel (Wyoming State Geological Survey). The team started munging datasets in the Book Cliffs. Unfortunately, they really did not have the perfect, ready-to-go data, and by the time they pivoted to some workable open data from Alaska, their team name had already become a thing. The goal was to build a tool to assist with lithology and stratigraphic correlation. They settled on change-point detection using Bayesian statistics, using it to build richer feature sets and test whether it could produce more robust automatic stratigraphic interpretation. Repo, and presentation.

 

 

A channel runs through it

Nam Pham (UT Austin), Graham Brew (Dynamic Graphics), Nathan Suurmeyer (Shell). Because morphologically realistic 3D synthetic seismic data is scarce, this team wrote a Python program that can take seismic horizon interpretations from real data, then construct richer training data sets for building an AI that can automatically delineate geological entities in the subsurface. The pixels enclosed by any two horizons are labelled with ones, pixels outside this region are labelled with zeros. This work was in support of Nam's thesis research which is using the SegNet architecture, and aims to extract not only major channel boundaries in seismic data, but also the internal channel structure and variability – details that many seismic interpreters, armed even with state-of-the art attribute toolboxes, would be unable to resolve. Project page, and code.
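
To make that labelling step concrete, here's a rough sketch in NumPy (my own toy version, not the team's code): given two horizons expressed as a sample index per trace, the pixels between them get ones and everything else gets zeros.

import numpy as np

def label_between_horizons(top, base, n_samples):
    """Return an (n_samples, n_traces) mask: 1 between the two horizons, 0 elsewhere."""
    depths = np.arange(n_samples)[:, None]  # sample index down each trace
    return ((depths >= top) & (depths < base)).astype(int)

# Toy horizons picked on a 6-trace, 10-sample section.
top = np.array([2, 2, 3, 3, 4, 4])
base = np.array([6, 6, 7, 7, 8, 8])
mask = label_between_horizons(top, base, n_samples=10)  # ready to use as training labels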

GeoHacker

Malcolm Gall (UK OGA), Brendon Hall and Ben Lasscock (Enthought). Innovation happens when hackers have the ability to try things... but they also need data to try things out on. There is a massive shortage of geoscience datasets that have been staged and curated for machine learning research. Team GeoHacker's project wasn't a project per se, but a platform aimed at the sharing, distribution, and long-term stewardship of geoscience data benchmarks. The subsurface realm is swimming with disparate data types across a dizzying range of length scales, and indeed community efforts may be the only way to prove machine learning's usefulness and keep the hype in check. In short: a place where we can take geoscience data and put it online in a form that's ready to use for machine learning. It's not only about being open, online, and accessible. Good datasets, like good software, need to be hosted by individuals, properly documented, enriched with tutorials and getting-started guides, not to mention properly funded. Website.


Petrodict

Mark Mlella (Univ. Louisiana, Lafayette), Matthew Bauer (Anchutz Exploration), Charley Le (Shell), Thomas Nguyen (Devon). Petrodict is a machine-learning driven, cloud-based lithology prediction tool that takes petrophysics measurements (well logs) and gives back lithology. Users upload a triple combo log to the app, and the app returns that same log with volumetric fractions for its various lithologic or mineralogical constituents. For training, the team selected several dozen wells that had elemental capture spectroscopy (ECS) logs – a premium tool that is run only in a small fraction of wells – as well as triple combo measurements to build a model for predicting lithology. Repo.

Seismizor

George Hinkel, Vivek Patel, and Alex Waumann (all from University of Texas at Arlington). Earthquakes are hard. This team of computer science undergraduate students drove in from Texas to spend their weekend with all the other geo-enthusiasts. What problem in subsurface oil and gas did they identify as being important, interesting, and worthy of their relatively unvested attention? They took on the problem of induced seismicity, testing whether machine learning and analytics can be used to predict the likelihood that injected waste water from fracking will cause an earthquake like the ones that have been making news in Oklahoma. The majority of this team's time was spent doing what all good scientists do – understanding the physical system they were trying to investigate – unabashedly pulling a number of the more geomechanically inclined hackers from neighbouring teams and peppering them with questions. Induced seismicity is indeed a complex phenomenon, but George's realization that "we massively overestimated the availability of data" struck a chord, I think, with the judges and the audience. Another systemic problem. The dynamic earth – incredible in its complexity and forces – coupled with the fascinating and politically charged technologies we use for drilling and fracking might be one of the hardest problems for machine learning to attack in the subsurface.


AAPG next year is in San Antonio. If it runs, the hackathon will be 18–19 May. Mark your calendar and stay tuned!

Strategies for a revolution

This must be a record. It has taken me several months to get around to recording the talk I gave last year at EAGE in Vienna — Strategies for a revolution. Rather a grandiose title, sorry about that, especially over-the-top given that I was preaching to the converted: the workshop on open source. I did, at least, blog about the goings-on in the workshop itself at the time. I even followed it up with a slightly cheeky analysis of the discussion at the event. But I never posted my own talk, so here it is:

Too long didn't watch? No worries, my main points were:

  1. It's not just about open source code. We must write open access content, put our data online, and push the whole culture towards openness and reproducibility. 
  2. We, as researchers, professionals, and authors, need to take responsibility for being more open in our practices. It has to come from within the community.
  3. Our conferences need more tutorials, bootcamps, hackathons, and sprints. These events build skills and networks much faster than (just) lectures and courses.
  4. We need something like an Open Geoscience Foundation to help streamline funding channels for open source projects and community events.

If you depend on open source software, or care about seeing more of it in our field, I'd love to hear your thoughts about how we might achieve the goal of having greater (scientific, professional, societal) impact with technology. Please leave a comment.

 

Welly to the wescue

I apologize for the widiculous title.

Last week I described some headaches I was having with well data, and I introduced welly, an open source Python tool that we've built to help cure the migraine. The first versions of welly were built — along with the first versions of striplog — for the Nova Scotia Department of Energy, to help with their various data wrangling efforts.

Aside — all software projects funded by government should in principle be open source.

Today we're using welly to get data out of LAS files and into so-called feature vectors for a machine learning project we're doing for Canstrat (kudos to Canstrat for their support for open source software!). In our case, the features are wireline log measurements. The workflow looks something like this:

  1. Read LAS files into a welly 'project', which contains all the wells. This bit depends on lasio.
  2. Check what curves we have with the project table I showed you on Thursday.
  3. Check curve quality by passing a test suite to the project, and making a quality table (see below).
  4. Fix problems with curves with whatever tricks you like. I'm not sure how to automate this.
  5. Export as the X matrix, all ready for the machine learning task.

Let's look at these key steps as Python code.

1. Read LAS files

 
from welly import Project
p = Project.from_las('data/*.las')

2. Check what curves we have

Now we have a project full of wells and can easily make the table we saw last week. This time we'll use aliases to simplify things a bit — this trick allows us to refer to all GR curves as 'Gamma', so for a given well, welly will take the first curve it finds in the list of alternatives we give it. We'll also pass a list of the curves (called keys here) we are interested in:
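
Something like this, for example (the exact mnemonics in the alias dictionary are just an illustration; yours will depend on the curves in your files):

# Each alias maps a friendly name to a list of mnemonics, in order of preference.
alias = {
    'Gamma': ['GR', 'GAM'],
    'Density': ['RHOB'],
    'Sonic': ['DT', 'AC'],
}

# The curves (by alias) that we want in the table.
keys = ['Gamma', 'Density', 'Sonic']

html = p.curve_table_html(keys=keys, alias=alias)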

The project table. The name of the curve selected for each alias is shown, along with the mean and units of each curve as a quick QC. A couple of those RHOB curves definitely look dodgy, and they turned out to be DRHO correction curves.

3. Check curve quality

Now we have to define a suite of tests. Lists of tests to run on each curve are held in a Python data structure called a dictionary. As well as tests for specific curves, there are two special test lists: Each and All, which are run on each curve encountered, and on all curves together, respectively. (The latter is required to, for example, compare the curves to each other to look for duplicates.) The welly module quality contains some predefined tests, but you can also define your own test functions — these functions take a curve as input, and return either True (for a test pass) or False.
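
For example, a custom test might look something like this (a sketch; it assumes the curve behaves like a NumPy array of samples):

import numpy as np

def mostly_not_null(curve):
    """Hypothetical custom test: pass if fewer than 10% of the samples are NaN."""
    return np.isnan(np.asarray(curve, dtype=float)).mean() < 0.1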

 
import welly.quality as qty
tests = {
    'All': [qty.no_similarities],
    'Each': [qty.no_monotonic],
    'Gamma': [
        qty.all_positive,
        qty.mean_between(10, 100),
    ],
    'Density': [qty.mean_between(1000, 3000)],
    'Sonic': [qty.mean_between(180, 400)],
    }

from IPython.display import HTML  # to display the table in the notebook

html = p.curve_table_html(keys=keys, alias=alias, tests=tests)
HTML(html)
The green dot means that all tests passed for that curve. Orange means some tests failed. If all tests fail, the dot is red. The quality score shows a normalized score for all the tests on that well. In this case, RHOB and DT are failing the 'mean_between' test because they have Imperial units.

4. Fix problems

Now we can fix any problems. This part is not yet automated, so it's a fairly hands-on process. Here's a very high-level example of how I fix one issue, just as an example:

 
import numpy as np

def fix_negs(c):
    c[c < 0] = np.nan
    return c

# Glossing over some details, we give a mnemonic, a test
# to apply, and the function to apply if the test fails.
fix_curve_if_bad('GAM', qty.all_positive, fix_negs)

What I like about this workflow is that the code itself is the documentation. Everything is fully reproducible: load the data, apply some tests, fix some problems, and export or process the data. There's no need for intermediate files called things like DT_MATT_EDIT or RHOB_DESPIKE_FINAL_DELETEME. The workflow is completely self-contained.

5. Export

The data can now be exported as a matrix, specifying a depth step that all data will be interpolated to:

 
X, _ = p.data_as_matrix(X_keys=keys, step=0.1, alias=alias)

That's it. We end up with a 2D array of log values that will go straight into, say, scikit-learn*. I've omitted here the process of loading the Canstrat data and exporting that, because it's a bit more involved. I will try to look at that part in a future post. For now, I hope this is useful to someone. If you'd like to collaborate on this project in the future — you know where to find us.
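
For example, if you also had a matching vector of labels y (say, lithology codes derived from the Canstrat data), training a first model might look something like this sketch (not the actual project code):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X is the matrix exported above; y is a vector with one lithology label per row of X.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # fraction of held-out samples predicted correctly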

* For more on scikit-learn, don't miss Brendon Hall's tutorial in October's Leading Edge.


I'm happy to let you know that agilegeoscience.com and agilelibre.com are now served over HTTPS — so connections are private and secure by default. This is just a matter of principle for the Web, and we go to great pains to ensure our web apps modelr.io and pickthis.io are served over HTTPS. Find out more about SSL from DigiCert, the provider of Squarespace's (and Agile's) certs, which are implemented with the help of the non-profit Let's Encrypt, who we use and support with dollars.

Well data woes

I probably shouldn't be telling you this, but we've built a little tool for wrangling well data. I wanted to mention it, because it's doing some really useful things for us — and maybe it can help you too. But I probably shouldn't because it's far from stable and we're messing with it every day.

But hey, what software doesn't have a few or several or loads of bugs?

Buggy data?

It's not just software that's buggy. Data is as buggy as heck, and subsurface data is, I assert, the buggiest data of all. Give units or datums or coordinate reference systems or filenames or standards or basically anything at all a chance to get corrupted in cryptic ways, and they take it. Twice if possible.

By way of example, we got a package of 10 wells recently. It came from a "data management" company. There are issues... Here are some of them:

  • All of the latitude and longitude data were in the wrong header fields. No coordinate reference system in sight anywhere. This is normal of course, and the only real side-effect is that YOU HAVE NO IDEA WHERE THE WELL IS.
  • Header chaos aside, the files were non-standard LAS sort-of-2.0 format, because tops had been added in their own little completely illegal section. But the LAS specification has a section for stuff like this (it's called OTHER in LAS 2.0).
  • Half the porosity curves had units of v/v, and half %. No big deal...
  • ...but a different half of the porosity curves were actually v/v. Nice.
  • One of the porosity curves couldn't make its mind up and changed scale halfway down. I am not making this up.
  • Several of the curves were repeated with other names, e.g. GR and GAM, DT and AC. Always good to have a spare, if only you knew if or how they were different. Our tool curvenam.es tries to help with this, but it's far from perfect.
  • One well's RHOB curve was actually the PEF curve. I can't even...

The remarkable thing is not really that I have this headache. It's that I expected it. But this time, I was out of paracetamol.

Cards on the table

Our tool welly, which I stress is very much still in development, tries to simplify the process of wrangling data like this. It has a project object for collecting a lot of wells into a single data structure, so we can get a nice overview of everything: 


Our goal is to include these curves in the training data for a machine learning task to predict lithology from well logs. The trained model can make really good lithology predictions... if we start with non-terrible data. Next time I'll tell you more about how welly has been helping us get from this chaos to non-terrible data.

Open source geoscience is _________________

As I wrote yesterday, I was at the Open Source Geoscience workshop at EAGE Vienna 2016 on Monday. Happily, the organizers made time for discussion. However, what passes for discussion in the traditional conference setting is, as I've written before, stilted.

What follows is not an objective account of the proceedings. It's more of a poorly organized collection of soundbites and opinions with no real conclusion... so it's a bit like the actual discussion itself.

TL;DR The main take home of the discussion was that our community does not really know what to do with open source software. We find it difficult to see how we can give stuff away and still make loads of money. 

I'm not giving away my stuff

Paraphrasing a Schlumberger scientist:

Schlumberger sponsors a lot of consortiums, but the consortiums that will deliver closed source software are our favourites.

I suppose this is a way to grab competitive advantage, but of course there are always the other consortium members so it's hardly an exclusive. A cynic might see this position as a sort of reverse advantage — soak up the brightest academics you can find for 3 years, and make sure their work never sees the light of day. If you patent it, you can even make sure no-one else gets to use the ideas for 20 years. You don't even have to use the work! I really hope this is not what is going on.

I loved the quote Sergey Fomel shared; paraphrasing Matthias Schwab, his former advisor at Stanford: 

Never build things you can't take with you.

My feeling is that if a consortium only churns out closed source code, then it's not too far from being a consulting shop. Apart from the cheap labour, cheap resources, and no corporation tax.

Yesterday, in the talks in the main stream, I asked most of the presenters how people in the audience could go and reproduce, or simply use, their work. The only thing that was available was a commercial OpendTect plugin of dGB's, and one free-as-in-beer MATLAB utility. Everything else was unavailable for any kind of inspection, and in one case the individual would not even reveal the technology framework they were using.

Support and maintenance

Paraphrasing a Saudi Aramco scientist:

There are too many bugs in open source, and no support. 

The first point is, I think, a fallacy. It's like saying that Wikipedia contains inaccuracies. I'd suggest that open source code has about the same number of bugs as proprietary software. Software has bugs. Some people think open source is less buggy; as Eric Raymond's 'Linus's Law' has it: "Given enough eyeballs, all bugs are shallow." Kristofer Tingdahl (dGB) pointed out that the perceived lack of support is a business opportunity for the open source community. Another participant mentioned the importance of having really good documentation. That costs money of course, which means finding ways for industry to support open source software development.

The same person also said something like:

[Open source software] changes too quickly, with new versions all the time.

...which says a lot about the state of application management in many corporations and, again, may represent an opportunity rather than a threat to the open source movement.

Only in this industry (OK, maybe a couple of others) will you hear the impassioned cry, "Less change!" 

The fog of torpor

When a community is falling over itself to invent new ways to do things, create new value for people, and find new ways to get paid, few question the sharing and re-use of information. And by 'information' I mean code and data, not a few PowerPoint slides. Certainly not all information, but lots. I don't know which is the cause and which is the effect, but the correlation is there.

In a community where invention is slow, on the other hand, people are forced to be more circumspect, and what follows is a cynical suspicion of the motives of others. Here's my impression of the dynamic in the room during the discussion on Monday, and of course I'm generalizing horribly:

  • Operators won't say what they think in front of their competitors
  • Vendors won't say what they think in front of their customers and competitors
  • Academics won't say what they think in front of their consortium sponsors
  • Students won't say what they think in front of their advisors and potential employers

This all makes discussion a bit stilted. But it's not impossible to have group discussions in spite of these problems. I think we achieved a real, honest conversation in the two Unsessions we've done in Calgary, and I think the model we used would work perfectly in all manner of non-technical and technical settings. We just have to start doing it. Why our convention organizers feel unable to try new things at conferences is beyond me.

I can't resist finishing on something a person at Chevron said at the workshop:

I'm from Chevron. I was going to say something earlier, but I thought maybe I shouldn't.

This just sums our industry up.

Open source FWI, I mean geoscience

I'm being a little cheeky. Yesterday's Open Source Geoscience workshop at EAGE was not really only about full waveform inversion (FWI). True, it was mostly about geophysics, but there was quite a bit of new stuff too.

But there was quite a bit on FWI.

The session echoed previous EAGE sessions on the same subject in 2006 and 2012, and was chaired by Filippo Broggini (of ETH Zürich), Sergey Fomel (University of Texas), Thomas Günther (LIAG Hannover), and Russell Hewett (Total, unfortunately not present). It started with a look at core projects like Madagascar and OpendTect. There were some (for me) pretty hard core, mathematics-heavy contributions. And we got a tour of some new and newish projects that are seeking users and/or contributors. Rather than attempting to cover everything, I'm going to exercise my (biased and ill-gotten) judgment and focus on some highlights from the day.

Filippo Broggini started by reminding us of why Joe Dellinger (BP) started this recurrent workshop a decade ago. Here's how Joe framed the value of open source to our community:

The economic benefits of a collaborative open-source exploration and production processing and research software environment would be enormous. Skilled geophysicists could spend more of their time doing innovative geophysics instead of mediocre computer science. Technical advances could be quickly shared and reproduced instead of laboriously re-invented and reverse-engineered. Oil companies, contractors, academics, and individuals would all benefit.

Did I mention that he wrote that 10 years ago?

Lessons learned from the core projects

Kristofer Tingdahl (dGB) then gave the view from his role as CEO of dGB Earth Sciences, the company behind OpendTect, the free and open source geoscience interpretation tool. He did a great job of balancing the good (their thousands of users, and their SEG Distinguished Achievement Award 2016) with the less good (the difficulty of building a developer community, and the struggle to get beyond only hundreds of paying users). His great optimism and natural business instinct filled us all with hope.

The irrepressible Sergey Fomel summed up 10 years of Madagascar's rise. In the journey from v0.9 to v2.0, the project has moved from SourceForge to GitHub, gone from 6 to 72 developers, jumped from 30 to 260 reproducible papers, and been downloaded over 40 000 times. He also shared the story of his graduate experience at Stanford, where he was involved in building the first 'reproducible science' system with Jon Claerbout in the early 1990s. Un/fortunately, it turned out to be unreproducible, so he had to build Madagascar.

It's not (yet?) a core project, but John Stockwell (Colorado School of Mines) talked about OpenSeaSeis and barely mentioned SeismicUnix. This excellent little seismic processing project is now owned by CSM, after its creator, Bjoern Olofsson, had to give it up when he went to work for a corporation (makes sense, right? o_O ). The tool includes SeaView, a standalone SEGY viewer, as well as a graphical processing flow composer called XSeaSeis. It prides itself on its uber-simple architecture (below). Want a gain step? Write gain.so and you're done. Perfect for beginners.

Jeffrey Shragge (UWA), Bob Clapp (SEP), and Bill Symes (Rice) provided some perspective from groups solving big math problems with big computers. Jeff talked about coaxing Madagascar — or M8R as the cool kids apparently refer to it — into the cloud, where it can chomp through 100 million core hours without setting things on fire. This is a way for small enterprises and small (underfunded) research teams to get big things done. Bob told us about a nifty-looking HTML5 viewer for subsurface data... which I can't find anywhere. And Bill talked about 'mathematical fidelity' and its application to solving large, expensive problems without creating a lot of intermediate data. His message: the mathematics should provide the API.

New open source tools in geoscience

The standout of the afternoon for me was University of Vienna post-doc Eun Young Lee's talk about BasinVis. The only MATLAB code we saw — so not truly open source, though it might be adapted to GNU Octave — and the only strictly geological package of the day. To support her research, Eun Young has built a MATLAB application for basin analysis, complete with a GUI and some nice visuals. This one shows a geological surface, gridded in the tool, with a thickness map projected onto the 'floor' of the scene:

I'm poorly equipped to write much about the other projects we heard about. For the record and to save you a click, here's the list [with notes] from my 'look ahead' post:

  • SES3D [presented by Alexey Gokhberg], a package from ETHZ for seismic modeling and inversion.
  • OpenFOAM [Gérald Debenest], a new open source toolbox for fluid mechanics.
  • PyGIMLi [Carsten Rücker], a geophysical modeling and inversion package.
  • PySIT [Laurent Demanet], the Python seismic imaging toolbox that Russell Hewett started while at MIT.
  • Seismic.jl [Nasser Kazemi] and jInv [Eldad Haber], two [modeling and inversion] Julia packages.

My perception is that there is a substantial amount of overlap between all of these packages except OpenFOAM. If you're into FWI you're spoilt for choice. Several of these projects are at the heart of industry consortiums, so it's a way for corporations to sponsor open source projects, which is awesome. However, most of them said they have closed-source components which only the consortium members get access to, so clearly the messaging around open source — the point being to accelerate innovation, reduce bugs, and increase value for everyone — is missing somewhere. There's still this idea that secrecy begets advantage begets profit, but this idea is wrong. Hopefully the other stuff, which may or may not be awesome, gets out eventually.


I gave a talk at the end of the day, about ways I think we can get better at this 'openness' thing, whatever it is. I will write about that some time soon, but in the meantime you're welcome to see my slides here.

Finally, a little time — two half-hour slots — was set aside for discussion. I'll have a go at summing that up in another post. Stay tuned!

BasinVis image © 2016 Eun Young Lee, used with permission. OpenSeaSeis image © 2016 Center for Wave Phenomena

Look ahead to EAGE 2016

I'm in Vienna for the 78th EAGE Conference and Exhibition, at Wien Messe, starting on Sunday. And, of course, for the Subsurface Hackathon, which I've already mentioned a few times. 

The hackathon is, as usual, over the weekend. It starts tomorrow, in this amazing coworking space. That's @JesperDramsch there, getting ready for the hackathon!

I know this doesn't suit everyone, but weekdays don't suit everyone either. I've also always wanted to get people out of 'work mode', into the idea that they can create whatever they want. Maybe we'll try one during the week some time; do let me know what you think about it in the comments. Feedback helps.

Don't worry, you will hear more about the hackathon. Stay tuned.

Open source software in applied geosciences

The first conference event I'll be at is the workshop on open source software. This follows up on similar get-togethers in Copenhagen in 2012 and in Vienna in 2006. I hope the fact that the inter-workshop interval is getting shorter is a sign that open source geoscience software is gaining traction!

The workshop is being organized by Filippo Broggini (of ETH Zürich), Sergey Fomel (University of Texas), Thomas Günther (LIAG Hannover), and Russell Hewett (Total). They have put together a great-looking program. In the morning, Kristofer Tingdahl (CEO of dGB Earth Sciences) will talk about business models for open source. Then Sergey Fomel will update us on the Madagascar seismic processing toolbox. Finally, in a series of talks, Jeff Shragge (Univ. Western Australia), Bob Clapp (Stanford), and Bill Symes (Rice) will talk about using Madagascar and other geophysical imaging and inversion tools at a large scale and in parallel.

After lunch, there's a veritable parade of updates and new stuff, with all of these projects checking in:

  • OpenSeaSeis, which raised a lot of eyebrows in 2012 for its general awesomeness. Now a project at Colorado School of Mines.
  • SES3D, a package from ETHZ for seismic waveform modeling and inversion.
  • BasinVis, a MATLAB program for modeling basin fill and subsidence (woo! Open source geology!!)
  • OpenFOAM, a new open source toolbox for fluid mechanics.
  • PyGIMLi, a geophysical modeling and inversion package.
  • PySIT, the Python seismic imaging toolbox that Russell Hewett started while at MIT.
  • Seismic.jl and jInv (that's j-inv), two Julia packages you need to know about.

Aaaand at the very end of the day, there's a talk from yours truly on 'stuff we can do to get more open source goodness in geoscience'. I'll post some version of the talk here when I can.

Talks and stuff

I don't have any plans for Tuesday and Wednesday, other than taking in some talks and posters. I'm missing Thursday. Picking talks is hard, especially when there are 15 (yup) parallel sessions... and that's just the oral presentations. (Hey! Conference organizer people! That's crazy!) These conference apps that get ever-so-slightly better each year won't be really useful until they include a recommendation engine of some sort. I'd like two kinds of recommendation: "stuff that's aligned with my interests but you will disagree with everyone in there", and "stuff that doesn't seem to be aligned with my interests, but maybe it really is".

Oh and also "stuff that isn't too far away from the room I'm in right now because I only have 80 seconds to get there".

Anyway, I haven't chosen my sessions yet, let alone started to trawl through the talk titles. You can probably guess the session titles — Carbonate Petrophysics, Multiple Attenuation, Optimizing Full Waveform Marine Acquisition for Quantitative Exploration II (just kidding).

There are some special sessions I may seek out. There's one for professional women in geoscience and engineering, and two for young professionals, one of which is a panel discussion. Then there are two 'dedicated sessions': Integrated Data for Geological and Reservoir Models, and Towards Exascale Geophysical Applications, which sounds intriguing... but their programmes look like the usual strings of talks, so I'm not sure why they're singled out. There's also something called EAGE Forum, but I can't tell what that is.


Arbitrary base 10 milestone!

I don't pay as much attention to blog stats as I used to, but there is one number that I've been keeping an eye on lately: the number of posts. This humble little post is the 500th on this blog! Kind of amazing, though I'm not sure what it says about me and Evan, in terms of making sound decisions about how we spend our evenings. I mean, if each post is 600 words,... that's two good-sized novels!

I'm not saying they're good novels...

The images of the Impact HUB Vienna that don't have Jesper in them are CC-BY-SA by the HUB.

ORCL vs GOOG: the $9 billion API

What's this? Two posts about the legal intricacies of copyright in the space of a fortnight? Before you unsubscribe from this definitely-not-a-law-blog, please read on because the case of Oracle America, Inc vs Google, Inc is no ordinary copyright fight. For a start, the damages sought by Oracle in this case [edit: could] exceed $9 billion. And if they win, all hell is going to break loose.

The case is interesting for some other reasons besides the money and the hell breaking loose thing. I'm mostly interested in it because it's about open source software. Specifically, it's about Android, Google's open source mobile operating system. The claim is that the developers of Android copied 37 application programming interfaces, or APIs, from the Java software environment that Sun released in 1995 and Oracle acquired in its $7.4 billion acquisition of Sun in 2010. There were also claims that they copied specific code, not just the interface the code presents to the user, but it's the API bit that's interesting.

What's an API then?

You might think of software in terms of applications like the browser you're reading this in, or the seismic interpretation package you use. But this is just one, very high-level, type of software. Other, much lower-level software runs your microwave. Developers use software to build software; these middle levels contain FORTRAN libraries for tasks like signal processing, tools for making windows and menus appear, and still others for drawing shapes or checking spelling. You can think of an API as a user interface for programmers. Where the user interface in a desktop application might have menus and dialog boxes, the interface for a library has classes and methods — pieces of code that hold data or perform tasks. A good API can be a pleasure to use. A bad API can make grown programmers cry. Or at least make them write a new library.

The Android developers didn't think the Java API was bad. In fact, they loved it. They tried to license it from Sun in 2007 and, when Sun was bought by Oracle, from Oracle. When this didn't work out, they locked themselves in a 'cleanroom' and wrote a runtime environment called Dalvik. It implemented the same API as the Java Virtual Machine, but with new code. The question is: does Oracle own the interface — the method names and syntaxes? Are APIs copyrightable?
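
To make the interface-versus-implementation distinction concrete, here's a toy sketch in Python (nothing to do with Java or Dalvik specifically). The 'API' is the class name and the method names and signatures that calling code depends on; the implementation is the code behind them, which can be rewritten from scratch without changing the interface:

# The "API": names and signatures that callers depend on.
class Stack:
    def push(self, item): ...
    def pop(self): ...

# One implementation of that interface...
class ListStack(Stack):
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()

# ...and another, written independently, exposing exactly the same interface.
class LinkedStack(Stack):
    def __init__(self):
        self._head = None
    def push(self, item):
        self._head = (item, self._head)
    def pop(self):
        item, self._head = self._head
        return item

# Code written against the API works with either implementation, unchanged.
for cls in (ListStack, LinkedStack):
    s = cls()
    s.push('seismic')
    assert s.pop() == 'seismic'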

I thought this case ended years ago?

It did. Google already won the argument once, on 31 May 2012, when the court held that APIs are "a system or method of operation" and therefore not copyrightable. Here's the conclusion of that ruling:

The original 2012 holding that Google did not violate the Copyright Act by copying 37 of Java's interfaces. Click for the full PDF.

But it went to the Federal Circuit Court of Appeals, which reversed the ruling on copyrightability, and the case was sent back to the district court for a jury trial to decide on Google's 'fair use' defence. So now the decision will be made by 10 ordinary citizens... none of whom know anything about programming. (There was a computer scientist in the pool, but Oracle sent him home. It's okay: Google sent a free-software hater packing.)

This snippet from one of my favourite podcasts, Leo Laporte's Triangulation, is worth watching. Leo is interviewing James Gosling, the creator of Java, who was involved in some of the early legal discovery process...

Why do we care about this?

The problem with all this is that, when it comes to open source software and the Internet, APIs make the world go round. As the Electronic Frontier Foundation argued on behalf of 77 computer scientists (including Alan Kay, Vint Cerf, Hal Abelson, Ray Kurzweil, Guido van Rossum, and Peter Norvig) in its amicus brief for the Supreme Court... we need uncopyrightable interfaces to get computers to cooperate. This is what drove the personal computer explosion of the 1980s, the Internet explosion of the 1990s, and the cloud computing explosion of the 2000s, and most people seem to think those were awesome. The current bot explosion also depends on APIs, but the jury is out (lol) on how awesome that one is.

The trial continues. Google concluded its case yesterday, and Oracle called its first witness, co-CEO Safra Catz. "We did not buy Sun to file this lawsuit," she said. Reassuring, but if they win there's going to be a lot of that going around. A lot.

For a much more in-depth look at the story behind the trial, this epic article by Sarah Jeong is awesome. Follow the rest of the events over the next few days on Ars Technica, Twitter, or wherever you get your news. Meanwhile on Agile*, we will return to normal geophysical programming, I promise :)


ADDENDUM on 26 May 2016... Google won the case with the "fair use" argument. So the appeal court's decision that APIs are copyrightable stands, but the jury were persuaded that this particular instance qualified as fair use. Oracle will appeal.