Hacking in Houston

geohack_2017_banner.png

Houston 2013
Houston 2014
Denver 2014
Calgary 2015
New Orleans 2015
Vienna 2016
Paris 2017
Houston 2017... The eighth geoscience hackathon landed last weekend!

We spent last weekend in hot, humid Houston, hacking away with a crowd of geoscience and technology enthusiasts. Thirty-eight hackers joined us at Station Houston, a top-floor coworking space, for fun and games and code. And tacos.

Here's a rundown of the teams and what they worked on.

Seismic Imagers

Jingbo Liu (CGG), Zohreh Souri (University of Houston).

Tech — DCGAN in Tensorflow, Amazon AWS EC2 compute.

The team looked for patterns that make seismic data different from other images, using a deep convolutional generative adversarial network (DCGAN). Using a seismic volume and a set of 2D lines, they made 121,000 sub-images (tiles) for their training set.
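
A tiling step like this can be sketched in a few lines of NumPy. The tile size and stride below are illustrative choices, not the team's actual parameters:

```python
import numpy as np

def make_tiles(section, tile_size=64, stride=32):
    """Cut a 2D seismic section into overlapping square tiles."""
    tiles = []
    n_rows, n_cols = section.shape
    for i in range(0, n_rows - tile_size + 1, stride):
        for j in range(0, n_cols - tile_size + 1, stride):
            tiles.append(section[i:i + tile_size, j:j + tile_size])
    return np.stack(tiles)

# A synthetic 'section' of random amplitudes stands in for real data.
section = np.random.randn(256, 512)
tiles = make_tiles(section)
print(tiles.shape)  # (105, 64, 64): 7 rows x 15 columns of tiles
```

With a real survey, the same loop applied across every inline and crossline quickly yields a training set of the size the team describes.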

The Young And The RasLAS

William Sanger (Schlumberger), Chance Sanger (Museum of Fine Arts, Houston), Diego Castañeda (Agile), Suman Gautam (Schlumberger), Lanre Aboaba (University of Arkansas).

State of the art text detection by Google Cloud Vision API

Tech — Google Cloud Vision API, Python flask web app, Scatteract (sort of). Repo on GitHub.

Digitizing well logs is a common industry task, and current methods require a lot of manual intervention. The team's automated pipeline: convert PDF files to images, perform OCR with Google Cloud Vision API to extract headers and log track labels, pick curves using a CNN in TensorFlow. The team implemented the workflow in a Python flask front-end. Check out their slides.

Hutton Rocks

Kamal Hami-Eddine (Paradigm), Didi Ooi (University of Bristol), James Lowell (GeoTeric), Vikram Sen (Anadarko), Dawn Jobe (Aramco).

hutton.png

Tech — Amazon Echo Dot, Amazon AWS (RDS, Lambda).

The team built Hutton, a cloud-based cognitive assistant for gaining better, more efficient insights from geologic data. The project includes an integrated cloud-hosted database, an interactive web application for uploading new data, and a cognitive assistant for voice queries. Hutton builds upon existing Amazon Alexa skills. Check out their GitHub repo, and slides.

Big data > Big Lore 

Licheng Zhang (CGG), Zhenzhen Zhong (CGG), Justin Gosses (Valador/NASA), Jonathan Parker (Marathon)

The team used machine learning to predict formation tops on wireline logs, which would allow rapid generation of structure maps for exploration play evaluation, save man-hours, and assist with difficult formation-top correlations. The team used the AER Athabasca open dataset of 2193 wells (yay, open data!).

Tech — Jupyter Notebooks, SciPy, scikit-learn. Repo on GitHub.
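
As a rough sketch of this kind of supervised workflow (with synthetic stand-ins for the wireline curves, not the AER data), the heart of it is a classifier fit on log samples with known labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for three wireline curves (e.g. GR, RHOB, NPHI).
n = 2000
X = rng.normal(size=(n, 3))
# A fake 'above/below formation top' label loosely tied to the curves.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(round(clf.score(X_test, y_test), 2))  # held-out accuracy
```

With real wells, the trained model yields a predicted top in each new well, which can then be gridded into a structure map.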

Free near surface

free_surface.png

Tien-Huei Wang, Jing Wu, Clement Zhang (Schlumberger).

Multiples are a kind of unwanted seismic signal, and removing them requires expensive modeling. The project used machine learning to identify multiples in seismic images. They attempted to use GAN frameworks, but found it difficult to formulate their problem, turning instead to the simpler problem of binary classification. Check out their slides.

Tech — CNN... I don't know the framework.

The Cowboyz

Mingliang Liu, Mohit Ayani, Xiaozheng Lang, Wei Wang (University of Wyoming), Vidal Gonzalez (Universidad Simón Bolívar, Venezuela).

A tight group of researchers joined us from the University of Wyoming at Laramie, and snagged one of the most enthusiastic hackers at the event, a student from Venezuela called Vidal. The team attempted acceleration of geostatistical seismic inversion using TensorFlow, a central theme in Mingliang's research.

Tech — TensorFlow.

Augur.ai

Altay Sensal (Geokinetics), Yan Zaretskiy (Aramco), Ben Lasscock (Geokinetics), Colin Sturm (Apache), Brendon Hall (Enthought).

augur.ai.JPG

Electrical submersible pumps (ESPs) are critical components for oil production. When they fail, they can cause significant down time. Augur.ai provides tools to analyze pump sensor data and detect when pumps are behaving irregularly. Check out their presentation!

Tech — Amazon AWS EC2 and EFS, Plotly Dash, SigOpt, scikit-learn. Repo on GitHub.

disaster_input.png

The Disaster Masters

Joe Kington (Planet), Brendan Sullivan (Chevron), Matthew Bauer (CSM), Michael Harty (Oxy), Johnathan Fry (Chevron)

Hydrologic models predict floodplain flooding, but not local street flooding. Can we predict street flooding from LiDAR elevation data, conditioned with citizen-reported street and house flooding from U-Flood? Maybe! Check out their slides.

Tech — Python geospatial and machine learning stacks: rasterio, shapely, scipy.ndimage, scikit-learn. Repo on GitHub.
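
One simple, useful feature for this kind of problem is height above the local minimum, which flags ponding-prone cells in the elevation grid. A minimal sketch with scipy.ndimage (the window size and toy grid are illustrative):

```python
import numpy as np
from scipy import ndimage

def local_relief(elevation, size=5):
    """Height above the local minimum; low values suggest ponding-prone cells."""
    local_min = ndimage.minimum_filter(elevation, size=size)
    return elevation - local_min

# A toy 'LiDAR' grid: flat ground with a depression in the middle.
z = np.ones((50, 50))
z[20:30, 20:30] -= 0.5  # a low spot that could pond

relief = local_relief(z)
print(relief.min(), relief.max())  # 0.0 in the pit, 0.5 on its rim
```

Features like this, computed from the real LiDAR with rasterio, could then be paired with the U-Flood reports as training labels for a classifier.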

The structure does WHAT?!

Chris Ennen (White Oak), Nanne Hemstra (dGB Earth Sciences), Nate Suurmeyer (Shell), Jacob Foshee (Durwella).

Inspired by the concept of an iPhone 'face ageing' app, Nate recruited a team to poke at applying the concept to maps of the subsurface. Think of a simple map of a structural field early in its life, compared to how it looks after years of interpretation and drilling. Maybe we can preview the 'aged' appearance to help plan where best to drill next to reduce uncertainty!

Tech — OpendTect, Azure ML Studio, C#, self-boosting forest cluster. Repo on GitHub.


Thank you!

Massive thanks to our sponsors — including Pioneer Natural Resources — for their part in bringing the event to life! 

sponsors_tight.png

More thank-yous

Apart from the participants themselves, Evan and I benefitted from a team of technical support, mentors, and judges — huge thanks to all these folks:

  • The indefatigable David Holmes from Dell EMC. The man is a legend.
  • Andrea Cortis from Pioneer Natural Resources.
  • Francois Courteille and Issam Said of NVIDIA.
  • Carlos Castro, Sunny Sunkara, Dennis Cherian, Mike Lapidakis, Jit Biswas, and Rohan Mathews of Amazon AWS.
  • Maneesh Bhide and Steven Tartakovsky of SigOpt.
  • Dave Nichols and Aria Abubakar of Schlumberger.
  • Eric Jones from Enthought.
  • Emmanuel Gringarten from Paradigm.
  • Frances Buhay and Brendon Hall for help with catering and logistics.
  • The team at Station for accommodating us.
  • Frank's Pizza, Tacos-a-Go-Go, Cali Sandwich (banh mi), Abby's Cafe (bagels), and Freebird (burritos) for feeding us.

Finally, megathanks to Gram Ganssle, my Undersampled Radio co-host. Stalwart hack supporter and uber-fixer, Gram came over all the way from New Orleans to help teams make sense of deep learning architectures and generally smooth things over. We recorded an episode of UR at the hackathon, talking to Dawn Jobe, Joe Kington, and Colin Sturm about their respective projects. Check it out!


[Update, 29 Sep & 3 Nov] Some statistics from the event:

  • 39 participants, including 7 women (way too few, but better than 4 out of 63 in Paris)
  • 9 students (and 0 professors!).
  • 12 people from petroleum companies.
  • 18 people from service and technology companies, including 5 from Schlumberger!
  • 13 no-shows, not including folk who cancelled ahead of time; a bit frustrating because we had a long wait list.
  • Furthest travelled: James Lowell from Newcastle, UK — 7560 km!
  • 98 tacos, 67 burritos, 96 slices of pizza, 55 kolaches, and an untold number of banh mi.

Isn't everything on the internet free?

A couple of weeks ago I wrote about a new publication from Elsevier. The book seems to contain quite a bit of unlicensed copyrighted material, collected without proper permission from public and private groups on LinkedIn, SPE papers, and various websites. I had hoped to have an update for you today, but the company is still "looking into" the matter.

The comments on that post, and on Twitter, raised some interesting views. Like most views, these views usually come in pairs. There is a segment of the community that feels quite enraged by the use of (fully attributed) LinkedIn comments in a book; but many people hold the opposing view, that everything on the Internet is fair game.

I sympathise with this permissive view, to an extent. If you put stuff on the web, people are (one hopes) going to see it, interpret it, and perhaps want to re-use it. If they do re-use it, they may do so in ways you did not expect, or perhaps even disagree with. This is okay — this is how ideas develop. 

I mean, if I can't use a properly attributed LinkedIn post as the basis for a discussion, or a YouTube video to illustrate a point, then what's really the point of those platforms? It would undermine the idea of the web as a place for interaction and collaboration, for cultural or scientific evolution. 

Freely accessible but not free

Not to labour the point, but I think we all understand that what we put on the Internet is 'out there'. Indeed, some security researchers suggest you should assume that every email you type will be in the local newspaper tomorrow morning. This isn't just 'a feeling', it's built into how the web works: most websites are exclusively composed of strictly copyrighted content, but most websites also have conspicuous buttons to share that copyrighted content — Tweet this, Pin that, or whatever. The signals are confusing... do you want me to share this or not? 

One can definitely get carried away with the idea that everything should be free. There's a spectrum of infractions. On the 'everyday abuse' end of things, we have the point of view that grabbing random images from the web and putting the URL at the bottom is 'good enough'. Based on papers at conferences, I suspect that most people think this and, as I explained before, it's definitely not true: you usually need permission. 

At the other end of the scale, you end up with Sci-Hub (which sounds like it's under pressure to close at the moment) and various book-sharing sites, both of which I think are retrograde and anti-open-access (as well as illegal). I believe we should respect the copyright of others — even that of supposedly evil academic publishers — if we want others to respect ours.

So what's the problem with a bookful of LinkedIn posts and other dubious content? Leaving aside for now the possibility of more serious plagiarism, I think the main problem is simply that the author went too far — it is a wholesale rip-off of 350 people's work, not especially well done, with no added value, and sold for a hefty sum.

Best practice for re-using stuff on the web

So how do we know what is too far? Is it just a value judgment? How do you re-use stuff on the web properly? My advice:

  • Stop it. Resist the temptation to Google around, grabbing whatever catches your eye.
  • Re-use sparingly, only using one or two of the real gems. Do you really need that picture of a casino on your slide entitled "Risk and reward"? (No, you definitely don't.)
  • Make your own. Ideas are not copyrightable, so it might be easier to copy the idea and make the thing you want yourself (giving credit where it's due, of course).
  • Ask for permission from the creator if you do use someone's stuff. Like I said before, this is only fair and right.
  • Go open! Preferentially share things by people who seem to be into sharing their stuff.
  • Respect the work. Make other people's stuff look awesome. You might even...
  • ...improve the work if you can — redraw a diagram, fix a typo — then share it back to them and the community.
  • Add value. Add real insight, combine things in new ways, surprise and delight the original creators.
  • And finally, if you're not doing any of these things, you better not be trying to profit from it. 

Everything on the Internet is not free. My bet is that you'll be glad of this fact when you start putting your own stuff out there. We can all do our homework and model good practice. This is especially important for those people in influential positions in academia, because their behaviours rub off on so many impressionable people. 


We talked to Fernando Enrique Ziegler on the Undersampled Radio podcast last week. He was embroiled in the 'bad book' furore too, in fact he brought it to many people's attention. So this topic came up in the show, as well as a lot of stuff about pore pressure and hurricanes. Check it out...

Subsurface Hackathon project round-up, part 1

The dust has settled from the Hackathon in Paris two weeks ago. Been there, done that, came home with the T-shirt.

In the same random order they presented their 4-minute demos to our panel of esteemed judges, I present a (very) abbreviated round-up of what the teams made together over the course of the weekend. With the exception of a few teams who managed to spontaneously nucleate before the hackathon, most of these teams were comprised of people who had never met each other before the event.

Just let that sink in for a second: teams of mostly mutual strangers built 13 legit machine-learning-based geoscience applications in one weekend. 


Log Healer

An automated well log management system

Team Un-well Loggers: James Wanstall (Glencore), Niket Doshi (Teradata), Joseph Taylor (Teradata), Duncan Irving (Teradata), Jane McConnell (Teradata).

Tech: Kylo (NiFi, HDFS, Hive, Spark)

If you're working with well logs, and if you've got lots of them, you've almost certainly got gaps or inaccuracies from curve to curve and from well to well. The team's scalable, automated well-log file management system Log Healer computes missing logs and heals broken ones. Amazing.
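
A minimal sketch of the 'healing' idea: train a model on the interval where a curve exists, then predict it across the gap. The curve names and linear model here are illustrative; the team's actual Kylo-based system is far more sophisticated:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic curves: pretend the sonic (DT) is missing over an interval,
# and predict it from density (RHOB) and gamma-ray (GR).
depth = np.arange(1000)
rhob = 2.3 + 0.1 * np.sin(depth / 50) + rng.normal(0, 0.01, depth.size)
gr = 60 + 20 * np.cos(depth / 30) + rng.normal(0, 1, depth.size)
dt = 200 - 40 * rhob + 0.1 * gr + rng.normal(0, 0.5, depth.size)

gap = slice(400, 500)               # the interval to 'heal'
mask = np.ones(depth.size, bool)
mask[gap] = False

X = np.column_stack([rhob, gr])
model = LinearRegression().fit(X[mask], dt[mask])

dt_healed = dt.copy()
dt_healed[gap] = model.predict(X[gap])  # fill the gap with predictions
```

Scaling this up to thousands of files is where the Hadoop/Spark machinery earns its keep.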


An early result from Team Janus. The image on the left is ground truth, that on the right is predicted. Many of the features are present. Not bad for v0.1!

Meaningful cross sections from well logs

Team Janus: Daniel Buse, Johannes Camin, Paul Gabriel, Powei Huang, Fabian Kampe (all from GiGa Infosystems)

The team built an elegant machine learning workflow to attack the very hard problem of creating geologically realistic cross-sections from well logs. The validation algorithm compares pixels to score the result. 


Think Section's mindblowing photomicrograph labeling tool can also make novel camouflage patterns.

Paint-by-numbers on digital thin sections

Team Think Section: Diego Castañeda (Agile*), Brendon Hall (Enthought), Roeland Nieboer (Fugro), Jan Niederau (RWTH Aachen), Simon Virgo (RWTH Aachen)

Tech: Python (Scikit Learn, Scikit Image, Flask, NumPy, SciPy, Pandas), AWS for hosting app & Jupyter server.

Description: Mineral classification and point-counting on thin sections can be an incredibly tedious and time-consuming task. Team Think Section trained a model to segregate, classify, and label mineral grains in 200 GB of high-resolution multi-polarization-angle photomicrographs.
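
The final labeling step in such a pipeline, connected-component analysis on the classified pixels, can be sketched like this (a toy binary image stands in for the classifier's output):

```python
import numpy as np
from scipy import ndimage

# A toy binary image stands in for the pixel classifier's output:
# two separate mineral 'grains' on a background.
img = np.zeros((40, 40))
img[5:15, 5:15] = 1.0    # grain 1
img[25:35, 20:30] = 1.0  # grain 2

labels, n_grains = ndimage.label(img > 0.5)
print(n_grains)  # 2 connected grains found
```

Once each grain has a label, point-counting reduces to tallying labels per mineral class.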


Team Classy's super-impressive shot-gather seismic event detection technology. Left: synthetic gather. Middle: predicted labels. Right: truth.

Event detection on seismic shot gathers

Team Classy: Princy Ikotoko Ndong (EOST), Anna Lim (NTNU), Yuriy Ivanov (NTNU), Song Hou (CGG), Justin Gosses (Valador).

Tech: Python (NumPy, Matplotlib), Jupyter notebooks.

The team created an AI which identifies and labels different events on a shot gather image. It can find direct waves, reflections, multiples or coherent noise. It uses a support vector machine for classification, and is simple and fast. 
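
The core idea, a support vector machine classifying windows of data, can be sketched on synthetic traces. Real inputs would be patches of the shot gather, not these toy windows:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def window(has_event):
    """A toy 50-sample trace window: noise, plus a spike if an event is present."""
    w = rng.normal(0, 0.2, 50)
    if has_event:
        w[20:25] += 2.0  # a coherent arrival
    return w

X = np.array([window(i % 2 == 0) for i in range(200)])
y = np.array([i % 2 == 0 for i in range(200)], dtype=int)

clf = SVC(kernel="rbf").fit(X[:150], y[:150])
print(clf.score(X[150:], y[150:]))  # accuracy on held-out windows
```

Part of the appeal of an SVM here is exactly what the team noted: it trains in seconds, with no GPU required.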


model2seismic: An entirely new way to do modeling and inversion. Take note: the neural network that made this image knows no physics.

Forward and inverse modeling without the physics

Team GANsters - Lukas Mosser (Imperial), Wouter Kimman (Meridian), Jesper Dramsch (Copenhagen), Alfredo de la Fuente (Wolfram), Steve Purves (Euclidity)

Tech: PyNoddy, homegrown Python ML tools.

The GANsters created a deep-learning image-translation-based seismic inversion and forward modelling system. I urge you to go and look at their project on model2seismic. If it doesn't give you goosebumps, you are geophysically inert.


Team Pick Pick Log

Machine learning for stratigraphic interpretation

Team Pick Pick LOG - Antoine Vanbesien (EOST), Fidèle Degni (Mines St-Étienne), Massinissa Mesbahi (Pau), Natsuki Gunji (Mines St-Étienne), Cédric Menut (EOST).

This team of data science and geoscience undergrads attacked an automated stratigraphic interpretation task. They used supervised learning to determine lithology from well logs in Alberta's Athabasca play, then attempted to teach their AI to pick stratigraphic tops. Impressive!


Pretty amazing, huh? The power of the hackathon to bring a project from barely-even-an-idea to actual-working-code is remarkable! And we're not even halfway through the teams: tomorrow I'll describe the other seven projects. 

Le grand hack!

It happened! The Subsurface Hackathon drew to a magnificent close on Sunday, in an intoxicating cloud of code, creativity, coffee, and collaboration. It will take some beating.

Nine months in gestation, the hackathon was on a scale we have not attempted before. Total E&P joined us as co-organizers and made this new reach possible. They also let us use their amazing Booster — a sort of intrapreneurship centre — which was perfect for the event. Their team (thanks especially to Marine and Caroline!) did an amazing job of hosting, as well as providing several professionals from their subsurface software (thanks Jonathan and Yannick!) and data science teams (thanks Victor and David!). Arnaud Rodde and Frédéric Broust, who had to do some organization hacking of their own to make something as weird as a hackathon happen, should be proud of their teams.

Instead of trying to describe the indescribable, here are some photos:

BY THE NUMBERS

16 hours of code
13 teams
62 hackers
44 students
4 robots
568 croissants
0 lost-time incidents

I won't say much about the projects for now. The diversity was high — there were projects in thin section photography, 3D geological modeling, document processing, well log prediction, seismic modeling and inversion, and fault detection. All of the projects included some kind of machine learning, and again there was diversity there, including several deep learning applications. Neural networks are back!

Feel the buzz!

If you are curious, Gram and I recorded a quick podcast and interviewed a few of the teams:

It's going to take a few days to decompress and come down from the high. In a couple of weeks I'll tell you more about the projects themselves, and we'll edit the photos and post the best ones to Flickr (and in the meantime there are a few more pics there already). 

Thank you to the sponsors!

Last thing: we couldn't have done any of this without the support of Dell EMC. David Holmes has been a rock for the hackathon project over the last couple of years, and we appreciate his love of community and code! Thank you too to Duncan and Jane at Teradata, Francois at NVIDIA, Peter and Jon at Amazon AWS, and Gram at Sandstone for all your support. Dear reader: please support these organizations!


Conversation not discussion

It's a while since we had a 'conferences are broken' rant on the Agile blog!

Five or six of the sessions at this year's conference were... different. I already mentioned the Value In Geophysics session, which was a cross between a regular series of talks and a panel discussion. I went to another, The modern geoscientist, which was structured the same way. A third one, Fundamentals of Professional Career Branding, was a mini workshop with Jackie Rafter of Higher Landing. There were at least a couple of other such sessions.

It's awesome to see the societies experimenting with something outside the usual plethora of talks and posters. I hope they were well received, because we need more of this in our discipline, now more than ever. If you went to one and enjoyed it, please let the organizers know.

But... the sessions — especially the panel discussion sessions — lacked something. One thing really:

The sessions we saw were nowhere near participatory enough. Not even close.

The 'expert-panel-enlightens-audience' pattern is slowing us down, perpetuating broken models of leadership and hierarchy. There isn't an expert in Calgary or the universe that knows how or when this downturn is going to end, or what we need to do to improve our chances of continuing to contribute to society and make a living in our profession. So please, stop throwing people up on a stage, making them give 5 minute presentations, and occasionally asking for questions from the audience. That is nothing like a discussion. Tune in to a political debate show to see what those look like: rapid-fire, punchy, controversial. In short: interesting. And, from an organizer's point of view, really hard, which is why we should stop.

Real conversation

What I think is really needed right now, more than half-baked expert discussion, is conversation. Conversations happen between small groups of people, all sitting on the same plane, around a table, with napkins to draw on and time to draw on them. They connect people and spread awesome ideas like viruses. What's more, great conversations have outcomes.

I don't want to claim that Agile has all this figured out, but we have demonstrated various ways of connecting scientists in meaningful ways and with lasting outcomes. We've also written extensively on the subject (e.g. here and here and here and here). Other verticals have conducted many more experiments, and documented the results. Humans know how to do this.

So there's no excuse — it's not too dramatic to call the current 'situation' a crisis in our profession in Canada — we need to get beyond tinkering at the edges and half-hearted attempts at change. Our societies need to pay attention to what's needed, and get on with making it happen.

Still more ranting...

We talked about this topic at some length on the Undersampled Radio podcast yesterday. Here's the uncut video version:

Unearthing gold in Toronto

I just got home from Toronto, the mining capital of the world, after an awesome weekend hacking with Diego Castañeda (a recent PhD grad in astrophysics who is working with us) and Anneya Golob (another astrophysicist, and Diego's partner). Given how much I bang on about hackathons, it might surprise you to know that this was the first hackathon I have properly participated in, without having to order tacos or run out for more beer every couple of hours.

Participants being briefed by one of the problem sponsors on the first evening.

What on earth is Unearthed?

The event (read about it) was part of a global series of hackathons organized by Unearthed Solutions, a deservedly well-funded non-profit based in Australia that is seeking to disrupt every single thing in the natural resources sector. This was their fourteenth event, but their first in Canada. Remarkably, they got 60 or 70 hackers together for the event, which I know from my experience organizing events takes a substantial amount of work. Avid readers might remember us mentioning them before, especially in a guest post by Jelena Markov and Tom Horrocks in 2014.

A key part of Unearthed's strategy is to engage operating companies in the events. Going far beyond mere sponsorship, Barrick Gold sent several mentors to the event, as well as its Chief Innovation Officer, Michelle Ash, and two judges, Ed Humphries (head of digital transformation) and Iain Allen (head of digital mining). Barrick provided the challenge themes, as well as data and vivid descriptions of operational challenges. The company was incredibly candid with the participants, and should be applauded for its support of what must have felt like a pretty wild idea. 

Team Auger Effect: Diego and Anneya hacking away on Day 2.

What went down?

It's hard to describe a hackathon to someone who hasn't been to one. It's like trying to describe the Grand Canyon, ice climbing, or a 1985 Viña Tondonia Rioja. It's always fun to see and hear the reactions of the judges and other observers that come for the demos in the last hours of the event: disbelief at what small groups of humans can do in a weekend, for little tangible reward. It flies in the face of everything you think you know about creativity, productivity, motivation, and collaboration. Not to mention intellectual property.

As the fifteen (!) teams made their final 5-minute pitches, it was clear that every single one of them had created something unique and useful. The judges seemed genuinely blown away by the level of accomplishment. It's hard to capture the variety, but I'll have a go with a non-comprehensive list. First, there was a challenge around learning from geoscience data:

  • BGC Engineering, one of the few pro teams and First Place winner, produced an impressive set of tools for scraping and analysing public geoscience data. I think it was a suite of desktop tools rather than a web application.
  • Mango (winners of the Young Innovators award), Smart Miner (second place overall), Crater Crew, Aureka, Notifyer, and others presented map-based browsers for public mining data, with assistance from varying degrees of machine intelligence.
  • Auger Effect (me, Diego, and Anneya) built a three-component system consisting of a browser plugin, an AI pipeline, and a social web app, for gathering, geolocating, and organizing data sources from people as they research.

The other challenge was around predictive maintenance:

  • Tyrelyze, recognizing that two people a year are killed by tyre failures, created a concept for laser scanning haul truck tyres during operations. These guys build laser scanners for core, and definitely knew what they were doing.
  • Decelerator (winners of the People's Choice award) created a concept for monitoring haul truck driving behaviour, to flag potentially expensive driving habits.
  • Snapfix.io looked at inventory management for mine equipment maintenance shops.
  • Arcana, Leo & Zhao, and others looked at various other ways of capturing maintenance and performance data from mining equipment, and used various strategies to try to predict failures.

I will try to write some more about the thing we built... and maybe try to get it working again! The event was immensely fun, and I'm so glad we went. We learned a huge amount about mining too, which was eye-opening. Massive thanks to Unearthed and to Barrick on all fronts. We'll be back!

Brad Bechtold of Cisco (left) presenting the Young Innovator award for under-25s to Team Mango.

The winners of the People's Choice Award, Team Decelerate.

The winners of the contest component of the event, BGC Engineering, with Ed Humphries of Barrick (left).


UPDATE  View all the results and submissions from the event.


Wish there was a hackathon just for geoscientists and subsurface engineers?
You're in luck! Join us in Paris for the Subsurface Hackathon — sponsored by Dell EMC, Total E&P, NVIDIA, Teradata, and Sandstone. The theme is machine learning, and registration is open. There's even a bootcamp for anyone who'd like to pick up some skills before the hack.

No secret codes: announcing the winners

The SEG / Agile / Enthought Machine Learning Contest ended on Tuesday at midnight UTC. We set readers of The Leading Edge the challenge of beating the lithology prediction in October's tutorial by Brendon Hall. Forty teams, mostly of 1 or 2 people, entered the contest, submitting several hundred entries between them. Deadlines are so interesting: it took a month to get the first entry, and I received 4 in the second month. Then I got 83 in the last twenty-four hours of the contest.

How it ended

Rank  Team                              F1      Algorithm      Language  Solution
1     LA_Team (Mosser, de la Fuente)    0.6388  Boosted trees  Python    Notebook
2     PA Team (PetroAnalytix)           0.6250  Boosted trees  Python    Notebook
3     ispl (Bestagini, Tuparo, Lipari)  0.6231  Boosted trees  Python    Notebook
4     esaTeam (Earth Analytics)         0.6225  Boosted trees  Python    Notebook
ml_contest_lukas_alfo.png

The winners are a pair of graduate petroleum engineers, Lukas Mosser (Imperial College, London) and Alfredo de la Fuente (Wolfram Research, Peru). Not coincidentally, they were also one of the more, er, energetic teams — it's safe to say that they explored a good deal of the solution space. They were also very much part of the discussion about the contest on GitHub.com and on the Software Underground Slack chat group, aka Swung (you're in there, right?).

I will be sending Raspberry Shakes to the winners, along with some other swag from Enthought and Agile. The second-place team will receive books from SEG (thank you SEG Book Mart!), and the third-place team will have to content themselves with swag. That team, led by Paolo Bestagini of the Politecnico di Milano, deserves special mention — their feature engineering approach was very influential, being used by most of the top-ranking teams.

Coincidentally Gram and I talked to Lukas on Undersampled Radio this week:

Back up a sec, what the heck is a machine learning contest?

To enter, a team had to predict the lithologies in two wells, given wireline logs and other data. They had complete data, including lithologies, in nine other wells — the 'training' data. Teams trained a wide variety of models — from simple nearest neighbour models and support vector machines, to sophisticated deep neural networks and random forests. These met with varying success, with accuracies ranging between about 0.4 and 0.65 (i.e., error rates from 60% to 35%). Here's one of the best realizations from the winning model:

One twist that made the contest especially interesting was that teams could not just submit their predictions — they had to submit the code that made the prediction, in the open, for all their fellow competitors to see. As a result, others were quickly able to adopt successful strategies, and I'm certain the final result was better than it would have been with secret code.

I spent most of yesterday scoring the top entries by generating 100 realizations of the models. This was suggested by the competitors themselves as a way to deal with model variance. This was made a little easier by the fact that all of the top-ranked teams used the same language — Python — and the same type of model: extreme gradient boosted trees. (It's possible that the homogeneity of the top entries was a negative consequence of the open format of the contest... or maybe it just worked better than anything else.)
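
The scoring procedure looks roughly like this sketch: fit the same model with different random seeds and aggregate the scores. Synthetic data and scikit-learn's gradient boosting stand in for the real logs and the XGBoost models the teams used:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the facies-prediction problem.
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

scores = []
for seed in range(10):  # the contest scoring used 100 realizations
    clf = GradientBoostingClassifier(subsample=0.8, random_state=seed)
    clf.fit(X_train, y_train)
    scores.append(f1_score(y_test, clf.predict(X_test)))

# Report the mean and spread across realizations, not a single run.
print(round(np.mean(scores), 3), "+/-", round(np.std(scores), 3))
```

Averaging over realizations rewards models that are robust to the choice of seed, which matters when entries are separated by fractions of a percent.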

What now?

There will be more like this. It will have something to do with seismic data. I hope I have something to announce soon.

I (or, preferably, someone else) could write an entire thesis on learnings from this contest. I am busy writing a short article for next month's Leading Edge, so if you're interested in reading more, stay tuned for that. And I'm sure there will be others.

If you took part in the contest, please leave a comment telling about your experience of it or, better yet, write a blog post somewhere and point us to it.

Le meilleur hackathon du monde

hackathon_2017_calendar.png

Hackathons are short bursts of creative energy, making things that may or may not turn out to be useful. In general, people work in small teams on new projects with no prior planning. The goal is to find a great idea, then manifest that idea as something that (barely) works, but might not do very much, then show it to other people.

Hackathons are intellectually and professionally invigorating. In my opinion, there's no better team-building, networking, or learning event.

The next event will be 10 & 11 June 2017, right before the EAGE Conference & Exhibition in Paris. I hope you can come.

The theme for this event will be machine learning. We had the same theme in New Orleans in 2015, but suffered a bit from a lack of data. This time we will have a collection of open datasets for participants to build on, and we'll prime hackers with a data-and-skills bootcamp on Friday 9 June. We did this once before in Calgary – it was a lot of fun.

Can you help?

It's my goal to get 52 participants to this edition of the event, but I'll need your help to get there. Please share this post with any friends or colleagues you think might be up for a weekend of messing about with geoscience data and ideas.

Besides participants, the other thing we always need is sponsors. So far we have three organizations sponsoring the event — Dell EMC is stepping up once again, thanks to the unstoppable David Holmes and his team. And we welcome Sandstone — thank you to Graham Ganssle, my Undersampled Radio co-host, who I did not coerce in any way.

sponsors_so_far.png

If your organization might be awesome enough to help make amazing things happen in our community, I'd love to hear from you. There's info for sponsors here.

If you're still unsure what a hackathon is, or what's so great about them, check out my November article in the Recorder (Hall 2015, CSEG Recorder, vol 40, no 9).

52 Things... Rock Physics

There's a new book in the 52 Things family! 

52 Things You Should Know About Rock Physics is out today, and available for purchase at Amazon.com. It will appear in their European stores in the next day or two, and in Canada... well, soon. If you can't wait for that, you can buy the book immediately direct from the printer by following this link.

The book mines the same vein as the previous volumes. In some ways, it's a volume 2 of the original 52 Things... Geophysics book, just a little bit more quantitative. It features a few of the same authors — Sven Treitel, Brian Russell, Rachel Newrick, Per Avseth, and Rob Simm — but most of the 46 authors are new to the project. Here are some of the first-timers' essays:

  • Ludmilla Adam, Why echoes fade.
  • Arthur Cheng, How to catch a shear wave.
  • Peter Duncan, Mapping fractures.
  • Paul Johnson, The astonishing case of non-linear elasticity.
  • Chris Liner, Negative Q.
  • Chris Skelt, Five questions to ask the petrophysicist.

It's our best collection of essays yet. We're very proud of the authors and the collection they've created. It stretches from childhood stories to linear algebra, and from the microscope to seismic data. There's no technical book like it. 

Supporting Geoscientists Without Borders

Purchasing the book will not only bring you profound insights into rock physics — there's more! Every sale sends $2 to Geoscientists Without Borders, the SEG charity that supports the humanitarian application of geoscience in places that need it. Read more about their important work.

It's been an extra big effort to get this book out. The project was completely derailed in 2015, as we — like everyone else — struggled with some existential questions. But we jumped back into it earlier this year, and Kara (the managing editor, and my wife) worked her magic. She loves working with the authors on proofs and so on, but she doesn't want to see any more equations for a while.

If you choose to buy the book, I hope you enjoy it. If you enjoy it, I hope you share it. If you want to share it with a lot of people, get in touch — we can help. Like the other books, the content is open access — so you are free to share and re-use it as you wish. 

The sound of the Software Underground

If you are a geoscientist or subsurface engineer, and you like computery things — in other words, if you read this blog — I have a treat for you. In fact, I have two! Don't eat them all at once.

Software Underground

Sometimes (usually) we need more diversity in our lives. Other times we just want a soul mate. Or at least someone friendly to ask about that weird new seismic attribute, where to find a Python library for seismic imaging, or how to spell Kirchhoff. Chat rooms are great for those occasions; Slack is where all the cool kids go to chat; and the Software Underground is the Slack chat room for you.

It's free to join, and everyone is welcome. There are over 130 of us in there right now — you probably know some of us already (apart from me, obvsly). Just go to http://swung.rocks/ to sign up, and we will welcome you at the door with your choice of beverage.

To give you a flavour of what goes on in there, here's a listing of the active channels:

  • #python — for people developing in Python
  • #sharp-rocks — for people developing in C# or .NET
  • #open-geoscience — for chat about open access content, open data, and open source software
  • #machinelearning — for those who are into artificial intelligence
  • #busdev — collaboration, subcontracting, and other business opportunities 
  • #general — chat about anything to do with geoscience and/or computers
  • #random — everything else

Undersampled Radio

If you have a long commute, or occasionally enjoy being trapped in an aeroplane while it flies around, you might have discovered the joy of audiobooks and podcasts. You've probably wished many times for a geosciencey sort of podcast, the kind where two ill-qualified buffoons interview hyper-intelligent mega-geoscientists about their exploits. I know I have.

Well, wish no more because Undersampled Radio is here! Well, here:

The show is hosted by New Orleans-based geophysicist Graham Ganssle and me. Don't worry, it's usually not just us — we talk to awesome guests like geophysicists Mika McKinnon and Maitri Erwin, geologist Chris Jackson, and geopressure guy Mark Tingay. The podcast is recorded live every week or three in Google Hangouts on Air — the link to that, and to show notes and everything else, is posted by Gram in the #undersampled Software Underground channel. You see? All these things are connected, albeit in a nonlinear, organic, highly improbable way. Pseudoconnection: the best kind of connection.

Indeed, there is another podcast pseudoconnected to Software Underground: the wonderful Don't Panic Geocast — hosted by John Leeman and Shannon Dulin — also has a channel: #dontpanic. Give their show a listen too! In fact, here's a show we recorded together!

Don't have an hour right now? OK, you asked for it, here's a clip from that show to get you started. It starts with John Leeman explaining what Fun Paper Friday is, and moves on to one of my regular rants about conferences...

Undersampled Radio snippet
Undersampled Radio and Don't Panic Geocast

In case you're wondering, neither of these projects is explicitly connected to Agile — I am just involved in both of them. I just wanted to clear up any confusion. Agile is not a podcast company, for the time being anyway.