The Scottish hackathon

On 16−18 November the UK Oil & Gas Authority (OGA) hosted its first hackathon, with Agile providing the format and technical support. This followed a week of training the OGA provided — again, through Agile — back in September. The theme for the hackathon was ‘machine learning’, and I’m pretty sure it was the first ever geoscience hackathon in the UK.

Thirty-seven digital geoscientists participated in the event at Robert Gordon University; most of them appear below. Many of them had not coded at all before the bootcamp on Friday, so a lot of people were well outside their comfort zones when we sat down on Saturday. Kudos to everyone!

The projects included the usual mix of seismic-based tasks, automated well log picking, a bit of natural language processing, some geospatial processing, and seals (of the mammalian variety). Here’s a rundown of what people got up to:


Counting seals on Scottish islands

Seal Team 6: Julien Moreau, James Mullins, Alex Schaaf, Balazs Kertesz, Hassan Tolba, Tom Buckley.

Project: Julien arrived with a cool dataset: over 6000 seals located on two large TIFFs images of Linga Holm, an island off Stronsay in the Orkneys. The challenge: locate the seals automatically. The team came up with a pipeline to generate HOG descriptors, train a support vector machine on about 20,000 labelled image tiles, then scan the large TIFFs to try to identify seals. Shown here is the output of one such scan, with a few false positive and false negatives. GitHub repo.

This project won the Most Impact award.

seals_test_image.png

Automatic classification of seismic sections

Team Seis Class: Jo Bagguley, Laura Bardsley, Chio Martinez, Peter Rowbotham, Mike Atkins, Niall Rowantree, James Beckwith.

Project: Can you tell if a section has been spectrally whitened? Or AGC’d? This team set out to attempt to teach a neural network the difference. As a first step, they reduced it to a binary classification problem, and showed 110 ‘final’ and 110 ‘raw’ lines from the OGA ESP 2D 2016 dataset to a convolutional neural net. The AI achieved an accuracy of 98% on this task. GitHub repro.

This project won recognition for a Job Well Done.


Why do get blocks relinquished?

Team Relinquishment Surprise: Tanya Knowles, Obiamaka Agbaneje, Kachalla Aliyuda, Daniel Camacho, David Wilkinson (not pictured).

Project: Recognizing the vast trove of latent information locked up in the several thousand reports submitted to the OGA. Despite focusing on relinquishment, they quickly discovered that most of the task is to cope with the heterogeneity of the dataset, but they did manage to extract term frequencies from the various Conclusions sections, and made an ArcGIS web app to map them.

relinquishment_team.jpg

Recognizing reflection styles on seismic

Team What’s My Seismic? Quentin Corlay, Tony Hallam, Ramy Abdallah, Zhihua Cui, Elia Gubbala, Amechi Halim.

Project: The team wanted to detect the presence of various seismic facies in a small segment of seismic data (with a view to later interpreting entire datasets). They quickly generated a training dataset, then explored three classifiers: XGBoost, Google’s AutoML, and a CNN. All of the methods gave reasonable results and were promising enough that the team vowed to continue investigating the problem. Project website. GitHub repo.

This project won the Best Execution award.

whats-my-seismic.png

Stretchy-squeezey well log correlation

Team Dynamic Depth Warping: Jacqueline Booth, Sarah Weihmann, Khaled Muhammad, Sadiq Sani, Rahman Mukras, Trent Piaralall, Julio Rodriguez.

Project: Making picks and correlations in wireline data is hard, partly because the stratigraphic signal changes spatially — thinning and thickening, and with missing or extra sections. To try to cope with this, the team applied a dynamic time (well, depth) warping algorithm to the logs, then looking for similar sections in adjacent wells. The image shows a target GR log (left) with the 5 most similar sections. Two, maybe four, of them seem reasonable. Next the team planned to incorporate more logs, and attach probabilities to the correlations. Early results looked promising. GitHub repo.


Making lithostrat picks

Team Marker Maker: Nick Hayward, Frédéric Ramon, Can Yang, Peter Crafts, Malcolm Gall

Project: The team took on the task of sorting out lithostratigraphic well tops in a mature basin. But there are speedbumps on the road to glory, e.g. recognizing which picks are lithological (as opposed to chronological), and which pick names are equivalent. The team spent time on various subproblems, but there’s a long road ahead.

This project won recognition for a Job Well Done.

marker-maker.jpg

Alongside these projects, Rob and I floated around trying to help, and James Beckwith hacked on a cool project of his own for a while — Paint By Seismic, a look at unsupervised classification on seismic sections. In between generating attributes and clustering, he somehow managed to help and mentor most of the other teams — thanks James!

Thank you!

Thank you to The OGA for these events, and in particular to Jo Bagguley, whose organizational skills I much appreciated over the last few weeks (as my own skills gradually fell apart). The OGA’s own Nick Richardson, the OGTC’s Gillian White, and Robert Gordon Universty’s Eyad Elyan acted as judges.

These organizations contributed to the success of these events — please say Thank You to them when you can!

oga-sponsors.png

I’ll leave you with some more photos from the event. Enjoy!

Machine learning goes mainstream

At our first machine-learning-themed hackathon, in New Orleans in 2015, we had fifteen hackers. TImes were hard in the industry. Few were willing or able to compe out and play. Well, it’s now clear that times have changed! After two epic ML hacks last year (in Paris and Houston), at which we hosted about 115 scientists, it’s clear this year is continuing the trend. Indeed, by the end of 2018 we expect to have welcomed at least 240 more digital scientists to hackathons in the US and Europe.

Conclusion: something remarkable is happening in our field.

The FORCE hackathon

Last Tuesday and Wednesday, Agile co-organized the FORCE Machine Learning Hackathon in Stavanger, Norway. FORCE is a cross-industry geoscience organization, coordinating meetings and research in subsurface. The event preceeded a 1-day symposium on the same theme: machine learning in geoscience. And it was spectacular.

Get a flavour of the spectacularness in Alessandro Amato’s beautiful photographs:

Fifty geoscientists and engineers spent two days at the Norwegian Petroleum Directorate (NPD) in Stavanger. Our hosts were welcoming, accommodating, and generous with the waffles. As usual, we gently nudged the participants into teams, and encouraged them to define projects and find data to work on. It always amazes me how smoothly this potentially daunting task goes; I think this says something about the purposefulness and resourcefulness of our community.

Here’s a quick run-down of the projects:

  • Biostrat! Geological ages from species counts.

  • Lost in 4D Space. Pressure drawdown prediction.

  • Virtual Metering. Predicting wellhead pressure in real time.

  • 300 Wells. Extracting shows and uncertainty from well reports.

  • AVO ML. Unsupervised machine learning for more geological AVO.

  • Core Images. Grain size and lithology from core photos.

  • 4D Layers. Classification engine for 4D seismic data.

  • Gully Attack. Strat trap picking with deep reinforcement learning.

  • sketch2seis. Turning geological cartoons into seismic with pix2pix.

I will do a complete review of the projects in the coming few days, but notice the diversity here. Five of the projects straddle geological topics, and five are geophysical. Two or three involve petroleum engineering issues, while two or three move into sed/strat. We saw natural language processing. We saw random forests. We saw GANs, VAEs, and deep reinforcement learning. In terms of input data, we saw core photos, PDF reports, synthetic seismograms, real-time production data, and hastily assembled label sets. In short — we saw everything.

Takk skal du ha

Many thanks to everyone that helped the event come together:

  • Peter Bormann, the mastermind behind the symposium, was instrumental in making the hackathon happen.

  • Grete Block Vargle (AkerBP) and Pernille Hammernes (Equinor) kept everyone organized and inspired.

  • Tone Helene Mydland (NPD) and Soelvi Amundrud (NPD) made sure everything was logistically honed.

  • Eva Halland (NPD) supported the event throughout and helped with the judging.

  • Alessandro Amato del Monte (Eni) took some fantastic photos — as seen in this post.

  • Diego Castaneda and Rob Leckenby helped me on the Agile side of things, and helped several teams.

And a huge thank you to the sponsors of the event — too many to name, but here they all are:

all_small.png

There’s more to come!

If you’re reading this thinking, “I’d love to go to a geoscience hackathon”, and you happen to live in or near the UK, you’re in luck! There are two machine learning geoscience hackathons coming up this fall:

Don’t miss out! Get signed up and we’ll see you there.

Results from the AAPG Machine Learning Unsession

Click here to visit the Google Doc write-up

Click here to visit the Google Doc write-up

Back in May, I co-hosted a different kind of conference session — an 'unsession' — at the AAPG Annual Conference and Exhibition in Salt Lake City, Utah. It was successful in achieving its main goal, which was to show the geoscience community and AAPG organizers a new way of collaborating, networking, and producing tangible outcomes from conference sessions.

It also succeeded in drawing out hundreds of ideas and questions around machine learning in geoscience. We have now combed over what the 120 people (roughly) produced on that afternoon, written it up in a Google Doc (right), and present some highlights right here in this post.

Click here to visit the Flickr photo album.

Click here to visit the Flickr photo album.

The unsession had three phases:

  1. Exploring current and future skills for geoscientists.

  2. Asking about the big questions in machine learning in geoscience.

  3. Digging into some of those questions.

Let's look at each one in turn.


skills_blog.jpg

Current and future skills

As an icebreaker, we asked everyone to list three skills they have that set them apart from others in their teams or organizations — their superpowers, if you will. They wrote these on green Post-It notes. We also asked for three more skills they didn't have today, but wanted to acquire in the next decade or so. These went on orange Post-Its. We were especially interested in those skills that felt intimidating or urgent. The 8 or 10 people at each table then shared these with each other, by way of introducing themselves.

The skills are listed in this Google Sheets document.

Unsurprisingly, the most common 'skills I have' were around geoscience: seismic interpretation, seismic analysis, stratigraphy, engineering, modeling, sedimentology, petrophysics, and programming. And computational methods dominated the 'skills I want' category: machine learning, Python, coding or programming, deep learning, statistics, and mathematics.

We followed this up with a more general question — How would you rate the industry's preparedness for this picture of the future, as implied by the skill gap we've identified?. People could substitute 'industry' for whatever similar scale institution felt meaningful to them. As shown (right), this resulted in a bimodal distribution: apparently there are two ways to think about the future of applied geoscience — this may merit more investigation with a more thorough survey.

Get the histogram data.

preparedness_histogram.png

Big questions in ML

After the icebreaker, we asked the tables to respond to a big question:

What are the most pressing questions in applied geoscience that can probably be tackled with machine learning?

We realized that this sounds a bit 'hammer looking for a nail', but justified asking the question this way by drawing an anology with other important new tools of the past — well logging, or 3D seismic, or sequence stratigrapghy. The point is that we have this powerful new (to us) set of tools; what are we going to look at first? At this point, we wanted people to brainstorm, without applying constraints like time or money.

This yielded approximately 280 ideas, all documented in the Google Sheet. Once the problems had been captured, the tables rotated so that each team walked to a neighboring table, leaving all their problems behind... and adopting new ones. We then asked them to score the new problems on two axes: scope (local vs global problems) and tractability (easy vs hard problems). This provided the basis for each table to choose one problem to take to the room for voting (each person had 9 votes to cast). This filtering process resulted in the following list:

  1. How do we communicate error and uncertainty when using machine learning models and solutions? 85 votes.

  2. How do we account for data integration, integrity, and provenance in our models? 78 votes.

  3. How do we revamp the geoscience curriculum for future geoscientists? 71 votes.

  4. What does guided, searchable, legacy data integration look like? 68 votes.

  5. How can machine learning improve seismic data quality, or provide assistive technology on poor data? 65 votes.

  6. How does the interpretability of machine learning model predictions affect their acceptance? 54 votes.

  7. How do we train a model to assign value to prospects? 51 votes.

  8. How do we teach artificial intelligences foundational geology? 45 votes.

  9. How can we implement automatic core description? 42 votes.

  10. How can we contain bad uses of AI? 40 votes.

  11. Is self-steering well drilling possible? 21 votes.

I am paraphrasing most of those, but you can read the originals in the Google Sheet data harvest.


Exploring the questions

In the final stage of the afternoon, we took the top 6 questions from the list above, and dug into them a little deeper. Tables picked their way through our Solution Sketchpads — especially updated for machine learning problems — to help them navigate the problems. Clearly, these questions were too enormous to make much progress in the hour or so left in the day, but the point here was to sound out some ideas, identify some possible actions, and connect with others interested in working on the problem.

One of the solution sketches is shown here (right), for the Revamp the geoscience curriculum problem. They discussed the problem animatedly for an hour.

This team included — among others — an academic geostatistician, an industry geostatistician, a PhD student, a DOE geophysicist, an SEC geologist, and a young machine learning brainbox. Amazingly, this kind of diversity was typical of the tables.

See the rest of the solution sketches in Flickr.


That's it! Many thanks to Evan Bianco for the labour of capturing and digitizing the data from the event. Thanks also to AAPG for the great photos, and for granting them an open license. And thank you to my co-chairs Brendon Hall and Yan Zaretskiy of Enthought, and all the other folks who helped make the event happen — see the Productive chaos post for details.

To dig deeper, look for the complete write up in Google Docs, and the photos in Flickr


calendar.png

Just a reminder... if it's Python and machine learning skills you want, we're running a Summer School in downtown Houston the week of 13 August. Come along and get your hands on the latest in geocomputing methods. Suitable for beginners or intermediate programmers.

Don't miss out! Find out more or register now.

Woo yeah perfect: hacking in Salt Lake City

Thirty geoscientist-coders swarmed into Salt Lake City this past weekend to hack at Church & State, a co-working space in a converted church. There, we spent two days appealing to the almighty power of machine learning.

Nine teams worked on the usual rich variety of projects around the theme. Projects included AIs that pick unconformities, natural language processing to describe stratigraphy, and designing an open data platform in service of machine learning. 

I'll do a run-down of the projects soon, but if you can't wait until then for my summary, you can watch the demos here; the first presentation starts at the 38 minute mark of the video. And you can check out some pictures from the event:

Pictures can say a lot but a few simple words, chosen at the right time, can speak volumes too. Shortly before we launched the demos, we asked the participants to choose words that best described how they were feeling. Here's what we got:

word_cloud_menti_SLC.png

Each participant was able to submit three responses, and although we aren't able to tell who said what, we were able to scrape the data and look at each person's chosen triplet of words. A couple of noteworthy ones were: educated, naptimeinspired and the expressive woo, yeah, perfect. But my personal favorite, by far, has to be the combination of: dead, defeated, inspired.

The creative process can be a rollercoaster of emotions. It's not easy. It's not always comfortable. Things don't always work out. But that's entirely ok. Indeed, facing up to this discomfort, as individuals and as organizations, is a necesary step in the path to digital transformation.

Enough Zen! To all the participants who put in the hard work this weekend, and to our wonderful sponsors who brought all kinds of support, I thank you and I salute you.

sponsors.png

An invitation to start something

Most sessions at your average conference are about results — the conclusions and insights from completed research projects. But at AAPG this year, there's another kind of session, about beginnings. The 'Unsession' is back!

   Machine Learning Unsession
   Room 251 B/C, 1:15 pm, Wednesday 23 May

The topic is machine learning in geoscience. My hope is that there's a lot of emphasis on geological problems, especially in stratigraphy. But we don't know exactly where it will go, because the participants themselves will determine the topic and direction of the session.

Importantly, most of the session will not involve technical discussion. It's not a computational geology session. It's a session for everyone — we absolutely need input from anyone who's interested in how computers can help us do geoscience.

What to expect

Echoing our previous unconference-style sessions, here's the vibe my co-hosts (Brendon Hall and Yan Zaretskiy of Enthought) and I are going for:

  • Conferences are too one-way, too passive. We want more action, more tangible outcomes.
  • We want open, honest, inclusive conversations about our science, and our technical challenges. Bring your most courageous, opinionated, candid self. The stuff you’re scared to mention, or you’d normally only talk about over a beer? Bring that.
  • Listen with an open mind. The minute you think you’re right, you’ve checked out of the conversation.
  • Whoever shows up — they are the right people. (This is a rule of Open Space Technology.)
  • What happens is the only thing that could have happened. (This is a rule of Open Space Technology.)
  • There is no finish line; when it's over, it's over.
  • What we are doing is not definitive. It's just a thing that we're doing.

The session is an experiment. Failure is most definitely an option, just the least desirable one. Conversely, perfection is the least likely outcome.

If you're going to AAPG this year, I hope you'll come along to this conversation. Bring a friend!


Here's a reminder of the very first Unsession that Evan and I facilitated, way back in 2013. Argh, that's 5 years ago...

Looking ahead to SEG

SEGAM-logo-2017.jpg

The SEG Annual Meeting is coming up. Next week sees the festival of geophysics return to the global energy capital, shaken and damp but undefeated after its recent battle with Hurricane Harvey. Even though Agile will not be at the meeting this year, I wanted to point out some highlights of the week.

The Annual Meeting

The meeting will be big, as usual: 108 talk sessions, and 50 poster and e-presentation sessions. I have no idea how many presentations we're talking about but suffice to say that there's a lot. Naturally, there's a machine learning session, with the following talks:

The Geophysics Hackathon

Even though we're not at the conference, we are in Houston this weekend — for the latest edition of the Geophysics Hackathon! The focus was set to be firmly on 'machine learning', but after the hurricane, we added the theme of 'disaster recovery and mitigation'. People are completely free to choose whatever project they'd like to work on; we'll be ready to help and advise on both topics. We also have some cool gear to play with: a Dell C4130 with 4 x NVIDIA P100s, NVIDIA Jetson TX1s, Amazon Echo Dots, and a Raspberry Shake. Many, many thanks to Dell EMC and Pioneer Natural Resources and all our other sponsors:

sponsors_tight.png

If you're one of the 70 or so people coming to this event, I'm looking forward to seeing you there... if you're not, then I'm looking forward to telling you all about it next week.


Petrel User Group

icons-petrel.png

Jacob Foshee and Durwella are hosting a Petrel User Group meetup at The Dogwood, which is in midtown (not far from downtown). If you're a user of Petrel — power user or beginner, it doesn't matter — and you're interested in making the most of technology, it'd be good to see you there. Apart from anything else, you'll get to meet Jacob, who is one of those people with technology superpowers that you never know when you might need.


Rock Physics Reception

Tuesday If you've never been to the famous Rock Physics Reception, then you're missing out. It's your best shot at bumping into the luminaries of rock physics — Colin Sayers, Stefan Gelinsky, Per Avseth, Marco Perez, Bill Goodway, Tad Smith — you know the sort of thing. If the first thing you think about when you wake up in the morning is Lamé's second parameter, RSVP right now. Hurry: there are only a handful of spots left.


There's more! Don't miss:

  • The Women's Network Breakfast on Wednesday.
  • The Wiki Committee meeting on Wednesday, 8:00 am, Hilton Room 344B.
  • If you're an SEG member, you can go to any committee meeting you like! Find one that matches your interests.

If you know of any other events, please drop them in the comments!

 

Newsflash: the Geophysics Hackathon is back!

Mark your calendar: 22–24 September (right before SEG), at a downtown Houston location to be confirmed.

We're filling the room with 50 geoscientists of all stripes. Interpreters, programmers, students, professionals... everyone is welcome. The plan: to imagine, design, and prototype some new tools in geophysics — all around the theme of machine learning. It's going to be awesome. 

The schedule: we'll get started at 6 pm on Friday 22 September, and go till 10 pm. Then we pick it up again on Saturday morning, and go till 6 pm, and the same again on Sunday. Teams will present a demo to everyone on Sunday after 3 pm. There will be a few prizes, a few drinks, lots of food, and a lot of new geophysical tools and widgets. 

If you want to know more about what a hackathon is, read my summary from the last one: Le grand hack! Or check out the project round-up posts, part 1 and part 2.

If you're not sure you belong, I promise that you do. One of the prize-winning teams in Paris had no coding experience! And every team needs help with brainstorming, design, testing, and presentation. Absolutely anyone can contribute, and absolutely everyone will learn something.

If you have some like-minded friends, bring them along! We need teams of 5 people, so if there are already 5 of you, you can start coding as soon as you walk in the door!

If you can't be there yourself, please share this post with someone you know.

When you're ready, click here to buy a ticket.


Thank you as always to our sponsors so far: Dell EMC and Amazon AWS. If you'd like to sponsor the Houston event, please check this page out, or just get in touch.

Subsurface Hackathon project round-up, part 2

Following on from Part 1 yesterday, here are the other seven team projects from the hackathon:


Interactive visualization of Water Table heights over many years.

Interactive visualization of Water Table heights over many years.

Water, water everywhere

Water Underground: Martin Bentley (NMMU), Joseph Barraud (Rolls Royce), Rabah Cheknoun (UPPA)

The team built readers for the groundwater data available from dinoloket.nl, both the groundwater levels and the hydrochemistry. They clustered the data by aggregating by month and then looking for similarities in levels in the boreholes and built an open Jupyter notebook.


  

 

 

Seismic from noise

OBSNoise: Fernando Villanueva-Robles (IPGP), Yann Huet (Setec-Lerm), Ngoc Huyen Luu (Ecole Polytechnique), Dorian Bagur (Telecom ParisTech), Jonathan Grandjean (Independent)

The OBSNoise project investigated the application of machine learning to coherently stack ambient noise records collected from ocean bottom seismic (OBS) arrays in order to extract reservoir information. The team's results from synthetic data showed promise. If fully developed, this technology could be a virtually real-time monitoring system of dynamic reservoir properties.


The Killers. Killing It. 

The Killers. Killing It. 

Global geochemical data analytics

The Killers: Alexandre Sache, Violaine Delahaye, Karl Sache (all from Institute Polytechnique UniLaSalle), Côme Arvis, Guillaume Ligner (Ecole Polytechnique)

Two geoscience undergrads and one automotive design student (I know right?) from UniLaSalle hooked up with two data science students from Ecole Polytechnique to interogate the massive GeoRoc database using some clever data analytics tricks and did some novel many-dimensional geochemical classifications.


Team LogFix.

Team LogFix.

Fixing broken well data

LogFix: Guillaume Coffin (Telecom Evolution), Florian Napierala (EISTI), Camille Gimenez (Université Paris-Saclay), Tristan Siméon (Université de Montpellier), Robert Leckenby (Independent)

A truly pristine, calibrated, and corrected petrophysical data is so rare it has a sort of mythical status. Team LogFix used machine learning to identify bad-data zones, repair, QC, and fill-in missing sections. They got an impressive way with the problem, using a dataset from the Athabasca of Canada.


Between the hand-drawn lines

Automagical: Louis Poirier (Independent), Maggie Baber (Independent), Georg Semmler (GiGa infosystems), Björn Wieczoreck (GiGa infosystems), Jonas Kopcsek (GiGa infosystems)

Automagical_Paris_Hackathon.png

You don't need to believe in magic. Team Automagical used machine learning to create 3D geological models from 2D cross-sections sections. They trained a predictive model using a collection of standardized hand-drawn cross-sections from human geoscientists. The model learns how to propagate rocks throughout a 3D scene. Their goal is to be able to generate cross-sections along any direction through the model. The AI learned how to do geologically realistic interpolation on simple structures. What kind of geologic complexity is possible with more input from more cross-sections?


The document on the left contains a log display with a lithology column. It's a 'hit'. The one on the right has no lithlogies and is a 'miss'. 

The document on the left contains a log display with a lithology column. It's a 'hit'. The one on the right has no lithlogies and is a 'miss'.

 

There's rocks in them hills! Hills of paper, that is

Logs on the Rocks: Daniel Stanton (Leeds University), Jack Woolam (Leeds University), Adam Goddard (Leeds University), Henri Blondelle (AgileDD)

If the oil and gas industry is to get more efficient, we better get really good at finding lithology and fluid information in the mountains of paper we've collectively built. Team Logs on the Rocks used CNNs to identify graphical depictions of rock types in a sea of unstructured PDFs and TIFFs. They introduced themselves as a team of non-coders, but these guys were were doing cloud computing on AWS and using NVIDIA's GPUs before the end of the weekend. 


Robot vision for seismic interpretation

It's not our FAULT! Claire Birnie (Leeds University), Carlos Alberto da Costa Filho (Edinburgh University), Matteo Ravasi (Statoil), Filippo Broggini (ETHZ), Gijs Straathof (SGS)

Geologic feature recognition using machine learning. The goal was to assist seismic interpreters in detecting geologic features – faults, folds, traps, etc. – in seismic data . They used Haar cascade classifiers, which are routinely used for identifying faces or kittens or beer bottles in photographs and video streams, specially trained to work on seismic data. They used the awesome OpenCV library to build this technology. At the time of writing, their website appears to be maxed out for the month, so if you're dying to see it, leave them a comment on LinkedIn asking them increase their capacity. And in the meantime, you can check out their project's repo on GitHub.

Kudos for the open source repo, team!


It was thrilling to see such a large range of data and applications. Digital thin-sections, ground water maps, seismic data, well logs, cross-sections, information in unstructured documents, and so on. Thanks to each and every individual that showed up with their expertise and enthusiasm. We're all better off because of it.

A quick reminder that our sponsors are awesome! Please high-five them next time you meet them...

Machine learning meets seismic interpretation

Agile has been reverberating inside the machine learning echo chamber this past week at EAGE. The hackathon's theme was machine learning, Monday's workshop was all about machine learning. And Matt was also supposed to be co-chairing the session on Applications of machine learning for seismic interpretation with Victor Aare of Schlumberger, but thanks to a power-cut and subsequent rescheduling, he found himself double-booked so, lucky me, he invited me to sit in his stead. Here are my highlights, from the best seat in the house.

Before I begin, I must mention the ambivalence I feel towards the fact that 5 of the 7 talks featured the open-access F3 dataset. A round of applause is certainly due to dGB Earth Sciences for their long time stewardship of open data. On the other hand, in the sardonic words of my co-chair Victor Aarre, it would have been quite valid if the session was renamed The F3 machine learning session. Is it really the only quality attribute research dataset our industry can muster? Let's do better.

Using seismic texture attributes for salt classification

Ghassan AlRegib ruled the stage throughout the session with not one, not two, but three great talks on behalf of himself and his grad students at Georgia Institute of Technology (rather than being a show of bravado, this was a result of problems with visas). He showed some exciting developments in shallow learning methods for predicting facies in seismic data. In addition to GLCM attributes, he also introduced a couple of new (to me anyway) attributes for salt classification. Namely, textural gradient and a thing he called seismic saliency, a metric modeled after the human visual system describing the 'reaction' between relative objects in a 3D scene. 

Twelve Seismic attributes used for multi-attribute salt-boundary classification. (a) is RMS Amplitude, (B) to (M) are TEXTURAL attributes. See abstract for details. This figure is copyright of Ghassan AlRegib and licensed CC-BY-SA by virtue of being generated from the F3 dataset of dGB and TNO.

Ghassan also won the speakers' lottery, in a way. Due to the previous day's power outage and subsequent reshuffle, the next speaker in the schedule was a no-show. As a result, Ghassan had an extra 20 minutes to answer questions. Now for most speakers that would be a public-speaking nightmare, but Ghassan hosted the onslaught of inquiring minds beautifully. If we hadn't had to move on to the next next talk, I'm sure he could have entertained questions all afternoon. I find it fascinating how unpredictable events like power outages can actually create the conditions for really effective engagement. 

Salt classification without using attributes (using deep learning)

Matt reported on Anders Waldeland's work a year ago, and it was interesting to see how his research has progressed, as he nears the completion of his thesis. 

Anders successfully demonstrated how convolutional neural networks (CNNs) can classify salt bodies in seismic datasets. So, is this a big deal? I think it is. Indeed, Anders's work seems like a breakthough in seismic interpretation, at least of salt bodies. To be clear, I don't think this means that it is time for seismic interpreters to pack up and go home. But maybe we can start looking forward to spending our time doing less tedious things than picking complex salt bodies.  

One slice of a 3d seismic volume with two CLASS LABELS: Salt (red) and Not SALT (GREEN). This is the training data. On the right: Extracted 3D salt body in the same dataset, coloured by elevation. Copyright of A Waldeland, used with permission.

One slice of a 3d seismic volume with two CLASS LABELS: Salt (red) and Not SALT (GREEN). This is the training data. On the right: Extracted 3D salt body in the same dataset, coloured by elevation. Copyright of A Waldeland, used with permission.

He trained a CNN on one manually labeled slice of a 3D cube and used the network to automatically classify the full 3D salt body (on the right in the figure). Conventional algorithms for salt picking, such as that used by AlRegib (see above), typically rely on seismic attributes to define a feature space. This requires professional insight and judgment, and is prone to error and bias. Nicolas Audebert mentioned the same shortcoming in his talk in the workshop Matt wrote about last week. In contrast, the CNN algorithm works directly on the seismic data, learning the most discriminative filters on its own, no attributes needed

Intuition training

Machine learning isn't just useful for computing in the inverse direction such as with inversion, seismic interpretation, and so on. Johannes Amtmann showed us how machine learning can be useful for ranking the performance of different clustering methods using forward models. It was exciting to see: we need to get back into the habit of forward modeling, each and every one of us. Interpreters build synthetics to hone their seismic intuition. It's time to get insanely good at building forward models for machines, to help them hone theirs. 

There were so many fascinating problems being worked on in this session. It was one of the best half-day sessions of technical content I've ever witnessed at a subsurface conference. Thanks and well done to everyone who presented.


Machine learning and analytics in geoscience

We're at EAGE in Paris. I'm sitting in a corner of the exhibition because the power is out in the main hall, so all the talks for the afternoon have been postponed. The poor EAGE team must be beside themselves, I feel for them. (Note to future event organizers: white boards!)

Yesterday Diego, Evan, and I — along with lots of hackathon participants — were at the Data Science for Geosciences workshop, an all-day machine learning fest. The session was chaired by Cyril Agut (Total), Marianne Cuif-Sjostrand (Total), Florence Delprat-Jannaud (IFPEN), and Noalwenn Dubos-Sallée (IFPEN), and they had assembled a good programme, with quite a bit of variety.

Michel Lutz, Group Data Officer at Total, and adjunct at École des Mines de Saint-Étienne, gave a talk entitled, Data science & application to geosciences: an introduction. It was high-level but thoughtful, and such glimpses into large companies are always interesting. The company seems to have a mature data science strategy, and a well-developed technology stack. Henri Blondelle (AgileDD) asked about open data at the end, and Michel somewhat sidestepped on specifics, but at least conceded that the company could do more in open source code, if not data.

Infrastructure, big data, and IoT

Next we heard a set of talks about the infrastructure aspect of big (really big) data.

Alan Smith of Luchelan told the group about some negative experiences with Hadoop and seismic data (though it didn't seem to me that his problems were insoluble since I know of several projects that use it), and the realization that sometimes you just need fast infrastructure and custom software.

Hadi Jamali-Rad of Shell followed with an IoT story from the field. He had deployed a large number of wireless seismic sensors around a village in Holland, then tested various aspects of the communication system to answer questions like, what's the packet loss rate when you collect data from the nodes? What about from a balloon stationed over the site?

Duncan Irving of Teradata asked, Why aren't we [in geoscience] doing live analytics on 100PB of live data like eBay? His hypothesis is that IT organizations in oil and gas failed to keep up with key developments in data analytics, so now there's a crisis of sorts and we need to change how we handle our processes and culture around big data. 

Machine learning

We shifted gears a bit after lunch. I started with a characteristically meta talk about how I think our community can help ensure that our research and practice in this domain leads to good places as soon as possible. I'll record it and post it soon.

Nicolas Audebert of ONERA/IRISA presented a nice application of a 3D convolutional neural network (CNN) to the segmentation and classification of hyperspectral aerial photography. His images have between about 100 and 400 channels, and he finds that CNNs reduce error rates by up to about 50% (compared to an SVM) on noisy or complex images. 

Henri Blondelle of Agile Data Decisions talked about his experience of the CDA's unstructured data challenge of 2016. About 80% of the dataset is unstructured (e.g. folders of PDFs and TIFFs), and Henri's vision is to transform 80% of that into structured data, using tools like AgileDD's IQC to do OCR and heuristic labeling. 

Irina Emelyanova of CSIRO provided another case study: unsupervised e-facies prediction using various types of clustering, from K-means to some interesting variants of self-organizing maps. It was refreshing to see someone revealing a lot of the details of their implementation.

Jan Limbeck, a research scientist at Shell wrapped up the session with an overview of Shell's activities around big data and machine learning, as they prepare for exabytes. He mentioned the Mauricio Araya-Polo et al. paper on deep learning in seismic shot gathers in the special March issue of The Leading Edge — clearly it's easiest to talk about things they've already published. He also listed a lot of Shell's machine learning projects (frac optimization, knowledge graphs, reservoir simulation, etc), but there's no way to know what state they are in or what their chances of success are. 

As well as all the 9 talks, there were 13 posters, about a third of which were on infrastructure stuff, with the rest providing more case studies. Unfortunately, I didn't get the chance to look at them in any detail, but I appreciated the organizers making time for discussion around the posters. If they'd also allowed more physical space for the discussion it could have been awesome.

Analytics!

After hearing about Mentimeter from Chris Jackson I took the opportunity to try it out on the audience. Here are the results, I think they are fairly self-explanatory... 

I also threw in the mindmap I drew at the end as a sort of summary. The vertical axis represents something like'abstraction' or 'time' (in a workflow sense) and I think each layer depends somewhat on those beneath it. It probably makes sense to no-one but me.

Breakout!

It seems clear that 2017 is the breakout year for machine learning in petroleum geoscience, and in petroleum in general. If your company or institution has not yet gone beyond "watching" or "thinking about" data science and machine learning, then it is falling behind by a little more every day, and it has been for at least a year. Now's the time to choose if you want to be part of what happens next, or a victim of it.