Looking forward to EAGE

Evan, Diego and I are flying to Paris today for the EAGE Conference and Exhibition. It's exciting. We're excited. 

But the excitement starts before the conference. The Subsurface Hackathon is this weekend!

My diary

Even the hackathon excitement starts before the weekend, because tomorrow, Friday, we're running the hacker's bootcamp — a sort of short course appetizer for the hackathon. We have about 25 geoscientists coming to the Booster TOTAL (an event space at TOTAL's La Défense offices) to get some hands-on practice with Python and the latest in machine learning tools. It's especially exciting because we'll also have engineers from NVIDIA on hand to help with the coaching. The idea is to help people hit the ground running when the hackathon starts on Saturday.

After that, on Saturday and Sunday, it's the hackathon itself. We have no fewer than 60 geoscientists and engineers registered for this breakout event. They're coming to the Booster to work on a wide array of machine learning ideas for the subsurface. It's going to be epic. You can read all about what happens here next week, I promise. 

Then on Monday it's the Data Science for Geoscience workshop, at which I'm giving a keynote. Since I'm far from being an expert, I'm using it as a chance to get people jazzed about helping make the coming AI revolution in geoscience a positive experience. I'm really looking forward to it.

The conference itself starts on Tuesday. In the afternoon I'm co-chairing a session on machine learning (have you spotted the theme yet?) in seismic interpretation, along with Victor Aarre of Schlumberger. It will be awesome to see what kind of progress our community is making in this field — it's fun to imagine what seismic interpretation might be like in a few years. There are so many fascinating problems to work on! Here are the talks in that session:

On Wednesday we'll be taking in some more talks and posters, then in the afternoon I'm reprising my keynote talk at IFPEN, a subsurface research institute in the Bois de Boulogne. I've never been there before, although I have met a few IFP scientists over the years. I'm looking forward to it very much. 

It all ends for us on Thursday. Evan and Diego fly home and I'm off to Cambridge (the old one in the fens, not the one in Massachusetts) for a few days with family (and bookshops). Until then, expect much blogging!


Going to EAGE?

If you're reading this and would like to meet up with us at Agile or some of the Software Underground crowd — the friendliest bunch of coding geoscientists you could hope for — let's plan to meet at the end of the workshop, at the workshop location. Look for the Software Underground shirts.

What should national data repositories do?

Right now there's a conference happening in Stavanger, Norway: National Data Repository 2017. My friend David Holmes of Dell EMC, a long-time supporter of Agile's recent hackathons and a general geocomputing infrastructure superhero, is there. He's giving a talk, I think, and chairing at least one session. He asked a question today on Software Underground:

If anyone has any thoughts or ideas as to what the regulators should be doing differently now is a good time to speak up :)

My response

For me it's about raising their aspirations. Collectively, they are sitting on one of the most valuable — or invaluable — datasets in the world, comparable to Hubble, or the LHC. Better yet, the data are (in most cases) already open and they actually want to share it. And the community (us) is better tooled than ever, and perhaps also more motivated, to get cracking. So the possibility is there to see a revolution in subsurface science and exploration (in the broadest sense of the word) and my challenge to them is:

Can they now create the conditions for this revolution in earth science?

Some things I think they can do right now:

  • Properly fund the development of an open data platform. I'll expand on this topic below.
  • Don't get too twisted off on formats (go primitive), platforms (pick one), licenses (go generic), and other busy work that committees love to fret over. Articulate some principles (e.g. public first, open source, small footprint, no lock-in, componentize, no single provider, let-users-choose, or what have you), and stay agile. 
  • Lobby NOCs and IOCs hard to embrace integrated and high-quality open data as an advantage that society, as well as industry, can share in. It's an important piece in the challenge we face to modernize the industry. Not so that it can survive for survival's sake, but so that it can serve society for as long as it's needed. 
  • Get involved in the community: open up their processes and collaborate a lot more with the technical societies — like show up and talk about their programs. (How did I not hear about the CDA's unstructured data challenge — a subject I'm very much into — till it was over? How many other potential participants just didn't know about it?)

An open data platform

The key piece here is the open data platform. Here are the features I'd like to see in such a platform:

  • Optimized for users, not the data provider, hosting provider, or system administrator.
  • Clear rights: well-known, documented, obvious, clearly expressed open licenses for re-use.
  • Meaningful levels of access that are free of charge for most users and most use cases.
  • Access for humans (a nice mappy web interface) with no awkward or slow registration processes.
  • Access for machines (a nice API, perhaps even a couple of libraries expressing it); see the sketch after this list.
  • Tools for query, discovery, and retrieval; ideally with user feedback paths ('more like this, less like that').
  • Ways to report, or even fix, problems in the data. This relieves you of "the data's not ready" procrastination.
  • Good documentation of all of this, ideally in a wiki or something that people can improve.
  • Support for a community of users and developers that want to do things with the data.
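
To make 'access for machines' concrete, here is a minimal sketch of what querying such a platform could look like from Python. The endpoint, parameters, and response fields are entirely hypothetical, invented purely to illustrate the idea: a documented, open HTTP API puts the data a few lines of code away.

```python
# Hypothetical example only: the URL, parameters, and JSON fields are invented
# to show what machine access to an open data platform could feel like.
import requests

resp = requests.get(
    "https://data.example-repository.org/api/v1/surveys",   # not a real endpoint
    params={"type": "seismic", "bbox": "1.5,58.0,3.0,59.5", "format": "json"},
)
resp.raise_for_status()

for survey in resp.json()["results"]:
    print(survey["name"], survey["license"], survey["download_url"])
```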

Building this platform is not trivial. There's massive file storage to sort out, a database back end, a web front end, licensing, and so on. Then there's the community of developers and users to engage and support. It will take years, and never be finished. It sounds hard... but people are doing it. Prototypes for seismic data exist already, and there are countless models in other verticals (just check out the Registry of Research Data Repositories, or look at the list on PLOS). 

The contract to build data infrastructure is often awarded to the likes of Schlumberger, Halliburton or CGG. In theory, these companies have the engineering depth to pull it off (though this too is debatable, especially in today's web-first, native-never world). But they completely lack the culture required: there's no corporate understanding of what 'open' means. So the model is broken in subtle but fatal ways and the whole experiment fails. 

I'm excited to hear what comes out of this conference. If you're there, please tell!

GeoConvention highlights

We were in Calgary last week at the Canada GeoConvention 2017. The quality of the talks seemed more variable than usual but, as usual, there were some gems in there too. Here are our highlights from the technical talks...

Filling in gaps

Mauricio Sacchi (University of Alberta) outlined a new reconstruction method for vector field data. In other words, filling in gaps in multi-component seismic records. I've got a soft spot for Mauricio's relaxed speaking style and the simplicity with which he presents linear algebra, but there are two other reasons that make this talk worthy of a shout out:

  1. He didn't just show equations in his talk, he used pseudocode to show the algorithm.
  2. He linked to his lab's seismic processing toolkit, SeismicJulia, on GitHub.

I am sure he'd be the first to admit that it is early days for this library and it is very much under construction. But what isn't? All the more reason to showcase it openly. We all need a lot more of that.

Update on 2017-06-07 13:45 by Evan Bianco: Mauricio has posted the slides from his talk.

Learning about errors

Anton Biryukov (University of Calgary & graduate intern at Nexen) gave a great talk in the induced seismicity session. It was a lovely mashing-together of three of our favourite topics: seismology, machine learning, and uncertainty. Anton is researching how to improve microseismic and earthquake event detection by framing it as a machine-learning classification problem. He's using Monte Carlo methods to compute myriad synthetic seismic events by making small velocity variations, and then using those synthetic events to teach a model how to be more accurate about locating earthquakes.
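
To give a flavour of the approach (this is my own toy sketch, not Anton's code, and the forward model is a one-line stand-in for the real physics): simulate many events from each candidate grid cell under small velocity perturbations, then train a classifier on the resulting waveforms.

```python
# Toy sketch of the concept only: a Gaussian 'arrival' stands in for a real
# forward-modelled waveform, and a random forest stands in for whatever
# classifier you prefer. The Monte Carlo loop perturbs the velocity so the
# classifier learns to locate events despite velocity-model uncertainty.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def synthetic_waveform(cell, velocity, n=256):
    """Stand-in forward model: arrival time depends on source cell and velocity."""
    t = np.arange(n)
    delay = 20 + 5 * cell / velocity
    return np.exp(-0.5 * ((t - delay) / 3.0) ** 2)

X, y = [], []
for cell in range(10):                      # candidate source locations on a grid
    for _ in range(100):                    # Monte Carlo velocity perturbations
        v = 1.0 + 0.05 * rng.standard_normal()
        X.append(synthetic_waveform(cell, v))
        y.append(cell)

clf = RandomForestClassifier(n_estimators=100).fit(np.array(X), y)
print(clf.predict([synthetic_waveform(cell=3, velocity=1.02)]))  # -> [3], we hope
```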

Figure 2 from Anton Biryukov's abstract. An illustration of the signal classification concept. The signals originating from the locations on the grid (a) are then transformed into a feature space and labeled by the class containing the event origin. From Biryukov (2017). Event origin depth uncertainty - estimation and mitigation using waveform similarity. Canada GeoConvention, May 2017.

The bright lights of geothermal energy
Matt Hall

Two interesting sessions clashed on Wednesday afternoon. I started off in the Value of Geophysics panel discussion, but left after James Lamb's report from the mysterious Chief Geophysicists' Forum. I had long wondered what went on in that secretive organization; it turns out they mostly worry about how to make important people like your CEO think geophysics is awesome. But the large room was a little dark, and — in keeping with the conference in general — so was the mood.

Feeling a little down, I went along to the Diversification of the Energy Industry session instead. The contrast was abrupt and profound. The bright room was totally packed with a conspicuously young audience numbering well over 100. The mood was hopeful, exuberant even. People were laughing, but not wistfully or ironically. I think I saw a rainbow over the stage.

If you missed this uplifting session but are interested in contributing to Canada's geothermal energy scene, which will certainly need geoscientists and reservoir engineers if it's going to get anywhere, there are plenty of ways to find out more or get involved. Start at cangea.ca and follow your nose.

We'll be writing more about the geothermal scene — and some of the other themes in this post — so stay tuned. 



Running away from easy

Matt and I are in Calgary at the 2017 GeoConvention. Instead of writing about highlights from Day 1, I wanted to pick out one awesome thing I saw. Throughout the convention, there is an air of sadness, of nostalgia, of struggle. But I detect a divide among us. There are people who are waiting for things to return to how they were, when life was easy. Others are exploring how to be a part of the change, instead of a victim of it. Things are no longer easy, but easy is boring. 


Want to start an oil and gas company? What resources are you going to need? Computers, pricey software applications, data. Purchase all of this stuff as a one-time capital expense, build a team, get an office lease, buy desks and a Keurig. Then if all goes well, 18 months later you'll have a slide deck outlining a play that you could pitch to investors. 

Imagine getting started without laying down a huge amount of capital for all those things. What if you could rent a desk at a co-working space, access the suite of software tools that you're used to, and use their Keurig? The computer infrastructure and software are managed and maintained by an IT service company, so you don't have to worry about it. 

Yesterday at the Calgary GeoConvention I heard all about how ReSourceYYC, a co-working space catering to oil and gas professionals, has introduced ResourceNET, a subscription-based cloud workstation environment for freelancers, consultants, startups, and the newly (and not-so-newly) underemployed community of subsurface professionals.

In making this offering, ReSourceYYC has partnered up with a number of software companies: Entero, Seisware, Surfer, ValNav, geoLOGIC, and Divestco, to name a few. The limitations and restrictions around this environment, if any, weren't totally clear. I wondered: Could I append or swap my own tools with this stack? Can I access this environment from anywhere?

It could be awesome. I think it could serve just as many freelancers and consultants as "oil and gas startups". It seems a bit too early to say, but I reckon there are literally thousands of geoscientists and engineers in Calgary that'd be all over this.

I think it's interesting and important and I hope they get it right.

SEG-Y Rev 2 again: little-endian is legal!

Big news! Little-endian byte order is finally legal in SEG-Y files.

That's not all. I already spilled the beans on 64-bit floats. You can now have up to 18 quintillion traces (18 exatraces?) in a seismic line. And, finally, the hyphen confusion is cleared up: it's 'SEG-Y', with a hyphen. All this is spelled out in the new SEG-Y specification, Revision 2.0, which was officially released yesterday after at least five years in the making. Congratulations to Jill Lewis, Rune Hagelund, Stewart Levin, and the rest of the SEG Technical Standards Committee.

Back up a sec: what's an endian?

Whenever you have to think about the order of bytes (the 8-bit chunks in a 'word' of 32 bits, for example) — for instance when you send data down a wire, or store bytes in memory, or in a file on disk — you have to decide if you're Roman Catholic or Church of England.

What?

It's not really about religion. It's about eggs.

In one of the more obscure satirical analogies in English literature, Jonathan Swift wrote about the ideological tussle between two factions of Lilliputians in Gulliver's Travels (1726). The Big-Endians liked to break their eggs at the big end, while the Little-Endians preferred the pointier option. Chaos ensued.

Two hundred and fifty years later, Danny Cohen borrowed the terminology in his 1 April 1980 paper, On Holy Wars and a Plea for Peace — in which he positioned the Big-Endians, preferring to store the big bytes first in memory, against the Little-Endians, who naturally prefer to store the little ones first. Big bytes first is how the Internet shuttles data around, so big-endian is sometimes called network byte order. The drawing (right) shows how the 4 bytes in a 32-bit 'word' (the hexadecimal codes 0A, 0B, 0C and 0D) sit in memory.
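
If you like to see these things in code, here's a quick demonstration with Python's standard struct module (nothing SEG-Y specific, just those same four bytes):

```python
# Pack the 32-bit value 0x0A0B0C0D both ways and look at the resulting bytes.
import struct

x = 0x0A0B0C0D
print(struct.pack('>I', x).hex())   # big-endian:    '0a0b0c0d'
print(struct.pack('<I', x).hex())   # little-endian: '0d0c0b0a'
```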

Because we write ordinary numbers big-endian style — 2017 has the thousands first, the units last — big-endian might seem intuitive. Then again, lots of people write dates as, say, 31-03-2017, which is analogous to little-endian order. Cohen reviews the computational considerations in his paper, but really these are just conventions. Some architectures pick one, some pick the other. It just happens that the x86 architecture that powers most desktop and laptop computers is little-endian, so people have been illegally (and often accidentally) writing little-endian SEG-Y files for ages. Now it's actually allowed.
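
Reading with the wrong byte order is what makes those accidental files such a nuisance. A tiny illustration (assuming NumPy is available): a big-endian 1.0, read as little-endian, comes out as a near-zero subnormal number, which is why wrong-endian amplitudes look like garbage.

```python
# The same four bytes, interpreted with each byte order.
import struct
import numpy as np

raw = struct.pack('>f', 1.0)              # 1.0 as a big-endian IEEE float
print(np.frombuffer(raw, dtype='>f4'))    # [1.]       read with the right byte order
print(np.frombuffer(raw, dtype='<f4'))    # [4.6e-41]  read with the wrong byte order
```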

Still other byte orders are possible. Some older processors, notably the PDP-11, were middle-endian (aka mixed-endian): the two halves of a 32-bit word are reversed compared to their 'pure' endian order. You can think of this as analogous to the month-first American date format: 03-31-2017. I guess this is like breaking your boiled egg in the middle. Swift did not tell us which religious denomination these hapless folks subscribe to. (ARM and some other RISC architectures, by contrast, are bi-endian: they can operate in either byte order.)

OK, that's enough about byte order

I agree. So I'll end with this handy SEG-Y cheatsheet. Click here for the PDF.


References and acknowledgments

Cohen, Danny (1 April 1980). On Holy Wars and a Plea for Peace. IETF IEN 137. "...which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word? The followers of the former approach are called the Little-Endians, and the followers of the latter are called the Big-Endians." Also published in IEEE Computer, October 1981 issue.

Thumbnail image: “Remember, people will judge you by your actions, not your intentions. You may have a heart of gold -- but so does a hard-boiled egg.” by Kate Ter Haar is licensed under CC BY 2.0

News and updates and a sandwich

Plans for the hackathon in Paris in June are well underway. We now have two major sponsors: Dell EMC and Total E&P will both be supporting the event with generous funding. Bolstered by this, I've set a goal of getting 50 participants in the event. Imagine that!

If you would like to help us reach this goal, please consider printing out some of these posters (right) and putting them up in your place of work or study: >> hi-res PDF <<. It should even be readable in black & white, if that's your only option.

You can find links to everything you need to know about the event at agilescientific.com/paris.

Le grand sandwich délicieux

The hackathon is really just the filling in a delicious Parisian sandwich of geocomputing goodness. The bread at the bottom is the Hacker Bootcamp on 9 June. The filling is the hackathon weekend... and the final piece is the EAGE workshop on machine learning. Convened by geoscientists at Total and IFP, it should be a great day of knowledge sharing and discussion. I can't wait.

11 days to go!

There are only 11 days left to take part in the SEG Machine Learning contest, in which you are challenged to predict lithologies in two wells, given some wireline logs and lithologies in several other nearby wells. Everything you need to get started, even if you've never tried anything like this before, is right here. See Brendon Hall's TLE article for more deets.

The radio show for geo-nerds

Undersampled Radio is still going strong. We just recorded episode 32 today. Last week's chat with Prof Chris Jackson (Imperial College London) — who's embarking on a GSA lecture tour this year — was a real cracker, check it out:

The other thing you need to know about Chris is that he's started writing his blog again. It's awesome, of course, and you should probably just go and read it now...

2016 retrospective

As we see out the year — or rather shove it out, slamming the door firmly behind it, then changing the locks and filing a restraining order — we like to glance back over the blog. We remember the posts that we enjoyed writing, and the ones you seemed to enjoy reading, and record them here for posterity.

The most popular

The great thing about writing on the web, compared to print, is that you quickly find out whether it was any good, or useful, or at least slightly interesting. You can't hide from data. Without adjusting for the age of posts (older ones have had longer to garner readers of course), the most popular posts of the last 12 months — from the 47 we have published — were:

None of these posts comes anywhere near the most popular page on the site, k is for wavenumber, which I wrote in 2012 but still gets about 600 pageviews a month, nearly 4% of the traffic on the site. Other perennials include Well tie workflow, What is anisotropy? and What is SEG Y?

If you gauge popularity by real engagement — comments, which are like diamonds to bloggers — then, apart from the pieces I already mentioned, these were the next most commenty posts:

Where is everybody?

We don't collect data about our readers beyond what's reported by your browser to Google Analytics, most of which is pretty esoteric. But it is interesting to see the geographic distribution of our readers. The top dozen cities from the roughly two thirds of sessions — out of about 9000 monthly sessions — that report this information:

  1. Houston (3,457 users)
  2. Calgary (2,244)
  3. London (1,500)
  4. Perth (723)
  5. Kuala Lumpur (700)
  6. Stavanger
  7. Delhi
  8. Rio de Janeiro
  9. Leeds
  10. Aberdeen
  11. Jakarta
  12. New York

Last thing

You rock! I mean it. This blog would be pretty pointless without your eyeballs. We appreciate every visit, however short, and when you share a post with someone... it really makes our day. I love hearing from readers, even about typos. Especially aobut typos. Anyway, the point is: thank you for stopping by, and being part of this global community of geoscientists.

Whatever festival you celebrate this week, have a peaceful time*. And all the best for 2017!

* Well, maybe squeeze in a bit of writing: it's good for you. 


Previous Retrospective posts... 2011 retrospective •  2012 retrospective • 2013 retrospective • 2014 retrospective

There was no Retrospective in 2015, I was too discombobulated this time last year :(

SEG machine learning contest: there's still time

Have you been looking for an excuse to find out what machine learning is all about? Or maybe learn a bit of the Python programming language? If so, you need to check out Brendon Hall's tutorial in the October issue of The Leading Edge. Entitled "Facies classification using machine learning", it's a walk-through of a basic statistical learning workflow, applied to a small dataset from the Hugoton gas field in Kansas, USA.

But it was also the launch of a strictly fun contest to see who can get the best prediction from the available data. The rules are spelled out in the contest's README, but in a nutshell, you can use any reproducible workflow you like in Python, R, Julia or Lua, and you must disclose the complete workflow. The idea is that contestants can learn from each other.

Left: crossplots and histograms of wireline log data, coloured by facies — the idea is to highlight possible data issues, such as highly correlated features. Right: true facies (left) and predicted facies (right) in a validation plot. See the rest of the paper for details.

What's it all about?

The task at hand is to predict sedimentological facies from well logs. Such log-derived facies are sometimes called e-facies. This is a familiar task to many development geoscientists, and there are many, many ways to go about it. In the article, Brendon trains a support vector machine to discriminate between facies. It does a fair job, but the accuracy of the result is less than 50%. The challenge of the contest is to do better.
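
If you want a feel for the kind of workflow involved, here's a minimal sketch (mine, not Brendon's exact code; the file name is a placeholder, though the curve mnemonics are typical of the contest data):

```python
# Minimal facies-classification sketch with scikit-learn. The CSV path is a
# placeholder; the log curves listed are typical of the contest dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import f1_score

df = pd.read_csv('training_data.csv')
curves = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE']
X = StandardScaler().fit_transform(df[curves])
y = df['Facies']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel='rbf', C=10, gamma=1).fit(X_train, y_train)
print('F1:', f1_score(y_test, clf.predict(X_test), average='micro'))
```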

Indeed, people have already done better; here are the current standings:

  #  Team          F1     Algorithm      Language  Solution
  1  gccrowther    0.580  Random forest  Python    Notebook
  2  LA_Team       0.568  DNN            Python    Notebook
  3  gganssle      0.561  DNN            Lua       Notebook
  4  MandMs        0.552  SVM            Python    Notebook
  5  thanish       0.551  Random forest  R         Notebook
  6  geoLEARN      0.530  Random forest  Python    Notebook
  7  CannedGeo     0.512  SVM            Python    Notebook
  8  BrendonHall   0.412  SVM            Python    Initial score in article

As you can see, DNNs (deep neural networks) are, in keeping with the amazing recent advances in the problem-solving capability of this technology, doing very well on this task. Of the 'shallow' methods, random forests are quite prominent, and indeed are a great first-stop for classification problems as they tend to do quite well with little tuning.

How do I enter?

There are still over six weeks to enter: you have until 31 January. There is a little overhead — you need to learn a bit about git and GitHub, there's some programming, and of course machine learning is a massive field to get up to speed on — but don't be discouraged. The very first entry was from Bryan Page, a self-described non-programmer who dusted off some basic skills to improve on Brendon's notebook. But you can run the notebook right here in mybinder.org (if it's up today — it's been a bit flaky lately) and play around with a few parameters yourself.

The contest aspect is definitely low-key. There's no money on the line — just a goody bag of fun prizes and a shedload of kudos that will surely get the winners into some awesome geophysics parties. My hope is that it will encourage you (yes, you) to have fun playing with data and code, trying to do that magical thing: predict geology from geophysical data.


Reference

Hall, B (2016). Facies classification using machine learning. The Leading Edge 35 (10), 906–909. doi: 10.1190/tle35100906.1. (This paper is open access: you don't have to be an SEG member to read it.)

Le meilleur hackathon du monde


Hackathons are short bursts of creative energy, making things that may or may not turn out to be useful. In general, people work in small teams on new projects with no prior planning. The goal is to find a great idea, then manifest that idea as something that (barely) works, but might not do very much, then show it to other people.

Hackathons are intellectually and professionally invigorating. In my opinion, there's no better team-building, networking, or learning event.

The next event will be 10 & 11 June 2017, right before the EAGE Conference & Exhibition in Paris. I hope you can come.

The theme for this event will be machine learning. We had the same theme in New Orleans in 2015, but suffered a bit from a lack of data. This time we will have a collection of open datasets for participants to build off, and we'll prime hackers with a data-and-skills bootcamp on Friday 9 June. We did this once before in Calgary – it was a lot of fun. 

Can you help?

It's my goal to get 52 participants to this edition of the event. But I'll need your help to get there. Please share this post with any friends or colleagues you think might be up for a weekend of messing about with geoscience data and ideas. 

Other than participants, the other thing we always need is sponsors. So far we have three organizations sponsoring the event — Dell EMC is stepping up once again, thanks to the unstoppable David Holmes and his team. And we welcome Sandstone — thank you to Graham Ganssle, my Undersampled Radio co-host, who I did not coerce in any way.


If your organization might be awesome enough to help make amazing things happen in our community, I'd love to hear from you. There's info for sponsors here.

If you're still unsure what a hackathon is, or what's so great about them, check out my November article in the Recorder (Hall 2015, CSEG Recorder, vol 40, no 9).

The disappearing lake trick

On Sunday 20 November it's the 36th anniversary of the 1980 Lake Peigneur drilling disaster. The shallow lake — almost just a puddle at about 3 m deep — disappeared completely when the Texaco wellbore penetrated the Diamond Crystal Salt Company mine at a depth of about 350 m.

Location, location, location

It's thought that the rig, operated by Wilson Brothers Ltd, was in the wrong place. It seems a calculation error or misunderstanding resulted in the incorrect coordinates being used for the well site. (I'd love to know if anyone knows more about this as the Wikipedia page and the video below offer slightly different versions of this story, one suggesting a CRS error, the other a triangulation error.)

The entire lake sits on top of the Jefferson Island salt dome, but the steep sides of the salt dome, and a bit of bad luck, meant that a few metres were enough to spoil everyone's day. If you have 10 minutes, it's worth watching this video...

Apparently the accident happened at about 0430, and the crew abandoned the subsiding rig before breakfast. The lake was gone by dinner time. Here's how John Warren, a geologist and proprietor of Saltworks, describes the emptying in his book Evaporites (Springer 2006, and repeated on his awesome blog, Salty Matters):

Eyewitnesses all agreed that the lake drained like a giant unplugged bathtub—taking with it trees, two oil rigs [...], eleven barges, a tugboat and a sizeable part of the Live Oak Botanical Garden. It almost took local fisherman Leonce Viator Jr. as well. He was out fishing with his nephew Timmy on his fourteen-foot aluminium boat when the disaster struck. The water drained from the lake so quickly that the boat got stuck in the mud, and they were able to walk away! The drained lake didn’t stay dry for long, within two days it was refilled to its normal level by Gulf of Mexico waters flowing backwards into the lake depression through a connecting bayou...

The other source that seems reliable is Oil Rig Disasters, a nice little collection of data about various accidents. It ends with this:

Federal experts from the Mine Safety and Health Administration were not able to apportion blame due to confusion over whether Texaco was drilling in the wrong place or that the mine’s maps were inaccurate. Of course, all evidence was lost.

If the bit about the location is true, it may be one of the best stories of the perils of data management errors. If anyone (at Chevron?!) can find out more about it, please share!