How to load SEG-Y data

Yesterday I looked at the anatomy of SEG-Y files. But it's pathology we're really interested in. Three times in the last year, I've heard from frustrated people. In each case, the frustration stemmed from the same problem. The epic email trails led directly to these posts. Next time I can just send a URL!

In a nutshell, the specific problem these people experienced was missing or bad trace location data. Because I've run into this so many times before, I never trust location data in a SEG-Y file. You just don't know where it's been, or what has happened to it along the way — what's the datum? What are the units? And so on. So all you really want to get from the SEG-Y are the trace numbers, which you can then match to a trustworthy source for the geometry.

Easy as 1-2-3, er, 4

This is my standard approach to loading data. Your mileage will vary, depending on your software and your data. 

  1. Find the survey geometry information. For 2D data the geometry is usually in a separate navigation ('nav') file. For 3D you are just looking for cornerpoints, and something indicating how the lines and crosslines are numbered (they might not start at 1, and might not be oriented how you expect). This information may be in the processing report or, less reliably, in the EBCDIC text header of the SEG-Y file.
  2. Now define the survey geometry. You need a location for every trace for a 2D, and the survey's cornerpoints for a 3D. The geometry is a description of where the line goes on the earth, in surface coordinates: where the starting trace is, how many traces there are, and what the trace spacing is. In other words, the geometry tells you where the traces go. It's variously called 'navigation', 'survey', or some other synonym.
  3. Finally, load the traces into their homes, one vintage (survey and processing cohort) at a time for 2D. The cross-reference between the geometry and the SEG-Y file is the trace or CDP number for a 2D, and the line and crossline numbers for a 3D (there's a small sketch of this idea after the list).
  4. Check everything twice. Does the map look right? Is the survey the right shape and size? Is the line spacing right? Do timeslices look OK?
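
For example, here's a minimal sketch of the cross-referencing idea in step 3, in Python. I'm using the open-source segyio library purely for illustration (your loading software will do this for you), and the file names and nav file layout are made up:

    import segyio
    import pandas as pd

    # Hypothetical inputs: a 2D SEG-Y file, and a trusted nav file with
    # columns CDP, X, Y (from the processing report or survey database).
    nav = pd.read_csv('line_001_nav.csv')

    with segyio.open('line_001.sgy', ignore_geometry=True) as f:
        # Read only the CDP number from every trace header.
        cdps = f.attributes(segyio.TraceField.CDP)[:]

    # Match each trace to its trusted location via the CDP number,
    # ignoring whatever coordinates the SEG-Y trace headers claim.
    geometry = pd.DataFrame({'CDP': cdps}).merge(nav, on='CDP', how='left')
    print(geometry.head())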

Where to get the geometry data?

So, where to find cornerpoints, line spacings, and so on? Sadly, the header cannot be trusted, even in newly-processed data. If you have it, the processing report is a better bet. It often helps to talk to someone involved in the acquisition and processing too. If you can corroborate with data from the acquisition planning (line spacings, station intervals, and so on), so much the better — but remember that some acquisition parameters may have changed during the job.

Of vital importance is some independent corroboration — a map, ideally — of the geometry and the shape and orientation of the survey. I can't count the number of back-to-front surveys I've seen. I even saw one upside-down (in the z dimension) once, but that's another story.

Next time, I'll break down the loading process a bit more, with some step-by-step for loading the data somewhere you can see it.

What is SEG-Y?

The confusion starts with the name, but whether you write SEGY, SEG Y, or SEG-Y, it's probably definitely pronounced 'segg why'. So what is this strange substance?

SEG-Y means seismic data. For many of us, it's the only type of seismic file we have much to do with — we might handle others, but for the most part they are closed, proprietary formats that 'just work' in the application they belong to (Landmark's brick files, say, or OpendTect's CBVS files). Processors care about other kinds of data — the SEG has defined formats for field data (SEG-D) and positional data (SEG-P), for example. But SEG-Y is the seismic file for everyone. Kind of.

The open SEG-Y "standard" (those air quotes are an important feature of the standard) was defined by SEG in 1975. The first revision, Rev 1, was published in 2002. The second revision, Rev 2, was announced by the SEG Technical Standards Committee at the SEG Annual Meeting in 2013 and I imagine we'll start to see people using it in 2014. 

What's in a SEG-Y file?

SEG-Y files have lots of parts:

The important bits are the EBCDIC header (green) and the traces (light and dark blue).

The EBCDIC text header is a rich source of accurate information that provides everything you need to load your data without problems. Yay standards!

Oh, wait. The EBCDIC header doesn't say what the coordinate system is. Oh, and the datum is different from the processing report. And the dates look wrong, and the trace length is definitely wrong, and... aargh, standards!
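
If you want to eyeball the text header yourself, it only takes a few lines of Python. This is just a sketch: it assumes the usual 3200-byte EBCDIC header at the start of the file (some files use ASCII instead), and the file name is made up:

    # Read and decode the 3200-byte text header: 40 'cards' of 80 characters.
    with open('my_survey.sgy', 'rb') as f:
        raw = f.read(3200)

    text = raw.decode('cp037')          # cp037 is a common EBCDIC code page
    for i in range(0, 3200, 80):
        print(text[i:i+80])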

The other important bit — the point of the whole file really — is the traces themselves. They also have two parts: a header (light blue, above) and the actual data (darker blue). The data are stored in the file in (usually) 4-byte 'words'. Each word has its own address, or 'byte location' (a number), and a meaning. The headers map the meaning to the location, e.g. the crossline number is stored in byte 21. Usually. Well, sometimes. OK, it was one time.
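
To make the byte-location idea concrete, here's a small sketch that pulls the 4-byte word starting at byte 21 out of the first trace header. It assumes the standard layout (3200-byte text header plus 400-byte binary header before the first 240-byte trace header) and big-endian encoding, both of which are worth checking; the file name is made up:

    import struct

    # The first trace header starts after the 3200-byte text header and
    # the 400-byte binary header, i.e. at file offset 3600.
    with open('my_survey.sgy', 'rb') as f:
        f.seek(3600 + 20)               # byte 21 (1-indexed) of the trace header
        word = f.read(4)

    value, = struct.unpack('>i', word)  # '>i' means big-endian 4-byte signed integer
    print(value)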

According to the standard, here's where the important stuff is supposed to be:

I won't go into the unpleasantness of poking around in SEG-Y files right now — I'll save that for next time. Suffice to say that it's often messy, and if you have access to a data-loading guru, treat them exceptionally well. When they look sad — and they will look sad — give them hugs and hot tea. 

What's so great about Rev 2?

The big news in the seismic standards world is Revision 2. According to this useful presentation by Jill Lewis (Troika International) at the Standards Leadership Council last month, here are the main features:

  • Allow 240-byte trace header extensions.
  • Support up to 2³¹ (that's 2.1 billion!) samples per trace and traces per ensemble.
  • Permit arbitrarily large and small sample intervals.
  • Support 3-byte and 8-byte sample formats.
  • Support microsecond date and time stamps.
  • Provide for additional precision in coordinates, depths, elevations.
  • Synchronize coordinate reference system specification with SEG-D Rev 3.
  • Backward compatible with Rev 1, as long as undefined fields were filled with binary zeros.

Two billion samples at µs intervals is over 30 minutes of recording (2³¹ µs is about 2,147 seconds, or roughly 36 minutes). Clearly, the standard is aimed at <ahem> Big Data, and at accommodating the massive amounts of data coming from techniques like variable timing acquisition, permanent 4D monitoring arrays, and microseismic.

Next time, we'll look at loading one of these things. Not for the squeamish.

Calibrate your seismic intuition

On Tuesday we announced our new web app, modelr.io. Why are we so excited about it? 

  • We love the idea that subsurface software can cost dollars, not 1000's of dollars. 
  • We love the idea of subsurface software being online, not on the desktop.
  • We love the idea that subsurface software can be open source. Here's our code!
  • We love the idea of subsurface software that doesn't need a manual to master.
  • We love the idea of subsurface software that runs on a tablet or a phone.
  • We see software as an important way to share knowledge and connect people.

OK, that's enough reasons. There are more. Those are the main ones.

The point is: we love these ideas. And we hope that you, dear reader, at least like some of them a bit. Because we really want to keep developing modelr. We think it can be awesome. Imagine 3D earth models, imagine full waveform modeling, imagine gravity and magnetic models. We get very excited when we think about all the possibilities. There's no better way to calibrate your seismic intuition than modeling, and modelr is a great place to start modeling.

Here's a challenge: take 3 minutes and see if you can generate...

  • A wedge model & tuning curve
  • An AVA gather for a Class 4 sand
  • A stochastic AVA crossplot


The most important thing nobody does

A couple of weeks ago, we told you we were up to something. Today, we're excited to announce modelr.io — a new seismic forward modeling tool for interpreters and the seismically inclined.

Modelr is a web app, so it runs in the browser, on any device. You don't need permission to try it, and there's never anything to install. No licenses, no dongles, no not being able to run it at home, or on the train.

Later this week, we'll look at some of the things Modelr can do. In the meantime, please have a play with it. Just go to modelr.io and hit Demo, or click on the screenshot below. If you like what you see, then think about signing up — the more support we get, the faster we can make it into the awesome tool we believe it can be. And tell your friends!

If you're intrigued but unconvinced, sign up for occasional news about Modelr:

This will add you to the email list for the modeling tool. We never share user details with anyone. You can unsubscribe any time.

A long weekend of creative geoscience computing

The Rock Hack is in three weeks. If you're in Houston, for AAPG or otherwise, this is going to be a great opportunity to learn some new computer skills, build some tools, or just get some serious coding done. The Agile guys — me, Evan, and Ben — will be hanging out at START Houston, laptops open, all day 5 and 6 April, about 8:30 till 5. The breakfast burritos and beers are on us.

Unlike the geophysics hackathon last September, this won't be a contest. We're going to try a more relaxed, unstructured event. So don't be shy! If you've always wanted to try building something but don't know where to start, or just want to chat about The Next Big Thing in geoscience or technology — please drop in for an hour, or a day.

Here are some ideas we're kicking around for projects to work on:

  • Sequence stratigraphy calibration app to tie events to absolute geologic time and to help interpret systems tracts.
  • Wireline log 'attributes'.
  • Automatic well-to-well correlation.
  • Facies recognition from core.
  • Automatic photomicrograph interpretation: grain size, porosity, sorting, and so on.
  • A mobile app for finding and capturing data about outcrops.
  • An open source basin modeling tool.

Short course

If you feel like a short course would get you started faster, then come along on Friday 4 April. Evan will be hosting a 1-day course, leading you through getting set up for learning Python, learning some syntax, and getting started on the path to scientific computing. You won't have super-powers by the end of the day, but you'll know how to get them.


The course includes food and drink, and lots of code to go off and play with. If you've always wanted to get started programming, this is your chance!

Purposeful discussion in geoscience

Regular readers will remember the Unsolved Problems Unsession at the GeoConvention in Calgary last May. We think these experiments in collaboration are one possible way to get people more involved in progressing geoscience at conferences, and having something to show for it. We plan to do more — and are here to support you if you'd like to try one in your community.

Last Thursday was the 2014 CSEG Symposium. The organizers asked me for a short video to sum up what happened at the unsession for the crowd, and to help get them in the mood for some discussion. I hope it helped...

Getting better

Conferences seem so crammed with talks these days. No time for good conversation, in or out of the sessions. The only decent discussion I remember recently (apart from the unsession, obvsly) was at EAGE in 2012, when a talk finished early and the space filled with a fascinating discussion between two compressed sensing clever-clogs.

I think there are a few ways to get better at it:

  • Make more time for it, preferably at least 40 minutes.
  • Get people into smaller groups, about 4–12 people is good.
  • Facilitate with some ground rules, provocative questions, and conversation management.
  • Capture what was said, preferably in real time and using the participants' own words.
  • Use lots of methods: drawing, sticky notes, tweets, video, and so on.
  • Reflect the conversation back at the participants, and let them respond.
  • Read up on open space, knowledge café, charrettes, and other methods.
  • Don't shut it down with "I guess we're out of time..." — review or sum up first.

Think about when you have been part of a really good conversation: how it feels, how it flows, how you remember it for days afterwards and mention it to others later. I think we can have more of those about our work, and conferences are a great place to help them happen.

Stay tuned for details of the next unsession — again, at the Calgary GeoConvention.

Relentlessly practical

This is one of my favourite knowledge sharing stories.

A farmer in my community had a problem with one of his cows — it was seriously unwell. He asked one of the old local farmers about the symptoms, and was told, “Oh yes, one of my herd had the same thing last summer. I gave her a cup of brandy and four aspirins every night for a week.” The young farmer went off and did this, but the poor cow got steadily worse and died. When he saw the old farmer next he told him, more than a little accusingly, “I did what you said, and the cow died anyway.” The old geezer looked into the distance and just said, “Yep, so did mine.”

Incomplete information can be less useful than no information. Yet incomplete information has somehow become our specialty in applied geoscience. How often do we share methods, results, or case studies without the critical details that would make them useful information? That is, not just marketing or resumé padding. Indeed, I heard this week that one large US operator will not approve a publication that does include these critical details! And we call ourselves scientists...

Completeness mandatory

Thankfully, last month The Leading Edge — the magazine of the SEG — started a new tutorial column, edited by me. Well, I say 'edited'; really I'm just the person who pesters prospective authors until they give in and send me a manuscript. Tad Smith, Don Herron, and Jenny Kucera are the people who make it actually happen. But I get to take all the credit.

When I was asked about it, I suggested two things:

  1. Make each tutorial reproducible by publishing the code that makes the figures.
  2. Make the words, the data, and the code completely open and shareable. 

To my delight and, I admit, slight surprise, they said 'Sure!'. So the words are published under an open license (Creative Commons Attribution-ShareAlike, the same license for re-use that most of Wikipedia has), the tutorials use open data for everything, and the code is openly available and free to re-use. Complete transparency.

There's another interesting aspect to how the column is turning out. The first two episodes tell part of the story in IPython Notebook, a truly amazing executable writing environment that we've written about before. This enables you to seamlessly stitch together text, code, and plots (left). If you know a bit of Python, or want to start learning it right now this second, go give wakari.io a try. It's pretty great. (If you really like it, come and learn more with us!)

Read the first tutorial: Hall, M. (2014). Smoothing surfaces and attributes. The Leading Edge, 33(2), 128–129. doi: 10.1190/tle33020128.1. A version of it is also on SEG Wiki, and you can read the IPython Notebook at nbviewer.org.

Do you fancy authoring something for this column? Wonderful — please do! Here are the author instructions. If you have an idea for something, please drop me a line, let's talk about how to make it relentlessly practical.