July 31, 2014

July linkfest

July 31, 2014/ Matt Hall

It's linkfest time again. All the links, in one handy post.

First up — I've seen some remarkable scientific visualizations recently. For example, giant ocean vortices spiralling across the globe (shame about the rainbow colourbar though). Or the trillion-particle Dark Sky Simulation images we saw at SciPy. Or this wonderful (real, not simulated) video by the Perron Group at MIT:

Staying with visuals, I highly recommend reading anything by Mike Bostock, especially if you're into web technology. He is the inventor of D3.js, a popular data viz library, and here's his exploration of algorithms, from sampling to sorting. It's more conceptual than straight-up visualization of data, but no less insightful.

And I recently read about some visual goodness combined with one of my favourite subjects, openness. Peter Falkingham, a palaeontologist at the Royal Veterinary College and Brown University, has made a collection of 3D photographs of modern tracks and traces available to the world. He knows his data is more impactful when others can use it too.

Derald Smith and sedimentology

From Smith et al. (2009) in SEPM Special Publication No. 97.

The geological world was darkened by the death of Derald Smith on 18 June. I met Derald a few times in connection with working on the McMurray Formation of Alberta, Canada during my time at ConocoPhillips. We spent an afternoon examining core and seismic data, and speculating about counter-point-bars, a specialty of his. He was an intuitive sedimentologist whose contributions will be remembered for many years.

Another geological Smith is being celebrated in September at the Geological Society of London's annual William Smith Meeting. The topic this year is The Future of Sequence Stratigraphy: Evolution or Revolution? Honestly, my first thought was "hasn't that conversation been going on since 1994?", but on closer inspection, it promises to be an interesting two days on 'source-to-sink', 'landscape into rock', and some other recent ideas.

The issue of patents reared up in June when Elon Musk of Tesla Motors announced the relaxation of their patents — essentially a promise not to sue anyone using one of their patented technologies. He realizes that a world where lots of companies make electric vehicles is better for Tesla. I wrote a piece about patents in our industry.

Technology roundup

A few things that caught our eye online:

Along with our good friend Duncan Child, we started Software Underground, a discussion group on subsurface software and entrepreneurship. It's in private beta for now — follow the links to request an invite.
We like colour. Matteo Niccoli's tutorial on colourmaps is out tomorrow in the August issue of The Leading Edge.
Colour came up at SciPy too — check out Kristen Thyng's talk.
NASA scientist Rob Simmon tweeted last month about HCL Wizard, a wonderful new perceptual colour palette tool.
WellDatabase.com is a new commercial site trying to unify access to public well data in the US. We still prefer Ted Kiernan's more open approach with PublicWellData.com, but competition is always good.
Rob Smallshire of Sixty North in Norway has rescued one of the few open tools that can read and write SEG Y data — here's segpy!

Last thing: did you know that the unit of acoustic impedance is the Rayl? Me neither.

Previous linkfests: April — January — October.

The figure is from Smith et al. (2009), Stratigraphy of counter-point-bar and eddy accretion deposits in low-energy meander belts of the Peace–Athabasca delta, northeast Alberta, Canada. In: SEPM Special Publication No. 97, ISBN 978-1-56576-305-0, p. 143–152. It is copyright of SEPM, and used here in accordance with their terms.

July 29, 2014

Graphics that repay careful study

July 29, 2014/ Evan Bianco

The Visual Display of Quantitative Information by Edward Tufte (2nd ed., Graphics Press, 2001) celebrates communication through data graphics. The book provides a vocabulary and practical theory for data graphics, and Tufte pulls no punches — he suggests why some graphics are better than others, and even condemns failed ones as lost opportunities. The book outlines empirical measures of graphical performance, and describes the pursuit of graphic-making as one of sequential improvement through revision and editing. I see this book as a sort of moral authority on visualization, and as the reference book for developing graphical taste.

Through design, the graphic artist allows the viewer to enter into a transaction with the data. High performance graphics, according to Tufte, 'repay careful study'. They support discovery, probing questions, and a deeper narrative. These kinds of graphics take a lot of work, but they do a lot of work in return. In later books Tufte writes, 'To clarify, add detail.'

A stochastic AVO crossplot

Consider this graphic from the stochastic AVO modeling section of modelr. Its elements are constructed with code, and since it is a program, it is completely reproducible.

Let's dissect some of the conceptual high points. This graphic shows all the data simultaneously across 3 domains, one in each panel. The data points are sampled from probability density estimates of the physical model. It is a large dataset from many calculations of angle-dependent reflectivity at an interface. The data is revealed with a semi-transparent overlay, so that areas of certainty are visually opaque, and areas of uncertainty are harder to see.

At the same time, you can still see every data point that makes up the graphic, which gives a broad overview (the range and additive intensity of the lines and points) as well as the finer structure. We place the two modeled dimensions with templates in the background, alongside the physical model histograms. We can see, for instance, how likely we are to see a phase reversal, or a Class 3 response subject to the physical probability estimates. The statistical and site-specific nature of subsurface modeling is represented in spirit. All the data has context, and all the data has uncertainty.
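As a rough illustration of the sampling idea described above, here is a minimal sketch in plain Python. It is not modelr's code. It assumes normally distributed rock properties with made-up means and standard deviations, uses the two-term Shuey approximation for angle-dependent reflectivity, and treats a negative intercept with a negative gradient as a Class 3 response. The function names and all the numbers are hypothetical.

```python
import random


def shuey_terms(vp1, vs1, rho1, vp2, vs2, rho2):
    """Return (intercept, gradient) of the two-term Shuey approximation,
    R(theta) ~ R0 + G * sin(theta)**2, for an interface between an upper
    layer (1) and a lower layer (2)."""
    vp, vs, rho = (vp1 + vp2) / 2, (vs1 + vs2) / 2, (rho1 + rho2) / 2
    dvp, dvs, drho = vp2 - vp1, vs2 - vs1, rho2 - rho1
    r0 = 0.5 * (dvp / vp + drho / rho)
    g = 0.5 * dvp / vp - 2 * (vs / vp) ** 2 * (drho / rho + 2 * dvs / vs)
    return r0, g


def sample_interface(rng, upper, lower, n):
    """Draw n realizations of (R0, G). Each layer is a dict mapping a
    property name to an assumed (mean, standard deviation) pair."""
    def draw(layer):
        return [rng.gauss(*layer[key]) for key in ("vp", "vs", "rho")]
    return [shuey_terms(*draw(upper), *draw(lower)) for _ in range(n)]


# Illustrative values only: a shale over a gas sand (m/s, m/s, kg/m^3).
shale = {"vp": (3000, 100), "vs": (1500, 80), "rho": (2400, 50)}
sand = {"vp": (2500, 150), "vs": (1600, 100), "rho": (2100, 80)}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
points = sample_interface(rng, shale, sand, 2000)

# Fraction of realizations with negative intercept and negative gradient.
class3 = sum(1 for r0, g in points if r0 < 0 and g < 0) / len(points)
print(f"Class 3-like fraction: {class3:.2f}")
```

Plotting `points` as an intercept-gradient crossplot with a low opacity (for example, matplotlib's `alpha` argument) would give the kind of additive, semi-transparent overlay discussed here, with dense regions appearing more opaque.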

Rules for graphics that work

Tufte summarizes that excellent data graphics should:

Show all the data.
Provoke the viewer into thinking about meaning.
Avoid distorting what the data have to say.
Present many numbers in a small space.
Make large data sets coherent.
Encourage the eye to compare different pieces of the data.
Reveal the data at several levels of detail, from a broad overview to the fine structure.
Serve a reasonably clear purpose: description, exploration, tabulation, or decoration.
Be closely integrated with the statistical and verbal descriptions of a data set.

The data density, or data-to-ink ratio, looks reasonably high in my crossplot, but it could likely still be optimized. What would you remove? What would you add? What elements need revision?

July 23, 2014

Whither technical books?

July 23, 2014/ Matt Hall

Leafing through our pile of new books on seismic analysis got me thinking about technical books and the future of technical publishing. In particular:

Why are these books so expensive?
When will we start to see reproducibility?
Does all this stuff just belong on the web?

Why so expensive?

Should technical books really cost several times what ordinary books cost? Professors often ask us for discounts for modelr, our $9/mo seismic modeling tool. Students pay 10% of what pros pay in our geocomputing course. Yet academic books cost three times what consumer books cost. I know it's a volume game — but you're not going to sell many books at $100 a go! And unlike consumer books, technical authors usually don't make any money — a star writer may score 6% of net sales... once 500 books have been sold (see Handbook for Academic Authors).

Where's the reproducibility?

Compared to the amazing level of reproducibility we saw at SciPy — where the code to reproduce virtually every tutorial, talk, and poster was downloadable — books are still rather black box. For example, the figures are often drafted, not generated. A notable (but incomplete) exception is Chris Liner's fantastic (but ridiculously expensive) volume, Elements of 3D Seismology, in which most of the figures seem to have been generated by Mathematica. The crucial final step is to share the code that generated them, and he's exploring this in recent blog posts (e.g. right).

I can think of three examples of more reproducible geophysics in print:

Gary Mavko has shared a lot of MATLAB code associated with Quantitative Seismic Interpretation and The Rock Physics Handbook. The code to reproduce the figures is not provided, and MATLAB is not really open, but it's a start.
William Ashcroft's excellent book, A Petroleum Geologist's Guide to Seismic Reflection contains (proprietary, Windows only) code on a CD, so you could in theory make some of the figures yourself. But it wouldn't be easy.
The series of tutorials I'm coordinating for The Leading Edge has, so far, included all the code to reproduce figures, exclusively written in open languages and using open or synthetic data. Kudos to SEG!

Will the web win?

None of this comes close to Sergey Fomel's brand of fully reproducible geophysics. He is a true pioneer in this space, up there with Jon Claerbout. (You should definitely read his blog!). One thing he's been experimenting with is 'live' reproducible documents in the cloud. If we don't see an easy way to publish live, interactive notebooks in the cloud this year, we'll see them next year for sure.

So imagine being able to read a technical document, a textbook say, with all the usual features you get online — links, hover-over, clickable images, etc. But then add the ability to not only see the code that produced each figure, but to edit and re-run that code. Or add slider widgets for parameters — "What happens to the gather if I change Poisson's ratio?" Now, since you're on the web, you can share your modification with your colleagues, or the world.

Now that's a book I'd be glad to pay double for.

Some questions for you

We'd love to know what you think of technical books. Leave a comment below, or get in touch.

Do you purchase technical books regularly? What prompts you to buy a book?
What book keeps getting pulled off your shelf, and which ones collect dust?
What's missing from the current offerings? Workflows, regional studies, atlases,...?
Would you rather just consume everything online? Do you care about reproducibility?

400 posts

The last post was our 400th on this blog. At an average of 500 words, that's about 200,000 words since we started at the end of 2010. Enough for a decent-sized novel, but slightly less likely to win a Pulitzer. In that time, according to Google, almost exactly 100,000 individuals have stopped by agilegeoscience.com — most of them lots of times — thank you readers for keeping us going! The most popular posts: Shale vs tight, Rock physics cheatsheet, and Well tie workflow. We hope you enjoy reading at least half as much as we enjoy writing.

July 18, 2014

Six books about seismic analysis

July 18, 2014/ Matt Hall

Last year, I did a round-up of six books about seismic interpretation. A raft of new geophysics books recently, mostly from Cambridge, prompts this look at six volumes on seismic analysis — the more quantitative side of interpretation. We seem to be a bit hopeless at full-blown book reviews, and I certainly haven't read all of these books from cover to cover, but I thought I could at least mention them, and give you my first impressions.

If you have read any of these books, I'd love to hear what you think of them! Please leave a comment.

Observation: none of these volumes mention compressive sensing, borehole seismic, microseismic, tight gas, or source rock plays. So I guess we can look forward to another batch in a year or two, when Cambridge realizes that people will probably buy anything with 3 or more of those words in the title. Even at $75 a go.

Quantitative Seismic Interpretation

Per Avseth, Tapan Mukerji and Gary Mavko (2005). Cambridge University Press, 408 pages, ISBN 978-0-521-15135-1. List price USD 91, $81.90 at Amazon.com, £45.79 at Amazon.co.uk

You have this book, right?

Every seismic interpreter that's thinking about rock properties, AVO, inversion, or anything beyond pure basin-scale geological interpretation needs this book. And the MATLAB scripts.

Rock Physics Handbook

Gary Mavko, Tapan Mukerji & Jack Dvorkin (2009). Cambridge University Press, 511 pages, ISBN 978-0-521-19910-0. List price USD 100, $92.41 at Amazon.com, £40.50 at Amazon.co.uk

If QSI is the book for quantitative interpreters, this is the book for people helping those interpreters. It's the Aki & Richards of rock physics. So if you like sums, and QSI left you feeling unsatisfied, buy this too. It also has lots of MATLAB scripts.

Seismic Reflections of Rock Properties

Jack Dvorkin, Mario Gutierrez & Dario Grana (2014). Cambridge University Press, 365 pages, ISBN 978-0-521-89919-2. List price USD 75, $67.50 at Amazon.com, £40.50 at Amazon.co.uk

This book seems to be a companion to The Rock Physics Handbook. It feels quite academic, though it doesn't contain too much maths. Instead, it's more like a systematic catalog of log models — exploring the full range of seismic responses to rock properties.

Practical Seismic Data Analysis

Hua-Wei Zhou (2014). Cambridge University Press, 496 pages, ISBN 978-0-521-19910-0. List price USD 75, $67.50 at Amazon.com, £40.50 at Amazon.co.uk

Zhou is a professor at the University of Houston. His book leans towards imaging and velocity analysis — it's not really about interpretation. If you're into signal processing and tomography, this is the book for you. Mostly black and white, the book has lots of exercises (no solutions though).

Seismic Amplitude: An Interpreter's Handbook

Rob Simm & Mike Bacon (2014). Cambridge University Press, 279 pages, ISBN 978-1-107-01150-2 (hardback). List price USD 80, $72 at Amazon.com, £40.50 at Amazon.co.uk

Simm is a legend in quantitative interpretation and the similarly lauded Bacon is at Ikon, the pre-eminent rock physics company. These guys know their stuff, and they've filled this superbly illustrated book with the essentials. It belongs on every interpreter's desk.

Seismic Data Analysis Techniques...

Enwenode Onajite (2013). Elsevier. 256 pages, ISBN 978-0124200234. List price USD 130, $113.40 at Amazon.com. £74.91 at Amazon.co.uk.

This is the only book of the collection I don't have. From the preview I'd say it's aimed at undergraduates. It starts with a petroleum geology primer, then covers seismic acquisition, and seems to focus on processing, with a little on interpretation. The figures look rather weak, compared to the other books here. Not recommended, not at this price.

NOTE These prices are Amazon's discounted prices and are subject to change. The links contain a tag that gets us commission, but does not change the price to you. You can almost certainly buy these books elsewhere.

July 15, 2014

The event that connects like the web

July 15, 2014/ Evan Bianco

Last week, Matt, Ben, and I attended SciPy 2014, the 13th annual scientific computing with Python conference. On a superficial level, it was just another conference. But there were other elements, brought forth by the organizers and participants (definitely not just attendees) and slowly revealed over the week. Together, the community created the conditions for a truly remarkable experience.

Immutable accessibility

By design, the experience starts before the event, and continues after it is over. Before each of the four half-day tutorials I attended, the instructors posted their teaching materials, code, and setup instructions. Most oral presentations did the same. Most code and content was served through GitHub or Bitbucket and instructions were posted using Mozilla's Etherpad. Ultimately the tools don't matter — it's the intention that is important. Instructors and speakers plan to connect.

Enhancing the being there

Beyond talks and posters, here are some examples of other events that were executed with engagement in mind:

Keynote presentations. If a keynote is truly key, design the schedule so that everyone can show up — they're a great way to start the day on a high note.
Birds of a Feather sessions are better than a panel discussion or Q&A. Run around with a microphone, and record notes in Etherpad.
Lightning talks at the end of the day. Anyone can request 5 minutes on a show & tell. It was the first time I've heard applause erupt in the middle of a talk — and it happened several times.
Developer sprints take an hour to teach newbies how to become active members of your community or your project. Then spend two days showing them how you work.

Record all the things

SciPy is not a conference, it's a hypermedia stream that connects networks across organizational boundaries. And it happens in real time — I overheard several people remarking in astonishment that the video of so-and-so's talk earlier that same morning was already posted online. My trained habit of frantic note-taking was redundant, freeing my concentration for more active listening. Instructors and presenters published their media online, and the majority of presenters pulled up interactive IPython notebooks in the browser and executed code on the fly.

As an example of this, here's Karl Schleicher of Sergey Fomel's group at UT, talking about reproducing the results from a classic paper in The Leading Edge, Spitz (1999):

We need this

On Friday evening Matt remarked to one of the sponsors, "This is the closest thing I have seen to what a conference should be". I think what he meant by that is that it should be about connecting. It should be about pushing our work out to the largest possible scope. It should be open by default, and designed to support ideas and conversations long after it is over. Just like all the things that the web is for as well.

Our question: Can we help SEG, AAPG, or EAGE deliver this to our community? Or do we have to go and build it?

July 11, 2014

Geophysics at SciPy 2014

July 11, 2014/ Matt Hall

Wednesday was geophysics day at SciPy 2014, the conference for scientific Python in Austin. We had a mini-symposium in the afternoon, with 4 talks and 2 lightning talks about posters.

All the talks

Here's what went on in the session...

Matt Hall — Modelr seismic models
Patrick Cole — PyGMI grav-mag modeling
Joe Kington, Chevron — 3D seismic viz in Python
Leo Uieda, Universidade do Estado do Rio de Janeiro — Fatiando poster preview (full 2013 talk)
Rowan Cockett, UBC and 3pt Science — SimPEG poster preview
Karl Schleicher — Prototyping geophysical algorithms

The talks should all be online eventually. For now, you can watch my talk and Joe's (awesome) talk right here...

And also...

There have been so many other highlights at this amazing conference that I can't resist sharing a couple of the non-geophysical gems...

Here's a 31 TB file on the Internet, a dark matter simulation containing 1 trillion particles... that you can read from arbitrarily with a few lines of Python.
There is so much interactive plotting awesomeness coming to Python: Plotly, Bokeh (right), mpld3... And collaboration is here too in the Jupyter coLaboratory.

Last thing... If you use the scientific Python stack in your work, please consider giving as generously as you can to the NumFOCUS Foundation. Support open source!

July 09, 2014

SciPy will eat the world... in a good way

July 09, 2014/ Matt Hall

We're at the SciPy 2014 conference in Austin, the big giant meetup for everyone into scientific Python.

One surprising thing so far is the breadth of science and computing in play, from astronomy to zoology, and from AI to zero-based indexing. It shouldn't have been surprising, as SciPy.org hints at the variety:

There's really nothing you can't do in the scientific Python ecosystem, but this isn't why SciPy will soon be everywhere in science, including geophysics and even geology. I think the reason is IPython Notebook, and new web-friendly ways to present data, directly from the computing environment to the web — where anyone can see it, share it, interact with it, and even build on it in their own work.

Teaching STEM

In Tuesday's keynote, Lorena Barba, an uber-prof of engineering at The George Washington University, called IPython Notebook the killer app for teaching in the STEM fields. She has built two amazing courses in Notebook: 12 Steps to Navier–Stokes and AeroPython (right), and more are on the way. Soon, perhaps through Jupyter CoLaboratory (launching in alpha today), perhaps with the help of tools like Bokeh or mpld3, the web versions of these notebooks will be live and interactive. Python is already the new star of teaching computer science, and web-friendly super-powers will continue to push this.

Let's be extra clear: if you are teaching geophysics using a proprietary tool like MATLAB, you are doing your students a disservice if you don't at least think hard about moving to Python. (There's a parallel argument for OpendTect over Petrel, but let's not get into that now.)

Reproducible and presentable

Can you imagine a day when geoscientists wield these data analysis tools with the same facility that they wield other interpretation software? With the same facility that scientists in other disciplines are already wielding them? I can, and I get excited thinking about how much easier it will be to collaborate with colleagues, document our workflows (for others and for our future selves), and write presentations and papers for others to read, interact with, and adapt for their own work.

To whet your appetite, here's the sort of thing I mean (not interactive, but here's the code)...

If you agree that it's needed, I want to ask: What traditions or skill gaps are in the way of this happening? How can our community of scientists and engineers drive this change? If you disagree, I'd love to hear why.

Update on 2014-07-11 12:52 by Matt Hall

Here's Lorena Barba's keynote talk on reproducibility, the flipped classroom, and teaching with Notebooks. Skip to 11:30 for the start of Lorena's talk.

July 06, 2014

Looking forward to SciPy 2014

July 06, 2014/ Matt Hall

This week the Agile crew is at the SciPy conference in Austin, Texas. SciPy is a scientific library for the Python programming language, and the eponymous conference is the annual meetup for the physicists, astronomers, economists — and even the geophysicists! — that develop and use SciPy.

What is SciPy?

Python is an awesome high-level programming language. It's awesome because...

Python is free and open source.
Python is easy to learn and quite versatile.
Python has hundreds of great open source extensions, called libraries.
The Python ecosystem is actively developed by programmers at Google, Enthought, Continuum, and elsewhere.
Python has a huge and talkative user community, so finding help is easy.

All of these factors make it ideal for crunching and visualizing scientific data. The most important of the libraries is NumPy, which provides efficient linear algebra operations — essential for handling big vectors and matrices. SciPy builds on NumPy to provide signal processing, statistics, and optimization. There are other packages in the same ecosystem for plotting, data management, and so on.

If you follow this blog, you know we have been getting into code lately. We think that languages like Python, GNU Octave, and R (a statistical language) are a core competency for geoscientists. That's why we want to help geoscientists learn Python, and why we organize hackathons, and why we keep going on about it on the blog.

What's going on in Austin?

Technical organizers Katy Huff and Serge Rey have put together a fantastic schedule including 2 days of tutorials (already underway), 3 days of technical talks and posters, and 2 days of sprints (focused coding sessions). Interspersed throughout the talk days are 'Birds of a Feather' meetups for various special-interest groups, and more social gatherings. It's exactly what a scientific conference should be: active learning, content, social, hacking, and unstructured discussion.

Here are some of the things I'm most looking forward to:

Tutorial: Image analysis in Python with scipy and scikit-image with Juan Nunez-Iglesias and Tony Yu.
Talk: GeoPandas: Geospatial data + pandas by Kelsey Jordahl.
Talk: PyMC: Markov chain Monte Carlo in Python by Chris Fonnesbeck.
Session: The geophysics mini-symposium, of course! All Wednesday afternoon.
Talk: How to choose a good colour map by Damon McDougall.

If you're interested in hearing about what's going on in this corner of the geophysical and scientific computing world, tune in this week to read more. We'll be posting regularly to the blog, or you can follow along on the #SciPy2014 Twitter hashtag.

July 02, 2014

Well tie calculus

July 02, 2014/ Evan Bianco

As Matt wrote in March, he is editing a regular Tutorial column in SEG's The Leading Edge. I contributed the June edition, entitled Well-tie calculus. This is a brief synopsis only; if you have any questions about the workflow, or how to get started in Python, get in touch or come to my course.

Synthetic seismograms can be created by doing basic calculus on traveltime functions. Integrating slowness (the reciprocal of velocity) yields a time-depth relationship. Differentiating acoustic impedance (velocity times density) yields a reflectivity function along the borehole. In effect, the integral tells us where a rock interface is positioned in the time domain, whereas the derivative tells us how the seismic wavelet will be scaled.
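To make the calculus concrete, here is a minimal sketch in plain Python. It is not the code from the tutorial: the function names, the 100 m sample spacing, and the log values are all made up for illustration. Integration becomes a running sum of slowness times depth step, and differentiation becomes a normalized difference of impedance between neighbouring samples.

```python
from itertools import accumulate


def two_way_time(slowness_us_per_m, dz_m):
    """Integrate slowness over depth (a running sum) and double it.
    Returns two-way time in seconds at the base of each depth sample."""
    one_way = accumulate(s * 1e-6 * dz_m for s in slowness_us_per_m)
    return [2.0 * t for t in one_way]


def reflection_coefficients(impedance):
    """Normalized difference of impedance between neighbouring samples:
    a discrete stand-in for differentiating the impedance log."""
    return [(z2 - z1) / (z2 + z1) for z1, z2 in zip(impedance, impedance[1:])]


# Made-up logs sampled every 100 m: sonic in microseconds per metre,
# density in kg/m^3.
sonic = [500.0, 500.0, 250.0, 250.0]
density = [2200.0, 2200.0, 2500.0, 2500.0]

velocity = [1e6 / s for s in sonic]  # m/s
impedance = [v * d for v, d in zip(velocity, density)]

twt = two_way_time(sonic, dz_m=100.0)
rc = reflection_coefficients(impedance)
print(twt)
print(rc)
```

With real logs you would probably reach for NumPy instead (its `cumsum` and `diff` functions do the same jobs on arrays), but the arithmetic is the same.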

This tutorial starts from nothing more than sonic and density well logs, and some seismic trace data (from the #opendata Penobscot dataset in dGB's awesome Open Seismic Repository). It steps through a simple well-tie workflow, showing every step in an IPython Notebook:

Loading data with the brilliant LASReader
Dealing with incomplete, noisy logs
Computing the time-to-depth relationship
Computing acoustic impedance and reflection coefficients
Converting the logs to 2-way travel time
Creating a Ricker wavelet
Convolving the reflection coefficients with the wavelet to get a synthetic
Making an awesome plot, like so...
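The wavelet and convolution steps in the list above can be sketched as follows. This is an illustrative stand-in, not the tutorial's notebook code: it uses only the standard library, the function names are hypothetical, and the 25 Hz peak frequency, 4 ms sample interval, and reflectivity series are assumed values.

```python
import math


def ricker(f_hz, dt_s, half_length_s):
    """Zero-phase Ricker wavelet with peak frequency f_hz, sampled every
    dt_s seconds from -half_length_s to +half_length_s."""
    n = int(round(half_length_s / dt_s))
    wavelet = []
    for i in range(-n, n + 1):
        a = (math.pi * f_hz * i * dt_s) ** 2
        wavelet.append((1.0 - 2.0 * a) * math.exp(-a))
    return wavelet


def convolve_same(signal, kernel):
    """Convolve signal with an odd-length kernel and keep the central part,
    so the output has the same length as the signal."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for j, k in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < len(signal):
                total += signal[idx] * k
        out.append(total)
    return out


# Made-up reflectivity series in two-way time: two isolated interfaces.
rc = [0.0] * 101
rc[30] = 0.5
rc[70] = -0.25

wavelet = ricker(f_hz=25.0, dt_s=0.004, half_length_s=0.064)
synthetic = convolve_same(rc, wavelet)
print(max(synthetic), min(synthetic))
```

Because the wavelet is zero phase with unit peak amplitude, each isolated reflection coefficient shows up in the synthetic as a wavelet centred on the interface and scaled by the coefficient. NumPy's `convolve` with `mode='same'` should give the same result on arrays.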

Final thoughts

If you find yourself stretching or squeezing a time-depth relationship to make synthetic events align better with seismic events, take the time to compute the implied corrections to the well logs. Differentiate the new time-depth curve. How much have the interval velocities changed? Are the rock properties still reasonable? Synthetic seismograms should adhere to the simple laws of calculus — and not imply unphysical versions of the earth.
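Here is a small sketch of that check, using made-up time-depth pairs and a hypothetical helper function. It differentiates the time-depth curve before and after an edit, then reports how much each interval velocity has changed.

```python
def interval_velocities(depth_m, twt_s):
    """Differentiate a time-depth curve: v = 2 * dz / dt between
    successive points (the factor 2 accounts for two-way time)."""
    pairs = list(zip(depth_m, twt_s))
    return [2.0 * (z2 - z1) / (t2 - t1)
            for (z1, t1), (z2, t2) in zip(pairs, pairs[1:])]


# Made-up time-depth pairs, before and after an interpreter moves the
# event at 1200 m up by 40 ms to improve the tie.
depth = [1000.0, 1100.0, 1200.0, 1300.0]
twt_before = [1.00, 1.08, 1.16, 1.24]
twt_after = [1.00, 1.08, 1.12, 1.24]

v_before = interval_velocities(depth, twt_before)
v_after = interval_velocities(depth, twt_after)

for vb, va in zip(v_before, v_after):
    change = 100.0 * (va - vb) / vb
    print(f"{vb:7.0f} m/s -> {va:7.0f} m/s ({change:+.0f}%)")
```

In this toy case, a 40 ms squeeze doubles one interval velocity from 2500 m/s to 5000 m/s and drops the next to about 1667 m/s, which is the sort of implied change worth questioning before accepting the tie.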

Matt is looking for tutorial ideas and offers to write them. Here are the author instructions. If you have an idea for something, please drop him a line.
