The (bad) stuff of legend

What is a legend? Merriam–Webster says:

  1. A story from the past that is believed by many people but cannot be proved to be true.
  2. An explanatory list of the symbols on a map or chart.

I think we can combine these:

An explanatory list from the past that is believed by many to be useful but which cannot be proved to be.

Maybe that goes too far, sometimes you need a legend. But often, very often, you don't. At the very least, you should always try hard to make the legend irrelevant. Why, and how, can you do this? 

A case study

On the right is a non-scientific caricature of a figure from a paper I just finished reviewing for Geophysics. I won't give any more details because I don't want to pick on it unduly — lots of authors make the same mistakes.

Here are some of the things I think are confusing about this figure, detracting from the science in the paper. 

  • Making the reader cross-reference the line decoration with the legend makes it harder to make the comparison you're asking them to make. Just label the lines directly. 
  • Using unhelpful, generic names like 1, 2, and 3 for the models leads the reader into cross-reference Inception. The models were shown and explained on the previous page. 
  • Inception again: the models 1, 2, and 3 were shown in the previous figure parts (a), (b), and (c) respectively. So I had to cross-reference deeper still to really find out about them. 
  • The paper used colour elsewhere, so the use of black and white line decoration here seems unnecessary. There are other ways to ensure clarity if the paper is photocopied.
  • Everything on the same visual plane, so to speak, so the chart cannot take any more detail, such as gridlines. 

Getting better

I have tried to fix some of this in the version of the figure shown here. It's the same size as the original. The legend, such as it is, is now a visual key to the models. Careful juxtaposition of figures could obviate the need even for this extra key. The idea would be to use the colours and names of the models in every figure, to link them more intuitively.

The principles at work:

  • Reduce the fatigue of reading by labeling things directly.
  • Avoid using 'a' and 'b' or other generic names. Call the parts before and after, or 8 ms gate and 16 ms gate
  • Put things you want people to compare next to each other: models with data, output with input, etc. 
  • Use less ink for decoration, more ink for data. Gently direct the reader's attention. 

I'm sure there are other improvements we could make. Do you have any tips to share for making better figures? Leave them in the comments. 


Update, 30 Jan 2015

Some great comments came in today, and the point about black and white is well taken. Indeed, our 52 Things books are all black and white, and I end up transforming most images and figures to (I hope) make them clearer without colour. Here's how I'd do this figure in black and white.

On breaking rules

Humans have a complicated relationship with rules. 

One of the mantras of the 21st century economy is 'first, break all the rules'. If the rules are merely stale conventions, then yes: break away. But it's tempting to go too far and scoff at all rules, and even laws, as the petty creations of boring bureaucrats, declaring, "Rules? Pah! We won't be tied down by your rules!"

But it's not that simple. We like some rules, like the rule about not smoking in aeroplanes, or parking in your reserved parking place. When others break those rules, it's annoying. And rules that define boundaries can heighten, not hinder, creativity and impact — look at code golf, Yves Klein, haiku (though the 5–7–5 thing is a myth), and Twitter

So what to do about a rule we don't like? There are usually a few options:

  1. Obey it. The rule worked! But maybe not for you.
  2. Change it. This might work, but it might take a whileGood luck!
  3. Break it. Easy! Just pretend it's not there. There's no need to feel bad: everyone else is doing it.

Is that it? Be boring, be brave, or stick it to the man? No, it's a false trichotomy. There is a fourth option:

  1. Make the rule irrelevant. Build or contribute to a new version of reality where the rule no longer applies.

In other words, don't break stupid rules — that doesn't change anything. Better to make your point by subverting the entire foundation of stupid rules. For example:

  • When lawyer Larry Lessig decided he'd had enough of copyright restrictions, he didn't say 'screw you guys' and start downloading movies on BitTorrent. He started Creative Commons and transformed the way the sharing economy functions. Result: not just reduced revenue, but reduced impact of traditional media — far more important.
  • The local government will partly fund training for small businesses from a marketing consultant. Apparently, it's common to game this system by hiring a consultant under this program, then simply having them do work for hire — website, branding, and so on. But these are normal business expenses; instead of coercing a broken system to channel public money into private enterprise, we'd all be better off beating a new path to small-scale investment and collaboration. 
  • There's a young would-be Robin Hood in the geoscience publishing world, hosting copyrighted textbook PDFs for free download. He believes he's helping to rid the world of the tyranny of over-priced technical literature, but he's going about it the wrong way. Better to promote open-access literature, and be a champion of legal re-use. This denies 'the establishment' their impact, instead of lauding it, and helps spread truly shareable content.

Next time you come across a rigid rule you don't like, don't break it. Ask instead how you can make the rule not matter.

No Trespassing image CC-BY-SA by Michael Dorausch on Flickr.

Test driven development geoscience

Sometimes I wonder how much of what we do in applied geoscience is really science. Is it really about objective enquiry? Do we form hypotheses, then test them? The scientific method is largely a caricature — science is more accidental and more fun than a step-by-step recipe — but I think our field sometimes falls short of even basic rigour. Go and sit through a conference session on seismic attribute analysis some time and you'll see what I mean. Let's just say there's a lot of arm-waving and shape-ology. 

Learning from geeks

We've written before about the virtues of the software engineering community. Innovation has been so rapid recently, that I think it's a great place to find interpretation hacks like pair picking. Learning about and experiencing the amazing productivity of programmers is one of the reasons I think all scientists should learn to program (but not learn to be a programmer). You'll find out about concepts like version control, user-centered design, and test-driven development. Programmers embrace these ideas to a greater or lesser degree, depending on their goals and those of the project they're working on. But all programmers know them.

I'm especially into test-driven development at the moment. The idea is that before implementing a new module or feature, you write a test — a short program that gives the new thing some input, inspects the output, and compares it to a known answer. The first version of the code will likely fail the test. The idea is to refactor the code until it passes the test. Then you add that test to a suite that runs every time you build anything in the same project, so you know your new thing doesn't get broken by something else later. And you aren't tempted to implement features that weren't part of the test.

Fail — Refactor — Pass

Imagine test-driven development geology (or any other kind of geoscience). What would that look like?

  • When planning wells, we often do write tests — they're called prognoses. But the comparison with the result is rarely formalized or quantified, especially outside the target zone. Once the well is drilled, it becomes data and we move on. No-one likes to dwell on the poorly understood or error-prone, but naturally that's where the greatest room for improvement is.  
  • When designing a new seismic attribute, or embarking on a seismic processing project, we often have a vague idea of success in our heads, and that's about it. What if we explicitly defined an input test dataset, some wells or bits of wells, and set 'passing' performance criteria on those? "I won't interpret the reprocessed seismic until it improves those synthetic correlation coefficients by 40%."
  • When designing a seismic survey, we could establish acceptable criteria for trace density, minimum offset, azimuth distribution, and recording time, then use these as a cost function to find the best possible survey for our dollars. Wait, perhaps we actually do this one. Is seismic acquisition unusually scientific? Or is it an inherently more linear problem?

What do you think? Can you see ways to define 'success' before you begin, then somewhat quantitatively compare your results with that? Ideas wanted!

Seismic survey layout: from theory to practice

Up to this point, we've modeled the subsurface moveout and the range of useful offsets, we've build an array of sources and receivers, and we've examined the offset and azimuth statistics in the bins. And we've done it all using open source Python libraries and only about 100 lines of source code. What we have now is a theoretical seismic program. Now it's time to put that survey on the ground. 

The theoretical survey

Ours is a theoretical plot because it idealizes the locations of sources and receivers, as if there were no surface constraints. But it's unlikely that we'll be able to put sources and receivers in perfectly straight lines and at perfectly regular intervals. Topography, ground conditions, buildings, pipelines, and other surface factors have an impact on where stations can't be placed. One of the jobs of the survey designer is to indicate how far sources and receivers can be skidded, or moved away from their theoretical locations before rejecting them entirely.

From theory to practice

In order to see through the noise, we need to collect lots of traces with plenty of redundancy. The effect of station gaps or relocations won't be as immediately obvious as dead pixels on a digital camera, but they can cause some bins to have fewer traces than the idealized layout, which could be detrimental to the quality of imaging in that region. We can examine the impact of moving and removing stations on the data quality, by recomputing the bin statistics based on the new geometries, and comparing them to the results we were designing for. 

When one station needs to be adjusted, it may make sense to adjust several neighbouring points to compensate, or to add more somewhere nearby. But how can we tell what makes sense? The points should resemble the idealized fold and minimum offset statistics bin by bin. For example, let's assume that we can't put sources or receivers in river valleys and channels. Say they are too steep, or water would destroy the instrumentation, or are otherwise off limits. So we remove the invalid points from our series, giving our survey a more realistic surface layout based on the ground conditions. 

Unlike the theoretical layout, we now have bins that aren't served by any traces at all so we've made them invisible (no data). On the right, bins that have a minimum offset greater than 800 m are highlighted in grey. Beneath these grey bins is where the onset of imaging would be the deepest, which would not be a good thing if we have interests in the shallow part of the subsurface. (Because seismic energy spreads out more or less spherically from the source, we will eventually undershoot all but the largest gaps.)

This ends the mini-series on seismic acquisition. I'll end with the final state of the IPython Notebook we've been developing, complete with the suggested edits of reader Jake Wasserman in the last post — this single change resulted in a speed-up of the midpoint-gathering step from about 30 minutes to under 30 seconds!

We want to know... How do you plan seismic acquisitions? Do you have a favourite back-of-the-envelope calculation, a big giant spreadsheet, or a piece of software you like? Let us know in the comments.

It goes in the bin

The cells of a digital image sensor. CC-BY-SA Natural Philo.

The cells of a digital image sensor. CC-BY-SA Natural Philo.

Inlines and crosslines of a 3D seismic volume are like the rows and columns of the cells in your digital camera's image sensor. Seismic bins are directly analogous to pixels — tile-like containers for digital information. The smaller the tiles, the higher the maximum realisable spatial resolution. A square survey with 4 million bins (or 4 megapixels) gives us 2000 inlines and 2000 crosslines to interpret, after processing the data of course. Small bins can mean high resolution, but just as with cameras, bin size is only one aspect of image quality.

Unlike your digital camera however, seismic surveys don't come with a preset number of megapixels. There aren't any bins until you form them. They are an abstraction.

Making bins

This post picks up where Laying out a seismic survey left off. Follow the link to refresh your memory; I'll wait here. 

At the end of that post, we had a network of sources and receivers, and the Notebook showed how I computed the midpoints of the source–receiver pairs. At the end, we had a plot of the midpoints. Next we'd like to collect those midpoints into bins. We'll use the so-called natural bins of this orthogonal survey — squares with sides half the source and receiver spacing.

Just as we represented the midpoints as a GeoSeries of Point objects, we will represent  the bins with a GeoSeries of Polygons. GeoPandas provides the GeoSeries; Shapely provides the geometries; take a look at the IPython Notebook for the code. This green mesh is the result, and will hold the stacked traces after processing.

bins_physical.png

Fetching the traces within each bin

To create a CMP gather like the one we modelled at the start, we need to grab all the traces that have midpoints within a particular bin. And we'll want to create gathers for every bin, so it is a huge number of comparisons to make, even for a small example such as this: 128 receivers and 120 sources make 15 320 midpoints. In a purely GIS environment, we could perform a spatial join operation between the midpoint and bin GeoDataFrames, but instead we can use Shapely's contains method inside nested loops. Because of the loops, this code block takes a long time to run.

# Make a copy because I'm going to drop points as I
# assign them to polys, to speed up subsequent search.
midpts = midpoints.copy()

offsets, azimuths = [], [] # To hold complete list.

# Loop over bin polygons with index i.
for i, bin_i in bins.iterrows():
    
    o, a = [], [] # To hold list for this bin only.
    
    # Now loop over all midpoints with index j.
    for j, midpt_j in midpts.iterrows():
        if bin_i.geometry.contains(midpt_j.geometry):
            # Then it's a hit! Add it to the lists,
            # and drop it so we have less hunting.
            o.append(midpt_j.offset)
            a.append(midpt_j.azimuth)
            midpts = midpts.drop([j])
            
    # Add the bin_i lists to the master list
    # and go around the outer loop again.
    offsets.append(o)
    azimuths.append(a)
    
# Add everything to the dataframe.    
bins['offsets'] = gpd.GeoSeries(offsets)
bins['azimuths'] = gpd.GeoSeries(azimuths)

After we've assigned traces to their respective bins, we can make displays of the bin statistics. Three common views we can look at are:

  1. A spider plot to illustrate the offset and azimuth distribution.
  2. A heat map of the number of traces contributing to each bin, usually called fold.
  3. A heat map of the minimum offset that is servicing each bin. 

The spider plot is easily achieved with Matplotlib's quiver plot:

spider_bubble_zoom.png

And the arrays representing our data are also quite easy to display as heatmaps of fold (left) and minimum offset (right): 

fold_and_xmin_physical.png

In the next and final post of this seismic survey mini-series, we'll analyze the impact of data quality when there are gaps and shifts in the source and receiver stations from these idealized locations.

Last thought: if the bins of a seismic survey are like a digital camera's image sensor, then what is the apparatus that acts like a lens? 

What does Agile actually do?

For the forgetful and/or the nostalgic, here's a reminder of what the old site looked like. If there's a particular feature you miss, please let us know!

For the forgetful and/or the nostalgic, here's a reminder of what the old site looked like. If there's a particular feature you miss, please let us know!

There's one question almost everybody asks us: "What do you guys actually do?". The more brazen get straight to the point: "How do you guys make money?". One way to answer this question, without making people wait till they meet us, is with the website. And our website has always been quite... bloggy. It's easy to see how we don't make money, less obvious how we do.

New website

After 4 years with the same design, we've given the site a new look. Four years is half a lifetime in web years, so we now have lots of upgraded features, like mobile responsiveness, integrated commerce, and lots of dynamic content. We can also make it a bit easier to see what people hire us for, and to hire us yourself. 

Here's a quick overview of some of the new content:

  • Services — things we do for money.
  • Courses — boost your skills as a scientist and communicator.
  • Products — things we make... for money, or for fun, or because they have to be done.
  • Shop — where you can buy boxes of books, and maybe more one day. 
  • Projects — some of the bigger things we're into right now, like Modelr and SubSurfWiki.

I still find it hard to describe what we do, and I kind of like it that way. We're not easy to pigeonhole. We're fortunate not only to have always had enough work, but also to work with some of the smartest and most energetic people in our industry. But I hope our new site goes some way to making it easier to find out how we might be able to help you and your organization be a bit more awesome. That's what we're here for. 


Some geeky footnotes

For anyone who's interested, the site is powered by Squarespace 7. We were previously on Squarespace 5. There's a lot of upside to a package deal like Squarespace, but of course you lose some flexibility. I might have thought about moving to another platform, perhaps WordPress and its open source goodness, but in the end the ease of transferring our 400+ blog posts was the deciding factor. 

For those of you using a news reader like The Old Reader to read our blog, please note that the old 'journal' feed URL no longer works — please update it to http://www.agilegeoscience.com/blog?format=RSS. The Feedburner feed is still active, at least for now.