Geocomputing: Call for papers

52 Things You Should Know About Geocomputing is in the works.

For previous books, we've reached out to people we know and trust. This felt like the right way to start our micropublishing project, because we had zero credibility as publishers, and were asking a lot from people to believe anything would come of it.

Now we know we can do it, but personal invitation means writing to a lot of people. We only hear back from about 50% of everyone we write to, and only about 50% of those ever submit anything. So each book takes about 160 invitations.

This time, I'd like to try something different, and see if we can truly crowdsource these books. If you would like to write a short contribution for this book on geoscience and computing, please have a look at the author guidelines. In a nutshell, we need about 600 words before the end of March. A figure or two is OK, and code is very much encouraged. Publication date: fall 2015.

We would also like to find some reviewers. If you would be available to read at least 5 essays and provide feedback to us and the authors, please let me know.

In keeping with past practice, we will be donating money from sales of the book to scientific Python community projects via the non-profit NumFOCUS Foundation.

What the cover might look like. If you'd like to write for us, please read the author guidelines.

Julia in a nutshell

Julia is the most talked-about language in the scientific Python community. Well, OK, maybe second to Python... but only just. I noticed this at SciPy in July, and again at EuroSciPy last weekend.

As promised, here's my attempt to explain why scientists are so excited about it.

Why is everyone so interested in Julia?

At a high level, Julia seems to solve what Steven Johnson (MIT) described at EuroSciPy on Friday as 'the two-language problem'. It's also known as Ousterhout's dichotomy. Basically, there are system languages (hard to use, fast), and scripting languages (easy to use, slow). Attempts to get the best of both worlds have tended to result in a bit of a mess. Until Julia.

Really though, why?

Cheap speed. Computer scientists adore C because it's rigorous and fast. Scientists and web developers worship Python because it's forgiving and usually fast enough. But the trade-off has led to various cunning ploys to get the best of both worlds, e.g. PyPy and Cython. Julia is perhaps the cunningest ploy of all, achieving speeds that compare with C, but with readable code, dynamic typing, garbage collection, multiple dispatch, and some really cool tricks like Unicode variable names that you enter in pure LaTeX. It even has a wonderfully concise one-line shorthand for defining functions.

Why is Julia so fast?

Machines don't understand programming languages — the code written by humans has to be translated into machine language in a process called 'compiling'. There are three approaches:

  • Compile the whole program to machine code ahead of time, as C and Fortran do.
  • Interpret the code at runtime, as classic Python does.
  • Compile at runtime, 'just in time' for execution, as Julia does.

Compiling makes languages fast, because the executed code is tuned to the task (e.g. in terms of the types of variables it handles), and to the hardware it's running on. Indeed, it's only by building special code for, say, integers, that compiled languages achieve the speeds they do.
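
To get a feel for the size of that gap, here is a quick toy comparison in Python (my own example, not from Steven's talk): the same sum computed with an interpreted Python loop and with NumPy's compiled routine. On most machines the compiled version is faster by an order of magnitude or more; that is roughly the gap Julia promises to close without leaving the high-level language.

# Toy comparison: an interpreted Python loop vs NumPy's compiled loop.
import timeit
import numpy as np

def py_sum(xs):
    total = 0
    for x in xs:      # every iteration is interpreted
        total += x
    return total

data = list(range(1000000))
arr = np.arange(1000000)

print(timeit.timeit(lambda: py_sum(data), number=10))  # interpreted loop
print(timeit.timeit(lambda: arr.sum(), number=10))     # compiled loop under the hood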

Julia is compiled, like C or Fortran, so it's fast. However, unlike C and Fortran, which are compiled before execution, Julia is compiled at runtime ('just in time' for execution). So it looks a little like an interpreted language: you can write a script, hit 'run' and it just works, just like you can with Python.

You can even see what the generated machine code looks like. (Don't worry, I can't read it either.)

But how is it still dynamically typed?

Because the compiler can only build machine code for specific types — integers, floats, and so on — most compiled languages have static typing. The upshot of this is that the programmer has to declare the type of each variable, making the code rather brittle. Compared to dynamically typed languages like Python, in which any variable can be any type at any time, this makes coding a bit... tricky. (A computer scientist might say it's supposed to be tricky — you should know the type of everything — but we're just trying to get some science done.)

So how does Julia cope with dynamic typing and still compile everything before execution? This is the clever bit: Julia scans the instructions and compiles for the types it finds — a process called type inference — then makes the machine code, and caches it. If you then call the same instructions but with a different type, Julia recompiles for that type, and caches the new machine code in another location. Subsequent runs use the appropriate machine code, without recompilation.

Metaprogramming

It gets better. By employing metaprogramming — on-the-fly code generation for special tasks — it's possible for Julia to be even faster than highly optimized Fortran code, in which metaprogramming is unpleasantly difficult. So, for example, in Fortran one might tolerate a relatively slow loop that can only be avoided with code generation tricks; in Julia the faster route is much easier. Steven showed a nice example of this in his talk.

Interoperability and parallelism

It gets even better. Julia has been built with interoperability in mind, so calling C — or Python — from inside Julia is easy. Projects like Jupyter will only push this further, and I expect Julia to soon be the friendliest way to speed up that stubborn inner NumPy loop. And I'm told a lot of thoughtful design has gone into Julia's parallel processing libraries... I have never found an easy way into that world, so I hope it's true.


I'm not even close to being able to describe all the other cool things Julia, which is still a young language, can do. Much of it will only be of interest to 'real' programmers. In many ways, Julia seems to be 'Python for C programmers'.

If you're really interested, read Steven's slides and especially his notebooks. Better yet, just install Julia and IJulia, and play around. Here's another tutorial and a cheatsheet to get you started.

Highlights from EuroSciPy

In July, Agile reported from SciPy in Austin, Texas, one of several annual conferences for people writing scientific software in the Python programming language. I liked it so much I was hungry for more, so at the end of my recent trip to Europe I traveled to the city of Cambridge, UK, to participate in EuroSciPy.

The conference was quite a bit smaller than its US parent, but still offered 2 days of tutorials, 2 days of tech talks, and a day of sprints. It all took place in the impressive William Gates Building, just west of the beautiful late Medieval city centre, and just east of Schlumberger's cool-looking research centre. What follows are my highlights...

Okay you win, Julia

Steven Johnson, an applied mathematician at MIT, gave the keynote on the first morning. His focus was Julia, the current darling of the scientific computing community, and part of a new ecosystem of languages that seek to cooperate, not compete. I'd been sort of ignoring Julia, in the hope that it might go away and let me focus on Python, the world's most useful language, and JavaScript, the world's most useful pidgin... but I don't think scientists can ignore Julia much longer.

I started writing about what makes Julia so interesting, but it turned into another post — up next. Spoiler: it's speed. [Edit: Here is that post! Julia in a nutshell.]

Learning from astrophysics

The Astropy project is a truly inspiring community — in just 2 years it has synthesized a dozen or so disparate astronomy libraries into an increasingly coherent and robust toolbox for astronomers and astrophysicists. What does this mean?

  • The software is well-tested and reliable.
  • Datatypes and coordinate systems are rich and consistent.
  • Documentation is useful and evenly distributed.
  • There is a tangible project to rally developers and coordinate funding.

Geophysicists might even be interested in some of the components of Astropy and the related SunPy project, for example:

  • astropy.units, just part of the ever-growing astropy library, as a unit conversion and quantity handler to compare with pint (see the little sketch after this list).
  • sunpy's datatypes map and spectra, for kinds of data that need special methods.
  • asv is a code-agnostic benchmarking library, a bit like freebench.
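
To give a flavour of astropy.units, here is a minimal sketch (my own example, not from the conference): quantities carry their units through arithmetic and convert on demand.

import astropy.units as u

twt = 1.850 * u.s        # two-way travel time
v = 2500 * u.m / u.s     # interval velocity
depth = v * twt / 2      # units propagate through the arithmetic
print(depth.to(u.km))    # 2.3125 km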

Speed dating for scientists

Much of my work is about connecting geoscientists in meaningful collaboration. There are several ways to achieve this, other than through project work: unsessions, wikis, hackathons, and so on. Now there's another way: speed dating.

Okay, it doesn't quite get to the collaboration stage, but Vaggi and his co-authors shared an ingenious way to connect people and give their professional relationship the best chance of success (an amazing insight, a new algorithm, or some software). They asked everyone at a small 40-person conference to complete a questionnaire that asked, among other things, what they knew about, who they knew, and, crucially, what they wanted to know about. Then they applied graph theory to find the most desired new connections, and gave the scientists five 10-minute 'dates' with scientists whose interests overlapped with theirs, and five more with scientists who knew about fields that were new to them. Brilliant! We have to try this at SEG.
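
For the curious, here is a toy sketch of the general idea, not the authors' actual algorithm (see their paper and GitHub code below for that). It builds an interest-overlap matrix from a made-up questionnaire and pairs delegates using a maximum-weight matching:

# Toy sketch only: made-up questionnaire data, not the authors' method.
import numpy as np
import networkx as nx

# Rows are delegates, columns are topics (1 = interested in the topic).
interests = np.array([[1, 1, 1, 0],
                      [1, 1, 0, 0],
                      [0, 0, 1, 1],
                      [1, 0, 1, 1]])

overlap = interests @ interests.T        # pairwise similarity of interests

G = nx.Graph()
for i in range(len(interests)):
    for j in range(i + 1, len(interests)):
        G.add_edge(i, j, weight=int(overlap[i, j]))

pairs = nx.max_weight_matching(G)        # one round of 'dates'
print(pairs)

A second matrix, rewarding unfamiliar topics rather than shared ones, would drive the other round of dates in the same way.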

Vaggi, F, T Schiavinotto, A Csikasz-Nagy, and R Carazo-Salas (2014). Mixing scientists at conferences using speed dating. Poster presentation at EuroSciPy, Cambridge, UK, August 2014. Code on GitHub.

Vaggi, F, T Schiavinotto, J Lawson, A Chessel, J Dodgson, M Geymonat, M Sato, R Carazo Salas, A Csikasz Nagy (2014). A network approach to mixing delegates at meetings. eLife, 3. DOI: 10.7554/eLife.02273

Other highlights

  • sumatra to generate and keep track of simulations.
  • vispy, an OpenGL-based visualization library, now has higher-level, more Pythonic components.
  • Ian Ozsvald's IPython add-on for RAM usage.
  • imageio for lightweight I/O of image files.
  • nbagg backend for matplotlib version 1.4, bringing native (non-JS) interactivity.
  • An on-the-fly kernel chooser in upcoming IPython 3 (currently in dev).

All in all, the technical program was a great couple of days, filled with the usual note-taking and hand-shaking. I had some good conversations around my poster on modelr. I got a quick tour of the University of Cambridge geophysics department (thanks to @lizzieday), which made me a little nostalgic for British academic institutions. A fun week!

Slicing seismic arrays

Scientific computing is largely made up of doing linear algebra on matrices, and then visualizing those matrices for their patterns and signals. It's a fundamental concept, and there is no better example than a 3D seismic volume.

Seeing in geoscience, literally

Digital seismic data is nothing but an array of numbers, decorated with header information, sorted and processed along different dimensions depending on the application.

In Python, you can index into any sequence, whether it be a string, list, or array of numbers. For example, we can index into the word 'geosciences' at position 3 (counting from 0) to select the fourth character, the letter 's':

>>> word = 'geosciences'
>>> word[3]
's'

Or, we can slice the string with the syntax word[start:end:step] to produce a sub-sequence of characters. Note also how we can index backwards with negative numbers, or skip indices to use defaults:

>>> word[3:-1]  # From the 4th character to the penultimate character.
'science'
>>> word[3::2]  # Every other character from the 4th to the end.
'sine'

Seismic data is a matrix

In exactly the same way, we index into a multi-dimensional array in order to select a subset of elements. Slicing and indexing are a cinch using NumPy, the numerical library for crunching numbers. Let's look at an example... if data is a 3D array of seismic amplitudes arranged as (inline, crossline, time sample):

timeslice = data[:,:,122] # Slice at index 122 along the third dimension (time).
inline = data[30,:,:]     # Slice at index 30 along the first dimension (inline).
crossline = data[:,60,:]  # Slice at index 60 along the second dimension (crossline).

Here we have sliced all of the inlines and crosslines at a specific travel time index, to yield a time slice. We have sliced all the crossline traces along an inline, and we have sliced the inline traces along a single crossline. There's no reason for the slices to remain orthogonal, however, and we could, if we wished, index through the multi-dimensional array and extract an arbitrary combination of all three.
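
To make that concrete, here is a minimal sketch (my own example, with a random stand-in for a real volume) of pulling out an arbitrary diagonal traverse using NumPy's fancy indexing, where we index with arrays of positions instead of single integers:

import numpy as np

data = np.random.rand(100, 80, 250)   # stand-in for a (inline, crossline, time) volume

# Walk diagonally across the survey, taking the full trace at each step.
n_steps = 50
inl = np.linspace(10, 90, n_steps).astype(int)   # inline indices
xln = np.linspace(20, 70, n_steps).astype(int)   # crossline indices

traverse = data[inl, xln, :]   # a 2D section, shape (n_steps, n_samples)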

Questions involving well logs (a 1D array), cross sections (2D), and geomodels (3D) can all be addressed with the rigours of linear algebra and digital signal processing. An essential step in working with your data is treating it as arrays.

View the notebook for this example, or get the notebook from GitHub and play around with the code.

Sign up!

If you want to practise slicing your data into bits, and learn about the other power tools you can build, the Agile Geocomputing course will be running twice in the UK this summer. Click one of the buttons below to buy a seat.

Eventbrite - Agile Geocomputing, Aberdeen

Eventbrite - Agile Geocomputing, London

More locations in North America will follow in the fall. If you would like us to bring the course to your organization, get in touch.

A long weekend of creative geoscience computing

The Rock Hack is in three weeks. If you're in Houston, for AAPG or otherwise, this is going to be a great opportunity to learn some new computer skills, build some tools, or just get some serious coding done. The Agile guys — me, Evan, and Ben — will be hanging out at START Houston, laptops open, all day 5 and 6 April, about 8:30 till 5. The breakfast burritos and beers are on us.

Unlike the geophysics hackathon last September, this won't be a contest. We're going to try a more relaxed, unstructured event. So don't be shy! If you've always wanted to try building something but don't know where to start, or just want to chat about The Next Big Thing in geoscience or technology — please drop in for an hour, or a day.

Here are some ideas we're kicking around for projects to work on:

  • Sequence stratigraphy calibration app to tie events to absolute geologic time and to help interpret systems tracts.
  • Wireline log 'attributes'.
  • Automatic well-to-well correlation.
  • Facies recognition from core.
  • Automatic photomicrograph interpretation: grain size, porosity, sorting, and so on.
  • A mobile app for finding and capturing data about outcrops.
  • An open source basin modeling tool.

Short course

If you feel like a short course would get you started faster, then come along on Friday 4 April. Evan will be hosting a 1-day course, leading you through getting set up for learning Python, learning some syntax, and getting started on the path to scientific computing. You won't have super-powers by the end of the day, but you'll know how to get them.

Eventbrite - Agile Geocomputing

The course includes food and drink, and lots of code to go off and play with. If you've always wanted to get started programming, this is your chance!

To make up microseismic

I am not a proponent of making up fictitious data, but for the purposes of demonstrating technology, why not? This post is the third in a three-part follow-up from the private beta I did in Calgary a few weeks ago. You can check out the IPython Notebook version too. If you want more of this in person, sign up at the bottom or drop us a line. We want these examples to be easily readable, especially if you aren't a coder, so please let us know how we are doing.

Start by importing some packages that you'll need into the workspace,

%pylab inline
import numpy as np
from scipy.interpolate import splprep, splev
import matplotlib.pyplot as plt
import mayavi.mlab as mplt
from mpl_toolkits.mplot3d import Axes3D

Define a borehole path

We define the trajectory of a borehole using a series of x, y, z points, and make each component of the borehole an array. If we had a real well, we would load the numbers from the deviation survey in just the same way.

trajectory = np.array([[   0,   0,    0],
                       [   0,   0, -100],
                       [   0,   0, -200],
                       [   5,   0, -300],
                       [  10,  10, -400],
                       [  20,  20, -500],
                       [  40,  80, -650],
                       [ 160, 160, -700],
                       [ 600, 400, -800],
                       [1500, 960, -800]])
x = trajectory[:,0]
y = trajectory[:,1]
z = trajectory[:,2]

But since we want the borehole to be continuous and smoothly shaped, we can up-sample the borehole by finding the B-spline representation of the well path,

smoothness = 3.0
spline_order = 3
nest = -1 # estimate of number of knots needed (-1 = maximal)
knot_points, u = splprep([x, y, z], s=smoothness, k=spline_order, nest=nest)

# Evaluate spline, including interpolated points
x_int, y_int, z_int = splev(np.linspace(0, 1, 400), knot_points)

ax = plt.axes(projection='3d')
ax.plot(x_int, y_int, z_int, color='grey', lw=3, alpha=0.75)
plt.show()

Define frac ports

Let's define a completion program so that our wellbore has 6 frac stages,

number_of_fracs = 6

and let's make it so that each one emanates from equally spaced frac ports spanning the bottom two-thirds of the well.

x_frac, y_frac, z_frac = splev(np.linspace(0.33, 1, number_of_fracs), knot_points)

Make a set of 3D axes, so we can plot the well path and the frac ports.

ax = plt.axes(projection='3d')
ax.plot(x_int, y_int, z_int, color='grey',
        lw=3, alpha=0.75)
ax.scatter(x_frac, y_frac, z_frac,
        s=100, c='grey')
plt.show()

Set a colour for each stage by cycling through red, green, and blue,

stage_color = []
for i in np.arange(number_of_fracs):
    color = (1.0, 0.1, 0.1)
    stage_color.append(np.roll(color, i))
stage_color = tuple(map(tuple, stage_color))

Define microseismic points

One approach is to create some dimensions for each frac stage and generate 100 points randomly within each zone. Each frac has an x half-length, y half-length, and z half-length. Let's also vary these randomly for each of the 6 stages. Define the dimensions for each stage:

frac_dims = []
half_extents = [500, 1000, 250]
for i in range(number_of_fracs):
    for j in range(len(half_extents)):
        dim = np.random.rand(3)[j] * half_extents[j]
        frac_dims.append(dim)  
frac_dims = np.reshape(frac_dims, (number_of_fracs, 3))

Plot microseismic point clouds with 100 points for each stage. The following code should launch a 3D viewer scene in its own window:

size_scalar = 100000
mplt.plot3d(x_int, y_int, z_int, tube_radius=10)
for i in range(number_of_fracs):
    x_cloud = frac_dims[i,0] * (np.random.rand(100) - 0.5)
    y_cloud = frac_dims[i,1] * (np.random.rand(100) - 0.5)
    z_cloud = frac_dims[i,2] * (np.random.rand(100) - 0.5)

    x_event = x_frac[i] + x_cloud
    y_event = y_frac[i] + y_cloud     
    z_event = z_frac[i] + z_cloud
    
    # Let's make the size of each point decrease with
    # distance from the frac port
    size = size_scalar / ((x_cloud**2 + y_cloud**2 + z_cloud**2)**0.002)
    
    mplt.points3d(x_event, y_event, z_event, size, mode='sphere', colormap='jet')

You can swap out the last line in the code block above with mplt.points3d(x_event, y_event, z_event, size, mode='sphere', color=stage_color[i]) to colour each event by its corresponding stage.

A day of geocomputing

I will be in Calgary in the new year and running a one-day version of this new course. To start building your own tools, pick a date and sign up:

Eventbrite - Agile Geocomputing    Eventbrite - Agile Geocomputing

News of the month

The last news of the year. Here's what caught our eye in December.

Online learning, at a price

There was an online university revolution in 2012 — think Udacity (our favourite), Coursera, edX, and others. Paradigm, often early to market with good new ideas, launched the Paradigm Online University this month. It's a great idea — but the access arrangement is the usual boring oil-patch story: only customers have access, and they must pay $150/hour — more than most classroom- and field-based courses! Imagine the value-add if it was open to all, or free to customers.

Android apps on your PC

BlueStacks is a remarkable new app for Windows and Mac that allows you to run Google's Android operating system on the desktop. This is potentially awesome news — there are over 500,000 apps on this platform. But it's only potentially awesome because it's still a bit... quirky. I tried running our Volume* and AVO* apps on my Mac and they do work, but they look rubbish. Doubtless the technology will evolve rapidly — watch this space. 

2PFLOPS HPC 4 BP

In March, we mentioned Total's new supercomputer, delivering 2.3 petaflops (quadrillion floating point operations per second). Now BP is building something comparable in Houston, aiming for 2 petaflops and 536 terabytes of RAM. To build it, the company has allocated 0.1 gigadollars to high-performance computing over the next 5 years.

Haralick textures for everyone

Matt wrote about OpendTect's new texture attributes just before Christmas, but the news is so exciting that we wanted to mention it again. It's exciting because Haralick textures are among the most interesting and powerful of multi-trace attributes — right up there with coherency and curvature. Their appearance in the free and open-source core of OpendTect is great news for interpreters.
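
If you want to play with the idea yourself, Haralick textures are statistics computed from a grey-level co-occurrence matrix (GLCM) in a small window of the image. Here is a minimal sketch using scikit-image (my own toy example, nothing to do with OpendTect's implementation); in recent versions the functions are called graycomatrix and graycoprops, in older ones greycomatrix and greycoprops:

import numpy as np
from skimage.feature import graycomatrix, graycoprops

# A stand-in for a small window of seismic amplitudes, scaled to 8-bit.
window = (np.random.rand(32, 32) * 255).astype(np.uint8)

glcm = graycomatrix(window, distances=[1], angles=[0],
                    levels=256, symmetric=True, normed=True)

print(graycoprops(glcm, 'contrast'))
print(graycoprops(glcm, 'homogeneity'))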

That's it for 2012... see you in 2013! Happy New Year.

This regular news feature is for information only. We aren't connected with any of these organizations, and don't necessarily endorse their products or services. Except OpendTect, which we definitely do endorse.

Touring vs tunnel vision

My experience with software started, and still largely sits, at the user end. More often than not, interacting with another's design. One thing I have learned from the user experience is that truly great interfaces are engineered to stay out of the way. The interface is only a skin atop the real work that software does underneath — taking inputs, applying operations, producing outputs. I'd say most users of computers don't know how to compute without an interface. I'm trying to break free from that camp. 

In The dangers of default disdain, I wrote about the power and control that the technology designer has over his users. A kind of tunnel is imposed that restricts the choices for interacting with data. And for me, maybe for you as well, the tunnel has been a welcome structure, directing my focus towards that distant point; the narrow aperture invokes at least some forward motion. I've unknowingly embraced the tunnel vision as a means of interacting without substantial choices, without risk, without wavering digressions. I think it's fair to say that without this tunnel, most travellers would find themselves stuck, incapacitated by the hard graft of touring over or around the mountain.

Tour guides instead of tunnels

But there is nothing to do inside the tunnel, no scenery to observe, just a black void between input and output. For some tasks, taking the tunnel is the only obvious and economic choice — all you want is to get stuff done. But choosing the tunnel means you will be missing things along the way. It's a trade off.

For getting from A to B, there are engineers to build tunnels, there are travellers to travel the tunnels, and there is a third kind of person altogether: tour guides take the scenic route. Building your own tunnel is a grand task, only worthwhile if you can find enough passengers to use it. The scenic route isn't just a casual lackadaisical approach. It's necessary for understanding the landscape; by taking it the traveller becomes connected with the territory. The challenge for software and technology companies is to expose people to the richness of their environment while moving them through at an acceptable pace. Is it possible to have a tunnel with windows?

Oil and gas operating companies are good at purchasing the tunnel access pass, but are not very good at building a robust set of tools to navigate the landscape of their data environment. After all, that is the thing that we travellers need to be in constant contact with. Touring or tunnelling? The two approaches may or may not arrive at the same destination, and they have different costs along the way, making it a different business.

What technology?

This is my first contribution to the Accretionary Wedge geology themed community blog. Charles Carrigan over at Earth-like Planet is hosting this month's topic, where he poses the question, "how do you perceive technology impacting the work that you do?" My perception of technology has matured, and will likely continue to change, but here are a few ways in which technology works for us at Agile.

My superpower

I was at a session in December where one of the activities was to come up with one (and only one) defining superpower. A comic-bookification of my identity. What is the thing that defines you? The thing that you are or will be known for? It was an awkward experience for most, a bold introspection to quickly pull out a memorable, but not too cheesy, superpower that fit our life. I contemplated my superhuman intelligence, and freakish strength... too immodest. The right choice was invisibility. That's my superpower. Transparency, WYSIWYG, nakedness, openness. And I realize now that my superpower is, not coincidentally, aligned with Agile's approach to technology. 

For some, technology is the conspicuous interface between us and our work. But conspicuous technology constrains your work, ordains it even. The real challenge is to use technology in a way that makes it invisible. Matt reminds me that how I did it isn't as important as what I did. Making the technology seem invisible means the user must be invisible as well. Ultimately, tools don't matter—they should slip away into the whitespace. Successful technology implementation is camouflaged. 

I is for iterate

Technology is not a source of ideas or insights, such as you'd find in the mind of an experienced explorationist or in a detailed cross-section or map. I'm sure you could draw a better map by hand. Technology is only a vehicle that can deliver the mind's inner constructs; it's not a replacement for vision or wisdom. Language or vocabulary has nothing to do with it. Technology is the enabler of iteration. 

So why don't we iterate more in our scientific work? Because it takes too long? Maybe that's true for a hand-drawn contour map, but technology is reducing the burden of iteration. Because we have never been taught humility? Maybe that stems from the way we learned to learn: homework assignments have exact solutions (and are done only once), and re-writing an exam is unheard of (unless you flunked it the first time around).

What about writing an exam twice to demonstrate mastery? What about reading a book twice, in two different ways? Once passively in your head, and once actively—at a slower pace, taking notes. I believe the more ways you can interact with your media, data, or content, the better work will be done. Students assume that the cost required to iterate outweighs the benefits, but that is no longer the case with digital workflows. Embracing technology's capacity to iterate seamlessly and reliably is what makes a grand impact in our work.

What do we use?

Agile strives to be open as a matter of principle, so when it comes to software we go for open source by default. Matt wrote recently about the applications and workstations that we use. 

News of the month

Welcome to our more-or-less regular news post. Seen something awesome? Get in touch!

Convention time!

Next week is Canada's annual petroleum geoscience party, the CSPG CSEG CWLS GeoConvention. Thousands of applied geoscientists will descend on Calgary's downtown Telus Convention Centre to hear about the latest science and technology in the oilfield, and catch up with old friends. We're sad to be missing out this year — we hope someone out there will be blogging!

GeoConvention highlights

There are more than fifty technical sessions at the conference this year. For what it's worth, these are the presentations we'd be sitting in the front row for if we were going:

Now run to the train and get to the ERCB Core Research Centre for...

Guided fault interpretation

We've seen automated fault interpretation before, and now Transform have an offering too. A strongly tech-focused company, they have a decent shot at making it work in ordinary seismic data — the demo shows a textbook example.

GPU processing on the desktop

On Monday Paradigm announced their adoption of NVIDIA's Maximus technology into their desktop applications. Getting all gooey over graphics cards seems very 2002, but this time it's not about graphics — it's about speed. Reserving a Quadro processor for graphics, Paradigm is computing seismic attributes on a dedicated Tesla graphics processing unit, or GPU, rather than on the central processing unit (CPU). This is cool because GPUs are massively parallel and are much, much faster at certain kinds of computation because they don't have the process management, I/O, and other overheads that CPUs have. This is why seismic processing companies like CGGVeritas are adopting them for imaging. Cutting edge stuff!

In other news...

This regular news feature is for information only. We aren't connected with any of these organizations, and don't necessarily endorse their products or services.