January linkfest

Time for the quarterly linkfest! Got stories for next time? Contact us.

BP's new supercomputer, reportedly capable of about 2.2 petaflops, is about as fast as Total's Pangea machine in Paris, which booted up almost a year ago. These machines are pretty amazing — Pangea has over 110,000 cores, and 442 terabytes of memory — but BP claims to have bested that with 1 petabyte of RAM. Remarkable. 

Leo Uieda's open-source modeling tool Fatiando a Terra got an upgrade recently and hit version 0.2. Here's Leo himself demonstrating a forward seismic model:

I'm a geoscientist, get me out of here is a fun-sounding new educational program from the European Geosciences Union, which has recently been the very model of a progressive technical society (the AGU is another great example). It's based on the British outreach program, I'm a scientist, get me out of here, and if you're an EGU member (or want to be), I think you should go for it! The deadline: 17 March, St Patrick's Day.

Darren Wilkinson writes a great blog about some of the geekier aspects of geoscience. You should add it to your reader (I'm using The Old Reader to keep up with blogs since Google Reader was marched out of the building). He wrote recently about this cool tool — an iPad controller for desktop apps. I have yet to try it, but it seems a good fit for tools like ArcGIS and Adobe Illustrator.

Speaking of big software, check out Joe Kington's Python library for GeoProbe volumes — I wish I'd had this a few years ago. Brilliant.

And speaking of cool tools, check out this great new book by technology commentator and philosopher Kevin Kelly. Self-published and crowd-sourced... and drawn from his blog, which you can obviously read online if you don't like paper. 

If you're in Atlantic Canada, and coming to the Colloquium next weekend, you might like to know about the wikithon on Sunday 9 February. We'll be looking for articles relevant to geoscientists in Atlantic Canada to improve. Tim Sherry offers some inspiration. I would tell you about Evan's geocomputing course too... but it's sold out.

Heard about any cool geostuff lately? Let us know in the comments. 

6 questions about seismic interpretation

This interview is part of a series of conversations between Satinder Chopra and the authors of the book 52 Things You Should Know About Geophysics (Agile Libre, 2012). The first three appeared in the October 2013 issue of the CSEG Recorder, the Canadian applied geophysics magazine, which graciously agreed to publish them under a CC-BY license.


Satinder Chopra: Seismic data contain massive amounts of information, which has to be extracted using the right tools and knowhow, a task usually entrusted to the seismic interpreter. This would entail isolating the anomalous patterns on the wiggles and understanding the implied subsurface properties, etc. What do you think are the challenges for a seismic interpreter?

Evan Bianco: The challenge is to not lose anything in the abstraction.

The notion that we take terabytes of prestack data, migrate it into gigabyte-sized cubes, and reduce that further to digitized surfaces that are hundreds of kilobytes in size sounds like a dangerous discarding of information. That's at least 6 orders of magnitude! The challenge for the interpreter, then, is to be darn sure that this is all you need out of your data and, if it isn't (and it probably isn't), to know how to go back for more.

SC: How do you think some of these challenges can be addressed?

EB: I have a big vision and a small vision. Both have to do with documentation and record keeping. If you imagine the entire seismic experiment laid out on a sort of conceptual mixing board, instead of as a linear sequence of steps, then elements could be revisited and modified at any time. In theory nothing would be lost in translation. The connections between inputs and outputs could be maintained, even studied, all in place. In that view, the configuration of the mixing board itself becomes a comprehensive and complete history for the data — what's been done to it, and what has been extracted from it.

The smaller vision: there are plenty of data management solutions for geospatial information, but broadcasting the context that we bring to bear is a whole other challenge. Any tool that allows people to preserve the link between data and model should be used to transfer the implicit along with the explicit. Take auto-tracking a horizon as an example. It would be valuable if an interpreter could embed some context into an object while digitizing. Something that could later inform the geocellular modeler to proceed with caution or certainty.

SC: One of the important tasks a seismic interpreter faces is predicting the location of hydrocarbons in the subsurface. Having come up with a hypothesis, how do you think it can be made more convincing and presented to colleagues?

EB: Coming up with a hypothesis (that is, a model) is solving an inverse problem. So there is a lot of convincing power in completing the loop. If all you have done is the inverse problem, know that you could go further. There are a lot of service companies in the business of solving inverse problems, but not so many completing the loop with the forward problem. It's the only way to test hypotheses without a drill bit, and it gives a better handle on methodological and technological limitations.

SC: You mention "absolving us of responsibility" in your article.  Could you elaborate on this a little more? Do you think there is accountability of sorts practiced in our industry?

EB: I see accountability from a data-centric perspective. For example, think of all the ways that a digitized fault plane can be used. It could become a polygon cutting through a surface on a map. It could be a wall within a geocellular model. It could be a node in a drilling prognosis. Now, if the fault is mis-picked by even one bin, the error could show up hundreds of metres away from the prognosis, depending on the dip of the fault. Practically speaking, accounting for mismatches like this is hard, and is usually done in an ad hoc way, if at all. What caused the error? Was it the migration or was it the picking? Or what about the error in the measurement of the drill-bit? I think accountability is loosely practised at best, because we don't know how to reconcile all these competing errors.

Until data can have a memory, being accountable means being diligent with documentation. But it is time-consuming, and there aren’t as many standards as there are data formats.

SC: Declaring your work to be in progress could allow you to embrace iteration. I like that. However, there is usually a finite time to complete a given interpretation task; but as more and more wells are drilled, the interpretation could be updated. Do you think this practice would suit small companies that need every new well to be productive, or they are doomed?

EB: The size of the company shouldn't have anything to do with it. Iteration is something that needs to happen after you get new information. The question is not, "do I need to iterate now that we have drilled a few more wells?", but "how does this new information change my previous work?" Perhaps the interpretation was too rigid — too precise — to begin with. If the interpreter sees her work as something that evolves towards a more complete picture, she needn't be afraid of changing her mind if new information proves it incorrect. Depth migration exemplifies this approach. Hopefully more conceptual and qualitative aspects of subsurface work can adopt it as well.

SC: Present-day workflows for seismic interpretation of unconventional resources demand more than the usual practices followed for conventional exploration and development. Could you comment on how these are changing?

EB: With unconventionals, seismic interpreters are looking for different things. They aren't looking for reservoirs, they are looking for suitable locations to create reservoirs. Seismic technologies that estimate the state of stress will become increasingly important, and interpreters will need to work in close contact with geomechanics. Also, microseismic monitoring and time-lapse technologies tend to push interpreters into the thick of operations, allowing them to study how the properties of the earth change in response to those operations. What a perfect place for iterative workflows.


You can read the other interviews and Evan's essay in the magazine, or buy the book! (You'll find it in Amazon's stores too.) It's a great introduction to who applied geophysicists are, and what sort of problems they work on. Read more about it. 

Join CSEG to catch more of these interviews as they come out. 

Save the samples

A long while ago I wrote about how to choose an image format, and then followed that up with a look at vector vs raster graphics. Today I wanted to revisit rasters (you might think of them as bitmaps, images, or photographs). Because a question that seems to come up a lot is 'what resolution should my images be?' 

Forget DPI

When writing for print, it is common to be asked for a certain number of dots per inch, or dpi (or, equivalently, pixels per inch or ppi). For example, I've been asked by journal editors for images of 'at least 200 dpi'. However, image files do not have an inherent resolution — they only have pixels. The resolution depends on the reproduction size you choose. So, if your image is 800 pixels wide, and will be reproduced in a 2-inch-wide column of print, then the final image is 400 dpi, and adequate for any purpose. The same image, however, will look horrible at 4 dpi on a 16-foot-wide projection screen.

Rule of thumb: for an ordinary computer screen or projector, aim for enough pixels to give about 100 pixels per display inch. For print purposes, or for hi-res mobile devices, aim for about 300 ppi. If it really matters, or your printer is especially good, you are safer with 600 ppi.
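The arithmetic is simple enough to script. Here's a quick Python sketch, just to make it concrete:

```python
# Effective resolution is just pixels divided by reproduction size in inches.
def effective_ppi(pixels, inches):
    """Resolution you'll actually get for a given reproduction width."""
    return pixels / inches

def pixels_needed(inches, ppi):
    """Pixels required to hit a target resolution at a given size."""
    return round(inches * ppi)

print(effective_ppi(800, 2))        # 400 ppi: fine for a 2-inch column
print(effective_ppi(800, 16 * 12))  # ~4 ppi: hopeless on a 16-foot screen
print(pixels_needed(5, 300))        # 1500 px for a 5-inch figure in print
```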

The effect of reducing the number of pixels in an image is more obvious in images with a lot of edges. It's clear in the example that downsampling a sharp image (a to c) is much more noticeable than downsampling the same image after smoothing it with a 25-pixel Gaussian filter (b to d). In this example, the top images have 512 × 512 samples, and the downsampled ones underneath have only 1% of the information, at 51 × 51 samples (downsampling is a type of lossy compression).
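If you want to try the gist of that experiment yourself, here's a rough sketch with NumPy and SciPy. The 512 × 512 test image is random noise standing in for the image in the figure, and I'm assuming the '25-pixel' filter refers to the Gaussian's standard deviation:

```python
# Downsample a sharp image and a smoothed image by 10x in each dimension,
# mimicking panels a-d: the loss is far more obvious in the sharp case.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

rng = np.random.default_rng(0)
sharp = rng.random((512, 512))              # stand-in for a real image
smooth = gaussian_filter(sharp, sigma=25)   # 25-pixel Gaussian smoother (assumed sigma)

factor = 51 / 512                           # 512 x 512 -> 51 x 51 samples
sharp_small = zoom(sharp, factor)           # about 1% of the original samples
smooth_small = zoom(smooth, factor)

print(sharp_small.shape, smooth_small.shape)   # (51, 51) (51, 51)
```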

Careful with those screenshots

The other conundrum is how to get an image of, say, a seismic section or a map.

What could be easier than a quick grab of your window? Well, often it just doesn't cut it, especially for data. Remember that you're only grabbing the pixels on the screen — if your monitor is small (or perhaps you're using a non-HD projector), or the window is small, then there aren't many pixels to grab. If you can, try to avoid a screengrab by exporting an image from one of the application's menus.

For seismic data, you'd like to capture every sample as a pixel. This is not possible for very long or deep lines, because they don't fit on your screen. Since CGM files are the devil's work, I've used SEGY2ASCII (USGS Open File 2005–1311) with good results, converting the result to a PGM file and loading it into Gimp.
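If you can get the amplitudes into a NumPy array some other way (libraries like segyio or ObsPy can read SEG-Y), then saving exactly one pixel per sample is easy with matplotlib. A minimal sketch, with a hypothetical file standing in for however you load your data:

```python
# Save a seismic section at exactly one pixel per sample: no screen, no decimation.
import numpy as np
import matplotlib.pyplot as plt

# data: 2D array of amplitudes, shape (n_traces, n_samples); load it however you like.
data = np.load('section.npy')           # hypothetical file, for illustration only

clip = np.percentile(np.abs(data), 99)  # clip extreme amplitudes for display
plt.imsave('section.png', data.T, cmap='gray', vmin=-clip, vmax=clip)
```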

Large seismic lines are hard to capture without decimating the data. Rockall Basin. Image: BGS + Virtual Seismic Atlas.

If you have no choice, make the image as large as possible. For example, if you're grabbing a view from your browser, maximize the window, turn off the bookmarks and other junk, and get as many pixels as you can. If you're really stuck, grab two or more views and stitch them together in Gimp or Inkscape.

When you've got the view you want, crop the window junk that no-one wants to see (frames, icons, menus, etc.) and save as a PNG. Then bring the image into a vector graphics editor, and add scales, colourbars, labels, annotation, and other details. My advice is to do this right away, before you forget. The number of times I've had to go and grab a screenshot again because I forgot the colourbar...

The Lenna image is from Hall, M (2006). Resolution and uncertainty in spectral decomposition. First Break 24, December 2006, p 43-47.

What is the Gabor uncertainty principle?

This post is adapted from the introduction to my article Hall, M (2006), Resolution and uncertainty in spectral decomposition. First Break 24, December 2006. DOI: 10.3997/1365-2397.2006027. I'm planning to delve into this a bit, partly as a way to get up to speed on signal processing in Python. Stay tuned.


Spectral decomposition is a powerful way to get more from seismic reflection data, unweaving the seismic rainbow.

There are lots of ways of doing it — short-time Fourier transform, S transform, wavelet transforms, and so on. If you hang around spectral decomposition bods, you'll hear frequent mention of the ‘resolution’ of the various techniques. Perhaps surprisingly, Heisenberg’s uncertainty principle is sometimes cited as a basis for one technique having better resolution than another. Cool! But... what on earth has quantum theory got to do with it?

A property of nature

Heisenberg’s uncertainty principle is a consequence of the classical Cauchy–Schwarz inequality and is one of the cornerstones of quantum theory. Here’s how he put it:

At the instant of time when the position is determined, that is, at the instant when the photon is scattered by the electron, the electron undergoes a discontinuous change in momentum. This change is the greater the smaller the wavelength of the light employed, i.e. the more exact the determination of the position. At the instant at which the position of the electron is known, its momentum therefore can be known only up to magnitudes which correspond to that discontinuous change; thus, the more precisely the position is determined, the less precisely the momentum is known, and conversely. — Heisenberg (1927), p 174-5.

The most important thing about the uncertainty principle is that, while it was originally expressed in terms of observation and measurement, it is not a consequence of any limitations of our measuring equipment or the mathematics we use to describe our results. The uncertainty principle does not limit what we can know, it describes the way things actually are: an electron does not possess arbitrarily precise position and momentum simultaneously. This troubling insight is the heart of the so-called Copenhagen Interpretation of quantum theory, which Einstein was so famously upset by (and wrong about).

Dennis Gabor (1946), inventor of the hologram, was the first to realize that the uncertainty principle applies to signals. Thanks to wave-particle duality, signals turn out to be exactly analogous to quantum systems. As a result, the exact time and frequency of a signal can never be known simultaneously: a signal cannot plot as a point on the time-frequency plane. Crucially, this uncertainty is a property of signals, not a limitation of mathematics.

Getting quantitative

You know we like the numbers. Heisenberg’s uncertainty principle is usually written in terms of the standard deviation of position σx, the standard deviation of momentum σp, and the Planck constant h:
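$$ \sigma_x \, \sigma_p \geq \frac{h}{4\pi} $$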

In other words, the product of the uncertainties of position and momentum is small, but not zero. For signals, we don't need Planck’s constant to scale the relationship to quantum dimensions, but the form is the same. If the standard deviations of the time and frequency estimates are σt and σf respectively, then we can write Gabor’s uncertainty principle thus:
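$$ \sigma_t \, \sigma_f \geq \frac{1}{4\pi} $$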

So the product of the standard deviations of time, in milliseconds, and frequency, in hertz, must be at least about 80 ms.Hz, or millicycles. (A millicycle is a sort of bicycle, but with 1000 wheels.)
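Here's a quick numerical check with NumPy, treating the squared amplitude of a Gaussian pulse as a density in time and in frequency. A Gaussian achieves the bound exactly, so the product of the two standard deviations should come out at about 0.0796, i.e. 1/4π:

```python
# Numerical check of Gabor's bound: sigma_t * sigma_f >= 1/(4*pi).
# A Gaussian pulse achieves equality, so the product should be ~0.0796.
import numpy as np

dt = 0.001                                   # sample interval, s
t = np.arange(-2.0, 2.0, dt)                 # time axis, s
sigma = 0.05                                 # Gaussian width, s
x = np.exp(-t**2 / (2 * sigma**2))           # the signal: a Gaussian pulse

# Treat |x|^2 as a probability density in time and take its standard deviation.
p_t = np.abs(x)**2
p_t /= np.trapz(p_t, t)
sigma_t = np.sqrt(np.trapz(t**2 * p_t, t))

# Do the same in frequency, using the FFT.
X = np.fft.fftshift(np.fft.fft(x))
f = np.fft.fftshift(np.fft.fftfreq(t.size, d=dt))
p_f = np.abs(X)**2
p_f /= np.trapz(p_f, f)
sigma_f = np.sqrt(np.trapz(f**2 * p_f, f))

print(sigma_t * sigma_f, 1 / (4 * np.pi))    # both about 0.0796
```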

The bottom line

Signals do not have arbitrarily precise time and frequency localization. It doesn’t matter how you compute a spectrum, if you want time information, you must pay for it with frequency information. Specifically, the product of time uncertainty and frequency uncertainty must be at least 1/4π. So how certain is your decomposition?

References

Heisenberg, W (1927). Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik, Zeitschrift für Physik 43, 172–198. English translation: Quantum Theory and Measurement, J. Wheeler and H. Zurek (1983). Princeton University Press, Princeton.

Gabor, D (1946). Theory of communication. Journal of the Institution of Electrical Engineers 93, 429–457.

The image of Werner Heisenberg in 1927, at the age of 25, is public domain as far as I can tell. The low res image of First Break is fair use. The bird hologram is from a photograph licensed CC-BY by Flickr user Dominic Alves.

Try an outernship

In my experience, consortiums under-deliver. We can get the best of both worlds by making the industry–academia interface more permeable.

At one of my clients, I have the pleasure of working with two smart, energetic young geologists. One recently finished, and the other recently started, a 14-month super-internship. Neither one had more than a BSc in geology when they started, and both are going on to do a postgraduate degree after they finish with this multinational petroleum company.

This is 100% brilliant — for them and for the company. After this gap-year-on-steroids, what they accomplish in their postgraduate studies will be that much more relevant, to them, to industry, and to the science. And corporate life, the good bits anyway, can teach smart and energetic people about time management, communication, and collaboration. So by holding back for a year, I think they've actually got a head-start.

The academia–industry interface

Chatting to these young professionals, it struck me that there's a bigger picture. Industry could get much better at interfacing with academia. Today, it tends to happen through a few key relationships, in recruitment, and in a few long-lasting joint industry projects (often referred to as JIPs or consortiums). Most of these interactions happen on an annual timescale, and strictly via presentations and research reports. In a distributed company, most of the relationships are through R&D or corporate headquarters, so the benefits to the other 75% or more of the company are quite limited.

Less secrecy, free the data! This worksheet is from the Unsolved Problems Unsession in 2013.

Instead, I think the interface should be more permeable and dynamic. I've sat through several JIP meetings as researchers have shown work of dubious relevance, using poor or incomplete data, with little understanding of the implications or practical possibilities of their insights. This isn't their fault — the petroleum industry sucks at sharing its goals, methods, uncertainties, and data (a great unsolved problem!).

Increasing permeability

Here's my solution: ordinary human collaboration. Send researchers to intern alongside industry scientists for a month or two. Let them experience the incredible data and the difficult problems first hand. But don't stop there. Send the industry scientists to outern (yes, that is probably a word) alongside the academics, even if only for a week or two. Let them experience the freedom of sitting in a laboratory playground all day, working on problems with brilliant researchers. Let's help people help each other with real side-by-side collaboration, building trust and understanding in the process. A boring JIP meeting once a year is not knowledge sharing.

Have you seen good examples of industry, government, or academia striving for more permeability? How do the high-functioning JIPs do it? Let us know in the comments.


If you liked this, check out some of my other posts on collaboration and knowledge sharing...

Ternary diagrams

I like spectrums (or spectra, if you must). It's not just because I like signals and Fourier transforms, or because I think frequency content is the most under-appreciated attribute of seismic data. They're also an important thinking tool. They represent a continuum between two end-member states, both rare or unlikely; in between there are shades of ambiguity, and this is usually where nature lives.

Take the sport–game continuum. Sports are pure competition — a test of strength and endurance, with few rules and unequivocal outcomes. Surely marathon running is pure sport. Contrast that with a pure game, like darts: no fitness, pure technique. (Establishing where various pastimes lie on this continuum is a good way to start an argument in a pub.)

There's a science purity continuum too, with mathematics at one end and social sciences somewhere near the other. I wonder where geology and geophysics lie...

Degrees of freedom 

The thing about a spectrum is that it's two-dimensional, like a scatter plot, but it has only one degree of freedom, so we can map it onto one dimension: a line.

The three-dimensional equivalent of the spectrum is the ternary diagram: 3-parameter space mapped onto 2D. Not a projection, like a 3D scatter plot, because there are only two degrees of freedom — the parameters of a ternary diagram cannot be independent. This works well for volume fractions, which must sum to one. Hence their popularity for the results of point-count data, like this Folk classification from Hulka & Heubeck (2010).

We can go a step further, natch. You can always go a step further. How about four parameters with three degrees of freedom mapped onto a tetrahedron? Fun to make, not so fun to look at. But not as bad as a pentachoron.

How to make one

The only tools I've used on the battlefield, so to speak, are Trinity, for ternary plots, and TetLab, for tetrahedrons (yes, I went there), both Mac OS X only, and both from Peter Appel of Christian-Albrechts-Universität zu Kiel. But there are more...
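If you'd rather roll your own, a ternary plot is just a change of coordinates away from an ordinary scatter plot. Here's a minimal matplotlib sketch, with made-up compositions for illustration, mapping fractions (a, b, c) that sum to one onto the plane:

```python
# Minimal DIY ternary plot: map compositions (a, b, c), with a + b + c = 1,
# onto 2D coordinates and scatter them with matplotlib.
import numpy as np
import matplotlib.pyplot as plt

def ternary_xy(a, b, c):
    """Map barycentric coordinates to 2D: a -> bottom left, b -> bottom right, c -> top."""
    total = a + b + c
    a, b, c = a / total, b / total, c / total   # normalize, just in case
    x = b + 0.5 * c
    y = (np.sqrt(3) / 2) * c
    return x, y

# Some made-up sand/silt/clay fractions (illustration only, not real data).
compositions = np.array([
    [0.70, 0.20, 0.10],
    [0.30, 0.50, 0.20],
    [0.10, 0.30, 0.60],
])
x, y = ternary_xy(*compositions.T)

# Draw the triangular frame, then the points.
corners = np.array([[0, 0], [1, 0], [0.5, np.sqrt(3) / 2], [0, 0]])
plt.plot(corners[:, 0], corners[:, 1], 'k-')
plt.scatter(x, y)
plt.axis('equal')
plt.axis('off')
plt.show()
```

There are also dedicated packages, such as python-ternary, if you'd rather not do the coordinate bookkeeping yourself.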

Do you use ternary plots, or are they nothing more than a cute way to show some boring data? How do you make them? Care to share any? 

The cartoon is from xkcd.com, licensed CC-BY-NC. The example diagram and example data are from Hulka, C and C Heubeck (2010). Composition and provenance history of Late Cenozoic sediments in southeastern Bolivia: Implications for Chaco foreland basin evolution and Andean uplift. Journal of Sedimentary Research 80, 288–299. DOI: 10.2110/jsr.2010.029 and available online from the authors.