On answering questions

On Tuesday I wrote about asking better questions. One of the easiest ways to ask better questions is to hang back a little. In a lecture, the answer to your question may be imminent. Even if it isn't, some thinking or research will help. It's the same with answering questions. Better to think about the question, and maybe ask clarifying questions, than to jump right in with "Let me explain".

Here's a slightly edited example from Earth Science Stack Exchange:

I suppose natural gas underground caverns on Earth have substantial volume and gas is in gaseous form there. I wonder how it would look like inside such cavern (with artificial light of course). Will one see a rocky sky at big distance?

The first answer was rather terse:

What is a good answer?

This answer, which addresses the OP's (original poster's) apparent misunderstanding that gas is predominantly found in caverns, was the first thing that occurred to me too. But it's incomplete, and has other problems:

  • It's not very patient, and comes across as rather dismissive. Not very welcoming for this new user.
  • The reference is not an appropriate one, and seems to have been chosen almost at random.
  • It only addresses sandstone reservoirs, and even then only 'typical' ones.

In my own answer to the question, I tried to give a more complete answer. I tried to write down my principles, which are somewhat aligned with the advice given on the Stack Exchange site:

  1. Assume the OP is smart and interested. They were smart and curious enough to track down a forum and ask a question that you're interested enough in to answer, so give them some credit. 
  2. No bluffing! If you find yourself typing something like, "I don't know a lot about this, but..." then stop writing immediately. Instead, send the question to someone you know who can give a better answer than you.
  3. If possible, answer directly and clearly in the first sentence. I usually write it in bold. This should be the closest you can get to a one-word answer, especially if it was a direct question. 
  4. Illustrate the answer with an example. A picture or a numerical example — if possible with working code in an accessible, open source language — goes a long way to helping someone get further.
  5. Be brief but thorough. Round out your answer with some different angles on the question, especially if there's nuance in your answer. There's no need for an essay, so instead give links and references if the OP wants to know more.
  6. Make connections. If there are people in your community or organization who should be connected, connect them.

It's remarkable how much effort people are willing to put into a great answer. A question about detecting dog paw-prints on a pressure pad, posted to the programming community Stack Overflow, elicited some great answers.

The thread didn't end there. Check out these two answers by Joe Kington, a programmer–geoscientist in Houston:

  • One epic answer with code and animated GIFs, showing how to make a time-series of pawprints.
  • A second answer, with more code, introducing the concept of eigenpaws to improve paw recognition.
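
The eigenpaws idea is essentially the eigenfaces trick from face recognition applied to paw-print images: flatten each print into a vector, find the principal components, and compare prints in that reduced space. Here is a rough sketch of the concept (not Kington's actual code, which is in his answer) using scikit-learn on random stand-in data:

    # A sketch of the 'eigenpaws' concept with placeholder data.
    import numpy as np
    from sklearn.decomposition import PCA

    # Pretend we have 40 paw-print images, each 20 x 20 pressure samples.
    rng = np.random.default_rng(0)
    paws = rng.random((40, 20 * 20))

    pca = PCA(n_components=5)          # the components are the 'eigenpaws'
    scores = pca.fit_transform(paws)   # each paw reduced to 5 numbers

    # Classify a new print by nearest neighbour in eigenpaw space.
    new_paw = rng.random((1, 20 * 20))
    new_score = pca.transform(new_paw)
    nearest = np.argmin(np.linalg.norm(scores - new_score, axis=1))
    print('Most similar known paw:', nearest)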

A final tip: writing informative answers might be best done on Wikipedia or your corporate wiki. Instead of writing a long response to the post, think about writing it somewhere more accessible, and posting a link to it instead.

What do you think makes a good answer to a question? Have you ever received an answer that went beyond helpful? 

On asking questions

If I had only one hour to solve a problem, I would spend up to two-thirds of that hour in attempting to define what the problem is. — Anonymous Yale professor (often wrongly attributed to Einstein)

Asking questions is a core skill for professionals. Asking questions to know, to understand, to probe, to test. Anyone can feel exposed asking questions, because they feel like they should know or understand already. If novices and 'experts' alike have trouble asking questions, and your community or organization does not foster a culture of asking, then there's a problem.

What is a good question?

There are naive questions, tedious questions, ill-phrased questions, questions put after inadequate self-criticism. But every question is a cry to understand the world. There is no such thing as a dumb question. — Carl Sagan

Asking good questions is the best way to avoid the problem of feeling silly or — worse — being thought silly. Here are some tips from my experience in Q&A forums at work and on the Internet:

  1. Do some research. Go beyond a quick Google search — try Google Scholar, ask one or two colleagues for help, look in the index of a couple of books. If you have time, stew on it for a day or two. Do enough to make sure the answer isn't widely known or trivial to find. Once you've decided to ask a network...
  2. Ask your question in the right forum. You will save yourself a lot of time by taking the trouble to find the right place — the place where the people most likely to be able to help you are. Avoid the shotgun approach: it's not considered good form to cross-post in multiple related forums.
  3. Make the subject or headline a direct question, with some relevant detail. This is how most people will see your question and decide whether to even read the rest of it. So "Help please" or "Interpretation question" are hopeless. Much better is something like "How do I choose seismic attribute parameters?" or "What does 'replacement velocity' mean?".
  4. Provide some detail, and ideally an image. A bit of background helps. If you have a software or programming problem, giving just enough information to reproduce the problem is critical. Tell people what you've read and where your assumptions are coming from. Tell people what you think is going on.
  5. Manage the question. Make sure early comments or answers seem to get your drift. Edit your question or respond to comments to help people help you. Follow up with new questions if you need clarification, but make a whole new thread if you're moving into new territory. When you have your answer, thank those who helped you and make it clear if and how your problem was solved. If you solved your own problem, post your own answer. Let the community know what happened in the end.

If you really want to cultivate your skills of inquiry, here is some more writing on the subject...

Supply and demand

Knowledge sharing networks like Stack Exchange, or whatever you use at work, often focus too much on answers. Capturing lessons learned, for example. But you can't just push knowledge at people — the supply and demand equation has two sides — there has to be a pull too. The pull comes from questions, and an organization or community that pulls, learns.

Do you ask questions on knowledge networks? Do you have any advice for the curious? 


Don't miss the next post, On answering questions.

Seismic inception

A month ago, some engineers at Google blogged about how they had turned a deep learning network in on itself and produced some fascinating and/or disturbing images:

One of the images produced by the team at Google. CC-BY.

The basic recipe, which Google later open sourced, involves training a deep learning network (basically a multi-layer neural network) on some labeled images, animals maybe, then searching for matching patterns in a target image, like these clouds. If it finds something, it emphasizes it — given the data, it tries to construct an animal. Then do it again.

Or, here's how a Google programmer puts it (one of my favourite sentences ever)...

Making the "dream" images is very simple. Essentially it is just a gradient ascent process that tries to maximize the L2 norm of activations of a particular DNN layer. 

That's all! Anyway, the point is that you get utter weirdness:
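
That single sentence translates to just a few lines of code. Here is a minimal sketch of the same gradient ascent loop, using PyTorch rather than the original caffe setup; the model (GoogLeNet) and the layer (inception4c) are arbitrary choices of mine:

    # Gradient ascent on an image to maximize the L2 norm of one layer's
    # activations: the core of the 'dream' recipe, minus the bells and whistles.
    import torch
    import torchvision.models as models

    model = models.googlenet(pretrained=True).eval()

    # Capture the activations of one intermediate layer with a forward hook.
    activations = {}
    model.inception4c.register_forward_hook(
        lambda module, inp, out: activations.update(feat=out))

    # Start from noise; for DeepDream proper you'd start from a real image.
    img = torch.rand(1, 3, 224, 224, requires_grad=True)
    optimizer = torch.optim.Adam([img], lr=0.05)

    for step in range(100):
        optimizer.zero_grad()
        model(img)
        loss = -activations['feat'].norm()  # ascend: minimize the negative norm
        loss.backward()
        optimizer.step()
        img.data.clamp_(0, 1)               # keep pixel values in range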

OK, cool... what happens if you feed it seismic?

That was my first thought, I'm sure it was yours too. The second thing I thought, and the third, and the fourth, was: wow, this software is hard to compile. I spent an unreasonable amount of time getting caffe, the Berkeley Vision and Learning Center's deep learning software, working. But on Friday I cracked it, so today I got to satisfy my curiosity.

The short answer is: reptiles. These weirdos were 8 levels down, which takes about 20 minutes to reach on my iMac.

Seismic data from the Virtual Seismic Atlas, courtesy of Fugro. 

The DeepDream treatment. Mostly reptiles.

Er, right... what's the point in all this?

That's a good question. It's just a bit of fun really. But it makes you wonder:

  • What if we train the network on seismic facies? I think this could be very interesting.
  • Better yet, what if we train it on geology? Probably spurious: seismic is not geology.
  • Does this mean learning networks are just dumb machines, or can they see more than us? Tough one — human vision is highly fallible. There are endless illusions to prove this. But computers only do what we tell them, at least for now. I think if we're careful what we ask for, we can use these highly non-linear data-crunching algorithms for good.
  • Are we out of a job? Definitely not. How do you think machines will know what to learn? The challenge here is to make this work, and then figure out how it can help change, or at least accelerate, our understanding of the subsurface.

This deep learning stuff — of which the University of Toronto was a major pioneer during its emergence in about 2010 — is part of the machine learning revolution that you are, like it or not, experiencing. It will take time, and it will make awful mistakes, but the indications are that machine learning will eat every analytical method for breakfast. Customer behaviour prediction, computer vision, natural language processing, all this stuff is reeling from the relatively sudden and widespread availability of inexpensive computer intelligence. 

So what are we going to do with that?

Okay, one more, from Paige Bailey's Twitter feed.

Ask your employer about being more awesome


Open source software needs money to survive. If you work at a corporation with a positive bottom line, and you use open source software to help you maintain it, I'd urge you to consider asking your organization to help out. You can't imagine the difference it makes — these projects take serious resources to run: server hardware, infrastructure maintenance, professional developers, research and development, legal and marketing functions, educational outreach, work in developing countries... just like commercial, closed-source, black-or-at-least-dark-grey-box software.

(Come to think of it, the only thing they don't have is sales personnel driving to golf courses in a BMW 5 series. How many of those have you paid for with those license fees?)

Which projects need your company's help?

There are some fundamental projects, but they tend to be quite well funded already, both financially and in-kind. For example, software engineers at companies like IBM and Google make substantial contributions to the Linux kernel. Still, your company definitely depends on technology from the following projects:

  1. The Linux Foundation — responsible for the kernel of the Linux operating system.
  2. Free Software Foundation — the umbrella for a ridiculous number of software tools.
  3. The Apache Foundation — maintainers of the eponymous web server, and forerunners of the ongoing big data and machine learning revolutions and the tools that power them. 

These higher-level projects are closer to my heart, and do great work supporting scientists:

  1. The Mozilla Foundation — check out the Mozilla Science Lab and Software Carpentry
  2. The WikiMedia Foundation — for Wikipedia, and the MediaWiki software that powers it (as well as AAPG's and SEG's wikis)
  3. NumFOCUS Foundation — all the better to help you wield scientific Python!

If money really isn't an option, consider working somewhere where it is an option. If that's not an option either, then there are plenty of other ways to make a difference:

  1. Use and champion open source software at your place of work.
  2. Submit tickets for the software you use, and engage with the community.
  3. If you can code, submit patches, documentation, or whatever you can.

Now, if we only had an Open Geoscience Foundation to help fund projects in geoscience...

Software, stats, and tidal energy

Today was the last day of the conference part of SciPy 2015 in Austin. Almost all the talks at this conference have been inspiring and/or enlightening. This makes it all the more wonderful that the organizers get the talks online within a couple of hours (!), so you can see everything (compared to about 5% maximum coverage at SEG).

Jake Vanderplas, a young astronomer and data scientist at UW's eScience Institute, gave the keynote this morning. He eloquently reviewed the history and state-of-the-art of the so-called SciPy stack, the collection of tools that Pythonistic scientists use to get their research done. If you're just getting started in this world, it's about the best intro you could ask for:

Chris Fonnesbeck treated the room to what might as well have been a second keynote, so well did he express his convictions. Beautiful slides, and a big message: statistics matters.

Kristen Thyng, an energetic contributor to the conference, gave a fantastic talk about tidal energy, her main field, as well as one about perceptual colourmaps, which is more of a hobby. The work includes some very nice visualizations of tidal currents in my home province...

Finally, I highly recommend watching the lightning talks. Apart from being filled with some mind-blowing ideas, many of them eliciting spontaneous applause (imagine that!), I doubt you will ever witness a more effective exercise in building a community of passionate professionals. It's remarkable. (If you don't have an hour, these three are awesome.)

Next we'll be enjoying the 'sprints', a weekend of coding on open source projects. We'll be back to geophysics blogging next week :)

Geophysics at SciPy 2015

Yesterday was the geoscience day at SciPy 2015 in Austin.

At lunchtime, Paige Bailey (Chevron) organized a Birds of a Feather on GIS. This was a much-needed meetup for anyone interested in spatial data. It was useful to hear about the tools the fifty-or-so participants use every day, and a great chance to air some frustrations, like "Why is it so hard to install a geospatial stack?", and questions, like "How do people make attractive maps with the toolset?"

One way to make attractive maps is to go beyond the screen and 3D print them. Almost any subsurface dataset could seem more tangible and believable as a 3D object, and Joe Kington (Chevron) showed us how to make data into objects. Just watch:

Matteus Ueckermann followed up with some virtual elevation models, showing how Python can process not just a few tiles of data, but can handle hydrology modeling for the entire world:

Nicola Creati (OGS, Trieste) showed us the PyGmod package, a new and fully parallel geodynamic simulation tool for HPC nuts. So now you can make more plate tectonic models before most people are out of bed!

We also heard from Lindsey Heagy and Gudni Rosenkjaer from UBC, talking about various applications of Rowan Cockett's awesome SimPEG package to their work. As at the hackathon in Denver, it's very clear that this group's investment in and passion for a well-architected, integrated package is well worth the work, giving everyone who works with it superpowers. And, as we all know, superpowers are awesome. Especially geophysical ones.

Last up, I talked about striplog, a small package for handling interval and point data in logs, core, and other 1D datasets. It's still very immature, but almost ready for real-world users, so if you think you have a use case, I'd love to hear from you.

Today is the last day of the conference part, before we head into the coding sprints tomorrow. Stay tuned for more, or follow the #scipy2015 hashtag to keep up. See all the videos, which go up almost right after talks, on YouTube.

You'd better read this

The clean white front cover of this month's Bloomberg Businessweek carries a few lines of Python code, and two lines of English as a footnote... If you can't read that, then you'd better read this. The entire issue is a single essay written by Paul Ford. It was an impeccable coincidence: I picked up a copy before boarding the plane to Austin for SciPy 2015. This issue is a grand achievement; it could be the best thing I've ever read. Go out and buy as many copies as you can, and give them to your friends. Or read it online right now.

Not your grandfather's notebook

Jess Hamrick is a cognitive scientist at UC Berkeley who makes computational models of human behaviour. In her talk, she described how she built a multi-user server for Jupyter notebooks to administer course content, assign homework, even do auto-grading for a class with 220 undergrads. During her talk, she invited the audience to list their GitHub usernames on an Etherpad. Minutes after she stood down from her podium, she granted access, so we could all come inside and see how it was done.
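
For the curious, a setup like this is mostly configuration. Here's a minimal, hypothetical jupyterhub_config.py in the same spirit, assuming a recent JupyterHub plus the oauthenticator package; the port, URL, usernames, and keys are all placeholders:

    # Hypothetical jupyterhub_config.py; every value here is a placeholder.
    c = get_config()

    c.JupyterHub.port = 8000

    # Let students log in with their GitHub accounts.
    from oauthenticator.github import GitHubOAuthenticator
    c.JupyterHub.authenticator_class = GitHubOAuthenticator
    c.GitHubOAuthenticator.oauth_callback_url = 'https://example.edu/hub/oauth_callback'
    c.GitHubOAuthenticator.client_id = 'YOUR_CLIENT_ID'
    c.GitHubOAuthenticator.client_secret = 'YOUR_CLIENT_SECRET'

    # GitHub usernames collected from the class, as on Hamrick's Etherpad.
    c.Authenticator.allowed_users = {'student-username'}
    c.Authenticator.admin_users = {'instructor-username'}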

Dangerous defaults

I wrote a while ago about the dangers of defaults, and as Matteo Niccoli highlighted in his 52 Things essay, How to choose a colourmap, default colourmaps can be especially harmful. Matplotlib has long been criticized for its nasty default colourmap, but today redeemed itself with a new default. Hear all about it from Stefan van der Walt:
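
If you want to see the difference for yourself, a quick side-by-side comparison is easy, assuming a matplotlib version that ships viridis (1.5 or later):

    # Compare the old default colourmap (jet) with the new one (viridis).
    import numpy as np
    import matplotlib.pyplot as plt

    data = np.random.randn(50, 50)

    fig, axs = plt.subplots(1, 2, figsize=(9, 4))
    for ax, cmap in zip(axs, ['jet', 'viridis']):
        im = ax.imshow(data, cmap=cmap)
        ax.set_title(cmap)
        fig.colorbar(im, ax=ax)
    plt.show()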

Sound advice

Allen Downey of Olin College gave a wonderful talk this afternoon about teaching digital signal processing to students using fun and intuitive audio signals as the hook. Watch it yourself, it's well worth the 20 minutes or so:
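
In that spirit, even a few lines of NumPy take you from a synthesized tone to its spectrum; the frequencies here are arbitrary choices of mine:

    # Build a simple audio-like signal and look at its spectrum.
    import numpy as np
    import matplotlib.pyplot as plt

    fs = 11025                     # sample rate, Hz
    t = np.arange(0, 0.5, 1/fs)    # half a second of time samples
    x = np.sin(2*np.pi*440*t) + 0.5*np.sin(2*np.pi*880*t)  # A4 plus its octave

    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, 1/fs)

    plt.plot(freqs, spectrum)
    plt.xlabel('Frequency (Hz)')
    plt.xlim(0, 2000)
    plt.show()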

If you're really into musical and audio applications, there was another talk on the subject, by Brian McFee (Librosa project). 

More tomorrow as we head into Day 2 of the conference. 

Attribute analysis and statistics

Last week I wrote a basic introduction to attribute analysis. The post focused on the different ways of thinking about sampling and intervals, and on how instantaneous attributes have to be interpolated from the discrete data. This week, I want to look more closely at those interval attributes. We'd often like to summarize the attributes of an interval into a single number, perhaps to make a map.

Before thinking about amplitudes and seismic traces, it's worth reminding ourselves about different kinds of average. This table from SubSurfWiki might help... 

A peculiar feature of seismic data, from a statistical point of view, is the lack of the very low frequencies needed to give it a trend. Because of this, it oscillates around zero, so the average amplitude over a window tends to zero — seismic data has a mean value of zero. So not only do we have to think about interpolation issues when we extract attributes, we also have to think about statistics.

Fortunately, once we understand the issue it's easy to come up with ways around it. Look at the trace (black line) below:

The mean is, as expected, close to zero. So I've applied some other statistics to represent the amplitude values, shown as black dots, in the window (the length of the plot):

  • Average absolute amplitude (light green) — treat all values as positive and take the mean.
  • Root-mean-square amplitude (dark green) — tends to emphasize large values, so it's a bit higher.
  • Average energy (magenta) — the mean of the magnitude of the complex trace, or the envelope, shown in grey.
  • Maximum amplitude (blue) — the absolute maximum value encountered, which is higher than the actual sample values (which are all integers in this fake dataset) because of interpolation.
  • Maximum energy (purple) — the maximum value of the envelope, which is higher still because it is phase independent.
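
For the curious, all of these statistics are one-liners in NumPy and SciPy. Here is a minimal sketch on a synthetic, zero-mean stand-in for a windowed trace, with the envelope computed via the Hilbert transform:

    import numpy as np
    from scipy.signal import hilbert

    np.random.seed(42)
    trace = np.random.randn(100)        # zero-mean, like real seismic

    envelope = np.abs(hilbert(trace))   # magnitude of the complex trace

    print('mean:             ', trace.mean())             # close to zero
    print('avg abs amplitude:', np.abs(trace).mean())
    print('rms amplitude:    ', np.sqrt((trace**2).mean()))
    print('avg energy:       ', envelope.mean())
    print('max amplitude:    ', np.abs(trace).max())
    print('max energy:       ', envelope.max())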

There are other statistics besides these, of course. We could compute the median, or some other kind of average. We could take the strongest trough, or the maximum derivative (steepest slope). The options are really only limited by your imagination, and the physical relationship with geology that you expect.

We'll return to this series over the summer, asking questions like How do you know what to expect? and Does a physically realistic relationship even matter? 


To view and run the code that I used in creating the figures for this post, grab the IPython/Jupyter Notebook.