Is your data digital or just pseudodigital?

pseudodigital_analog.png

A rite of passage for a geologist is the making of an original geological map, starting from scratch. In the UK, this is known as the ‘independent mapping project’ and is usually done at the end of the second year of an undergrad degree. I did mine on the eastern shore of the Embalse de Santa Ana, just north of Alfarras in Catalunya, Spain. (I wrote all about it back in 2012.)

The map I drew was about as analog as you can get. I drew it with Rotring Rapidograph pens on drafting film. Mistakes had to be painstakingly scraped away with a razor blade. Colour had to be added in pencil after the map had been transferred onto paper. There is only one map in existence. The data is gone. It is absolutely unreproducible.

pseudodigital_palaeo.png

Digitize!

In order to show you the map, I had to digitize it. That word makes it sound like the map is now ‘digital data’, but it’s really not useful for anything scientific. It is ‘digital’ only in the loosest sense (it’s a bunch of binary bits in the cloud); it is not digital in the sense of organized data elements with semantic meaning. Let’s call this non-useful format palaeodigital: the lowest rung on the digital ladder.

You can get palaeodigital files from many state and national data repositories. For example, it’s how the Government of Nova Scotia stores its offshore seismic ‘data’: as TIFF files, scans of the paper sections submitted by operators. They’re wiggle-trace displays, obviously, which makes them almost completely useless.

pseudodigital_proto.png

Protodigital

Nobody draws maps by hand anymore; that would be crazy. Adobe Illustrator and (better) Inkscape mean we can produce beautifully rendered maps with about the same amount of effort as the hand-drawn version. But… this still isn’t digital. This is nothing more than a computerized rip-off of the analog workflow. The result is almost as static and difficult to edit as it was on film. (Wish you’d used a thicker line for your fault traces on those 20 maps? Have fun editing those files!)

Let’s call the computerization of analog workflows or artifacts protodigital. I’m thinking of Word and PowerPoint. Email. SeisWorks. Techlog. We can think of data in the same way… LAS files are really just a text-file manifestation of a composite log (plus their headers are often garbage). SEG-Y is nothing more than a bunch of traces with a side label.
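To make the LAS point concrete, here is a minimal sketch of what it takes just to turn LAS text back into columns of numbers. It assumes an unwrapped LAS 2.0 file, a null value of -999.25 passed in rather than read from the ~W section, and a made-up three-curve sample; `parse_las` is a hypothetical helper, not a library function. In practice you'd reach for a maintained reader such as lasio.

```python
# A hypothetical, made-up LAS 2.0 fragment: three curves, three depth steps.
LAS_TEXT = """~VERSION INFORMATION
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP.    NO : ONE LINE PER DEPTH STEP
~CURVE INFORMATION
 DEPT.M      : DEPTH
 GR  .GAPI   : GAMMA RAY
 RHOB.G/CC   : BULK DENSITY
~A
 1500.0    45.2  2.41
 1500.5 -999.25  2.43
 1501.0    61.7  2.38
"""


def parse_las(text, null=-999.25):
    """Return ({mnemonic: unit}, {mnemonic: [values]}) from unwrapped LAS 2.0 text.

    Simplifications (assumptions): the null value is passed in rather than read
    from the ~W section, and only the ~C and ~A sections are interpreted.
    """
    section = None
    curves = []  # (mnemonic, unit) pairs, in column order
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("~"):
            section = line[1:2].upper()  # V, W, C, P, O or A
            continue
        if section == "C":
            # "MNEM.UNIT  value : description": mnemonic before the first dot,
            # unit immediately after it (empty if a space follows the dot).
            mnem, _, rest = line.partition(".")
            unit = "" if rest.startswith(" ") else rest.split(" ", 1)[0]
            curves.append((mnem.strip(), unit))
        elif section == "A":
            values = [float(v) for v in line.split()]
            rows.append([None if v == null else v for v in values])
    units = dict(curves)
    columns = {m: [row[i] for row in rows] for i, (m, _) in enumerate(curves)}
    return units, columns


units, columns = parse_las(LAS_TEXT)
print(units)
print(columns["GR"])
```

Even this toy version has to guess at conventions the file doesn't enforce, which is roughly why LAS sits on the protodigital rung.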


Together, palaeodigital and protodigital data might be called pseudodigital. They look digital, but they’re not quite there.

(Just to be clear, I made all these words up. They are definitely silly… but the point is that there’s a lot of room between analog and useful, machine-learning-ready digital.)


pseudodigital_digital.png

Digital data

So what’s at the top of the digital ladder? In the case of maps, it’s shapefiles or, better yet, GeoJSON. In these files, objects are described in terms of real geographic parameters, such as latitude and longitude. The file contains the CRS (you know you need that, right?) and other things you might need, like units, data provenance, attributes, and so on.
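Here is a minimal sketch, using only Python's standard json module, of a single fault trace stored as a GeoJSON feature. The coordinates, the name, and the property keys (kind, dip_deg, source) are all hypothetical. One caveat: the current GeoJSON specification (RFC 7946) dropped the crs member and fixes coordinates as WGS 84 longitude, latitude, so the format itself documents the CRS; the older 2008 spec allowed a crs member in the file.

```python
import json

# A hypothetical fault trace. Coordinates are illustrative only and follow
# the RFC 7946 convention: [longitude, latitude] in WGS 84.
fault = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[0.43, 41.92], [0.44, 41.93], [0.45, 41.95]],
    },
    "properties": {
        "kind": "fault",                   # hypothetical attribute names/values
        "name": "Example Fault",
        "dip_deg": 60,
        "source": "hypothetical example",  # provenance travels with the data
    },
}

collection = {"type": "FeatureCollection", "features": [fault]}
text = json.dumps(collection, indent=2)

# Any JSON reader recovers the geometry and attributes intact.
loaded = json.loads(text)
first = loaded["features"][0]
print(first["properties"]["kind"], len(first["geometry"]["coordinates"]))
```

Notice there is no line weight in the file. Styling is applied when the data is rendered, so a thicker fault trace on 20 maps becomes one change to the style rules rather than 20 file edits.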

What makes these things truly digital? I think the following things are important:

  • They can all be self-documenting

  • …and can carry more or less arbitrary amounts of metadata.

  • They depend on open formats, some text and some binary, that are widely used.

  • There is free, open-source tooling for reading and writing these formats, usually with reference implementations in major languages (e.g. C/C++, Python, Java).

  • They are composable. Without too much trouble, you could write a script to process batches of these files, adapting to their content and context.
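Here is a sketch of that kind of batch script, assuming a folder of GeoJSON files whose features carry a `kind` property (a hypothetical attribute name, as is the helper `count_feature_kinds`). It uses only the Python standard library and writes two throwaway sample files so it runs on its own.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_feature_kinds(folder):
    """Tally the 'kind' property across every .geojson file in a folder."""
    counts = Counter()
    for path in sorted(Path(folder).glob("*.geojson")):
        collection = json.loads(path.read_text())
        for feature in collection.get("features", []):
            # Adapt to content: features without a 'kind' count as 'unknown'.
            kind = (feature.get("properties") or {}).get("kind", "unknown")
            counts[kind] += 1
    return counts


# Build two tiny hypothetical files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "map_a.geojson": ["fault", "contact"],
        "map_b.geojson": ["fault", None],
    }
    for name, kinds in samples.items():
        features = [
            {"type": "Feature", "geometry": None,
             "properties": {"kind": k} if k else {}}
            for k in kinds
        ]
        doc = {"type": "FeatureCollection", "features": features}
        Path(tmp, name).write_text(json.dumps(doc))
    totals = count_feature_kinds(tmp)

print(dict(totals))
```

Features with no `kind` are tallied as 'unknown' rather than crashing the run, which is the "adapting to their content" part in miniature.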

Here’s how non-digital versions of a document, e.g. a scholarly article, compare to digital data:

pseudodigital_document.png

And pseudodigital well logs:

pseudodigital_log.png

Some more examples:

  • Photographs with EXIF data and geolocation.

  • GIS tools like QGIS let us make beautiful maps with data.

  • Drawing striplogs with a data-driven tool like Python striplog.

  • A fully-labeled HDF5 file containing QC’d, machine-learning-ready well logs.

  • Structured, metadata-rich documents, perhaps in JSON format.

Watch out for pseudodigital

Why does all this matter? It matters because we need digital data before we can do any analysis, or any machine learning. If you give me pseudodigital data for a project, I’m going to spend at least 50% of my time, probably more, making it digital before I can even get started. So before embarking on a machine learning project, you really, really need to know what you’re dealing with: digital or just pseudodigital?

TRANSFORM happened!

transform_sticker.jpg

How do you describe the indescribable?

Last week, Agile hosted the TRANSFORM unconference in Normandy, France. We were there to talk about the open subsurface stack, the collection of open-source Python tools for earth scientists. We also spent time on the state of the Software Underground, a global community of practice for digital subsurface scientists and engineers. In effect, this was the first annual Software Underground conference. This was SwungCon 1.

The space

I knew the Château de Rosay was going to be nice. I hoped it was going to be very nice. But it wasn’t either of those things. It exceeded expectations by such a large margin, it seemed a little… indulgent. Excessive, even. And yet it was cheaper than a Hilton, and you couldn’t imagine a more perfect place to think and talk about the future of open source geoscience, or a more productive environment in which to write code with new friends and colleagues.

It turns out that a 400-year-old château set in 8 acres of parkland in the heart of Normandy is a great place to create new things. I expect Gustave Flaubert and Guy de Maupassant thought the same when they stayed there 150 years ago. The forty-two bedrooms house exactly the right number of people for a purposeful scientific meeting.

This is frustrating; I’m not doing the place justice at all.

The work

This was most people’s first experience of an unconference. It was undeniably weird walking into a week-long meeting with no schedule of events. But, despite being inexpertly facilitated by me, the 26 participants enthusiastically collaborated to create the agenda on the first morning. With time, we appreciated the possibilities of the open space — it lets the group talk about exactly what it needs to talk about, exactly when it needs to talk about it.

The topics ranged from the governance and future of the Software Underground, to the possibility of a new open access journal, interesting new events in the Software Underground calendar, new libraries for geoscience, a new ‘core’ library for wells and seismic, and — of course — machine learning. I’ll be writing more about all of these topics in the coming weeks, and there’s already lots of chatter about them on the Software Underground Slack (which hit 1500 members yesterday!).

The food

I can’t help it. I have to talk about the food.

…but I’m not sure where to start. The full potential of food (to satisfy, to delight, to start conversations, to impress, to inspire) was realized. The food was central to the experience, yet somehow not even the most wonderful thing about eating at the château. Meals were prefaced by a presentation from the professionals in the kitchen. No dish was repeated… indeed, no seating arrangement was repeated. The cheese was, if you are into cheese, off the charts.

There was a professionalism and thoughtfulness to the dining that can perhaps only be found in France.

Sorry everyone. This was one of those occasions when you had to be there. If you weren’t there, you missed out. I wish you’d been there. You would have loved it.

The good news is that it will happen again. Stay tuned.

Fear and loathing in oil & gas

Sometimes you have to swallow your fear. This is one of those times.

The proliferation of 3D seismic in the 1980s was a major step forward for the petroleum industry. However, it took more than a decade for the 3D seismic method to become popular. During that decade, seismic equipment continued to evolve, particularly with the advent of the telemetry recording systems needed for 3D surveys offshore.

Things were never the same again. New businesses sprouted up to support it, and established service companies and tech companies exploded in size to keep up with the demand and all the new work.

Not so coincidentally, another major shift happened in the late 1980s and early 1990s: the industry-wide move to Sun workstations to cope with crunching and rendering the overwhelming influx of all these digits. UNIX workstations with hilariously large cathode-ray tube monitors became commonplace. This industry helped make Sun and many other IT companies very wealthy, and once again everything was good. At least until Sun's picnic was trampled on by Linux workstations in the early 2000s, but that's another story...

I think the advent of 3D seismic is one of many examples of the upstream oil and gas industry thriving on technological change. 3D seismic changed everything, facilitating progress in the full sense of the word, and we never looked back. As an early-career geoscientist, I don't know what the world was like before 3D seismic, but I have interpreted 2D data and I know it's an awful experience, even on a computer.

Debilitating skepticism?

Today, in 2017, we find ourselves in the middle of the next major transformation. Like 3D seismic before it, machine learning will alter yesterday's landscape beyond all recognition. We've been through all of this before, but this time, for some reason, it feels different. Many people are cautious, unconvinced about whether this next thing will live up to the hype. Other people are vibrating with excitement, viewing the whole thing through rose-coloured glasses. Still others truly believe that it will fail, assertively rejecting hopes and over-excited claims that yes, artificial intelligence will catapult us into a better world, a world beyond our wildest dreams.

A little skepticism is healthy, but I meet a lot of people who are so skeptical about this next period of change that they are ignoring it. It feels to me like an unfair level of dismissal, a too-rigid stance. And it has left me rather perplexed: Why is there so much resistance and denial this time around? Why the apprehension?

I'll wager the reason it is different this time is that this change is happening to us, in spite of us, whether we like it or not. We're not in the driving seat. Most of us aren't even in the passenger seat. Unlike seismic technology and UNIX/Linux workstations, our sector has had little to do with this revolution. We haven't been pushing for it; instead, it is dragging us along with it. Worse, it's happening fast; even the people who are trying to keep up with it can barely hold on.

We need you

This is the opportunity of a lifetime. It's happening. High time to crank up the excitement, get involved, be a part of it. I for one want you to be part of it. Come along with us. We need you, whether you like it or not. 


This post was provoked by a conversation on LinkedIn.

Working without a job

I have drafted variants of this post lots of times. I've never published them because advice always feels... presumptuous. So let me say: I don't have any answers. But I do know that the usual way of 'finding work' doesn't work any more, so maybe the need for ideas, or just hope, has grown. 

Lots of people are out of work right now. I just read that 120,000 jobs have been lost in the oil industry in the UK alone. It's about the same order of magnitude in Canada, maybe as much as 200,000. Indeed, several of my friends — smart, uber-capable professionals — are newly out of jobs. There's no fat left to trim in operator or service companies... but the cuts continue. It's awful.

The good news is that I think we can leave this downturn with a new, and much better, template for employment. The idea is to be more resilient for 'next time' (the coming mergers, the next downturn, the death throes of the industry, that sort of thing).

The tragedy of the corporate professional 

At least 15 years ago, probably during a downturn, our corporate employers started telling us that we are responsible for our own careers. This might sound like a cop-out, maybe it was even meant as one, but really it's not. Taken at face value, it's a clear empowerment.

My perception is that most professionals did not rise to the challenge, however. I still hear, literally all the time, that people can't submit a paper to a conference, or give a talk, or write a blog, or that they can't take a course, or travel to a workshop. Most of the time this comes from people who have not even asked, they just assume the answer will be No. I worry that they have completely given in; their professional growth curtailed by the real or imagined conditions of their employment.

More than just their missed opportunity, I think this is a tragedy for all of us. Their expertise effectively gone from the profession, these lost scientists are unknown outside their organizations.

Many organizations are happy for things to work out that way, but when they make the situation crystal clear by letting people go, the inequity is obvious. The professional realizes, too late, that the career they were supposed to be managing (and perhaps thought they were managing by completing their annual review forms on time) was just that — a career, not a job. A career spanning multiple jobs and, it turns out, multiple organizations.

I recently read a LinkedIn post wishing newly let-go people good luck, hoping that they could soon 'resume their careers'. I understand the sentiment, but I don't see it the same way. You don't stop being a professional; it's not a job. Your career continues; it's just going in a different direction. It's definitely not 'on hold'. If you treat it that way, you're missing an opportunity, perhaps the best one of your career so far.

What you can do

Oh great, unsolicited advice from someone who has no idea what you're going through. I know. But hey, you're reading a blog, what did you expect? 

  • Do you want out? If you think you might want to leave the industry and change your career in a profound way, do it. Start doing it right now and don't look back. If your heart's not in this work, the next months and maybe years are really not going to be fun. You're never going to have a better run at something completely different.
  • You never stop being a professional, just like a doctor never stops being a doctor. If you're committed to this profession, grasp and believe this idea. Your status as such is unrelated to the job you happen to have or the work you happen to be doing. Regaining ownership of our brains would be the silveriest of linings to this downturn.
  • Your purpose as a professional is to offer help and advice, informed by your experience, in and around your field of expertise. This has not changed. There are many, many channels for this purpose. A job is only one. I firmly believe that if you create value for people, you will be valued and — eventually — rewarded.
  • Establish a professional identity that exists outside and above your work identity. Get your own business cards. Go to meetings and conferences on your own time. Write papers and articles. Get on social media. Participate in the global community of professional geoscientists. 
  • Build self-sufficiency. Invest in a powerful computer and fast Internet. Learn to use QGIS and OpendTect. Embrace open source software and open data. If and when you get some contracting work, use Tick to count hours, Wave for accounting and invoicing, and Todoist to keep track of your tasks. 
  • Find a place to work — I highly recommend coworking spaces. There is one near you, I can practically guarantee it. Trust me, it's a much better place to work than home. I can barely begin to describe the uplift, courage, and inspiration you will get from the other entrepreneurs and freelancers in the space.
  • Find others like you. Even if you can't get to a coworking space, your new peers are out there somewhere. Create the conditions for collaboration. Find people on meetup.com, go along to tech and start-up events at your local university, or if you really can't find anything, organize an event yourself!
  • Note that there are many ways to make a living. Money in exchange for time is one, but it's not a very efficient one. It's just another hokey self-help business book, but reading The 4-Hour Workweek honestly changed the way I look at money, time, and work forever.
  • Remember entrepreneurship. If you have an idea for a new product or service, now's your chance. There's a world of making sh*t happen out there — you genuinely do not need to wait for a job. Seek out your local startup scene and get inspired. If you've only ever worked in a corporation, people's audacity will blow you away.

If you are out of a job right now, I'm sorry for your loss. And I'm excited to see what you do next.

Key technology trends in earth science

Yesterday, I went to the workshop entitled Grand challenges and research opportunities in geophysics, organized by Cengiz Esmersoy, Wafik Beydoun, Colin Sayers, and Yoram Shoham. I was curious whether there'd be overlap with the Unsolved Problems Unsession we hosted in Calgary, and had reservations about it being an overly fluffy talkshop, but it was much better than I expected.

Ken Tubman, VP of Geosciences and Reservoir Engineering at ConocoPhillips, gave a splendid talk to open the session. But it was the third talk of the session, from Mark Koelmel, General Manager of Earth Sciences at Chevron, that resonated most with me. He highlighted 5 trends in applied earth science.

Data and information management

Data volumes are expanding in step with Moore's law. Chevron has more than 15 petabytes of data; by 2020 they will have more than 100 PB. Koelmel postulated that spatial metadata and tagging will become pervasive and our data formats will have to evolve accordingly. Instead of managing ridiculously large amounts of data, a better solution may be to 'tag it and chuck it in the closet', Google's approach to the web (and we know the company has been exploring the use of Hadoop). Beyond hardware, he stressed that new industry standards are needed now. The status quo is holding us back.

Full azimuth seismic data

Only recently have we been able to wield the computing power to deal with the kind of processes needed for full-waveform inversion. It's not only because of data volumes that new processing facilities will not be cheap — or small. He predicted processing centres that resemble small cities in terms of their power consumption. An interesting notion of energy for energy, and the reason for recent massive growth in Google's power production capability. (Renewables for power, oil for cooling... how funny would that be?)

Interpretive seismic processing and imaging

Interpretation and processing are actually the same thing. The segmentation of seismic technology will have to be stitched back together. Imagine the interpreter working on field data, with a mixing board to produce just the right image for today's work. How will service companies (who acquire data and make images) and operators (who interpret data and make prospects) merge their efforts? We may have to consider different business relationships.

Full-cycle interpretation systems

The current state of integration is sequential at best: each node in a workflow produces static inputs for the next step, with minimal iteration in between. Each component of the sequence typically ends with 'throwing things over the wall' to the next node. With this process, the uncertainties are cumulative throughout, which is unnerving because we don't often know what the uncertainties are. Koelmel's desired future state is one of seamless geophysical processing, static model-building, and dynamic reservoir simulation. It won't eliminate uncertainties altogether, but by design it will make them easier to identify and address.

Intellectual property

The number of patents filed in this industry has more than tripled in the last decade. I assumed Koelmel was going to give a Big Oil lecture on secrecy and patents, touting them as a competitive advantage. He said just the opposite. He asserted that industries with excessive patenting (think technology, and Big Pharma) make innovation difficult. Chevron is no stranger to the patent process, filing 125 patents in each of 2011 and 2012, but this is peanuts compared to Schlumberger (462 in 2012) and IBM (6457 in 2012).

The challenges geophysicists are facing are not our own. They stem from the biggest problems in the industry, which are of incredible importance to mankind. Perhaps expanding the value proposition to such heights is more essential than ever. Geophysics matters.