Opinion article – Exploring data journalism

Written by Jonathan Soma, Knight Chair Professor of Professional Practice in Data Journalism and Director of the Lede professional program at Columbia University

What does it take to be a data journalist? There’s no single answer! The field of data journalism is infinitely wide and infinitely deep, with as many approaches as there are stories in the world.

Unlike photo or magazine journalism, the “data” aspect of data journalism is almost always about the source of the information, not the end product. Data becomes another language for collecting information – if I spoke Spanish, I could certainly report out a lot more stories around the United States. It’s the same thing with data! A few extra skills can quickly open your world to sources and details that would otherwise be locked away.

With a field so large, how does one get started in data journalism, and where might they go? Let’s look at a few examples of the many genres in the field.

Beat reporters often find themselves as “accidental” data journalists, picking up the tools and skills they need just to finish a story. Fleshing out a piece about a school’s performance might involve downloading a spreadsheet on graduation rates across a city’s school districts, quickly leading the reporter to “pivot tables” as a way of grouping and summarizing data.

SAMPLE PROJECT: Find an open data portal for your country or city. Use pivot tables to summarize the data based on states or districts.
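
For readers curious about what a pivot table does behind the scenes, the grouping and summarizing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the districts, schools and graduation rates are made up, and the pivot function is a hypothetical helper, not part of any spreadsheet tool or library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, as they might look after loading a downloaded spreadsheet.
# The districts, schools and graduation rates are invented for illustration.
rows = [
    {"district": "North", "school": "Adams High", "grad_rate": 88.0},
    {"district": "North", "school": "Baker High", "grad_rate": 92.0},
    {"district": "South", "school": "Clark High", "grad_rate": 75.0},
    {"district": "South", "school": "Davis High", "grad_rate": 81.0},
    {"district": "South", "school": "Evans High", "grad_rate": 84.0},
]


def pivot(rows, group_by, value, summarize=mean):
    """Group rows by one column and summarize another, like a basic pivot table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[value])
    return {key: summarize(values) for key, values in groups.items()}


# Average graduation rate per district
by_district = pivot(rows, group_by="district", value="grad_rate")
for district, rate in sorted(by_district.items()):
    print(f"{district}: {rate:.1f}")
```

Swapping the summarize argument for len would count schools per district instead of averaging them, which is the same choice a spreadsheet offers when it asks whether to summarize by average or by count.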

Investigative journalists are likely experts at the acquisition step of the data process – not everything is a simple-to-download Excel file! They might use Chrome extensions or Python to build interactive “scrapers” that comb through websites, filling out forms and clicking buttons, all the while downloading important information that’s locked away online.
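
As a rough sketch of the simplest part of a scraper, the Python below pulls the rows out of an HTML table using only the standard library. It covers just the extraction step, not form-filling or button-clicking, and it rests on assumptions: the sample page content is invented, and TableScraper is a hypothetical class written for this example. A real scraper would first download the page and would typically lean on a dedicated library.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Inspection date</th><th>Restaurant</th><th>Result</th></tr>
  <tr><td>2024-01-05</td><td>Corner Cafe</td><td>Pass</td></tr>
  <tr><td>2024-01-09</td><td>Noodle House</td><td>Fail</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
header, *records = scraper.rows
for record in records:
    print(dict(zip(header, record)))
```

Once the rows are out of the page, they can be saved to a spreadsheet and analyzed like any other dataset.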

Freedom of information requests also tend to come back in unfriendly formats, with data presented as scans of emails or tables inside of PDFs. The scans might require “optical character recognition” (OCR) technology to allow you to copy out the text, although tools like Google’s Pinpoint or newer versions of macOS Preview can tackle that automatically. Spreadsheet tables displayed in PDFs are a constant headache for data journalists, who might rely on Cometdocs, Tabula or Python’s pdfplumber to extract them.

SAMPLE PROJECT: Upload a document (or a photograph of one!) to Google’s Pinpoint and see how well it does at extracting the text. Find a PDF with a table in it and give it a shot with Cometdocs or Tabula.

Mapping and geographic analysis has become a hot topic in the past few years, giving journalists the ability to peer into locked-down countries and war zones through satellite imagery. Data can come from satellites sponsored by the United States or the EU, or through private companies that provide daily snapshots of every corner of the Earth’s surface. A data journalist might process this data using free tooling like GDAL or QGIS, or opt for paid products like Esri’s ArcGIS.

What we call mapping is part of the larger field of GIS, or geographic information systems. GIS is a discipline all of its own, and you might even find specialized journalists with a master’s or PhD in the field! But despite the high limits of what’s possible with geographic data, it’s never been easier to get started.

SAMPLE PROJECT: Use Google Earth Pro’s historical imagery to show change over time for anywhere on the planet, or try making a Mapbox scrollytelling experience.

Graphics journalists are often the most technically advanced data journalists. Before a graphics journalist can even get to the visualization step, they often have to scrape, clean and analyze the data just like a “normal” data journalist. It’s only after all that work that they get to build the charts and graphs! A popular, accessible tool is Datawrapper, which many newsrooms use for standard visuals.

If a newsroom is looking to build custom visualizations, however, it gets complicated quickly. Modern toolkits are often based on Svelte or React, JavaScript frameworks that tie together data and visuals to present as interactives on the page. Even static graphics – ones that don’t change or move – are often exported from Adobe Illustrator with ai2html, a tool from The New York Times that allows text to be translated and graphics to be viewed easily on different screen sizes. Even though you might think of a graphics desk as being focused on visual design, there’s a lot hiding right under the surface!

SAMPLE PROJECT: Learn the basics of interactive web development with Svelte, or how to use ai2html in Adobe Illustrator.

The world of data journalism might be intimidating at first, but such a wide array of tools and approaches also means there are plenty of opportunities to get started.