BBC Data Day III - November 2014

The BBC’s journalists and technologists gathered last week for the third BBC Academy Data Day.

Journalism Meets Technology at BBC Data Day - Part One

These events take place roughly every six months and give BBC staff a chance to explore the challenges and opportunities of data-driven journalism within the BBC, and to be enthused and inspired by examples of great data use. The day was presented by the enthusiastic and irrepressible Bill Thompson, and offered ‘an up to the minute tour of current and important developments in data journalism in particular and data strategy, tactics and policy in general from the BBC and beyond’.

The day kicked off with the BBC’s Phil Fearnley and Robin Pembrooke joined on stage by the Economist’s Kenneth Cukier to discuss ‘Data Journalism: The State of the Art - BBC and Beyond’. Phil, the ‘Director of Homepage and myBBC’, talked about the importance of staying relevant in a data-driven world and how crucial it is for the BBC to treat each person as an individual. Between now and 2017, the BBC’s myBBC initiative aims to move from a world of channels and demographics, where no personal data is used in recommendations and the creative process is driven by narrow data thinking, to a world where audiences are served individually based on their tastes, interests and passions. Social and knowledge graphs will be used for recommendations, and deep insights will allow us to commission more successfully. At the heart of myBBC will be the user profile and activity data, augmented by data drawn from content and concepts. Phil stressed the importance of myBBC to the BBC as a business transformation project which will impact on our public service and commercial activities.

Kenneth Cukier, ‘Data Editor’ for the Economist, tackled the data journalism imperative and concluded that data journalism breaks two things in traditional newsrooms: we don’t know at the outset what we may find, and, with the likelihood of negative discovery, our institutional processes and cultures need to be comfortable with failure. Kenn’s parting shot was that what we’re doing is not data journalism, it’s journalism.

Robin Pembrooke, the BBC’s ‘Head of News Products’ for BBC Future Media, talked about how it’s possible to undertake data journalism at scale with linked data, and about the challenges involved: high numbers of concurrent requests, processing accurately in real time, the wide variety of devices and screen sizes, and multiple views on fast-changing data sets. The BBC’s linked data model allows data to be surfaced alongside related content, with BBC Things offering a way to aggregate related content to data. A more detailed write-up is available here.

Liliana Bounegru, Jonathan Stoneman and Paul Bradshaw offered ‘Reports from the Data Journalism Frontline’. Liliana is ‘Data Journalism Lead’ at the European Journalism Centre and co-author of the Data Journalism Handbook. She mentioned just some of the many digital tools and methods available to journalists, including hyphe; netvizz, which extracts data from different sections of the Facebook platform for research purposes; DMI-TCAT for Twitter analysis; and Gephi. Amongst the resources Liliana profiled were a really useful directory of data journalism tools and a counter-Jihadism project which examined social media to determine whether and how different counter-jihadist groups in Europe might be connected.

Jonathan Stoneman is a data expert and trainer, and made the point that in this world, data and statistics can be separate. He profiled a fascinating behind-the-data story of jaywalking arrests in Champaign, Illinois. Students filed an FOI request asking what people got arrested for, then made a pivot table to reveal the most common arrest offences. In 7th place was something unexpected that led to a story.
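For readers who have not built one, the pivot-table step described above amounts to counting records per category and ranking the totals. Here is a minimal sketch in Python using only the standard library. The records and offence labels are entirely made up for illustration (they are not the Champaign data), and `rank_offences` is a hypothetical helper name, not part of any tool mentioned on the day.

```python
from collections import Counter

# Hypothetical arrest records, one dict per arrest.
# The labels and counts are invented placeholders, not real figures.
records = (
    [{"offence": "offence A"}] * 5
    + [{"offence": "offence B"}] * 3
    + [{"offence": "offence C"}] * 1
)


def rank_offences(rows):
    """Count arrests per offence and return (offence, count) pairs,
    most frequent first, much as a spreadsheet pivot table would."""
    counts = Counter(row["offence"] for row in rows)
    return counts.most_common()


ranking = rank_offences(records)

# Print a ranked table; an unexpected entry partway down the list
# is the kind of thing that might prompt a follow-up question.
for position, (offence, count) in enumerate(ranking, start=1):
    print(f"{position}. {offence}: {count}")
```

The same result can be had from a spreadsheet pivot table with the offence as the row label and a count as the value; the point is simply that the ranking surfaces entries a reporter might not have thought to look for.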

Paul Bradshaw, online journalist, blogger and academic, covered some of the data scraping options available to journalists, including Kimono for turning websites into structured APIs from your browser; another tool which does a very similar thing; the Web Scraper extension for the Chrome browser; Outwit Hub, one of the oldest web scraping tools; Tabula and Doc Cloud for releasing data locked in PDFs; Muse, which looks at email patterns; Open Refine, which sorts out messy data; a tool for when you have no data; Reporter, an iOS app for your phone that collects data; and Query Tree, to set up templated queries for regular data streams.

Also of note was the Migrants’ Files investigation, which won at the Data Journalism Awards.

“The Migrants’ Files project was launched in August 2013 by a group of European journalists who joined forces to accurately calculate and report the deaths of emigrants seeking refuge in Europe. This pan-European consortium of journalists is partially funded by the European non-profit organization”.

There was an interesting comment from the floor about how it can be hard for journalists to find the time to work with data, which Paul robustly responded to in this blog post.

“Data is increasingly falling into journalists’ inboxes; finding it is a simple search query away; our sources have increasing access to it themselves; and the tools to analyse it are free. Saying “I haven’t got time” to deal with that type of information is cutting off a large chunk of potential leads, context and verification. It is actually making extra work for yourself. It is actually adding time”.

Look out for a second post on BBC Data Day, as well as some guest posts from some of the terrific journalists and speakers who attended.