How we built Datastringer

Journalists always rely on sources to do their job. Sources very often provide journalists with stories, as well as quotes and interviews. Big agencies like Reuters and AP do that too: they are fed wire reports from local journalists on the ground, everywhere on the globe. These people are called stringers.

Picture yourself as a local journalist. You love local police stories, there’s always something to write about, and your audience cares a lot about them. You get these stories by calling your sources in the force in the morning, and by receiving press releases every once in a while. How fantastic would it be if you could have a little piece of software you could tell what you’re looking for, then let it do the leg-work and alert you when it finds something worth your attention?

That’s what Datastringer aims to do. It was built to be as simple as possible, and very much journalism-oriented. The idea is not new in itself; it’s basically a monitoring tool. However, Datastringer wants to:

  • Simplify the process of monitoring streaming pure-data sources,
  • Give you the freedom to use all JavaScript and Node’s power to mash up several datasets at once,
  • Provide a ready-to-use environment to help you define your alert parameters, as well as encourage you to perform basic data-analysis operations on the monitored datasets before setting the alerts.

As always, I will try my best not to fall into PR, and instead, to offer my personal view on Datastringer’s creation process, as well as what we learnt by doing it. Do head to the repo and wiki for examples, snippets, and more code-related stuff.

From Idea to Hacking

The idea came to me in November, during the first Hacks/Hackers London meetup I attended. OpenNews Fellow Annabel Church led a workshop about this thing she called “data stringers.” Attendees were asked to gather in teams to design one of these.

It was a pure coincidence, but I buddied up with my university mates and Iain Collins, who would later join the BBC News Labs too. The idea we had was a continuation of my work on housing in London: we wanted to combine several datasets available on data.gov.uk to produce as accurate a picture as possible of the cost of living in the capital. Average property prices, energy price index… After the event, I gave our post-its to Annabel and promised to be in touch later.

The idea never quite left my head, but it was only months later that I considered taking it to the next level, after I discovered Annabel’s Datawi.re on GitHub. I found the installation requirements too high for journalists, and I needed a good project to showcase in my OpenNews Fellowship application, so I set myself a deadline of 16th August, with the objective of producing a simpler version of Annabel’s project.

Fortunately, the News Labs offers me something invaluable: an incredible freedom to pursue this sort of project. Once Matt Shearer, our Innovation Manager and News Labs’ boss, greenlit the project with his legendary “good stuff”, I started working. I was joined by Clément Geiger, a Parisian friend of mine, whom I considered much more aware of good practices, proper software development, and architecture than I am.

We started hacking.

Datastringer’s Philosophy

Keep it simple, stupid

From the start, I knew what I wanted: a product as simple as possible that journalists could use. Venturing into databases, tricky installations, and anything more complex than JavaScript was an instant no-no.

(I will come back to this philosophy later in the article, as I tried to keep this product vision as alive as I could, thinking that it would be the right role for me in my attempt to mediate between such different worlds.)

Working at a Distance

After I convinced myself and my boss to do the project, the first big task was to make sure that we (Clément and I) were on the same page, in order to work efficiently. Working at a distance is rather complicated, and technology hasn’t quite replaced proper human interaction. Emails read tougher than intended, and Skype chats provoke this weird feeling of seriousness even between two friends. Plus, we were working on different schedules: working hours for me, free time and evenings for Clément.

We didn’t have much time, though, so we started quite quickly. While Clément designed the architecture of the project, I set about finding sources of data we could use to test and demo the project we called Datastringer.

Regularly, we would go through each other’s commits on GitHub. And progressively, as we added lines, error-handling functions, callbacks and fallbacks, I started to lose track of what we were doing. At first, the architecture we agreed on split the program into two separate tasks: fetching the data and comparing the data, respectively called “sources” and “outputs”. Each of them contained a bunch of sub-functions.

Pivoting

“I don’t understand anymore”

The day before the 16th August, i.e. one day before the OpenNews deadline and three days before our presentation to the News Labs, Clément came to London for a final sprint. We worked all day on the project and on our caffeine-per-gram-of-blood level, until the key moment when, while writing the documentation, I asked Clément for help.

“Can you explain to me how the interaction between sources and outputs (the program’s core) works? I’m not sure I get it,” I said. “Sure, that’s easy, look.” Clément started drawing on the large windows, explaining out loud the inner interactions. When he was finished, I was panicking. “I don’t understand anymore.”

What followed was a long discussion during which he tried to explain this stuff to me, and during which I felt more and more uncomfortable. What was an output really doing? Why couldn’t we unite them in a single file? If I can’t understand it, how can I expect my fellow journos to use it and make it their own?

There was something deeply wrong with our work, and I didn’t want to continue in this direction. And then, the beauty of human interaction happened. I said: “Here’s what I want, here are the functions I want, here is how I want somebody who wants to use it to proceed.” And we took the problem piece by piece, throwing away huge parts of it, tearing the code apart, to eventually agree on what the architecture should be.

In the Underground heading home, we were both smiling and laughing. We were going to make the deadline, we were about to release a first working version, and we had made it so much simpler. And we did. We received a very warm welcome on Hacker News and GitHub, with a lot of attention.

Open Sourcing

Another success for this project is that it is the first step towards more open-sourcing of BBC News Labs’ projects. Like many developers, we have been using open-source software extensively, from hacking to production. I am personally very glad that everybody shares this will to give back to open source, and to contribute to this community.

To do so, we made a GitHub organisation where our creations will be available. There is more to come on this matter.

A Word on the Technical Side

Although Datastringer aims to be as simple as possible in the future, and to be relevant to a non-technical audience, I believe that the way we built it allows power users to do quite a lot with it.

As I mentioned earlier, we decided to go with JavaScript as our main programming language. There are a number of reasons for that:

  • We believed that many online journos now have to get their hands dirty and are (more or less) familiar with JavaScript,
  • JavaScript is flexible and is a very good way to fetch and consume data online, and…
  • I must admit that we wanted to stay in control when it came to the code, so we had to pick a language we were familiar with.

About the way it works, you’ve got to think of it this way: datastringer.js is a sort of black box which uses user-input values, parameters, and functions, stored in use_cases.json. Let’s have a look at this file’s structure:

```javascript
[
  {
    "stringer": "local-police-stringer.js",
    "parameters": ["metropolitan", "00AGGU"]
  },
  {
    "stringer": "crime-stringer.js",
    "parameters": ["51.52863195218981", "-0.12342453002929688", "6", "10"]
  }
]
```

This is what it looks like after the user went through the initial configuration wizard. Each part of the JSON defines:

  1. A stringer that is going to be called—this is a JavaScript function that is going to perform the fetching of the data and its optional manipulation. This function takes some parameters.
  2. A set of parameters that the function is going to use. These are completely dependent on your data source!
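To make the black-box idea more concrete, here is a minimal sketch of how a runner could walk through the parsed use_cases.json entries. This is not the actual datastringer.js code: runUseCases, loadStringer and onAlert are hypothetical names, and the stringer is an in-memory stand-in so that the example runs on its own.

```javascript
// Hypothetical sketch, not the real datastringer.js.
// Only the "stringer" and "parameters" keys come from use_cases.json;
// every function name below is an assumption made for illustration.
function runUseCases(useCases, loadStringer, onAlert) {
  useCases.forEach(function (useCase) {
    // Resolve the function behind the file name stored in "stringer".
    var stringer = loadStringer(useCase.stringer);
    // Pass the "parameters" array as positional arguments,
    // followed by a callback the stringer can call to raise an alert.
    var args = useCase.parameters.concat([
      function (message) {
        onAlert(useCase.stringer, message);
      }
    ]);
    stringer.apply(null, args);
  });
}

// In-memory stand-in for a stringer file, so the sketch is self-contained.
var fakeStringers = {
  "local-police-stringer.js": function (force, neighbourhood, callback) {
    callback("checked " + force + "/" + neighbourhood);
  }
};

var collectedAlerts = [];
runUseCases(
  [{ stringer: "local-police-stringer.js", parameters: ["metropolitan", "00AGGU"] }],
  function (name) { return fakeStringers[name]; },
  function (name, message) { collectedAlerts.push(name + ": " + message); }
);
console.log(collectedAlerts);
```

In a real setup, loadStringer would presumably be a thin wrapper around Node’s require(), and the array would come from parsing the file’s contents with JSON.parse.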

In this example, to fetch data about a local police force from the Police.uk API, one needs to provide a police force and a neighborhood code. For the crime statistics, it’s not a neighborhood code anymore, but GPS coordinates. We performed operations on the returned data to group the crimes per category (arson, bicycle theft…) and per month, to compare the total monthly number of arrests for each category. The comparison is done against an average computed over the number of months given by the third parameter (i.e. 6 months here), and the alert is triggered if there is an increase or decrease of more than x%, x being the fourth parameter, here 10%.

The function which does these operations is located at the path indicated in the stringer key, and it receives the parameters listed under it. See for example the syntax of the crime-stringer.js function:

```javascript
function stringer(lat, lng, numberOfMonths, threshold, callback) {
  // get the data for each month
  // sort crimes in categories
  // for each category, calculate the average on y months, y being numberOfMonths
  // compare the last month value and the average
  // if this difference is more than x % or less than -x %, trigger the alert, x being threshold
}
```
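The comparison step in that outline can be sketched as a small helper. This is only an illustration under stated assumptions: compareToAverage is a hypothetical name, the monthly counts are made up, and I assume the average is taken over the months preceding the latest one, which the outline above does not spell out.

```javascript
// Hypothetical helper: compares the latest month to the average of the
// months before it. The name and the exact baseline are assumptions.
function compareToAverage(monthlyCounts, numberOfMonths, threshold) {
  // use_cases.json stores parameters as strings, so convert them first.
  var months = Number(numberOfMonths);
  var limit = Number(threshold);

  // monthlyCounts is ordered oldest to newest; the last entry is the latest month.
  var latest = monthlyCounts[monthlyCounts.length - 1];
  // Assumption: the average covers up to `months` months before the latest one.
  var previous = monthlyCounts.slice(-(months + 1), -1);
  if (previous.length === 0) {
    return null; // not enough history to compare
  }

  var total = previous.reduce(function (sum, count) { return sum + count; }, 0);
  var average = total / previous.length;
  if (average === 0) {
    return null; // a percentage change is undefined against a zero baseline
  }

  // Percentage difference between the latest month and the average.
  var change = ((latest - average) * 100) / average;
  return {
    average: average,
    change: change,
    // Alert on a rise of more than x % or a fall of more than x %.
    alert: change > limit || change < -limit
  };
}

// Made-up monthly counts for one category, oldest first.
var burglary = compareToAverage([10, 10, 10, 10, 10, 10, 12], "6", "10");
console.log(burglary);
```

A stringer could run this once per crime category and call its callback only for the categories where alert is true.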

As you can hopefully see, we kept things as simple as possible, while guaranteeing flexibility and scalability:

  1. For a given stringer, you can define as many sets of parameters as you like. As an example, for the use_cases.json file pasted above, you could monitor a large number of neighborhoods by simply appending your neighbourhood codes to this file. It will reuse the local-police-stringer.js every time.
  2. The stringer() function is deliberately very general, and can be used to do complicated operations (like storing a reference file or completely changing the data’s format for comparison) as well as for simply fetching a JSON file.
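As an illustration of the first point, the snippet below builds one use case per neighbourhood code, all pointing at the same stringer file. It is a sketch only: addNeighbourhoods is a hypothetical helper, and "CODE_A" and "CODE_B" are placeholders rather than real neighbourhood codes.

```javascript
// Hypothetical helper: builds one use case per neighbourhood code,
// all reusing the same stringer file.
function addNeighbourhoods(useCases, force, neighbourhoodCodes) {
  var additions = neighbourhoodCodes.map(function (code) {
    return { stringer: "local-police-stringer.js", parameters: [force, code] };
  });
  // Return a new array rather than mutating the one passed in.
  return useCases.concat(additions);
}

var existing = [
  { stringer: "local-police-stringer.js", parameters: ["metropolitan", "00AGGU"] }
];
// "CODE_A" and "CODE_B" are placeholders, not real neighbourhood codes.
var extended = addNeighbourhoods(existing, "metropolitan", ["CODE_A", "CODE_B"]);
console.log(JSON.stringify(extended, null, 2));
```

The printed array has the same shape as use_cases.json, so the same result can be reached by editing the file by hand.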

Then, should the alert be triggered, the work is deferred to mailer.js, which is quite trivial too. At the moment, it combines nodemailer and postfix to send an email to the address the user gave us in the wizard.
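For readers curious about what that hand-off could look like, here is a rough sketch. It is not the actual mailer.js: sendAlert, the sender address and the message wording are assumptions, and a stand-in object replaces the real nodemailer transporter so that the example runs without sending anything.

```javascript
// Hypothetical sketch, not the real mailer.js.
// "transporter" is any object exposing sendMail(options, callback), which is
// the shape of the transporter returned by nodemailer.createTransport().
function sendAlert(transporter, recipient, stringerName, message, done) {
  var mail = {
    from: "datastringer@localhost", // assumed sender address
    to: recipient,
    subject: "Datastringer alert: " + stringerName,
    text: message
  };
  transporter.sendMail(mail, done);
}

// Stand-in transporter that records the mail instead of sending it.
var sentMails = [];
var fakeTransporter = {
  sendMail: function (mail, callback) {
    sentMails.push(mail);
    callback(null, { accepted: [mail.to] });
  }
};

sendAlert(
  fakeTransporter,
  "journalist@example.com",
  "crime-stringer.js",
  "Bicycle theft is up compared with the 6-month average.",
  function (error) {
    console.log(error ? "sending failed" : "alert sent");
  }
);
```

Passing the transporter in as an argument keeps the sketch testable; swapping the stand-in for a real nodemailer transporter should not require changing sendAlert itself.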

Future: Testing and Improvements

The project is still in its early stages. I very much intend to improve it, and I am amazed by the reaction it received, both on Hacker News and on GitHub. People are offering help and expressing interest, and that’s really heart-warming.

However, managing all these individual efforts will require time and a long-term-ish vision of Datastringer. A road-map needs to be drafted to remind everyone at every stage of the philosophy we should abide by, and to give as much visibility as we can to people interested in adopting the product.

As the BBC values state: “We take pride in delivering quality and value for money.” And that is something we ought to improve for Datastringer, which is so far quite hacky. We are already working on it.

Another comment we received during launch was striking: even though the project aims to be as simple to use as possible and even announces that “it’s fine” if you don’t know how to code, the installation process takes place on GitHub, and involves cloning a repo and running npm here and there. That appeared common sense and second nature to me, but… it’s not. How many journos have git installed on their machine and use the command line? I would guess not that many.

We lost track of our audience here, and that definitely will be in our roadmap as something we need to improve. A graphical interface or a service are possibilities we will be exploring.

My main focus for the weeks to come will be to field-test the project and to make the software more accessible for the audience. I am very much looking forward to our 1.0 release, which I will consider ready when anyone can install Datastringer on their machine and get up and running without coding.

As part of this intention, we just kickstarted a test with BBC London journalists interested in crime statistics. Their way of collecting data was very similar to the one described in the introduction to installation page: they call their sources, and they check the Metropolitan police’s website for public announcements. I felt like I was talking magic to them when I said the process could not only be automated, but completely transparent and unobtrusive for them.

When we were setting up the parameters that they were interested in monitoring, the team hesitated between comparing the rise of crime for a month to the same month’s number from last year, or to a yearly average. I asked: “What would be your headline if you received this story alert?” “Yearly average,” they said.

With this very strong connection between the headline and the alert parameters validated by the very first field test, I knew we were onto something.

