22 October 2018

(re)building use cases for language technology in the newsroom

Engagement producer Alli Shultes shares five ideas from our latest language technology #newsHACK with the Summa consortium.

Four years ago, News Labs hosted a hackathon for media organisations, research institutions and start-ups working in the field of language technology.

Scalable Understanding of Multilingual MediA, or Summa, is one partnership that came out of that event. Funded by Horizon 2020, the EU's biggest research and innovation programme, the Summa consortium has spent the last three years building a media monitoring platform that can help journalists keep abreast of global news, no matter the language or format. By chaining together technologies such as automated transcription, machine translation, topic detection and story clustering, it gives journalists a high-level overview of global news trends, as well as more nuanced views on stories and topics that they're interested in.

What's next? At BBC News Labs, we've been investigating new applications of the technology that's been developed as part of the project. We have some ideas; for example, our Summa team has built a prototype that allows editors to easily search the transcripts of Arabic television programmes for key words. But we also think that there are more ways the technology can solve problems that editors, producers and journalists face in their day-to-day workflows.

Screenshot of list of Arabic television feeds in tool

An example of a prototype the BBC's Summa team built using the technology

That's one of the reasons we hosted a Summa #newsHACK in Bonn, Germany this October with BBC Connected Studio, and in collaboration with the Summa project and the German broadcaster Deutsche Welle. After a half-day of workshops and ideation sessions, five teams of researchers, developers, data scientists and journalists prototyped their ideas for how Summa technology might solve problems in existing production workflows. Here's a quick look at what teams built, what our expert panel thought and where we plan to go from here.

The panel

Deviating from the more competitive format of our regular #newsHACKs, where judges select winners in pre-determined categories, we asked a group of innovators in the media industry to offer feedback to participants on their prototypes.

Esra Dogramaci is a digital change maker with significant experience in digital transformation across news. She was most recently at Deutsche Welle as senior editor for digital, and previously served as the BBC's first digital consultant, working across a variety of presences including Newsnight, HardTalk, and the Travel Show.

Adam Thomas is the Director of the European Journalism Centre, an independent European non-profit connecting journalists with new ideas. Previously, he served as the Chief Product Officer at Storyful and Head of Communications at the international non-profit Sourcefabric. He has worked on media development projects in over 50 countries worldwide.

Yacine Messaoui is a digital transformation consultant. He most recently served as technology strategist and the director of Al Jazeera's media platforms department, where he led the implementation of the network's newsroom transformation initiatives. He is the organiser of the "Future Media, Leaders Summit" and speaks worldwide about the impact of technology on the economy, culture and politics.

The pitches

Team 1: Make the news social again

This team from Technische Universität Darmstadt and the University of Edinburgh attempted to re-engage news consumers on article pages with a module that reveals how stories and topics are discussed on Twitter.

They envisioned featuring a combination of the most engaged-with tweets and tweets selected through argumentation mining — a process that finds text that either supports or refutes a given statement or claim.

The team said that an additional benefit of mining social media for comments is that platforms like Twitter and Facebook already have some verification systems in place, offsetting the burden of moderation for newsroom staff.

Adam: "I feel like if you made this a back-end system, it would make this really useful for editors... The NYT have Perspective AI, which is something similar for comments and, as editors, they have control... and I think that would be super interesting. News organisations are also using tweets to show reactions, so I think as a back-end system it'd be great."

Esra: "As far as I know, the BBC wouldn't allow embeds coming over from Twitter because they felt that they wouldn't be able to control content of tweets... if you had someone hack into somebody else's account, and you're pulling that over, how do you mitigate some of those risks that are coming over from Twitter?"

Yacine: "I think you are touching on an important problem that media organisations are facing today... however, you are opening doors to a tsunami of content. It's sad to see that most media organisations now outsource commenting to Facebook and Twitter. I think the attempt is a good start, but I'd say in theory there needs to be a review process."

Team 2: Read this!

This team built a prototype that uses Summa technologies to support or refute claims in news stories. Users need only highlight a suspicious statement to pull in reporting from other news organisations that suggests whether the statement is true or false.

If a claim couldn't be verified or refuted, it would be labelled as needing further investigation.

Adam: "I really love the idea of providing more context for users. The more context we can apply, the more trust we can build with audiences. As a browser extension there are a number of people doing these things. The sources are the issue. There is research showing that fact-checking is being perceived as a part of the problem by one side of the political spectrum."

Esra: "I'd think a platform would be all over something like this. At the bottom of the article you could have a verification rating — 60% verified, 40% could not be verified... and then you could create a ranking of verifiable news sources. You could also think about linking to the comments themselves."

Yacine: "This type of initiative always has to do with educating citizens. So I'd love to see this used more for an educational perspective, in schools and universities, where people learn to have critical thinking skills."

Team 3: BBC News FLYA

Citing research showing that language-learning is a core skill for young people — and that the same demographic finds news depressing to read — this team built a new service that teaches foreign languages to students around the globe.

Combining an online learning portal, mobile app and podcast with transcriptions, the service lets users see clustered article titles in their native languages after performing a search. The team also presented a readability metric that alerts readers to how difficult an article's language is. Readers can choose to revert individual paragraphs in an article to their native language if they are having trouble understanding the vocabulary used. Entity highlighting also gives readers the ability to easily look up terms and concepts that they are unfamiliar with.
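The write-up does not say which readability metric the team used. As a rough illustration only, here is a minimal sketch that assumes the Automated Readability Index (ARI), a standard formula calibrated for English text; the function name and the simple word and sentence splitting are our own assumptions, not part of the Summa platform or the team's prototype.

```python
import re


def automated_readability_index(text: str) -> float:
    """Return the Automated Readability Index (ARI) for a piece of text.

    Assumption: ARI stands in for the team's unspecified metric.
    Higher scores suggest harder text. The coefficients are the
    standard ARI ones and are calibrated for English.
    """
    # Words are runs of word characters, optionally joined by apostrophes.
    words = re.findall(r"\w+(?:'\w+)*", text)
    # Sentences are split naively on ., ! and ? (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # ARI counts letters and digits only, so apostrophes are excluded.
    characters = sum(ch.isalnum() for word in words for ch in word)
    return (
        4.71 * characters / len(words)
        + 0.5 * len(words) / len(sentences)
        - 21.43
    )


easy = "The cat sat. The dog ran."
hard = "Intergovernmental negotiations precipitated unprecedented macroeconomic consequences."
print(automated_readability_index(easy) < automated_readability_index(hard))
```

A learner-facing service could map scores like these to labels such as "beginner" or "advanced", though the thresholds would need tuning for each language.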

Adam: "I think news organisations need to look at the information needs of their audiences more generally as well. They focus a lot on breaking news, but there are other information needs that their audiences have. Learning languages is one of them.

"I think there is an expat market for this, where there is genuine interest in local news but generally you don't understand it. But I think Summa is going to have to be really, really good in the translations..."

Esra: "How are you selecting stories [that appeal to your audience]? I had two ideas. One is News Mavens, which is a group where women aggregate the news. They look at stories across Europe that have a women-specific focus, they pull from all different parts of the European landscape... Then there's the Constructive Institute in Denmark, which does more solutions-focused journalism... if I were in your shoes, I'd be reaching out to people like them."

Yacine: "The idea is great... I like the fact that you can translate one piece [of an article]."

Team 4: Six-Nine: News Perspectives

This team built a visualisation tool for comparing which topics and stories are being covered most frequently in various languages. The prototype made use of the Summa platform's translation, topic detection and entity extraction technologies.

The team said that they also might implement a quiz feature, with players selecting the stories that they think are biggest in certain language groups.
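To make the comparison concrete, here is a minimal sketch of how topic coverage could be tallied per language before it is visualised. It assumes each article has already been tagged with a language code and a list of detected topics; the field names, sample data and function names are hypothetical and are not taken from the Summa platform.

```python
from collections import Counter, defaultdict


def coverage_by_language(articles):
    """Count how often each topic appears in each language's articles."""
    coverage = defaultdict(Counter)
    for article in articles:
        coverage[article["language"]].update(article["topics"])
    return dict(coverage)


def topic_share(coverage, topic):
    """Return, per language, the fraction of topic mentions that are `topic`."""
    shares = {}
    for language, counts in coverage.items():
        total = sum(counts.values())
        shares[language] = counts[topic] / total if total else 0.0
    return shares


# Hypothetical sample data: language codes and topic labels are made up.
articles = [
    {"language": "en", "topics": ["elections", "economy"]},
    {"language": "en", "topics": ["elections"]},
    {"language": "de", "topics": ["economy"]},
    {"language": "ar", "topics": ["elections", "sport"]},
]

coverage = coverage_by_language(articles)
print(coverage["en"].most_common(1))
print(topic_share(coverage, "elections"))
```

Shares rather than raw counts make languages with very different article volumes comparable, which matters when one feed publishes far more than another.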

Adam: "I don't think we understand enough about how people read... I think we're seeing newer news organisations thinking about their verticals in completely different ways. De Correspondent is one example. They tend to focus on single themes... but they don't do pan-European views of a story. I think with interactives you could do something really interesting."

Esra: "I want to know: who is your audience? What makes it interesting? News nerds will love it."

Yacine: "The categorisation of content has always been an interesting topic — how do we cluster topics for people's interest? You presented in a nice way by using different countries and different perspectives. So I was just thinking how that would apply for media organisations? Do we need to tag content in that way?"

Team 5: EnGague

What if people had access to the raw data that informs reporting? This team of Summa partners imagined a new timeline tool that could show how topics covered by the news change over time.

Focusing initially on political parties and politicians, the team's timeline visualisation allowed users to view articles in different languages and countries, and also across different time periods.

Adam: "This could be interesting for engaging editors... the MIT Media Project, BBC Monitoring Project and places like the Guardian... spend a lot of time looking at their own output, and making sure we have a diversity of opinions. This could be an interesting potential use case."

Esra: "What would be nice would be something like this that could be visualised... What I would suggest is looking at the GEN Data Journalism awards [for ideas]."

Yacine: "If people just remembered what we said and what we said in the past, I think people would be wiser in their judgement and decisions.... This is valuable from an information perspective and interesting to the audience."

What's next?

The Summa project will come to an end in 2019, after three years of development. At that time, many of the technologies that have been developed will be available as open-source modules. Here's a tentative list of what's planned for release, as we continue to work on finalising details.

News Labs will continue its involvement in research projects contributing to the development of language technologies for the newsroom. Most recently, a joint proposal submitted by BBC News Labs and external partners has been approved for funding under Horizon 2020. Called GoURMET (Global Under-Resourced Media Translation), the three-year project will investigate how neural networks can be used most effectively to power machine-assisted translation for low-resource languages.

Get involved!

You can sign up to receive emails with the latest news on the Summa project. Email us at newslabs@bbc.co.uk to be added to the list.
