The Juicer

Status: active
How might we support exploration and understanding of journalism at a global, meta level?

The Juicer is a news aggregation and content extraction API. It takes articles from the BBC and other news sites, automatically parses them and tags them with related DBpedia entities. The entities are grouped in four categories: people, places, organisations and things (everything that doesn’t fall in the first three).
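As a rough illustration of the four-way grouping, the sketch below sorts tagged entities into people, places, organisations and a catch-all "things" bucket. The type labels ("Person", "Place", "Organisation") and the function name are assumptions made for this example; they are not taken from the Juicer API.

```python
# Hypothetical mapping from an entity's type label to a Juicer category.
# The label names are assumptions for illustration only.
CATEGORY_BY_TYPE = {
    "Person": "people",
    "Place": "places",
    "Organisation": "organisations",
}


def group_entities(entities):
    """Group (label, entity_type) pairs into the four categories.

    Any type that is not a person, place or organisation falls into
    the catch-all "things" category.
    """
    groups = {"people": [], "places": [], "organisations": [], "things": []}
    for label, entity_type in entities:
        category = CATEGORY_BY_TYPE.get(entity_type, "things")
        groups[category].append(label)
    return groups


example = group_entities([
    ("Angela Merkel", "Person"),
    ("Greece", "Place"),
    ("BBC", "Organisation"),
    ("Euro", "Currency"),
])
```

Here "Euro" ends up under "things" because its type is not one of the first three.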

The Juicer API

Documentation for the Juicer API is publicly available at this link.

To use the API, you need an API key. If you’re a BBC employee, you can obtain an API key by registering at the Developer Portal and requesting “bbcrd-juicer-apis-product” as the product you want to access.

If you do not work for the BBC, you can request an API key by emailing newslabs@bbc.co.uk, provided you plan to use it for non-commercial purposes and agree to the terms and conditions listed below under “FAQs”.

Web interfaces

We’ve built a beta web interface to the Juicer, which you can use if you are connected to a BBC network. Additionally, the WAT webapp uses the Juicer API to generate graphical representations of which news organisations are covering what.

We are working hard to give everyone more ways to play with the data available in the Juicer, including a public search interface, trending topics and more jaw-dropping visualisations.

FAQs

BBC Juicer: What it is and what it is NOT.

BBC Juicer is a news aggregation ‘pipeline’. It ingests news articles and extracts the best from them, just as a fruit juicer does. The pipeline watches the RSS feeds of news outlets. When a new article is published on one of these feeds, BBC Juicer scrapes the article, capturing both the raw text and the metadata (e.g. date, time, title, news source). It then identifies and tags the concepts mentioned in the article text, making them searchable and therefore useful for trend analysis. The Juicer API allows users to retrieve JSON representations of the news articles.
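The first step of such a pipeline, spotting newly published items on an RSS feed, might look something like the minimal sketch below. It uses only the Python standard library and a made-up sample feed; the function name, the returned field names and the example.org URLs are assumptions for illustration, not a description of how BBC Juicer is actually implemented.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed used only for this example.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>First story</title><link>https://example.org/a</link>
<pubDate>Mon, 01 Jun 2015 09:00:00 GMT</pubDate></item>
<item><title>Second story</title><link>https://example.org/b</link>
<pubDate>Mon, 01 Jun 2015 10:00:00 GMT</pubDate></item>
</channel></rss>"""


def new_items(rss_xml, seen_links):
    """Return metadata for feed items whose link has not been seen yet.

    seen_links is a set that is updated in place, so calling the
    function again with the same feed returns nothing new.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        if not link or link in seen_links:
            continue
        seen_links.add(link)
        fresh.append({
            "title": item.findtext("title", default="").strip(),
            "url": link,
            "published": item.findtext("pubDate", default="").strip(),
        })
    return fresh


seen = {"https://example.org/a"}      # pretend this article was already ingested
items = new_items(SAMPLE_FEED, seen)  # only the second story is new
```

In a fuller pipeline, each new item would then be scraped for its text and passed on to the tagging step.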

Which news sources did you include and why?

At the moment, BBC Juicer monitors around 850 RSS feeds from international, national and local news outlets. We started with British and other English-language sources and are expanding into other languages. Importantly, the list of sources does not claim to be comprehensive, nor does it claim to be a representative set of sources.

We do NOT ingest content that is behind a paywall. Any news outlet that does not provide news content for free will not be ingested in BBC Juicer. However, if you wish the BBC to remove your content from BBC Juicer, please get in touch: newslabs@bbc.co.uk. Please provide details of the content to be removed, including the date of any article or headline.

If you work for a newspaper or other news outlet that provides free online content and would like it to be part of BBC Juicer, or if you are planning a project that requires content from other news sources, please get in touch: newslabs@bbc.co.uk. Please provide details of how you will provide open access to your technology. We will look into including your source.

Where do the tags on each article come from?

Tagging is not manual or editorially controlled. It is an algorithmic process. The algorithm analyses the raw text of the article to find concepts, i.e. people, places and organisations that appear in the text. Whether a term is assigned a tag depends on the context in which it appears within the article and on whether the match clears certain confidence thresholds.
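The role of a confidence threshold can be sketched as follows. The candidate scores, the default threshold of 0.5 and the function name are all hypothetical; the actual thresholds and scoring used by BBC Juicer are not documented here.

```python
def assign_tags(candidates, threshold=0.5):
    """Keep candidate concepts whose confidence meets the threshold.

    candidates is a list of (concept, confidence) pairs, with the
    confidence a number between 0 and 1 (an assumed scale).
    """
    return sorted({
        concept
        for concept, confidence in candidates
        if confidence >= threshold
    })


# Hypothetical output of a concept extractor for one article.
candidates = [
    ("Paris", 0.92),         # clearly the city in this context
    ("Paris Hilton", 0.12),  # same surface word, but unlikely here
    ("France", 0.71),
]
tags = assign_tags(candidates)
```

With these made-up scores, "Paris" and "France" are kept and "Paris Hilton" is dropped, which is how the same term can be tagged in one article and not in another.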

A tag does not need to appear in the text word for word. An article may, for example, be tagged with Greece because it contains either the word Greece or the phrase Hellenic Republic.
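One simple way to picture this is a lookup from alternative wordings ("surface forms") to a single tag. The table and function below are an illustrative assumption, not the method BBC Juicer uses; a real concept extractor would also weigh context, as described above.

```python
import re

# Hypothetical table of surface forms and the tag each one maps to.
SURFACE_FORMS = {
    "greece": "Greece",
    "hellenic republic": "Greece",
}


def tags_for_text(text):
    """Return the tags whose surface forms occur as whole words in text."""
    lowered = text.lower()
    found = set()
    for form, tag in SURFACE_FORMS.items():
        # \b anchors keep "greece" from matching inside a longer word.
        if re.search(r"\b" + re.escape(form) + r"\b", lowered):
            found.add(tag)
    return sorted(found)


by_alias = tags_for_text("The Hellenic Republic announced new measures.")
by_name = tags_for_text("Talks with Greece resumed on Monday.")
```

Both sentences receive the same tag even though only the second contains the word Greece.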

The result of the ‘concept extraction’ depends on the current version of Wikipedia that underlies BBC Juicer (currently updated once a month) and on a number of parameters used to adjust the engine. All this is still very much in development. It is an experimental project and we don’t take responsibility for incomplete or false assignment of tags.

I found an article on Juicer the other day and now the link is broken. Why?

If an article is amended by the original news source, this will not trigger a re-tagging within BBC Juicer unless the article is republished on RSS. If the article is no longer accessible on the publisher’s website, the link to the original source will break. BBC Juicer will, however, keep the title, text, metadata and tags assigned to the original article.

I have more questions, I have ideas and I want to contribute!

Are you a student or working in research and would like to use BBC Juicer for a project?

If you are looking for a data source for an academic project, we are very happy for you to use BBC Juicer. With a key, you can access BBC Juicer through the simple graphical user interface we provide. Depending on your project, you may quickly want to do more: the full database of articles collected since autumn 2014 is available via the BBC Juicer API. BBC Juicer is free to use for academic projects, but we ask you to attribute BBC News Labs, and we would be happy to consider writing about the results of your research on our BBC News Labs Blog.

Are you a journalist and would like to use BBC Juicer for your daily work?

We are actively exploring, through hackathons, how BBC Juicer may be useful for a journalist in their daily work. If you have a bit of time on your hands to get involved with us, and discuss how BBC Juicer may be useful to you or your newsroom, we would love to hear from you: newslabs@bbc.co.uk

Are you a UX designer, front end or back end developer?

BBC Juicer is still in development and we are always open to bringing in people who can contribute their expertise to develop the system further, both on the back end and in building new front ends that speak to journalists and other user groups. Get in touch: newslabs@bbc.co.uk.

I think something isn’t working here. Who do I tell?

If you have a question about BBC Juicer, experience problems using it, or think there is something in there that should not be, please get in touch with us: newslabs@bbc.co.uk

Do I have to pay to use BBC Juicer?

You can access BBC Juicer for free provided that your use is non-commercial and you comply with all applicable terms of use.


Next Priorities

  1. Public interfaces / prototypes to support open use and adoption of The Juicer.
  2. Multilingual, cross-language topic mapping, in order to look at global topic trends / patterns.