About the BBC - Helen Papadopoulos

Do you remember the time: Discoveries from BBC Genome project

Helen Papadopoulos - Thu, 20 Oct 2011 14:05:38 +0000

One of the many joys of working on the Genome Project has been uncovering the connections we have with the past, the BBC and its broadcasting output. Here is a small selection of stories we have unearthed over the past few months.

Collections of time

Wallace Grevatt was an avid collector of the Radio Times and author of 'BBC Children's Hour: A Celebration of Those Magical Years'. We were able to scan many of the magazines from his collection.

A couple of weeks ago, I discovered this letter in the Radio Times from 1983, appealing for back issues of the magazines to complete his collection.

Watching with nan

In preparing to write this update on the project I also talked to our quality assessment team. They have spent the past few months poring over the data eight hours a day and I wanted to know what thoughts the project might have stirred in them. Each of us will take different things out of the past when leafing through an old edition of the Radio Times. For one member of the team, after hours and hours of checking the data, one of the many reminders of her childhood has been reading the listings for 'Watch With Mother'. It brought back not just the programmes she watched, but the time she spent being cared for by her grandmother, watching 'Watch With Mother' not with her mother but with her nan.

In the 1950s, the Radio Times regularly published BBC vacancies and I found this one which I sent to John Zubrzycki, BBC Research and Development. He commented: "That ad encapsulates why I wanted to become an engineer at the BBC. I wonder if it's too late to apply :-)"

As we progress on the project I'm sure we will find lots more gems such as these so I will be sure to share them with you next time I do a BBC Genome update.

Helen Papadopoulos is the Project Manager of BBC Genome

BBC Genome update: Search, discovery & access

Helen Papadopoulos - Wed, 12 Oct 2011 17:32:37 +0000

Navigating the BBC's Broadcast History

My dad is a physicist, working in quantum field theory, and he introduced me to the work of Richard Feynman at a very early age. Feynman is probably the most famous physicist after Einstein (though younger readers may prefer Brian Cox) and he managed to make some of the deepest mysteries of science accessible to us all.

In 1981 Feynman was featured in a BBC Horizon programme called 'The Pleasure of Finding Things Out', a title that was also used for a printed collection of his essays and interviews. I'm sure my dad will be pleased that my current work with the Genome Project will bring this and many other treasures from the BBC's past to the attention of Feynman fans and others. It will also give all of us a way of finding out what the BBC has been broadcasting on television and radio, all the way back to 1923.

The Genome Project

Genome will create a complete database of the BBC's broadcast history, giving details of every programme the BBC broadcast - or at least intended to broadcast - on radio or television since the schedule was first published in 1923. Since the information is not available electronically for most of that period we're taking a brute force approach and digitising the one complete record we do have: printed copies of the Radio Times.

We started in 2010 with a small-scale pilot and its success convinced us that the project was technically feasible and would be value for money. As I write we've scanned over 4,500 magazines and have imaged more than 360,000 pages, so things are moving well.

Why Are We Doing This?

Genome will provide a database of BBC programming that will be used to support a wide variety of applications and services provided by the BBC and others, but it has a deeper value too.

Most people find pleasure and comfort in recalling programmes they watched on television or listened to on the radio. The BBC's broadcast output and the Radio Times reflect events and society at home and abroad, and Genome is a gateway to that past. This is not just about finding out what was on television on the day you were born: it is an historical record which can enrich our knowledge of the world, helping us discover our past and influence our future.

How We Did It

The first step was assembling the copies of the magazines themselves. As you'd expect, the BBC holds several full sets of Radio Times, including preservation copies at the BBC Written Archives Centre in Caversham, but these are contained in bound volumes for reference. Rather than disbind them, we tried to acquire as many loose issues of the Radio Times as possible so that they could be scanned easily.

Fortunately, we were able to borrow loose magazines from private collectors including an extensive collection from the 1920s. BBC Worldwide lent us the loose collection they acquired from television historian Wallace Grevatt after his death in 2003.

Extracting the data

Once the scanned images of the magazines had been produced, the mammoth task of capturing the text in the Radio Times could begin.

This is largely being done automatically, but in order for this to be feasible we had to analyse the magazine formats, layouts and channel history over the 88-year period to create rules that could be applied to capture the programme listings in a meaningful way.

We devised a schema which would house the various parts of the programme listings such as time, title, synopsis, cast, crew and so on, and doing this revealed just how complex the BBC's channel history is.
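As a rough illustration only, a record holding those parts of a listing might look like the sketch below. The class name, the field names and the sample values are all assumptions made for this example; this is not the Genome schema itself.

```python
from dataclasses import dataclass, field
from datetime import date, time
from typing import List, Optional


@dataclass
class ProgrammeListing:
    """One programme entry from a printed schedule (hypothetical structure)."""
    channel: str                     # service name as printed in the magazine
    broadcast_date: date
    start_time: time
    title: str
    synopsis: Optional[str] = None   # optional: a listing need not carry one
    cast: List[str] = field(default_factory=list)
    crew: List[str] = field(default_factory=list)


def describe(listing: ProgrammeListing) -> str:
    """Return a one-line summary: date, time, channel and title."""
    when = f"{listing.broadcast_date.isoformat()} {listing.start_time.strftime('%H:%M')}"
    return f"{when} | {listing.channel} | {listing.title}"


# Invented sample values, used only to show the shape of a record.
example = ProgrammeListing(
    channel="Example Service",
    broadcast_date=date(1950, 1, 1),
    start_time=time(16, 0),
    title="Example Programme",
)
print(describe(example))
```

Keeping the channel as a plain string is a simplification: merged, renamed and replaced services would need a channel history of their own.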

Here is a snapshot of the pre-war 'network and nations' radio services and how they merged or were replaced. It shows both the geographical transmitter history over this period and the complexity of the data sources we were dealing with.

We also discovered that the Radio Times itself is complex due to the changing layouts and formats, but it is reasonable that editors in 1923 would not have worried about making life difficult for a team of experts trying to scan the magazines more than 80 years later.

What We've Got

Optical character recognition (OCR) software recognises the text, and semantic rules segment the information uncovered in the magazines. The resulting data is available to us as a collection of XML (eXtensible Markup Language) files. These are not reader-friendly, so we have developed a tool that can read them and present the information in a more accessible format for checking and validation, as well as allowing us to show off the amazing details of the BBC's schedule over the decades.
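To give a flavour of what turning XML into something more readable involves, here is a minimal sketch using Python's standard library. The element and attribute names (issue, listing, time, title, synopsis, channel, date) and the sample content are invented for the example; the real Genome files and the real checking tool are not shown here and may look quite different.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented sample document; the real XML layout is an assumption.
SAMPLE_XML = """
<issue date="1950-01-01">
  <listing channel="Example Service">
    <time>16:00</time>
    <title>Example Programme</title>
    <synopsis>An invented synopsis used only for this sketch.</synopsis>
  </listing>
  <listing channel="Example Service">
    <time>16:30</time>
    <title>Another Example</title>
  </listing>
</issue>
"""


def readable_lines(xml_text: str) -> List[str]:
    """Turn each <listing> element into one human-readable line."""
    root = ET.fromstring(xml_text.strip())
    issue_date = root.get("date", "unknown date")
    lines = []
    for listing in root.iter("listing"):
        start = listing.findtext("time", default="??:??")
        title = listing.findtext("title", default="(untitled)")
        synopsis = listing.findtext("synopsis")
        line = f"{issue_date} {start}  {listing.get('channel')}: {title}"
        if synopsis:
            line += f" - {synopsis}"
        lines.append(line)
    return lines


for line in readable_lines(SAMPLE_XML):
    print(line)
```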

At the moment we have received XML and searchable PDF files for six decades of the BBC's programming, a total of 2.3 million programme listings, and we expect between 3 and 3.5 million programme listings by final delivery in December. We will then make the data available during 2012.

Helen Papadopoulos is the Project Manager of BBC Genome

BBC Genome: The Complete Broadcast History of the BBC

Helen Papadopoulos - Thu, 19 Aug 2010 10:45:52 +0000

Most people know that the BBC does not have a copy of every programme it has ever broadcast. The main reason for this is that when broadcasting began it was seen as an ephemeral medium, and there was no way to record and store what was being transmitted.

Although it became possible to record programmes in the 1950s, magnetic tape was very expensive and recording equipment bulky and complicated. Until relatively recently, only those programmes that were considered worth the cost and effort of recording and archiving for posterity were retained. The head of BBC Information and Archives, Sarah Hayes, wrote about this in detail for the Internet blog in September 2009.

However, even though we may not have a copy of each programme in the BBC's vast archive, there may still be something related to or derived from the original programme: stills, non-broadcast footage, music, documentation, props or other material connected with what was broadcast.

The skilled researchers who work with programme-makers inside the BBC and independent production companies are used to hunting for additional material and know where to look, but on the whole the public don't even know where to start. BBC Genome is our attempt to solve that problem, by creating a comprehensive, easy-to-use online catalogue of all of the BBC's programmes so that people can discover which programmes we have, which we don't have, when and where they were broadcast and even what else we've got that might interest them.

We're working on the basis that "full or near-full public access to archives is both achievable and the right ultimate goal" and, sitting at the heart of a reshaped BBC Online, BBC Genome is the first step towards that goal. It will provide a timeline from the foundation of the British Broadcasting Company in 1922 and provide details of the programmes, channels and services which map on to that timeline, bringing the broadcast history of the BBC to life.

What BBC Genome does

The BBC stores information about the programmes we make and broadcast in many different ways, each one designed to support a specific task or function, but none of these is comprehensive or available in a publicly accessible or searchable form. We want to ensure that our broadcast history becomes and remains a working asset for audiences, and at the end of last year we set about finding a way to reconstruct the BBC's broadcast history all the way back to 1922.

We needed to create a central core, or spine, for the catalogue of broadcast records and there was one source in particular that provided a comprehensive record of the BBC's broadcast history going back to 1923: Radio Times.

It is an ideal place to begin because we have easy access to it, it contains a record of everything we intended to broadcast - even if what actually went on air wasn't what we planned to show - and it is in a structure and format that people readily recognise, with basic but consistent details for all programmes, along with regional variations. It even lists radio frequencies!

We started with a pilot project to scan two years' worth of Radio Times and extract the programme listings details from the scanned pages, in order to establish the approach and processes. Working with experts at a UK firm which specialises in projects like ours and with the British Library, every page of the 1948 and 1977 editions of Radio Times was scanned.

The Genome pilot: Metadata, OCR and outputs

The images were then converted into computer-readable text using optical character recognition (OCR) software before being divided into separate channel and programme listings so that we could identify details including the programme title, channel name, date and time of broadcast and a synopsis of the show. All of this information was stored in a database that we used to support an experimental website that presented the information in a form similar to BBC Programmes pages.
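The step of dividing recognised text into separate listings can be pictured with a small sketch. It assumes, purely for illustration, that the first line of a block names the channel and that each listing opens with a time printed as "4.30"; the sample text is invented, and the rules actually used in the pilot are not described here.

```python
import re
from typing import Dict, List

# Invented sample text; real OCR output is noisier and layouts vary by era.
OCR_TEXT = """\
EXAMPLE SERVICE
4.00 EXAMPLE PROGRAMME
An invented synopsis line.
4.30 ANOTHER EXAMPLE
5.15 A THIRD EXAMPLE
With a second invented synopsis.
"""

# Assumption: a line starting with a time such as "4.30" begins a new listing.
TIME_LINE = re.compile(r"^(\d{1,2})\.(\d{2})\s+(.+)$")


def segment(ocr_text: str) -> List[Dict[str, str]]:
    """Split a block of text into listings: channel, time, title, synopsis."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    channel = lines[0] if lines else ""
    listings: List[Dict[str, str]] = []
    for ln in lines[1:]:
        match = TIME_LINE.match(ln)
        if match:
            listings.append({
                "channel": channel,
                "time": f"{int(match.group(1))}:{match.group(2)}",
                "title": match.group(3),
                "synopsis": "",
            })
        elif listings:
            # Any other line is treated as synopsis text for the latest listing.
            current = listings[-1]
            current["synopsis"] = (current["synopsis"] + " " + ln).strip()
    return listings


for item in segment(OCR_TEXT):
    print(item)
```

A single rule like this would not survive decades of changing layouts, which is why the formats had to be analysed before any rules could be written.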

These are a couple of the early pages complementing the BBC Programmes services that we created from the XML files.

During the process we learned an awful lot and collaborated with several BBC departments as we worked out how to make the process accurate and repeatable. As you would expect, we set a very tough and exacting technical specification for the scanning, partly to optimise the accuracy of the OCR process, but just as importantly for long-term preservation purposes. We didn't want anyone in the BBC to have to come back and pay to scan the pages again in five, ten or even thirty years' time if we could avoid it.

At the end of the trial we knew we could extract programme records from Radio Times, but the other important part of the project was to create a BBC channel and service history. There are records for when channels and services began, ended or were rebranded, just not in a single accessible place, and we quickly discovered how complicated the BBC's broadcast history is. For example, in order to work out when regional opt-outs started we needed to search Radio Times and a host of other sources at the BBC Written Archives Centre.

The picture below shows over 20 different editions of Radio Times all for the same week in 1971.

What's next for BBC Genome?

In September we will begin the full-scale project of digitising over 80 years' worth of broadcast records. That's approximately 400,000 pages of Radio Times, 3 million programmes and 300 million words to recognise through OCR.

In less than a year we expect the Radio Times digitisation project to be completed and for the first time there will be, in one place, a comprehensive record of every programme.

What you'll be able to search and discover

Initially, you will be able to search by programme title, by year, day and time. Once we fully populate the database with contributors, programme synopses and other sources of data, you'll be able to find people and places and all the programme records they feature in.
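A search by title, year, day and time amounts to filtering records on those fields. The sketch below shows the idea over a few invented records held in memory; the function name, the parameter names and the record layout are assumptions for the example and say nothing about how the Genome site itself will be built.

```python
from datetime import date, datetime, time
from typing import Dict, List, Optional

# Invented records, used only to demonstrate the filters.
RECORDS: List[Dict] = [
    {"title": "Example Programme", "start": datetime(1950, 1, 1, 16, 0)},
    {"title": "Another Example", "start": datetime(1950, 1, 1, 16, 30)},
    {"title": "Example Programme", "start": datetime(1977, 6, 5, 20, 0)},
]


def search(records: List[Dict],
           title: Optional[str] = None,
           year: Optional[int] = None,
           day: Optional[date] = None,
           at: Optional[time] = None) -> List[Dict]:
    """Return the records matching every filter that was supplied."""
    results = []
    for rec in records:
        start = rec["start"]
        # Title matching is a case-insensitive substring test.
        if title is not None and title.lower() not in rec["title"].lower():
            continue
        if year is not None and start.year != year:
            continue
        if day is not None and start.date() != day:
            continue
        if at is not None and start.time() != at:
            continue
        results.append(rec)
    return results


print(search(RECORDS, title="example programme"))
print(search(RECORDS, day=date(1950, 1, 1), at=time(16, 30)))
```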

You might well discover during your searches that the programme schedules are not entirely correct. They were, of course, correct when each issue of Radio Times was published, but in the early days of radio and television technical hitches sometimes affected the schedules. Similarly, throughout the BBC's broadcast history, changes in live broadcasts and major events at home and abroad will have meant that the published schedules in Radio Times were not always accurate.

What you'll be able to access

Radio Times is owned by the BBC's commercial arm BBC Worldwide and we currently do not have the rights to show the scanned pages themselves, although we hope we may be able to in the future. However, you will be able to take a journey back in time and rediscover how the BBC's networks and programmes reflect Britain's social history. You can already access some archive materials via the collections featured on the BBC Archive site, and Genome will provide access to additional archive information.

Although the BBC only has about 20-25% of the programmes in its physical archive, this still amounts to more than a million hours of output. Radio Times will provide the programme listing and, once that's done, we will start to provide access to the programmes themselves along with other material such as scripts or photos - which will be especially useful where physical programmes no longer exist or where we don't have the rights to make the programme itself available - and begin to make it all visible from BBC Online.

Making everything available will take time, but the Radio Times programme records will soon create the spine for Genome and are a vital first step in bringing the BBC's broadcast history to life.

One last note: Radio Times was first published on 28 September 1923 and I have referred to the foundation of the BBC in 1922. Using other sources, we do plan to make programme records available from the first ever BBC broadcast on 14 November 1922 when the Marconi transmitting station 2LO was taken over by the BBC. It truly will be a complete broadcast history of the BBC!

Helen Papadopoulos is the Project Manager of BBC Genome