<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/blogs/shared/nolsol.xsl"?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>

<title>BBC Internet Blog - Mark Neves</title>
<link>https://bbcbreakingnews.pages.dev/blogs/bbcinternet/</link>
<description>Staff from the BBC&apos;s online and technology teams talk about BBC Online, BBC iPlayer, and the BBC&apos;s digital and mobile services. The blog is reactively moderated. Posts are normally closed for comment after three months. Your host is Eliza Kessler. </description>
<language>en</language>
<copyright>Copyright 2012</copyright>
<lastBuildDate>Mon, 22 Nov 2010 15:15:03 +0000</lastBuildDate>
<generator>http://www.sixapart.com/movabletype/?v=4.33-en</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 


<item>
	<title>Trialling search on message boards: technical details</title>
	<description><![CDATA[<p><a href="https://bbcbreakingnews.pages.dev/blogs/bbcinternet/2010/11/trialing_search_on_bbc_message.html">Messageboard search (as mentioned in David's post)</a> has been implemented using the full-text search engine that ships with <a href="http://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a> 2005.  We have found this engine to be very efficient and reliable, and a massive improvement over the full-text offering in SQL Server 2000.  We currently use this engine to provide article search on <a href="https://bbcbreakingnews.pages.dev/dna/h2g2/">H2G2</a> and <a href="https://bbcbreakingnews.pages.dev/dna/memoryshare/">Memoryshare</a>.</p>

<p>We were unable to support messageboard search across all services in the same way we support article search because of the huge number of posts in our system, so we had to tackle the problem in a different way.</p>

<p>The messageboard search system uses a dedicated "Search" database.  Resource constraints mean that this new database will initially live on the same server as the main messageboard database, but the architecture allows us to move it easily to one or more separate servers as we bring search to more messageboards in the future.</p>

<p>One of the design goals was to make it easy to bring search to messageboards one by one, so that we can provide fast, effective search without compromising the quality of existing services.</p>
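<p>The post does not describe how the Search database picks up new posts, so the following is only a minimal Python sketch under stated assumptions: each post carries an increasing ID, the set of search-enabled boards is a simple configuration value, and the Search database remembers the highest ID it has already examined (a "high-water mark"). The names <code>Post</code>, <code>SearchDatabase</code> and <code>sync</code> are hypothetical, and in-memory lists stand in for the real per-board tables.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    post_id: int  # assumed: increasing ID assigned by the main messageboard database
    board: str    # name of the messageboard the post belongs to
    body: str


@dataclass
class SearchDatabase:
    # Hypothetical config: boards that have been switched on for search.
    enabled_boards: set[str]
    # One "table" per board: board name -> posts copied for that board.
    tables: dict[str, list[Post]] = field(default_factory=dict)
    # Highest post ID already examined, so each run only looks at new posts.
    high_water_mark: int = 0

    def sync(self, main_posts: list[Post]) -> int:
        """Copy posts newer than the high-water mark, for enabled boards only."""
        copied = 0
        for post in main_posts:
            if post.post_id <= self.high_water_mark:
                continue  # already handled on an earlier run
            if post.board in self.enabled_boards:
                self.tables.setdefault(post.board, []).append(post)
                copied += 1
        # Advance the mark past every post examined, including other boards' posts.
        if main_posts:
            newest = max(p.post_id for p in main_posts)
            self.high_water_mark = max(self.high_water_mark, newest)
        return copied
```

<p>Because the mark also advances past posts from boards that are not enabled, a board switched on later would need a separate one-off backfill of its older posts; the sketch leaves that step out.</p>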

<div class="imgCaptionLeft" style="float: left; ">
<img alt="Diagram explaining how search works on the BBC message boards" src="https://bbcbreakingnews.pages.dev/blogs/bbcinternet/img/search_messageboards_diagram.jpg" width="595" height="626" class="mt-image-left" style="margin: 0 20px 5px 0;" /></div>

<p>The diagram above shows the two databases.  The messageboard database stores all the posts across all messageboards in a single table.  The Search database fetches the latest posts belonging to the messageboards that have been configured to support search, and stores the posts for each messageboard in a dedicated table.  We create a separate full-text index on each of these tables, which lets us gather search results efficiently on a per-messageboard basis without having to filter the results that come back from the engine.  We use the ranking values the engine provides to order the results by relevance.</p>
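<p>To illustrate why one index per board avoids filtering, here is a toy in-memory model in Python. It is not how SQL Server stores or ranks full-text data: the engine computes its own rank values, whereas this sketch simply ranks a post by how many times the query words occur in it. <code>BoardIndex</code>, <code>search_board</code> and the counting-based rank are all hypothetical.</p>

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed word breaker: lower-cased runs of word characters.
    return re.findall(r"\w+", text.lower())


class BoardIndex:
    """A toy full-text index over one board's posts."""

    def __init__(self) -> None:
        # word -> {post_id: number of times the word occurs in that post}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)

    def add(self, post_id: int, body: str) -> None:
        for word, count in Counter(tokenize(body)).items():
            self.postings[word][post_id] = count

    def search(self, words: list[str]) -> list[tuple[int, int]]:
        """Return (post_id, rank) for posts containing every word, best first."""
        if not words:
            return []
        lists = [self.postings.get(w.lower(), {}) for w in words]
        # AND semantics: keep only posts that appear in every posting list.
        hits = set(lists[0]).intersection(*lists[1:])
        # Toy relevance: total occurrences of the query words in the post.
        ranked = [(pid, sum(p[pid] for p in lists)) for pid in hits]
        return sorted(ranked, key=lambda item: (-item[1], item[0]))


# One index per board, so a query only ever touches that board's index.
indexes: dict[str, BoardIndex] = defaultdict(BoardIndex)


def search_board(board: str, words: list[str]) -> list[tuple[int, int]]:
    index = indexes.get(board)  # .get does not create an empty index
    return index.search(words) if index else []
```

<p>A query for one board never sees posts from another board's index, so there is nothing to filter out after the engine returns its results.</p>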

<p>To keep searches as fast and as light on server resources as possible, we've adopted a simple "AND"-based search.  We take the search term the user gives us, split it into a list of search words using the space character as the separator, remove any stop words (common words such as "the" and "and" that are of no use for searching), and then ask the search engine to find all the posts that contain every remaining word.</p>
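<p>The term-processing steps above can be sketched in Python as follows. The stop-word list is a short hypothetical one (the engine also has its own built-in list of ignored words), and the output format assumes the condition is handed to T-SQL's <code>CONTAINS</code> predicate, where double-quoted terms joined by <code>AND</code> require every term to be present. The function names are illustrative only.</p>

```python
# Hypothetical stop-word list, for illustration only.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def search_words(term: str, stop_words: frozenset[str] = STOP_WORDS) -> list[str]:
    """Split the user's term on spaces and drop stop words (case-insensitively)."""
    # split(" ") yields empty strings for repeated spaces, so skip those as well.
    return [w for w in term.split(" ") if w and w.lower() not in stop_words]


def contains_condition(words: list[str]) -> str:
    """Build an AND-only search condition such as '"Fish" AND "chips"'."""
    # Double quotes delimit each term, so remove any the user typed.
    cleaned = [w.replace('"', "") for w in words]
    return " AND ".join('"' + w + '"' for w in cleaned if w)


print(search_words("Fish and chips"))                        # ['Fish', 'chips']
print(contains_condition(search_words("Fish and chips")))    # "Fish" AND "chips"
```

<p>Supplying the resulting string as a query parameter, for example <code>WHERE CONTAINS(Body, @condition)</code> with a hypothetical <code>Body</code> column, keeps the user's text out of the SQL statement itself.</p>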

<p>For example, if the term "Fish and chips" was passed in, splitting it on the space character would yield "Fish", "and" and "chips".  Removing the stop words would then drop "and", leaving "Fish" and "chips", and we would ask the engine to find all posts that contain both "Fish" and "chips".  Incidentally, all searches are case-insensitive.</p>
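<p>Case-insensitive matching of whole words can be illustrated with a brute-force check in Python. This is a reference sketch rather than a description of the engine: SQL Server uses language-specific word breakers, whereas the sketch assumes a word is simply a run of letters, digits or underscores. The function name <code>post_matches_all</code> and the sample posts are hypothetical.</p>

```python
import re


def post_matches_all(body: str, words: list[str]) -> bool:
    """True if every search word appears as a whole word in body, ignoring case."""
    # Assumed word breaker: runs of word characters, case-folded for comparison.
    post_words = {token.casefold() for token in re.findall(r"\w+", body)}
    return all(word.casefold() in post_words for word in words)


posts = {
    1: "FISH and Chips on a Friday",
    2: "Went fishing, then had chips",
    3: "Chips with everything",
}
matches = [pid for pid, body in posts.items() if post_matches_all(body, ["Fish", "chips"])]
print(matches)  # [1]
```

<p>Only post 1 matches: post 2 contains "fishing" but not the whole word "fish", and post 3 has no "fish" at all. An empty word list (a term made up entirely of stop words) would match every post, so a caller would want to reject it first.</p>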

<p>The number of months' worth of posts that are searchable is also configurable on a per-messageboard basis.  Again, this lets us control the amount of server resources that messageboard search requires.  The number of months of searchable content will be decided according to how busy the board is and the nature of the board: some boards may be more interested in current posts, whereas others may have more historical content that remains valuable as time passes.  Ultimately, we intend to make all posts on all messageboards searchable.</p>
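<p>A per-board retention window might be applied along these lines. This Python sketch assumes the setting is stored as a whole number of months per board, and that a post stays searchable while it is no older than the same day that many months ago (clamped to the last day of a shorter month). The board names, month values and function names are hypothetical.</p>

```python
import calendar
from datetime import datetime

# Hypothetical per-board settings: months of posts kept searchable.
SEARCHABLE_MONTHS = {"archers": 3, "history": 24}


def months_ago(now: datetime, months: int) -> datetime:
    """Return the same day-of-month `months` months before `now`, clamped to month end."""
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # e.g. 31 May minus 3 months becomes 28 Feb in a non-leap year.
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day)


def is_searchable(board: str, posted: datetime, now: datetime) -> bool:
    months = SEARCHABLE_MONTHS.get(board)
    if months is None:
        return False  # board not configured for search
    return posted >= months_ago(now, months)
```

<p>Keeping the window as a simple per-board number means a busy board can be given a short window while a board with long-lived content gets a longer one, without changing any code.</p>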

<p><em>Mark Neves is lead database engineer, DNA team, Audience Publishing Services, Programmes and On-Demand, BBC Future Media & Technology</em> <br />
</p>]]></description>
	<dc:creator>Mark Neves</dc:creator>
	<link>https://bbcbreakingnews.pages.dev/blogs/bbcinternet/2010/11/trailing_search_on_message_boa.html</link>
	<guid>https://bbcbreakingnews.pages.dev/blogs/bbcinternet/2010/11/trailing_search_on_message_boa.html</guid>
	<category>innovation</category>
	<pubDate>Mon, 22 Nov 2010 15:15:03 +0000</pubDate>
</item>


</channel>
</rss>

 
