Google Search Appliance Integration

Gossamer Threads has acquired a new toy for our data centre – well, it’s not ours, but we’re allowed to play with it! Take a look at our new Google Search Appliance!

The Google Search Appliance (or The Google Box, as we so lovingly call it) was needed for Coursework.info, a massive academic essay repository with over 148,000 documents and growing.

The Goal

Generally, when users search the site, they look for particular keywords or phrases in a document to get a more relevant and concise list of results. We wanted to take advantage of Google’s elegant scoring algorithms to generate those relevant results.

In addition, the Google algorithm can be context aware. If you wanted to search a set of Romeo and Juliet essays for the subject of “unrequited love,” the algorithm would search the blocks of text within each document and give more weight to matches in headings and bolded titles, since a match there suggests that the section is specifically about “unrequited love.” Documents with such weighted matches would score higher on the relevance scale.

[Photo: The Google Appliance, happily sitting atop its new family of Gossamer servers]

The Solution

Let Google deliver us from our search woes!

Gossamer Links already has a strong search API that lets you choose between a database-specific full-text indexing system (such as MySQL or MS SQL), an internal algorithm developed by Gossamer Threads, or a non-indexed full table scan.

[Photo: The Google Search Appliance. Why yellow? We have no clue either.]

We simply needed to add another search driver to that API so that Gossamer Links could hook into the newly acquired Google Search Appliance, essentially outsourcing its search function. We put together a search driver for the new hardware in under a week and found that it not only returned results more quickly but also greatly improved their relevance – exactly what we wanted!
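To give a feel for what “adding another search driver” means, here is a minimal sketch in Python. It is not Gossamer Links code: the `SearchDriver` interface, the `TableScanDriver` class, the `DRIVERS` registry and `load_driver` are all hypothetical names invented for illustration. The idea is only that every driver exposes the same few operations, so a new backend can be dropped in without touching the rest of the site.

```python
from abc import ABC, abstractmethod


class SearchDriver(ABC):
    """Hypothetical common interface that every search backend implements."""

    @abstractmethod
    def index(self, doc_id, text):
        """Make a document searchable."""

    @abstractmethod
    def remove(self, doc_id):
        """Stop returning a document in results."""

    @abstractmethod
    def search(self, query):
        """Return the ids of matching documents."""


class TableScanDriver(SearchDriver):
    """Non-indexed scan: checks every stored document for the query string."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, text):
        self._docs[doc_id] = text

    def remove(self, doc_id):
        self._docs.pop(doc_id, None)

    def search(self, query):
        needle = query.lower()
        return [doc_id for doc_id, text in self._docs.items()
                if needle in text.lower()]


# Hypothetical registry: a new backend is one more entry here.
DRIVERS = {"table_scan": TableScanDriver}


def load_driver(name):
    """Instantiate the driver selected by a configuration value."""
    try:
        return DRIVERS[name]()
    except KeyError:
        raise ValueError(f"unknown search driver: {name}") from None
```

Under this sketch, an appliance-backed driver would be one more class with the same three methods, registered under its own name.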

How it works

On Coursework, the new search driver hooks into Gossamer Links’ plugin system and is invoked whenever a document is added, updated or removed. Documents go through a multi-step validation process that checks document integrity, plagiarism and relevance before they become available to users on the site. The search driver knows when a document is made live (or removed from public view) and updates the Google Box accordingly using XML requests.
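As a rough illustration of the add/remove notification described above, the sketch below builds a small XML message for one document. The post does not say exactly which XML format the driver sends, so the element names here (`gsafeed`, `header`, `datasource`, `feedtype`, `group`, `record`, `content`) are assumptions modelled on Google’s feed format for the appliance, and `build_feed` and `on_status_change` are hypothetical helper names. Sending the message to the appliance is left out on purpose.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, url, action, content=None):
    """Build a one-record feed asking the appliance to add or delete a URL.

    Element and attribute names are assumptions, not taken from the post.
    """
    if action not in ("add", "delete"):
        raise ValueError(f"unsupported action: {action}")

    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "incremental"

    group = ET.SubElement(feed, "group")
    record = ET.SubElement(group, "record",
                           url=url, action=action, mimetype="text/html")
    # Only an "add" carries the document body; a "delete" just names the URL.
    if action == "add" and content is not None:
        ET.SubElement(record, "content").text = content

    return ET.tostring(feed, encoding="unicode")


def on_status_change(doc_url, is_live, body=None):
    """Hypothetical plugin hook: called when a document goes live or is pulled."""
    action = "add" if is_live else "delete"
    return build_feed("coursework", doc_url, action, body)
```

A document that has just passed validation would produce an `add` record containing its text, while one pulled from public view would produce a `delete` record with only its URL.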

[Diagram: Search Driver Workflow]

When users search, we proxy the query to the Google appliance, parse the XML response, and format the results into the standard look of the website.
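The proxy-parse-format loop above might look something like the following sketch. Everything specific in it is an assumption rather than a detail from the post: the query parameters (`q`, `start`, `num`, `output`), the response layout (`R` result elements containing `U`, `T` and `S` for URL, title and snippet), and the function names. Fetching the URL is omitted so the example stays self-contained.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_search_url(appliance_host, query, start=0, num=10):
    """Build the URL the site would request on the user's behalf (assumed parameters)."""
    params = {"q": query, "start": start, "num": num, "output": "xml_no_dtd"}
    return f"http://{appliance_host}/search?{urlencode(params)}"


def parse_results(xml_text):
    """Pull url, title and snippet out of each result element in the response."""
    root = ET.fromstring(xml_text)
    return [
        {
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
        }
        for r in root.iter("R")
    ]


def render_results(results):
    """Format parsed results as an HTML list, escaping all text for safety."""
    items = [
        '<li><a href="{}">{}</a><p>{}</p></li>'.format(
            html.escape(r["url"], quote=True),
            html.escape(r["title"]),
            html.escape(r["snippet"]),
        )
        for r in results
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"
```

In a real page, the markup produced by `render_results` would be replaced by the site’s own templates; the point is that users only ever see the site’s markup, never the appliance’s raw XML.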

[Diagram: User Interaction Workflow]

Hosted at the Gossamer Threads data centre, Coursework.info has blazing-fast search times. Have a look at the Coursework site and see how fast the search is across 148,000+ documents – that’s over 40 GB of essays and papers!

Interested?

The Google Search Appliance is a sophisticated piece of technology, but with our experience and technical know-how, we can help you set up an incredible Google-powered search function.

Feel free to contact us to discuss how we can customize a Google search solution for you.

More Info:
More on the Google Search Appliance »
More about Gossamer Host »