Topodia
Topodia is a search engine that allows users to specify a few example documents as a query in order to find additional similar documents. It works by extracting terms from the example documents. Those terms are used to create “topic profiles”, which can then be used as queries in any search engine for finding similar documents. Our novel technical contribution is how the new queries are generated from the extracted terms. Topodia also stores documents and topic profiles generated by different users, thus supporting sharing of search results.
We built Topodia for users who want to build a collection of documents on a given topic, for example web pages, research papers, or patents on a topic of interest, be it Dalmatian dog breeders, mutations relevant to autism, micro-mechanics, or treatment of multiple myeloma. Most current search technology assumes that a user is searching for a single document rather than trying to build a collection of documents. This assumption leads to a particular user interface and search strategy, both very different from the ones we employ. Search engines such as Google, Yahoo!, Lexis, or Entrez-pubmed take user queries, give responses, and let users alter their queries repeatedly until they find the document they want, but they do not explicitly support finding and browsing collections of documents organized by topic.
Many tasks involve collecting multiple web pages or other documents on a topic. A central activity in that process is what we call the “find more” action: given the current set of documents on a given topic, find more similar documents. This activity is common. When looking for a hotel in Paris, one might want to find more similar hotels; when deciding which papers to cite while writing a paper, one wants to find more similar papers (we use Topodia for this ourselves, simply by feeding it the paper we are writing); when looking for a consultant to help solve a problem, one wants to find more consultants similar to those on one's current list. Similarly, one often wants to choose among different software programs that perform some task, such as PDF-to-text conversion; finding more reviews and blogs about software with a given functionality can be highly useful.
Furthermore, one would hope that the work one person has done in collecting documents relevant to a given topic would be “published” and shared with others. This is sometimes the case, as with references in scientific publications or in Wikipedia, or when people add tags to documents in systems such as Technorati (technorati.com) or Flickr (www.flickr.com), but it is often not done for the results of web searches, as publishing the results generally requires additional effort. Topodia eliminates this additional effort; it is set up so that the collections of documents (found by repeated applications of “find more” on a topic) are available for search and hence reuse by others.[1]
Topodia is a tool for semi-automated gathering and sharing of sets of documents on a topic. Key to Topodia’s performance is intelligent use of the fact that its primary action is to find more documents similar to the current set. Topodia lets users specify documents as a query, and automatically generates and executes search queries that return a set of documents judged most likely to be on the same topic as those in the query. Topodia can then iterate based on user feedback, either on the query terms used or on the documents returned. By using modern term extraction techniques and contrasting the terms in the query documents with those in random web documents, we generate queries that are accurately targeted to the topic of interest. The result of a user interaction is both a set of query terms and a set of documents on the desired topic, which can then be stored for future use or shared with other users.
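The idea of contrasting query-document terms with a background of random documents can be illustrated with a minimal sketch. This is not Topodia's actual query-generation method, which is not detailed here; the scoring rule (a smoothed log-ratio of document frequencies), the OR-joined query syntax, and all function names (tokenize, topic_profile, build_query) are assumptions made only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def doc_frequencies(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def topic_profile(example_docs, background_docs, top_k=5):
    """Rank terms that are frequent in the examples but rare in the background.

    Assumed scoring rule: log of the ratio between the add-one-smoothed
    document frequency in the examples and in the background sample.
    """
    fg = doc_frequencies(example_docs)
    bg = doc_frequencies(background_docs)
    n_fg, n_bg = len(example_docs), len(background_docs)
    scores = {}
    for term, count in fg.items():
        p_fg = (count + 1) / (n_fg + 2)
        p_bg = (bg[term] + 1) / (n_bg + 2)
        scores[term] = math.log(p_fg / p_bg)
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def build_query(profile, rejected=()):
    """Join profile terms into a query string, skipping terms the user rejected.

    The OR syntax is a placeholder; a real search engine may expect another form.
    """
    return " OR ".join(term for term, _ in profile if term not in rejected)


examples = [
    "Dalmatian breeders raise dalmatian puppies",
    "Find dalmatian breeders near the city",
]
background = [
    "the city council met near the river",
    "find the weather report",
    "puppies and kittens are pets",
]

profile = topic_profile(examples, background, top_k=3)
query = build_query(profile)
refined = build_query(profile, rejected={"raise"})
```

In this toy data, terms such as "the" and "city" also occur in the background sample and therefore score low, while terms specific to the example documents rise to the top. The rejected argument mimics one round of user feedback on the query terms; feedback on returned documents could be modeled by adding accepted documents to the example set and recomputing the profile.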
[1] Topodia allows access control, so that one can also build collections that are not publicly available.
