Monday, June 16, 2008

My tiny test with Powerset


As a small, anecdotal evaluation of Powerset, I tried two queries. On both, Powerset did not do as well as Google, yet there are promising signs of what Powerset can do.

The first query was a variation on Ron Kaplan's example (see previous post). To see whom John McCain had praised, I used the query "McCain praises". The top two snippets showed that McCain praised Bush in 2004. The other snippets are more problematic. The lexical parser seems to have ignored word order: "Tom Coburn [...] praising McCain" was returned as a result, as were "McCain was promoted" and "McCain's bill". The highlighting suggests that the words "promoted" and "bill" were matched with "praises". If done right, this kind of loose matching could be a tremendous boost in recall.
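To make the word-order point concrete, here is a toy sketch in Python. It is not how Powerset works; the function names and the two snippets are made up for illustration. It only contrasts a matcher that ignores word order with one that respects it.

```python
def bag_of_words_match(query, snippet):
    """True if every query word appears somewhere in the snippet, in any order."""
    snippet_words = set(snippet.lower().split())
    return all(word in snippet_words for word in query.lower().split())


def ordered_match(query, snippet):
    """True if the query words appear in the snippet in the same relative order."""
    # `word in remaining` consumes the iterator up to the first hit, so each
    # query word has to be found after the previous one.
    remaining = iter(snippet.lower().split())
    return all(word in remaining for word in query.lower().split())


# Hypothetical snippets, simplified so that no stemming is needed.
query = "McCain praises"
for snippet in ("McCain praises Bush", "Coburn praises McCain"):
    print(snippet, bag_of_words_match(query, snippet), ordered_match(query, snippet))
```

The order-blind matcher accepts both snippets, while the order-aware one rejects the snippet where McCain is the one being praised. A real system would need proper parsing to tell subject from object; this sketch only shows why dropping word order lets such results through.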

In comparison, Google returned two recent instances of McCain praising someone. In June 2008, McCain praised Hillary and Jindal, the Governor of Louisiana. It appears that Google beat out Powerset, 2-1.

If you look closely at the top section of Powerset's result page, you will see a section under Factz: McCain praised Tatopoulos. Now who is this Tatopoulos? Unfortunately for Powerset, this factoid was taken from an article on the film Outlander. Interesting? Yes. Relevant? Not really.

Names can be very ambiguous. "Michael Jordan" usually refers to arguably the best basketball player who ever lived; yet in the information retrieval world, that name sometimes refers to a professor at Berkeley. In the same way, "McCain" here refers to two different people. Can a system like Factz do a better job? Perhaps it could list the possible senses of an ambiguous name in separate tabs? That could be quite useful.

The second query was, I thought, a bit trickier. In Gilbert and Sullivan's play, The Yeomen of the Guard, Colonel Fairfax married Elsie, but Elsie thought Fairfax was another guy, Leonard. With such a mess, I thought the brains in Powerset might do a better job. My query, "who did fairfax marry in yeomen of the guard", is phrased as a question, which Powerset is built to handle. But yet again, Google came up with better results. Its fourth and fifth results are relevant: "Elsie agrees to be Fairfax's bride" and "The disguised Fairfax discovers that it is Elsie that he has married". Powerset, on the other hand, did not give even one female character's name from the play.

That may seem a bit pathetic, but notice that Google's results do not come from Wikipedia. Since Powerset only indexed Wikipedia, it can only do as well as Wikipedia. Google did not return anything from Wikipedia for that query on the first page, so perhaps that Wikipedia article was not very helpful. When I clicked through from Powerset's result, a fancy outline of the Wikipedia article showed up on the right. At the bottom of that outline tool there is a search bar - I typed in the word "marry", and sections of the page became highlighted. That by itself is nothing new, but if you look closer, the highlighted words are not all "marry" - instead, words such as "wedding", "marriage", and "wed" are matched. That is neat.
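For the curious, the highlighting behavior can be imitated with a very small Python sketch. I have no idea how Powerset actually does it; the hard-coded word family below (including "marries" and "married", which I added myself) and the `highlight` function are assumptions for illustration only.

```python
import re

# Hypothetical, hand-written word family. A real system would presumably get
# this from a lexicon or morphological analysis, not a hard-coded table.
WORD_FAMILIES = {
    "marry": {"marry", "marries", "married", "marriage", "wed", "wedding"},
}


def highlight(term, text):
    """Wrap every word related to `term` in square brackets."""
    # Fall back to exact matching when the term has no known family.
    family = WORD_FAMILIES.get(term.lower(), {term.lower()})

    def mark(match):
        word = match.group(0)
        return f"[{word}]" if word.lower() in family else word

    return re.sub(r"[A-Za-z']+", mark, text)


print(highlight("marry", "The wedding took place; Elsie had married a stranger."))
```

Searching for "marry" brackets "wedding" and "married" even though neither is a literal match, which is roughly what the outline tool appeared to do.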

2 comments:

Anonymous said...

Cool post! I just wanted to note the fact that 'McCain praises' is really ambiguous (praise is also a noun), and that it can truly mean praises given to or given by McCain - so in that sense Powerset is trying to give you a little of both worlds in matching correctly either case.

Lorenzo
thione@powerset.com

Mark Johnson said...

Another follow up. "Whom did mccain praise?" gets the results you want. Also, for a fair comparison against Google, you should do a site restricted search (site:en.wikipedia.org)

{mark} powerset product manager