Wednesday, July 09, 2008

Tool of the Month

The best applications not only solve real problems, they also put a smile on your face. Here is a fantastic resignation letter generator for Yahoo.

Honorable mention: PicLens

Friday, June 27, 2008

Bill Gates's last day

Today is Bill Gates's last day at Microsoft. The New York Times has a nice article. Hate him or love him, the man has delivered solutions for people all around the world. And for dedicating his earnings, his time, and his energy to philanthropy, I for one admire him.

In a loosely related article, the Wall Street Journal pokes fun at the relationship between Microsoft, Yahoo, and Google. A delightful read.

Friday, June 20, 2008

Google speller needs a grammar lesson?


Google is amazing. Among other things, their speller does a great job. Which is why it was shocking to me that the other day, when I was comparing Google and Powerset, I found an obvious error.

My query "who did leonard marry in yeomen of the guard" got the speller to suggest "who did leonard married in yeomen of the guard". Even a third-grader knows that the suggestion is wrong. A plain old n-gram model may not catch this (bigrams almost certainly cannot, because the clashing words "did" and "married" are not adjacent), but the resulting document counts may help: the original query had 92100 results, and the suggestion only 9640.
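To make the document-count idea concrete, here is a minimal sketch of such a sanity check. It is my own illustration, not a description of how Google's speller works: the function name `keep_suggestion` and the `min_ratio` threshold are hypothetical, and the only real data are the two result counts quoted above.

```python
def keep_suggestion(original_hits: int, suggested_hits: int,
                    min_ratio: float = 1.0) -> bool:
    """Decide whether a spelling suggestion is worth showing.

    Hypothetical rule: show the suggestion only if it matches at least
    `min_ratio` times as many documents as the query the user typed.
    """
    if original_hits == 0:
        # The original query found nothing, so any suggestion with hits helps.
        return suggested_hits > 0
    return suggested_hits / original_hits >= min_ratio


# Result counts quoted in the post.
original_hits = 92100   # "who did leonard marry in yeomen of the guard"
suggested_hits = 9640   # "who did leonard married in yeomen of the guard"

print(keep_suggestion(original_hits, suggested_hits))  # False
```

With these counts the suggestion matches roughly a tenth as many documents as the original query, so the check would suppress it. A real speller would presumably combine a signal like this with many others rather than rely on it alone.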

Monday, June 16, 2008

My tiny test with Powerset


As a small, anecdotal evaluation of Powerset, I tried two queries. On both queries, Powerset did not do as well as Google, yet there are promising signs of what Powerset can do.

The first query was a variation of Ron Kaplan's example (see previous post) - to see who John McCain praised, I used the query "McCain praises". The top two snippets showed that McCain praised Bush in 2004. The other snippets are somewhat problematic - the lexical parser seems to have ignored word order: "Tom Coburn [...] praising McCain" was returned as a result, as were "McCain was promoted" and "McCain's bill". The highlighting suggests that the words "promoted" and "bill" were matched with "praises". If done right, this kind of loose matching could be a tremendous boost in recall.

In comparison, Google returned two recent instances of McCain praising someone. In June 2008, McCain praised Hillary and Jindal, the Governor of Louisiana. It appears that Google beat out Powerset, 2-1.

If you look closely at the top of Powerset's result page, you will see a section under Factz: McCain praised Tatopoulos. Now who is this Tatopoulos? Unfortunately for Powerset, this factoid was taken from an article on the film Outlander. Interesting? Yes. Relevant? Not really.

Names can be very ambiguous. "Michael Jordan" usually refers to arguably the best basketball player who ever lived; yet in the information retrieval world, that name sometimes refers to a professor at Berkeley. In the same way, "McCain" here refers to two different people. Can a system like Factz do a better job? Perhaps it could enumerate the various ambiguities in tabs? That could be quite useful.

The second query was, I thought, a bit more tricky. In Gilbert and Sullivan's play, The Yeomen of the Guard, Colonel Fairfax married Elsie, but Elsie thought Fairfax was another guy, Leonard. With such a mess, perhaps the brains in Powerset would do a better job. My query, "who did fairfax marry in yeomen of the guard", is phrased as a question, which Powerset is built to handle. Yet again, Google came up with better results. The fourth and fifth results are relevant: "Elsie agrees to be Fairfax's bride", "The disguised Fairfax discovers that it is Elsie that he has married". Powerset, on the other hand, did not give even one female character's name from the play.

That may seem a bit pathetic, but notice that Google's results do not come from Wikipedia. Since Powerset only indexed Wikipedia, it can only do as well as Wikipedia. Google did not return anything from Wikipedia for that query on the first page, so perhaps that Wikipedia article was not very helpful. Clicking through from Powerset's result, a fancy outline of the Wikipedia article showed up on the right. At the bottom of that outline tool, there is a search bar - I typed in the word "marry", and sections on the page became highlighted. That by itself is nothing new, but if you look closer, the words highlighted are not all "marry" - instead, words such as "wedding", "marriage", and "wed" are matched. That is neat.
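For the curious, here is a toy sketch of that kind of highlighting. I do not know how Powerset actually does it; this version simply assumes a hand-written table of related words (the `RELATED` dictionary and the `highlight` function are my own invention), where a real system would presumably rely on morphology and a proper lexicon.

```python
import re

# Hypothetical, hand-written table of related word forms.
RELATED = {
    "marry": {"marry", "married", "marries", "marriage", "wedding", "wed"},
}


def highlight(text: str, query_word: str) -> str:
    """Wrap every word related to `query_word` in square brackets."""
    key = query_word.lower()
    # Fall back to exact matching when the word is not in the table.
    targets = RELATED.get(key, {key})

    def mark(match: re.Match) -> str:
        word = match.group(0)
        return f"[{word}]" if word.lower() in targets else word

    # Visit each alphabetic word, leaving punctuation and spacing untouched.
    return re.sub(r"[A-Za-z]+", mark, text)


print(highlight("After the wedding, Elsie learns whom she has married.", "marry"))
# After the [wedding], Elsie learns whom she has [married].
```

The sentence being highlighted is made up for the example. The point is only that the search word "marry" never appears in it, yet two words still light up.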

Wednesday, June 11, 2008

Powerset - a recent talk

Recently I attended a talk by Powerset's chief scientist and natural language processing guru, Ron Kaplan. He explained how usage patterns in natural languages (e.g. English) make bag-of-words search engines suffer in precision and recall. For example, to search for "who Obama criticized" one may use the query words "obama criticized". In that case, a bag-of-words search engine (e.g. Google) would return documents with the phrases "Hillary criticized Obama" and "McCain criticized Obama" - the engine returns results that are irrelevant, a precision problem. In other cases, synonyms for "criticized" are not matched, so documents with "Obama rejected" are not returned - relevant results are missed, a recall problem.
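Both failure modes are easy to reproduce with a toy bag-of-words matcher. The sketch below is my own illustration, not anything shown in the talk: the helper names and the four sample sentences are made up, built around the phrases from the example above.

```python
import re


def bag_of_words(text: str) -> set[str]:
    """Lower-case the text and reduce it to an unordered set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(query: str, document: str) -> bool:
    """A document matches if it contains every query word, in any order."""
    return bag_of_words(query) <= bag_of_words(document)


query = "obama criticized"

# Hypothetical mini-corpus.
documents = [
    "Obama criticized the plan",     # relevant
    "Hillary criticized Obama",      # irrelevant: Obama is the one criticized
    "McCain criticized Obama",       # irrelevant for the same reason
    "Obama rejected the proposal",   # relevant, but uses a different verb
]

results = [doc for doc in documents if matches(query, doc)]
for doc in results:
    print(doc)
```

The matcher returns the first three sentences. Two of them have Obama on the receiving end, which is the precision problem, and the fourth sentence never shows up because "rejected" is not "criticized", which is the recall problem.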

He then went on to discuss the technologies Powerset is built on: Lexical Functional Grammar, Transfer/Glue Semantics, and finite-state morphology - technologies to learn/detect structure from free-form documents. Crowd-sourced, structured data streams are also used - Freebase and Wikipedia come to mind. By acquiring content (indexing terms with lexical/semantic information) and packing ambiguity into compact states, a new generation of software is ready to improve a wide range of tasks: summarization, question answering, translation, entity extraction... As for search, with less than 1 second in query response time, and less than 1 second per sentence in indexing time, he thinks natural language search is ready for prime time. Or at least, ready to convince investors to put in more money.

A handful of my co-workers also went to the talk. Most of them were underwhelmed. "Sure it was a good talk, but I heard it two years ago," said one. "I tried something similar to his example and it didn't work," said another. To be fair, Powerset has been over-hyped - some thought it would be a Google killer. And since it only indexes Wikipedia, it would not be far-fetched to say they under-delivered. However, in terms of bringing innovation to the table and trying to improve the search experience, I like their efforts. I won't put money on them yet, but I'll be watching.

Sunday, February 17, 2008

Guns and Democracy

As a democracy matures, she must let go of the diaper of violence. It may be true that the right to bear arms is still essential in the early days of a free State in the 21st century. Yet as the people of the State become more accustomed to the machinery of democracy, and as revolution by election becomes routine, tolerance for violence must be adjusted. In a mature democracy, where dissent is respected and encouraged in the chambers of debate, and efficient checks and balances are in place to prevent lunatic tyrants, proponents that fail to win the popular vote must not be allowed to infringe on anyone's right to life. As such, the right to bear arms by individuals would have served its purpose, and must be allowed to retire.