Wednesday, April 14, 2010

Which term?

Sometimes I can't decide which of two terms to use. For example, is it multitouch or multi-touch? Potato or potato? Okay, the last example is a joke, but you know what I mean...

Google can help! Just search for both terms on Google and see which query returns more results. There are two catches:
  1. The first is fairly obvious: you need to put quotes around each term you search for. Otherwise, you won't be searching for exact matches.
  2. Google often rewrites your query and shows the results for the rewritten query instead of your original. To prevent this, add nfpr=1 to the search URL.
With those two fixes, voilà! Here is a Ruby script that compares two terms:
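#!/usr/bin/ruby

require 'net/http'
require 'uri'
require 'cgi'

# Matches the result count on Google's results page as it looked in 2010,
# i.e. "Results 1 - 10 of about N for ...". Adjust it if the markup changes.
NUM_RESULTS_REGEX = %r{<p id=resultStats>&nbsp;Results <b>\d+</b> - <b>\d+</b> of about <b>([\d,]+)</b> for <b>}
# nfpr=1 stops Google from rewriting the query.
SEARCH_URL = "http://www.google.com/search?nfpr=1&q="

if ARGV.size != 2
  $stderr.puts "Usage: ./which_usage.rb \"term 1\" \"term 2\""
  exit(1)
end

ARGV.each do |term|
  # Wrap the term in quotes so Google searches for the exact phrase.
  page = Net::HTTP.get(URI.parse(SEARCH_URL + CGI.escape(%Q{"#{term}"})))
  # String#[] returns nil instead of raising when the regex does not match.
  count = page[NUM_RESULTS_REGEX, 1]
  puts %Q{"#{term}": #{count || 'unknown number of'} results.}
end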
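The script only prints the two counts and leaves the comparison to you. The helpers below are a minimal sketch of how the URL construction could be factored out and the winner picked automatically. The names search_url and pick_term are hypothetical, and pick_term assumes the counts are strings in the comma-separated format that the regex above captures.

```ruby
require 'cgi'

# Builds an exact-phrase search URL: the term is wrapped in quotes and
# nfpr=1 asks Google not to rewrite the query.
def search_url(term)
  'http://www.google.com/search?nfpr=1&q=' + CGI.escape(%Q{"#{term}"})
end

# Hypothetical helper: given a hash of term => count string (e.g. "1,200"),
# strips the commas, compares the counts numerically and returns the term
# with the most results. On a tie, the first term wins.
def pick_term(counts)
  counts.max_by { |_term, count| count.delete(',').to_i }.first
end
```

For example, with made-up counts, pick_term('multitouch' => '1,200', 'multi-touch' => '950') returns "multitouch". Removing the commas before calling to_i matters: '1,200'.to_i would stop at the comma and return 1.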
Enjoy!

Wednesday, April 7, 2010

ForeCite JCDL 2010 Poster

Following up on my previous post about ForeCiteNote: a poster paper on ForeCite will be presented at this year's Joint Conference on Digital Libraries (JCDL 2010):

Thuy Dung Nguyen, Min-Yen Kan, Dinh-Trung Dang, Markus Hänse, Ching Hoi Andy Hong, Minh-Thang Luong, Jesse Prabawa Gozali, Kazunari Sugiyama and Yee Fan Tan (2010). ForeCite: towards a reader-centric scholarly digital library. To appear in the Joint Conference on Digital Libraries (JCDL '10). Gold Coast, Australia, June. Poster.
The poster should be available soon via the ACM Digital Library or the WING publications page.

Tuesday, April 6, 2010

Matlab: Clear and Rehash

I'm having some problems with the new single-session Matlab solution I'm using for my experiments. For some reason, errors pop up seemingly at random: variables that should be defined are not found by Matlab.

At first, I thought it was a timing issue, so I added some pause(n) statements in a few places. This helped a bit but did not eliminate the problem. I then suspected stale variable definitions, so now I clear my variables at the start of every iteration (I probably should have done this to begin with). Unfortunately, this did not solve the problem either.

Then I realized that it could be a problem with the path, so now I also issue a rehash path command (which makes Matlab re-scan the folders on its search path for new or changed files).
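Since the Ruby script drives a single Matlab session, this clean-up can simply be prepended to every command it sends. A minimal sketch, assuming each iteration is started by sending one line of Matlab code; iteration_command and run_experiment are hypothetical names. Note that a bare clear removes workspace variables only.

```ruby
# Hypothetical helper: builds the single line of Matlab code sent at the
# start of each iteration. `script` is whatever .m entry point the
# experiment uses (assumed here to take no arguments).
def iteration_command(script, pause_secs: 0)
  parts = ['clear', 'rehash path'] # drop old variables, refresh the path cache
  parts << "pause(#{pause_secs})" if pause_secs > 0 # optional delay
  parts << script
  parts.join('; ') + ';' # trailing semicolon suppresses echoed output
end
```

For example, iteration_command('run_experiment', pause_secs: 2) produces "clear; rehash path; pause(2); run_experiment;".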

That should do it.

Monday, April 5, 2010

Matlab @ Angsana

Today is about a week before the ACM MM 2010 deadline, and I was having problems running Matlab on angsana.comp. The issue lies with the Matlab FLEXnet license manager. Well, not really. Apparently, a lot of people need Matlab right now, so the maximum number of concurrent Matlab users has been reached, and there is nothing the SoC helpdesk (tech services) can do about it (yes, I've emailed them a few times).

My current experimental setup (read: predicament) is this: my database is hosted on one server, but the experiments require Matlab and have to be run on angsana. My previous solution was a Ruby script that transferred the required data from the database server to angsana via scp, remotely ran my .m files on the data, and then copied the results back via scp for parsing. Because it was a quick hack, each call (inefficiently) opened a new SSH connection and hence a new Matlab session. As a result, a free Matlab license had to be available for every single call to succeed.

Because of the licensing issue, I had to rework the script to use a single SSH connection, and hence a single Matlab session. The rework took an hour or two. Just in case, I also added a safeguard that pauses program execution on any runtime exception. This way, I can start a Matlab session immediately after my program exits (relinquishing its hold on one of the Matlab slots).
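The single-session idea can be sketched in Ruby as a wrapper around one long-lived process. This is a minimal sketch under stated assumptions, not the actual script: the PersistentSession class, the run_all helper, the sentinel string and the ssh/matlab command line in the comments are all hypothetical. Each command is followed by one that prints a sentinel, so the reader knows where that command's output ends; on an exception, the safeguard waits for Enter before closing the session.

```ruby
require 'open3'

# Hypothetical wrapper around one long-lived interpreter process, e.g. Matlab
# started once over a single SSH connection. An assumed Matlab setup might be:
#   PersistentSession.new(['ssh', 'angsana', 'matlab -nodisplay'],
#                         "disp('__DONE__')")
class PersistentSession
  SENTINEL = '__DONE__'

  # cmd:      argv that starts the interpreter
  # done_cmd: a command that makes the interpreter print SENTINEL
  def initialize(cmd, done_cmd)
    @stdin, @stdout, @wait_thr = Open3.popen2(*cmd)
    @done_cmd = done_cmd
  end

  # Sends one command and returns the output printed before the sentinel.
  def run(command)
    @stdin.puts(command)
    @stdin.puts(@done_cmd)
    @stdin.flush
    lines = []
    while (line = @stdout.gets)
      # end_with? tolerates a prompt (such as ">> ") printed before the sentinel.
      return lines.join if line.chomp.end_with?(SENTINEL)
      lines << line
    end
    raise IOError, 'session ended before the command finished'
  end

  # Closes stdin so the interpreter exits, then waits for the process.
  def close
    @stdin.close unless @stdin.closed?
    @wait_thr.value
  end
end

# Safeguard: on any runtime error, pause until Enter is pressed, then release
# the session (and, for Matlab, its license slot) by closing it.
def run_all(session, commands, input: $stdin)
  commands.map { |c| session.run(c) }
rescue StandardError => e
  warn "Error: #{e.message}. Press Enter to release the session."
  input.gets
  raise
ensure
  session.close
end
```

Any line-oriented interpreter can stand in for Matlab when trying this out; for example, PersistentSession.new(['sh'], 'echo __DONE__') runs shell commands in one persistent sh process.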

In summary: