Jesse's Research: June 2012

Sunday, June 17, 2012

Ruby Multithreading using Subprocesses

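Since the standard Ruby interpreter (MRI) cannot run Ruby threads in parallel (1.8 uses green threads, and 1.9 uses OS threads but holds a global interpreter lock, so only one thread executes Ruby code at a time), a quick way to achieve parallelism is to move the parallelizable code into a separate Ruby script. Then, in your main Ruby program, you create threads that each invoke that script as a subprocess (by calling Kernel#system or IO.popen). The subprocesses are ordinary OS processes, so they can run on multiple cores at once, and a thread that is waiting for its child process does not block the other threads.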
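A quick way to check that the subprocesses really overlap, sketched under the assumption of a Unix-like system with a `sleep` command on the PATH: four threads each run a one-second child process, and the whole batch should take about one second rather than four.

```ruby
# Assumes a POSIX `sleep` executable is on the PATH.
start = Time.now

# Each thread blocks in Kernel#system until its child process exits;
# the other threads keep running (and spawning) in the meantime.
threads = 4.times.map do
  Thread.new { system("sleep", "1") }
end
threads.each(&:join)

elapsed = Time.now - start
puts format("4 one-second subprocesses took %.2f s", elapsed)
```

On a typical machine this should report a little over one second. Passing the command and its argument separately to `system` runs `sleep` directly, without going through a shell.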

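Of course, this strategy only makes sense in a narrow set of cases: the tasks must be independent, because the subprocesses share no memory with the parent or with each other. In my case, I merely wanted to run the same piece of code on a massive number of files, with no communication needed between them.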

For example:

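```ruby
require 'thread'  # Mutex (needed explicitly on Ruby 1.8)

# Define the separate ruby code (must be executable, with a "#!/usr/bin/env ruby" line)
PROCESS_FILE_BIN = "/home/me/bin/process_file.rb"

# Number of subprocesses to run at once, e.g. the number of CPU cores
num_processes = 4

# Get files to process (get_list_of_files is your own method returning an Array of paths)
files = get_list_of_files

# Setup mutex guarding the shared files array
mutex = Mutex.new

# Run up to num_processes worker threads
threads = []
num_processes.times do
  threads << Thread.new do
    loop do
      # Get the next file; Array#pop returns nil once the array is empty
      file = mutex.synchronize { files.pop }

      # Break if done
      break if file.nil?

      # Process file in a subprocess. Passing the path as a separate argument
      # bypasses the shell, so spaces or quotes in file names are safe.
      warn "Failed to process #{file}" unless system(PROCESS_FILE_BIN, file)
    end
  end
end

# Wait for threads to finish
threads.each(&:join)
```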
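A variation on the same pattern, sketched under a few assumptions: it swaps the Mutex-guarded Array for the standard library's thread-safe Queue, and uses IO.popen instead of Kernel#system so the parent can collect each child's standard output. The child command (`CHILD_ARGS`, a one-line `ruby -e` script that upper-cases its argument) and the helper `process_all` are hypothetical stand-ins for process_file.rb, chosen so the example runs on its own. It needs Ruby 1.9.2 or later for the array form of IO.popen and for `RbConfig.ruby`.

```ruby
require 'thread'   # Queue (needed explicitly on Ruby 1.8)
require 'rbconfig' # RbConfig.ruby: path of the running ruby binary

# Hypothetical stand-in for process_file.rb: a child script that prints
# its argument in upper case.
CHILD_ARGS = [RbConfig.ruby, "-e", "puts ARGV[0].upcase"]

def process_all(files, num_processes)
  queue = Queue.new
  files.each { |f| queue << f }

  results = {}
  results_lock = Mutex.new

  workers = num_processes.times.map do
    Thread.new do
      loop do
        begin
          file = queue.pop(true) # non-blocking pop
        rescue ThreadError
          break # queue drained: no work left
        end

        # The array form runs the command without a shell; the block
        # reads everything the child writes to stdout.
        output = IO.popen(CHILD_ARGS + [file]) { |io| io.read }
        results_lock.synchronize { results[file] = output.chomp }
      end
    end
  end

  workers.each(&:join)
  results
end

results = process_all(%w[a.txt b.txt c.txt d.txt e.txt], 2)
results.sort.each { |file, out| puts "#{file} -> #{out}" }
```

`Queue#pop(true)` raises ThreadError when the queue is empty, which is how each worker notices there is nothing left to do. Since every file is enqueued before the workers start, an empty queue really does mean the work is finished.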