Sunday, June 17, 2012

Ruby Multithreading using Subprocesses

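Since the standard Ruby interpreter (MRI) cannot run Ruby code on more than one core at a time (1.8 uses green threads, and 1.9 serializes its native threads with a global interpreter lock), a quick way to achieve parallelism is to move the parallel work into a separate Ruby script. Then, in your main Ruby program, you create threads that each invoke that script as a subprocess (by calling Kernel#system or IO.popen). The threads spend their time waiting on the subprocesses, and the operating system is free to schedule those subprocesses on different cores.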

Of course, this strategy only makes sense in a few cases, namely when the work splits into independent tasks. In mine, I merely wanted to run the same piece of code on a massive number of files, with no communication needed between the tasks.

For example:

# Path to the separate Ruby script (it must be executable)
PROCESS_FILE_BIN = "/home/me/bin/process_file.rb"

# Get files to process
files = get_list_of_files

# Setup mutex
mutex = Mutex.new

# Start num_processes worker threads; each runs one subprocess at a time
threads = []
num_processes.times {
  threads << Thread.new {
    loop do
      # Get file
      file = nil
      mutex.synchronize do
        file = files.pop unless files.empty?
      end

      # Break if done
      break if file.nil?

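      # Process file (arguments are passed separately so the shell never
      # interprets spaces or special characters in the file name)
      system(PROCESS_FILE_BIN, file)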
    end
  }
}

# Wait for threads to finish
threads.each(&:join)
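
If you also need the output of each subprocess, a variation on the same idea is sketched below. It is only a sketch under a few assumptions: it swaps the Mutex-guarded array for Ruby's thread-safe Queue, uses IO.popen instead of Kernel#system so the parent can read what the child prints, and stands in for process_file.rb with a hypothetical one-line child program (CHILD_CODE) that upper-cases its argument. The names run_in_subprocesses and CHILD_CODE are made up for illustration.

```ruby
require "thread"    # Queue (loaded by default on newer Rubies)
require "rbconfig"  # RbConfig.ruby: path to the running interpreter

# Hypothetical stand-in for process_file.rb: prints its argument upper-cased.
CHILD_CODE = 'puts ARGV[0].upcase'

# Runs one subprocess per item, at most num_workers at a time, and
# returns a hash mapping each item to the text its subprocess printed.
def run_in_subprocesses(items, num_workers)
  # Queue is thread-safe, so no explicit mutex is needed to hand out work
  queue = Queue.new
  items.each { |item| queue << item }

  results = {}
  results_mutex = Mutex.new

  workers = Array.new(num_workers) do
    Thread.new do
      loop do
        # Non-blocking pop raises ThreadError once the queue is empty
        item = begin
          queue.pop(true)
        rescue ThreadError
          break
        end

        # The array form of IO.popen bypasses the shell, so file names
        # containing spaces are passed through unchanged
        output = IO.popen([RbConfig.ruby, "-e", CHILD_CODE, item]) { |io| io.read }

        # The results hash is shared between threads, so guard writes to it
        results_mutex.synchronize { results[item] = output.chomp }
      end
    end
  end

  # Wait for all workers to drain the queue
  workers.each(&:join)
  results
end
```

Calling run_in_subprocesses(["a.txt", "b c.txt"], 2) should return a hash mapping each name to the line printed by its subprocess. To run a real script instead, replace the "-e", CHILD_CODE pair with the path to that script.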