Friday, March 19, 2010

Proxying and parallelizing processes

Some code just flat out refuses to run multi-threaded. Like GeckoFX. It's a great project, and I found it to be much more reliable than WebBrowser (aka IE), but it just won't run multi-threaded (or at least several other people and I haven't figured out how).

I had to write some CPU-intensive, non-interactive code involving GeckoFX, so parallelization was a must. Well, when multi-threading won't fly, multi-processing (as in launching code in a separate process instead of a separate thread) can be a viable alternative. This does complicate things a bit, since every call now has to cross a process boundary, but we can tuck that under a proxy that serializes the parameters, launches the process and then gets the return value back through a named pipe:

public class ProcessInterceptor : IInterceptor {
    ... 

    public void Intercept(IInvocation invocation) {
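        // Unique pipe name so concurrent invocations don't collide
        var pipename = Guid.NewGuid().ToString();
        // Command line for the host: target type, method name, pipe name, then the serialized arguments
        var procArgs = new List<string> {
            Quote(invocation.TargetType.AssemblyQualifiedName),
            Quote(invocation.MethodInvocationTarget.Name),
            pipename,
        };
        procArgs.AddRange(invocation.Arguments.Select(a => Serialize(a)));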
        var proc = new Process {
            StartInfo = {
                FileName = "runner.exe",
                Arguments = String.Join(" ", procArgs.ToArray()),
                UseShellExecute = false,
                CreateNoWindow = true,
            }
        };
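        using (var pipe = new NamedPipeServerStream(pipename, PipeDirection.In)) {
            proc.Start();
            // Block until the host connects, then read back the Result it sends
            pipe.WaitForConnection();
            var r = bf.Deserialize(pipe);
            // Unwrap Result.Value: the return value on success, the exception on failure
            r = r.GetType().GetProperty("Value").GetValue(r, null);
            proc.WaitForExit();
            if (proc.ExitCode == 0) {
                invocation.ReturnValue = r;
            } else {
                // A non-zero exit code means the host caught an exception and sent it back
                var ex = (Exception) r;
                throw new Exception("Error in external process", ex);
            }
        }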
    }
}
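The Quote and Serialize helpers are elided in the listing above, so what follows is only a guess at their general shape, not the original code. One way to keep arguments command-line safe is to serialize each one and Base64-encode the bytes, since Base64 text contains no spaces or quotes and therefore needs no quoting at all. ArgumentCodec and both of its methods are hypothetical names, and XmlSerializer is swapped in here purely for illustration:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper, not part of the original code.
public static class ArgumentCodec {
    // Serializes a value to XML and Base64-encodes it, so the result
    // can be passed as a single command-line argument without quoting.
    public static string Serialize(object value, Type type) {
        var serializer = new XmlSerializer(type);
        using (var buffer = new MemoryStream()) {
            serializer.Serialize(buffer, value);
            return Convert.ToBase64String(buffer.ToArray());
        }
    }

    // Reverses Serialize; the caller has to supply the expected type.
    public static object Deserialize(string text, Type type) {
        var serializer = new XmlSerializer(type);
        using (var buffer = new MemoryStream(Convert.FromBase64String(text))) {
            return serializer.Deserialize(buffer);
        }
    }
}
```

Unlike the one-argument Serialize(a) used in the interceptor, this sketch needs the type on both sides. The host could take it from method.GetParameters(), assuming the parameter types are XML-serializable.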

And that "runner.exe" thing is the host, just a console app that is responsible for deserializing parameters, calling the method, managing exceptions and then sending back the return value (if any):

public class Runner {
    ...
    public static int Main(string[] args) {
        // args: [0] target type, [1] method name, [2] pipe name, [3..] serialized arguments
        var pipename = args[2];
        using (var pipe = new NamedPipeClientStream(".", pipename, PipeDirection.Out)) {
            pipe.Connect();
            try {
                var type = Type.GetType(args[0]);
                var method = type.GetMethod(args[1]);
                var instance = Activator.CreateInstance(type);
                var parameters = args.Skip(3).Select(p => lf.Deserialize(p)).ToArray();
                var returnValue = method.Invoke(instance, parameters);
                // Success: send the return value back and exit with 0
                bf.Serialize(pipe, new Result { Value = returnValue });
                return 0;
            } catch (Exception e) {
                // Failure: send the exception back and signal it through the exit code
                bf.Serialize(pipe, new Result { Value = e });
                return 1;
            }
        }
    }
}
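To see the pipe handshake in isolation, here is a minimal sketch that plays both roles inside a single process: the server side stands in for the interceptor and a background task stands in for runner.exe. PipeRoundTrip is a hypothetical name, and plain text is sent instead of a serialized Result just to keep it short:

```csharp
using System;
using System.IO;
using System.IO.Pipes;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the original code.
public static class PipeRoundTrip {
    public static string Run(string message) {
        var pipename = Guid.NewGuid().ToString();
        using (var server = new NamedPipeServerStream(pipename, PipeDirection.In)) {
            // Stand-in for runner.exe: connect to the pipe and write the result.
            var client = Task.Run(() => {
                using (var pipe = new NamedPipeClientStream(".", pipename, PipeDirection.Out)) {
                    pipe.Connect();
                    using (var writer = new StreamWriter(pipe)) {
                        writer.Write(message);
                    }
                }
            });
            // Stand-in for the interceptor: wait for the host, then read until it closes the pipe.
            server.WaitForConnection();
            string received;
            using (var reader = new StreamReader(server)) {
                received = reader.ReadToEnd();
            }
            client.Wait();
            return received;
        }
    }
}
```

The server is created before the client starts, which mirrors the interceptor creating the pipe before calling proc.Start(), so the host always finds a pipe to connect to.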

And now we can parallelize. Here's a silly example (can't post the actual GeckoFX process, it's proprietary stuff):

public class TargetCode { 
    public virtual int Add(int a, int b) { 
        return a + b; 
    } 
}

[Test] 
public void Parallel() { 
    var generator = new ProxyGenerator(); 
    var t = generator.CreateClassProxy<TargetCode>(new ProcessInterceptor()); 
    var r = Enumerable.Range(0, 100).AsParallel().Sum(i => t.Add(i, i)); 
    Assert.AreEqual(9900, r); 
}

This will launch a separate process for each iteration. On a dual-core CPU, PLINQ (which is what AsParallel() uses, on top of the Task Parallel Library) will by default run at most two threads in parallel, so you would have at most two runner.exe instances running at the same time, thus achieving multi-process parallelism.
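The number of simultaneous runner.exe instances follows PLINQ's degree of parallelism, which can also be set explicitly with WithDegreeOfParallelism. The sketch below measures how many calls actually overlap for a given setting. BoundedParallelism is a hypothetical name, and Thread.Sleep stands in for the proxied call:

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical demo class, not part of the original code.
public static class BoundedParallelism {
    // Returns the highest number of simultaneously running calls observed.
    public static int MaxConcurrency(int items, int degree) {
        int running = 0, peak = 0;
        var gate = new object();
        Enumerable.Range(0, items)
            .AsParallel()
            .WithDegreeOfParallelism(degree)
            .ForAll(i => {
                lock (gate) { running++; if (running > peak) peak = running; }
                Thread.Sleep(10); // stands in for the call into runner.exe
                lock (gate) { running--; }
            });
        return peak;
    }
}
```

Applied to the test above, that would be something like Enumerable.Range(0, 100).AsParallel().WithDegreeOfParallelism(4).Sum(i => t.Add(i, i)), which should cap the run at four host processes at a time.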

Now this is not a general solution. It worked for my specific use case, but it has several caveats:

  • Doesn't support generic or overloaded methods (it shouldn't be hard to implement)
  • Target code must be interceptable (virtual, non-sealed, etc.)
  • Target code must have a parameterless constructor (it shouldn't be hard to lift this restriction)
  • Method parameters are passed through the command line, so they can't be very long (it shouldn't be hard to lift this restriction)
  • Target code should be practically stand-alone, since the host won't have the same app.config as its parent, nor any other previous initialization, etc.
  • The target code should be sufficiently long-running to justify the overhead of proxying, reflection, serialization and process launching.
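As a rough idea of how the overloaded-methods caveat could be lifted (a sketch under my own assumptions, not something the code above does): the proxy could send the parameter type names along with the method name, and the host could use them to pick the right overload. Type.GetMethod(name) alone throws AmbiguousMatchException when there are several overloads, while Type.GetMethod(name, Type[]) resolves a specific one. Calculator, OverloadResolution, DescribeParameters and Resolve are all hypothetical names:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical target with two overloads of the same method.
public class Calculator {
    public virtual int Add(int a, int b) { return a + b; }
    public virtual string Add(string a, string b) { return a + b; }
}

// Hypothetical helpers, not part of the original code.
public static class OverloadResolution {
    // Proxy side: describe the invoked method's signature as strings.
    public static string[] DescribeParameters(MethodInfo method) {
        return method.GetParameters()
            .Select(p => p.ParameterType.AssemblyQualifiedName)
            .ToArray();
    }

    // Host side: find the overload matching the received type names.
    public static MethodInfo Resolve(Type type, string name, string[] parameterTypeNames) {
        var parameterTypes = parameterTypeNames
            .Select(n => Type.GetType(n, true))
            .ToArray();
        return type.GetMethod(name, parameterTypes);
    }
}
```

The type names would have to travel with the other arguments, for example as one more quoted command-line argument, which makes the command-line length caveat a bit worse.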

Full code is here.
