Thursday, February 10, 2011

Running Solr on .NET

Many people are reluctant to use Solr in .NET applications because it requires installing a Java runtime. That can complicate deployment for an otherwise all-Microsoft application, which is one of the reasons the Stack Overflow team chose to use Lucene.NET instead. Personally, I think the advantages of Solr, like:

  • Much less code to develop and maintain compared to an in-house implementation of features like faceting, caching, suggestions, etc.
  • A simpler interface than Lucene's, yet flexible and powerful.
  • Web console.
  • Easy scaling.

(just to name a few) more than make up for the deployment hassle, at least for my applications. In other words, I think deploying Solr is an easier problem than efficiently coding faceting, caching, suggestions, etc. on top of Lucene(.NET). I'd rather not reinvent this wheel.

But I understand that things would be simpler if Solr just ran somehow directly on .NET. I see three ways to make that happen:

  1. Manually port the source code to .NET. Many well-known projects ported from Java took this path: log4net, Quartz.NET, NHibernate, Lucene.NET, and GitSharp, to name a few.
  2. Automatically port the source code to .NET. NGit and db4o use this approach, using Sharpen.
  3. Automatically convert Java bytecode to .NET IL.

All these options have their pros and cons:

Option 1 gives you the most freedom but requires the most effort, both for the initial port and for keeping up with the original code.

Option 2 still requires manual tweaks (Lluis mentions modifications both to Sharpen itself and to its generated code), and it generates Java-ish code, whereas in the first approach you're free to translate things as you see fit. It may be convenient to write a usability layer on top of the generated code.

Option 3 is the most automatic of them all. Using IKVM you can just get the JARs, run them through ikvmc and get .NET DLLs as a result. It requires the least effort and almost no coding at all (you might still have to wrap some Java-specific APIs). However, it also requires having the whole IKVM runtime and OpenJDK present at runtime, which can be cumbersome and may have some negative performance impact.
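
As a rough sketch of the ikvmc step, the script below assembles a conversion command for a single JAR. The file names (example-lib.jar, example-lib.dll) are placeholders for illustration, not the actual Solr artifacts, and a real conversion of Solr would likely involve several JARs and additional options. The script only invokes ikvmc when it is on the PATH and the JAR exists; otherwise it prints the command it would have run.

```shell
#!/bin/sh
# Placeholder file names (assumptions for illustration, not the real Solr JARs).
JAR="example-lib.jar"
DLL="example-lib.dll"

# -target:library asks ikvmc for a DLL; -out: names the output assembly.
CMD="ikvmc -target:library -out:$DLL $JAR"

if command -v ikvmc >/dev/null 2>&1 && [ -f "$JAR" ]; then
    # ikvmc is installed and the JAR exists: perform the conversion.
    $CMD
else
    # Dry run: show the command without executing it.
    echo "would run: $CMD"
fi
```

The resulting DLL still depends on the IKVM runtime assemblies, so those have to be deployed alongside it.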

In the particular case of Solr, the IKVM+OpenJDK dependency isn't really a big deal, because it's a server, not a library, and it's easily xcopy-able and not as heavy as a full Java runtime (for a standard Windows install, that is). Sure, Solr can also be embedded in Java apps, but not even the Solr team recommends it for general usage.

I actually started experimenting with Solr and IKVM back in June 2009. I ran into ClassLoader issues, and since I didn't really need it, I gave up. I've seen many people express interest in having Solr running on .NET, and I did point them to my incomplete solution, yet nobody did anything (I'm actually a bit ticked off by this).

Anyway, a few days ago I accidentally bumped into this article by Kevin Miller where he explains how to use IKVM to run Tika on .NET. I'm not (yet) interested in Tika itself, but the article also contained the solution for my ClassLoader problems.

So I got back to working on this and actually got it to a usable state. It has four modes of operation:

  1. Embedded, just like you would embed it in a Java app
  2. Server, using the embedded Jetty (launched from command-line)
  3. Server, as Windows Service (also using the embedded Jetty)
  4. Server, as an IHttpHandler, so it can be hooked up to IIS as a regular ASP.NET app.

The ASP.NET mode still needs some work: queries work, but updates don't. There's probably something wrong in the Servlet-ASP.NET adapters.

Honestly, I haven't tested it much. I still don't need it, so I don't intend to put much time into it. I haven't done any performance tests. Of course, I don't expect this to be as fast as the original Java Solr, since the whole IKVM runtime and OpenJDK sit in the middle, but I do believe it will have acceptable performance.

Consider this a call for contributors: SolrIKVM is currently quite usable, but I wouldn't call it production-quality yet. If you really want Solr on .NET, fork the project, try it, test it, fix the issues. Make it happen. Feel free to ask me any questions about it.

By the way, huge kudos to Jeroen Frijters: it's pretty awesome that IKVM can run Jetty and Solr like this.

The project is on GitHub.

13 comments:

MikeEast said...

I feel I want to get busy porting it.

It would be a lot of work, but it would be a great way to learn the inner workings of Solr.

Let's do it!

Edin said...

I really don't see why installing a Java runtime is such a big deal. Also, I think that keeping Lucene.NET up to date is more important than porting Solr to .NET.

Mauricio Scheffer said...

@MikeEast: if you have the time and willingness to take on such a project, then by all means go ahead! Note that it's a different project than the one I describe here.

Mauricio Scheffer said...

@Edin: Installing a Java runtime is not a problem to me either, and that's why I don't plan to spend much time on this IKVM port. But I know many people could be interested in this.

Debbs said...

@Edin: Were you able to set up Solr on shared Windows hosting (IIS7)?

Zac said...

Do you think running automated integration tests would be an appropriate use of the embedded mode? Currently my integration tests fire up a Jetty server, which is pretty slow and hacky.

Mauricio Scheffer said...

@Zac: yes, absolutely. Although right now the only way to talk to embedded Solr is through SolrJ.

Zac said...

OK, that would be cool. I'll be keeping an eye on this project; let me know if you want any help testing. Thanks again for all the cool stuff you do.

Marcus Granström said...

I have noticed that many of my customers are sometimes reluctant to use Solr as well. Their concern is always maintaining it. This is when they have… In my opinion, the best option would be to port Solr to .NET. Of course, it would be a lot of work. After that, I do not agree that you have to keep it up to date with Solr; it can become a different product and go its own way.

Thanks for creating SolrNet. Great job on that.

Mauricio Scheffer said...

@Marcus: well yes, that's why I chose to do this with IKVM: it has practically zero maintenance and requires very little initial effort. In fact, I updated SolrIKVM from 1.4 to 3.1 in one hour. Going the way of source code porting would require orders of magnitude more effort.
If you're interested in making this production-quality, take a look at the issues.

Sanket Shirgaonkar said...

What is deduplication, and how can I achieve it with SolrNet in an ASP.NET application with SQL Server as the database?

Sanket Shirgaonkar said...

What is deduplication, and how do I achieve it in SolrNet with an ASP.NET and SQL Server application?

Anonymous said...

Hi Mauricio Scheffer,

Below is my class:

public class Product
{
    [SolrField("Sku")]
    public string Sku { get; set; }

    [SolrField("SourceId")]
    public int SourceId { get; set; }

    [SolrField("Status")]
    public string Status { get; set; }

    [SolrField("LastModifiedDateTimeUtc")]
    public DateTime LastModifiedDateTimeUtc { get; set; }

    [SolrField("Fields")]
    public Fields Fields { get; set; }

    [SolrField("ListFields")]
    public ListFields ListFields { get; set; }

    [SolrField("ListCultureFields")]
    public ListCultureFields ListCultureFields { get; set; }
}

It deserializes successfully, but when I add it to Solr using solr.add(product), the document is not loaded properly in Solr.

Please help me.