Monday, October 12, 2009

Untangling the mess: Solr, SolrNet, NHibernate, Lucene

I've recently received several questions about the relationship between Solr, SolrNet, NHibernate, Lucene, Lucene.Net, etc.: how they fit together, how they should be used, and what features each one provides. Here's an attempt at elucidating the topic:

Let's start from the bottom up:

  • RDBMS: every programmer knows what these are. Oracle, SQL Server, MySQL, etc. Everyone uses them, to the point that they're often used as a Golden Hammer. An RDBMS can be a stand-alone program (client-server architecture) or embedded (running within your application).
  • Lucene was written to do full-text indexing and searching. The best-known example of full-text searching is Google. You throw words at it and it returns a ranked set of documents that match those words.
    In terms of data structures, Lucene at its core implements an inverted index, while relational databases use B-tree variants. Fundamentally different beasts.
    Lucene is a Java library, which means that it's not a stand-alone application but is instead embedded in your program.
  • Full-text functions in relational databases: nowadays almost all major RDBMS offer some full-text capabilities: MySQL, SQL Server, Oracle, etc. As far as I know, they are all behind Lucene in terms of performance and features. They can be easier to use at first, but each one is specific to its vendor, so if you ever need some advanced feature, switching to Lucene could be a PITA.
  • Lucene.Net is a port of Java Lucene to the .Net platform. Nothing more, nothing less. It aims to be fully API compatible so all docs on Java Lucene can be applied to Lucene.Net with minimal translation effort. Index format is also the same, so indices created with Java Lucene can be used by Lucene.Net and vice versa.
  • NHibernate is a port of Java Hibernate to the .Net platform. It's an ORM (object-relational mapper), which basically means that it talks to relational databases and maps your query results as objects for easier consumption in object-oriented languages.
  • NHibernate.Search is an NHibernate contrib project that integrates NHibernate with Lucene.Net. It's a port of the Java Hibernate Search project. It keeps a Lucene index in sync with a relational database and hides some of the complexity of raw Lucene, making it easier to index and query.
    This article explains its basic usage.
  • Solr is a search server. It's a stand-alone Java application that uses Lucene to provide full-text indexing and searching through an XML/HTTP interface. This means that it can be used from any platform/language. It can be embedded in your own Java programs, but that's not its primary design purpose.
    While very flexible, it's easier to use than raw Lucene and provides features commonly used in search applications, like faceted search and hit highlighting. It also handles caching, replication, sharding, and has a nice web admin interface.
    This article is a very good tour of Solr's basic features.
  • SolrNet is a library to talk to a Solr instance from a .Net application. It provides an object-oriented interface to Solr's operations. It also acts as an object-Solr mapper: query results are mapped to POCOs.
    The latest version also includes Solr-NHibernate integration. This is similar to NHibernate.Search: it keeps a Solr index in sync with a relational database and lets you query Solr from the NHibernate interface.
    Unlike NHibernate and NHibernate.Search, which can respectively create a DB schema and a Lucene index, SolrNet can't automatically create the Solr schema. Solr does not have this capability yet. You have to manually configure Solr and set up its schema.
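
To make the inverted index mentioned in the Lucene item a bit more concrete, here's a minimal sketch in Python. It is not how Lucene is actually implemented: the whitespace tokenizer, the function names and the "count the matching terms" ranking are all assumptions made for illustration, and Lucene's real analysis and scoring are far more sophisticated. It only shows the core idea: the index maps each term to the documents that contain it, so a search looks up terms instead of scanning documents.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return doc ids ranked by how many distinct query terms they contain."""
    hits = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Most matching terms first; ties are broken by doc id for a stable order.
    return sorted(hits, key=lambda d: (-hits[d], d))


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Solr is a search server built on Lucene",
    2: "NHibernate maps relational data to objects",
    3: "Lucene implements an inverted index for full-text search",
}
index = build_inverted_index(docs)
print(search(index, "lucene search"))    # [1, 3]
print(search(index, "inverted lucene"))  # [3, 1]
```

Compare this with a B-tree, which keeps whole rows ordered by a key: it is great for lookups by key or range, but answering "which rows contain these words" would mean scanning the text of every row.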

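Since Solr's interface is just XML over HTTP, any language can build the messages it expects. Here's a rough sketch in Python, using only the standard library, of what a client library such as SolrNet has to produce under the hood: an `<add>` update message and a `/select` query URL. The base URL, the helper names and the field names (`id`, `title`, `tags`) are assumptions for illustration; the fields would have to exist in your manually configured Solr schema, and nothing is actually sent over the network here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: a Solr instance running locally on its usual default port.
SOLR_URL = "http://localhost:8983/solr"


def build_add_message(doc):
    """Build an <add><doc>...</doc></add> XML update message for one document."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A multi-valued field is sent as one <field> element per value.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Build the URL for a query against the /select handler."""
    return SOLR_URL + "/select?" + urlencode({"q": query, "rows": rows})


# Hypothetical blog post, already flattened into fields.
post = {"id": 1, "title": "Untangling the mess", "tags": ["solr", "lucene"]}
print(build_add_message(post))        # body to POST to SOLR_URL + "/update"
print(build_select_url("title:mess"))
```

Note that the document is flat: the tags are repeated values of a single field rather than rows in a related table.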

In case this wasn't totally clear, here's a diagram depicting a possible NHibernate-SolrNet architecture:

Diagram made with gliffy!

5 comments:

mygamebest said...
This comment has been removed by a blog administrator.
marek said...

Hi,

I really like the SolrNet idea, especially the integration with NHibernate.
I'm new to the Solr world, so please let me know if I'm doing something really stupid.

First problem: I can't delete a document.
I tried to get it working today, and I could insert and update a new document mapped from my very simple POCO.
Unfortunately I could not delete the document. I'm getting a null reference in SolrNetListener, line 57. ITransaction s and T entity seem to be fine, and not null.

Second problem:
I tried to insert a slightly more realistic object, with a collection of other objects, and this is not working as I would expect. It is not indexing properly (maybe not serializing properly) and not mapping properly back to the POCO when I get the results from Solr.

Do you have any examples of NHibernate integration with SolrNet that I could take a look at? If possible with some realistic scenario, e.g. blog, post, comment.

Thanks in advance.
Marek

mausch said...

Hi Marek, NHibernate integration is currently not in the sample app. Also, the integration as it is now is intended for entities that are mapped very similarly in NH and SolrNet, so it is quite limited. Keep in mind that information has to be denormalized before being stored in Solr; it's not an RDBMS. About the NRE, please post the full exception stack trace to the Google group.

amjed said...

simple yet covers the subtle differences ...

kieran said...

Thanks for the catalogue of the various technologies. Using Lucene.Net on Azure at present and we were trying to figure out the pros and cons of adding Solr to that mix.