Friday, March 12, 2010

Low-level SolrNet

I recently got a question about how to handle multi-faceting in SolrNet, a nice feature of Solr that can be very useful to the end-user. eBay uses a kind of multi-faceting interface.
If you know nothing about Solr or SolrNet, read on anyway: this article isn't so much about Solr as about API design.

The Solr wiki has an example query with multi-faceting:

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype

For those of you that are not into Solr, this is just a regular URL query string that is passed to the Solr endpoint. The final URL looks like this (modulo encoding):
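
http://localhost:9983/solr/select/?q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype
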
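What makes this query "multi-faceting" is the tag/exclude pair: the doctype:pdf filter is tagged dt, and the doctype facet field excludes that tag. The result set honors both filters, but the doctype counts are computed as if only status:public applied, so the user still sees the other document types. The sketch below simulates those semantics in memory. The Doc, Filter and MultiFacet types are hypothetical, made up for illustration; this is not how Solr implements faceting internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document with just the two fields used in the example query.
public sealed class Doc
{
    public string Status { get; }
    public string DocType { get; }
    public Doc(string status, string docType) { Status = status; DocType = docType; }
}

// Hypothetical filter query (fq) as a predicate, optionally tagged like {!tag=dt}.
public sealed class Filter
{
    public string Tag { get; }              // null when the filter is untagged
    public Func<Doc, bool> Matches { get; }
    public Filter(string tag, Func<Doc, bool> matches) { Tag = tag; Matches = matches; }
}

public static class MultiFacet
{
    // The result set: every filter applies, tagged or not.
    public static List<Doc> Results(IEnumerable<Doc> docs, IList<Filter> filters) =>
        docs.Where(d => filters.All(f => f.Matches(d))).ToList();

    // Facet counts for doctype, in the spirit of facet.field={!ex=dt}doctype:
    // filters carrying the excluded tag are ignored while counting.
    public static SortedDictionary<string, int> DocTypeCounts(
        IEnumerable<Doc> docs, IList<Filter> filters, string excludeTag)
    {
        var active = filters.Where(f => f.Tag == null || f.Tag != excludeTag).ToList();
        var counts = new SortedDictionary<string, int>();
        foreach (var d in docs.Where(doc => active.All(f => f.Matches(doc))))
            counts[d.DocType] = counts.TryGetValue(d.DocType, out var n) ? n + 1 : 1;
        return counts;
    }
}
```

Passing null as excludeTag gives ordinary faceting: the counts then only ever contain pdf, which is exactly the dead end multi-faceting avoids.
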
And this is how you represent this query in the SolrNet object model:

var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Document>>();
ISolrQueryResults<Document> results = solr.Query("mainquery", new QueryOptions {
    FilterQueries = new[] {
        Query.Field("status").Is("public"),
        // tag this filter "dt" so the doctype facet can exclude it
        new LocalParams {{"tag", "dt"}} + Query.Field("doctype").Is("pdf")
    },
    Facet = new FacetParameters {
        Queries = new[] {
            // facet on doctype, ignoring filters tagged "dt"
            new SolrFacetFieldQuery(new LocalParams {{"ex", "dt"}} + "doctype")
        }
    }
});

We build object models like this one because they're programmable: objects and methods can be combined programmatically to build (or compose) our intent, much like Hibernate's Criteria API.
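To make "programmable" concrete, here is a tiny object model in the same spirit. Everything in it (Q, Field, Or, WithLocalParams) is hypothetical and much simpler than SolrNet's real classes. Each combinator returns a new query object, and the Solr string only appears when Render() is called at the very end. It assumes C# 7 or later for the tuple syntax.

```csharp
using System.Linq;

// Hypothetical mini object model, NOT SolrNet's API: queries are objects,
// combinators build new objects, and Render() produces the Solr string.
public abstract class Q
{
    public abstract string Render();

    public static Q Field(string name, string value) => new FieldQ(name, value);
    public Q Or(Q other) => new OrQ(this, other);
    public Q WithLocalParams(params (string Key, string Value)[] ps) => new LocalParamsQ(ps, this);
}

// field:value
sealed class FieldQ : Q
{
    private readonly string name, value;
    public FieldQ(string name, string value) { this.name = name; this.value = value; }
    public override string Render() => name + ":" + value;
}

// (left OR right)
sealed class OrQ : Q
{
    private readonly Q left, right;
    public OrQ(Q left, Q right) { this.left = left; this.right = right; }
    public override string Render() => "(" + left.Render() + " OR " + right.Render() + ")";
}

// {!k1=v1 k2=v2}inner
sealed class LocalParamsQ : Q
{
    private readonly (string Key, string Value)[] ps;
    private readonly Q inner;
    public LocalParamsQ((string Key, string Value)[] ps, Q inner) { this.ps = ps; this.inner = inner; }
    public override string Render() =>
        "{!" + string.Join(" ", ps.Select(p => p.Key + "=" + p.Value)) + "}" + inner.Render();
}
```

For instance, Q.Field("doctype", "pdf").Or(Q.Field("doctype", "txt")).WithLocalParams(("tag", "dt")) renders as {!tag=dt}(doctype:pdf OR doctype:txt). Because the pieces are objects, code can inspect or rearrange them before anything is turned into a string.
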

Opposed to this is the string, and most of the time we hate it because it's opaque: it has no syntactic meaning within our object-oriented code. It has no programmability and no composability. We build strings with very generic classes like StringBuilder or StringWriter, which convey nothing about what we're actually doing. If we need to extract information from a string, we have to write a parser, which is not a trivial task. But the string also has its advantages: it's naturally serializable (or should I say already serialized), and it can be more readable and more concise. Those are some of the reasons why Hibernate also provides HQL. You might be thinking that this dichotomy between objects and strings is really a matter of serialization and deserialization, but I'm talking about human-readable strings here, whereas a serialized format is frequently meant for machine consumption only.
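The parsing cost is easy to underestimate. Even pulling the local params back out of a single filter query like {!tag=dt}doctype:pdf takes a small parser. The sketch below is deliberately naive: LocalParamsParser is hypothetical, it assumes C# 7 tuples, and it handles only space-separated key=value pairs, ignoring quoting, escaping and the other local-params syntax real Solr accepts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, naive parser for strings like "{!tag=dt}doctype:pdf".
// Only unquoted, space-separated key=value pairs are supported.
public static class LocalParamsParser
{
    public static (Dictionary<string, string> Params, string Query) Parse(string s)
    {
        var ps = new Dictionary<string, string>();
        if (!s.StartsWith("{!", StringComparison.Ordinal))
            return (ps, s); // no local params prefix at all

        int close = s.IndexOf('}');
        if (close < 0)
            throw new FormatException("Unterminated local params: " + s);

        // Text between "{!" and "}", e.g. "tag=dt"
        string body = s.Substring(2, close - 2);
        foreach (var token in body.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = token.IndexOf('=');
            if (eq <= 0)
                throw new FormatException("Expected key=value, got: " + token);
            ps[token.Substring(0, eq)] = token.Substring(eq + 1);
        }
        return (ps, s.Substring(close + 1)); // the query after the prefix
    }
}
```

Every feature this version skips (quoted values, escapes, nested braces) would make it longer, which is the point: with an object model, that information is simply a property.
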

So if we already know what the query string is, how can we simplify the QueryOptions code above? Thanks to IoC, we can easily tap into some of SolrNet's "internal" components without worrying about what dependencies they need:

Func<string, string, KeyValuePair<string, string>> kv = (k, v) => new KeyValuePair<string, string>(k, v); 
var connection = ServiceLocator.Current.GetInstance<ISolrConnection>(); 
var xml = connection.Get("/select", new[] { 
    kv("q", "mainquery"), 
    kv("fq", "status:public"), 
    kv("fq", "{!tag=dt}doctype:pdf"), 
    kv("facet", "on"), 
    kv("facet.field", "{!ex=dt}doctype"),
});
var parser = ServiceLocator.Current.GetInstance<ISolrQueryResultParser<Document>>(); 
ISolrQueryResults<Document> results = parser.Parse(xml); 

ISolrConnection is just a wrapper over the HTTP request: we give it the query string parameters and get Solr's XML response back. Then we feed the response to the parser component and voilà, we have our results.
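To give an idea of what the parser component deals with, here is a minimal reader for Solr's XML response format using LINQ to XML. MiniSolrXml is hypothetical and is not SolrNet's ISolrQueryResultParser. The sample response is hand-written and abridged (no documents), not captured from a real server, and the reader only extracts numFound and the counts of one facet field.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, minimal reader for Solr's XML response format.
public static class MiniSolrXml
{
    // Hand-written, abridged sample shaped like a Solr XML response.
    public const string SampleResponse = @"<response>
  <lst name=""responseHeader""><int name=""status"">0</int></lst>
  <result name=""response"" numFound=""2"" start=""0""/>
  <lst name=""facet_counts"">
    <lst name=""facet_fields"">
      <lst name=""doctype""><int name=""pdf"">2</int><int name=""txt"">1</int></lst>
    </lst>
  </lst>
</response>";

    // First child element <tag name="name"> of parent.
    private static XElement Named(XElement parent, string tag, string name) =>
        parent.Elements(tag).First(e => (string)e.Attribute("name") == name);

    // Total number of matches, from <result name="response" numFound="...">.
    public static int NumFound(string xml) =>
        (int)Named(XDocument.Parse(xml).Root, "result", "response").Attribute("numFound");

    // Counts for one field, from facet_counts / facet_fields / <lst name=field>.
    public static Dictionary<string, int> FacetCounts(string xml, string field)
    {
        var facetCounts = Named(XDocument.Parse(xml).Root, "lst", "facet_counts");
        var fieldList = Named(Named(facetCounts, "lst", "facet_fields"), "lst", field);
        return fieldList.Elements("int")
            .ToDictionary(e => (string)e.Attribute("name"), e => (int)e);
    }
}
```

A real parser also has to map each result document onto your Document class and handle the other response sections, which is the work the parser component does for you.
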

And since ISolrConnection just issues a regular HTTP request, we can go even lower:

using (var web = new WebClient()) { 
    var xml = web.DownloadString("http://localhost:9983/solr/select/?q=mainquery&fq=status%3Apublic&fq=%7B!tag%3Ddt%7Ddoctype%3Apdf&facet=on&facet.field=%7B!ex%3Ddt%7Ddoctype");
    var parser = ServiceLocator.Current.GetInstance<ISolrQueryResultParser<Document>>(); 
    ISolrQueryResults<Document> results = parser.Parse(xml);
}

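Hand-encoding that URL is error-prone. Here is a minimal sketch of producing the query string from the same key/value pairs passed to ISolrConnection, using Uri.EscapeDataString. QueryString.Build and QueryString.Kv are hypothetical helpers, not part of SolrNet. Note that the parameters are a sequence of pairs rather than a Dictionary<string, string> because fq appears twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, not part of SolrNet.
public static class QueryString
{
    // Joins key=value pairs with '&', percent-encoding keys and values.
    // A sequence of pairs (not a dictionary) keeps repeated keys such as fq.
    public static string Build(IEnumerable<KeyValuePair<string, string>> parameters) =>
        string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

    // Builds one pair, like the kv lambda in the ISolrConnection example.
    public static KeyValuePair<string, string> Kv(string key, string value) =>
        new KeyValuePair<string, string>(key, value);
}
```

Prepending http://localhost:9983/solr/select/? to the result gives a URL equivalent to the hand-encoded one. Depending on the .NET version, EscapeDataString may also encode ! as %21; it decodes to the same character on the server.
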
I'll leave it to you to decide which one to use. As with the choice between HQL and Criteria, you might prefer one over the other depending on the context. Just keep in mind that these components' interfaces are not as stable as the "really public", documented interfaces; they might see breaking changes more often.


David Craft said...

Excellent stuff. Now I really have no excuse for not downloading the latest SolrNet. I had to write my own code for this, so it will be interesting to see how our code differs :)

rajinimaski said...

Truly a nice one :) Please include how multifaceting NESTED queries work...
It would indeed be great...

Anonymous said...

Hi Mauricio

Enjoyed the article but can you tell me if it is possible to represent the following multi-facet query string (2 facet values selected from the doctype facet field) using the SolrNet LocalParams class or does it require the "internal" component approach?

q=mainquery&fq=status:public&fq={!tag=dt}doctype:(pdf OR txt)&facet=on&facet.field={!ex=dt}doctype

mausch said...

@rajinimaski: sure, post the full details in the google group.

mausch said...

@Anonymous: sure, use || to build your OR query. Please use the SolrNet google group for further questions about this.

Theodor said...

This is exactly what i was looking for. Good stuff!

Anonymous said...

Nicely done! But I've got a problem building the query with SolrNet. When I use the line Query.Field("status").Is("public"), I always get a bad request. Without this line everything works nearly perfectly, but the facet values are not correct.

Thanks for your work!

Mauricio Scheffer said...

@Anonymous: please use the mailing list for support questions:

Anonymous said...


First, thanks for the post, which shows how to support multi-facet queries with constraints.

Following this example, I've tried to make it work, but it's still not working. Could someone help me find the mistake?

Here is my solr query :

facet=true,sort=publishingdate desc,facet.mincount=1,q=service:1 AND publicationstatus:LIVE,facet.field={!ex=dt}user,wt=javabin,fq={!tag=dt}user:10,version=2

Thanks in advance for answers, David.

Mauricio Scheffer said...

@David: please use the mailing list for support questions:

Vana bechtold said...

Hi, I cannot find these two methods:
ISolrQueryResults in SolrNet. Could you please let me know where it is? Thank you.

LokeshGupta said...

Your example is not working properly: when I connect it to my database, it always gives a 404 "remote server not found" error. I want to implement it in my project; please explain.

Mauricio Scheffer said...

@LokeshGupta please use the mailing list for support questions: