Monday, December 13, 2010

Customizing SolrNet

One of the most-voted enhancement requests for SolrNet (an Apache Solr client for .NET) right now is support for sending queries via POST.

Let me explain: queries are serialized by SolrNet and sent to Solr via HTTP. Normally, queries are issued with a GET request and the query itself goes in the query string part of the URL. A simple query URL might look like this: http://localhost:8983/solr/select?q=id:123 .

The problem arises when the query is too long to fit in the query string. Even though the HTTP protocol does not place any a priori limit on the length of a URI, most (all?) servers do, for performance and security reasons.

Here's a little program that reproduces this issue:

using System.Collections.Generic;
using System.Linq;
using SolrNet;

internal class Program {
    private const string serverURL = "http://localhost:8983/solr";

    private static void Main(string[] args) {
        Startup.Init<Dictionary<string, object>>(serverURL);
        var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
        solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
    }
}

This creates the query "id:0 OR id:1 OR ... OR id:999", which is about 10KB after encoding, more than enough for our tests. Running this against Solr on Jetty 6 makes Jetty throw:

2010-12-13 17:52:33.362::WARN:  handle failed 
java.io.IOException: FULL 
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:274) 
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) 
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) 
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) 
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Not very graceful... it should probably respond with 414 Request-URI Too Long instead of throwing like this, but clients shouldn't send such long URIs anyway.
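
To get a feel for the size involved, here is a rough sketch that rebuilds the query text described above and URL-encodes it. It is only an approximation: the class and method names are made up for this example, and SolrNet's own serializer may quote or group the terms differently, so the exact length it sends can differ.

```csharp
using System;
using System.Linq;
using System.Net;

// Hypothetical helper for illustration only; not part of SolrNet.
public static class QueryLengthDemo {
    // Builds "id:0 OR id:1 OR ... OR id:(count-1)".
    public static string BuildQuery(int count) {
        return string.Join(" OR ", Enumerable.Range(0, count).Select(i => "id:" + i).ToArray());
    }

    // URL-encodes the text as it would appear in a query string value.
    public static string Encode(string query) {
        return WebUtility.UrlEncode(query);
    }

    public static void Main() {
        var raw = BuildQuery(1000);
        var encoded = Encode(raw);
        Console.WriteLine("raw: {0} chars, encoded: {1} chars", raw.Length, encoded.Length);
    }
}
```

Every ":" grows to three characters once encoded, which is why the encoded form is noticeably longer than the raw query.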

Steven Livingston has a good blog post describing a patch modifying some classes in SolrNet to deal with this issue. However, even though I never foresaw this problem when writing SolrNet, solving it does not really require any changes to the existing codebase.

In this particular case, what we need to do concretely is override the Get() method of the ISolrConnection service and make it issue POST requests instead of GET. We can write a decorator to achieve this:

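using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Web;
using SolrNet;
using SolrNet.Exceptions;

public class PostSolrConnection : ISolrConnection {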
    private readonly ISolrConnection conn;
    private readonly string serverUrl;

    public PostSolrConnection(ISolrConnection conn, string serverUrl) {
        this.conn = conn;
        this.serverUrl = serverUrl;
    }

    public string Post(string relativeUrl, string s) {
        return conn.Post(relativeUrl, s);
    }

    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) {
        var u = new UriBuilder(serverUrl);
        u.Path += relativeUrl;
        var request = (HttpWebRequest) WebRequest.Create(u.Uri);
        request.Method = "POST";
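        // Declare the charset explicitly; otherwise Solr may not decode non-ASCII characters correctly.
        request.ContentType = "application/x-www-form-urlencoded; charset=utf-8";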
        var qs = string.Join("&", parameters
            .Select(kv => string.Format("{0}={1}", HttpUtility.UrlEncode(kv.Key), HttpUtility.UrlEncode(kv.Value)))
            .ToArray());
        request.ContentLength = Encoding.UTF8.GetByteCount(qs);
        request.ProtocolVersion = HttpVersion.Version11;
        request.KeepAlive = true;
        try {
            using (var postParams = request.GetRequestStream())
            using (var sw = new StreamWriter(postParams))
                sw.Write(qs);
            using (var response = request.GetResponse())
            using (var responseStream = response.GetResponseStream())
            using (var sr = new StreamReader(responseStream, Encoding.UTF8, true))
                return sr.ReadToEnd();
        } catch (WebException e) {
            throw new SolrConnectionException(e);
        }
    }
}

Now we have to apply this decorator:

private static void Main(string[] args) {
    Startup.Init<Dictionary<string, object>>(new PostSolrConnection(new SolrConnection(serverURL), serverURL));
    var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

That's it! If you're using Windsor, applying the decorator looks like this:

private static void Main(string[] args) {
    var container = new WindsorContainer();
    container.Register(Component.For<ISolrConnection>()
        .ImplementedBy<PostSolrConnection>()
        .Parameters(Parameter.ForKey("serverUrl").Eq(serverURL)));
    container.AddFacility("solr", new SolrNetFacility(serverURL));
    var solr = container.Resolve<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

This is the real benefit of writing decoupled code. Not testability, but flexibility. Testability is nice of course, but not the primary purpose.
When your code is decoupled, you can even implement entire features mostly by rearranging the object graph. This is pretty much how I implemented multicore support in SolrNet.
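
As a minimal, self-contained sketch of that idea: the interface and classes below are hypothetical stand-ins, not SolrNet types. The logging behavior is added only by changing how the objects are composed, and neither the inner implementation nor its callers are modified.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a connection service; not SolrNet's ISolrConnection.
public interface IConnection {
    string Get(string relativeUrl);
}

// Innermost implementation: pretends to talk to a server.
public class FakeConnection : IConnection {
    public string Get(string relativeUrl) {
        return "response for " + relativeUrl;
    }
}

// Decorator: records each request, then delegates to the wrapped connection.
public class LoggingConnection : IConnection {
    private readonly IConnection inner;
    public readonly List<string> Log = new List<string>();

    public LoggingConnection(IConnection inner) {
        this.inner = inner;
    }

    public string Get(string relativeUrl) {
        Log.Add("GET " + relativeUrl);
        return inner.Get(relativeUrl);
    }
}

public static class DecoratorDemo {
    public static void Main() {
        // The new behavior comes purely from how the graph is wired together.
        var logging = new LoggingConnection(new FakeConnection());
        IConnection conn = logging;
        Console.WriteLine(conn.Get("/select"));
        Console.WriteLine(logging.Log.Count);
    }
}
```

Swapping `new FakeConnection()` for `new LoggingConnection(new FakeConnection())` at the composition root is the only change a consumer of `IConnection` would ever see.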

The PostSolrConnection implementation above works with SolrNet 0.3.0 and probably also 0.2.3. PostSolrConnection is not the default because: a) it needs to be tested thoroughly, and b) Solr doesn't emit cache headers when POSTing so it precludes caching.

20 comments:

Khalid Abuhakmeh said...

Thanks, I'm trying this code out right now. It is strange, though, that the framework doesn't just include it as a configurable option out of the box.

Thanks for your great work.

Mauricio Scheffer said...

@Khalid: extensible/composable > configurable.

Mun said...

Does this still work with the latest version of SolrNet (0.4.0.2002)? I'm getting the following error:

'PostSolrConnection' does not implement interface member 'SolrNet.ISolrConnection.PostStream(string, string, System.IO.Stream, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>>)'

Mauricio Scheffer said...

@Mun: there have been breaking changes in 0.4.0

vivin joy said...

Hi. I used your approach above for POSTing, but when I try to send foreign characters, they are not properly decoded by Solr. For example, if I send "español", it is decoded as "espaÃ±ol" in Solr.

vivin joy said...

Please use ContentType as "application/x-www-form-urlencoded; charset=utf-8" . Or else Solr doesn't parse non-ASCII characters

Anonymous said...

I have written a version for the new solrnet where i override PostStream the same way we overload Post but i am getting a StackOverflowException on the parameters.Select().ToArray() line. I really need to do this for SolrNet 0.4 please help.

Mauricio Scheffer said...

@Anonymous : post your code as a fork of the SolrNet repository on github so I can review it. Please always use the google group for questions about SolrNet.

Anonymous said...

Thanks Mario i will use the google group in future.
I have found my issue, I was somehow creating a Filter Query that was causing the StackOverflowException.
Refactoring the construction of the query solved the problem.
The only change from your version here was to override PostStream and call conn.PostStream inside!

Alexey Kozhemiakin said...

Hi Anonymous, please share your changes here or commit them to github, we have the same issue and it will really help.

-Alexey

Alexey Kozhemiakin said...

No worries, I've just modified the code from blogpost myself. Problem solved.

Claudio said...

Hi Mauricio,

it's not working when you add cores. Then it still uses GET as Request Method. How can I fix this?

Mauricio Scheffer said...

Claudio, please post this on the SolrNet mailing list : https://groups.google.com/forum/#!forum/solrnet

Recbooks said...

Hi Maurico

When I use SolrNet with Apache Solr 4.0.0, I am facing issues in sending the query: the query gets inserted instead of getting appended.
For eg
I get this
http://192.168.97.170:8080/solr//select?q=(productNativeId:88863)&rows=100000000&version=2.2#/core0

instead of
http://192.168.97.170:8080/solr/#/core0/select?q=(productNativeId:88863)&rows=100000000&version=2.2

Apache Solr 4.0.0 has a different URL that has # at the end (for link).
for eg
http://localhost:8080/solr/#/

Is there anything I can do about this? Please help me.

Mauricio Scheffer said...

Recbooks, please post this on the SolrNet mailing list : https://groups.google.com/forum/#!forum/solrnet

LaSerpe said...

I have a somewhat similar need: I'd like to set the request's KeepAlive property to false. Is there a more straightforward approach than reimplementing ISolrConnection's Get method?

Mauricio Scheffer said...

This is the fourth time I post this so I'm going to use all caps this time, ok?

PLEASE USE THE GOOGLE GROUP FOR ALL QUESTIONS ABOUT SOLRNET: https://groups.google.com/forum/?fromgroups#!forum/solrnet

COMMENTING ON A TWO-YEAR-OLD POST IS NOT THE APPROPRIATE CHANNEL TO ASK QUESTIONS ABOUT SOLRNET.

Sorry I had to do that but hopefully that will give my message some more visibility.

Anonymous said...

Get latest from Github for solrnet master branch, it's already supported. It's under solrnet.impl namespace.

Tapan Chapadia said...

When I try to use above, I get below WebException in c# code.

"The remote server returned an error: (500) Internal Server Error."

When I see logs in solr. I find below.

"Solr requires that request parameters sent using application/x-www-form-urlencoded content-type can be read through the request input stream. Unfortunately, the stream was empty / not available. This may be caused by another servlet filter calling ServletRequest.getParameter*() before SolrDispatchFilter, please remove it.org.apache.solr.common.SolrException: Solr requires that request parameters sent using application/x-www-form-urlencoded content-type can be read through the request input stream. Unfortunately, the stream was empty / not available. This may be caused by another servlet filter calling ServletRequest.getParameter*() before SolrDispatchFilter, please remove it."

Not sure what is going wrong here. Appreciate if you can help here.