Monday, December 13, 2010

Customizing SolrNet

One of the most-voted enhancement requests for SolrNet (an Apache Solr client for .NET) right now is support for POSTing when querying.

Let me explain: queries are serialized by SolrNet and sent to Solr via HTTP. Normally, queries are issued with a GET request and the query itself goes in the query string part of the URL. A simple query URL might look like this: http://localhost:8983/solr/select?q=id:123 .

The problem arises when the query is too long to fit in the query string. Even though the HTTP protocol does not place any a priori limit on the length of a URI, most (all?) servers do, for performance and security reasons.
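To get a rough sense of how quickly a query string grows, the sketch below builds an OR query over a range of ids by hand and measures the resulting URL. It does not call SolrNet: `QueryLengthEstimate`, `BuildOrQuery` and `UrlLength` are hypothetical names used only for illustration, and the exact string SolrNet serializes may differ slightly.

```csharp
using System;
using System.Linq;
using System.Web; // HttpUtility; needs a reference to System.Web on .NET Framework

public static class QueryLengthEstimate {
    // Builds "id:0 OR id:1 OR ... OR id:(count-1)" by hand.
    // This only approximates what SolrNet would serialize.
    public static string BuildOrQuery(int count) {
        return string.Join(" OR ", Enumerable.Range(0, count)
            .Select(i => "id:" + i)
            .ToArray());
    }

    // Length in characters of the full GET URL for the given query.
    public static int UrlLength(string baseUrl, string query) {
        return (baseUrl + "/select?q=" + HttpUtility.UrlEncode(query)).Length;
    }

    public static void Main() {
        const string baseUrl = "http://localhost:8983/solr";
        foreach (var count in new[] { 1, 100, 1000 }) {
            Console.WriteLine("{0} ids -> {1} characters",
                count, UrlLength(baseUrl, BuildOrQuery(count)));
        }
    }
}
```

Comparing the printed lengths against the request or header size limit configured on your server shows roughly how many terms a GET query can carry before it fails.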

Here's a little program that reproduces this issue:

internal class Program {
    private const string serverURL = "http://localhost:8983/solr";

    private static void Main(string[] args) {
        Startup.Init<Dictionary<string, object>>(serverURL);
        var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
        solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
    }
}

This creates the query "id:0 OR id:1 OR ... OR id:999", which is about 10KB after encoding, more than enough for our tests. Running this against Solr on Jetty 6 makes Jetty throw:

2010-12-13 17:52:33.362::WARN:  handle failed 
java.io.IOException: FULL 
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:274) 
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) 
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) 
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) 
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Not very graceful... it should probably respond with 414 Request-URI Too Long instead of throwing like this, but clients shouldn't send such long URIs anyway.

Steven Livingston has a good blog post describing a patch modifying some classes in SolrNet to deal with this issue. However, even though I never foresaw this problem when writing SolrNet, solving it does not really require any changes to the existing codebase.

In this particular case, what we need to do concretely is override the Get() method of the ISolrConnection service and make it issue POST requests instead of GET. We can write a decorator to achieve this:

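using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Web;
using SolrNet;
using SolrNet.Exceptions;

public class PostSolrConnection : ISolrConnection {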
    private readonly ISolrConnection conn;
    private readonly string serverUrl;

    public PostSolrConnection(ISolrConnection conn, string serverUrl) {
        this.conn = conn;
        this.serverUrl = serverUrl;
    }

    public string Post(string relativeUrl, string s) {
        return conn.Post(relativeUrl, s);
    }

    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) {
        var u = new UriBuilder(serverUrl);
        u.Path += relativeUrl;
        var request = (HttpWebRequest) WebRequest.Create(u.Uri);
        request.Method = "POST";
        // State the charset explicitly; otherwise Solr may not decode non-ASCII parameter values as UTF-8.
        request.ContentType = "application/x-www-form-urlencoded; charset=utf-8";
        // Serialize the parameters as a form-encoded body instead of a query string.
        var qs = string.Join("&", parameters
            .Select(kv => string.Format("{0}={1}", HttpUtility.UrlEncode(kv.Key), HttpUtility.UrlEncode(kv.Value)))
            .ToArray());
        request.ContentLength = Encoding.UTF8.GetByteCount(qs);
        request.ProtocolVersion = HttpVersion.Version11;
        request.KeepAlive = true;
        try {
            // Write the encoded parameters to the request body, then read the whole response.
            using (var postParams = request.GetRequestStream())
            using (var sw = new StreamWriter(postParams))
                sw.Write(qs);
            using (var response = request.GetResponse())
            using (var responseStream = response.GetResponseStream())
            using (var sr = new StreamReader(responseStream, Encoding.UTF8, true))
                return sr.ReadToEnd();
        } catch (WebException e) {
            // Surface HTTP failures as SolrNet's own exception type.
            throw new SolrConnectionException(e);
        }
    }
}

Now we have to apply this decorator:

private static void Main(string[] args) {
    Startup.Init<Dictionary<string, object>>(new PostSolrConnection(new SolrConnection(serverURL), serverURL));
    var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

That's it! If you're using Windsor, applying the decorator looks like this:

private static void Main(string[] args) {
    var container = new WindsorContainer();
    container.Register(Component.For<ISolrConnection>()
        .ImplementedBy<PostSolrConnection>()
        .Parameters(Parameter.ForKey("serverUrl").Eq(serverURL)));
    container.AddFacility("solr", new SolrNetFacility(serverURL));
    var solr = container.Resolve<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

This is the real benefit of writing decoupled code. Not testability, but flexibility. Testability is nice of course, but not the primary purpose.
When your code is decoupled, you can even implement entire features mostly by rearranging the object graph. This is pretty much how I implemented multicore support in SolrNet.

The PostSolrConnection implementation above works with SolrNet 0.3.0 and probably also 0.2.3. PostSolrConnection is not the default because: a) it needs to be tested thoroughly, and b) Solr doesn't emit cache headers when POSTing so it precludes caching.
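Point (b) suggests a possible middle ground: keep using GET (and therefore caching) for ordinary queries and switch to POST only when the query is too long. The following is a minimal sketch of that idea as one more decorator, not something SolrNet ships. `IConnection` is a local stand-in for the two `ISolrConnection` members shown in this post, `LengthSwitchingConnection` and `FakeConnection` are hypothetical names, and the threshold of 2000 is an arbitrary value chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web; // HttpUtility, the same encoder used by PostSolrConnection above

// Local stand-in for the two ISolrConnection members used in this post,
// declared here only so the sketch compiles without SolrNet.
public interface IConnection {
    string Post(string relativeUrl, string s);
    string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters);
}

// Hypothetical decorator: short queries go to one connection, long ones to another.
public class LengthSwitchingConnection : IConnection {
    private readonly IConnection shortQueries;
    private readonly IConnection longQueries;
    private readonly int maxEncodedLength;

    public LengthSwitchingConnection(IConnection shortQueries, IConnection longQueries, int maxEncodedLength) {
        this.shortQueries = shortQueries;
        this.longQueries = longQueries;
        this.maxEncodedLength = maxEncodedLength;
    }

    public string Post(string relativeUrl, string s) {
        return shortQueries.Post(relativeUrl, s);
    }

    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) {
        // Materialize once: the sequence is read here and again by the inner connection.
        var ps = parameters.ToList();
        var target = EncodedLength(ps) <= maxEncodedLength ? shortQueries : longQueries;
        return target.Get(relativeUrl, ps);
    }

    // Length of "k1=v1&k2=v2..." after URL encoding.
    // Counts one '&' per pair, so it overestimates by one character.
    public static int EncodedLength(IEnumerable<KeyValuePair<string, string>> parameters) {
        return parameters.Sum(kv =>
            HttpUtility.UrlEncode(kv.Key).Length + HttpUtility.UrlEncode(kv.Value).Length + 2);
    }
}

// Test double that just reports which connection handled the call.
public class FakeConnection : IConnection {
    private readonly string name;
    public FakeConnection(string name) { this.name = name; }
    public string Post(string relativeUrl, string s) { return name; }
    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) { return name; }
}

public static class Demo {
    public static void Main() {
        // 2000 is an arbitrary threshold, not a documented server limit.
        var conn = new LengthSwitchingConnection(new FakeConnection("GET"), new FakeConnection("POST"), 2000);
        var shortQuery = new[] { new KeyValuePair<string, string>("q", "id:123") };
        var longQuery = new[] { new KeyValuePair<string, string>("q", new string('x', 5000)) };
        Console.WriteLine(conn.Get("/select", shortQuery)); // prints "GET"
        Console.WriteLine(conn.Get("/select", longQuery));  // prints "POST"
    }
}
```

With SolrNet itself, the same class would implement ISolrConnection instead of the stand-in, wrapping a plain SolrConnection for short queries and a PostSolrConnection for long ones. The threshold should be set below whatever limit your servlet container enforces.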

20 comments:

  1. Thanks, I'm trying this code out right now. It is strange though that the framework doesn't just include it as a configurable option out of the box.

    Thanks for your great work.

  2. @Khalid: extensible/composable > configurable.

  3. Does this still work with the latest version of SolrNet (0.4.0.2002)? I'm getting the following error:

    'PostSolrConnection' does not implement interface member 'SolrNet.ISolrConnection.PostStream(string, string, System.IO.Stream, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string,string>>)'

  4. @Mun: there have been breaking changes in 0.4.0

  5. This comment has been removed by the author.

  6. Hi, I used the POST approach described above, but when I try to send foreign characters, they are not decoded properly by Solr. For example, if I send "español", it is decoded as "espaÃ±ol" in Solr.

  7. Please use "application/x-www-form-urlencoded; charset=utf-8" as the ContentType, or else Solr doesn't parse non-ASCII characters.

  8. I have written a version for the new SolrNet where I override PostStream the same way we override Post, but I am getting a StackOverflowException on the parameters.Select().ToArray() line. I really need to do this for SolrNet 0.4, please help.

  9. @Anonymous : post your code as a fork of the SolrNet repository on github so I can review it. Please always use the google group for questions about SolrNet.

  10. Thanks Mario i will use the google group in future.
    I have found my issue: I was somehow creating a Filter Query that was causing the StackOverflowException.
    Refactoring the construction of the query solved the problem.
    The only change from your version here was to override PostStream and call conn.PostStream inside!

  11. Hi Anonymous, please share your changes here or commit them to github, we have the same issue and it will really help.

    -Alexey

  12. No worries, I've just modified the code from blogpost myself. Problem solved.

  13. Hi Mauricio,

    it's not working when you add cores. Then it still uses GET as Request Method. How can I fix this?

  14. Claudio, please post this on the SolrNet mailing list : https://groups.google.com/forum/#!forum/solrnet

  15. Hi Maurico

    When I use SolrNet with Apache Solr 4.0.0 I am facing issues in sending the query: the query gets inserted instead of being appended.
    For example, I get this:
    http://192.168.97.170:8080/solr//select?q=(productNativeId:88863)&rows=100000000&version=2.2#/core0

    instead of
    http://192.168.97.170:8080/solr/#/core0/select?q=(productNativeId:88863)&rows=100000000&version=2.2

    Apache Solr 4.0.0 has a different URL that has # at the end (for link), for example:
    http://localhost:8080/solr/#/

    Is there anything I can do about this? Please help me.

  16. Recbooks, please post this on the SolrNet mailing list : https://groups.google.com/forum/#!forum/solrnet

  17. I have a somewhat similar need: I'd like to set the request's KeepAlive property to false. Is there a more straightforward approach than reimplementing ISolrConnection's Get method?

  18. This is the fourth time I post this so I'm going to use all caps this time, ok?

    PLEASE USE THE GOOGLE GROUP FOR ALL QUESTIONS ABOUT SOLRNET: https://groups.google.com/forum/?fromgroups#!forum/solrnet

    COMMENTING ON A TWO-YEAR-OLD POST IS NOT THE APPROPRIATE CHANNEL TO ASK QUESTIONS ABOUT SOLRNET.

    Sorry I had to do that but hopefully that will give my message some more visibility.

  19. Get the latest from the SolrNet master branch on GitHub; it's already supported. It's under the SolrNet.Impl namespace.

  20. When I try to use the code above, I get the WebException below in my C# code.

    "The remote server returned an error: (500) Internal Server Error."

    When I see logs in solr. I find below.

    "Solr requires that request parameters sent using application/x-www-form-urlencoded content-type can be read through the request input stream. Unfortunately, the stream was empty / not available. This may be caused by another servlet filter calling ServletRequest.getParameter*() before SolrDispatchFilter, please remove it.org.apache.solr.common.SolrException: Solr requires that request parameters sent using application/x-www-form-urlencoded content-type can be read through the request input stream. Unfortunately, the stream was empty / not available. This may be caused by another servlet filter calling ServletRequest.getParameter*() before SolrDispatchFilter, please remove it."

    Not sure what is going wrong here. Appreciate if you can help here.
