One of the most-voted enhancement requests for SolrNet (an Apache Solr client for .NET) right now is support for POSTing when querying.
Let me explain: queries are serialized by SolrNet and sent to Solr via HTTP. Normally, queries are issued with a GET request and the query itself goes in the query string part of the URL. A simple query URL might look like this: http://localhost:8983/solr/select?q=id:123 .
The problem arises when the query is too long to fit in the query string. Even though the HTTP protocol does not place any a priori limit on the length of a URI, most (all?) servers do, for performance and security reasons.
Here's a little program that reproduces this issue:
    internal class Program {
        private const string serverURL = "http://localhost:8983/solr";

        private static void Main(string[] args) {
            Startup.Init<Dictionary<string, object>>(serverURL);
            var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
            solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
        }
    }
This creates the query "id:0 OR id:1 OR ... OR id:999". It's about 10KB after encoding, more than enough for our tests. Running this against Solr on Jetty 6 makes Jetty throw:
    2010-12-13 17:52:33.362::WARN: handle failed
    java.io.IOException: FULL
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:274)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Not very graceful... it should probably respond with 414 Request-URI Too Long instead of throwing like this, but clients shouldn't send such long URIs anyway.
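To get a feel for how quickly such a query outgrows a URL, here is a minimal sketch that rebuilds a string of the same shape and measures it before and after percent-encoding. It does not use SolrNet: the class and method names are made up for illustration, and it uses Uri.EscapeDataString, so the exact count may differ from what SolrNet's own serializer and encoder produce.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of SolrNet.
public static class QueryLengthSketch
{
    // Builds "id:0 OR id:1 OR ... OR id:(count-1)".
    public static string BuildQuery(int count)
    {
        return string.Join(" OR ",
            Enumerable.Range(0, count).Select(i => "id:" + i).ToArray());
    }

    // Length of the value once percent-encoded for use in a URL.
    public static int EncodedLength(string query)
    {
        return Uri.EscapeDataString(query).Length;
    }

    public static void Main()
    {
        var query = BuildQuery(1000);
        Console.WriteLine("raw: {0} chars, encoded: {1} chars",
            query.Length, EncodedLength(query));
    }
}
```

Comparing the encoded length against the request or header buffer size configured on your server tells you how many terms fit before the request is rejected.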
Steven Livingston has a good blog post describing a patch modifying some classes in SolrNet to deal with this issue. However, even though I never foresaw this problem when writing SolrNet, solving it does not really require any changes to the existing codebase.
In this particular case, what we need to do concretely is override the Get() method of the ISolrConnection service and make it issue POST requests instead of GET. We can write a decorator to achieve this:
    public class PostSolrConnection : ISolrConnection {
        private readonly ISolrConnection conn;
        private readonly string serverUrl;

        public PostSolrConnection(ISolrConnection conn, string serverUrl) {
            this.conn = conn;
            this.serverUrl = serverUrl;
        }

        // Nothing to change here: delegate to the decorated connection.
        public string Post(string relativeUrl, string s) {
            return conn.Post(relativeUrl, s);
        }

        // Send the query parameters as a form-encoded POST body instead of in the query string.
        public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) {
            var u = new UriBuilder(serverUrl);
            u.Path += relativeUrl;
            var request = (HttpWebRequest) WebRequest.Create(u.Uri);
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            var qs = string.Join("&", parameters
                .Select(kv => string.Format("{0}={1}", HttpUtility.UrlEncode(kv.Key), HttpUtility.UrlEncode(kv.Value)))
                .ToArray());
            request.ContentLength = Encoding.UTF8.GetByteCount(qs);
            request.ProtocolVersion = HttpVersion.Version11;
            request.KeepAlive = true;
            try {
                // Write the encoded parameters as the request body.
                using (var postParams = request.GetRequestStream())
                using (var sw = new StreamWriter(postParams))
                    sw.Write(qs);
                // Read the whole response as a string.
                using (var response = request.GetResponse())
                using (var responseStream = response.GetResponseStream())
                using (var sr = new StreamReader(responseStream, Encoding.UTF8, true))
                    return sr.ReadToEnd();
            } catch (WebException e) {
                throw new SolrConnectionException(e);
            }
        }
    }
Now we have to apply this decorator:
    private static void Main(string[] args) {
        Startup.Init<Dictionary<string, object>>(new PostSolrConnection(new SolrConnection(serverURL), serverURL));
        var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
        solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
    }
That's it! If you're using Windsor, applying the decorator looks like this:
    private static void Main(string[] args) {
        var container = new WindsorContainer();
        container.Register(Component.For<ISolrConnection>()
            .ImplementedBy<PostSolrConnection>()
            .Parameters(Parameter.ForKey("serverUrl").Eq(serverURL)));
        container.AddFacility("solr", new SolrNetFacility(serverURL));
        var solr = container.Resolve<ISolrOperations<Dictionary<string, object>>>();
        solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
    }
This is the real benefit of writing decoupled code. Not testability, but flexibility. Testability is nice of course, but not the primary purpose.
When your code is decoupled, you can even implement entire features mostly by rearranging the object graph. This is pretty much how I implemented multicore support in SolrNet.
The PostSolrConnection implementation above works with SolrNet 0.3.0 and probably also 0.2.3. PostSolrConnection is not the default because: a) it needs to be tested thoroughly, and b) Solr doesn't emit cache headers when POSTing so it precludes caching.
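One possible way around the caching drawback, sketched here as an untested idea rather than anything SolrNet ships, is a second decorator that keeps using GET for short queries and switches to POST only when the encoded parameters exceed a threshold. To keep the sketch compilable on its own, it declares a local IConnection interface that mirrors the two ISolrConnection methods used in this post. In real code you would implement ISolrConnection instead. LengthSwitchingConnection, TaggedConnection and the threshold value are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in mirroring the two ISolrConnection methods shown in this post.
public interface IConnection
{
    string Post(string relativeUrl, string s);
    string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters);
}

// Hypothetical decorator: GET for short queries, POST for long ones.
public class LengthSwitchingConnection : IConnection
{
    private readonly IConnection getConn;
    private readonly IConnection postConn;
    private readonly int maxQueryLength;

    public LengthSwitchingConnection(IConnection getConn, IConnection postConn, int maxQueryLength)
    {
        this.getConn = getConn;
        this.postConn = postConn;
        this.maxQueryLength = maxQueryLength;
    }

    public string Post(string relativeUrl, string s)
    {
        return getConn.Post(relativeUrl, s);
    }

    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        // Materialize once, since the sequence is enumerated twice below.
        var list = parameters.ToList();
        // Approximate encoded size: key + value plus "=" and "&" per parameter.
        var length = list.Sum(kv =>
            Uri.EscapeDataString(kv.Key).Length +
            Uri.EscapeDataString(kv.Value ?? "").Length + 2);
        var target = length > maxQueryLength ? postConn : getConn;
        return target.Get(relativeUrl, list);
    }
}

// Test double for the usage example: returns its tag instead of calling a server.
public class TaggedConnection : IConnection
{
    private readonly string tag;
    public TaggedConnection(string tag) { this.tag = tag; }
    public string Post(string relativeUrl, string s) { return tag; }
    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) { return tag; }
}

public static class Demo
{
    public static void Main()
    {
        var conn = new LengthSwitchingConnection(
            new TaggedConnection("GET"), new TaggedConnection("POST"), 2000);
        var shortQuery = new[] { new KeyValuePair<string, string>("q", "id:123") };
        var longQuery = new[] { new KeyValuePair<string, string>("q", new string('a', 5000)) };
        Console.WriteLine(conn.Get("/select", shortQuery));
        Console.WriteLine(conn.Get("/select", longQuery));
    }
}
```

With SolrNet, the two inner connections would be the plain SolrConnection and a PostSolrConnection wrapping it. The right threshold depends on the limits of the server and any proxies in between, so treat the value above as a placeholder.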