
Sunday, December 4, 2011

SolrNet 0.4.0 beta 1 released

SolrNet is a Solr client for .NET, and I just released version 0.4.0 beta 1.

  • Core functionality is stable.
  • New features are unit-tested, but may not be battle-tested.
  • Feature-frozen: no new features will be added between this beta and the final release. You can of course still send in pull requests for new features; they will be included in the next release, just not in 0.4.0. In fact, there is a pending pull request implementing sharding that I have to review, work on, and eventually merge.
  • API-frozen, unless there's some critical flaw or an obvious win with a small change.
  • Little to no documentation about new features or changes; the tests are the primary (and maybe only) source of usage examples.

New features

Bugfixes

  • Fixed an intermittent bug in the NHibernate integration.
  • Fixed bug with LocalParams in date facets.
  • Fixed pre/post delimiters for fastVectorHighlighter.
  • Fixed an exception when using SolrNet in Application_Start under IIS 7+.

Breaking changes

  • Removed query result interfaces (ISolrQueryResults). Just replace it with SolrQueryResults in your code.
  • Deprecated Add(IEnumerable<T>). Use AddRange() instead.
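
As a sketch, migrating code affected by both changes might look like this (Product and documents are hypothetical placeholders):

```csharp
// Before (0.3.x): interface result type, and Add() for collections
// ISolrQueryResults<Product> results = solr.Query(SolrQuery.All);
// solr.Add(documents);

// After (0.4.0): concrete result type, and AddRange() for collections
SolrQueryResults<Product> results = solr.Query(SolrQuery.All);
solr.AddRange(documents);
```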

Other stuff

  • Upgraded to NHibernate 3.2
  • Upgraded to Autofac 2.5.2

Contributors since 0.4.0 alpha 1

Huge kudos to them all! Also thanks to Stephen Pope and Paige Cook for picking up many questions about SolrNet on the mailing list and on Stack Overflow.

Download

Binaries available on Google Code as usual.

If you're upgrading from 0.3.1, see also the 0.4.0 alpha 1 release notes.

Sunday, June 19, 2011

SolrNet 0.4.0 alpha 1 released

SolrNet is a Solr client for .NET, and I just released version 0.4.0 alpha 1.

Before detailing the changes, I'd like to clarify what alpha means to me:

  • Core functionality is stable (otherwise I wouldn't even bother releasing)
  • New features are unit-tested (otherwise I wouldn't even admit them into the repo), but may not be battle-tested
  • Not feature-frozen (i.e. future alphas/betas might get new features)
  • Not interface-frozen (I make no guarantees about breaking changes between this alpha and the final release)
  • Little to no documentation about new features or changes; the tests are the primary (and maybe only) source of usage examples.

The goal of this release is to get real-world testing and feedback on the new features and changes.

New features

  • Solr 4.0 grouping (this used to be called "field collapsing", but was completely overhauled)
  • Solr 4.0 pivot faceting
  • Autofac integration module
  • Unity integration
  • Fluent interface: added index-time document boosting
  • Fluent interface: added easier way to set Solr URL and timeout
  • SolrQueryByDistance (spatial search)
  • Support for ExtractingRequestHandler (i.e. binary/proprietary document indexing, like MS Word, PDF, etc)
  • Rollback (it was implemented but missing in ISolrOperations)
  • CommitWithin and Overwrite parameters for document add
  • Mixed exclusive/inclusive range queries (Solr 4.0 only)
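
For instance, the new add parameters can be used roughly like this (MyDocument is a placeholder document type; see the tests for authoritative usage):

```csharp
// Ask Solr to commit this document within 10 seconds,
// and skip the uniqueKey-based overwrite check.
solr.Add(new MyDocument { Id = "1", Text = "something" },
         new AddParameters { CommitWithin = 10000, Overwrite = false });
```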

Bugfixes

  • Fixed support for nullable enum properties with empty Solr field value.

Breaking changes

  • Breaking change for IReadOnlyMappingManager implementors: it now indexes fields by name to speed up lookup.
  • Breaking change: SolrQueryByField now quotes '*' and '?'
  • Minor breaking change for direct users of SolrConnection: removed constructor with IHttpWebRequestFactory parameter and made it a property.

Other stuff

  • Upgraded to Windsor 2.5.3
  • Upgraded to Ninject 2.2.1.0
  • Upgraded to NHibernate 3.1.0
  • Upgraded to StructureMap 2.6.2
  • Upgraded to .NET 3.5

Contributors

I'm very happy to say that the project is getting more and more contributors all the time, and they're doing a great job! Here are the contributors to this release:

A huge thank you to them all!

Binaries available on Google Code.

Thursday, March 31, 2011

SolrNet 0.3.1 released

SolrNet is a Solr client for .NET. I just released SolrNet 0.3.1.

This is a bugfix release. It's a drop-in replacement for 0.3.0, there are no breaking changes.

Here's the changelog:

  • Fixed parsing of decimals in exponential notation
  • Fixed SolrQueryInList with empty strings
  • Fixed facet.missing=true
  • Added support for nullable Guid properties
  • Fixed date faceting for Solr 3.x by ignoring 'start' element
  • Fixed NullReferenceException with facet.missing=true
  • Null values in range queries now translate to *
  • Ignored LocalParams for facet field parameters, it generated an invalid query.
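
As an example of the null range-query behavior (the field name here is hypothetical):

```csharp
// null on either end of a range now serializes as '*',
// so this produces the query price:[* TO 100]
var q = new SolrQueryByRange<decimal?>("price", null, 100m);
```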

Thanks to everyone who reported these bugs and offered solutions!

Binaries are available on Google Code and NuGet.

Monday, December 13, 2010

Customizing SolrNet

One of the most voted enhancement requests for SolrNet (an Apache Solr client for .NET) right now is to add support for POSTing when querying.

Let me explain: queries are serialized by SolrNet and sent to Solr via HTTP. Normally, queries are issued with a GET request and the query itself goes in the query string part of the URL. A simple query URL might look like this: http://localhost:8983/solr/select?q=id:123 .

The problem arises when the query is too long to fit in the query string. Even though the HTTP protocol does not place any a priori limit on the length of a URI, most (all?) servers do, for performance and security reasons.

Here's a little program that reproduces this issue:

internal class Program {
    private const string serverURL = "http://localhost:8983/solr";

    private static void Main(string[] args) {
        Startup.Init<Dictionary<string, object>>(serverURL);
        var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
        solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
    }
}

This creates the query "id:0 OR id:1 OR ... OR id:999", which is about 10 KB after encoding, more than enough for our tests. Running this against Solr on Jetty 6 makes Jetty throw:

2010-12-13 17:52:33.362::WARN:  handle failed 
java.io.IOException: FULL 
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:274) 
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) 
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) 
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) 
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Not very graceful... it should probably respond with 414 Request-URI Too Long instead of throwing like this, but clients shouldn't send such long URIs anyway.

Steven Livingston has a good blog post describing a patch modifying some classes in SolrNet to deal with this issue. However, even though I never foresaw this problem when writing SolrNet, solving it does not really require any changes to the existing codebase.

Concretely, in this case we need to override the Get() method of the ISolrConnection service and make it issue POST requests instead of GETs. We can write a decorator to achieve this:

public class PostSolrConnection : ISolrConnection {
    private readonly ISolrConnection conn;
    private readonly string serverUrl;

    public PostSolrConnection(ISolrConnection conn, string serverUrl) {
        this.conn = conn;
        this.serverUrl = serverUrl;
    }

    public string Post(string relativeUrl, string s) {
        return conn.Post(relativeUrl, s);
    }

    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) {
        var u = new UriBuilder(serverUrl);
        u.Path += relativeUrl;
        var request = (HttpWebRequest) WebRequest.Create(u.Uri);
        // Send the parameters as a form-encoded POST body instead of the query string
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        var qs = string.Join("&", parameters
            .Select(kv => string.Format("{0}={1}", HttpUtility.UrlEncode(kv.Key), HttpUtility.UrlEncode(kv.Value)))
            .ToArray());
        request.ContentLength = Encoding.UTF8.GetByteCount(qs);
        request.ProtocolVersion = HttpVersion.Version11;
        request.KeepAlive = true;
        try {
            using (var postParams = request.GetRequestStream())
            using (var sw = new StreamWriter(postParams))
                sw.Write(qs);
            using (var response = request.GetResponse())
            using (var responseStream = response.GetResponseStream())
            using (var sr = new StreamReader(responseStream, Encoding.UTF8, true))
                return sr.ReadToEnd();
        } catch (WebException e) {
            throw new SolrConnectionException(e);
        }
    }
}

Now we have to apply this decorator:

private static void Main(string[] args) {
    Startup.Init<Dictionary<string, object>>(new PostSolrConnection(new SolrConnection(serverURL), serverURL));
    var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

That's it! If you're using Windsor, applying the decorator looks like this:

private static void Main(string[] args) {
    var container = new WindsorContainer();
    container.Register(Component.For<ISolrConnection>()
        .ImplementedBy<PostSolrConnection>()
        .Parameters(Parameter.ForKey("serverUrl").Eq(serverURL)));
    container.AddFacility("solr", new SolrNetFacility(serverURL));
    var solr = container.Resolve<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

This is the real benefit of writing decoupled code. Not testability, but flexibility. Testability is nice of course, but not the primary purpose.
When your code is decoupled, you can even implement entire features mostly by rearranging the object graph. This is pretty much how I implemented multicore support in SolrNet.

The PostSolrConnection implementation above works with SolrNet 0.3.0 and probably also 0.2.3. PostSolrConnection is not the default because: a) it needs to be tested thoroughly, and b) Solr doesn't emit cache headers when responding to POSTs, which precludes caching.

Monday, December 6, 2010

SolrNet 0.3.0 released

SolrNet is a Solr client for .NET. I just released SolrNet 0.3.0.

Longest. Beta. Ever. I know. But I really wanted to document everything and change many things in the release package, and in order to do that I had to get rid of MSBuild first.

There aren't many changes in the library itself:

  • Upgraded Ninject module to Ninject 2.0 RTM
  • Upgraded StructureMap registry to StructureMap 2.6.1
  • Upgraded Windsor facility to Windsor 2.5.2
  • Added support for multi-core for StructureMap
  • Improved response parsing performance
  • Fixed a couple of minor bugs

If you're upgrading from 0.2.3 please read about the breaking changes.

Also, this will be the last release to support .NET 2.0.

As for the package, it now contains:

  • Merged and unmerged assemblies (only the merged assembly with all IoC integrations was included before).
  • PDBs for all assemblies.
  • All assemblies are now signed with a strong name.
  • Doxygen-compiled documentation (replacing Sandcastle which was too bloated).
  • A note explaining exactly which assemblies you need depending on how you integrate the library. Hopefully this will clear up the confusion.

I'd like to thank the following people who contributed to this release:

Binaries, docs and sample app downloads here. You can also get it via NuGet packages.

Tuesday, November 16, 2010

Migrating to FAKE

I finally finished migrating the SolrNet build scripts from MSBuild to FAKE. I did not do this on a whim or because I was bored, but because the MSBuild script was getting out of hand: at only 246 lines, it had become unmaintainable. I admit I'm not an MSBuild expert, but a build script shouldn't be that hard. Just visually parsing the script was a daunting task.

This screenshot compares the FAKE and the equivalent MSBuild scripts side by side:

fake-vs-msbuild

Even if you can't read it, I bet you can tell which is which.

XML-based DSLs like MSBuild and NAnt have only one advantage: being easily parsable by tools. But if you're going to do any sort of manual maintenance of the script, an embedded DSL in a real language will beat XML programming every time.

I wrote the original build script for SolrNet in MSBuild for two reasons: no external dependencies, so it wouldn't place any burden on potential contributors; and being a NAnt user, I wanted to learn MSBuild.

But I should have known better: most of my NAnt scripts have evolved over the years to be mostly calls to MSBuild to compile solutions and embedded Boo to handle any logic.

Speaking of Boo, Phantom looks very nice. I picked FAKE over Phantom kind of arbitrarily, mostly because I'm really digging F# right now. But I'm keeping an eye on Phantom. Boo's macros and compiler extensibility are used in Phantom, and Boo's type inference and optional duck-typing make it an excellent language for a build system. And Boo is so small that you can fit the runtime and the compiler in under 2MB, so it's not an issue to distribute it.

Another option is Rake. Albacore and rake-dotnet implement the common .net-building tasks in Rake. But I really don't want to introduce a dependency on Ruby just to build the project.

Then there's PSake. PowerShell is quite powerful and I really like the concept of piping objects instead of lines of text as in unix shells. In the context of a build script, one of the main advantages of using a shell is that it's dead easy to call external programs, so "integrating" with ILMerge, Zip, etc is just a matter of calling the executable in question with the parameters you need, just as you would do it on the command line. No wrappers needed. And PowerShell is ubiquitous enough as to not consider it a dependency. But... I still can't get myself to like PowerShell's syntax. I know this is totally subjective, but I feel just like Jim here. Scripts involving .net are particularly ugly. And my first experience with PowerShell was bumpy to say the least.

Another (lesser for me) issue with PowerShell is that it currently doesn't run on Mono. It's not like I'm building SolrNet on Mono right now, but I certainly don't want to knowingly work against it.

So why did I pick FAKE and F# instead of all the others I mentioned?

  • F# is statically typed and has global type inference. If there is a type error in my script, it won't run. At the same time, type inference gives the code a scripty feel. I didn't have to declare a single type in the build script (you can see it here).
  • I can write my build script on VS2010, complete with intellisense, automatic error checking and debugging.
  • FAKE defines a simple, terse, functional EDSL to manage common build tasks.
  • FAKE is trivially extensible: just write a regular F# function (or .net method in any language) and invoke it. No particular structure or attributes needed. In just a few lines I defined some helper functions to manipulate XML and start/stop Solr to run integration tests.
  • F# works on Mono.
  • No external dependencies: F# is included in any default VS2010 install. And now that F# is Apache-licensed, it will soon be included in Mono (therefore also Ubuntu) and MonoDevelop.

I've also been contributing a few things to FAKE lately:

  • Enhancements to the MSBuild integration.
  • Enhancements to the ILMerge integration.
  • Gallio integration.
  • Simpler shell exec functions and some shell-like functions similar to Ruby's FileUtils.
  • A couple of bugfixes.

Bottom line: a few years ago all there was to build .net projects was MSBuild and NAnt. Nowadays there's a lot of choice. Make sure you do your research and pick the build system that's right for your project and for you.

Tuesday, June 8, 2010

SolrNet 0.3.0 beta1

SolrNet is a Solr client for .NET.
I finally managed to close all pending issues and released SolrNet 0.3.0 beta1.

There are quite a few changes and new features. Comparing the last release (already 6 months ago!) to this one using NDepend:

  • IL instructions: 16161 → 21394 (+32.6%)
  • Types: 210 → 287 (+36.7%)

On to the...

Breaking changes

Even though there were a lot of changes, it's not very likely that any single developer will see more than one or two breaking changes (if any) since nobody uses all features. Here are the details:

Field collapsing changes

What: Field collapsing parameters and results have completely changed.

Who this affects: Everyone using the Field collapsing feature of Solr.

Change required: Until this is properly documented, see the new CollapseParameters and CollapseResults.

Why: Solr changed this completely since it's an unreleased feature.

Changes in ISolrConnection

What: The ISolrConnection interface no longer has the ServerURL and Version properties.

Who this affects: Everyone implementing a decorator for ISolrConnection (e.g. LoggingConnection).

Change required in the application: Remove these properties from your decorator.

Why: They served no purpose in the interface, it's something that belongs to the implementation.

Indexed results (Highlighting, MoreLikeThis)

What: Indexed results are no longer indexed by the document entity. Now they're indexed by the unique key, represented by a string.

Who this affects: Almost everyone using the Highlighting or MoreLikeThis features.

Change required in the application: instead of:

MyDocument doc = ...
ISolrQueryResults<MyDocument> results = ...
results.Highlighting[doc]...

do:

MyDocument doc = ...
ISolrQueryResults<MyDocument> results = ...
results.Highlighting[doc.Id.ToString()]...

Why: Needed to implement loose mapping (see below), was a potential performance hog, didn't add much value anyway.

Highlight results

What: The Highlight results type changed from IDictionary<string, string> to IDictionary<string, ICollection<string>>

Who this affects: Everyone using the Highlighting feature.

Change required in the application: If you were relying on getting only a single snippet, just get the first element in the collection.

Why: as it was before this change, it could not return multiple snippets.
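
A sketch of the adjustment (the document key and field name are hypothetical):

```csharp
// Before: a single snippet per field
// string snippet = results.Highlighting["id1234"]["title"];

// After: a collection of snippets per field; take the first one
string snippet = results.Highlighting["id1234"]["title"].First();
```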

Chainable methods on ISolrOperations

What: Methods on ISolrOperations that used to return ISolrOperations (i.e. chainable methods) are no longer chainable.

Who this affects: Everyone chaining methods or implementing a decorator for ISolrOperations.

Change required in the application: put each method in its own line. E.g. instead of:

solr.Add(new MyDocument {Text = "something"}).Commit();

do:

solr.Add(new MyDocument {Text = "something"});  
solr.Commit();

Why: needed to return the Solr response timings.

Removed NoUniqueKeyException

Who this affects: Everyone catching this exception (very rare) or implementing a custom IReadOnlyMappingManager (also rare)

Change required in the application: If you're catching this exception, check for a null result. If you're implementing a custom IReadOnlyMappingManager, return null instead of throwing.

Why: needed for the mapping validator (see below)

Removed obsolete exceptions

What: Removed obsolete exceptions BadMappingException, CollectionTypeNotSupportedException, FieldNotFoundException

Who this affects: Everyone catching these exceptions (very rare).

Change required in the application: catch SolrNetException instead.

Why: these exceptions were often misleading.

Removed ISolrDocument interface

Who this affects: Everyone implementing this interface in their document types.

Change required in the application: just remove the interface implementation.

Why: no longer needed, it was marked obsolete since 0.2.0

Renamed WaitOptions to CommitOptions

Who this affects: Everyone using the Commit() or Optimize() parameters.

Change required in the application: rename "WaitOptions" to "CommitOptions".

Why: due to recent new features in Solr, these options are now much more than just WaitOptions so the name didn't fit anymore.

New features

Here's a quick overview of the main new features:

LocalParams

This feature lets you add metadata to a piece of a query; Solr then interprets this metadata in various ways, for example for multi-faceting.

Semi loose mapping

Map fields using a dictionary:

public class ProductLoose {
	[SolrUniqueKey("id")]
	public string Id { get; set; }

	[SolrField("name")]
	public string Name { get; set; }

	[SolrField("*")]
	public IDictionary<string, object> OtherFields { get; set; }
}

Here, OtherFields contains fields other than Id and Name. The key of this dictionary is the Solr field name; the value corresponds to the field value.

Loose mapping

Instead of writing a document class and mapping it using attributes, etc, you can just use a Dictionary<string, object>, where the key is the Solr field name:

Startup.Init<Dictionary<string, object>>("http://localhost:8983/solr");
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Dictionary<string, object>>>();
solr.Add(new Dictionary<string, object> {
	{"id", "id1234"},
	{"manu", "Asus"},
	{"popularity", 6},
	{"features", new[] {"Onboard video", "Onboard audio", "8 USB ports"}}
});

Mapping validation

This feature detects mapping problems, i.e. type mismatches between a .NET property and its Solr field. It returns a list of warnings and errors.

ISolrOperations<MyDocument> solr = ...
solr.EnumerateValidationResults().ToList().ForEach(e => Console.WriteLine(e.Message));

HTTP-level cache

As of 1.3, Solr honors the standard HTTP cache headers: If-None-Match, If-Modified-Since, etc; and outputs correct Last-Modified and ETag headers.
Normally, the way to take advantage of this is to set up an HTTP caching proxy (like Squid) between Solr and your application.
With this release, SolrNet includes an optional cache so you don't need to set up that extra proxy. Unfortunately, Solr generates incorrect headers when running distributed searches, so I won't make this a default option until it's fixed. However, if you're not distributing Solr (i.e. sharding), it's perfectly safe to use this cache to get a major performance boost (depending on how often you repeat queries, of course) at the cost of some memory.
To use the cache, you only need to register a component implementing the ISolrCache interface at startup. For example, if you're using Windsor:

container.Register(Component.For<ISolrCache>().ImplementedBy<HttpRuntimeCache>());

The HttpRuntimeCache implementation uses the ASP.NET Cache with a default sliding expiration of 10 minutes.

StructureMap integration

In addition to the built-in container, the Windsor facility and the Ninject module, you can now manage SolrNet with StructureMap. See this article by Mark Unsworth for details.

Index-time field boosting

You can now define a boost factor for each field, to be used at index-time, e.g:

public class TestDocWithFieldBoost {
    [SolrField("text", Boost = 20)]
    public string Body { get; set; }
}

Improved multi-core / multi-instance configuration for the Windsor facility

The Windsor facility now has an AddCore() method to help wire the internal components of SolrNet to manage multiple cores/instances of Solr. Here's an example:

var solrFacility = new SolrNetFacility("http://localhost:8983/solr/defaultCore");
solrFacility.AddCore("core0-id", typeof(Document), "http://localhost:8983/solr/core0");
solrFacility.AddCore("core1-id", typeof(Document), "http://localhost:8983/solr/core1");
solrFacility.AddCore("core2-id", typeof(Core1Entity), "http://localhost:8983/solr/core1");
var container = new WindsorContainer();
container.AddFacility("solr", solrFacility);
ISolrOperations<Document> solr0 = container.Resolve<ISolrOperations<Document>>("core0-id");

Of course, you usually don't Resolve() like that (this is just to demo the feature); instead, you would use service overrides to inject the proper ISolrOperations into your services.

There are some other minor new features, see the changelog for details.

Contributors

I want to thank the following people who contributed to this release (I wish they'd use their real names so I can give them proper credit!):

  • Olle de Zwart: implemented the schema validator, delete by id and query in the same request, helped with integration tests.
  • mRg: implemented index-time field boosting, new commit/optimize parameters.
  • ironjelly2: updated the code to work with the new field collapsing patch.
  • Mark Unsworth: wrote the StructureMap adapter.
  • mr.snuffle: fixed a performance issue in SolrMultipleCriteriaQuery.

I'd also like to thank everyone who submitted a bug report.

Feel free to join the project's mailing list if you have any questions about SolrNet.

Friday, March 12, 2010

Low-level SolrNet

I recently got a question about how to handle multi-faceting in SolrNet, a nice feature of Solr that can be very useful to the end-user. eBay uses a kind of multi-faceting interface.
If you know nothing about Solr or SolrNet, read on: this article isn't so much about Solr as about API design.

The Solr wiki has an example query with multi-faceting:

q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype

For those of you that are not into Solr, this is just a regular URL query string that is passed to the Solr endpoint. The final URL looks like this (modulo encoding):

http://localhost:9983/solr/select/?q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype 

And this is how you represent this query in the SolrNet object model:

var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Document>>(); 
ISolrQueryResults<Document> results = solr.Query("mainquery", new QueryOptions { 
    FilterQueries = new[] { 
        Query.Field("status").Is("public"), 
        new LocalParams {{"tag", "dt"}} + Query.Field("doctype").Is("pdf") 
    }, 
    Facet = new FacetParameters { 
        Queries = new[] { 
            new SolrFacetFieldQuery(new LocalParams {{"ex", "dt"}} + "doctype") 
        } 
    } 
});

We build object models like this one because they're programmable, objects and methods can be programmatically combined to build (or compose) our intention. Like Hibernate's Criteria API.

Opposed to this is the string, which most of the time we hate because it's opaque: it doesn't have any syntactical meaning within our object-oriented code. It has no programmability, no composability. We use very generic classes to build strings, like StringBuilders or StringWriters, which don't convey any syntactical information about what we're actually doing. And if we need to extract information from a string, we have to write a parser, which is not a trivial task.

But the string also has its advantages: it's naturally serializable (or should I say already serialized), and it can be more readable and more concise. Those are some of the reasons why Hibernate also provides the HQL API. You might be thinking that this dichotomy of objects and strings is really a matter of serialization and deserialization, but I'm talking about human-readable strings here, whereas a serialized format is frequently for machine consumption only.

So if we already know what the query string is, how can we simplify the chunk of code above? Thanks to IoC, we can easily tap into some of SolrNet's "internal" components without worrying about what dependencies they need:

Func<string, string, KeyValuePair<string, string>> kv = (k, v) => new KeyValuePair<string, string>(k, v); 
var connection = ServiceLocator.Current.GetInstance<ISolrConnection>(); 
var xml = connection.Get("/select", new[] { 
    kv("q", "mainquery"), 
    kv("fq", "status:public"), 
    kv("fq", "{!tag=dt}doctype:pdf"), 
    kv("facet", "on"), 
    kv("facet.field", "{!ex=dt}doctype"), 
}); 
var parser = ServiceLocator.Current.GetInstance<ISolrQueryResultParser<Document>>(); 
ISolrQueryResults<Document> results = parser.Parse(xml); 

ISolrConnection is just a wrapper over the HTTP request, we give it the querystring parameters and get Solr's XML response, then we feed the response to the parser component and voilà, we have our results.

And since it's just a regular HTTP request, we can go even lower:

using (var web = new WebClient()) { 
    var xml = web.DownloadString("http://localhost:9983/solr/select/?q=mainquery&fq=status%3Apublic&fq=%7B!tag%3Ddt%7Ddoctype%3Apdf&facet=on&facet.field=%7B!ex%3Ddt%7Ddoctype");
    var parser = ServiceLocator.Current.GetInstance<ISolrQueryResultParser<Document>>(); 
    ISolrQueryResults<Document> results = parser.Parse(xml); 
} 

I'll leave it to you to decide which one to use. Like the choice between HQL and Criteria, sometimes you might prefer one over the other depending on the context. Just keep in mind that these components' interfaces are not as stable as the "really public" documented interfaces, they might have breaking changes more often.

Wednesday, February 10, 2010

Indexing millions of documents with Solr and SolrNet

When working with Solr, it's not uncommon to see indexes with hundreds of thousands or even millions of documents. Say you build those millions of documents from an RDBMS, which is a common case.

Solr has a tool to do this: the Data Import Handler. It's configurable with XML, like so many things in Java. Problem is, when you need to do some complex processing, it quickly turns into executable XML, and nobody likes that. More importantly, the process is not testable: you can't run a unit test that doesn't involve the actual database and the actual Solr instance. So I prefer to import data to Solr with code. More precisely: .NET code, using SolrNet.

Since adding documents one by one would be terribly inefficient (1000000 documents would mean 1000000 HTTP requests), SolrNet has a specific method to add documents in batch: Add(IEnumerable<T> documents). Let's try adding a huge amount of documents with this.

Setup

To keep this post focused, I'll abstract away the database. So, first thing I'll do is set up some fake documents [1]:

string text = new string('x', 1000); 
IEnumerable<Dictionary<string, object>> docs = Enumerable.Range(0, 150000)
    .Select(i => new Dictionary<string, object> { 
        {"id", i.ToString()}, 
        {"title", text} 
    }); 

This lazily sets up 150000 documents, each about 1 KB in size. They don't exist anywhere yet: nothing is materialized until we start enumerating docs.

Tests

After setting up SolrNet we call:

solr.Add(docs);

and shortly after executing it the process grows its memory usage to some gigabytes and then crashes with an OutOfMemoryException. Holy crap![2]

Reducing the number of documents to 100000 let the process complete successfully, but it took 32s (3125 docs/s) and peak memory usage was 850MB. This clearly isn't working!

What happened is that SolrNet tried to fit all the documents into a single HTTP request. Not very smart, eh? But that's out of SolrNet's scope, at least for now. What we need to do is feed it manageable chunks of documents. So we grab a partition function like this one, courtesy of Jon Skeet[3]. Armed with this function we partition the 100000 docs into chunks of 1000 docs:
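
The partition function itself is not shown here; a minimal sketch of such a function (shown as a static helper method, not necessarily identical to Jon Skeet's version) could be:

```csharp
// Splits a sequence into chunks of at most 'size' elements, lazily:
// each chunk is only built as the caller enumerates it.
public static IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int size) {
    var chunk = new List<T>(size);
    foreach (var item in source) {
        chunk.Add(item);
        if (chunk.Count == size) {
            yield return chunk;
            chunk = new List<T>(size);
        }
    }
    if (chunk.Count > 0)
        yield return chunk;
}
```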

foreach (var group in Partition(docs, 1000))
   solr.Add(group);

This completes in 34s, which is slightly worse than without grouping, but memory usage stays pretty constant at 50MB. Now we're getting somewhere!

But wait! What if we parallelize these groups? The Task Parallel Library (TPL)[4] makes it very easy to do so:

Parallel.ForEach(Partition(docs, 1000), group => solr.Add(group));

This one took 21.2s to complete on my dual-core CPU but peak memory usage was 140MB since it has to keep several groups in memory simultaneously. This is pretty much what SolrJ (the Java Solr client) does with its StreamingUpdateSolrServer, except the Java folks had to manually queue and manage the threads, while we can just leverage the TPL in a single line of code.

Playing a bit with the group size I ended up with these charts of memory size and throughput:

solr-mem

solr-throughput

 

Memory size seems to increase linearly with group size, while throughput shows an asymptotic growth.

By now I bet you must be saying: "Hey, wait a minute! The title of the post promised millions of documents but you only show us a mere 100000! Where's the rest of it?!?". Well, I did benchmark a million documents as well, and with group size = 1000, in parallel, it took 3:57 minutes. For these tests I used 100000 documents instead to keep times down.

Conclusion and final notes

In this experiment I left a lot of variables fixed: document size, network throughput and latency (I used a local Solr instance so there is no network), CPU (since I ran Solr on the same box as the tests, they competed for CPU)... With a quad-core CPU I would expect this to consume more memory but it would also be faster. Bigger documents would also increase memory usage and make the whole process more network-sensitive. Is memory more important to you than throughput? Then you would use the non-parallel approach. So I prefer to leave these things out of SolrNet's scope for now. It depends too much on the structure of your particular data and setup to just pick some default values. And I don't want to take a dependency on the TPL yet.

Some general advice:

  • Keep your process as linear (O(n)) and as lazy as possible.
  • While increasing the group size can increase the throughput (and memory), also keep in mind that with big groups you'll start to see timeouts from Solr.
  • When fetching data from the database, always do it with a forward-only enumerator, like an IDataReader or a LINQ to SQL enumerable. Loading the whole resultset into a List or DataTable will simply kill your memory and performance.
  • It can also make sense to fetch the data from the database in several groups (I just assumed a single IEnumerable as an origin to keep it simple) and parallelize on that.
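To make those last two points concrete, here's a sketch of a forward-only document source (the SQL and column names are made up for illustration; any IDbConnection implementation works):

```csharp
using System.Collections.Generic;
using System.Data;

static class DocumentSource {
    // Streams documents straight from the database: thanks to yield return,
    // only the current row is materialized, so memory usage stays flat no
    // matter how many rows the query returns.
    public static IEnumerable<Dictionary<string, object>> FetchDocs(IDbConnection conn) {
        using (var cmd = conn.CreateCommand()) {
            cmd.CommandText = "SELECT id, title FROM documents";
            using (var reader = cmd.ExecuteReader()) {
                while (reader.Read()) {
                    yield return new Dictionary<string, object> {
                        {"id", reader["id"].ToString()},
                        {"title", reader["title"]}
                    };
                }
            }
        }
    }
}
```

Feeding the result of FetchDocs to a partitioning function and then to solr.Add() keeps the whole pipeline lazy, from the database to Solr.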

Footnotes:

  1. Support for dictionary documents in SolrNet is implemented in trunk; it will be included in the next release
  2. I know that even though this approach isn't scalable at all, it shouldn't throw OOM with only 150000 docs.
  3. I chose that particular Partition() function because it's one-pass. If you write a partitioning function with LINQ's GroupBy you'll traverse your IEnumerable (at least) twice. If you use a forward-only enumerable (e.g. LINQ to SQL), which I recommend, you only get to enumerate the result once.
  4. You can get the latest version of the TPL for .NET 3.5 SP1 from the Reactive Extensions.

Tuesday, December 29, 2009

SolrNet 0.2.3 released

I just released SolrNet 0.2.3. There aren't any significant changes from beta1; I just filled in some missing documentation bits and a couple of tests.

For the next release I'll focus on performance improvements and making multi-core access easier.

Downloads:

Tuesday, November 3, 2009

Powered by SolrNet

Some websites and products that use SolrNet:

I'll keep an updated list on the project's wiki. If you have a public website using SolrNet, let me know!

Monday, October 12, 2009

Untangling the mess: Solr, SolrNet, NHibernate, Lucene

I've recently received several questions about the relationship between Solr, SolrNet, NHibernate, Lucene, Lucene.Net, etc.: how they fit together, how they should be used, and what features each provides. Here's an attempt at elucidating the topic:

Let's start from the bottom up:

  • RDBMS: every programmer knows what these are. Oracle, SQL Server, MySQL, etc. Everyone uses them, to the point that they're often reached for as a Golden Hammer. An RDBMS can be a stand-alone program (client-server architecture) or embedded (running within your application).
  • Lucene was written to do full-text indexing and searching. The most known example of full-text searching is Google. You throw words at it and it returns a ranked set of documents that match those words.
    In terms of data structures, Lucene at its core implements an inverted index, while relational databases use B-tree variants. Fundamentally different beasts.
    Lucene is a Java library, this means that it's not a stand-alone application but instead embedded in your program.
  • Full-text functions in relational databases: nowadays almost all major RDBMS offer some full-text capabilities: MySQL, SQL Server, Oracle, etc. As far as I know, they are all behind Lucene in terms of performance and features. They can be easier to use at first, but they're proprietary. If you ever need some advanced feature, switching to Lucene could be a PITA.
  • Lucene.Net is a port of Java Lucene to the .Net platform. Nothing more, nothing less. It aims to be fully API compatible so all docs on Java Lucene can be applied to Lucene.Net with minimal translation effort. Index format is also the same, so indices created with Java Lucene can be used by Lucene.Net and vice versa.
  • NHibernate is a port of Java Hibernate to the .Net platform. It's an ORM (object-relational mapper), which basically means that it talks to relational databases and maps your query results as objects for easier consumption in object-oriented languages.
  • NHibernate.Search is a NHibernate contrib project that integrates NHibernate with Lucene.Net. It's a port of the Java Hibernate Search project. It keeps a Lucene index in sync with a relational database and hides some of the complexity of raw Lucene, making it easier to index and query.
    This article explains its basic usage.
  • Solr is a search server. It's a stand-alone Java application that uses Lucene to provide full-text indexing and searching through an XML/HTTP interface. This means it can be used from any platform/language. It can be embedded in your own Java programs, but that's not its primary design purpose.
    While very flexible, it's easier to use than raw Lucene and provides features commonly used in search applications, like faceted search and hit highlighting. It also handles caching, replication, sharding, and has a nice web admin interface.
    This article is a very good tour of Solr's basic features.
  • SolrNet is a library to talk to a Solr instance from a .Net application. It provides an object-oriented interface to Solr's operations. It also acts as an object-Solr mapper: query results are mapped to POCOs.
    The latest version also includes Solr-NHibernate integration. This is similar to NHibernate.Search: it keeps a Solr index in sync with a relational database and lets you query Solr from the NHibernate interface.
    Unlike NHibernate and NHibernate.Search, which can respectively create a DB schema and a Lucene index, SolrNet can't automatically create the Solr schema. Solr does not have this capability yet. You have to manually configure Solr and set up its schema.


In case this wasn't totally clear, here's a diagram depicting a possible NHibernate-SolrNet architecture:

Diagram made with gliffy!

Monday, September 14, 2009

SolrNet 0.2.3 beta1

Just released SolrNet 0.2.3 beta1. Here's the changelog:

  • Fixed minor date parsing bug
  • Added support for field collapsing
  • Added support for date-faceting
  • Upgraded to Ninject trunk
  • Upgraded sample app's Solr to nightly
  • Added StatsComponent support
  • Added index-time document boosting
  • Added query-time document boosting
  • Bugfix: response parsing was not fully culture-independent
  • All exceptions are now serializable
  • Fixed potential timeout issue
  • NHibernate integration
  • Fixed Not() query operator returning wrong type

These are the interesting new features:

Field collapsing

This is a very cool feature that isn't even included in the Solr trunk yet. It's currently only available as a patch, but hopefully it will make its way into trunk soon. It allows you to collapse query results that share a document field value, making for a flexible form of duplicate detection.

StatsComponent

This one is a Solr 1.4 feature (currently only available from trunk or nightly builds). Like the name says, it gives you statistics about your numeric fields within your query results. The statistics are: min, max, sum, count, missing (i.e. no value), sum of squares, mean, standard deviation. The cool thing about this is that you can facet it, thus getting separate stats for each value of the field.
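On the wire this just maps to the StatsComponent request parameters (the field names here are made up); for example, per-category price statistics would look roughly like:

```
?q=*:*&stats=true&stats.field=price&stats.facet=category
```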

Date faceting

This allows you to trigger faceting based on date ranges, i.e. you can create a facet for each day from 8/1/2009 to 9/1/2009.
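Under the hood this maps to Solr's facet.date parameters; the example above would look roughly like this on the query string (timestamp is a made-up field name, and the +1DAY gap must be URL-encoded as %2B1DAY):

```
?q=*:*&facet=true
      &facet.date=timestamp
      &facet.date.start=2009-08-01T00:00:00Z
      &facet.date.end=2009-09-01T00:00:00Z
      &facet.date.gap=%2B1DAY
```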

NHibernate integration

This is similar to the NHibernate.Search project. It synchronizes a database with Solr (provided the Solr document's fields are similar to the NHibernate entity's fields) and allows you to issue Solr queries from a regular NHibernate ISession (well, actually an ISession wrapper). You can see more details about its usage in the wiki.

Contributors to this release: Derek Watson, Matt Mondok, Juuso Kosonen.

Get it here:

I'll probably call this a GA release in a couple of weeks if there aren't any serious bugs and once I get the wiki updated.

P.S.: I'll take this opportunity to clear some things up about the project. SolrNet started, like many open source projects, as a way to scratch my own itch. But in the last few months, it has grown beyond that. As a result, I don't have a use for many new features, so I'm not so motivated to implement them, and I don't have any chance to test them in the wild to iron out bugs and make sure they are release-quality. This means I need help from the community (yes, that includes you! ;-)) in the form of:

  • patches for new features, bugfixes, documentation, code samples
  • bug reports
  • feature requests
  • general suggestions (e.g. "it would be cooler to do x like this instead of how it's currently done")
  • voting for issues you consider important/useful in http://code.google.com/p/solrnet/issues/list (this might boost their priority)
  • general usage feedback (e.g. "we've been using SolrNet for 3 months now at www.example.com. The features we especially use are: facets, ninject module, more like this")

Trunk is very stable, right now there are 368 tests that cover around 80% of the code. I strongly encourage you to get new builds from the build server (see artifacts links) and let me know how it works out for you (both positive and negative constructive feedback are useful).

Finally, I could offer some basic commercial support if you need some feature urgently and don't have the resources to code it.

Tuesday, June 9, 2009

Analyzing SolrNet with NDepend

I've been using NDepend off and on for the last couple of years. Mainly to diagnose build issues and help me write build scripts for complex legacy applications, since legacy applications with no automated build often fall into cyclic dependencies between different solutions. NDepend's assembly build order feature is great for this. But I've never had the time to really dive into the myriad other features this product offers.

A couple of weeks ago Patrick Smacchia kindly sent me a Pro license for NDepend (thanks Patrick!), so I thought it would be a great opportunity to use it to analyze SolrNet. Now, SolrNet is by all measures a tiny project (ohloh stats are inflated due to Sandcastle, SHFB, etc being in the trunk), but that doesn't mean it can't benefit from NDepend.

First of all, if you're analyzing a library, I recommend that you include your tests in the analysis, so that NDepend can see how the library is actually used. Otherwise, NDepend will suggest that you mark some things as private when it shouldn't. Don't worry about the tests raising warnings in the analysis, you can filter them out as I'll explain later. Plus, having the ability to analyze the tests can be pretty handy too.

For example, we can easily issue some CQL queries to get the code/test ratio:

SELECT ASSEMBLIES WHERE NameLike "Tests" -> 1972 LOC (as defined here)
SELECT ASSEMBLIES WHERE !NameLike "Tests" -> 1550 LOC

So the code:test ratio is 1:1.27, more LOC of tests than code! However, keep in mind that this metric alone doesn't imply correct coverage.

Browsing the default queries, I find that "Fields that could be declared internal" caught a

public readonly string query;

Oops, fixing right away!

Under "Purity / Immutability / Side-Effects", "Fields should be marked as ReadOnly when possible" showed 29 results, which seemed strange since I always try to make my objects immutable. 24 of these were <>l__initialThreadId fields, which is one of the fields of the iterator that the C# compiler builds when you use yield return. This also happened with the "Methods too big", "Methods too complex" and "Potentially unused methods" metrics.

Of course, you can edit or delete the default CQL queries. For example, the "Potentially unused methods" is defined by default as:

// <Name>Potentially unused methods</Name>
WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE 
 MethodCa == 0 AND            // Ca=0 -> No Afferent Coupling -> The method is not used in the context of this application.
 !IsPublic AND                // Public methods might be used by client applications of your assemblies.
 !IsEntryPoint AND            // Main() method is not used by-design.
 !IsExplicitInterfaceImpl AND // The IL code never explicitely calls explicit interface methods implementation.
 !IsClassConstructor AND      // The IL code never explicitely calls class constructors.
 !IsFinalizer                 // The IL code never explicitely calls finalizers.

We can easily add another condition so that these methods don't bother us: AND !FullNameLike "__"

One of the most useful features of NDepend is comparing two versions of a project. I compared the latest release of SolrNet against trunk. Here's a chart of added methods (83 in total, in blue):

83 methods added

34 methods changed:

methods-changed

API breaking changes (no graphic here, just a list):

api-breaking-changes

Don't worry, these are all internal breaking changes, they won't affect the library consumer...

From these charts you can immediately see that there aren't really many changes, and they aren't focused. The reason is that most of the changes are minor bugfixes and a couple of minor added features.

I only scratched the surface of what's possible with NDepend, but as you can see, small projects can also profit from it, so check it out!

Friday, May 8, 2009

SolrNet 0.2.2 released

SolrNet is a Solr client for .NET

Here's the changelog from 0.2.1:

Changelog

  • Bugfix: semicolons are now correctly escaped in queries
  • Bugfix: invalid xml characters (control chars) are now correctly filtered
  • Deleting a list (IEnumerable) of documents now uses a single request (requires unique key and Solr 1.3+)
  • Added support for arbitrary parameters, using the QueryOptions.ExtraParams dictionary. These parameters are passed straight through to Solr's query string, so you can use this feature to select a different request handler (using "qt") or use LocalSolr.
  • Added per-field facet parameters
  • Breaking change: as a consequence of the previous change, facet queries and other facet parameters were moved to FacetParameters. Instead of:
    var r = solr.Query(new SolrQuery("blabla"), new QueryOptions {
    	FacetQueries = new ISolrFacetQuery[] {
    		new SolrFacetFieldQuery("id") {Limit = 3}
    	}
    });
    Now it's:
    var r = solr.Query(new SolrQuery("blabla"), new QueryOptions {
    	Facet = new FacetParameters {
    	  Queries = new ISolrFacetQuery[] {
    	  	new SolrFacetFieldQuery("id") {Limit = 3}
    	  }
    	}
    });
  • Added a couple of fluenty QueryOptions building methods. Some self-explanatory samples:
    new QueryOptions().AddFields("f1", "f2");
    new QueryOptions().AddOrder(new SortOrder("f1"), new SortOrder("f2", Order.ASC));
    new QueryOptions().AddFilterQueries(new SolrQuery("a"), new SolrQueryByField("f1", "v"));
    new QueryOptions().AddFacets(new SolrFacetFieldQuery("f1"), new SolrFacetQuery(new SolrQuery("q")));
  • Added dictionary mapping support (thanks Jeff Crowder). The defined field name is used as the prefix of the actual Solr field to match. An example:
    public class TestDoc {
        [SolrUniqueKey]
        public int Id { get; set; }
    
        [SolrField]
        public IDictionary<string, int> Dict { get; set; }
    }

    With this mapping, a field named "Dictone" will be mapped to Dict["one"], "Dictblabla" to Dict["blabla"] and so on.
  • Upgraded Windsor facility, now it uses the recently released Windsor 2.0
  • Merged all SolrNet assemblies (SolrNet, SolrNet.DSL, the Castle facility, the Ninject module and the internal HttpWebAdapters). It was getting too annoying having to reference all those assemblies.
  • Windsor and Ninject are not packaged anymore. If you use Windsor or Ninject, you already have them in your app, so I'm not packaging them anymore. Only Microsoft.Practices.ServiceLocation.dll is now included, for users that don't use any IoC container (they actually use the built-in container).

Last but not least, don't forget there's a google group for the project, so if you have any issues, suggestions or doubts, feel free to join!

Downloads

Wednesday, April 29, 2009

SolrNet under continuous integration

SolrNet just got hosted at the CodeBetter TeamCity servers for open source projects! 326 tests passed, 20 ignored (most of the latter are integration tests that need a running Solr instance). Now I just need to organize the targets and make the successful builds downloadable.

A big thank you to the people at CodeBetter, JetBrains, IdeaVine and Devlicio.us for this wonderful initiative.

Thursday, February 26, 2009

SolrNet 0.2.1 released

SolrNet is a Solr client for .NET

Here's the changelog from 0.2.0:

Added a couple of Solr 1.3 features:

Other enhancements:

  • "Has any value" queries:
    It's often convenient to see what documents have a field defined or not:
    var q = new SolrHasValueQuery("name"); // translates to name:[* TO *]
  • Fluent interface for query building:
    Query.Simple("name:solr"); // translates to name:solr
    Query.Field("name").Is("solr"); // name:solr
    Query.Field("price").From(10).To(20); // price:[10 TO 20]
    Query.Field("price").In(10, 20, 30); // price:10 OR price:20 OR price:30
    Query.Field("name").HasAnyValue(); // name:[* TO *]

You can see the spell checker and random sorting in action in the sample web app:

spellcheck

Download links:

Thursday, February 19, 2009

SolrNet 0.2 released

SolrNet is a Solr client for .Net.

Changelog from version 0.1:

The sample application is pretty basic right now, it includes only the basic features, but it shows how it all fits together. Here's a screenshot with the features highlighted:

For the next releases I'll look into adding Solr 1.3 features like multi-core and spell-checking, as well as making it easier to express queries (maybe build a LINQ provider).

Code is hosted at Google code.

Direct download links:

Wednesday, August 20, 2008

SolrNet with faceting support

I finally added support for facets in SolrNet. There are basically two kinds of facet queries:

  1. querying by field
  2. arbitrary facet queries

Querying by field

ISolrOperations<TestDocument> solr = ...
ISolrQueryResults<TestDocument> r = solr.Query("brand:samsung", new QueryOptions {
    FacetQueries = new ISolrFacetQuery[] {
        new SolrFacetFieldQuery("category")
    }
});

Yeah, kind of verbose, right? The DSL makes it shorter:

ISolrQueryResults<TestDocument> r = Solr.Query<TestDocument>().By("brand").Is("samsung").WithFacetField("id").Run();

To get the facet results, you use the FacetFields property of ISolrQueryResults<T>, which is an IDictionary<string, ICollection<KeyValuePair<string, int>>>. The key of this dictionary is the facet field you have queried. The value is a collection of pairs where the key is the value found and the value is the count of occurrences. Sounds complex? It's not. Let's see an example:

Let's assume that eBay used SolrNet to do its queries (please bear with me :-) ). Let's say a user enters the category Maps, Atlases & Globes, so you want the items within that category, as well as the item count on each subcategory ("Europe", "India", etc) that shows up as "Narrow your results". You could express such a query like this:

var results = Solr.Query<EbayItem>().By("category").Is("maps-atlases-globes")
    .WithFacetField("subcategory")
    .Run();

Now to print the subcategory count:

foreach (var facet in results.FacetFields["subcategory"]) {
    Console.WriteLine("{0}: {1}", facet.Key, facet.Value);
}

Which would print something like this:

United States (Pre-1900): 2123
Europe: 916
World Maps: 650
...

and so on. See? Told you it wasn't hard :-)

Note that by default, Solr orders facet field results by count (descending), which makes sense since most of the time you want the most populated/important terms first. If you want to override that:

ISolrQueryResults<EbayItem> results = Solr.Query<EbayItem>().By("category").Is("maps-atlases-globes")
    .WithFacetField("subcategory").DontSortByCount()
    .Run();

There are other options for facet field queries, I copied the docs from the official Solr documentation.

Arbitrary facet queries

Support for arbitrary queries is not very nice at the moment, but it works:

var priceLessThan500 = "price:[* TO 500]";
var priceMoreThan500 = "price:[500 TO *]";
var results = Solr.Query<TestDocument>().By("category").Is("maps-atlases-globes")
    .WithFacetQuery(priceLessThan500)
    .WithFacetQuery(priceMoreThan500)
    .Run();
Then results.FacetQueries[priceLessThan500] and results.FacetQueries[priceMoreThan500] get you the respective result counts.

Code is hosted at googlecode

Thursday, November 22, 2007

Introducing SolrNet

UPDATE 2/19/2009: by now most of this is obsolete, please check out more recent releases

I've spent the last month working my a** off integrating Solr into our main site. The first step was to find out how to communicate with the Solr server. Naturally, I came to SolrSharp. But I found it to be really IoC-unfriendly: lots of inheritance, no interfaces, no unit tests, so it would have been a real PITA to integrate into Castle. So, instead of wrapping it, I built SolrNet.

Before explaining how it works, a disclaimer: I'm a complete newbie to Solr, Lucene and full-text searching in general. The code works on my machine and does what I need it to do for the task that I have at hand. This project is not, and might never be, feature complete like SolrSharp. Currently it doesn't support facets (UPDATE 8/20/08: I added facet support) or highlights, and maybe some other stuff. If you absolutely need those features right now, either use SolrSharp or write a patch for SolrNet. However, the next step in the integration is implementing faceted search, so I will definitely implement facets sooner or later.

Usage

First we have to map the Solr document to a class (Solr supports only one document type per instance at the moment). Let's use a subset of the default schema that comes with the Solr distribution:

 

public class TestDocument : ISolrDocument {
    private ICollection<string> cat;
    private ICollection<string> features;
    private string id;
    private bool inStock;
    private string manu;
    private string name;
    private int popularity;
    private double price;
    private string sku;

    [SolrField("cat")]
    public ICollection<string> Cat {
        get { return cat; }
        set { cat = value; }
    }

    [SolrField("features")]
    public ICollection<string> Features {
        get { return features; }
        set { features = value; }
    }

    [SolrUniqueKey]
    [SolrField("id")]
    public string Id {
        get { return id; }
        set { id = value; }
    }

    [SolrField("inStock")]
    public bool InStock {
        get { return inStock; }
        set { inStock = value; }
    }

    [SolrField("manu")]
    public string Manu {
        get { return manu; }
        set { manu = value; }
    }

    [SolrField("name")]
    public string Name {
        get { return name; }
        set { name = value; }
    }

    [SolrField("popularity")]
    public int Popularity {
        get { return popularity; }
        set { popularity = value; }
    }

    [SolrField("price")]
    public double Price {
        get { return price; }
        set { price = value; }
    }

    [SolrField("sku")]
    public string Sku {
        get { return sku; }
        set { sku = value; }
    } 
}

 

It's just a POCO with a marker interface (ISolrDocument)[1] and some attributes: SolrField maps the property to a Solr field and SolrUniqueKey (optional) marks the property that maps to Solr's unique key field. Let's add a document (make sure you have a running Solr instance first):

[Test]
public void AddOne() {
    ISolrOperations<TestDocument> solr = new SolrServer<TestDocument>("http://localhost:8983/solr");
    TestDocument doc = new TestDocument();
    doc.Id = "123456";
    doc.Name = "some name";
    doc.Cat = new string[] {"cat1", "cat2"};
    solr.Add(doc);
    solr.Commit();
}

Let's see if the document is there:

[Test]
public void QueryAll() {
    ISolrOperations<TestDocument> solr = new SolrServer<TestDocument>("http://localhost:8983/solr");
    ISolrQueryResults<TestDocument> r = solr.Query("*:*");
    Assert.AreEqual("123456", r[0].Id);
}

For more examples, see the tests.

DSL

Since DSLs are such a hot topic nowadays, I decided to give it a try to see what happened. I just defined the syntax I wanted in a test, then wrote the interfaces to comply to the syntax and chain the methods, then built the implementations for those interfaces. The result is pretty much self-explanatory:

[SetUp]
public void setup() {
    Solr.Connection = new SolrConnection("http://localhost:8983/solr");
}

[Test]
public void QueryById() {    
    ISolrQueryResults<TestDocument> r = Solr.Query<TestDocument>().By("id").Is("123456").Run();
}

[Test]
public void QueryByRange() {
    ISolrQueryResults<TestDocument> r = Solr.Query<TestDocument>().By("id").Between(123).And(456).OrderBy("id", Order.ASC).Run();
}

[Test]
public void DeleteByQuery() {
    Solr.Delete.ByQuery<TestDocument>("id:123456");
}

Run() is the explicit kicker method [1]. The DSL is defined in a separate DLL, in case you don't want/need it. There are some more examples in the tests.

I TDDd most of the project, so the code coverage is near 75%. I'll add the remaining tests if/when I have the time. Of course, as usual, patches/bugfixes are more than welcome :-)

[1] I might drop this requirement in the future.