Saturday, October 31, 2009

Visualizing Windsor components dependencies

When working with Windsor on complex applications, it's pretty common to have hundreds or even thousands of components managed by the container. The dependencies can get quite intricate and sometimes you need to see the big picture. Static analyzers like NDepend don't cut it since components are wired by Windsor at runtime.

So here's a little guide to output a PNG of a container's components and dependencies, using QuickGraph and GLEE.

First we need to model the components and dependencies as a regular Dictionary. The key of this dictionary will be a description of the component (a regular string) and the value of this dictionary will be a list of dependencies.

var container = new WindsorContainer();

// ...add components...

var dependencyDict = container.Kernel.GraphNodes
    .Distinct(new EqComparer<GraphNode>((a, b) => a.Describe() == b.Describe(), n => n.Describe().GetHashCode()))
    .ToDictionary(n => n.Describe(), n => n.Dependents.Select(a => a.Describe()));

EqComparer is just a functional IEqualityComparer. Distinct() is needed because when using forwarded types you'd get duplicate dictionary keys.

Now we dump this into a QuickGraph digraph:

var graph = dependencyDict.ToVertexAndEdgeListGraph(kv => kv.Value.Select(n => new SEquatableEdge<string>(kv.Key, n)));

and finally we render the digraph to a PNG using GLEE:


Here's a sample output from SolrNet's inner components (click to zoom):



Only problem with this is that it doesn't take custom subdependency resolvers (like an ArrayResolver) into account, since they can't be modeled in the kernel's GraphNodes.

Here's what it looks like after some ad-hoc fixes:


Here's the full code.

Related articles:


Sunday, October 18, 2009

Git filter-branch with GitSharp

I'm currently working on migrating the Castle project Subversion repository to git. There are already a couple of svn mirrors on github, but this migration is intended to eventually replace svn as the official repository. With over 6000 commits in 5 years of history, it's not a trivial migration.

One of the issues is that all subprojects are currently being split from the main trunk to make them more independent. I'll leave that for another post.

Another issue is the committer mapping. Each svn username needs to be mapped to a github account (name + email). Roelof Blom kindly provided this map, so I was set to import with git-svn. Two days later git-svn finished and I pushed the repository to github.
Much to my dismay, I found that some committers on my repository weren't matching their github accounts. The Ken Egozi on my repo wasn't the same Ken Egozi on github!

I had two options: either fix the user mappings and re-run git-svn or change the committers with git filter-branch. I went with filter-branch as described in the Pro Git book. It was processing about 1 commit/s so I left it working and went to sleep.

The next morning I went to see and it had segfaulted halfway through. Now that did it. I updated my GitSharp fork and wrote this kind of specific filter-branch:

internal class Program {
    private static void Main(string[] args) {
        var committerMap = new Dictionary<string, string> {
            {"Ken Egozi", ""},
            {"Krzysztof Ko┼║mic", ""},
        var repo = Repository.Open(args[0]);
        var refs = repo.getAllRefs()
            .Where(x => x.Key.StartsWith("refs/heads") || x.Key.StartsWith("refs/tags"))
            .ToDictionary(x => x.Key, x => x.Value);

        var commitMap = new Dictionary<string, string>();

        foreach (var r in refs) {
            Console.WriteLine("Processing ref {0}", r.Key);
            var startCommit = repo.MapCommit(r.Value.ObjectId);
            var newHead = commitMap.ContainsKey(r.Value.ObjectId.Name) ? 
                repo.MapCommit(commitMap[r.Value.ObjectId.Name]) : 
                Rewrite(repo, startCommit, commitMap, committerMap);
            var newRef = repo.UpdateRef(r.Value.Name);
            newRef.NewObjectId = newHead.CommitId;
            newRef.IsForceUpdate = true;

    public static Commit Rewrite(Repository repo, Commit startCommit, Dictionary<string, string> commitMap, Dictionary<string, string> committerMap) {
        Commit lastCommit = null;
        var walker = new RevWalk(repo);
        foreach (var rcommit in walker.iterator().AsEnumerable()) {
            var commit = rcommit.AsCommit(walker);
            if (commitMap.ContainsKey(commit.CommitId.Name)) {
                lastCommit = repo.MapCommit(commitMap[commit.CommitId.Name]);
                Console.WriteLine("{0} already visited, skipping", commit.CommitId.Name);
            if (committerMap.ContainsKey(commit.Author.Name))
                commit.Author = new PersonIdent(commit.Author.Name, committerMap[commit.Author.Name], commit.Author.When, commit.Author.TimeZoneOffset);
            var newCommit = new Commit(repo) {
                TreeId = commit.TreeId,
                Author = commit.Author,
                Committer = commit.Author,
                Message = commit.Message,
                ParentIds = commit.ParentIds.Select(x => repo.MapCommit(commitMap[x.Name]).CommitId).ToArray(),
            commitMap[commit.CommitId.Name] = newCommit.CommitId.Name;
            lastCommit = newCommit;
        return lastCommit;

It took this little code 3 minutes to rewrite the whole repository. That's 34 commits/s !

When time allows, I'll try and clean this up, then merge it into the new GitSharp.CLI project. Instead of using a bash script to define transformations like the original filter-branch, this could use an embedded Boo or IronPython script!

DISCLAIMER: The code shown here is completely throwaway quality. It does not intend to be a reference GitSharp app or anything like that. It does not intend to be a general filter-branch replacement. It works on my machine, etc. Do not run this code on your repositories unless you know what you're doing. It will rewrite your whole repository! You have been warned.

Monday, October 12, 2009

Untangling the mess: Solr, SolrNet, NHibernate, Lucene

I've recently received several questions about the relationship between Solr, SolrNet, NHibernate, Lucene, Lucene.Net, etc, how they fit together, how they should be used, what features does each provide. Here's an attempt at elucidating the topic:

Let's start from the bottom up:

  • RDBMS: every programmer knows what these are. Oracle, SQL Server, MySQL, etc. Everyone uses them, to the point that it's often used as a Golden Hammer. RDBMS can be stand-alone programs (client-server architecture) or embedded (running within your application).
  • Lucene was written to do full-text indexing and searching. The most known example of full-text searching is Google. You throw words at it and it returns a ranked set of documents that match those words.
    In terms of data structures, Lucene at its core implements an inverted index, while relational databases use B-tree variants. Fundamentally different beasts.
    Lucene is a Java library, this means that it's not a stand-alone application but instead embedded in your program.
  • Full-text functions in relational databases: nowadays almost all major RDBMS offer some full-text capabilities: MySQL, SQL Server, Oracle, etc. As far as I know, they are all behind Lucene in terms of performance and features. They can be easier to use at first, but they're proprietary. If you ever need some advanced feature, switching to Lucene could be a PITA.
  • Lucene.Net is a port of Java Lucene to the .Net platform. Nothing more, nothing less. It aims to be fully API compatible so all docs on Java Lucene can be applied to Lucene.Net with minimal translation effort. Index format is also the same, so indices created with Java Lucene can be used by Lucene.Net and vice versa.
  • NHibernate is a port of Java Hibernate to the .Net platform. It's an ORM (object-relational mapper), which basically means that it talks to relational databases and maps your query results as objects for easier consumption in object-oriented languages.
  • NHibernate.Search is a NHibernate contrib project that integrates NHibernate with Lucene.Net. It's a port of the Java Hibernate Search project. It keeps a Lucene index in sync with a relational database and hides some of the complexity of raw Lucene, making it easier to index and query.
    This article explains its basic usage.
  • Solr is a search server. It's a stand-alone Java application that uses Lucene to provide full-text indexing and searching through a XML/HTTP interface. This means that it can be used from any platform/language. It can be embedded in your own Java programs, but it's not its primary design purpose.
    While very flexible, it's easier to use than raw Lucene and provides features commonly used in search applications, like faceted search and hit highlighting. It also handles caching, replication, sharding, and has a nice web admin interface.
    This article is a very good tour of Solr's basic features.
  • SolrNet is a library to talk to a Solr instance from a .Net application. It provides an object-oriented interface to Solr's operations. It also acts as an object-Solr mapper: query results are mapped to POCOs.
    The latest version also includes Solr-NHibernate integration. This is similar to NHibernate.Search: it keeps a Solr index in sync with a relational database and lets you query Solr from the NHibernate interface.
    Unlike NHibernate and NHibernate.Search, which can respectively create a DB schema and a Lucene index, SolrNet can't automatically create the Solr schema. Solr does not have this capability yet. You have to manually configure Solr and set up its schema.

In case this wasn't totally clear, here's a diagram depicting a possible NHibernate-SolrNet architecture:

Diagram made with gliffy!