Bug squash: 2013

Monday, October 21, 2013

Towards a NuGet dependency monitor with OData and F#

Every now and then I bump into the Hackage Dependency Monitor and wish there were something similar for NuGet. Node.js also has something like this: David. As a maintainer of several packages, I'd find it handy to be told when a dependency gets outdated, instead of finding out through a bug report or by reading a release announcement by coincidence.

Since NuGet exposes its repository through an OData feed, and .NET has some good facilities to query these, I thought it’d be fun to try to implement the core of a dependency monitor.

Exploration

I started out with LINQPad, which makes exploration as easy as it gets. Click “add connection”, select the OData driver, and enter the feed URL https://nuget.org/api/v2 . That’s it. Open the connection and it shows the available tables (only "Packages" in this case) and their schema.

For example, we can find out how many projects are hosted on Github:

Packages 
.Where(x => x.IsLatestVersion && x.IsAbsoluteLatestVersion && x.ProjectUrl.Contains("github")) 
.Count()

Here are the figures for the most popular project hosting services (for the packages that have the ProjectUrl property defined):

Github: 5378
Codeplex: 2658
Google: 483
Bitbucket: 415

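We can also see that dependencies are stored as a single string. Looking at a few packages, it turns out to be a list of entries separated by '|', each starting with the dependency's package ID and its version spec, separated by ':'.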

This means that we’ll have to parse this dependency string and run another query to fetch and analyze dependencies. So let’s switch to the F# REPL to write some code more comfortably.

Some helpers

NuGet has some functions that will come in handy to parse these dependency versions:

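#r "System.Xml.Linq"
// adjust this path to wherever your copy of NuGet.exe lives
#r @"g:\prg\SolrNet\lib\NuGet.exe"

open NuGet

let split (c: char) (x: string) = x.Split c

// Dependencies look like "id1:spec1|id2:spec2|...": split on '|', then on ':',
// and parse the version spec of each entry with NuGet's own parser
let parseDependencies : string -> seq<string * IVersionSpec> = 
    split '|' >> Seq.map (split ':' >> fun x -> x.[0], VersionUtility.ParseVersionSpec x.[1])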
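
For experimenting without NuGet.exe referenced, the parsing step can be sketched in plain F#. This is a hypothetical helper (parseDependencyPairs is my name, not part of NuGet), assuming the '|'-separated "id:versionSpec" format that parseDependencies relies on. It keeps the version spec as a raw string and skips empty entries instead of throwing.

```fsharp
// Hypothetical, NuGet-free variant of parseDependencies.
// Assumption: entries are separated by '|' and each starts with "id:versionSpec";
// any further ':'-separated fields are ignored, just like x.[0] / x.[1] above.
let parseDependencyPairs (deps: string) : (string * string) list =
    if System.String.IsNullOrWhiteSpace deps then
        []  // a package without dependencies
    else
        deps.Split([| '|' |])
        |> Array.toList
        |> List.choose (fun entry ->
            let parts = entry.Split([| ':' |])
            // Skip empty or garbled entries instead of throwing
            if parts.Length >= 2 && parts.[0] <> "" then Some (parts.[0], parts.[1])
            else None)
```

With NuGet.exe referenced, the second element of each pair could still be handed to VersionUtility.ParseVersionSpec as above.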

To query the NuGet feed we’ll use the OData type provider.

Unfortunately, WCF Data Services doesn’t support Contains(), which we need to get all the dependencies for a package in a single query. As an example, running this on LINQPad:

Packages.Where(x => new[] {"GDataDB", "FSharpx.Core"}.Contains(x.Id))

throws NotSupportedException: the method ‘Contains’ is not supported.

The type provider uses the same query translator underneath, so it has the same limitation. OData v3 supports the Any() operator, but the NuGet feed seems to be OData v2, so that doesn’t work either. Instead, we can build the equivalent chain of OR'ed equality tests with some quotation manipulation:

open Microsoft.FSharp.Quotations
open Microsoft.FSharp.Quotations.Patterns

let inList (memberr: Expr<'a -> 'b>) (values: 'b list) : Expr<'a -> bool> =
    match memberr with
    | Lambda (_, PropertyGet _) -> 
        match values with
        | [] -> <@ fun _ -> true @>
        | _ -> values |> Seq.map (fun v -> <@ fun a -> (%memberr) a = v @>) |> Seq.reduce (fun a b -> <@ fun x -> (%a) x || (%b) x @>)
    | _ -> failwith "Expression has to be a member"
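
To see the shape of what inList builds without any OData machinery, here is the same OR-chain construction over ordinary functions. inListFn is a hypothetical helper for illustration only: it cannot be translated into a query, it just evaluates the same chain of equality tests in memory.

```fsharp
// Hypothetical inListFn: same OR-chain as inList, but over plain functions.
let inListFn (getKey: 'a -> 'b) (values: 'b list) : 'a -> bool =
    match values with
    | [] -> fun _ -> true  // same convention as inList: an empty list matches everything
    | _ ->
        values
        |> List.map (fun v -> fun a -> getKey a = v)     // one equality test per value
        |> List.reduce (fun p q -> fun x -> p x || q x)  // OR them together
```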

Compare this with the equivalent C# code: expression splicing makes the F# version much clearer.

Also, just for kicks, let’s run the query asynchronously. This doesn’t make much difference here since we’re running it in the REPL, but it would be useful if this code ran in a server.

#r "System.Data.Services.Client"
#r "FSharp.Data.TypeProviders"

open System.Linq
open System.Data.Services.Client

let execQueryAsync (q: _ DataServiceQuery) = 
    Async.FromBeginEnd(q.BeginExecute, q.EndExecute)
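
The Begin/End-to-Async wrapping can also be tried offline. In this sketch a MemoryStream stands in for the DataServiceQuery (an assumption made only so the example runs without the feed), and readAsync is a hypothetical name.

```fsharp
open System
open System.IO

// Same pattern as execQueryAsync: turn a Begin/End method pair into an Async<'T>.
let readAsync (stream: Stream) (buffer: byte[]) : Async<int> =
    Async.FromBeginEnd(
        (fun (callback: AsyncCallback, state: obj) ->
            stream.BeginRead(buffer, 0, buffer.Length, callback, state)),
        (fun ar -> stream.EndRead ar))
```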

Main code

Ok, enough with the helper functions. Here’s the main function:

type nuget = Microsoft.FSharp.Data.TypeProviders.ODataService<"https://nuget.org/api/v2">
type Package = nuget.ServiceTypes.V2FeedPackage
let ctx = nuget.GetDataContext()

let checkDependencies packageId =
    async {
        let packagesQuery = 
            query {
                for p in ctx.Packages do
                where (p.Id = packageId && p.IsAbsoluteLatestVersion && p.IsLatestVersion)
                select p
            } :?> DataServiceQuery<Package>
        let! packages = execQueryAsync packagesQuery

        let package = Seq.exactlyOne packages

        let deps = parseDependencies package.Dependencies |> Seq.toList
        let depIds = List.map fst deps
        let depsQuery = 
            query {
                for p in ctx.Packages do
                where (((%(inList <@ fun (x: Package) -> x.Id @> depIds)) p) && p.IsAbsoluteLatestVersion && p.IsLatestVersion)
                select p
            } :?> DataServiceQuery<Package>
        let! depPackages = execQueryAsync depsQuery
        let depPackagesList = Seq.toList depPackages
        let satisfies =
            deps |> List.map (fun (depId, version) -> 
                                let depPackage = depPackagesList |> Seq.find (fun p -> p.Id = depId)
                                let semVersion = SemanticVersion.Parse depPackage.Version
                                depId, version, version.Satisfies semVersion)
        return satisfies
    }

This returns a list of triples: the package ID of the dependency, the version spec required for it, and a boolean that is false if the latest version of the dependency falls outside that spec (i.e. the dependency is outdated) and true otherwise.
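
To make the meaning of that boolean concrete, here is a simplified, hypothetical stand-in for NuGet's IVersionSpec (VersionRange, satisfies and exactly are illustrative names, not NuGet's API), using System.Version for comparisons. It mirrors the fields that show up in the output: a min and a max version, each inclusive or exclusive.

```fsharp
open System

// Hypothetical, simplified stand-in for IVersionSpec (not NuGet's real type).
type VersionRange =
    { MinVersion: Version option
      IsMinInclusive: bool
      MaxVersion: Version option
      IsMaxInclusive: bool }

// True when v falls inside the range, respecting inclusive/exclusive bounds.
let satisfies (range: VersionRange) (v: Version) =
    let aboveMin =
        match range.MinVersion with
        | None -> true
        | Some lo -> if range.IsMinInclusive then v >= lo else v > lo
    let belowMax =
        match range.MaxVersion with
        | None -> true
        | Some hi -> if range.IsMaxInclusive then v <= hi else v < hi
    aboveMin && belowMax

// "[2.1.0.0]" pins one exact version: both bounds equal, both inclusive.
let exactly (v: Version) =
    { MinVersion = Some v; IsMinInclusive = true; MaxVersion = Some v; IsMaxInclusive = true }
```

A pinned range like [2.1.0.0] is only satisfied by that exact version, so any newer release of the dependency makes the check return false.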

Let’s try an example:

checkDependencies "GDataDB" |> Async.RunSynchronously

Gives:

[("Google.GData.Client", [2.1.0.0] {IsMaxInclusive = true;
                                      IsMinInclusive = true;
                                      MaxVersion = 2.1.0.0;
                                      MinVersion = 2.1.0.0;}, false);
   ("Google.GData.Extensions", [2.1.0.0] {IsMaxInclusive = true;
                                          IsMinInclusive = true;
                                          MaxVersion = 2.1.0.0;
                                          MinVersion = 2.1.0.0;}, false);
   ("Google.GData.Documents", [2.1.0.0] {IsMaxInclusive = true;
                                         IsMinInclusive = true;
                                         MaxVersion = 2.1.0.0;
                                         MinVersion = 2.1.0.0;}, false);
   ("Google.GData.Spreadsheets", [2.1.0.0] {IsMaxInclusive = true;
                                            IsMinInclusive = true;
                                            MaxVersion = 2.1.0.0;
                                            MinVersion = 2.1.0.0;}, false)]

Uh-oh, I better update those dependencies!

Conclusion

So this looks simple enough, right? However, I consider this just a spike; it’s not really robust. What happens if the package doesn’t exist? Exception. Ill-defined dependencies? Exception.
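
As a sketch of one way to deal with the first of those exceptions, the lookup can return a Result instead of relying on Seq.exactlyOne. Here packages is a hypothetical in-memory list of (id, version) pairs standing in for the feed query's result, and tryFindLatest is an illustrative name.

```fsharp
// Report "not found" and "ambiguous" as values instead of exceptions.
// packages: hypothetical (id, latestVersion) pairs standing in for the feed result.
let tryFindLatest (packages: (string * string) list) (packageId: string) : Result<string, string> =
    match packages |> List.filter (fun (pid, _) -> pid = packageId) with
    | [ (_, version) ] -> Ok version
    | [] -> Error (sprintf "package %s not found" packageId)
    | _ -> Error (sprintf "package %s: more than one latest version" packageId)
```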

Still, it shouldn’t be too hard to make this more robust, then put it in a web server and done! Easier said than done ;-)

Also, it seems that the NuGet v3 API won't be based on OData, so this whole experiment might need to be rewritten soon.

Anyway, here's the entire code for this post.

Appendix: WCF Data Services criticism

I generally dislike expression-based translators (i.e. IQueryable) because they’re usually eminently partial, i.e. you have to guess what’s supported, read the docs very carefully, or run your code in a test and see what happens. Otherwise you get exceptions everywhere. The compiler can’t do much and it’s never quite clear what will execute on the client and what will be translated and executed on the server. This hurts your ability to reason about the code, which in turn means more programming by coincidence.

WCF Data Services takes this to pathological levels. While exploring the NuGet feed in LINQPad I found many simple expressions that should have worked but didn’t. A few examples:

Packages.Where(x => new[] {"GDataDB", "FSharpx.Core"}.Contains(x.Id))

This is the one I mentioned earlier. I don’t see why the expression translator couldn’t do what I did and compile this to a chain of OR’ed expressions.
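
For reference, this is roughly what such an OR chain looks like as an OData $filter expression, built by hand. idFilter is a hypothetical helper; it assumes single quotes inside string literals are escaped by doubling them, as OData string literals require, and the result would still need URL-encoding before being appended to a feed URL.

```fsharp
// Hypothetical helper: build "Id eq 'a' or Id eq 'b'" from a list of package IDs.
// Note: an empty list yields an empty string, i.e. no filter at all.
let idFilter (ids: string list) =
    ids
    |> List.map (fun pkgId -> sprintf "Id eq '%s'" (pkgId.Replace("'", "''")))
    |> String.concat " or "
```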

Even the simplest projection fails:

Packages.Select(x => x.Id)

Throwing NotSupportedException: Individual properties can only be selected from a single resource or as part of a type. Specify a key predicate to restrict the entity set to a single instance or project the property into a named or anonymous type.

I have no idea what the first part of that error means, but projecting to an anonymous type works:

Packages.Select(x => new {x.Id})

The official explanation for this is that the OData protocol doesn’t support it, but again, it would seem that this is the job of the expression translator and the protocol or server side of things has little to do with it.

By the way, if you happen to add a Take() operator you get a totally different exception:

Packages.Select(x => x.Id).Take(20)

Throws an InvalidCastException: Unable to cast object of type 'System.Data.Services.Client.NavigationPropertySingletonExpression' to type 'System.Data.Services.Client.ResourceSetExpression'.

Or add a condition and you get yet another different error:

Packages
.Where(x => x.IsLatestVersion && x.IsAbsoluteLatestVersion)
.Select(x => x.Id)
.Take(10)

NotSupportedException: Can only specify query options (orderby, where, take, skip) after last navigation.

Which is incorrect, since replacing the projection in the above expression with an anonymous type works fine.

In this other example, the library doesn’t process negated conditions correctly, which causes a cryptic server-side exception:

Packages
.Where(x => !x.ProjectUrl.Contains("google"))
.Select(x => new {x.Id})

“An error occurred while processing this request. InnerException: Rewriting child expression from type 'System.Nullable`1[System.Boolean]' to type 'System.Boolean' is not allowed, because it would change the meaning of the operation. If this is intentional, override 'VisitUnary' and change it to allow this rewrite.”

The generated URL from this expression is: https://nuget.org/api/v2/Packages()?$filter=not substringof('google',ProjectUrl)&$select=Id which apparently isn’t supported server-side.

But change the condition from using the “not” operator to “== false” and everything is magically fixed:

Packages
.Where(x => x.ProjectUrl.Contains("google") == false)
.Select(x => new {x.Id})

Generated URL: https://nuget.org/api/v2/Packages()?$filter=substringof('google',ProjectUrl) eq false&$top=10&$select=Id

These two expressions are logically equivalent, but one works and the other one fails:

Packages
.Where(x => x.IsLatestVersion)
.Select(x => new {x.IsLatestVersion})

Packages
.Select(x => new {x.IsLatestVersion})
.Where(x => x.IsLatestVersion)

The second one fails with: “NotSupportedException: The filter query option cannot be specified after the select query option.”

These are all very simple expressions and I found all of these issues in about one hour of experimentation in LINQPad (for reference, LINQPad v4.47.02 using WCF Data Services 5.5), so be very careful and test every single call if you have to use WCF Data Services. And keep in mind that if you use the OData F# type provider, you’re also using WCF Data Services, so the same warning applies.

Wednesday, August 28, 2013

Objects and functional programming

In a recent question on Stackoverflow, someone asked “when to use interfaces and when to use higher-order functions?”. To summarize, it’s about deciding whether to design for an interface or a function to be passed. It’s expressed in F#, but the same question and arguments can be applied to similar languages like C#, VB.NET or Java.

Functional programming is about programming with mathematical functions, which means no side-effects. Some people say that “functional programming” as a paradigm or concept isn’t useful, and all that really matters is being able to reason about your code. The best way I know to reason about my code is to avoid side-effects or isolate them as much as possible.

In any case, none of this says anything about objects, classes or interfaces. You can represent functions however you like. You can write pure code with objects or without them. OOP is effectively orthogonal to functional programming. In this post I'll use the terms 'objects', 'classes', 'interfaces' somewhat interchangeably, the differences don't matter in this context. Hopefully my point still gets across.

Higher-order functions are of course a very useful tool to raise the level of abstraction. However, many people may not realize that any function receiving some object as an argument is effectively a higher-order function. To quote William Cook in “On Understanding Data Abstraction, Revisited”:

“Object interfaces are essentially higher-order types, in the same sense that passing functions as values is higher-order. Any time an object is passed as a value, or returned as a value, the object-oriented program is passing functions as values and returning functions as values. The fact that the functions are collected into records and called methods is irrelevant. As a result, the typical object-oriented program makes far more use of higher-order values than many functional programs.”

So in principle there is little difference between passing an interface and passing a function. The only difference here is that an interface is named and has named functions, while a function is anonymous. The cost of the interface is the additional boilerplate, which also means having to keep track of one more thing.
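
A small F# sketch of that equivalence: a one-method interface and a plain function can be passed to the same kind of higher-order code, and converting between them is trivial. IScorer and the helper names are hypothetical.

```fsharp
// An interface with a single method...
type IScorer =
    abstract Score: string -> int

// ...and two consumers: one takes the interface, the other a plain function.
let bestWithInterface (scorer: IScorer) (xs: string list) = xs |> List.maxBy scorer.Score
let bestWithFunction (score: string -> int) (xs: string list) = xs |> List.maxBy score

// Converting between the two is a one-liner in each direction.
let ofFunction (f: string -> int) = { new IScorer with member this.Score s = f s }
let toFunction (scorer: IScorer) : string -> int = scorer.Score
```

The interface version names the concept (IScorer, Score); the function version is anonymous but needs no extra declaration.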

Even more, since objects typically have many functions, you could say that you’re not just passing functions as values, but passing modules as values. To put it clearly: objects are first-class modules.

As an aside, the term “first-class value” doesn’t have a precise definition, but I find it useful to use it with the definition given on MSDN or Wikipedia.

Objects are also parametrizable modules because constructors can take parameters. If a constructor takes some other object as parameter, then you could say that you’re parameterizing a module by another module.
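
To illustrate objects as first-class, parameterizable modules: in this sketch each object is a tiny module with one function, one of them is parameterized by another module through its constructor, and both can be stored in a list, something an F# module can't do. ILogger, UpperLogger and PrefixLogger are hypothetical names.

```fsharp
type ILogger =
    abstract Log: string -> string

// A one-function "module" packaged as an object.
type UpperLogger() =
    interface ILogger with
        member this.Log msg = msg.ToUpperInvariant()

// A module parameterized by another module: the constructor takes an ILogger.
type PrefixLogger(prefix: string, inner: ILogger) =
    interface ILogger with
        member this.Log msg = prefix + inner.Log msg

// First-class: these "modules" can be stored in a list, passed around and returned.
let loggers : ILogger list =
    [ UpperLogger() :> ILogger
      PrefixLogger("> ", UpperLogger()) :> ILogger ]
```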

In contrast to this, F# modules (more generally, static classes in .NET) are not first-class modules. They can’t be passed as arguments, you can’t do something like creating a list of modules. And they can’t be parameterized either.

So why do we even bother with modules if they’re not first-class? Because it’s easier to pick just one function out of a module to use or to compose. Object composition is more coarse-grained. As Joe Armstrong famously said: “You wanted a banana but you got a gorilla holding the banana”.

Back to the Stackoverflow question, what’s the difference between:

module UserService =
   let getAll memoize f =
       memoize(fun _ -> f)

   let tryGetByID id f memoize =
       memoize(fun _ -> f id)

   let add evict f name keyToEvict  =
       let result = f name
       evict keyToEvict
       result

and

type UserService(cacheProvider: ICacheProvider, db: ITable<User>) =
   member x.GetAll() = 
       cacheProvider.memoize(fun _ -> db |> List.ofSeq)

   member x.TryGetByID id = 
       cacheProvider.memoize(fun _ -> db |> Query.tryFirst <@ fun z -> z.ID = id @>)

   member x.Add name = 
       let result = db.Add name
       cacheProvider.evict <@ x.GetAll() @> []
       result

The first one has some more parameters passed in from the caller, but you can imagine what it would look like. I probably wouldn’t arrange things like either of them, but note one thing: both can be free of side-effects. To the first one, you can pass pure functions. To the second one, you can pass implementations of ICacheProvider and ITable made of pure functions.

However if you take a good look at the second one, you’ll see that every method uses both cacheProvider and db. So in this case it’s not so bad to pass a couple of gorillas. And it gives the reader a lot more information about what’s being composed, as opposed to a signature like

add : evict:('a -> unit) -> f:('b -> 'c) -> name:'b -> keyToEvict:'a -> 'c

To summarize: The beauty of functional programming lies in being able to reason about your code. One of the easiest ways to achieve this is to write code without side-effects. Classes, interfaces, objects are not opposed to this. In object-capable languages, objects can be a useful tool. Here I talked about objects as modules, but they can model other things too, like records or algebraic data types. They can be easily overused though, especially by programmers new to functional programming. Consider carefully if you want to be juggling gorillas rather than bananas!

Saturday, August 3, 2013

Book review: Apache Solr for Indexing Data How-to

A few days ago I kindly received a copy of the book “Apache Solr for Indexing Data How-to” by Alexandre Rafalovitch for review. Here are my impressions about it.

Solr, by now a nine-year-old project, is a powerful piece of software, with lots of high-level features and facilities for text-centric data. And it builds on Lucene, itself an 11-year-old stand-alone project.

At 80 pages, “Apache Solr for Indexing Data How-to” doesn’t try to cover all the features. Instead, it focuses on indexing, that is, getting data from some source (relational database, text files, etc.) into Solr. This is of course a major part of using Solr.

When starting out with Solr, most people first follow the official tutorial, but then feel lost when faced with real-world requirements. The official wiki docs have greatly improved in the last few years but there’s still a large gap between the tutorial and the docs. The reference guide is also great but for a novice it may seem daunting at first. You can see this in many questions on Stackoverflow. This book helps close that gap a bit, at least the part about getting your data into Solr.

You can read it like a cookbook, as guidance for specific indexing scenarios. As befits a good “how-to” book, each section starts with a short introduction, then step-by-step guidance on how to reach the goal, and a “how it works” section explaining everything. An additional section adds tips and further references about each subject.

Of course you can also read it like a regular book. It starts with the most basic scenario, picking up where the tutorial leaves off, and then dives into more complex scenarios. All examples are on github, so you can follow along on a concrete instance of Solr while reading. The book is written for Solr 4.3. As of now Solr 4.4 is already out and 4.5 is coming soon, but don’t worry: the dev team seems to follow Semantic Versioning, so there shouldn’t be any breaking changes.

One problem with this kind of book is that it often can’t focus just on the main topic (in this case, indexing) without at least touching on other topics. Indexing is related to the Solr schema, which in turn is a function of the search needs of your application. This book dabbles in faceting and searching when the scenario demands it, but otherwise acknowledges its limited scope and refers the reader to other books or the reference documentation when appropriate, so you never feel lost.

Another issue is the simplification of some scenarios in order to focus on operative topics and avoid scope creep. For example, the section on indexing data from a relational database uses an example where the database has only one table and no foreign keys. In most real-world scenarios you’ll have lots of related database tables which you’ll have to denormalize and flatten depending on your search needs.

Overall, I think “Apache Solr for Indexing Data How-to” is great for a novice in Solr. It’s a simple, concrete guide to indexing, which is one of the first things you do with Solr. Just don’t expect it to be all-comprehensive: it doesn’t cover all scenarios, and you should read it alongside the docs to truly understand the concepts at work. It’s designed to help you move forward when, as a beginner, everything looks too complex and you have no idea what to do.

The tutorial will get you started, but this book will get you going.

Thursday, June 20, 2013

Optimizing with the help of FsCheck

Recently I needed a function to transpose a jagged array in F#. As I knew Haskell probably had this in its standard library and I'm lazy, I hoogled “transpose” and followed the link to its source code, then translated it to F#:

module List =
    let tryHead = 
        function
        | [] -> None
        | x::_ -> Some x

    let tryTail =
        function
        | _::xs -> Some xs
        | _ -> None

    let rec transpose =
        function
        | [] -> []
        | []::xs -> transpose xs
        | (x::xs)::xss -> (x::(List.choose tryHead xss))::transpose (xs::(List.choose tryTail xss))

The end.

Oh wait, I needed to process arrays, not lists. Well, I suppose we could convert the jagged array to a jagged list, and then back to an array:

let transpose x = x |> Seq.map Array.toList |> Seq.toList |> List.transpose |> Seq.map List.toArray |> Seq.toArray

It works, but it's a bit inefficient. For example, it does a lot of unnecessary allocations. Let's time it with a big array:

let bigArray = Array.init 3000 (fun i -> Array.init i id)
transpose bigArray |> ignore

Real: 00:00:01.369, CPU: 00:00:01.357, GC gen0: 79, gen1: 44, gen2: 1

We can do better. Since we're working with arrays, we could calculate the length of each array and preallocate it, then copy the corresponding values. Problem is, that kind of code is very imperative, and tricky to get right.

Enter FsCheck. With FsCheck we can use the inefficient implementation as a model to ensure that the new, more efficient implementation is correct. It's very easy too:

let transpose2 (a: 'a[][]) = a // placeholder for the efficient implementation
FsCheck.Check.Quick ("transpose compare", fun (a: int[][]) -> transpose a = transpose2 a)

Run this and FsCheck will generate jagged arrays to compare both implementations. As the new implementation is merely a placeholder for now, it won't take long to find a value that fails the comparison:
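
For readers who want to see the mechanics without FsCheck, the model-based idea can be sketched by hand: generate random jagged arrays, run both implementations, and report the first input where they disagree. This is a simplified, hypothetical harness (fixed seed, no shrinking, illustrative names), not FsCheck's API; the model is the list-based transpose from above, renamed transposeList.

```fsharp
let tryHead = function [] -> None | x :: _ -> Some x
let tryTail = function _ :: xs -> Some xs | [] -> None

// The list-based transpose translated from Haskell, used as the reference model.
let rec transposeList =
    function
    | [] -> []
    | [] :: xss -> transposeList xss
    | (x :: xs) :: xss -> (x :: List.choose tryHead xss) :: transposeList (xs :: List.choose tryTail xss)

// Model: convert to jagged lists, transpose, convert back.
let model (a: int[][]) =
    a |> Array.map Array.toList |> Array.toList |> transposeList |> List.map List.toArray |> List.toArray

// Random jagged arrays: 0 to 5 rows, each with 0 to 5 elements.
let genJagged (rnd: System.Random) =
    Array.init (rnd.Next(6)) (fun _ -> Array.init (rnd.Next(6)) (fun _ -> rnd.Next(100)))

// Returns the first generated input on which impl disagrees with the model, if any.
let findCounterexample (impl: int[][] -> int[][]) (runs: int) =
    let rnd = System.Random(42)
    Seq.init runs (fun _ -> genJagged rnd)
    |> Seq.tryFind (fun input -> model input <> impl input)
```

Passing the identity function as a placeholder implementation finds a counterexample almost immediately, just like the FsCheck run.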

transpose compare-Falsifiable, after 1 test (0 shrinks) (StdGen (1547388233,295729162)):
[|[||]|]

Now that we have a simple but strong test harness we can confidently implement the optimized function. And many failed test runs later, this passes the 100 (by default) test cases generated by FsCheck:

let transpose2 (a: 'a[][]) =
    if a.Length = 0 then 
        [||]
    else
        let r = Array.zeroCreate (a |> Seq.map Array.length |> Seq.max)
        for i in 0 .. r.Length-1 do
            let c = Array.zeroCreate (a |> Seq.filter (fun x -> Array.length x > i) |> Seq.length)
            let mutable k = 0
            for j in 0 .. a.Length-1 do
                if a.[j].Length > i then
                    let v = a.[j].[i]
                    c.[k] <- v
                    k <- k + 1
            r.[i] <- c
        r

See, I told you it would be all imperative and messy! But is it more efficient?

transpose2 bigArray |> ignore

Real: 00:00:00.218, CPU: 00:00:00.218, GC gen0: 3, gen1: 2, gen2: 0

Yes it is (at least with this very unscientific benchmark).

BONUS: are you wondering why the original definition uses List.choose tryHead instead of List.map List.head? That is, why not:

let rec transpose =
    function
    | [] -> []
    | []::xs -> transpose xs
    | (x::xs)::xss -> (x::(List.map List.head xss))::transpose (xs::(List.map List.tail xss))

FsCheck can help with that too! Simply run this against the generated inputs and ignore the result (a trivial test):

FsCheck.Check.Quick("transpose", transpose >> ignore)

And watch it fail:

transpose-Falsifiable, after 1 test (6 shrinks) (StdGen (1278231111,295729178)):
[[true]; []]
with exception:
System.ArgumentException: The input list was empty.
Parameter name: list
   at Microsoft.FSharp.Collections.ListModule.Head[T](FSharpList`1 list)
   at Microsoft.FSharp.Primitives.Basics.List.map[T,TResult](FSharpFunc`2 mapping, FSharpList`1 x)

Note how FsCheck shrinks the value to the simplest possible failing case. This usage is great for finding unwanted partial functions.
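
The same failure can be reproduced deterministically, without FsCheck, by feeding the shrunk input to both definitions. transposeSafe and transposeHead are illustrative names for the List.choose tryHead version and the List.map List.head version discussed above.

```fsharp
let tryTail = function _ :: t -> Some t | [] -> None

// Total version: rows that run out early are simply skipped.
let rec transposeSafe (xss: 'a list list) : 'a list list =
    match xss with
    | [] -> []
    | [] :: rest -> transposeSafe rest
    | (x :: xs) :: rest -> (x :: List.choose List.tryHead rest) :: transposeSafe (xs :: List.choose tryTail rest)

// Partial version: List.head / List.tail throw on an empty row.
let rec transposeHead (xss: 'a list list) : 'a list list =
    match xss with
    | [] -> []
    | [] :: rest -> transposeHead rest
    | (x :: xs) :: rest -> (x :: List.map List.head rest) :: transposeHead (xs :: List.map List.tail rest)

// Does evaluating f throw an ArgumentException?
let throwsArgumentException (f: unit -> 'a) =
    try
        f () |> ignore
        false
    with :? System.ArgumentException -> true
```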

One last comment: none of this is specific to F#. FsCheck has a nice C# interface! You could also use it from Fuchu.

Friday, February 15, 2013

In defense of VB.NET

At the risk of being unpopular, I'll say this: VB.NET does not suck. At all.
No, I won't try to convince you to switch to VB.NET from C# or F#. I won't even tell you that VB.NET is great. But the amount of hate directed towards VB.NET seems unwarranted. Take a look at these recent quotes gathered from a cursory search:

"We identified the presence of VB .NET as our primary source of pain on the development team and committed ourselves to eliminating it."

I can't believe it, but I'm about to write some vb.net.Would someone please stop by and put me out of my misery?!
— Kevin Brill (@kevinbrill) December 13, 2012

Ah crap, I just remembered we code in vb.net here. #sadface
— Mark Funk (@MarkFunk) January 2, 2013

Lua, Python, Java, bash, TSQL, and (against my will) VB.NET #code2012
— Andrew Anderson (@andmatand) December 31, 2012

I know I sound like a cranky, old man, but I know how to write .NET code and VB is crippled. I would never recommend it.
— H. Alan Stevens (@alanstevens) December 30, 2012

RT @darrinbrandon: Man...its been a long time since I wrote any VB.Net code. // I'm sorry dude.
— David Green (@davidjeet) January 12, 2013

. @emilcardell Vad låååågt! [Swedish: "How loooow!"] "Emil Cardell has endorsed you for VB.NET!"
— Mikael Östberg (@MikaelOstberg) January 11, 2013

VB.Net… Eso es un lenguage de programación? [Spanish: "Is that a programming language?"] #NastyShit
— Josep Rivera (@darcjrt) January 10, 2013

RT @plip @blowdart @kendallmiller You don't so much write in VB.NET as wipe your face on the keyboard until the drool fuses it
— Gemma R Cameron (@ruby_gem) January 7, 2013

@julianbirch @hlship @softmodeling And VB.NET ... ah ... perhaps is the ugliest, fatter and more inconsistent language of all.
— Rodrigo Salinas (@rodrigosalinas) November 18, 2012

"C# is far superior for real programmers. VB is far superior for script kitties" (sic)

"To me, Visual Basic seems clumsy, ugly, error-prone, and difficult to read."

C# is to VB.NET what BluRay is to VHS.
— Cari Carpenter (@thecodergirl) February 11, 2013

Ugh.... Have to write something in VB.NET... Someone hurt me....
— Seth Juarez (@sethjuarez) February 11, 2013

It's too bad that few people cite concrete reasons for this hatred. Some mention the verbose syntax as an issue.

Syntax

Well, of course it's better to have a compact syntax to express common ideas, but the thing is, in the great landscape of statically typed programming languages, C# and VB.NET have roughly the same verbosity. Scala and ML languages (including F#) are usually considerably less verbose, and Haskell can be even less verbose.

Whether you like the syntax of a particular language or not is a highly subjective matter. For example, lots of people despise semicolons in C-like syntaxes; the eternal JavaScript semicolon flamewar is evidence of this. Some other people hate curly braces.

Some particular things I personally find interesting about VB.NET syntax compared to C-like languages:

It doesn't require parens around conditions in ifs. This is similar to Haskell, ML languages, Pascal, Boo, Python.

Type declarations come after variables and function names, as in Nemerle, Pascal, Scala, Boo, ML languages, Gosu, Kotlin, TypeScript. Martin Odersky, creator of Scala, claims this increases readability. Of course, not everyone likes it.

Uses "Not" as the boolean negation operator instead of '!', which is easy to miss when reading code. Because of this, some people programming in C-like languages even go as far as preferring "a == false". Other languages that chose 'not' instead of a symbol include Pascal, ML languages, Haskell, Boo, Python.
On a related note, by writing "a == false" you risk mistyping "a = false", which is an assignment, not a comparison, yet is usually valid code. For example in C#:
```
bool a = false;
if (a = false) {}
```
You get a warning: Assignment in conditional expression is always constant; did you mean to use == instead of = ?
In VB.NET it's simply not possible to confuse assignment with comparison in this context:
```
If a = False Then ...
```
is always a comparison.

Uses "mod" instead of '%' for the modulo operation. Wikipedia has a neat table showing how this operation is represented in various languages. I really wonder how a percent sign ended up being the sign for a modulo operation.

Uses the keyword "Inherits" to indicate class inheritance, and "Implements" to indicate interface implementation. C# follows the C++ notation for inheritance, and makes no distinction between interface and class inheritance. VB.NET is like Java in this particular aspect of syntax. F# also makes the distinction between interface and class inheritance.

Has a separate, specific operator for string concatenation ('&'). Using '+' for string concatenation is usually criticized since concatenation is not commutative, as one would expect from an operation written with a plus sign. Some other languages with a separate string concatenation operator: PHP, Perl ('.'), OCaml ('^'), ISO/ANSI SQL ('||'). Dart even removed '+' for strings. Personally I think it's a shame that VB.NET also allows '+' to do string concatenation.

And/AndAlso, Or/OrElse. Many people seem to trip over And/Or being non-short-circuit. But they seem to forget that C#/Java also have non-short-circuit boolean operators, which are also shorter than the corresponding short-circuit operators: '&' and '|'. VB.NET's AndAlso/OrElse are exactly as in SML and Erlang. Oz has andthen/orelse, Ada has "and then" / "or else".

C-style for-loops are very flexible but also complex and known to be error-prone. I hardly ever use C-style for-loops in C# anymore, preferring foreach (var i in Enumerable.Range(0, 5)) when it's simply about iterating over a range. In VB.NET you just do For i = 0 To 4.

Braces in C-like syntaxes are a source of endless style discussions, and also known to be error-prone. There's no such ambiguity in VB.NET.

VB.NET is case-insensitive. Whether this is good or bad has been debated to death. I can only say that personally, after several years of writing code in case-insensitive languages (Basic, Pascal, SQL) and even more years using many case-sensitive languages, I never found any problems with either case sensitivity or insensitivity.

I mention other languages with similar syntax decisions to emphasize that VB.NET isn't really very different or unique, and not everything about VB.NET's syntax is bad. Sometimes, a bit of verbosity can be a good thing. For example, the Haskell and F# communities generally acknowledge that point-free style, while usually more concise, is not always desirable and can lead to unwanted obfuscation.

Features

No, I'm not going to just list all the features of VB.NET. You can check MSDN for that. Instead, I'll try to describe the features I personally use or find interesting and how I use them.

Functional programming

In 2006, Erik Meijer announced that "Functional programming has reached the masses; it's called Visual Basic". In 2007, with the release of .NET 3.5, VB.NET (and C#) gained some features that help with functional programming: better syntax for anonymous functions, syntax sugar for monads (LINQ), local type inference.

While not strictly only about the language itself, having LINQ as part of the standard library is simply terrific. Even Lisp and OCaml code can be more complex and verbose than VB.NET with LINQ. Take a look at this Stackoverflow question. The VB.NET solution is not much more complex than the F# code:

        System.IO.File.ReadLines("ssqHitNum.txt").
            SelectMany(Function(s) s.Split(" "c).Skip(1)).
            GroupBy(Function(x) x).Select(Function(x) New With {x.Key, x.Count}).
            OrderByDescending(Function(x) x.Count).
            Take(18)
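
For comparison, here is a sketch of the same pipeline in F# over an in-memory sequence of lines. It is not the F# answer from that question, just my own approximation; topNumbers is a hypothetical name, and the input format (space-separated values after a leading token) is assumed from the VB code's Skip(1).

```fsharp
// Same steps as the VB.NET query, one F# combinator per LINQ operator.
let topNumbers (count: int) (lines: seq<string>) =
    lines
    |> Seq.collect (fun line -> line.Split([| ' ' |]) |> Seq.skip 1)  // SelectMany + Skip(1)
    |> Seq.countBy id                                                 // GroupBy + Count
    |> Seq.sortByDescending snd                                       // OrderByDescending
    |> Seq.truncate count                                             // Take (no error if fewer)
    |> Seq.toList
```

Against the real file this would be System.IO.File.ReadLines "ssqHitNum.txt" |> topNumbers 18. Seq.truncate is used rather than Seq.take because, like LINQ's Take, it doesn't fail when there are fewer elements than requested.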

Related interesting language features include quotations (i.e. System.Linq.Expressions) and language integrated comonads (more widely known as async/await for Tasks).

Some seem to have taken Erik Meijer's announcement as tongue-in-cheek, but for me it has become a reality: thanks to language support and libraries (LINQ and FSharpx), in my day job I regularly use persistent data structures, higher-order functions, Option (Maybe) and Choice (Either) monadically, and validation applicative functors to write code without side effects. Granted, it will never be as powerful as Haskell or as convenient as F#, but VB.NET has definitely become a usable functional language. Most of what I wrote about functional programming can be easily translated to VB.NET.

XML literals

The most salient and famous feature of VB.NET compared to C# is probably XML literals. I'm using this to model and transform HTML: it's safer and more composable than string-based "template engines" like Razor. You can see some of this in action in three of my open source projects: NHibernate web console, Quartz.NET web console and Boo web console. Underneath these little projects is a tiny web library I designed specifically for embedded ASP.NET modules. I'll probably blog about it in more detail some other day. Anyway, the "view engine" in this web library is just a 300-line file consisting of a bunch of stand-alone functions around System.Xml.Linq.
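The web library's code isn't reproduced here, so as a rough sketch only (not its actual implementation), this is what building HTML with XML literals can look like. Html, Link and LinkList are hypothetical names. The two points it tries to show are that values inserted through embedded expressions (<%= %>) are escaped by System.Xml.Linq rather than concatenated as strings, and that small functions returning XElement compose into bigger fragments.

```vbnet
Option Strict On
Option Infer On

Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

' Hypothetical helpers, not taken from the web library mentioned above.
Module Html
    ' Builds an <a> element. Both the attribute value and the text are
    ' embedded expressions, so special characters get escaped automatically.
    Function Link(url As String, text As String) As XElement
        Return <a href=<%= url %>><%= text %></a>
    End Function

    ' Composes smaller elements into a bigger one: a <ul> with one <li> per link.
    ' An embedded IEnumerable(Of XElement) adds each element as a child.
    Function LinkList(links As IEnumerable(Of KeyValuePair(Of String, String))) As XElement
        Return <ul>
                   <%= From l In links Select <li><%= Link(l.Key, l.Value) %></li> %>
               </ul>
    End Function

    Sub Main()
        Dim links = New Dictionary(Of String, String) From {
            {"https://nuget.org", "NuGet"},
            {"https://github.com", "Tom & Jerry"}
        }
        ' "Tom & Jerry" is written out as "Tom &amp; Jerry".
        Console.WriteLine(LinkList(links))
    End Sub
End Module
```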

Type inference

Type inference seems to be slightly more powerful in VB.NET than C#. We can write this in F#:

let f x = x * x

and F# will infer a function of type int -> int.

In C#, if we try to write:

var f = x => x * x;

the compiler complains that it "cannot assign lambda expression to an implicitly-typed local variable". (int x) => x * x doesn't work either. We have to fully declare the parameter and return types:

Func<int, int> f = x => x * x;

With Option Strict and Option Infer both on in VB.NET (which is what I always use), the compiler infers the return type for you:

Dim f = Function(x As Integer) x * x

Sure, it's no big deal when it's an int, but when the return type is something like FSharpChoice<int, NonEmptyList<string>> it's not funny any more. Luckily, you can hack around this C# limitation with a simple helper function (FSharpx includes one, by the way).

Constants also enjoy better type inference in VB.NET: Const x = 2 is enough for VB.NET to infer that x is an Integer. Not so in C#, where you have to be explicit about the type: const int x = 2;

In VB.NET, Nothing doubles as C#'s default(T), with the type T inferred by the compiler from the context.
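The inference points above can be put together in a small sketch, assuming a console project compiled with Option Strict On and Option Infer On. DefaultOf is a hypothetical helper, added only to show Nothing standing in for default(T) when T is a generic type parameter.

```vbnet
Option Strict On
Option Infer On

Imports System

Module Inference
    ' Hypothetical helper: Nothing converts to the default value of T,
    ' i.e. 0 for Integer, False for Boolean, a null reference for String.
    Function DefaultOf(Of T)() As T
        Return Nothing
    End Function

    Sub Main()
        ' Only the parameter type is declared; the return type (Integer)
        ' is inferred, and square gets an inferred delegate type.
        Dim square = Function(x As Integer) x * x
        Console.WriteLine(square(5))

        ' No "As Integer" needed: the constant's type comes from its value.
        Const answer = 42
        Console.WriteLine(answer.GetType())   ' System.Int32

        ' The same Nothing means different things depending on the target type.
        Dim n As Integer = Nothing
        Dim s As String = Nothing
        Console.WriteLine(n)
        Console.WriteLine(s Is Nothing)
    End Sub
End Module
```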

Project-wide imported namespaces

Aren't you tired of having these lines at the top of all your C# files?

using System;
using System.Collections.Generic;
using System.Linq;

You probably have much more than that on any typical project: System.Web, System.IO, your favorite ORM... VB.NET can cut down the number of annoying repeated open namespaces by storing them once in the project properties. Of course, this should not be abused.

Modules and static imports

VB.NET has modules, which are similar to F# modules. They compile to sealed classes with static methods. By default, modules are internal (Friend), their functions are public, and their fields are private.

When doing functional programming in C#, I end up with a lot of static classes and methods representing pure functions that don't close over any values (otherwise I just write a regular class with readonly fields). It gets annoying having to write "static" all the time. VB.NET modules solve that, giving you a programming model closer to F#. The only annoyance is that VB.NET modules are implicitly opened (as in [<AutoOpen>] in F#) when the containing namespace is open.
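Here is a minimal sketch of a module of pure functions; Vectors, Dot and Norm are hypothetical names. No member is marked Shared, yet callers use them without creating an instance, and because the module is implicitly opened in its containing (here, global) namespace, they can even drop the Vectors. prefix.

```vbnet
Option Strict On
Option Infer On

Imports System

' Compiles to a sealed class whose members are all static (Shared),
' without writing "Shared" on each function.
Module Vectors
    ' Pure function: depends only on its arguments.
    Function Dot(a As Double(), b As Double()) As Double
        Dim sum = 0.0
        For i = 0 To a.Length - 1
            sum += a(i) * b(i)
        Next
        Return sum
    End Function

    Function Norm(a As Double()) As Double
        Return Math.Sqrt(Dot(a, a))
    End Function
End Module

Module Program
    Sub Main()
        ' Unqualified call: the module is implicitly opened.
        Console.WriteLine(Norm({3.0, 4.0}))
        ' Qualified access works too.
        Console.WriteLine(Vectors.Dot({1.0, 2.0}, {3.0, 4.0}))
    End Sub
End Module
```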

You can also import any class and get direct access to its static members without qualifying them with the class name, just as when you open a module in F#. For example, I regularly use static functions in FSharpx, so I just import FSharpx.FSharpOption to avoid having to write FSharpOption.ParseInt all the time.
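FSharpx's API isn't reproduced here; as a stand-in, this sketch imports the framework's own System.Math class, whose Shared members (Sqrt, PI, Max) then become usable without the Math. prefix. Hypotenuse and CircleArea are illustrative names only.

```vbnet
Option Strict On
Option Infer On

Imports System
' Importing a class (not a namespace): its Shared members become
' accessible unqualified, much like opening a module in F#.
Imports System.Math

Module StaticImports
    Function Hypotenuse(a As Double, b As Double) As Double
        Return Sqrt(a * a + b * b)   ' instead of Math.Sqrt
    End Function

    Function CircleArea(r As Double) As Double
        Return PI * r * r            ' instead of Math.PI
    End Function

    Sub Main()
        Console.WriteLine(Hypotenuse(3, 4))
        Console.WriteLine(CircleArea(2))
        Console.WriteLine(Max(2, 7))  ' Math.Max
    End Sub
End Module
```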

Conclusion

If you read this far and are thinking "yeah, but these are still minor differences compared to C#", you are absolutely right. There's little difference between C# and VB.NET. If VB.NET sucks, it sucks as much as C#. Next time you start a new .NET project, consider using F# as the default language, and you'll see a real difference compared to C# / VB.NET.

VB.NET is no toy language. Other than the ones I mentioned, it has lots of features (many inherited from .NET, but still) that we now take for granted: garbage collection, parametric polymorphism with co/contravariance (which was considered by many an academics-only feature not long ago), first-class functions, local type inference.

Other features I didn't mention because I don't find them particularly special include multiple-method interface implementation, exception filters, and many others listed in this Stack Overflow question.

I don't use it myself, but VB.NET also works quite nicely as a dynamic programming language. As Erik Meijer put it in this PowerPoint presentation (2006): "static typing where possible, dynamic typing where necessary".

So if you're "stuck" with VB.NET for whatever reason (legacy code, corporate policy), your code does not necessarily have to be crap because of the language.

As a last remark: I find it amazing that today's VB.NET evolved quite directly from a language that looked like this in 1979:

10 INPUT "NAME"; A$  
20 PRINT "HELLO " A$

(this is Commodore BASIC V2, the original BASIC is older than that)
