Friday, July 15, 2011

Composability in HTML generation

I'm bored... I know, let's write a web page with Wing Beats to show Seth MacFarlane's (creator of Family Guy) tweets! We'll use this nice jQuery plugin to pull the actual tweets.

open WingBeats.Xml
open WingBeats.Xhtml

let e = XhtmlElement()
let s = e.Shortcut

let page = 
    [
        e.DocTypeHTML5
        e.Html [
            e.Head [
                e.Title [ &"Tweets!" ]
            ]
            e.Body [
                e.P [
                    s.JavascriptFile ""
                    s.JavascriptFile "jquery.tweet.js"
                    e.Div ["id", "tweet"] []
                    e.Script [
                        &(@"$(function(){
                              $('#tweet').tweet({
                                username: 'sethmacfarlane',
                                avatar_size: 32,
                                count: 4,
                                loading_text: 'searching twitter...'
                              });
                            });")
                    ]
                ]
            ]
        ]
    ]

We can render this to an HTML string and print the result just by saying:

printfn "%s" (Renderer.RenderToString page)

So far so good.

Now, I'm also a big fan of Julius Sharpe (a Family Guy writer), so I want to include his tweets too. To avoid repeating ourselves, we'll create a 'twitter' function, parameterized by user name, and then we'll just call this function in the layout:

open System // Random (and Environment, used further down) live here

// A self-contained widget: its scripts, its target div and its init code.
let twitter username rnd = 
    let divId = sprintf "tweet-%d" rnd
    [
        s.JavascriptFile ""
        s.JavascriptFile "jquery.tweet.js"
        e.Div ["id", divId] []
        e.Script [
            &(@"$(function(){
                  $('#"+ divId + @"').tweet({
                    username: '"+ username + @"',
                    avatar_size: 32,
                    count: 4,
                    loading_text: 'searching twitter...'
                  });
                });")
        ]
    ]

let page seed = 
    let rnd = Random(seed)
    [
        e.DocTypeHTML5
        e.Html [
            e.Head [
                e.Title [ &"Tweets!" ]
            ]
            e.Body [
                e.P [
                    yield! twitter "juliussharpe" (rnd.Next())
                    yield! twitter "sethmacfarlane" (rnd.Next())
                ]
            ]
        ]
    ]

Note how I also now pass a seed to generate div IDs, to keep the code pure. But we have another, bigger problem: when rendering this page we end up with two references to jQuery and two references to jquery.tweet.js! We could move those references out of the 'twitter' function and put them in the layout, but that would pretty much defeat the purpose of this abstraction: it wouldn't be self-contained anymore, so it wouldn't be reusable or composable.

We've all been here. Some people just give up (or don't think much of it) and put the scripts outside the component. Others write a helper or an asset manager (basically, calling a function that keeps track of assets). This leads to a loss of purity, and with it, composability, unless you're willing to thread along this helper's state, which is not a nice prospect unless your whole code is monadic.

Speaking of monads, that's what Yesod (a Haskell web framework, obviously) does to properly encapsulate CSS and JS in reusable, composable widgets.

Another approach to the problem is to just remove the duplicate <script> tags. This is what Lift (a Scala web framework) does: you can insert <head> elements anywhere you want, and before rendering, Lift will move the contents of all inner <head>s to the one and only <head> that should be in an HTML document, deduplicating elements in the process.
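To make that idea concrete, here's a rough sketch of head merging. Note that this is neither Lift's actual implementation nor the Wing Beats API: the Node type and the hoist and mergeHeads functions are made up for illustration, and it assumes F# 4.0 or later for List.distinct.

```fsharp
// Hypothetical, simplified HTML tree (a stand-in, not the Wing Beats types).
type Node =
    | Tag of string * (string * string) list * Node list
    | Text of string

// Returns (nodes to keep in place, nodes hoisted out of inner <head>s).
let rec hoist node =
    match node with
    | Tag ("head", _, children) -> [], children
    | Tag (name, attrs, children) ->
        let kept, hoisted = children |> List.map hoist |> List.unzip
        [ Tag (name, attrs, List.concat kept) ], List.concat hoisted
    | Text _ -> [ node ], []

// Builds the final document: one <head> holding the layout's own head
// content plus everything hoisted from the body, without duplicates.
let mergeHeads headContent body =
    let kept, hoisted = hoist body
    let head = Tag ("head", [], List.distinct (headContent @ hoisted))
    Tag ("html", [], head :: kept)

// A widget that declares its own script dependency next to its markup.
let script src = Tag ("script", ["src", src], [])
let widget user =
    Tag ("div", [], [ Tag ("head", [], [ script "jquery.tweet.js" ]); Text user ])

let merged =
    mergeHeads
        [ Tag ("title", [], [ Text "Tweets!" ]) ]
        (Tag ("body", [], [ widget "juliussharpe"; widget "sethmacfarlane" ]))
```

Both widgets bring their own script reference, yet the merged document ends up with a single copy of it in the top-level head, and the widgets stay self-contained.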

Implementing something similar with Wing Beats is quite easy. Instead of scanning for <head> elements as Lift does, we'll just scan for <script> elements. Also, we won't move these elements, we'll just remove the duplicates, i.e. leave the first occurrence of each script. Here's some code that does this:

let isSrc = fst >> (fun n -> n.Name = "src")
let tryGetSrc attr = attr |> List.tryFind isSrc |> Option.map snd

// state is the set of script srcs seen so far
let rec deduplicateScripts state =
    function
    | TagPairNode(name, attr, children) ->
        let state, children = deduplicateScriptsForest state children
        let node = TagPairNode(name, attr, children)
        if name.Name <> "script"
            then state, node
            else
                match tryGetSrc attr with
                | Some src -> 
                    if Set.contains src state
                        then state, NoNode // already seen: drop the duplicate
                        else
                            let state = Set.add src state
                            state, node
                | _ -> state, node // inline script, keep as is
    | x -> state,x
and deduplicateScriptsForest state nodes =
    let folder (state,nodes) n =
        let state, node = deduplicateScripts state n
        state, node::nodes
    let state, nodes = Seq.fold folder (state,[]) nodes
    state, List.rev nodes

Here's how to use these deduplication functions:

let pp = deduplicateScriptsForest Set.empty (page Environment.TickCount) |> snd
printfn "%s" (Renderer.RenderToString pp)

And the result:

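<!DOCTYPE html >
<html>
    <head>
        <title>Tweets!</title>
    </head>
    <body>
        <p>
            <script type="text/javascript" src=""></script>
            <script type="text/javascript" src="jquery.tweet.js"></script>
            <div id="tweet-1035493420">
            </div>
            <script>
                $(function () {
                    $('#tweet-1035493420').tweet({
                        username: 'juliussharpe',
                        avatar_size: 32,
                        count: 4,
                        loading_text: 'searching twitter...'
                    });
                });
            </script>
            <div id="tweet-1634829813">
            </div>
            <script>
                $(function () {
                    $('#tweet-1634829813').tweet({
                        username: 'sethmacfarlane',
                        avatar_size: 32,
                        count: 4,
                        loading_text: 'searching twitter...'
                    });
                });
            </script>
        </p>
    </body>
</html>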

Now, this deduplication function isn't particularly efficient: no tail calls, List.rev... It will blow the stack if given a sufficiently deeply nested structure (a few tests indicate that it dies at around 2600 nested elements, which is still not a bad number). More generally, we'd want to define a catamorphism over the Wing Beats tree (check out Brian McNamara's excellent series on catamorphisms) and then write deduplication (or any other kind of tree processing) using that fold.
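Here's a minimal sketch of what that could look like. It uses a simplified, made-up Node type rather than the real Wing Beats tree, and the names cata and dedupe are hypothetical. The fold itself is still not tail-recursive, so this only illustrates the structure, it doesn't fix the stack problem.

```fsharp
// Hypothetical stand-in for the Wing Beats tree (not the library's types).
type Node =
    | Tag of string * (string * string) list * Node list
    | Text of string
    | Empty

// Catamorphism: replaces each constructor of the tree with a function.
let rec cata fTag fText fEmpty node =
    match node with
    | Tag (name, attrs, children) ->
        fTag name attrs (List.map (cata fTag fText fEmpty) children)
    | Text t -> fText t
    | Empty -> fEmpty

// Deduplication written with the fold. Every node folds into a function
// from "srcs seen so far" to (updated srcs, rewritten node).
let dedupe node =
    let fTag name attrs children seen =
        // Thread the set through the children, left to right.
        let step (s, acc) child =
            let s, n = child s
            s, n :: acc
        let seen, kids = List.fold step (seen, []) children
        let node = Tag (name, attrs, List.rev kids)
        match name, List.tryFind (fst >> (=) "src") attrs with
        | "script", Some (_, src) when Set.contains src seen -> seen, Empty
        | "script", Some (_, src) -> Set.add src seen, node
        | _ -> seen, node
    let fText t seen = seen, Text t
    let fEmpty seen = seen, Empty
    cata fTag fText fEmpty node Set.empty |> snd

// Usage: only the first occurrence of each script src survives.
let script src = Tag ("script", ["src", src], [])
let tree =
    Tag ("p", [], [
        script "a.js"
        Tag ("div", [], [ script "a.js"; script "b.js" ])
        script "b.js" ])
let deduped = dedupe tree
```

The traversal lives in cata once, and dedupe only says what to do at each kind of node. Any other tree processing could reuse the same fold.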

The point is, as you can see it was pretty easy to manipulate and abstract HTML fragments to simple, reusable, pure functions... because Wing Beats makes HTML elements truly first-class citizens. It models HTML as a tree, and not just as a string.

Erik Meijer already showed 11 years ago that view engines that don't make HTML fragments first-class citizens don't compose. Back in 2011, lots of view engines still suffer from this (Razor included).

Another example: if you want to test a view, you have to either use a string assert on the rendered output, or render and then parse back the output in order to test it like structured HTML. It doesn't make sense. Why not just create structured HTML from the start?
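As a rough illustration, here's what testing a view as a tree could look like. The Node type, the tweetBox view and the descendants and attr helpers are all made up for this sketch (they are not Wing Beats functions).

```fsharp
// Hypothetical, simplified HTML tree (a stand-in, not the Wing Beats types).
type Node =
    | Tag of string * (string * string) list * Node list
    | Text of string

// All nodes of the tree, in document order.
let rec descendants node =
    seq {
        yield node
        match node with
        | Tag (_, _, children) ->
            for c in children do
                yield! descendants c
        | Text _ -> ()
    }

// Looks up an attribute value on a tag.
let attr key node =
    match node with
    | Tag (_, attrs, _) -> attrs |> List.tryFind (fst >> (=) key) |> Option.map snd
    | Text _ -> None

// The view under test: a made-up widget returning HTML fragments.
let tweetBox user =
    [ Tag ("div", ["id", "tweet-" + user], [])
      Tag ("script", [], [ Text (sprintf "$('#tweet-%s').tweet({ username: '%s' });" user user) ]) ]

let view = Tag ("body", [], tweetBox "sethmacfarlane")

// The test queries the structure directly: no rendering, no parsing back.
let divIds =
    descendants view
    |> Seq.choose (fun n -> match n with Tag ("div", _, _) -> attr "id" n | _ -> None)
    |> List.ofSeq

if divIds <> [ "tweet-sethmacfarlane" ] then failwith "unexpected div ids"
```

The assertion doesn't care about whitespace, attribute order or quoting, which is exactly the kind of noise a string assert on rendered output has to deal with.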

Wing Beats is not the only HTML DSL in .NET: WebSharper includes one, and in C# there's SharpDOM and CityLizard.

You may be thinking that HTML DSLs are ugly and not designer-friendly... but you can also do this kind of thing with XML literals in Scala, Nemerle or VB.NET (which is mostly the brainchild of Erik Meijer, not by coincidence). There's even an ASP.NET MVC view engine that uses VB.NET's XML literals.

Bottom line: if you're templating unstructured text, then by all means use a generic text template engine. But if you're writing a web application and dealing with HTML, treating HTML as first-class values instead of unstructured text buys you composability: you can use the full power of the host language and you can handle HTML directly as a tree.

In a future post about this subject, I'll try to categorize the different approaches to HTML generation and analyze them from the point of view of composability.


Ryan said...

Do any of the existing DSL's also allow you to import HTML and generate the code for the markup (aside from WebSharper)? So far, the closest I've found is in the HtmlAgilityPack, but that tries to validate, so there really is no good way to receive markup as an input and still deliver a strong DSL for devs.

Mauricio Scheffer said...

@Ryan: when using XML literals (e.g. VB.NET) you can just copy HTML and paste it in your source code, then assign it to a variable or wrap it in a function, adjust if something's not XML, and you're good to go.
I already migrated three of my projects from NVelocity to VB.NET XML literals like that.
Other than that, it's possible to just write an external tool to generate the code. There will *always* be some kind of validation in such a tool.