Opinionated Programmer - Jo Liss's musings on enlightened software development.

Broccoli: First Beta Release

Broccoli is a new build tool. It’s comparable to the Rails asset pipeline in scope, though it runs on Node and is backend-agnostic.

After a long slew of 0.0.x alpha releases, I just pushed out the first beta version, Broccoli 0.1.0.

Table of Contents:

  1. Quick Example
  2. Motivation / Features
  3. Architecture
  4. Background / Larger Vision
  5. Comparison With Other Build Tools
  6. What’s Next

1. Quick Example

Here is a sample build definition file (Brocfile.js), presented without commentary just to illustrate the syntax:

Brocfile.js
module.exports = function (broccoli) {
  var filterCoffeeScript = require('broccoli-coffee');
  var compileES6 = require('broccoli-es6-concatenator');

  var sourceTree = broccoli.makeTree('lib');
  sourceTree = filterCoffeeScript(sourceTree);

  var appJs = compileES6(sourceTree, {
    // ...
    outputFile: '/assets/app.js'
  });

  var publicFiles = broccoli.makeTree('public');

  return [appJs, publicFiles];
};

Run broccoli serve to watch the source files and continuously serve the build output on localhost. Broccoli is optimized to make broccoli serve as fast as possible, so you should never experience rebuild pauses.

Run broccoli build dist to run a one-off build and place the build output in the dist directory.

For a longer example, see the broccoli-sample-app.

2. Motivation / Features

2.1. Fast Rebuilds

The most important concern when designing Broccoli was enabling fast incremental rebuilds. Here’s why:

Let’s say you’re using Grunt to build an application written with CoffeeScript, Sass, and a few more such compilers. As you develop, you want to edit files and reload the browser, without having to manually rebuild each time. So you use grunt watch, to rebuild automatically. But as your application grows, the build gets slower. Within a few months of development time, your edit-reload cycle has turned into an edit-wait-10-seconds-reload cycle.

So to speed up your build, you try rebuilding only the files that have changed. This is difficult, because sometimes one output file depends on multiple input files. You manually configure some dependency rules, to rebuild the right files depending on which files were modified. But Grunt was never designed to do this well, and your custom rule set won’t reliably rebuild the right files. Sometimes it rebuilds files when it doesn’t have to (making your build slow). Worse, sometimes it doesn’t rebuild files when it should (making your build unreliable).

With Broccoli, once you fire up broccoli serve, it will figure out by itself which files to watch, and only rebuild those that need rebuilding.

In effect, this means that rebuilds tend to be O(1), that is, constant-time with respect to the number of files in your application, as you generally only rebuild one file. I’m aiming for under 200 ms per rebuild with a typical build stack, since that type of delay feels near-instantaneous to the human brain, though anything up to half a second is acceptable in my book.

2.2. Chainable Plugins

Another concern was making plugins composable. Let me show you how easy it is to compile CoffeeScript and then minify the output with Broccoli.

var tree = broccoli.makeTree('lib');
tree = compileCoffeeScript(tree);
tree = uglifyJS(tree);
return tree;

With Grunt, we’d have to create a temporary directory to store the CoffeeScript output, as well as an output directory. As a result of all this bookkeeping, Gruntfiles tend to grow rather lengthy. With Broccoli, all this is handled automatically.
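To make that bookkeeping concrete, here is a minimal sketch in plain Node.js of what any build tool has to do to chain steps: allocate an intermediate directory per step and feed each step’s output directory into the next step. The chain and mapFiles helpers and the two toy transforms are hypothetical illustrations, not Broccoli’s API, and the sketch assumes flat directories and skips cleanup of the temporary directories.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not Broccoli's API): allocates a fresh intermediate
// directory for every step, so the caller never names temporary directories.
// Each step is a function (inputDir, outputDir) => void.
function chain(sourceDir, ...steps) {
  let inputDir = sourceDir;
  for (const step of steps) {
    const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'chain-'));
    step(inputDir, outputDir);
    inputDir = outputDir; // the next step reads what this step wrote
  }
  return inputDir; // directory holding the final build output
}

// Builds a step that rewrites every file of a flat directory with fn.
function mapFiles(fn) {
  return (inputDir, outputDir) => {
    for (const name of fs.readdirSync(inputDir)) {
      const source = fs.readFileSync(path.join(inputDir, name), 'utf8');
      fs.writeFileSync(path.join(outputDir, name), fn(source));
    }
  };
}

// Toy stand-ins for "compile CoffeeScript" and "minify".
const fakeCompile = mapFiles((s) => s.replace(/->/g, 'function'));
const fakeMinify = mapFiles((s) => s.replace(/\s+/g, ' ').trim());

const lib = fs.mkdtempSync(path.join(os.tmpdir(), 'lib-'));
fs.writeFileSync(path.join(lib, 'a.js'), 'x = ->\n  42\n');
const out = chain(lib, fakeCompile, fakeMinify);
// out/a.js now contains 'x = function 42'
```

In a Gruntfile, every one of those intermediate directories would have to be named and wired up by hand.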

3. Architecture

For those who are curious, let me tell you about Broccoli’s architecture.

3.1. Trees, Not Files

Broccoli’s unit of abstraction to describe sources and build products is not a file, but rather a tree – that is, a directory with files and subdirectories. So it’s not file-goes-in-file-goes-out, it’s tree-goes-in-tree-goes-out.

If we designed Broccoli around individual files, we’d be able to compile CoffeeScript just fine (as it compiles 1 input file into 1 output file), but the API would be unnatural for compilers like Sass (which needs to read more files as it encounters @import statements, and thus compiles n input files into 1 output file).

On the other hand, with Broccoli’s design around trees, n:1 compilers like Sass are no problem, while 1:1 compilers like CoffeeScript are an easily expressible sub-case. In fact, we have a Filter base class for such 1:1 compilers to make them very easy to implement.
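As a rough sketch of how a 1:1 compiler fits into a tree-in, tree-out design: a base class walks the entire input tree, transforms files with a matching extension, and copies everything else through, so a subclass only supplies the per-file transform. SimpleFilter, processString, and UppercaseFilter are hypothetical names chosen for this illustration, not the actual interface of Broccoli’s Filter class; the code assumes Node.js 10.12 or later (for withFileTypes and recursive mkdirSync).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical 1:1 filter base class: it still consumes a whole tree and
// produces a whole tree; subclasses only handle one file's contents at a time.
class SimpleFilter {
  constructor(extension, targetExtension) {
    this.extension = extension;             // e.g. 'coffee'
    this.targetExtension = targetExtension; // e.g. 'js'
  }

  // Subclasses override this with the actual per-file compilation.
  processString(contents) {
    throw new Error('subclasses must implement processString');
  }

  // Recursively mirror inputDir into outputDir.
  write(inputDir, outputDir) {
    fs.mkdirSync(outputDir, { recursive: true });
    for (const entry of fs.readdirSync(inputDir, { withFileTypes: true })) {
      const src = path.join(inputDir, entry.name);
      if (entry.isDirectory()) {
        this.write(src, path.join(outputDir, entry.name)); // recurse into subtrees
      } else if (path.extname(entry.name) === '.' + this.extension) {
        const base = path.basename(entry.name, '.' + this.extension);
        const dest = path.join(outputDir, base + '.' + this.targetExtension);
        fs.writeFileSync(dest, this.processString(fs.readFileSync(src, 'utf8')));
      } else {
        fs.copyFileSync(src, path.join(outputDir, entry.name)); // pass through untouched
      }
    }
  }
}

// Toy "compiler" standing in for CoffeeScript.
class UppercaseFilter extends SimpleFilter {
  constructor() { super('txt', 'out'); }
  processString(contents) { return contents.toUpperCase(); }
}

const input = fs.mkdtempSync(path.join(os.tmpdir(), 'in-'));
fs.mkdirSync(path.join(input, 'sub'));
fs.writeFileSync(path.join(input, 'sub', 'hello.txt'), 'hello');
fs.writeFileSync(path.join(input, 'logo.png'), 'not really a png');
const output = fs.mkdtempSync(path.join(os.tmpdir(), 'out-'));
new UppercaseFilter().write(input, output);
// output/sub/hello.out contains 'HELLO'; output/logo.png is copied as-is
```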

3.2. Plugins Just Return New Trees

This one is slightly more subtle: At first, I had designed Broccoli with two primitives: a “tree”, which represents a directory with files, and a chainable “transform”, which takes an input tree and returns a new compiled tree.

This implies that transforms map trees 1:1. Surprisingly, this is not a good abstraction for all compilers. For instance, the Sass compiler has a notion of “load paths” that it searches when it encounters an @import directive. Similarly, JavaScript concatenators like r.js have a “paths” option to search for imported modules. These load paths are ideally represented as a set of “tree” objects.
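To see why load paths amount to several input trees, here is a simplified sketch of load-path lookup: the compiler searches each directory in order and takes the first match, and each of those directories could be the output of a different tree. resolveImport is a hypothetical helper; real Sass resolution also tries partials such as _name.scss and other extensions, which this sketch omits.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Simplified, hypothetical load-path lookup: the first directory containing
// <importName>.scss wins, so the order of loadPaths matters.
function resolveImport(importName, loadPaths) {
  for (const dir of loadPaths) {
    const candidate = path.join(dir, importName + '.scss');
    if (fs.existsSync(candidate)) return candidate;
  }
  throw new Error(`Cannot resolve @import "${importName}" in ${loadPaths.join(', ')}`);
}

// Two "trees": the app's own styles and a vendored library.
const app = fs.mkdtempSync(path.join(os.tmpdir(), 'app-'));
const vendor = fs.mkdtempSync(path.join(os.tmpdir(), 'vendor-'));
fs.writeFileSync(path.join(app, 'mixins.scss'), '// app override');
fs.writeFileSync(path.join(vendor, 'mixins.scss'), '@mixin m {}');
fs.writeFileSync(path.join(vendor, 'reset.scss'), 'body { margin: 0 }');

resolveImport('mixins', [app, vendor]); // app/mixins.scss (first load path wins)
resolveImport('reset', [app, vendor]);  // vendor/reset.scss
```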

As you can see, many real-world compilers actually map n trees into 1 tree. The easiest way to support this is to let plugins deal with their input trees themselves, thereby allowing them to take 0, 1, or n input trees.

But now that we let plugins handle their input trees, we don’t need to know about compilers as first-class objects in Broccoli land anymore. Plugins simply export functions that take zero or more input trees (and perhaps some options), and return an object representing a new tree. For instance:

broccoli.makeTree('lib') // => a tree
compileCoffeeScript(tree) // => a tree
compileSass(tree, {
  loadPaths: [moreTrees, ...]
}) // => a tree
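Here’s a minimal sketch of the "plugins just return new trees" idea, under a deliberately tiny, assumed tree interface: a tree is any object with a read() method that returns the path of a directory holding its files. directoryTree and concatenateTrees are hypothetical stand-ins written for this illustration; they are not Broccoli’s makeTree or any published plugin.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumed tree interface (hypothetical): read() returns a directory path.
function directoryTree(dir) {
  return { read: () => dir }; // a source tree just points at an existing directory
}

// A hypothetical n:1 plugin: it takes any number of input trees plus options
// and returns a new tree. No work happens until someone calls read().
function concatenateTrees(inputTrees, options) {
  return {
    read() {
      const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'concat-'));
      const parts = [];
      for (const tree of inputTrees) {
        const dir = tree.read(); // the plugin reads its own inputs
        for (const name of fs.readdirSync(dir).sort()) {
          if (name.endsWith('.js')) {
            parts.push(fs.readFileSync(path.join(dir, name), 'utf8'));
          }
        }
      }
      fs.writeFileSync(path.join(outDir, options.outputFile), parts.join('\n'));
      return outDir;
    },
  };
}

const a = fs.mkdtempSync(path.join(os.tmpdir(), 'a-'));
const b = fs.mkdtempSync(path.join(os.tmpdir(), 'b-'));
fs.writeFileSync(path.join(a, 'one.js'), 'var one;');
fs.writeFileSync(path.join(b, 'two.js'), 'var two;');
const appJs = concatenateTrees([directoryTree(a), directoryTree(b)], { outputFile: 'app.js' });
const builtDir = appJs.read(); // builtDir/app.js contains 'var one;\nvar two;'
```

Because the returned object has the same shape as its inputs, it can itself be passed to another plugin, which is what makes chaining work without a separate "transform" primitive.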

3.3. The File System Is The API

Remember that because Grunt doesn’t support chaining of plugins, we end up having to manage temporary directories for intermediate build products in our Grunt configurations, making them overly verbose and hard to maintain.

To avoid all this, our first intuition might be to abstract the file system away into an in-memory API, representing trees as collections of streams. Gulp for instance does this. I tried this in an early version of Broccoli, but it turns out to make the code quite complicated: With streams, plugins now have to worry about race conditions and deadlocks. Also, in addition to having a notion of streams and paths, we need file attributes like last-modified time and size in our API. And if we ever need the ability to re-read a file, or seek, or memory-map, or if we need to pass an input tree to another process we’re shelling out to, the stream API fails us and we have to write out the entire tree to the file system first. So much complexity!

But wait. If we’re going to replicate just about every feature of the file system, and in some cases we have to fall back to turning our in-memory representation into an actual tree on the file system and back again, then … why don’t we use the actual file system instead?

Node’s fs module already provides as compact an API to the file system as we could wish for.

The only disadvantage is that we have to manage temporary directories behind the scenes, and clean them up. But that’s easy to do in practice.
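As a sketch of how small that housekeeping can be, the hypothetical TmpDirs class below hands out temporary directories and remembers them, so everything can be deleted in one call when the build or server ends. The class name and the exit-handler wiring are assumptions for illustration; fs.rmSync with the recursive option requires Node.js 14.14 or later.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: creates temporary directories and tracks them for cleanup.
class TmpDirs {
  constructor(prefix = 'build-') {
    this.prefix = prefix;
    this.dirs = [];
  }

  make() {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), this.prefix));
    this.dirs.push(dir);
    return dir;
  }

  // Delete every directory handed out so far.
  cleanup() {
    for (const dir of this.dirs) fs.rmSync(dir, { recursive: true, force: true });
    this.dirs = [];
  }
}

const tmp = new TmpDirs();
const scratch = tmp.make();
fs.writeFileSync(path.join(scratch, 'intermediate.js'), 'var x = 1;');
// cleanup() is synchronous, so it is safe to call from an 'exit' handler.
process.on('exit', () => tmp.cleanup());
```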

People sometimes worry that writing to disk is slower. But even if you hit the actual disk drive (which, thanks to the operating system’s file system cache, is rare), the bandwidth of modern SSDs has become so high compared to CPU speed that the overhead tends to be negligible.

3.4. Caching, Not Partial Rebuilding

When I originally tried to solve the problem of incremental rebuilds, I tried to devise a way to check whether each existing output file is stale, so that Broccoli could trigger the rebuild for a subset of its input files. But this “partial rebuild” approach requires that we are able to trace which files an output file depends on, all the way back to the source files, and it also makes file deletion tricky. “Partial rebuilds” is the classical approach of Make, as well as the Rails asset pipeline, Rake::Pipeline, and Brunch, but I’ve come to believe that it’s unnecessarily complicated.

Broccoli’s approach is much simpler: Ask each plugin to cache its build output as appropriate. When we rebuild, start with a blank slate, and re-run the entire build process. Plugins will be able to provide most of their output from their caches, which takes near-zero time.

Broccoli started off providing some caching primitives, but it turned out to be unnecessary to have them in the core API. Now we just make sure that the general architecture doesn’t stand in the way of caching.

For plugins that map files 1:1, like the CoffeeScript compiler, we can use common caching code (provided by the broccoli-filter package), leaving the plugin code looking very simple. Plugins that map files n:1, like Sass, need to be more careful about invalidating their caches, so they need to provide custom caching logic. I assume that we’ll still be able to extract some common caching logic in the future.
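Here’s a minimal sketch of the "rebuild everything, but from cache" idea for a 1:1 compiler: every rebuild processes the full set of input files, yet only files whose contents changed reach the expensive compile step. CachingCompiler is a hypothetical name, and keying the cache on a SHA-1 hash of the contents is an assumption (comparing modification time and size is a cheaper alternative); this is not how broccoli-filter is necessarily implemented. Because the output of a 1:1 compiler depends only on the file’s own contents, the key can ignore the path; an n:1 compiler like Sass could not get away with that. The cache also never evicts entries, which a real plugin would need to handle.

```javascript
const crypto = require('crypto');

// Hypothetical per-plugin cache for a 1:1 compiler.
class CachingCompiler {
  constructor(compile) {
    this.compile = compile;  // the expensive per-file function
    this.cache = new Map();  // content hash -> compiled output
    this.compileCount = 0;   // for demonstration only
  }

  // files: { relativePath: sourceString }. Always processes every file.
  build(files) {
    const output = {};
    for (const [relPath, source] of Object.entries(files)) {
      const key = crypto.createHash('sha1').update(source).digest('hex');
      if (!this.cache.has(key)) {
        this.cache.set(key, this.compile(source)); // cache miss: compile
        this.compileCount++;
      }
      output[relPath] = this.cache.get(key);
    }
    return output;
  }
}

const compiler = new CachingCompiler((source) => source.toUpperCase());
const first = compiler.build({ 'a.coffee': 'a = 1', 'b.coffee': 'b = 2' });  // compiles both
const second = compiler.build({ 'a.coffee': 'a = 1', 'b.coffee': 'b = 3' }); // only b.coffee is recompiled
// compiler.compileCount is now 3
```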

3.5. No Parallelism

If we all suffer from slow builds, should we try to parallelize builds, compiling multiple files in parallel?

My answer is no, because parallelism makes race conditions in plugins possible, and you might not notice them until deploy time. These are the worst kinds of bugs, and avoiding parallel execution eliminates this entire class of bugs.

Moreover, Amdahl’s law limits how much performance we can gain through parallelizing. For a simplified example, say our build process takes 16 seconds in total. Let’s say 50% of it can be parallelized, and the rest needs to run in sequence (e.g. CoffeeScript-then-concatenate-then-UglifyJS). If we run this on a 4-core machine, the build would take 8 seconds for the sequential part plus 8 / 4 = 2 seconds for the parallel part, for a total of 10 seconds: a reduction in build time of only 37.5%.
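The arithmetic in the 16-second example is Amdahl’s law. As a small sketch (amdahlBuildTime is just an illustrative name), the function below reproduces that example and also shows the ceiling: with the sequential half fixed at 8 seconds, no number of cores can bring this build below 8 seconds.

```javascript
// Amdahl's law: only the parallelizable fraction of the build benefits from
// more cores; the sequential fraction always takes the same time.
function amdahlBuildTime(totalSeconds, parallelFraction, cores) {
  const sequential = totalSeconds * (1 - parallelFraction);
  const parallel = (totalSeconds * parallelFraction) / cores;
  return sequential + parallel;
}

const t = amdahlBuildTime(16, 0.5, 4);             // 8 + 2 = 10 seconds
const reduction = 1 - t / 16;                      // 0.375, i.e. 37.5% less time
const limit = amdahlBuildTime(16, 0.5, Infinity);  // 8: the sequential part remains
```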

For incremental rebuilds, which constitute the hot path that we really care about, caching tends to eliminate most of the parallelizable parts of the build process anyway, so we are left with little to no performance gain.

Because of that, in general I believe that parallelizing the build process is not a good trade. In principle you could write a Broccoli plugin that performs some work in a parallel fashion. However, Broccoli’s primitives, as well as the helper code that I’ve published on GitHub, actively encourage deterministic sequential code patterns.

4. Background / Larger Vision

There are two main motivators that made me tackle writing a good build tool.

The first motivator is better productivity, through fast incremental rebuilds.

I generally believe that developer productivity is largely determined by the quality of the libraries and tools we use. The “edit file, reload browser” cycle that we perform hundreds of times a day is probably the core feedback loop when we program. A great way to improve our tooling is getting this edit-reload feedback loop to be as fast as humanly possible.

The second motivator is encouraging an ecosystem of front-end packages.

I believe that Bower and the ES6 module system will help us build a great ecosystem, but Bower by itself is useless unless you have a build tool running on top. This is because Bower is a content-agnostic transport tool that only dumps all your dependencies (and their dependencies, recursively) into the file system—it’s up to you what to do with them. Broccoli aims to become the missing build tool sitting on top.

Note that Broccoli itself is agnostic about Bower or ES6 modules; you can use it for whatever you like. (I am aware there are other stacks, like npm + browserify, or npm + r.js.) I will discuss all of this in more detail in a future blog post.

5. Comparison With Other Build Tools

If you are almost convinced but also wondering how other build tools stack up against Broccoli, let me tell you why I wrote Broccoli instead of using any of the following:

Grunt is a task runner, and it never set out to be a build tool. If you try to (ab)use it as a build tool, you quickly find that because it doesn’t attempt to handle chaining (composition), you end up having to manage temporary directories for intermediate build products yourself, adding a lot of complexity to your Grunt configuration. It also does not support reliable incremental rebuilds, so your rebuilds will tend to be slow and/or unreliable; see section “Fast Rebuilds” above.

That said, Grunt’s utility as a task runner is in providing a cross-platform way to run shell-script type functionality, such as deploying your app or generating scaffolding. Broccoli will be able to act as a Grunt plugin in the future, so that you can call it from your Gruntfile.

Gulp tries to solve the problem of chaining plugins, but in my view it gets the architecture wrong: Rather than passing around trees, it passes around sequences (= event streams) of files (= streams or buffers). This works fine for cases where one input file maps into one output file. But when a plugin needs to follow import statements, and thus needs to access input files out of order, things get complicated. For now, plugins that follow import statements tend to just bypass the build tool and read directly from the file system. I hear that in the future there will be helper libraries to turn all the streams into a (virtual) file system and pass that to the compiler. I would claim though that all this complexity is a symptom of an impedance mismatch between the build tool and the compiler. See “Trees, Not Files” above for more on this. I’m also not convinced that abstracting away files behind a stream or buffer API is helpful at all; see “The File System Is The API” above.

Brunch, like Gulp, uses a file-based (not tree-based) in-memory API (see this method signature). Like with Gulp, plugins end up falling back to bypassing the build tool when they need to read more than one file. Brunch also tries to do partial rebuilding rather than caching; see section “Caching, Not Partial Rebuilding” above.

Rake::Pipeline is written in Ruby, which is less ubiquitous than Node in front-end land. It tries to do partial rebuilds as well. Yehuda says it’s not heavily maintained anymore, and that he’s betting on Broccoli.

The Rails asset pipeline uses partial rebuilds as well, and it uses very different code paths for development mode and production (precompilation) mode, which causes unexpected issues when people deploy. More importantly, it’s tied to Rails as a backend.

6. What’s Next

The list of plugins is still small. If they are enough for you, I cautiously recommend giving Broccoli a try right now: https://github.com/joliss/broccoli#installation

I would like to see other people get involved in writing plugins. Wrapping compilers is easy, but the hard and important part is getting caching and performance right. We’ll also want to work on generalizing more caching patterns in addition to broccoli-filter, so that plugins don’t suffer from excessive boilerplate.

Over the next week or two, my plan is to improve the documentation and clean up the code base of Broccoli core and the plugins. We will also have to add a test suite to Broccoli core, and figure out an elegant way to integration-test Broccoli plugins against Broccoli core. Another thing that’s missing with the existing plugins is source map support. This is slightly complicated by performance considerations, as well as the fact that chained plugins need to consume other plugins’ source maps and interoperate properly, so I haven’t found the time to tackle this yet.

Broccoli will see active use in the Ember ecosystem, powering the default stack emitted by ember-cli (an upcoming tool similar in functionality to the rails command). We are also hoping to move the build process used for generating the Ember core and ember-data distributions from Rake::Pipeline and Grunt to Broccoli.

That said, I would love to see Broccoli adopted outside the Ember community as well. JS MVC applications written with frameworks like Angular or Backbone, as well as JavaScript and CSS libraries that require build steps, are all prime candidates for being built by Broccoli.

I don’t currently see any major roadblocks on the path to Broccoli becoming stable. By using it for real-world build scenarios, we should gain confidence in its API, and I’m hoping that we can bump the version to 1.0.0 within a few months’ time.

This blog post is the first comprehensive explanation of Broccoli’s architecture, and the documentation is still somewhat sparse. I’m happy to help you get started, and fix any bugs you encounter. Come find me on #broccolijs on Freenode, or at joliss42@gmail.com on Google Talk. I’ll also respond to any issues you post on GitHub.

Thanks to Jonas Nicklas, Josef Brandl, Paul Miller, Erik Bryn, Yehuda Katz, Jeff Felchner, Chris Willard, Joe Fiorini, Luke Melia, Andrew Davey, and Alex Matchneer for reading and critiquing drafts of this post.
