From Make to Ant to Maven

Sep 5, 2011

I've been using Maven now for quite awhile, having migrated off of Ant in favor of it for its superior dependency management. However, although I've managed to get the hang of it now, I initially found it pretty frustrating - Maven defines a lot of default behavior implicitly, and if you don't know what's going on under the hood, Maven has a nasty tendency to surprise you.

Maven is a build system optimized for building Java-based software projects. Java itself, as a Sun project, has its roots in Unix. In 1991, while working for Sun, James Gosling began developing a language for embedded set-top television boxes which he initially called "Oak". All of this happened before Linux dominated the Unix world, when Sun's flagship product was their Solaris operating system and the custom hardware that it ran on. The Unix roots of Oak - which was later renamed Java - are still evident in the Java Development Kit that you must download in order to do any Java programming. The Unix world was and is dominated by command-line interfaces (the best kind!), where writing a program typically consists of first editing the source code and then running a compiler program to generate an executable binary.

That much hasn't changed - to write a (compiled) program, you must edit its code and then run a compiler. Modern IDEs tend to shield us from that process, but it's still going on behind the scenes. IDE users get instant feedback when they make a typographical error; when using command-line-only tools, Java and C programmers generally don't realize they made a mistake until they invoke the compiler. This creates a feedback loop where the developer opens an editor containing the code, adds some functionality, runs the associated compiler, opens the editor again to fix any errors, compiles again, etc. until there are no more errors. In Java, this command-line tool - which was the only way to compile Java for many years - is called javac. You invoke it like this:

javac YourClassFile.java
Absent any other parameters or syntax errors, this will generate a .class file named YourClassFile.class which can then be executed using the java command-line tool, which starts the Java Virtual Machine and interprets the bytecode in the .class file. On the other hand, if there are any syntax errors, they will be listed on the command console, and no .class file will be created.

In any nontrivial application, though, there are going to be lots of source code files, and not all of them are going to be changed for each new feature. In fact, in most applications, it's rare for any single modification to touch even most of the source code files. As such, it's an awful time drain for the developer to recompile all of the potentially hundreds or even thousands of source code files in his project just to regenerate what may work out to be a single-line change.

On the other hand, it's a bookkeeping nightmare for the developer to keep track of exactly which files he changed - although you don't normally change all of them, it's reasonable to change five or six interrelated source code files at a time. This was a problem pre-Java for C programmers. C compilation was similar to Java compilation - you would edit a .c file, compile it to a corresponding .o file, and finally "link" all of the .o files together into an executable. You didn't typically change all of them at one time, so you wanted to just recompile the ones you did change. To keep track of all of this automatically, the Make program was developed by Stuart Feldman in 1977.

The job of the Make program was (conceptually) simple - compare the timestamps of all of the .o files with their corresponding .c files and recompile only those .c files which are newer than their .o files. Make actually took this a step further and allowed its user to declare any number of arbitrary dependencies - a Makefile could be created which declared, for example, that the coolapp executable was dependent on cool.o, ice.o, snow.o, and refrigerator.o. Each of these, in turn, could be declared to be dependent on a .c and a .h file (following the C convention of separating source files from declarations in header files). Such a simple Makefile might look like this:

coolapp: cool.o ice.o snow.o refrigerator.o
	ld cool.o ice.o snow.o refrigerator.o -o coolapp

cool.o: cool.c cool.h
	cc -c cool.c

ice.o: ice.c ice.h
	cc -c ice.c

...
The Make utility would first check to see if any of the independent files cool.o, ice.o, snow.o or refrigerator.o was newer than coolapp (or if coolapp didn't exist at all) and if so, run the specified ld command which links .o files together. However, it would first check to see if cool.c or cool.h was newer than cool.o and if so, invoke the cc command which compiles C source code. If this happened, then cool.o would have become newer than coolapp, which would then need to be built.

Each of these declarations is referred to as a Make rule. You might notice here that it would become tedious to list all of your source files, one by one, with their dependencies - to ease this burden a bit, Make allows you to declare implicit rules that say that any file named x.o depends on a source file named x.c. In fact, there are quite a few such implicit rules defined by Make "out of the box", to the point that it can actually compile some fairly complex C applications with very little direction.
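In GNU Make, the implicit-rule mechanism takes the form of a pattern rule. Here's a sketch that collapses the per-file rules of the earlier Makefile into one (same hypothetical file names):

```make
# Pattern rule sketch: any x.o depends on the matching x.c (and, here, x.h).
CC = cc
OBJS = cool.o ice.o snow.o refrigerator.o

coolapp: $(OBJS)
	$(CC) $(OBJS) -o coolapp

%.o: %.c %.h
	$(CC) -c $< -o $@
```

The % matches any stem, $< expands to the first prerequisite (the .c file) and $@ to the target, so a single rule covers all four object files.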

Make handles all of this beautifully and brilliantly. There's a lot more to it, but my reason for discussing it here is that early Java developers tried to use Make to handle the same process in Java. Unfortunately, they ran into an immediate snag, related to Java's concept of the classpath.

In C, source files don't belong to an inherent namespace. Conscientious developers, of course, split up their source files into logical groupings and kept them in separate directories, each with its own Makefile. However, Make was highly optimized for the common case where the dependent and the independent file always shared a directory. In Java, this is almost never the case. All Java classes should be declared in a package like this:

package com.mycompany.coolapp;
The javac compiler will always generate the destination class file under a new directory hierarchy such as com/mycompany/coolapp/Something.class. And, although not strictly required by the language or the compiler, the Java coding convention is to keep the source files themselves in a similar parallel directory structure. Unfortunately, in 1995 when Java was new, there was no way to declare an implicit Make rule stating that all of the .class files in the target/com/mycompany/coolapp subdirectory depend on the corresponding .java files in the src/com/mycompany/coolapp subdirectory. You could (and some of us did) get around this by listing out each source and class file one by one in the Makefile, but then there was the problem of making sure to add new files into your Make rules...
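The workaround looked something like this - a sketch with hypothetical file names, one explicit rule per class, and a new rule needed for every file you add:

```make
# Hypothetical per-file Java rules - every new source file meant another rule.
all: target/com/mycompany/coolapp/Main.class \
     target/com/mycompany/coolapp/Helper.class

target/com/mycompany/coolapp/Main.class: src/com/mycompany/coolapp/Main.java
	javac -d target -sourcepath src src/com/mycompany/coolapp/Main.java

target/com/mycompany/coolapp/Helper.class: src/com/mycompany/coolapp/Helper.java
	javac -d target -sourcepath src src/com/mycompany/coolapp/Helper.java
```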

To make matters worse, Java classes depend on one another - so a change to class A might cause a compilation error in class B. However, if you're just recompiling the files that have changed, B won't be recompiled, and you won't realize you made a mistake until you run the application and get a NoSuchMethodError or NoSuchFieldError. In the C programming language, the way to make two source files dependent on one another is to include one's header in the other. In this way, a change to the internals of one doesn't trigger a recompilation of the other, but a change to its public interface (declared in the header file) does. However, Java has no concept of header files - the equivalent in Java would be to keep track of which source files are imported by a given one and check whether their overall signatures have changed.

Make can't do any of this - in spite of its genericity and its applicability to domains that it wasn't designed for (in some cases, domains that didn't even exist when it was first designed), it's optimized for file-based changes. Yet the alternative - recompile everything every time anything changes - is even worse. So, a lot of us limped along for quite a few years putting together makeshift Make files (there's a pun in there, I just know it) until James Duncan Davidson came up with "Another Neat Tool" that he named Ant.

Ant was highly optimized for Java. You could just tell Ant "my source files are here, I want my class files there", and Ant would figure out which class files needed to be recompiled, recompile them, and return. Ant, just like Make, is "rule-based". Its rules are specified in XML rather than Make's terse text format, but it is conceptually very similar, with some extra intelligence for handling classpaths, packages, and some of Java's internals.

Everybody migrated to Ant as soon as it came along. However, although Ant did an admirable job of automating builds for Java-based projects, there were some chinks in its armor. Most notably, there was no built-in support for dependency management. It's typical - in fact, I can't imagine a counterexample - for any software to depend on more than a few third-party libraries. So, if you want your program to compile or run, you have to include the paths to those third party libraries in the compile and run classpath, and you have to make those libraries available to anybody who might want or need to build your software.

In Ant's defense, this was a weakness of Make as well - but as Java progressed and matured, this was a much bigger problem for Java developers than it ever was for C programmers since simple things like logging required third-party add-ons, which themselves required other third party add-ons. This wasn't so much unique to Java as it was to the Java "culture", and it was a major hassle for Java developers - especially ones who wanted to create repeatable build processes.

Enter Maven. Although you wouldn't be able to tell from looking at its official documentation, the real core strength of Maven - and what separates it from Make and Ant - is its repository of library files. Maven assigns every single Java library (i.e. "jar") in the world a unique identifier. Each library is given a groupId and an artifactId. In your build file, rather than identifying paths to each library, you identify their groupId's and artifactId's and Maven will download them from a central repository if it can't find them locally. The idea behind Maven is that it's more declarative than imperative - you tell Maven what you want done, and Maven "figures out" the best way to do it. This is in stark contrast to the approach taken by Make and Ant, where you tell them what to do, step by step. As you'll see, though, Maven's declarative nature can be both a blessing and a curse.

The declaration of what you want done is called a "project object model" or POM in Maven. In the POM, you don't actually tell Maven what to do, but instead instruct its various "plugins" where to look to find their inputs and place their outputs. The most fundamental plugin, of course, is the Java compiler plugin that actually compiles Java code.

Given a typical Java project structure where the source files are located under a subdirectory named src/com/mycompany/coolapp, getting Maven to compile your source files into a new directory named classes would be described by the following project object model:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
    http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany.coolapp</groupId>
  <artifactId>MyApp</artifactId>
  <version>1.0</version>

  <build>
    <sourceDirectory>src</sourceDirectory>
    <outputDirectory>classes</outputDirectory>
  </build>
</project>
This starts with, of course, the overly verbose XML header stuff that is standard with XML these days. The next few lines are the standard Maven headers - you must declare a model version (4.0.0 in this case) so that Maven knows how to interpret the rest of the file, and every project must be assigned a groupId, an artifactId and a version.

However, notice the end of the file - the actual compiler instructions. Unlike Ant, where you would declare, for instance:
<target name="compile">
  <javac srcdir="src" destdir="classes" includes="**/*.java" />
</target>
describing explicitly which task to invoke and how it's configured, Maven assumes that if you have Java source files, you want to compile them using javac, and just needs to know where they are.

In fact, you don't even have to do this, if you don't mind adopting a few (semi-restrictive) naming conventions. If you don't tell Maven where to find the source code files, it will look under src/main/java (relative to the pom.xml file). If you don't tell it where you want the output, it will put it under target/classes. So, if you're willing to restructure your source and target directories, you can get away with a very minimal pom file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
    http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany.coolapp</groupId>
  <artifactId>MyApp</artifactId>
  <version>1.0</version>
</project>
(And, incidentally, you may as well make your peace with restructuring your source and target directories right now if you're going to be using Maven... trying to follow any convention besides the "standard" one is going to be painful. Just put your sources under src/main/java).
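For reference, the "standard" layout that the minimal POM above assumes looks like this (App.java is a hypothetical source file):

```
MyApp/
    pom.xml
    src/
        main/
            java/
                com/mycompany/coolapp/App.java
        test/
            java/
    target/            (generated by Maven)
        classes/
```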

Now, if you've been coding in Java for any length of time, you're probably wondering how to set, for example, the compiler version, enable debugging, optimizations, etc. Although you control the source and target directories under the "build" element of the POM, it's actually the Maven compiler plugin that manages the arguments passed to the Java compiler itself. To change its configuration within a single project, you have to include the fairly verbose:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.3.2</version>
      <configuration>
        <source>1.5</source>
        <target>1.5</target>
        <debug>false</debug>
        <optimize>true</optimize>
      </configuration>
    </plugin>
  </plugins>
</build>
Notice that the compiler plugin has a groupId, artifactId and a version, just like your project. EVERYTHING in Maven is identified by these triples, and they're always unique (figuring out what they are can sometimes be challenging, though). Once you've identified the plugin itself by way of a groupId/artifactId/version triple, you can include its configuration, which will vary from one plugin to another. Here, I've shown some of the more useful options for the compiler plugin - you can find them all in the compiler plugin's documentation.

Ok, so far so good. You've got your source files in the "right" place, you've described your project object model (POM), and you've configured your Java compiler. How do you actually run this thing? If you try to invoke mvn from the command line, you'll get the semi-helpful error message:

$ mvn
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.108s
[INFO] Finished at: Tue Aug 30 15:57:28 CDT 2011
[INFO] Final Memory: 2M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] No goals have been specified for this build. You must specify a valid 
lifecycle phase or a goal in the format <plugin-prefix>:<goal> or 
<plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available 
lifecycle phases are: validate, initialize, generate-sources, process-sources, 
generate-resources, process-resources, compile, process-classes, 
generate-test-sources, process-test-sources, generate-test-resources, 
process-test-resources, test-compile, process-test-classes, test, 
prepare-package, package, pre-integration-test, integration-test, 
post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, 
pre-site, site, post-site, site-deploy. -> [Help 1]

This illustrates another "Mavenism". Unlike Make and Ant, which are by nature imperative, Maven makes a lot of assumptions about what you're trying to do - that is, build a software project. In general, this involves a series of stages. For example:

  1. compile the sources
  2. compile the unit tests
  3. run the unit tests
  4. package up the distributable
  5. deploy it somewhere
If any stage fails, the whole process should be aborted. Most Ant scripts define a process like this one - each step is a "target" and each target "depends" on the previous one. Maven makes this process implicit and goes ahead and defines one for you. You have the option of creating a different one, or adding steps, but you'll find that in 99% of all cases, Maven's process is more than sufficient.

Maven calls each of these stages a "phase"; if you want to run Maven, then you have to give it a phase to reach. One useful phase is the "compile" phase:

$ mvn compile
What this tells Maven to do is to execute every plugin that's "bound" to the compile phase and every phase that appears before the compile phase. As you can probably guess, the compiler plugin is bound to the compile phase by default, but you have the option of binding other plugins to the compile phase if you'd like. Maven will look to your POM to figure out how to invoke the compiler plugin.
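Binding another plugin to the compile phase is done with an executions element. Here's a sketch using the antrun plugin (the version number and echo message are just placeholders for illustration):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <version>1.6</version>
  <executions>
    <execution>
      <!-- Attach this plugin's "run" goal to the compile phase -->
      <phase>compile</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <target>
          <echo message="running during the compile phase" />
        </target>
      </configuration>
    </execution>
  </executions>
</plugin>
```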

Another handy phase is the "package" phase. If you run mvn compile, you'll see that, under the target directory, there's a classes subdirectory containing the compiled code. If you run, on the other hand, mvn package, you'll see a couple more subdirectories:

$ ls target
MyApp-1.0.jar  classes        maven-archiver surefire

By telling Maven to achieve the "package" phase, you told it to first run through the compile phase, executing every plugin bound to it, and then move on to the package phase, running every plugin bound to that. As it turns out, there are a couple of plugins bound to the package phase. The most important of these is the jar plugin, which jars up the contents of your target/classes directory for redistribution.

Strictly speaking, you don't bind plugins to phases, but instead you bind goals to phases, and plugins have goals. So, to really understand Maven, you need to be familiar with three key terms: phases, plugins and goals. Plugins provide functionality by exposing goals, and goals are bound to lifecycle phases, which execute sequentially.

You can see which lifecycle phases exist and which goals are bound to which phases using the "help:describe" goal.

$ mvn help:describe -Dcmd=install
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building MyApp 1.0
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-help-plugin:2.1.1:describe (default-cli) @ MyApp ---
[INFO] 'install' is a phase corresponding to this plugin:
org.apache.maven.plugins:maven-install-plugin:2.3.1:install

It is a part of the lifecycle for the POM packaging 'jar'. This lifecycle 
includes the following phases:
* validate: Not defined
* initialize: Not defined
* generate-sources: Not defined
* process-sources: Not defined
* generate-resources: Not defined
* process-resources: org.apache.maven.plugins:maven-resources-plugin:2.4.3:resources
* compile: org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile
* process-classes: Not defined
* generate-test-sources: Not defined
* process-test-sources: Not defined
* generate-test-resources: Not defined
* process-test-resources: org.apache.maven.plugins:maven-resources-plugin:2.4.3:testResources
* test-compile: org.apache.maven.plugins:maven-compiler-plugin:2.3.2:testCompile
* process-test-classes: Not defined
* test: org.apache.maven.plugins:maven-surefire-plugin:2.7.2:test
* prepare-package: Not defined
* package: org.apache.maven.plugins:maven-jar-plugin:2.3.1:jar
* pre-integration-test: Not defined
* integration-test: Not defined
* post-integration-test: Not defined
* verify: Not defined
* install: org.apache.maven.plugins:maven-install-plugin:2.3.1:install
* deploy: org.apache.maven.plugins:maven-deploy-plugin:2.5:deploy

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.062s
[INFO] Finished at: Thu Sep 01 18:39:13 CDT 2011
[INFO] Final Memory: 5M/81M
[INFO] ------------------------------------------------------------------------

An equivalent Ant file might look like this:

<project name="CoolApp">
  <target name="validate">
  </target>

  <target name="initialize" depends="validate">
  </target>

  <target name="generate-sources" depends="initialize">
  </target>

  <target name="process-sources" depends="generate-sources">
  </target>

  <target name="generate-resources" depends="process-sources">
  </target>

  <target name="process-resources" depends="generate-resources">
    <copy todir="target/classes" filtering="true">
      <fileset dir="src/main/resources" />
    </copy>
  </target>

  <target name="compile" depends="process-resources">
    <javac srcdir="src/main/java" destdir="target/classes" includes="**/*.java" />
  </target>

  <target name="process-classes" depends="compile">
  </target>

  <target name="generate-test-sources" depends="process-classes">
  </target>

  <target name="process-test-resources" depends="generate-test-sources">
    <copy todir="target/test-classes" filtering="true">
      <fileset dir="src/test/resources" />
    </copy>
  </target>

  <target name="test-compile" depends="process-test-resources">
    <javac srcdir="src/test/java" destdir="target/test-classes" includes="**/*.java" />
  </target>

  <target name="process-test-classes" depends="test-compile">
  </target>

  <target name="test" depends="process-test-classes">
    <junit>
      <classpath>
        <pathelement location="target/test-classes" />
      </classpath>
      <formatter type="xml" />
      <batchtest todir="target/surefire-reports">
        <fileset dir="src/test/java" includes="**/*.java" />
      </batchtest>
    </junit>
  </target>

  <target name="prepare-package" depends="test">
  </target>

  <target name="package" depends="prepare-package">
    <jar basedir="target/classes" jarfile="${projectname}.jar" />
  </target>

  ...
</project>
Maven just makes this common lifecycle implicit is all.

So... what is the lifecycle anyway? Well, if you don't specify one, Maven uses the default lifecycle - you can also specify a different one; Maven defines two others: clean and site. This means that when you invoke Maven, the argument you pass is either a lifecycle, a lifecycle phase, or a specific goal. There's actually no way to tell which one you're specifying just by looking; you just have to keep track.

You may be surprised, if you've got any familiarity with Maven, that I've talked so long about Maven without mentioning dependency management - the feature that brought so many people over to Maven in the first place! If you're an Ant user, you must be wondering by now how to set the classpath. Maven's chief strength is that it manages your classpath for you. You configure the classpath by declaring dependencies. Each possible Maven dependency is associated with a groupId and an artifactId - if you want to incorporate, say, Spring, into your application, you add a dependencies section before the build section:

...
  <dependencies>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-context</artifactId>
      <version>2.5.6</version>
    </dependency>
  </dependencies>
...

What happens now is that Maven will search your local hard drive for a file named ${home}/.m2/repository/org/springframework/spring-context/2.5.6/spring-context-2.5.6.jar. In general, it's looking for a file named:

${home}/.m2/repository/${groupId}/${artifactId}/${version}/${artifactId}-${version}.jar
after replacing all of the '.'s in ${groupId} with path separators ('/').
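The mapping is mechanical enough to sketch in a few lines of Java. This is just an illustration of the path-building convention described above, not Maven's actual code:

```java
// Illustration only: build the local-repository path for a Maven coordinate.
public class RepoPath {
    static String toPath(String groupId, String artifactId, String version) {
        // The '.'s in the groupId become directory separators
        return groupId.replace('.', '/')
                + "/" + artifactId
                + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        System.out.println(toPath("org.springframework", "spring-context", "2.5.6"));
        // org/springframework/spring-context/2.5.6/spring-context-2.5.6.jar
    }
}
```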

If it finds this file, it will be added to the classpath dynamically when the compiler is run. If it doesn't find this file, it will go look for http://repo1.maven.org/maven2/org/springframework/spring-context/2.5.6/spring-context-2.5.6.jar, download that file and place it in the aforementioned location, and then include it in the classpath.

In this way, you can declare all of your dependencies, and Maven will resolve them from its surprisingly comprehensive central repository when you invoke your build. All of these dependencies are versioned, so you can be assured that if you share your source code and POM with somebody else, they'll get the same build that you intended, without having to ship all of the dependencies.

I often find myself wanting to see the actual classpath that was used for one reason or another - there's a handy Maven command for that:

mvn dependency:build-classpath
that will show you what classpath it used to perform the build.

Very often, you'll depend on a third-party library that itself depends on other libraries. Prior to Maven 2, you as the developer just had to keep track of the dependencies of your dependencies (and their dependencies, and so on), and make sure to include them all in your POM. Maven 2 introduced the concept of "transitive" dependencies - a dependency in the central repository could actually declare its own dependencies, which Maven would download for you at build time. Further, you can declare certain dependencies to be needed at compile time, some to be needed at test time only (for example, mock jars), and some to be needed at run time (if you're packaging up a .war, for instance). The only downside of transitive dependencies is that you sometimes want to build with a different version of the same .jar that a dependency declares transitively — although Maven's pretty good about recognizing and resolving this, you can disable transitive dependencies on a case-by-case basis.
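Disabling a transitive dependency on a case-by-case basis is done with an exclusions element inside the dependency declaration - a sketch, assuming you want to supply your own logging jar in place of the one spring-context pulls in:

```xml
<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-context</artifactId>
  <version>2.5.6</version>
  <exclusions>
    <!-- Don't pull in this transitive dependency; we'll declare our own version -->
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```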

So in the age of IDEs, are these build scripts still necessary? Well, they are if you want any sort of build automation or test automation. In fact, Eclipse now integrates seamlessly with Maven and can generate a POM for you from your Eclipse project file, and resolve dependencies directly from your local repository cache.

What about Maven vs. Ant? Well, the jury may still be out on that one. Although I find myself spending a lot of time looking up the Ant documentation to figure out if I'm supposed to type srcdir or src or what the classpath syntax is, I have the opposite problem with Maven - I have to go back and look at the documentation to figure out what it's doing on my behalf. Maven has the central repository concept going for it, but with Ant's Ivy plugin available now, that may not be a "competitive" advantage. On the whole, though, I lean toward Maven, especially when I'm working with other people.

There's quite a bit more to Maven; it's well documented, once you have an idea how the whole "build lifecycle" and its associated plugins work. Hopefully this article helped you get a general sense of Maven so that the official documentation makes more sense. If you want to continue with Maven from here, you should definitely check out the site lifecycle, the deploy phase, and the snapshot versioning - but I'll leave all that for the Maven people to explain.

Joshua Davies