From Make to Ant to Maven
Sep 5, 2011
I've been using Maven now for quite a while, having migrated off of Ant for its superior dependency management. However, although I've managed to get the hang of it now, I initially found it pretty frustrating - Maven defines a lot of default behavior implicitly, and if you don't know what's going on under the hood, Maven has a nasty tendency to surprise you.
Maven is a build system optimized for building Java-based software projects. Owing to its origins as a Sun project, Java itself has its roots in Unix. In 1991, while working for Sun, James Gosling was developing a language for embedded set-top television boxes which he initially called "Oak". All of this happened before Linux came to dominate the Unix world, when Sun's flagship product was its Solaris operating system and the custom hardware that it ran on. The Unix roots of Oak - which was later renamed Java - are still evident in the Java Development Kit that you must download in order to do any Java programming. The Unix world was and is dominated by command-line interfaces (the best kind!), where writing a program typically consists of first editing the source code and then running a compiler program to generate an executable binary.
That much hasn't changed - to write a (compiled) program, you must edit its code and
then run a compiler. Modern IDEs tend to shield us from that process, but it's still going on behind the scenes.
In an IDE, even Java and C programmers get instant feedback when they make a typographical error - when using command-line-only tools, you generally don't realize that you made a mistake until you invoke the compiler. This creates a feedback loop where the developer opens an editor containing the code, adds some functionality, runs the associated compiler, opens the editor again to fix any errors, compiles again, etc., until there are no more errors. In Java, this command-line tool - which was the only way to compile Java for many years - is called javac. You invoke it like this:
javac YourClassFile.java
Absent any other parameters or syntax errors, this will generate a .class file named YourClassFile.class, which can then be executed using the java command-line tool, which starts up a JVM and interprets the byte code in the .class file. On the other hand, if there are any syntax errors, they will be listed on the command console, and no .class file will be created.
In any nontrivial application, though, there are going to be lots of source code files, and not all of them are going to be changed for each new feature. In fact, in most applications, it's rare for any single modification to touch even most of the source code files. As such, it's an awful time drain for the developer to recompile all of the potentially hundreds or even thousands of source code files in his project just to regenerate what may work out to be a single-line change.
On the other hand, it's a bookkeeping nightmare for the developer to keep track of exactly which files he changed - although you don't normally change all of them, it's reasonable to change five or six interrelated source code files at a time. This was a problem for C programmers long before Java. C compilation was similar to Java compilation - you would edit a .c file, compile it to a corresponding .o file, and finally "link" all of the .o files together into an executable. You didn't typically change all of them at one time, so you wanted to recompile just the ones you did change. To keep track of all of this automatically, the Make program was developed by Stuart Feldman in 1977.
The job of the Make program was (conceptually) simple - compare the timestamps of all of the .o files with their corresponding .c files and recompile only those .c files which are newer than their .o files. Make actually took this a step further and allowed its user to declare any number of arbitrary dependencies - a Makefile could be created which would declare, for example, that the coolapp executable was dependent on cool.o, ice.o, snow.o, and refrigerator.o. Each of these, in turn, could be declared to be dependent on a .c and a .h file (following the C convention of separating source files from declarations in header files). Such a simple Makefile might look like this:
coolapp: cool.o ice.o snow.o refrigerator.o
	ld cool.o ice.o snow.o refrigerator.o -o coolapp

cool.o: cool.c cool.h
	cc -c cool.c

ice.o: ice.c ice.h
	cc -c ice.c

...
The Make utility would first check to see if any of the prerequisite files cool.o, ice.o, snow.o or refrigerator.o was newer than coolapp (or if coolapp didn't exist at all) and, if so, run the specified ld command which links .o files together. Before doing that, however, it would check to see if cool.c or cool.h was newer than cool.o and, if so, invoke the cc command which compiles C source code. If this happened, then cool.o would have become newer than coolapp, which would then need to be rebuilt.
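The decision procedure at the heart of Make can be sketched in a few lines of Python. This is a simplification, of course - real Make also handles implicit rules, phony targets, variables, and much more - but the timestamp comparison and the recursion through the dependency graph are the essence of it:

```python
import os

def needs_rebuild(target, prerequisites):
    """A target must be rebuilt if it doesn't exist yet, or if any
    prerequisite file carries a newer modification timestamp."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)

def update(target, rules, build):
    """Bring a target up to date: first recurse into each prerequisite's
    own rule (e.g. cool.o depends on cool.c and cool.h), then rebuild the
    target itself if any prerequisite ended up newer than it."""
    prereqs = rules.get(target, [])
    for p in prereqs:
        update(p, rules, build)
    if prereqs and needs_rebuild(target, prereqs):
        build(target)  # e.g. run ld for coolapp, cc for each .o
```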
Each of these declarations is referred to as a Make rule. You might notice here that it would become tedious to list all of your source files, one by one, with their dependencies - to ease this burden a bit, Make allows you to declare implicit rules that say that any file named x.o depends on a source file named x.c. In fact, there are quite a few such implicit rules defined by Make "out of the box", to the point that it can actually compile some fairly complex C applications with very little direction.
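In modern GNU Make, such an implicit rule is written as a pattern rule. This sketch says that any .o file depends on the .c file of the same stem, and gives the recipe for rebuilding it (GNU Make actually ships a built-in rule much like this one):

```make
# Any x.o depends on the corresponding x.c.
# $< expands to the source file, $@ to the target.
%.o: %.c
	$(CC) -c $< -o $@
```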
Make
handles all of this beautifully and brilliantly. There's a lot more
to it, but my reason for discussing it here is that early Java developers tried to use Make
to handle the same process in Java. Unfortunately, they ran into an
immediate snag, related to Java's concept of the classpath.
In C, source files don't belong to an inherent namespace. Conscientious developers, of course, split up their source files into logical groupings and kept them in separate directories, each with its own Makefile. However, Make was highly optimized for the common case where the dependent and the independent file always shared a directory. In Java, this is generally never the case. All Java classes should be declared in a package like this:
package com.mycompany.coolapp;
The javac compiler will always generate the destination class file under a new directory hierarchy such as com/mycompany/coolapp/Something.class. And, although not strictly required by the language or the compiler, the Java coding convention is to keep the source files themselves in a similar parallel directory structure.
Unfortunately, in 1995 when Java was new, there was no way to declare an implicit Make rule where all of the .class files in the target/com/mycompany/coolapp/ subdirectory depended on the corresponding .java files in the src/com/mycompany/coolapp subdirectory. You could (and some of us did) get around this by listing out each source and class file one by one in the Makefile, but then there was the problem of making sure to add new files into your make rules...
To make matters worse, Java classes depend on one another - so a change to class A might cause a compilation error in class B. However, if you're just recompiling the files that have changed, B won't be recompiled, and you won't realize you made a mistake until you run the application and get a NoSuchMethodError or NoSuchFieldError. In the C programming language, the way to make two source files dependent on one another is to include one's header in the other. In this way, a change to the internals of one doesn't trigger a recompilation of the other, but a change to its public interface (declared in the header file) does. However, Java has no concept of header files - the equivalent in Java would be to keep track of which source files are imported by this one and check whether their public signatures have changed.
Make can't do any of this - in spite of its genericity and its applicability to domains that it wasn't designed for (in some cases, domains which didn't even exist when it was first designed), it's optimized for file-based changes. Yet the alternative - recompile everything every time anything changes - is even worse. So, a lot of us limped along for quite a few years putting together makeshift Make files (there's a pun in there, I just know it) until James Duncan Davidson came up with "Another Neat Tool" that he named Ant.
Ant was highly optimized for Java. It was possible to just tell Ant "my source files are here, I want my class files there", and Ant would figure out which class files needed to be recompiled, recompile them, and return. Ant, just like Make, is "rule-based". Its rules are specified in XML rather than in Make's terse text format, but conceptually it was very similar, with some extra intelligence for handling classpaths, packages, and some of Java's internals.
Everybody migrated to Ant
as soon as it came along. However, although Ant
did an admirable job of automating builds for Java-based projects, there
were some chinks in its armor. Most notably, there was no built-in support
for dependency management. It's typical - in fact, I can't imagine a
counterexample - for any software to depend on more than a few third-party
libraries. So, if you want your program to compile or run, you have to
include the paths to those third party libraries in the compile and run
classpath, and you have to make those libraries available to anybody who
might want or need to build your software.
In Ant's defense, this was a weakness of Make as well - but as Java progressed and matured, this was a much bigger problem for Java developers than it ever was for C programmers, since simple things like logging required third-party add-ons, which themselves required other third-party add-ons. This wasn't so much unique to Java as it was to the Java "culture", and it was a major hassle for Java developers - especially ones who wanted to create repeatable build processes.
Enter Maven. Although you wouldn't be able to tell from looking at its official documentation, the real core strength of Maven - and what separates it from Make and Ant - is its repository of library files. Maven assigns every single Java library (e.g. "jar") in the world a unique identifier. Each library is given a groupId and an artifactId. In your build file, rather than identifying paths to each library, you identify their groupIds and artifactIds, and Maven will download them from a central repository if it can't find them locally.
The idea behind Maven is that it's more declarative than imperative - you tell
Maven what you want done, and Maven "figures out" the best way to do it. This
is in stark contrast to the approach taken by Make
and Ant
where you tell them
what to do, step by step. As you'll see, though, Maven's declarative nature
can be both a blessing and a curse.
The declaration of what you want done is called a "project object model" or POM in Maven. In the POM, you don't actually tell Maven what to do, but instead instruct its various "plugins" where to look to find their inputs and where to place their outputs. The most fundamental plugin, of course, is the Java compiler plugin that actually compiles Java code.
Given a typical Java project structure where the source files are located under a subdirectory named src/com/mycompany/coolapp, getting Maven to compile your source files into a new directory named classes would be described by the following project object model:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany.coolapp</groupId>
  <artifactId>MyApp</artifactId>
  <version>1.0</version>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <outputDirectory>classes</outputDirectory>
  </build>
</project>
This starts with, of course, the overly verbose header stuff that is standard with XML these days. The next few elements are standard Maven boilerplate - you must declare a model version (4.0.0 in this case) so that Maven knows how to interpret the rest of the file, and every project must be assigned a groupId, an artifactId and a version.
However, notice the end of the file - the actual compiler instructions. Unlike Ant, where you would declare, for instance:
<target name="compile">
  <javac srcdir="src" destdir="classes" includes="**/*.java" />
</target>
describing explicitly which task to invoke and how it's configured, Maven assumes that if you have Java source files, you want to compile them using javac, and just needs to know where they are.
In fact, you don't even have to do this, if you don't mind adopting a few
(semi-restrictive) naming conventions. If you don't tell Maven where to
find the source code files, it will look under src/main/java
(relative to
the pom.xml file). If you don't tell it where you want the output, it will
put it under target/classes
. So, if you're willing to restructure your
source and target directories, you can get away with a very minimal pom
file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany.coolapp</groupId>
  <artifactId>MyApp</artifactId>
  <version>1.0</version>
</project>
(And, incidentally, you may as well make your peace with restructuring your source and target directories right now if you're going to be using Maven... trying to follow any convention besides the "standard" one is going to be painful. Just put your sources under src/main/java.)
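For reference, the full "standard" layout looks like this for a plain jar project (other packagings add a few directories, e.g. src/main/webapp for wars):

```text
pom.xml                  the project object model
src/main/java/           production sources
src/main/resources/      files copied into the jar as-is
src/test/java/           unit-test sources
src/test/resources/      files placed on the test classpath
target/                  everything Maven generates (classes/, the jar, ...)
```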
Now, if you've been coding in Java for any length of time, you're probably wondering how to set, for example, the compiler version, or to enable debugging, optimizations, etc. Although you control the source and target directories under the "build" element of the POM, it's actually the Maven compiler plugin that manages the arguments to the Java compiler itself. To change its configuration within a single project, you have to include the fairly verbose:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.3.2</version>
      <configuration>
        <source>1.5</source>
        <target>1.5</target>
        <debug>false</debug>
        <optimize>true</optimize>
      </configuration>
    </plugin>
  </plugins>
</build>
Notice that the compiler plugin has a groupId, artifactId and a version, just like your project. EVERYTHING in Maven is identified by these triples, and they're always unique (figuring out what they are can sometimes be challenging, though). Once you've identified the plugin itself by way of a groupId/artifactId/version triple, you can include its configuration, which will vary from one plugin to another. Here, I've shown some of the more useful options for the compiler plugin - you can find them all in the compiler plugin's online documentation.
Ok, so far so good. You've got your source files in the "right" place, you've described your project object model (POM), and you've configured your Java compiler. How do you actually run this thing? If you try to invoke mvn from the command line, you'll get the semi-helpful error message:
$ mvn
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.108s
[INFO] Finished at: Tue Aug 30 15:57:28 CDT 2011
[INFO] Final Memory: 2M/81M
[INFO] ------------------------------------------------------------------------
[ERROR] No goals have been specified for this build. You must specify a valid
lifecycle phase or a goal in the format <plugin-prefix>:<goal> or
<plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available
lifecycle phases are: validate, initialize, generate-sources, process-sources,
generate-resources, process-resources, compile, process-classes,
generate-test-sources, process-test-sources, generate-test-resources,
process-test-resources, test-compile, process-test-classes, test,
prepare-package, package, pre-integration-test, integration-test,
post-integration-test, verify, install, deploy, pre-clean, clean, post-clean,
pre-site, site, post-site, site-deploy. -> [Help 1]
This illustrates another "Mavenism". Unlike Make
and Ant
, which are by
nature imperative, Maven makes a lot of assumptions about what you're trying
to do - that is, build a software project. In general, this involves a series
of stages. For example:
- compile the sources
- compile the unit tests
- run the unit tests
- package up the distributable
- deploy it somewhere
If any stage fails, the whole process should be aborted. Most Ant
scripts
define a process like this one - each step is a "target" and each target
"depends" on the previous one. Maven makes this process implicit and goes
ahead and defines one for you. You have the option of creating a different
one, or adding steps, but you'll find that in 99% of all cases, Maven's
process is more than sufficient.
Maven calls each of these stages a "phase"; if you want to run Maven, then you have to give it a phase to reach. One useful phase is the "compile" phase:
$ mvn compile
What this tells Maven to do is to execute every plugin that's "bound" to
the compile phase and every phase that appears before the compile phase. As
you can probably guess, the compiler plugin is bound to the compile phase
by default, but you have the option of binding other plugins to the compile
phase if you'd like. Maven will look to your POM
to figure out how to
invoke the compiler plugin.
Another handy phase is the "package" phase. If you run mvn compile, you'll see that, under the target directory, there's a classes subdirectory containing the compiled code. If you run, on the other hand, mvn package, you'll see a couple more subdirectories:
$ ls target
MyApp-1.0.jar  classes  maven-archiver  surefire
By telling Maven to achieve the "package" phase, it first ran through the compile phase, executing every plugin that was bound to it, and then it went to the package phase, running every plugin that was bound to that. As it turns out, there are a couple of plugins bound to the package phase. The most important of these is the jar plugin, which jars up the contents of your target/classes directory for redistribution.
Strictly speaking, you don't bind plugins to phases, but instead you bind goals to phases, and plugins have goals. So, to really understand Maven, you need to be familiar with three key terms: phases, plugins and goals. Plugins provide functionality by exposing goals, and goals are bound to lifecycle phases, which execute sequentially.
You can see which lifecycle phases exist and which goals are bound to which phases using the "help:describe" goal.
$ mvn help:describe -Dcmd=install
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building MyApp 1.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-help-plugin:2.1.1:describe (default-cli) @ MyApp ---
[INFO] 'install' is a phase corresponding to this plugin:
org.apache.maven.plugins:maven-install-plugin:2.3.1:install

It is a part of the lifecycle for the POM packaging 'jar'. This lifecycle
includes the following phases:
* validate: Not defined
* initialize: Not defined
* generate-sources: Not defined
* process-sources: Not defined
* generate-resources: Not defined
* process-resources: org.apache.maven.plugins:maven-resources-plugin:2.4.3:resources
* compile: org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile
* process-classes: Not defined
* generate-test-sources: Not defined
* process-test-sources: Not defined
* generate-test-resources: Not defined
* process-test-resources: org.apache.maven.plugins:maven-resources-plugin:2.4.3:testResources
* test-compile: org.apache.maven.plugins:maven-compiler-plugin:2.3.2:testCompile
* process-test-classes: Not defined
* test: org.apache.maven.plugins:maven-surefire-plugin:2.7.2:test
* prepare-package: Not defined
* package: org.apache.maven.plugins:maven-jar-plugin:2.3.1:jar
* pre-integration-test: Not defined
* integration-test: Not defined
* post-integration-test: Not defined
* verify: Not defined
* install: org.apache.maven.plugins:maven-install-plugin:2.3.1:install
* deploy: org.apache.maven.plugins:maven-deploy-plugin:2.5:deploy
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.062s
[INFO] Finished at: Thu Sep 01 18:39:13 CDT 2011
[INFO] Final Memory: 5M/81M
[INFO] ------------------------------------------------------------------------
An equivalent Ant
file might look like this:
<project name="CoolApp">
  <target name="validate">
  </target>
  <target name="initialize" depends="validate">
  </target>
  <target name="generate-sources" depends="initialize">
  </target>
  <target name="process-sources" depends="generate-sources">
  </target>
  <target name="generate-resources" depends="process-sources">
  </target>
  <target name="process-resources" depends="generate-resources">
    <copy todir="target/classes" filtering="true">
      <fileset dir="src/main/resources" />
    </copy>
  </target>
  <target name="compile" depends="process-resources">
    <javac srcdir="src/main/java" destdir="target/classes" includes="**/*.java" />
  </target>
  <target name="process-classes" depends="compile">
  </target>
  <target name="generate-test-sources" depends="process-classes">
  </target>
  <target name="process-test-resources" depends="generate-test-sources">
    <copy todir="target/test-classes" filtering="true">
      <fileset dir="src/main/resources" />
    </copy>
  </target>
  <target name="test-compile" depends="process-test-resources">
    <javac srcdir="src/test/java" destdir="target/test-classes" includes="**/*.java" />
  </target>
  <target name="process-test-classes" depends="test-compile">
  </target>
  <target name="test" depends="process-test-classes">
    <junit>
      <classpath>
        <pathelement location="target/test-classes" />
      </classpath>
      <batchtest todir="target/surefire-reports">
        <fileset dir="src/test/java" includes="**/*.java" />
        <formatter type="xml" />
      </batchtest>
    </junit>
  </target>
  <target name="prepare-package" depends="test">
  </target>
  <target name="package" depends="prepare-package">
    <jar basedir="target/classes" jarfile="${projectname}.jar" />
  </target>
  ...
</project>
Maven just makes this common lifecycle implicit is all.
So... what is the lifecycle anyway? Well, if you don't specify one, there is a default lifecycle - and besides the default, Maven defines two others: site and clean. This means that when you invoke Maven, the argument you give it is either a lifecycle, a lifecycle phase, or a specific goal. There's actually no way to tell from the syntax which one you're specifying; you just have to keep track.
You may be surprised, if you've got any familiarity with Maven, that I've
talked so long about Maven without mentioning dependency management - the
feature that brought so many people over to Maven in the first place! If
you're an Ant
user, you must be wondering by now how to set the classpath.
Maven's chief strength is that it manages your classpath for you. You configure the classpath by declaring dependencies. Each possible Maven dependency is associated with a groupId and an artifactId - if you want to incorporate, say, Spring, into your application, you add a dependencies section before the build section:
...
<dependencies>
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-context</artifactId>
    <version>2.5.6</version>
  </dependency>
</dependencies>
...
What happens now is that Maven will search your local hard drive for a file named ${home}/.m2/repository/org/springframework/spring-context/2.5.6/spring-context-2.5.6.jar. In general, it's looking for a file named:
${home}/.m2/repository/${groupId}/${artifactId}/${version}/${artifactId}-${version}.jar
after replacing all of the '.'s in ${groupId} with path separators ('/').
If it finds this file, it will be added to the classpath dynamically when the compiler is run. If it doesn't find this file, it will go look for http://repo1.maven.org/maven2/org/springframework/spring-context/2.5.6/spring-context-2.5.6.jar, download that file and place it in the aforementioned location, and then include it in the classpath.
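The path construction itself is simple enough to sketch in a couple of lines (the coordinates here are just the Spring example from above):

```python
def local_repo_path(group_id, artifact_id, version):
    """Sketch of Maven's repository layout: the dots in the groupId
    become directory separators, and the jar is named
    artifactId-version.jar."""
    group_dirs = group_id.replace(".", "/")
    return "{0}/{1}/{2}/{1}-{2}.jar".format(group_dirs, artifact_id, version)

print(local_repo_path("org.springframework", "spring-context", "2.5.6"))
# org/springframework/spring-context/2.5.6/spring-context-2.5.6.jar
```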
In this way, you can dynamically declare all of your dependencies - Maven will go resolve them from its surprisingly comprehensive central repository when you invoke your build. All of these dependencies are versioned, so you can be assured that if you share your source code and pom with somebody else, they'll get the same build that you meant, without having to ship all of the dependencies.
I often find myself wanting to see, for some reason or another, the actual classpath that was used - there's a handy Maven command:
mvn dependency:build-classpath
that will show you what classpath it used to perform the build.
Very often, you'll depend on a third-party library that itself depends on other libraries. Prior to Maven 2, you as the developer just had to keep track of the dependencies of your dependencies (and their dependencies, and so on), and make sure to include them all in your POM. Maven 2 introduced the concept of "transitive" dependencies - a dependency in the central repository could actually declare its own dependencies, which Maven would download for you at build time. Further, you can declare certain dependencies to be needed at compile time, some to be needed at test time only (for example, mock jars), and some to be needed at run time (if you're packaging up a .war, for instance). The only downside of transitive dependencies is that you sometimes want to build with a different version of the same .jar that a dependency declares transitively - although Maven's pretty good about recognizing and resolving this, you can disable transitive dependencies on a case-by-case basis.
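That per-dependency override is done with an exclusions element inside the dependency declaration. As a sketch (using commons-logging purely as an illustration of a transitive dependency you might not want), this pulls in spring-context while refusing one of the jars it would otherwise drag along:

```xml
<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-context</artifactId>
  <version>2.5.6</version>
  <exclusions>
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

You'd then typically declare your preferred version (or an alternative logging jar) as a direct dependency of your own.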
So in the age of IDEs, are these build scripts still necessary? Well, they are if you want any sort of build automation or test automation. In fact, Eclipse now integrates seamlessly with Maven and can generate a POM for you from your Eclipse project file, and resolve dependencies directly from your local repository cache.
What about Maven vs. Ant? Well, the jury may still be out on that one. Although I find myself spending a lot of time looking up the Ant documentation to figure out whether I'm supposed to type srcdir or src, or what the classpath syntax is, I have the opposite problem with Maven - I have to go back and look at the documentation to figure out what it's doing on my behalf. Maven has the central repository concept going for it, but with Ant's Ivy plugin available now, that may not be a "competitive" advantage. On the whole, though, I lean toward Maven, especially when I'm working with other people.
There's quite a bit more to Maven; it's well documented, once you have an idea how the whole "build lifecycle" and its associated plugins work. Hopefully this article helped you get a general sense of Maven so that the official documentation makes more sense. If you want to continue with Maven from here, you should definitely check out the site lifecycle, the deploy phase, and the snapshot versioning - but I'll leave all that for the Maven people to explain.