Maven2: The Devil You Know

Maven is great, isn’t it? You just install it, download your favourite OSS project, type in mvn install and your jar file comes out the other end. Magic. It gets you thinking: Building should always be this easy, right? Your company should be using Maven to manage all its builds. Push-button builds could be just one download away. Right?

Think again.

Maven was designed to make builds portable, to make OSS source code build reliably on the machine of any developer. But when you are developing proprietary software products and services, build portability is not at the top of your list. You have a whole different set of requirements. If you want to use Maven to manage builds in your company, you are going to have to grab a chair and a whip and start taming it first. Here’s a list of some of the questions that you will ask yourself while circling Maven’s cage:

How should I structure my Maven repositories?

You can just accept Maven’s out-of-the-box settings with regard to which repositories you will use, but before very long you’ll regret it. It’s not just that the public repositories are full of junk (mislabeled artifacts with incomplete POMs); they are also full of licensing landmines. If you have concerns about certain OSS licenses (and you should - in a later blog entry I’ll tell you why), the default Maven repository policy will do nothing to help you sleep better at night. Unless you take steps to create your own internal ring-fenced Maven repositories, with policies in place to control which artifacts are allowed inside, sooner or later you’ll end up shipping your proprietary product with a viral OSS license ticking inside it. On the other hand, if you completely cut off your developers from public repositories, you’re going to frustrate their efforts to investigate new technologies and slow down development with excessive red tape. Later on, I’ll describe our Maven repository structure, which provides safety without preventing innovation.

Where should I put Maven configuration?

There are a number of ways to override Maven defaults: you can set configuration parameters in the Maven installation itself; each user can override the defaults from their home directories; and each project can specify overrides in its POM, or in one of the POMs that it directly or indirectly inherits. Flexibility is a good thing, but with Maven, it comes without any guidance or advice. Your choice of where to throw certain switches has knock-on effects for the security and scalability of your Maven infrastructure. Different companies will have different approaches, and I’m not here to tell you that there is only one correct way to configure Maven – just that you need to spend some time up front thinking about the consequences. Again, later on I will describe how DSI configures Maven, and give reasons why, in the hope that this will be of some use to you when dealing with the same issue.

When and how should I use Parent POMs?

The Project Object Model provides all the information that Maven needs to successfully build your project. Projects tend to have many settings in common so, quite sensibly, Maven allows your POM to inherit a parent POM, avoiding the need to repeat yourself across multiple projects. Parent POMs themselves can also inherit, so you can find yourself with the opportunity of creating hierarchies of POMs but at the risk of strangling yourself with overly complex configuration. I’ll explain our POM hierarchy below, which not only avoids unnecessary complexity, but also makes whatever complexity we must introduce pay for itself in other ways.
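To make the inheritance mechanism concrete, here is a minimal sketch of a project POM that inherits from a parent. The coordinates (com.example, company-parent, billing-core) and version numbers are invented for illustration and are not taken from our own builds.

```xml
<!-- pom.xml of an ordinary project (all coordinates are hypothetical) -->
<project>
  <modelVersion>4.0.0</modelVersion>

  <!-- Settings declared in this parent POM are inherited by the project -->
  <parent>
    <groupId>com.example</groupId>
    <artifactId>company-parent</artifactId>
    <version>1</version>
  </parent>

  <!-- groupId is omitted here, so it is inherited from the parent -->
  <artifactId>billing-core</artifactId>
  <version>1.2-SNAPSHOT</version>
</project>
```

The parent itself is just another POM with its packaging set to pom, so it can in turn declare a parent of its own.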

How should I use snapshots?

One of Maven’s better known features is the snapshot. During the construction phase of any given iteration, you’ll want the latest version of your build artifact to either share with your colleagues or simply to import into other dependent projects. If you set your POM’s current version to end in -SNAPSHOT, Maven will timestamp the resulting artifact when it is deployed. Other projects can depend on that artifact, again employing the keyword SNAPSHOT, and Maven will automatically look for the most recently timestamped artifact. In order to share these artifacts, they need to be distributed via a snapshot repository. As ever, there are choices about who makes and distributes these snapshots. We have opted for a pretty standard configuration in this case, but it’s worth describing that and pointing out its consequences and how it differs from non-Maven scenarios.
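As a rough illustration of both sides of a snapshot relationship, the two fragments below show a producing project and a consuming project. The group and artifact names are made up for the example.

```xml
<!-- Producer: the version ends in -SNAPSHOT (hypothetical coordinates) -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>billing-core</artifactId>
  <version>1.2-SNAPSHOT</version>
</project>
```

```xml
<!-- Consumer: fragment of another project's POM that depends on the snapshot -->
<dependencies>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>billing-core</artifactId>
    <version>1.2-SNAPSHOT</version>
  </dependency>
</dependencies>
```

The consumer never names a particular timestamp. It asks for 1.2-SNAPSHOT and leaves Maven to work out which deployed build is the newest.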

How should I integrate with my IDE?

Remember IDE wars? Once upon a time it seemed as though your chances of success in a Java project hung precariously from the single thread of your choice of Integrated Development Environment. Thankfully things have settled down now. IntelliJ users wouldn’t be seen dead developing with anything else, NetBeans is celebrating its 10th birthday (but is probably nervous about how many people will come to the party) while Eclipse fans have never even heard of the previous two. All of the above offer Maven integration – in fact one of them (hint: Eclipse) offers at least three different ways to use Maven without ever leaving the cocoon. How will you deal with this embarrassment of choice? Let us tell you what we’ve done – you may be surprised.

Here’s one we made earlier.

The next few diagrams begin to describe the decisions we’ve taken with regard to our use of Maven. As well as covering the questions raised above, they give an idea of the amount of time and energy that has gone into our Maven rollout – a process that is not yet finished. These decisions were arrived at through patient and meticulous consultations, spread across many weeks, between development and process staff. Every time we thought we had figured it all out, some other issue would arise. The nature of the problem to be solved is complex, and the nature of Maven itself is not exactly straightforward. Maven presents many choices that must be made in order to stabilize it and make it transparent, and its documentation (notwithstanding two new free books on the subject) does not contain all the answers to the questions you will have.

Take a close look at this diagram:
[Diagram: DSI's Maven Repository Structure]

If this seems overly complex, let me remind you of the famous edict that things should be made as simple as possible, but no simpler. This, folks, was the simplest arrangement of repositories that we could arrive at while keeping our builds safe from undesirable artifacts or licenses, and remaining scalable.

There’s a lot to absorb in this one picture, but let me outline the most important points:

  • Developers can see the unapproved repository, while CI and SCM cannot. While developers have the artisanal freedom to experiment with whatever libraries they want (by accessing proxied external repositories), nothing gets through Continuous Integration or SCM builds unless it has been explicitly approved by being moved to the approved repository and having its license approved for the project (more on that in a moment).
  • All physical repositories are accessed, for download at least, through Archiva’s Repository Groups, or Virtual Repositories. This layer of indirection allows us to expand the organization of our physical repositories – for example, by including a repo that is specific to one of our customers – without having to reconfigure the POMs and config files that reference them.
  • Explicit configuration is made for Maven Plugin Development. It is highly unlikely that you will use Maven without sooner or later wanting to write your own plugins. This activity has different constraints with regard to use of licenses and permission to release.
  • Speaking of custom plugins, we have written one that checks the licensing information of the artifacts we use against lists of permitted license types for each project. These permitted license type lists are themselves stored in a physical repository managed by Archiva. The plugin does a great deal more, but we’ll probably devote an entire blog entry to that subject in the near future.
  • Configuration is done in POMs (project and parent) whenever possible, so that it is set centrally and travels to where it is needed. Some configuration must be done in settings.xml files. We have taken the decision to always do this in the user’s home directory rather than in the Maven installation. This is because some of those settings only make sense in the user’s settings (activation of profiles for example). By specifying that all non-POM configuration goes in users’ settings.xml, we ensure that these settings survive updates to Maven installations. The settings files are made accessible to our developers on the company wiki, so even these can be managed centrally.
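To give a flavour of what such a per-user file can look like, here is a minimal sketch of a developer's settings.xml. The host name, repository group name and profile id are assumptions made for the example, and the file we actually hand out contains more than this.

```xml
<!-- ~/.m2/settings.xml for a developer (ids, host and URLs are hypothetical) -->
<settings>
  <profiles>
    <profile>
      <id>developer</id>
      <!-- A repository group that fronts the physical repositories -->
      <repositories>
        <repository>
          <id>dev-group</id>
          <url>http://repo.example.com/archiva/repository/dev-group/</url>
          <releases><enabled>true</enabled></releases>
          <snapshots><enabled>true</enabled></snapshots>
        </repository>
      </repositories>
      <!-- Plugins are resolved separately, so they get their own entry -->
      <pluginRepositories>
        <pluginRepository>
          <id>dev-group-plugins</id>
          <url>http://repo.example.com/archiva/repository/dev-group/</url>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>

  <!-- Profile activation lives in the user's settings, not in the POM -->
  <activeProfiles>
    <activeProfile>developer</activeProfile>
  </activeProfiles>
</settings>
```

Under this arrangement a CI user would have the same kind of file, with its profile pointing at a group that leaves out the unapproved repositories. Because POMs only ever see the group URL, physical repositories can be added behind it without touching any project.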

Here’s another diagram for your consideration. This one deals with uploads to repositories rather than downloads. In other words, it shows how we manage distribution of artifacts:

[Diagram: Configuring access to Maven distros]

The salient points here are:

  • While the POMs for certain projects either specify or inherit the location of their distribution repositories, the credentials for accessing these repositories are kept in the settings.xml files of certain specialized users (plugin devs, CI, SCM). This protects the repositories from having junk uploaded to them.
  • Only CI (Bamboo in our case) gets to send snapshots to the snapshot repository. In the past we have shared the latest version of code with our colleagues through the source control system. This has two disadvantages: firstly, it forces developers to check out and build code that they might never have to change, increasing their development footprint; secondly, our developers were not protected from the inevitable mistakes that make it into source control. By sharing snapshots through CI, we reduce the size of everyone’s builds (they only check out and build the modules that they are changing) and only snapshots which pass muster are shared.
  • The architecture team here in DSI has control over which artifacts are approved for use in CI or release builds. The process of approval is managed transparently, using JIRA to see through requests for approval from start to finish. This check is a vital quality control on the 3rd party libraries that we use, and it prevents the accidental use of GPL and other viral license types. As mentioned above, CI double-checks the licensing using our custom Maven plugin. More on licensing in the next diagram.
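The split between where to deploy (in the POM) and who may deploy (in settings.xml) can be sketched as follows. The repository ids, URLs and account details are placeholders rather than our real values.

```xml
<!-- Fragment of a project or parent POM: where artifacts are deployed
     (hypothetical ids and URLs) -->
<distributionManagement>
  <repository>
    <id>internal-releases</id>
    <url>http://repo.example.com/archiva/repository/releases/</url>
  </repository>
  <snapshotRepository>
    <id>internal-snapshots</id>
    <url>http://repo.example.com/archiva/repository/snapshots/</url>
  </snapshotRepository>
</distributionManagement>
```

```xml
<!-- ~/.m2/settings.xml of the CI user only (placeholder credentials) -->
<settings>
  <servers>
    <server>
      <!-- Must match the id of the repository declared in the POM -->
      <id>internal-snapshots</id>
      <username>ci</username>
      <password>changeit</password>
    </server>
  </servers>
</settings>
```

Maven matches the server entry to the repository by id. Every developer inherits the deployment location through the POM, but a deploy can only succeed from an account whose settings.xml carries the matching credentials.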

This last picture shows the flexibility and efficiency of our parent POM hierarchy, and the double-use we get from this hierarchy with respect to license checking:
[Diagram: Parallel hierarchies of parent POMs and allowed license files.]

  • Projects can either inherit a company-wide parent POM, or inherit more customer-specific configuration from a customer-level POM (shared across many projects for that customer).
  • Customer-level POMs inherit the company-wide POM in any case.
  • The hierarchy of POMs that we create is also used as a way of managing the allowed licenses for a project: If there is no license file in the license repository corresponding to the project’s own POM, then we look for one corresponding to the parent POM. This search continues until a license file is found – it continues all the way back to the top-level company-wide Parent POM, which always has a corresponding standard set of allowed license types.
  • We only create parent POMs if we really need them – normally the company-wide POM will serve your purposes. Only if there is no other choice do you need to introduce another layer in the hierarchy.
  • This mechanism works for proprietary licenses as well as OSS licenses.
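As an illustration of the middle layer in that hierarchy, a customer-level POM might look something like the sketch below. The names acme-parent and company-parent are invented for the example.

```xml
<!-- Customer-level parent POM (hypothetical coordinates) -->
<project>
  <modelVersion>4.0.0</modelVersion>

  <!-- Always inherits from the company-wide parent -->
  <parent>
    <groupId>com.example</groupId>
    <artifactId>company-parent</artifactId>
    <version>1</version>
  </parent>

  <artifactId>acme-parent</artifactId>
  <version>1</version>
  <!-- A POM that exists only to be inherited uses pom packaging -->
  <packaging>pom</packaging>
</project>
```

A project for that customer would name acme-parent in its own parent element. The license search described above then walks the same chain: project, then acme-parent, then company-parent.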

One of the questions above remains unanswered: How do we integrate Maven with our IDEs? As it happens, most of us here use Eclipse – but not all of us. Our choice is to use raw Maven and invoke it either at the command line, or as an external tool from the IDE. We get Eclipse to reflect the content of our POMs by running Maven’s eclipse mojo, which generates an Eclipse project file with the correct source structure and library dependencies. It might be considered low-tech but it is beautifully simple, and this simplicity makes it more reliable than any Maven Eclipse plugin we’ve used so far.

Conclusion

If you’ve stuck with this article until the end, I want to thank you for your time. The length of the article is an indication of the size of the problem that you need to solve when you use Maven to manage your company’s builds. I could just as easily have written twice as much, given the number of issues we had to deal with.

At this point, you may be wondering why we are using Maven at all. When we first started putting a new build process together, four years ago, Maven was rejected in favour of our own fork of the Savant project. This has served us well over the intervening years, but circumstances have changed: Maven2 has come out as the de facto standard for building Java – although there is gathering competition. That fact alone explains 50% of our decision to use it – better the devil you know than the devil you don’t. By using Maven2, we can collaborate more easily with partner companies at the shop floor level. We can enjoy the usual advantages of an active OSS product: regular releases, Maven support for other build-time tools, and a wide choice of third-party plugins.

The bottom line is this: If you want Maven to manage your build, you will have to learn how to manage Maven. And that will take a great deal more time than you might think. This is one devil you’ll get to know very well.


  1. #1 by Walter on October 17, 2008 - 3:26 pm

  2. #2 by Brendan Lawlor on October 22, 2008 - 9:26 am

    I hadn’t seen that Walter, but thanks for the link. I guess I’m not the only one out there who has to hold their nose while using Maven.

  3. #3 by Frank Hellwig on November 15, 2008 - 12:21 am

    Regarding the paragraph beginning with “One of the questions above remains unanswered…”

    I simply can’t agree more. This is the message I have been putting out to our developers for quite a while. We have struggled with M2Eclipse and others — trying to figure out the mysteries of what resources get copied by default and which don’t. It’s just not worth the hassle. Running mvn eclipse:eclipse has restored sanity to our process.

    Great article!

  4. #4 by Brendan Lawlor on December 8, 2008 - 2:57 pm

    Thanks for taking the time to reply Frank, and it’s good to hear our conclusion validated. That said, there are still folks here who have had a positive experience with m2eclipse and they are free to use it. If it starts to behave erratically for them, they’ll have to switch back to the eclipse mojo.

  5. #5 by Luke Samad on June 30, 2009 - 7:00 pm

    Why not just use a tool like “Sonatype Nexus” or “Archiva”? You can then mirror all public repositories.

    Anything that is not public you place into a 3rd party repository and use a local repository for your artifacts (like snapshots, or releases).

  6. #6 by Brendan Lawlor on July 1, 2009 - 9:01 am

    Hi Luke,
    We were using Archiva but switched to Artifactory a few months back. We do mirror public repositories, and developers can build with the artifacts from those repos without restriction, when they want to try something out. But our Continuous Integration engine (Bamboo) is set up to use a profile that excludes these mirrors. The fact is that the quality of the POMs that are available on public repositories is not sufficient. The license information is missing or non-standard, there are often redirects to other repositories, and so on. When we promote an artifact from the unapproved or public mirrors, we generally have to give the POM a good scrubbing first.

