Showing posts with label maven. Show all posts

Saturday, November 28, 2020

All Your Tests Belong to You: Maintaining Mixed JUnit 4/JUnit 5 and Testng/JUnit 5 Test Suites

If you are a seasoned Java developer who practices test-driven development (hopefully, everyone does), it is very likely that JUnit 4 has been your one-stop testing toolbox. Personally, I truly loved it and still do: simple, minimal, non-intrusive and intuitive. Along with terrific libraries like AssertJ and Hamcrest, it makes writing test cases a pleasure.

But time passes, and although Java has evolved a lot as a language, JUnit 4 was not really along for the ride. Around 2015, the development of JUnit 5 started with the ambitious goal of becoming the next generation of the programmer-friendly testing framework for Java and the JVM. And, to be fair, I think this goal has been reached: many new projects adopt JUnit 5 from the get-go, whereas older ones are already in the process of migrating (or at least thinking about it).

For existing projects, the migration to JUnit 5 will not happen overnight and will probably take some time. In today's post we are going to talk about ways to maintain mixed JUnit 4 / JUnit 5 and TestNG / JUnit 5 test suites with the help of Apache Maven and the Apache Maven Surefire plugin.

To make the example a bit more realistic, we are going to test an UploadDestination class, which provides a single method that tells whether a particular destination scheme is supported or not:

package com.example;

import java.net.URI;

public class UploadDestination {
    public boolean supports(String location) {
        final String scheme = URI.create(location).getScheme();
        // literal-first comparison avoids a NullPointerException for scheme-less locations
        return "http".equals(scheme) || "s3".equals(scheme) || "sftp".equals(scheme);
    }
}
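For quick experimentation outside of a test runner, the same logic can be exercised from a plain main method. The class below is just a scratch copy (UploadDestinationDemo is a made-up name) so the snippet is self-contained:

```java
import java.net.URI;

public class UploadDestinationDemo {
    // mirror of UploadDestination.supports from the snippet above
    static boolean supports(String location) {
        final String scheme = URI.create(location).getScheme();
        return "http".equals(scheme) || "s3".equals(scheme) || "sftp".equals(scheme);
    }

    public static void main(String[] args) {
        System.out.println(supports("s3://bucket/uploads")); // true
        System.out.println(supports("sftp://host/tmp"));     // true
        System.out.println(supports("https://host:9000"));   // false: https is not on the list
    }
}
```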

The implementer was kind enough to create a suite of JUnit 4 unit tests to verify that all expected destination schemes are indeed supported.

import static org.junit.Assert.assertTrue;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class JUnit4TestCase {
    private UploadDestination destination;
    private final String location;

    public JUnit4TestCase(String location) {
        this.location = location;
    }

    @Before
    public void setUp() {
        destination = new UploadDestination();
    }

    @Parameters(name = "{index}: location {0} is supported")
    public static Object[] locations() {
        return new Object[] { "s3://test", "http://host:9000", "sftp://host/tmp" };
    }

    @Test
    public void testLocationIsSupported() {
        assertTrue(destination.supports(location));
    }
}

In the project build, at the very least, you need to add the JUnit 4 dependency along with the Apache Maven Surefire plugin and, optionally, the Apache Maven Surefire Report plugin to your pom.xml; the snippet below illustrates that.

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.13.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-report-plugin</artifactId>
                <version>3.0.0-M5</version>
            </plugin>
        </plugins>
    </build>
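The report plugin is optional; with it in place, a standalone HTML report (by default under target/site/surefire-report.html) can be produced on demand with the plugin's report goal:

```shell
mvn surefire-report:report
```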

No magic here: triggering the Apache Maven build would normally run all unit test suites every time.

...
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit4TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 s - in com.example.JUnit4TestCase
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
...

Awesome! Now let us imagine that at some point another teammate happens to work on the project, notices there are no unit tests verifying the unsupported destination schemes, and adds some using JUnit 5.

package com.example;

import static org.junit.jupiter.api.Assertions.assertFalse;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class JUnit5TestCase {
    private UploadDestination destination;

    @BeforeEach
    void setUp() {
        destination = new UploadDestination();
    }

    @ParameterizedTest(name = "{index}: location {0} is supported")
    @ValueSource(strings = { "s3a://test", "https://host:9000", "ftp://host/tmp" } )
    public void testLocationIsNotSupported(String location) {
        assertFalse(destination.supports(location));
    }
}

Consequently, another dependency appears in the project's pom.xml to bring JUnit 5 in (since its API is not compatible with JUnit 4).

    <dependencies>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>5.7.0</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

Looks quite legitimate, doesn't it? But there is a catch ... the test run results are going to surprise this time.

...
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.076 s - in com.example.JUnit5TestCase
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
...

The JUnit 4 test suites are gone! Such behavior is actually well documented by the Apache Maven Surefire team in the Provider Selection section of the official documentation. So how could we get them back? There are a few possible options, but the simplest one by far is to use the JUnit Vintage engine in order to run JUnit 4 test suites on the JUnit 5 platform.

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
                <dependencies>
                    <dependency>
                        <groupId>org.junit.vintage</groupId>
                        <artifactId>junit-vintage-engine</artifactId>
                        <version>5.7.0</version>
                    </dependency>
                </dependencies>
            </plugin>
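(As an aside, declaring the vintage engine as a regular test-scoped project dependency achieves the same result; keeping it inside the plugin's <dependencies> just avoids leaking it onto the test classpath.)

```xml
<dependency>
    <groupId>org.junit.vintage</groupId>
    <artifactId>junit-vintage-engine</artifactId>
    <version>5.7.0</version>
    <scope>test</scope>
</dependency>
```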

With that, both JUnit 4 and JUnit 5 test suites are going to be executed side by side.

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.079 s - in com.example.JUnit5TestCase
[INFO] Running com.example.JUnit4TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 s - in com.example.JUnit4TestCase
[INFO] 
[INFO] Results:
[INFO]
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------

The lesson to learn here: please watch carefully that all your test suites are actually being executed (CI/CD usually keeps track of such trends and warns you right away). Be extra careful when migrating to the latest Spring Boot or Apache Maven Surefire plugin versions.

Another quite common use case you may run into is mixing TestNG and JUnit 5 test suites in the scope of one project. The symptoms are pretty much the same: you are going to wonder why only the JUnit 5 test suites are being run. The treatment in this case is a bit different, and one of the options which seems to work pretty well is to enumerate the test engine providers explicitly.
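(This scenario assumes TestNG itself is already declared among the project's test dependencies, along these lines; 7.3.0 was the current version at the time of writing.)

```xml
<dependency>
    <groupId>org.testng</groupId>
    <artifactId>testng</artifactId>
    <version>7.3.0</version>
    <scope>test</scope>
</dependency>
```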

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
                <dependencies>
                    <dependency>
                        <groupId>org.apache.maven.surefire</groupId>
                        <artifactId>surefire-junit-platform</artifactId>
                        <version>3.0.0-M5</version>
                    </dependency>
                    <dependency>
                        <groupId>org.apache.maven.surefire</groupId>
                        <artifactId>surefire-testng</artifactId>
                        <version>3.0.0-M5</version>
                    </dependency>
                </dependencies>
            </plugin>

The somewhat undesired side effect in this case is that the test suites are run separately (though there are other approaches to try out), for example:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.074 s - in com.example.JUnit5TestCase
[INFO] 
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running TestSuite
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.315 s - in TestSuite
[INFO] 
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] ------------------------------------------------------------------------

To be fair, I think JUnit 5 is a huge step forward towards having modern and concise test suites for your Java (and, in general, JVM) projects. These days there are seamless integrations available with almost any other test framework or library (Mockito, Testcontainers, ...) and the migration path is not that difficult in most cases. Plus, as you have seen, co-existence of JUnit 5 with older test engines is totally feasible.

As always, the complete project samples are available on Github: JUnit 4/JUnit 5, TestNG / JUnit 5.

Tuesday, July 28, 2020

It is never enough of them: enriching Apache Avro generated classes with custom Java annotations

Apache Avro, along with Apache Thrift and Protocol Buffers, is often being used as a platform-neutral extensible mechanism for serializing structured data. In the context of event-driven systems, the Apache Avro's schemas play the role of the language-agnostic contracts, shared between loosely-coupled components of the system, not necessarily written using the same programming language.

Probably, the most widely adopted reference architecture for such systems circles around Apache Kafka backed by Schema Registry and Apache Avro, although many other excellent options are available. Nevertheless, why Apache Avro?

The official documentation page summarizes pretty well the key advantages Apache Avro has over Apache Thrift and Protocol Buffers. But we are going to add another one to the list: biased (in a good sense) support of the Java and JVM platform in general.

Let us imagine that one of the components (or, one has to say these days, microservices) takes care of payment processing. Not every payment may succeed, and to propagate such failures, the component broadcasts a PaymentRejectedEvent whenever such an unfortunate event happens. Here is its Apache Avro schema, persisted in the PaymentRejectedEvent.avsc file.

{
    "type": "record",
    "name": "PaymentRejectedEvent",
    "namespace": "com.example.event",
    "fields": [
        {
            "name": "id",
            "type": {
                "type": "string",
                "logicalType": "uuid"
            }
        },
        {
            "name": "reason",
            "type": {
                "type": "enum",
                "name": "PaymentStatus",
                "namespace": "com.example.event",
                "symbols": [
                    "EXPIRED_CARD",
                    "INSUFFICIENT_FUNDS",
                    "DECLINED"
                ]
            }
        },
        {
            "name": "date",
            "type": {
                "type": "long",
                "logicalType": "local-timestamp-millis"
            }
        }
    ]
}

The event is deliberately kept simple; you can safely assume that in a more or less realistic system it would have considerably more details available. To turn this event into a Java class at build time, we could use the Apache Avro Maven plugin, and it is as easy as it gets.

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.10.0</version>
    <configuration>
        <stringType>String</stringType>
    </configuration>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.build.directory}/generated-sources/avro/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

Once the build finishes, you will get the PaymentRejectedEvent Java class generated. But a few annoyances emerge right away:

@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.lang.String id;
   private com.example.event.PaymentStatus reason;
   private long date;
}

The Java types for the id and date fields are not really what we would expect. Luckily, this is easy to fix by specifying the customConversions plugin property, for example:

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.10.0</version>
    <configuration>
        <stringType>String</stringType>
        <customConversions>
            org.apache.avro.Conversions$UUIDConversion,org.apache.avro.data.TimeConversions$LocalTimestampMillisConversion
        </customConversions>
    </configuration>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.build.directory}/generated-sources/avro/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

If we build the project this time, the plugin would generate the right types.

@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.util.UUID id;
   private com.example.event.PaymentStatus reason;
   private java.time.LocalDateTime date;
}
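To illustrate what these logical types actually mean, here is a stdlib-only sketch of the mapping (the real conversion code lives in org.apache.avro.Conversions and org.apache.avro.data.TimeConversions; LogicalTypesSketch and its method names are made up for illustration): local-timestamp-millis is a long carrying milliseconds since the epoch, surfaced as a zone-less LocalDateTime, while uuid is a string surfaced as java.util.UUID.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.UUID;

public class LogicalTypesSketch {
    // local-timestamp-millis: epoch milliseconds interpreted as a zone-less date-time
    static LocalDateTime toLocalTimestamp(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    // uuid: the canonical string form parsed into java.util.UUID
    static UUID toUuid(String value) {
        return UUID.fromString(value);
    }

    public static void main(String[] args) {
        System.out.println(toLocalTimestamp(1_595_894_400_000L)); // 2020-07-28T00:00
        System.out.println(toUuid("123e4567-e89b-12d3-a456-426614174000"));
    }
}
```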

It looks much better! But what about the next challenge? In Java, annotations are commonly used to associate additional metadata with a particular language element. What if we have to add a custom, application-specific annotation to all generated event classes? It does not really matter which one; let it be @javax.annotation.Generated, for example. It turns out that with Apache Avro this is not an issue: it has a dedicated javaAnnotation property we could benefit from.

{
    "type": "record",
    "name": "PaymentRejectedEvent",
    "namespace": "com.example.event",
    "javaAnnotation": "javax.annotation.Generated(\"avro\")",
    "fields": [
        {
            "name": "id",
            "type": {
                "type": "string",
                "logicalType": "uuid"
            }
        },
        {
            "name": "reason",
            "type": {
                "type": "enum",
                "name": "PaymentStatus",
                "namespace": "com.example.event",
                "symbols": [
                    "EXPIRED_CARD",
                    "INSUFFICIENT_FUNDS",
                    "DECLINED"
                ]
            }
        },
        {
            "name": "date",
            "type": {
                "type": "long",
                "logicalType": "local-timestamp-millis"
            }
        }
    ]
}

When we rebuild the project one more time (hopefully the last one), the generated PaymentRejectedEvent Java class is going to be decorated with the additional custom annotation.

@javax.annotation.Generated("avro")
@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.util.UUID id;
   private com.example.event.PaymentStatus reason;
   private java.time.LocalDateTime date;
}

Obviously, this property has no effect if the schema is used to produce the respective constructs in other programming languages, but it still feels good to see that Java has privileged support in Apache Avro, thanks for that! As a side note, it is good to see that after quite a long period of inactivity the project is experiencing a second wind, with regular releases and new features delivered constantly.

The complete source code is available on Github.

Thursday, July 18, 2019

Testing Spring Boot conditionals the sane way

If you are a more or less experienced Spring Boot user, it is very likely that at some point you will run into the situation where particular beans or configurations have to be injected conditionally. The mechanics of it are well understood, but sometimes testing such conditions (and their combinations) can get messy. In this post we are going to talk about some possible (arguably, sane) ways to approach that.

Since Spring Boot 1.5.x is still widely used (even though it is racing towards its EOL this August), we will cover it along with Spring Boot 2.1.x, both with JUnit 4.x and JUnit 5.x. The techniques we are about to cover are equally applicable to regular configuration classes as well as auto-configuration classes.

The example we will be playing with is related to our home-made logging. Let us assume our Spring Boot application requires a dedicated logger bean with the name "sample". Under certain circumstances, however, this logger has to be disabled (or become effectively a noop), so the logging.enabled property serves as a kill switch here. We use Slf4j and Logback in this example, but it is not really important. The LoggingConfiguration snippet below reflects this idea.

@Configuration
public class LoggingConfiguration {
    @Configuration
    @ConditionalOnProperty(name = "logging.enabled", matchIfMissing = true)
    public static class Slf4jConfiguration {
        @Bean
        Logger logger() {
            return LoggerFactory.getLogger("sample");
        }
    }
    
    @Bean
    @ConditionalOnMissingBean
    Logger logger() {
        return new NOPLoggerFactory().getLogger("sample"); 
    }
}

So how would we test that? Spring Boot (and the Spring Framework in general) has always offered outstanding test scaffolding support. The @SpringBootTest and @TestPropertySource annotations allow you to quickly bootstrap the application context with customized properties. There is one issue though: they are applied at the test class level, not per test method. It certainly makes sense but basically requires you to create a test class per combination of conditionals.

If you are still on JUnit 4.x, there is one trick you may find useful which exploits the Enclosed runner, a hidden gem of the framework.

@RunWith(Enclosed.class)
public class LoggingConfigurationTest {
    @RunWith(SpringRunner.class)
    @SpringBootTest
    public static class LoggerEnabledTest {
        @Autowired private Logger logger;
        
        @Test
        public void loggerShouldBeSlf4j() {
            assertThat(logger).isInstanceOf(ch.qos.logback.classic.Logger.class);
        }
    }
    
    @RunWith(SpringRunner.class)
    @SpringBootTest
    @TestPropertySource(properties = "logging.enabled=false")
    public static class LoggerDisabledTest {
        @Autowired private Logger logger;
        
        @Test
        public void loggerShouldBeNoop() {
            assertThat(logger).isSameAs(NOPLogger.NOP_LOGGER);
        }
    }
}

You still have a class per condition, but at least they are all in the same nest. With JUnit 5.x some things got easier, but not to the level one might expect. Unfortunately, Spring Boot 1.5.x does not support JUnit 5.x natively, so we have to rely on the extension provided by the spring-test-junit5 community module. Here are the relevant changes in pom.xml; please notice that junit is explicitly excluded from the spring-boot-starter-test dependency graph.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>com.github.sbrannen</groupId>
    <artifactId>spring-test-junit5</artifactId>
    <version>1.5.0</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-api</artifactId>
    <version>5.5.0</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-engine</artifactId>
    <version>5.5.0</version>
    <scope>test</scope>
</dependency>

The test case itself is not very different besides the usage of the @Nested annotation, which comes from JUnit 5.x to support tests in inner classes.

public class LoggingConfigurationTest {
    @Nested
    @ExtendWith(SpringExtension.class)
    @SpringBootTest
    @DisplayName("Logging is enabled, expecting Slf4j logger")
    public static class LoggerEnabledTest {
        @Autowired private Logger logger;
        
        @Test
        public void loggerShouldBeSlf4j() {
            assertThat(logger).isInstanceOf(ch.qos.logback.classic.Logger.class);
        }
    }
    
    @Nested
    @ExtendWith(SpringExtension.class)
    @SpringBootTest
    @TestPropertySource(properties = "logging.enabled=false")
    @DisplayName("Logging is disabled, expecting NOOP logger")
    public static class LoggerDisabledTest {
        @Autowired private Logger logger;
        
        @Test
        public void loggerShouldBeNoop() {
            assertThat(logger).isSameAs(NOPLogger.NOP_LOGGER);
        }
    }
}

If you try to run the tests from the command line using Apache Maven and the Maven Surefire plugin, you might be surprised to see that none of them are executed during the build. The issue is that the plugin's default excludes filter out all nested classes, so we need to put another workaround in place: overriding the excludes with a single empty entry.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.22.2</version>
    <configuration>
        <excludes>
            <exclude />
        </excludes>
    </configuration>
</plugin>

With that, things should roll smoothly. But enough about legacy: Spring Boot 2.1.x comes as a complete game changer. The family of context runners (ApplicationContextRunner, ReactiveWebApplicationContextRunner and WebApplicationContextRunner) provides an easy and straightforward way to tailor the context at the test method level, keeping the test executions incredibly fast.

public class LoggingConfigurationTest {
    private final ApplicationContextRunner runner = new ApplicationContextRunner()
        .withConfiguration(UserConfigurations.of(LoggingConfiguration.class));
    
    @Test
    public void loggerShouldBeSlf4j() {
        runner
            .run(ctx -> 
                assertThat(ctx.getBean(Logger.class)).isInstanceOf(ch.qos.logback.classic.Logger.class)
            );
    }
    
    @Test
    public void loggerShouldBeNoop() {
        runner
            .withPropertyValues("logging.enabled=false")
            .run(ctx -> 
                assertThat(ctx.getBean(Logger.class)).isSameAs(NOPLogger.NOP_LOGGER)
            );
    }
}

It looks really great. The JUnit 5.x support in Spring Boot 2.1.x is much better, and with the upcoming 2.2 release JUnit 5.x will be the default engine (not to worry, the old JUnit 4.x will still be supported). As of now, the switch to JUnit 5.x needs a bit of work on the dependencies side.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-api</artifactId>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-engine</artifactId>
    <scope>test</scope>
</dependency>

As an additional step, you may need to use a recent Maven Surefire plugin, 2.22.0 or above, with out-of-the-box JUnit 5.x support. The snippet below illustrates that.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.22.2</version>
</plugin>

The sample configuration we have worked with is pretty naive; many real-world applications end up with quite complex contexts built out of many conditionals. The flexibility and enormous opportunities that come out of the context runners, an invaluable addition to the Spring Boot 2.x test scaffolding, are just life savers, please keep them in mind.

The complete project sources are available on Github.

Sunday, June 17, 2018

In the shoes of the consumer: do you really need to provide the client libraries for your APIs?

The beauty of RESTful web services and APIs is that any consumer which speaks the HTTP protocol will be able to understand and use them. Nonetheless, the same dilemma pops up over and over again: should you accompany your web APIs with client libraries or not? If yes, what languages and/or frameworks should you support?

Quite often this is not really an easy question to answer. So let us take a step back and think about the overall idea for a moment: what value do client libraries bring to the consumers?

Someone may say they lower the barrier for adoption. Indeed, specifically in the case of strongly typed languages, exploring the API contracts from your favorite IDE (syntax highlighting and auto-completion, please!) is quite handy. But by and large, RESTful web APIs are simple enough to start with, and good documentation would certainly be more valuable here.

Others may say it is good to shield the consumers from dealing with multiple API versions or rough edges. That also kind of makes sense, but I would argue it just hides flaws in the way the web APIs in question are designed and evolve over time.

All in all, no matter how many clients you decide to bundle, the APIs are still going to be accessible by any generic HTTP consumer (curl, HttpClient, RestTemplate, you name it). Giving a choice is great, but the price to pay for maintenance could be really high. Could we do better? As you may have guessed already, we certainly have quite a few options, hence this post.

The key ingredient of success here is to maintain an accurate specification of your RESTful web APIs, using OpenAPI v3.0 or even its predecessor, Swagger/OpenAPI 2.0 (or RAML, API Blueprint, it does not really matter much). In the case of OpenAPI/Swagger, the tooling is king: one could use Swagger Codegen, a template-driven engine, to generate API clients (and even server stubs) in many different languages, and this is what we are going to talk about in this post.

To simplify things, we are going to implement a consumer of the people management web API which we built in the previous post. To begin with, we need to get its OpenAPI v3.0 specification in YAML (or JSON) format.

java -jar server-openapi/target/server-openapi-0.0.1-SNAPSHOT.jar

And then:

wget http://localhost:8080/api/openapi.yaml

Awesome, half of the job is done, literally. Now, let us allow Swagger Codegen to take the lead. In order not to complicate the matter, let us assume that the consumer is also a Java application, so we can follow the mechanics without any difficulties (but Java is only one of the options; the list of supported languages and frameworks is astonishing).

Along this post we are going to use OpenFeign, one of the most advanced Java HTTP client binders. Not only is it exceptionally simple to use, it also offers quite a few integrations we are going to benefit from soon.

<dependency>
    <groupId>io.github.openfeign</groupId>
    <artifactId>feign-core</artifactId>
    <version>9.7.0</version>
</dependency>

<dependency>
    <groupId>io.github.openfeign</groupId>
    <artifactId>feign-jackson</artifactId>
    <version>9.7.0</version>
</dependency>

Swagger Codegen can be run as a stand-alone application from the command line, or as an Apache Maven plugin (the latter is what we are going to use).
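For reference, the stand-alone flavor boils down to an invocation roughly like this (the exact jar name depends on the version you download; the flags shown are the common ones):

```shell
java -jar swagger-codegen-cli-3.0.0-rc1.jar generate \
    -i openapi.yaml \
    -l java \
    --library feign \
    -o target/generated-client
```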

<plugin>
    <groupId>io.swagger</groupId>
    <artifactId>swagger-codegen-maven-plugin</artifactId>
    <version>3.0.0-rc1</version>
    <executions>
        <execution>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <inputSpec>/contract/openapi.yaml</inputSpec>
                <apiPackage>com.example.people.api</apiPackage>
                <language>java</language>
                <library>feign</library>
                <modelPackage>com.example.people.model</modelPackage>
                <generateApiDocumentation>false</generateApiDocumentation>
                <generateSupportingFiles>false</generateSupportingFiles>
                <generateApiTests>false</generateApiTests>
                <generateApiDocs>false</generateApiDocs>
                <addCompileSourceRoot>true</addCompileSourceRoot>
                <configOptions>
                    <sourceFolder>/</sourceFolder>
                    <java8>true</java8>
                    <dateLibrary>java8</dateLibrary>
                    <useTags>true</useTags>
                </configOptions>
            </configuration>
        </execution>
    </executions>
</plugin>

If some of the options are not very clear, Swagger Codegen has pretty good documentation to consult for clarifications. The important ones to pay attention to are language and library, which are set to java and feign respectively. One thing to note though: the support of the OpenAPI v3.0 specification is mostly complete, but you may encounter some issues nonetheless (as you have noticed, the version is 3.0.0-rc1).

What you get when the build finishes is a plain old Java interface, PeopleApi, annotated with OpenFeign annotations, which is a direct projection of the people management web API specification (which comes from /contract/openapi.yaml). Please notice that all model classes are generated as well.

@javax.annotation.Generated(
    value = "io.swagger.codegen.languages.java.JavaClientCodegen",
    date = "2018-06-17T14:04:23.031Z[Etc/UTC]"
)
public interface PeopleApi extends ApiClient.Api {
    @RequestLine("POST /api/people")
    @Headers({"Content-Type: application/json", "Accept: application/json"})
    Person addPerson(Person body);

    @RequestLine("DELETE /api/people/{email}")
    @Headers({"Content-Type: application/json"})
    void deletePerson(@Param("email") String email);

    @RequestLine("GET /api/people/{email}")
    @Headers({"Accept: application/json"})
    Person findPerson(@Param("email") String email);

    @RequestLine("GET /api/people")
    @Headers({"Accept: application/json"})
    List<Person> getPeople();
}
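The model classes referenced above, such as Person, are generated alongside the interface. Roughly, they look like the sketch below (a hand-written approximation based on the fluent calls used later in this post; the actual generated code differs in details such as Jackson annotations and equals/hashCode):

```java
// Approximation of a generated fluent model class (not the actual codegen output).
// Each setter returns "this", enabling the chained style used in the client snippets.
class Person {
    private String email;
    private String firstName;
    private String lastName;

    public Person email(String email) {
        this.email = email;
        return this;
    }

    public Person firstName(String firstName) {
        this.firstName = firstName;
        return this;
    }

    public Person lastName(String lastName) {
        this.lastName = lastName;
        return this;
    }

    public String getEmail() { return email; }
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
}
```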

Let us compare it with the Swagger UI interpretation of the same specification, available at http://localhost:8080/api/api-docs?url=/api/openapi.json:

It looks right at first glance, but we had better make sure things work out as expected. Once we have the OpenFeign-annotated interface, it can be made functional (in this case, implemented through proxies) using the family of Feign builders, for example:

final PeopleApi api = Feign
    .builder()
    .client(new OkHttpClient())
    .encoder(new JacksonEncoder())
    .decoder(new JacksonDecoder())
    .logLevel(Logger.Level.HEADERS)
    .options(new Request.Options(1000, 2000))
    .target(PeopleApi.class, "http://localhost:8080/");

Great, the fluent builder style rocks. Assuming our people management web API server is up and running (by default, it is available at http://localhost:8080/):

java -jar server-openapi/target/server-openapi-0.0.1-SNAPSHOT.jar

We can communicate with it by calling methods on the freshly built PeopleApi instance, as in the code snippet below:

final Person person = api.addPerson(
        new Person()
            .email("[email protected]")
            .firstName("John")
            .lastName("Smith"));

It is really cool: if we rewind a bit, we actually did nothing ourselves. Everything is given to us for free once the web API specification is available! But let us not stop here and remind ourselves that using Java interfaces does not eliminate the reality that we are dealing with remote systems. And things are going to fail here, sooner or later, no doubt.
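Since every call on such an interface is a network call, it pays off to plan for transient failures. Here is a minimal, framework-agnostic sketch of the retry idea (a hypothetical helper, not part of the OpenFeign API; Feign itself ships a configurable Retryer for exactly this purpose):

```java
import java.util.function.Supplier;

class Retry {
    // Invoke the supplier up to maxAttempts times, rethrowing the last failure
    // if every attempt fails. Real clients would also add backoff between attempts.
    public static <T> T withRetries(Supplier<T> call, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException ex) {
                last = ex; // remember the failure and try again
            }
        }
        throw last;
    }
}
```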

Not so long ago we learned about circuit breakers and how useful they are when properly applied in the context of distributed systems. It would be really awesome to somehow introduce this feature into our OpenFeign-based client. Please welcome another member of the family, the HystrixFeign builder, a seamless integration with the Hystrix library:

final PeopleApi api = HystrixFeign
    .builder()
    .client(new OkHttpClient())
    .encoder(new JacksonEncoder())
    .decoder(new JacksonDecoder())
    .logLevel(Logger.Level.HEADERS)
    .options(new Request.Options(1000, 2000))
    .target(PeopleApi.class, "http://localhost:8080/");

The only thing we need to do is add these two dependencies to the consumer's pom.xml file (strictly speaking, hystrix-core is not really needed if you do not mind staying on an older version).

<dependency>
    <groupId>io.github.openfeign</groupId>
    <artifactId>feign-hystrix</artifactId>
    <version>9.7.0</version>
</dependency>

<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>1.5.12</version>
</dependency>
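The practical payoff of HystrixFeign is that you can supply a fallback implementation of the same interface, which the circuit breaker invokes when the primary call fails (HystrixFeign accepts it as an extra argument to target(...)). Boiled down to plain Java with hypothetical names, the idea is simply:

```java
import java.util.function.Supplier;

class Fallbacks {
    // Return the primary result, or the fallback value if the primary throws.
    // A real circuit breaker adds state on top: after enough failures it stops
    // calling the primary at all and goes straight to the fallback.
    public static <T> T withFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException ex) {
            return fallback.get();
        }
    }
}
```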

Arguably, this is one of the best examples of how easy and straightforward an integration can be. But even that is not the end of the story. Observability in distributed systems is more important than ever, and as we learned a while ago, distributed tracing is tremendously useful in helping us out here. Again, OpenFeign supports it right out of the box, so let us take a look.

OpenFeign integrates with any OpenTracing-compatible tracer. The Jaeger tracer is one of those, and among other things it has a really nice web UI front-end for exploring traces and dependencies. Let us run it first; luckily, it is fully Docker-ized.

docker run -d -e \
    COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
    -p 5775:5775/udp \
    -p 6831:6831/udp \
    -p 6832:6832/udp \
    -p 5778:5778 \
    -p 16686:16686 \
    -p 14268:14268 \
    -p 9411:9411 \
    jaegertracing/all-in-one:latest

A couple of additional dependencies have to be introduced for the OpenFeign client to become aware of the OpenTracing capabilities.

<dependency>
    <groupId>io.github.openfeign.opentracing</groupId>
    <artifactId>feign-opentracing</artifactId>
    <version>0.1.0</version>
</dependency>

<dependency>
    <groupId>io.jaegertracing</groupId>
    <artifactId>jaeger-core</artifactId>
    <version>0.29.0</version>
</dependency>

From the Feign builder side, the only change (besides the introduction of the tracer instance) is to wrap the client into a TracingClient, as the snippet below demonstrates:

final Tracer tracer = new Configuration("consumer-openapi")
    .withSampler(
        new SamplerConfiguration()
            .withType(ConstSampler.TYPE)
            .withParam(new Float(1.0f)))
    .withReporter(
        new ReporterConfiguration()
            .withSender(
                new SenderConfiguration()
                    .withEndpoint("http://localhost:14268/api/traces")))
    .getTracer();
            
final PeopleApi api = Feign
    .builder()
    .client(new TracingClient(new OkHttpClient(), tracer))
    .encoder(new JacksonEncoder())
    .decoder(new JacksonDecoder())
    .logLevel(Logger.Level.HEADERS)
    .options(new Request.Options(1000, 2000))
    .target(PeopleApi.class, "http://localhost:8080/");

On the server side we need to integrate with OpenTracing as well. Apache CXF has first-class support for it, bundled into the cxf-integration-tracing-opentracing module. Let us include it as a dependency, this time in the server's pom.xml.

<dependency>
    <groupId>org.apache.cxf</groupId>
    <artifactId>cxf-integration-tracing-opentracing</artifactId>
    <version>3.2.4</version>
</dependency>

Depending on the way you configure your applications, there should be an instance of the tracer available, which should then be passed to the OpenTracingFeature, for example:

// Create tracer
final Tracer tracer = new Configuration(
        "server-openapi", 
        new SamplerConfiguration(ConstSampler.TYPE, 1),
        new ReporterConfiguration(new HttpSender("http://localhost:14268/api/traces"))
    ).getTracer();

// Include the OpenTracingFeature feature
final JAXRSServerFactoryBean factory = new JAXRSServerFactoryBean();
factory.setProvider(new OpenTracingFeature(tracer));
...
factory.create();

From now on, the invocation of any people management API endpoint through the generated OpenFeign client will be fully traceable in the Jaeger web UI, available at http://localhost:16686/search (assuming your Docker host is localhost).

Our scenario is pretty simple, but imagine real applications where dozens of external service calls may happen while a single request travels through the system. Without distributed tracing in place, every issue has a chance to turn into a mystery.

As a side note, if you look closely at the trace in the picture, you may notice that the server and the consumer use different versions of the Jaeger API. This is not a mistake: the latest released version of Apache CXF uses an older OpenTracing API version (and as such, an older Jaeger client API), but it does not prevent things from working as expected.

With that, it is time to wrap up. Hopefully, the benefits of contract-based (or even better, contract-first) development in the world of RESTful web services and APIs have become more and more apparent: generation of smart clients, consumer-driven contract tests, discoverability and rich documentation are just a few to mention. Please make use of it!

The complete project sources are available on Github.

Monday, January 13, 2014

Your build tool is your good friend: what sbt can do for Java developer

I think picking the right build tool is a very important choice for developers. For years I have been sticking to Apache Maven and, honestly, it does the job well enough; even nowadays it is a good tool to use. But I always felt it could be done much better ... and then Gradle came along ...

Despite the many hours I spent getting accustomed to the Gradle way of doing things, I finally gave up and switched back to Apache Maven. The reason: I did not feel comfortable with it, mostly because of the Groovy DSL. Anyway, I think Gradle is a great, powerful and extensible build tool which is able to perform any task your build process needs.

But engaging myself more and more with Scala, I quickly discovered sbt. Though sbt is an acronym for "simple build tool", my first impression was quite the contrary: I found it complicated and hard to understand. For some reason I liked it nonetheless, and after spending more time reading the documentation (which is getting better and better) and doing many experiments, I would finally say the choice is made. In this post I would like to show a couple of great things sbt can do to make a Java developer's life easier (some knowledge of Scala would be very handy, but it is not required).

Before moving on to a real example, a couple of facts about sbt. It uses Scala as the language for the build scenario and requires a launcher which can be downloaded from here (the version we will be using is 0.13.1). There are several ways to describe a build in sbt; the one this post demonstrates uses Build.scala with a single project.

Our example is a simple Spring console application with a couple of JUnit test cases: just enough to see how a build with external dependencies is structured and how tests are run. The application contains only two classes:

package com.example;

import org.springframework.stereotype.Service;

@Service
public class SimpleService {
    public String getResult() {
        return "Result";
    }
}
and
package com.example;

import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.support.GenericApplicationContext;

public class Starter {
    @Configuration
    @ComponentScan( basePackageClasses = SimpleService.class )
    public static class AppConfig {  
    }
 
    public static void main( String[] args ) {
        try( GenericApplicationContext context = new AnnotationConfigApplicationContext( AppConfig.class ) ) {
            final SimpleService service = context.getBean( SimpleService.class );
            System.out.println( service.getResult() );
        }
    }
}

Now, let us see what the sbt build looks like. By convention, Build.scala should be located in the project subfolder. Additionally, there should be a build.properties file with the desired sbt version and a plugins.sbt with external plugins (we will use the sbteclipse plugin to generate Eclipse project files). We will start with build.properties, which contains only one line:

sbt.version=0.13.1

and continue with plugins.sbt, which in our case is also just one line:

addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.4.0")

Finally, the heart of our build: Build.scala. There are two parts to it. The first is common settings for all projects in the build (useful for multi-project builds, but we have only one now); here is the snippet of this part:

import sbt._
import Keys._
import com.typesafe.sbteclipse.core.EclipsePlugin._

object ProjectBuild extends Build {
  override val settings = super.settings ++ Seq(
    organization := "com.example",    
    name := "sbt-java",    
    version := "0.0.1-SNAPSHOT",    
    
    scalaVersion := "2.10.3",
    scalacOptions ++= Seq( "-encoding", "UTF-8", "-target:jvm-1.7" ),    
    javacOptions ++= Seq( "-encoding", "UTF-8", "-source", "1.7", "-target", "1.7" ),        
    outputStrategy := Some( StdoutOutput ),
    compileOrder := CompileOrder.JavaThenScala,
    
    resolvers ++= Seq( 
      Resolver.mavenLocal, 
      Resolver.sonatypeRepo( "releases" ), 
      Resolver.typesafeRepo( "releases" )
    ),        

    crossPaths := false,            
    fork in run := true,
    connectInput in run := true,
    
    EclipseKeys.executionEnvironment := Some(EclipseExecutionEnvironment.JavaSE17)
  )
}

The build above looks quite clean and understandable: resolvers is a straight analogy of Apache Maven repositories, and EclipseKeys.executionEnvironment customizes the execution environment (Java SE 7) for the generated Eclipse project. All these keys are very well documented.

The second part is much smaller and defines our main project in terms of dependencies and the main class:

lazy val main = Project( 
  id = "sbt-java",  
  base = file("."), 
  settings = Project.defaultSettings ++ Seq(              
    mainClass := Some( "com.example.Starter" ),
     
    initialCommands in console += """
      import com.example._
      import com.example.Starter._
      import org.springframework.context.annotation._
    """,
       
    libraryDependencies ++= Seq(
      "org.springframework" % "spring-context" % "4.0.0.RELEASE",
      "org.springframework" % "spring-beans" % "4.0.0.RELEASE",
      "org.springframework" % "spring-test" % "4.0.0.RELEASE" % "test",
      "com.novocode" % "junit-interface" % "0.10" % "test",
      "junit" % "junit" % "4.11" % "test"
    )
  ) 
)

The initialCommands setting requires a bit of explanation: sbt is able to run a Scala console (REPL), and this setting allows us to add default import statements so we can use our classes immediately. The dependency on junit-interface allows sbt to run JUnit test cases, and that is the first thing we will do: add some tests. Before creating the actual tests, we will start sbt and ask it to run the test cases on every code change, just like that:

sbt ~test

While sbt is running, we will add a test case:

package com.example;

import static org.hamcrest.core.IsEqual.equalTo;
import static org.junit.Assert.assertThat;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.support.GenericApplicationContext;

import com.example.Starter.AppConfig;

public class SimpleServiceTestCase {
    private GenericApplicationContext context;
    private SimpleService service;
 
    @Before
    public void setUp() {
        context = new AnnotationConfigApplicationContext( AppConfig.class );
        service = context.getBean( SimpleService.class );
    }
 
    @After
    public void tearDown() {
        context.close();
    }

    @Test
    public void testSampleTest() { 
        assertThat( service.getResult(), equalTo( "Result" ) );
    } 
}

In the console we should see that sbt picked up the change automatically and ran all test cases. Sadly, because of this issue, which is already fixed and should be available in the next release of junit-interface, we cannot use the @RunWith and @ContextConfiguration annotations to run Spring test cases yet.

For TDD practitioners this is an awesome feature to have. The next terrific feature we are going to look at is the Scala console (REPL), which gives us the ability to play with the application without actually running it. It can be invoked by typing:

sbt console

and observing something like this in the terminal (as we can see, the imports from initialCommands are automatically included):

At this moment the playground is established and we can do a lot of very interesting things, for example create a context, get beans and call any methods on them:

sbt takes care of the classpath, so all your classes and external dependencies are available for use. I found this way of discovering things much faster than using a debugger or other techniques.

At the moment there is no good support for sbt in Eclipse, but it is very easy to generate Eclipse project files using the sbteclipse plugin we touched on before:

sbt eclipse

Awesome! Not to mention the other great plugins which are kindly listed here, and the ability to import Apache Maven POM files using externalPom(), which really simplifies migration. As a conclusion from my side: if you are looking for a better, modern, extensible build tool for your project, please take a look at sbt. It is a great piece of software built on top of an awesome, concise language.

Complete project is available on GitHub.

Wednesday, February 2, 2011

Using @Configurable in Spring Framework: inject dependency to any object

Honestly, I like Spring Framework: awesome dependency management combined with rich features and a great community. Recently I encountered a very nice feature: @Configurable. In short, it allows a developer to inject dependencies into any object (which carries this annotation) without an explicit bean definition. It is pretty cool. The technique behind it is AspectJ with load-time weaving.

Let me show how it works by creating a simple test project. Let us start with Maven's project descriptor (pom.xml):

<project xmlns="http://maven.apache.org/POM/4.0.0">
 <modelVersion>4.0.0</modelVersion>

 <groupId>spring-configurable</groupId>
 <artifactId>spring-configurable</artifactId>
 <version>0.0.1-SNAPSHOT</version>
 <packaging>jar</packaging>

 <name>spring-configurable</name>
 <url>http://maven.apache.org</url>

 <properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <spring.framework.version>3.0.5.RELEASE</spring.framework.version>
 </properties>

 <dependencies>
  <dependency>
   <groupId>org.aspectj</groupId>
   <artifactId>aspectjweaver</artifactId>
   <version>1.6.5</version>
  </dependency>

  <dependency>
   <groupId>org.springframework</groupId>
   <artifactId>spring-beans</artifactId>
   <version>${spring.framework.version}</version>
  </dependency>

  <dependency>
   <groupId>org.springframework</groupId>
   <artifactId>spring-aspects</artifactId>
   <version>${spring.framework.version}</version>
  </dependency>

  <dependency>
   <groupId>org.springframework</groupId>
   <artifactId>spring-core</artifactId>
   <version>${spring.framework.version}</version>
  </dependency>

  <dependency>
   <groupId>org.springframework</groupId>
   <artifactId>spring-context</artifactId>
   <version>${spring.framework.version}</version>
  </dependency>

  <dependency>
   <groupId>org.springframework</groupId>
   <artifactId>spring-jdbc</artifactId>
   <version>${spring.framework.version}</version>
  </dependency>
 </dependencies>
</project>
Next, let me create the application context (applicationContext.xml) and place it inside src/main/resources:


   
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context.xsd">

 <context:annotation-config />
 <context:spring-configured />
 <context:load-time-weaver />

 <context:component-scan base-package="com.test.configurable" />

 <bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource" />
</beans>

Two key declarations here are <context:spring-configured /> and <context:load-time-weaver />. According to the documentation, <context:spring-configured /> "... signals the current application context to apply dependency injection to non-managed classes that are instantiated outside of the Spring bean factory (typically classes annotated with the @Configurable annotation) ..." and <context:load-time-weaver /> turns on load-time weaving.

The Java code which uses the power of the @Configurable annotation is located in Starter.java, which itself is placed inside src/main/java:
package com.test.configurable;

import javax.sql.DataSource;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Configurable;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.util.Assert;

public class Starter {
 
 @Configurable
 private static class Inner {
  @Autowired private DataSource dataSource;  
 }
 
 public static void main(String[] args) {
  final ApplicationContext context = new ClassPathXmlApplicationContext( "/applicationContext.xml" );
  
  final DataSource dataSource = context.getBean( DataSource.class );
  Assert.notNull( dataSource );
  
  Assert.notNull( new Inner().dataSource );
 } 

}
As we can see, the class Inner is not a bean, is nested in another class and is created simply by calling new. Nevertheless, the dataSource bean is injected as expected. Last but not least, for this example to work the application should be run with an agent (I am using Spring Framework 3.0.5): -javaagent:spring-instrument-3.0.5.RELEASE.jar
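If you launch the application (or its tests) through Maven, one place to pass that agent is the maven-surefire-plugin argLine. The fragment below is a sketch: the local-repository path to the spring-instrument jar is an assumption and should be adjusted to your setup.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- The spring-instrument jar location is an assumption; point it at your copy. -->
    <argLine>-javaagent:${settings.localRepository}/org/springframework/spring-instrument/3.0.5.RELEASE/spring-instrument-3.0.5.RELEASE.jar</argLine>
  </configuration>
</plugin>
```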

Tuesday, November 30, 2010

Getting started with Apache Mahout

Recently I got an interesting problem to solve: how to automatically classify text coming from different sources? Some time ago I read about a project which does this, as well as a lot of other text analysis stuff: Apache Mahout. Though it is not a very mature one (the current version is 0.4), it is very powerful and scalable. Built on top of another excellent project, Apache Hadoop, it is capable of analyzing huge data sets.

So I did a small project in order to understand how Apache Mahout works. I decided to use Apache Maven 2 to manage all the dependencies, so I will start with the POM file first.


<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.acme</groupId>
  <artifactId>mahout</artifactId>
  <version>0.94</version>
  <name>Mahout Examples</name>
  <description>Scalable machine learning library examples</description>
  <packaging>jar</packaging>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <apache.mahout.version>0.4</apache.mahout.version>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <encoding>UTF-8</encoding>
          <source>1.6</source>
          <target>1.6</target>
          <optimize>true</optimize>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-utils</artifactId>
      <version>${apache.mahout.version}</version>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.6.0</version>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-jcl</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
</project>

Then I looked into the Apache Mahout examples and the algorithms available for the text classification problem. The simplest and most accurate one is the Naive Bayes classifier. Here is a code snippet:
package org.acme;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.FileReader;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.mahout.classifier.ClassifierResult;
import org.apache.mahout.classifier.bayes.TrainClassifier;
import org.apache.mahout.classifier.bayes.algorithm.BayesAlgorithm;
import org.apache.mahout.classifier.bayes.common.BayesParameters;
import org.apache.mahout.classifier.bayes.datastore.InMemoryBayesDatastore;
import org.apache.mahout.classifier.bayes.exceptions.InvalidDatastoreException;
import org.apache.mahout.classifier.bayes.interfaces.Algorithm;
import org.apache.mahout.classifier.bayes.interfaces.Datastore;
import org.apache.mahout.classifier.bayes.model.ClassifierContext;
import org.apache.mahout.common.nlp.NGrams;

public class Starter {
 public static void main( final String[] args ) {
  final BayesParameters params = new BayesParameters();
  params.setGramSize( 1 );
  params.set( "verbose", "true" );
  params.set( "classifierType", "bayes" );
  params.set( "defaultCat", "OTHER" );
  params.set( "encoding", "UTF-8" );
  params.set( "alpha_i", "1.0" );
  params.set( "dataSource", "hdfs" );
  params.set( "basePath", "/tmp/output" );
  
  try {
      Path input = new Path( "/tmp/input" );
      TrainClassifier.trainNaiveBayes( input, "/tmp/output", params );
   
      Algorithm algorithm = new BayesAlgorithm();
      Datastore datastore = new InMemoryBayesDatastore( params );
      ClassifierContext classifier = new ClassifierContext( algorithm, datastore );
      classifier.initialize();
      
      final BufferedReader reader = new BufferedReader( new FileReader( args[ 0 ] ) );
      String entry = reader.readLine();
      
      while( entry != null ) {
          List< String > document = new NGrams( entry, 
                          Integer.parseInt( params.get( "gramSize" ) ) )
                          .generateNGramsWithoutLabel();

          ClassifierResult result = classifier.classifyDocument( 
                           document.toArray( new String[ document.size() ] ), 
                           params.get( "defaultCat" ) );          

          // Print the predicted category for each input line
          System.out.println( result.getLabel() + ": " + entry );

          entry = reader.readLine();
      }
      
      reader.close();
  } catch( final IOException ex ) {
   ex.printStackTrace();
  } catch( final InvalidDatastoreException ex ) {
   ex.printStackTrace();
  }
 }
}
There is one important note here: the system must be trained before starting classification. In order to do so, it is necessary to provide examples (the more, the better) of the different text categories. These should be simple files where each line starts with the category, separated by a tab from the text itself. E.g.:
SUGGESTION  That's a great suggestion
QUESTION  Do you sell Microsoft Office?
...
The more files you provide, the more precise a classification you will get. All files must be put into the '/tmp/input' folder; they will be processed by Apache Hadoop first. :)
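To sanity-check such training files before feeding them to Mahout, a tiny parser for the tab-separated format is handy. The helper below is hypothetical plain Java, not part of Mahout:

```java
// Parses one line of the "CATEGORY<TAB>text" training format described above.
class TrainingLine {
    public final String category;
    public final String text;

    public TrainingLine(String category, String text) {
        this.category = category;
        this.text = text;
    }

    // Split on the first tab; anything after it (including further tabs) is text.
    public static TrainingLine parse(String line) {
        final int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("No tab separator in line: " + line);
        }
        return new TrainingLine(line.substring(0, tab), line.substring(tab + 1));
    }
}
```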

Monday, April 19, 2010

Using Maven 2 and Ant's XMLTask to modify XML files

When we are talking about software development, it is not only about writing code (for sure, high-quality code). It is also about a bunch of supporting processes like automated building, testing, deployment, integration, ... In this blog I am trying to touch every aspect, so this post starts a series of articles about building Java projects with Apache Maven 2. Maven's web site has very good documentation, so I will skip the introductory part and concentrate on some practical issues which arise quite often.

Suppose you have XML configuration files and, depending on the build profile, you have to modify some parameters (database server address, JMS endpoints, ...). How to do that with Apache Maven 2? Quite easily, using ... the Apache Ant integration for Apache Maven 2. Apache Ant has an excellent and very powerful plug-in for working with XML files: XMLTask. Let us make use of it!

<profiles>
 <profile>
  <id>testing</id>
  <build>
   <plugins>
    <plugin>
     <artifactId>maven-antrun-plugin</artifactId>
     <dependencies>
      <dependency>
       <groupId>com.oopsconsultancy</groupId>
       <artifactId>xmltask</artifactId>
       <version>1.14</version>
      </dependency>
     </dependencies>
     <executions>
      <execution>
       <phase>prepare-package</phase>
       <configuration>
        <tasks>
         <echo message="Using testing configuration" />
         <taskdef name="xmltask"
          classname="com.oopsconsultancy.xmltask.ant.XmlTask"
          classpathref="maven.plugin.classpath" />
         <xmltask
          source="${project.basedir}/src/main/webapp/WEB-INF/web.xml"
          dest="${project.build.directory}/web.xml"
          preserveType="true">
          <remove path="//*[@id='<some id here>']" />
         </xmltask>
        </tasks>
       </configuration>
       <goals>
        <goal>run</goal>
       </goals>
      </execution>
     </executions>
    </plugin>
   </plugins>
  </build>
 </profile>
</profiles>

What this simple fragment does: for testing builds it will remove from web.xml all XML elements with the id attribute <some id here>. Not very meaningful, but it gives an idea of how it works. XMLTask can do almost everything you need: insert/remove elements and XML fragments, insert/remove/modify attributes and values, copy/cut/paste XML, and a lot more. I found it extremely useful.
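For instance, besides <remove>, setting an attribute or replacing an element's text are one-liners as well. The fragment below is a sketch using XMLTask's attr and replace tasks; the file names and XPath expressions are made-up examples:

```xml
<xmltask source="web.xml" dest="web-modified.xml">
  <!-- change an attribute value on matching elements -->
  <attr path="//servlet[@id='main']" attr="id" value="main-testing" />
  <!-- replace the text content of a matching element -->
  <replace path="//context-param[param-name='env']/param-value/text()" withText="testing" />
</xmltask>
```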