Saturday, November 28, 2020

All Your Tests Belong to You: Maintaining Mixed JUnit 4/JUnit 5 and Testng/JUnit 5 Test Suites

If you are seasoned Java developer who practices test-driven development (hopefully, everyone does it), it is very likely JUnit 4 has been your one-stop-shop testing toolbox. Personally, I truly loved it and still love: simple, minimal, non-intrusive and intuitive. Along with terrific libraries like Assertj and Hamcrest it makes writing test cases a pleasure.

But time passes by, Java has evolved a lot as a language, however JUnit 4 was not really up for a ride. Around 2015 the development of JUnit 5 has started with ambitious goal to become a next generation of the programmer-friendly testing framework for Java and the JVM. And, to be fair, I think this goal has been reached: many new projects adopt JUnit 5 from the get-go whereas the old ones are already in the process of migration (or at least are thinking about it).

For existing projects, the migration to JUnit 5 will not happen overnight and would probably take some time. In today's post we are going to talk about the ways to maintain mixed JUnit 4 / JUnit 5 and TestNG / JUnit 5 test suites with a help of Apache Maven and Apache Maven Surefire plugin.

To have an example a bit more realistic, we are going to test a UploadDestination class, which basically just provides a single method which says if a particular destination scheme is supported or not:

import java.net.URI;

public class UploadDestination {
    public boolean supports(String location) {
        final String scheme = URI.create(location).getScheme();
        return scheme.equals("http") || scheme.equals("s3") || scheme.equals("sftp");
    }
}

The implementer was kind enough to create a suite of JUnit 4 unit tests to verify that all expected destination schemes are indeed supported.

import static org.junit.Assert.assertTrue;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class JUnit4TestCase {
    private UploadDestination destination;
    private final String location;

    public JUnit4TestCase(String location) {
        this.location = location;
    }

    @Before
    public void setUp() {
        destination = new UploadDestination();
    }

    @Parameters(name= "{index}: location {0} is supported")
    public static Object[] locations() {
        return new Object[] { "s3://test", "http://host:9000", "sftp://host/tmp" };
    }

    @Test
    public void testLocationIsSupported() {
        assertTrue(destination.supports(location));
    }
}

In the project build, at very least, you need to add JUnit 4 dependency along with Apache Maven Surefire plugin and, optionally Apache Maven Surefire Reporter plugin, to your pom.xml, the snippet below illustrates that.

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.13.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-report-plugin</artifactId>
                <version>3.0.0-M5</version>
            </plugin>
        </plugins>
    </build>

No magic here, triggering Apache Maven build would normally run all unit test suites every time.

...
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit4TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 s - in com.example.JUnit4TestCase
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
...

Awesome, let us imagine at some point another teammate happens to work on the project and noticed there are no unit tests verifying the unsupported destination schemes so she adds some using JUnit 5.

package com.example;

import static org.junit.jupiter.api.Assertions.assertFalse;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class JUnit5TestCase {
    private UploadDestination destination;

    @BeforeEach
    void setUp() {
        destination = new UploadDestination();
    }

    @ParameterizedTest(name = "{index}: location {0} is supported")
    @ValueSource(strings = { "s3a://test", "https://host:9000", "ftp://host/tmp" } )
    public void testLocationIsNotSupported(String location) {
        assertFalse(destination.supports(location));
    }
}

Consequently, another dependency appears in the project's pom.xml to bring JUnit 5 in (since its API is not compatible with JUnit 4).

    <dependencies>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>5.7.0</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

Looks quite legitimate, isn't it? But there is a catch ... the test run results would surprise this time.

...
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.076 s - in com.example.JUnit5TestCase
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
...

The JUnit 4 test suites are gone and such a behavior is actually well documentented by Apache Maven Surefire team in the Provider Selection section of the official documentation. So how we could get them back? There are a few possible options but the simplest one by far is to use JUnit Vintage engine in order to run JUnit 4 test suites using JUnit 5 platform.

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
                <dependencies>
                    <dependency>
                        <groupId>org.junit.vintage</groupId>
                        <artifactId>junit-vintage-engine</artifactId>
                        <version>5.7.0</version>
                    </dependency>
                </dependencies>
            </plugin>

With that, both JUnit 4 and JUnit 5 test suites are going to be executed side by side.

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.079 s - in com.example.JUnit5TestCase
[INFO] Running com.example.JUnit4TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 s - in com.example.JUnit4TestCase
[INFO] 
[INFO] Results:
[INFO]
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------

The lesson to learn here: please watch carefully that all your test suites are being executed (the CI/CD usually keeps track of such trends and warns you right away). Especially, be extra careful when migrating to latest Spring Boot or Apache Maven Surefire plugin versions.

Another quite common use case you may run into is mixing the TestNG and JUnit 5 test suites in the scope of one project. The symptoms are pretty much the same, you are going to wonder why only JUnit 5 test suites are being run. The treatment in this case is a bit different and one of the options which seems to work pretty well is to enumerate the test engine providers explicitly.

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
                <dependencies>
                    <dependency>                                      
                        <groupId>org.apache.maven.surefire</groupId>  
                        <artifactId>surefire-junit-platform</artifactId>      
                        <version>3.0.0-M5</version>                   
                    </dependency>
                    <dependency>                                      
                        <groupId>org.apache.maven.surefire</groupId>  
                        <artifactId>surefire-testng</artifactId>      
                        <version>3.0.0-M5</version>                   
                    </dependency>                          
                </dependencies>
            </plugin>

The somewhat undesired effect in this case is the fact that the test suites are run separately (there are other ways though to try out), for example:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.074 s - in com.example.JUnit5TestCase
[INFO] 
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running TestSuite
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.315 s - in TestSuite
[INFO] 
[INFO] Results:
[INFO]
INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] ------------------------------------------------------------------------

To be fair, I think JUnit 5 is a huge step forward towards having modern and concise test suites for your Java (and in general, JVM) projects. These days there are seamless integrations available with mostly any other test framework or library (Mockito, TestContainers, ... ) and the migration path is not that difficult in most cases. Plus, as you have seen, co-existence of JUnit 5 with older test engines is totally feasible.

As always, the complete project samples are available on Github: JUnit 4/JUnit 5, TestNG / JUnit 5.

Wednesday, September 30, 2020

For gourmets and practioners: pick your flavour of the reactive stack with JAX-RS and Apache CXF

When JAX-RS 2.1 specification was released back in 2017, one of its true novelties was the introduction of the reactive API extensions. The industry has acknowledged the importance of the modern programming paradigms and specification essentially mandated the first-class support of the asynchronous and reactive programming for the Client API.

But what about the server side? It was not left out, the JAX-RS 2.1 asynchronous processing model has been enriched with Java 8's CompletionStage support, certainly a step in a right direction. Any existing REST web APIs built on top of the JAX-RS 2.1 implementation (like Apache CXF for example) could benefit from such enhancements right away.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public CompletionStage<List<Person>> getPeople() {
        return CompletableFuture
        	.supplyAsync(() -> Arrays.asList(new Person("a@b.com", "Tom", "Knocker")));
    }
}

Udoubtedly, CompletionStage and CompletableFuture are powerful tools but not without own quirks and limitations. The Reactive Streams specification and a number of its implementations offer a considerably better glimpse on how asynchronous and reactive programming should look like on JVM. With that, the logical question pops up: could your JAX-RS web services and APIs take advantage of the modern reactive libraries? And if the answer is positive, what does it take?

If your bets are on Apache CXF, you are certainly well positioned. The latest Apache CXF 3.2.14 / 3.3.7 / 3.4.0 release trains bring a comprehesive support of RxJava3, RxJava2 and Project Reactor. Along this post we are going to see how easy it is to plug your favorite reactive library in and place it at the forefront of your REST web APIs and services.

Since the most applications and services on the JVM are built on top of excellent Spring framework and Spring Boot, we will be developing the reference implementations using those as a foundation. The Spring Boot starter which comes along with Apache CXF distribution is taking care of most of the boring wirings you would have needed to do otherwise.

<dependency>
	<groupId>org.apache.cxf</groupId>
	<artifactId>cxf-spring-boot-starter-jaxrs</artifactId>
	<version>3.4.0</version>
</dependency>

The Project Reactor is the number one choice as the reactive foundation for Spring-based applications and services, so let us just start from that.

<dependency>
	<groupId>org.apache.cxf</groupId>
	<artifactId>cxf-rt-rs-extension-reactor</artifactId>
	<version>3.4.0</version>
</dependency>

Great, believe it or not, we are mostly done here. In order to teach Apache CXF to understand Project Reactor types like Mono or/and Flux we need to tune the configuration just a bit using ReactorCustomizer instance.

import org.apache.cxf.Bus;
import org.apache.cxf.endpoint.Server;
import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
import org.apache.cxf.jaxrs.reactor.server.ReactorCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider;

@Configuration
public class AppConfig {
    @Bean
    public Server rsServer(Bus bus, PeopleRestService service) {
        final JAXRSServerFactoryBean bean = new JAXRSServerFactoryBean();
        bean.getProperties(true).put("useStreamingSubscriber", true);
        bean.setBus(bus);
        bean.setAddress("/");
        bean.setServiceBean(service);
        bean.setProvider(new JacksonJsonProvider());
        new ReactorCustomizer().customize(bean);
        return bean.create();
    }
}

With such customization in-place, our JAX-RS web services and APIs could freely utilize Project Reactor primitives in a streaming fashion, for example.

import reactor.core.publisher.Flux;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Flux<Person> getPeople() {
        return Flux.just(new Person("a@b.com", "Tom", "Knocker"));
    }
}

As you probably noticed, the implementation purposely does not do anything complicated. However, once the reactive types are put at work, you could unleash the full power of the library of your choice (and Project Reactor is really good at that).

Now, when you undestand the principle, it comes the turn of the RxJava3, the last generation of the pioneering reactive library for the JVM platform.

<dependency>
	<groupId>org.apache.cxf</groupId>
	<artifactId>cxf-rt-rs-extension-rx3</artifactId>
	<version>3.4.0</version>
</dependency>

The configuration tuning is mostly identical to the one we have seen with Project Reactor, the customizer instance, ReactiveIOCustomizer, is all that changes.

import org.apache.cxf.Bus;
import org.apache.cxf.endpoint.Server;
import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
import org.apache.cxf.jaxrs.rx3.server.ReactiveIOCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider;

@Configuration
public class AppConfig {
    @Bean
    public Server rsServer(Bus bus, PeopleRestService service) {
        final JAXRSServerFactoryBean bean = new JAXRSServerFactoryBean();
        bean.getProperties(true).put("useStreamingSubscriber", true);
        bean.setBus(bus);
        bean.setAddress("/");
        bean.setServiceBean(service);
        bean.setProvider(new JacksonJsonProvider());
        new ReactiveIOCustomizer().customize(bean);
        return bean.create();
    }
}

The list of supported types includes Flowable, Single and Observable, the equivalent implementation in terms of RxJava3 primitives may look like this.


import io.reactivex.rxjava3.core.Flowable;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Flowable<Person> getPeople() {
        return Flowable.just(new Person("a@b.com", "Tom", "Knocker"));
    }
}

Pretty simple, isn't it? If you stuck with an older generation, RxJava2, nothing to worry about, Apache CXF has you covered.

<dependency>
	<groupId>org.apache.cxf</groupId>
	<artifactId>cxf-rt-rs-extension-rx2</artifactId>
	<version>3.4.0</version>
</dependency>

The same configuration trick with applying the customizer (which may look annoying at this point to be fair) is all that is required.

import org.apache.cxf.Bus;
import org.apache.cxf.endpoint.Server;
import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
import org.apache.cxf.jaxrs.rx2.server.ReactiveIOCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider;

@Configuration
public class AppConfig {
    @Bean
    public Server rsServer(Bus bus, PeopleRestService service) {
        final JAXRSServerFactoryBean bean = new JAXRSServerFactoryBean();
        bean.getProperties(true).put("useStreamingSubscriber", true);
        bean.setBus(bus);
        bean.setAddress("/");
        bean.setServiceBean(service);
        bean.setProvider(new JacksonJsonProvider());
        new ReactiveIOCustomizer().customize(bean);
        return bean.create();
    }
}

And we are good to go, ready to use the familiar reactive types Observable, Flowable and Single.

import io.reactivex.Flowable;
import io.reactivex.Observable;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Observable<Person> getPeople() {
        return Flowable
            .just(new Person("a@b.com", "Tom", "Knocker"))
            .toObservable();
    }
}

Last but not least, if you happens to be using the first generation of RxJava, it is also available with Apache CXF but certainly not recommended for production (as it has EOLed a couple of years ago).

Reactive programming paradigm is steadily getting more and more traction. It is great to see that the ecosystem embraces that and frameworks like Apache CXF are not an exception. If you are looking for robust foundation to build reactive and/or asynchronous REST web APIs on JVM, Apache CXF is worth considering, please give it a try!

The complete source code is available on Github.

Tuesday, July 28, 2020

It is never enough of them: enriching Apache Avro generated classes with custom Java annotations

Apache Avro, along with Apache Thrift and Protocol Buffers, is often being used as a platform-neutral extensible mechanism for serializing structured data. In the context of event-driven systems, the Apache Avro's schemas play the role of the language-agnostic contracts, shared between loosely-coupled components of the system, not necessarily written using the same programming language.

Probably, the most widely adopted reference architecture for such systems circles around Apache Kafka backed by Schema Registry and Apache Avro, although many other excellent options are available. Nevertheless, why Apache Avro?

The official documentation page summarizes pretty well the key advantages Apache Avro has over Apache Thrift and Protocol Buffers. But we are going to add another one to the list: biased (in a good sense) support of the Java and JVM platform in general.

Let us imagine that one of the components (or, it has to be said, microservice) takes care of the payment processing. Not every payment may succeed and to propagate such failures, the component broadcasts PaymentRejectedEvent whenever such unfortunate event happens. Here is its Apache Avro schema, persisted in the PaymentRejectedEvent.avsc file.

{
    "type": "record",
    "name": "PaymentRejectedEvent",
    "namespace": "com.example.event",
    "fields": [
        {
            "name": "id",
            "type": {
                "type": "string",
                "logicalType": "uuid"
            }
        },
        {
            "name": "reason",
            "type": {
                "type": "enum",
                "name": "PaymentStatus",
                "namespace": "com.example.event",
                "symbols": [
                    "EXPIRED_CARD",
                    "INSUFFICIENT_FUNDS",
                    "DECLINED"
                ]
            }
        },
        {
            "name": "date",
            "type": {
                "type": "long",
                "logicalType": "local-timestamp-millis"
            }
        }
    ]
}

The event is notoriously kept simple, you can safely assume that in more or less realistic system it has to have considerably more details available. To turn this event into Java class at build time, we could use Apache Avro Maven plugin, it is as easy as it could get.

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.10.0</version>
    <configuration>
        <stringType>String</stringType>
    </configuration>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.build.directory}/generated-sources/avro/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

Once the build finishes, you will get PaymentRejectedEvent Java class generated. But a few annoyances are going to emerge right away:

@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.lang.String id;
   private com.example.event.PaymentStatus reason;
   private long date;
}

The Java's types for id and date fields are not really what we would expect. Luckily, this is easy to fix by specifying customConversions plugin property, for example.

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.10.0</version>
    <configuration>
        <stringType>String</stringType>
        <customConversions>
            org.apache.avro.Conversions$UUIDConversion,org.apache.avro.data.TimeConversions$LocalTimestampMillisConversion
        </customConversions>
    </configuration>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.build.directory}/generated-sources/avro/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

If we build the project this time, the plugin would generate the right types.

@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.util.UUID id;
   private com.example.event.PaymentStatus reason;
   private java.time.LocalDateTime date;
}

It looks much better! But what about next challenge. In Java, annotations are commonly used to associate some additional metadata pieces with a particular language element. What if we have to add a custom, application-specific annotation to all generated event classes? It does not really matter which one, let it be @javax.annotation.Generated, for example. It turns out, with Apache Avro it is not an issue, it has dedicated javaAnnotation property we could benefit from.

{
    "type": "record",
    "name": "PaymentRejectedEvent",
    "namespace": "com.example.event",
    "javaAnnotation": "javax.annotation.Generated(\"avro\")",
    "fields": [
        {
            "name": "id",
            "type": {
                "type": "string",
                "logicalType": "uuid"
            }
        },
        {
            "name": "reason",
            "type": {
                "type": "enum",
                "name": "PaymentStatus",
                "namespace": "com.example.event",
                "symbols": [
                    "EXPIRED_CARD",
                    "INSUFFICIENT_FUNDS",
                    "DECLINED"
                ]
            }
        },
        {
            "name": "date",
            "type": {
                "type": "long",
                "logicalType": "local-timestamp-millis"
            }
        }
    ]
}

When we rebuild the project one more time (hopefully the last one), the generated PaymentRejectedEvent Java class is going to be decorated with the additional custom annotation.

@javax.annotation.Generated("avro")
@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.util.UUID id;
   private com.example.event.PaymentStatus reason;
   private java.time.LocalDateTime date;
}

Obviously, this property has no effect if the schema is used to produce respective constructs in other programming languages but it still feels good to see that Java has privileged support in Apache Avro, thanks for that! As a side note, it is good to see that after some quite long inactivity time the project is expiriencing the second breath, with regular releases and new features delivered constantly.

The complete source code is available on Github.

Thursday, April 30, 2020

The crypto quirks using JDK's Cipher streams (and what to do about that)

In our day-to-day job we often run into the recurrent theme of transferring data (for example, files) from one location to another. It sounds like a really simple task but let us make it a bit more difficult by stating the fact that these files may contain confidential information and could be transferred over non-secure communication channels.

One of the solutions which comes to mind first is to use encryption algorithms. Since the files could be really large, hundreds of megabytes or tens of gigabytes, using the symmetric encryption scheme like AES would probably make a lot of sense. Besides just encryption it would be great to make sure that the data is not tampered in transit. Fortunately, there is a thing called authenticated encryption which simultaneously provides to us confidentiality, integrity, and authenticity guarantees. Galois/Counter Mode (GCM) is one of the most popular modes that supports authenticated encryption and could be used along with AES. These thoughts lead us to use AES256-GCM128, a sufficiently strong encryption scheme.

In case you are on JVM platform, you should feel lucky since AES and GCM are supported by Java Cryptography Architecture (JCA) out of the box. With that being said, let us see how far we could go.

The first thing we have to do is to generate a new AES256 key. As always, OWASP has a number of recommendations on using JCA/JCE APIs properly.

final SecureRandom secureRandom = new SecureRandom();
        
final byte[] key = new byte[32];
secureRandom.nextBytes(key);

final SecretKey secretKey = new SecretKeySpec(key, "AES");

Also, to initialize AES/GCM cipher we need to generate random initialization vector (or shortly, IV). As per NIST recommendations, its length should be 12 bytes (96 bits).

For IVs, it is recommended that implementations restrict support to the length of 96 bits, to promote interoperability, efficiency, and simplicity of design. - Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC

So here we are:

final byte[] iv = new byte[12];
secureRandom.nextBytes(iv);

Having the AES key and IV ready, we could create a cipher instance and actually perform the encryption part. Dealing with large files assumes the reliance on streaming, therefore we use BufferedInputStream / BufferedOutputStream combined with CipherOutputStream for encryption.

public static void encrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    final GCMParameterSpec parameterSpec = new GCMParameterSpec(128, iv);
    cipher.init(Cipher.ENCRYPT_MODE, secretKey, parameterSpec);

    try (final BufferedInputStream in = new BufferedInputStream(new FileInputStream(input))) {
        try (final BufferedOutputStream out = new BufferedOutputStream(new CipherOutputStream(new FileOutputStream(output), cipher))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];

            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

Please note how we specify GCM cipher parameters with the tag size of 128 bits and initialize it in encryption mode (be aware of some GCM limitations when dealing with files over 64Gb). The decryption part is no different besides the fact the cipher is initialized in decryption mode.

public static void decrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    final GCMParameterSpec parameterSpec = new GCMParameterSpec(128, iv);
    cipher.init(Cipher.DECRYPT_MODE, secretKey, parameterSpec);
        
    try (BufferedInputStream in = new BufferedInputStream(new CipherInputStream(new FileInputStream(input), cipher))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];
                
            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

It seems like we are done, right? Unfortunately, not really, encrypting and decrypting the small files takes just a few moments but dealing with more or less realistic data samples gives shocking results.

Mostly 8 minutes to process a ~42Mb file (and as you may guess, larger is the file, longer it takes), the quick analysis reveals that most of that time is spent while decrypting the data (please note by no means this is a benchmark, merely a test). The search for possible culprits points out to the long-standing list of issues with AES/GCM and CipherInputStream / CipherOutputStream in JCA implementation here, here, here and here.

So what are the alternatives? It seems like it is possible to sacrifice the CipherInputStream / CipherOutputStream, refactor the implementation to use ciphers directly and make the encryption / decryption work using JCA primitives. But arguably there is a better way by bringing in battle-tested BouncyCastle library.

From the implementation perspective, the solutions are looking mostly identical. Indeed, although the naming conventions are unchanged, the CipherOutputStream / CipherInputStream in the snippet below are coming from BouncyCastle.

public static void encrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
    cipher.init(true, new AEADParameters(new KeyParameter(secretKey.getEncoded()), 128, iv));

    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(input))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new CipherOutputStream(new FileOutputStream(output), cipher))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];

            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

public static void decrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
    cipher.init(false, new AEADParameters(new KeyParameter(secretKey.getEncoded()), 128, iv));

    try (BufferedInputStream in = new BufferedInputStream(new CipherInputStream(new FileInputStream(input), cipher))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];
                
            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

Re-runing the previous encryption/decryption tests using BouncyCastle crypto primitives yields the completely different picture.

To be fair, the file encryption / decryption on the JVM platform looked like a solved problem at first but turned out to be full of surprising discoveries. Nonetheless, thanks to BouncyCastle, some shortcomings of JCA implementation are addressed in efficient and clean way.

Please find the complete sources available on Github.

Tuesday, February 18, 2020

In praise of the thoughful design: how property-based testing helps me to be a better developer

The developer's testing toolbox is one of these things which rarely stays unchanged. For sure, some testing practices have proven to be more valuable than others but still, we are constantly looking for better, faster and more expressive ways to test our code. Property-based testing, largely unknown to Java community, is yet another gem crafted by Haskell folks and described in QuickCheck paper.

The power of this testing technique has been quickly realized by Scala community (where the ScalaCheck library was born) and many others but the Java ecosystem has lacked the interest into adopting property-based testing for quite some time. Luckily, since the jqwik appearance, the things are slowly changing for better.

For many, it is quite difficult to grasp what property-based testing is and how it could be exploited. The excellent presentation Property-based Testing for Better Code by Jessica Kerr and comprehensive An introduction to property-based testing, Property-based Testing Patterns series of articles are excellent sources to get you hooked, but in today's post we are going to try discovering the practical side of the property-based testing for typical Java developer using jqwik.

To start with, what the name property-based testing actually implies? The first thought of every Java developer would be it aims to test all getters and setters (hello 100% coverage)? Not really, although for some data structures it could be useful. Instead, we should identify the high-level characteristics, if you will, of the component, data structure, or even individual function and efficiently test them by formulating the hypothesis.

Our first example falls into category "There and back again": serialization and deserialization into JSON representation. The class under the test is User POJO, although trivial, please notice that it has one temporal property of type OffsetDateTime.

public class User {
    private String username;
    @JsonFormat(pattern = "yyyy-MM-dd'T'HH:mm:ss[.SSS[SSS]]XXX", shape = Shape.STRING)
    private OffsetDateTime created;
    
    // ...
}

It is surprising to see how often manipulation with date/time properties are causing issues these days since everyone tries to use own representation. As you could spot, our contract is using ISO-8601 interchange format with optional milliseconds part. What we would like to make sure is that any valid instance of User could be serialized into JSON and desearialized back into Java object without loosing any date/time precision. As an exercise, let us try to express that in pseudo code first:

For any user
  Serialize user instance to JSON
  Deserialize user instance back from JSON
  Two user instances must be identical

Looks simple enough but here comes the surprising part: let us take a look at how this pseudo code projects into real test case using jqwik library. It gets as close to our pseudo code as it possibly could.

@Property
void serdes(@ForAll("users") User user) throws JsonProcessingException {
    final String json = serdes.serialize(user);

    assertThat(serdes.deserialize(json))
        .satisfies(other -> {
            assertThat(user.getUsername()).isEqualTo(other.getUsername());
            assertThat(user.getCreated().isEqual(other.getCreated())).isTrue();
        });
        
    Statistics.collect(user.getCreated().getOffset());
}

The test case reads very easy, mostly natural, but obviously, there is some background hidden behind jqwik's @Property and @ForAll annotations. Let us start from @ForAll and clear out where all these User instances are coming from. As you may guess, these instances must be generated, preferably in a randomized fashion.

For most of the built-in data types jqwik has a rich set of data providers (Arbitraries), but since we are dealing with application-specific class, we have to supply our own generation strategy. It should be able to emit User class instances with the wide range of usernames and the date/time instants for different set of timezones and offsets. Let us do a sneak peek at the provider implementation first and discuss it in details right after.

@Provide
Arbitrary<User> users() {
    final Arbitrary<String> usernames = Arbitraries.strings().alpha().ofMaxLength(64);
 
    final Arbitrary<OffsetDateTime> dates = Arbitraries
        .of(List.copyOf(ZoneId.getAvailableZoneIds()))
        .flatMap(zone -> Arbitraries
            .longs()
            .between(1266258398000L, 1897410427000L) // ~ +/- 10 years
            .unique()
            .map(epochMilli -> Instant.ofEpochMilli(epochMilli))
            .map(instant -> OffsetDateTime.from(instant.atZone(ZoneId.of(zone)))));

    return Combinators
        .combine(usernames, dates)
        .as((username, created) -> new User(username).created(created));

}

The source of usernames is easy: just random strings. The source of dates basically could be any date/time between 2010 and 2030 whereas the timezone part (thus the offset) is randomly picked from all available region-based zone identifiers. For example, below are some samples jqwik came up with.

{"username":"zrAazzaDZ","created":"2020-05-06T01:36:07.496496+03:00"}
{"username":"AZztZaZZWAaNaqagPLzZiz","created":"2023-03-20T00:48:22.737737+08:00"}
{"username":"aazGZZzaoAAEAGZUIzaaDEm","created":"2019-03-12T08:22:12.658658+04:00"}
{"username":"Ezw","created":"2011-10-28T08:07:33.542542Z"}
{"username":"AFaAzaOLAZOjsZqlaZZixZaZzyZzxrda","created":"2022-07-09T14:04:20.849849+02:00"}
{"username":"aaYeZzkhAzAazJ","created":"2016-07-22T22:20:25.162162+06:00"}
{"username":"BzkoNGzBcaWcrDaaazzCZAaaPd","created":"2020-08-12T22:23:56.902902+08:45"}
{"username":"MazNzaTZZAEhXoz","created":"2027-09-26T17:12:34.872872+11:00"}
{"username":"zqZzZYamO","created":"2023-01-10T03:16:41.879879-03:00"}
{"username":"GaaUazzldqGJZsqksRZuaNAqzANLAAlj","created":"2015-03-19T04:16:24.098098Z"}
...

By default, jqwik will run the test against 1000 different sets of parameter values (randomized User instances). The quite helpful Statistics container allows to collect whatever distribution insights you are curious about. Just in case, why not to collect the distribution by zone offsets?

    ...
    -04:00 (94) :  9.40 %
    -03:00 (76) :  7.60 %
    +02:00 (75) :  7.50 %
    -05:00 (74) :  7.40 %
    +01:00 (72) :  7.20 %
    +03:00 (69) :  6.90 %
    Z      (62) :  6.20 %
    -06:00 (54) :  5.40 %
    +11:00 (42) :  4.20 %
    -07:00 (39) :  3.90 %
    +08:00 (37) :  3.70 %
    +07:00 (34) :  3.40 %
    +10:00 (34) :  3.40 %
    +06:00 (26) :  2.60 %
    +12:00 (23) :  2.30 %
    +05:00 (23) :  2.30 %
    -08:00 (20) :  2.00 %
    ...    

Let us consider another example. Imagine at some point we decided to reimplement the equality for User class (which in Java means, overriding equals and hashCode) based on username property. With that, for any pair of User class instances the following invariants must hold true:

  • if two User instances have the same username, they are equal and must have same hash code
  • if two User instances have different usernames, they are not equal (but hash code may not necessarily be different)
It is the perfect fit for property-based testing and jqwik in particular makes such kind of tests trivial to write and maintain.

@Provide
Arbitrary<String> usernames() {
    return Arbitraries.strings().alpha().ofMaxLength(64);
}

@Property
void equals(@ForAll("usernames") String username, @ForAll("usernames") String other) {
    Assume.that(!username.equals(other));
        
    assertThat(new User(username))
        .isEqualTo(new User(username))
        .isNotEqualTo(new User(other))
        .extracting(User::hashCode)
        .isEqualTo(new User(username).hashCode());
}

The assumptions expressed through Assume allow to put additional constraints on the generated parameters since we introduce two sources of the usernames, it could happen that both of them emit the identical username at the same run so the test would fail.

The question you may be holding up to now is: what is the point? It is surely possible to test serialization / deserialization or equals/hashCode without embarking on property-based testing and using jqwik, so why even bother? Fair enough, but the answer to this question basically lies deeply in how we approach the design of our software systems.

By and large, property-based testing is heavily influenced by functional programming, not a first thing which comes into mind with respect to Java (at least, not yet), to say it mildly. The randomized generation of test data is not novel idea per se, however what property-based testing is encouraging you to do, at least in my opinion, is to think in more abstract terms, focus not on individual operations (equals, compare, add, sort, serialize, ...) but what kind of properties, characteristics, laws and/or invariants they come with to obey. It certainly feels like an alien technique, paradigm shift if you will, encourages to spend more time on designing the right thing. It does not mean that from now on all your tests must be property-based but I believe it certainly deserves the place in the front row of our testing toolboxes.

Please find the complete project sources available on Github.