Wednesday, March 31, 2021

Chasing Java's release train, from 8 to 16. Part 1: JDK 9, the last big banger ...

A lot has been said about the JDK's new 6-month release cadence, and arguably, this is one of the most profound decisions made in the recent history of the language and the JVM platform in general. But if you, like me and the other ~60% of Java developers out there, are stuck with JDK 8, the new releases mean little to your day-to-day job. Yet the language changes, the standard library changes, the JVM changes and so does the tooling around it. More importantly, the ecosystem also changes by aggressively raising the compatibility baselines. Java in 2021 is not the same as it was in 2020, 2017 and all the more in 2014, when Java 8 saw the light of day. The soon-to-be-released JDK 17, the next LTS release, will raise the bar even higher.

So the challenge many of us, seasoned Java developers, are facing is how to stay up to date. The risk of your skills becoming outdated is real. Hence this series of blog posts: to consolidate the changes in the language, JVM, standard library and tooling over each release, primarily from the perspective of the developer. By the end, we should be all set to meet JDK 17 fully armed. To mention upfront, we are not going to talk about incubating or preview features, nor about come-and-go ones (looking at you, jaotc). Whenever it makes sense, a feature will have a corresponding JEP link, but overall there are many changes not covered by dedicated proposals. I will try to include as much as possible, but chances are some useful features may still slip away.

With that, let us open the stage with JDK 8.

JDK 8

Unsurprisingly, JDK 8 does not only receive security patches and bugfixes but also gets some of the new features from upstream. The two notable ones are:
  • Improved Docker container detection and resource configuration usage: since 8u191, the JVM is fully container-aware (please see JDK-8146115). The support is only available on Linux-based platforms and is enabled by default.

  • JFR backport: since 8u262, JFR is supported seamlessly (please see JDK-8223147); there is no need to pass -XX:+UnlockCommercialFeatures and the like anymore (see the example right after this list).
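
For illustration, on a reasonably recent JDK 8 patch release (8u191+ for the container support, 8u262+ for JFR) both features could be exercised with just a few flags; the values below are arbitrary and serve only as a sketch:

$ java -XX:MaxRAMPercentage=75.0 \
       -XX:StartFlightRecording=duration=60s,filename=recording.jfr \
       -jar app.jar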

Although not strictly related, thanks to the JFR backport, the latest JDK Mission Control 8 release comes in very handy to everyone. Hopefully, your organization does update to the recent JDK patch releases on a regular basis and these features are already available to you.

JDK 9

The JDK 9 release was just huge (90+ JEPs): not only from the perspective of the bundled features, but also from the impact on the platform and ecosystem. It came out in 2017, but even today the ecosystem is still building on the groundwork established by JDK 9. Let us take a look at the major features which went into it.

The JDK 9 changeset was massive; at the same time, this is the last big-bang release. We have covered the most interesting pieces of it, at least from the developer's point of view. In the upcoming posts we are going to dissect the other releases, one by one, and reveal the hidden gems of each.

Wednesday, January 27, 2021

JEP-396 and You: strong encapsulation of the JDK internals is the default

Since the inception of Project Jigsaw, one of its goals was to encapsulate most of the JDK internal APIs in order to give the contributors the freedom to move Java forward at a faster pace. JEP-260, delivered along with the JDK 9 release, was the first step in this direction. Indeed, the famous WARNING messages like the ones below

...
WARNING: An illegal reflective access operation has occurred
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
...

started to appear all over the place. Intended to give developers and maintainers time to switch to the alternative, publicly available APIs, they rather provoked the opposite reaction: most just got used to them. Well, if nothing breaks, why bother?

But ... things are going to change very soon. I think many of you have seen code which tries to do clever things by gaining access to private methods or fields of classes from the standard library. One of the notable examples I have seen often enough is overcoming ThreadPoolExecutor's core pool size / maximum pool size semantics (if curious, please read the documentation and complementary material) by invoking its internal addWorker method. The example below is a simplified illustration of this idea (please, do not do that, ever).

final ThreadPoolExecutor executor = new ThreadPoolExecutor(5, 10, 0L, TimeUnit.MILLISECONDS,
    new LinkedBlockingQueue<Runnable>());

final Runnable task = ...
executor.submit(task);
    
// Queue is empty, enough workers have been created
if (executor.getQueue().isEmpty()) {
    return;
}
    
// Still have some room to go
if (executor.getActiveCount() < executor.getCorePoolSize()) {
    return;
}
    
// Core pool is full but not maxed out, let us add more workers
if (executor.getActiveCount() < executor.getMaximumPoolSize()) {
    final Method addWorker = ThreadPoolExecutor.class.getDeclaredMethod(
        "addWorker", Runnable.class, Boolean.TYPE);
    addWorker.setAccessible(true);
    addWorker.invoke(executor, null, Boolean.FALSE);
}

To be fair, this code works now on JDK 8, JDK 11 or JDK 15: find a private method, make it accessible, good to go. But it is not going to work smoothly in the soon-to-be-out JDK 16 and onwards, throwing an InaccessibleObjectException at runtime upon the setAccessible invocation.

Exception in thread "main" java.lang.reflect.InaccessibleObjectException: Unable to make private boolean java.util.concurrent.ThreadPoolExecutor.addWorker(java.lang.Runnable,boolean) accessible: module java.base does not "opens java.util.concurrent" to unnamed module @72ea2f77
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
        at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
        at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
        ...

So what is happening here? The new JEP-396 continues the endeavours of JEP-260 by strongly encapsulating JDK internals by default. It has been integrated into the JDK 16 and JDK 17 early builds, which essentially means no abusive access is going to be allowed anymore. Arguably, this is the right move, though it is very likely to be a disruptive one.

Should you be worried? It is a good question: even if you do not use any internal JDK APIs directly, it is very likely that one of the libraries you depend upon does not play by the rules (or is not ready for the rule changes). Hopefully, by the time JDK 16 is released, the ecosystem will be in good shape. There is never a good time: we were warned for years, and the next milestone is about to be reached. If you could help your favorite library or framework, please do.
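
One way to prepare is to probe for access instead of failing hard: since JDK 9, the reflection API offers trySetAccessible(), which returns false rather than throwing InaccessibleObjectException. A minimal sketch (not taken from any particular library) could look like this:

import java.lang.reflect.Method;
import java.util.concurrent.ThreadPoolExecutor;

public class AccessProbe {
    public static void main(String[] args) throws NoSuchMethodException {
        final Method addWorker = ThreadPoolExecutor.class.getDeclaredMethod(
            "addWorker", Runnable.class, Boolean.TYPE);

        // trySetAccessible() (JDK 9+) reports failure instead of throwing
        // when java.base does not open java.util.concurrent to our module
        if (addWorker.trySetAccessible()) {
            System.out.println("Reflective access is still permitted (e.g. --add-opens was passed)");
        } else {
            System.out.println("Strong encapsulation is in effect, fall back to the public API");
        }
    }
}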

The complete list of the exported packages that will no longer be open by default was conveniently made available here; a few to pay attention to:

java.beans.*
java.io
java.lang.*
java.math
java.net
java.nio.*
java.rmi.*
java.security.*
java.sql
java.text.*
java.time.*
java.util.*

Last but not least, you could still override the defaults using the --add-opens command-line option, but please use it with great caution (or better, do not use it at all):

$ java --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED ...

Please be proactive and test with the latest JDKs in advance; luckily, the early access builds (JDK 16, JDK 17) are readily available to everyone.
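
If your build runs tests in a forked JVM, the same options can be passed there as well, for example through the Apache Maven Surefire plugin's argLine parameter (a sketch; adjust the opened packages to whatever your code actually touches):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <argLine>--add-opens java.base/java.util.concurrent=ALL-UNNAMED</argLine>
    </configuration>
</plugin>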

Saturday, November 28, 2020

All Your Tests Belong to You: Maintaining Mixed JUnit 4/JUnit 5 and TestNG/JUnit 5 Test Suites

If you are a seasoned Java developer who practices test-driven development (hopefully, everyone does), it is very likely JUnit 4 has been your one-stop-shop testing toolbox. Personally, I truly loved it and still do: simple, minimal, non-intrusive and intuitive. Along with terrific libraries like AssertJ and Hamcrest, it makes writing test cases a pleasure.

But time passes by: Java has evolved a lot as a language, yet JUnit 4 was not really up for the ride. Around 2015, the development of JUnit 5 started with the ambitious goal of becoming the next generation of the programmer-friendly testing framework for Java and the JVM. And, to be fair, I think this goal has been reached: many new projects adopt JUnit 5 from the get-go, whereas the old ones are already in the process of migration (or at least are thinking about it).

For existing projects, the migration to JUnit 5 will not happen overnight and will probably take some time. In today's post we are going to talk about the ways to maintain mixed JUnit 4 / JUnit 5 and TestNG / JUnit 5 test suites with the help of Apache Maven and the Apache Maven Surefire plugin.

To make the example a bit more realistic, we are going to test an UploadDestination class, which provides a single method that says whether a particular destination scheme is supported or not:

package com.example;

import java.net.URI;

public class UploadDestination {
    public boolean supports(String location) {
        final String scheme = URI.create(location).getScheme();
        return scheme.equals("http") || scheme.equals("s3") || scheme.equals("sftp");
    }
}

The implementer was kind enough to create a suite of JUnit 4 unit tests to verify that all expected destination schemes are indeed supported.

package com.example;

import static org.junit.Assert.assertTrue;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class JUnit4TestCase {
    private UploadDestination destination;
    private final String location;

    public JUnit4TestCase(String location) {
        this.location = location;
    }

    @Before
    public void setUp() {
        destination = new UploadDestination();
    }

    @Parameters(name= "{index}: location {0} is supported")
    public static Object[] locations() {
        return new Object[] { "s3://test", "http://host:9000", "sftp://host/tmp" };
    }

    @Test
    public void testLocationIsSupported() {
        assertTrue(destination.supports(location));
    }
}

In the project build, at the very least, you need to add the JUnit 4 dependency along with the Apache Maven Surefire plugin and, optionally, the Apache Maven Surefire Report plugin to your pom.xml; the snippet below illustrates that.

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.13.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-report-plugin</artifactId>
                <version>3.0.0-M5</version>
            </plugin>
        </plugins>
    </build>

No magic here: triggering the Apache Maven build would normally run all unit test suites every time.

...
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit4TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.011 s - in com.example.JUnit4TestCase
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
...

Awesome! Let us imagine that at some point another teammate happens to work on the project and notices there are no unit tests verifying the unsupported destination schemes, so she adds some using JUnit 5.

package com.example;

import static org.junit.jupiter.api.Assertions.assertFalse;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class JUnit5TestCase {
    private UploadDestination destination;

    @BeforeEach
    void setUp() {
        destination = new UploadDestination();
    }

    @ParameterizedTest(name = "{index}: location {0} is supported")
    @ValueSource(strings = { "s3a://test", "https://host:9000", "ftp://host/tmp" } )
    public void testLocationIsNotSupported(String location) {
        assertFalse(destination.supports(location));
    }
}

Consequently, another dependency appears in the project's pom.xml to bring JUnit 5 in (since its API is not compatible with JUnit 4).

    <dependencies>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>5.7.0</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

Looks quite legitimate, doesn't it? But there is a catch ... the test run results will surprise you this time.

...
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.076 s - in com.example.JUnit5TestCase
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
...

The JUnit 4 test suites are gone, and such behavior is actually well documented by the Apache Maven Surefire team in the Provider Selection section of the official documentation. So how could we get them back? There are a few possible options, but the simplest one by far is to use the JUnit Vintage engine in order to run JUnit 4 test suites on the JUnit 5 platform.

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
                <dependencies>
                    <dependency>
                        <groupId>org.junit.vintage</groupId>
                        <artifactId>junit-vintage-engine</artifactId>
                        <version>5.7.0</version>
                    </dependency>
                </dependencies>
            </plugin>

With that, both JUnit 4 and JUnit 5 test suites are going to be executed side by side.

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.079 s - in com.example.JUnit5TestCase
[INFO] Running com.example.JUnit4TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 s - in com.example.JUnit4TestCase
[INFO] 
[INFO] Results:
[INFO]
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------

The lesson to learn here: please watch carefully that all your test suites are being executed (CI/CD usually keeps track of such trends and warns you right away). Be extra careful when migrating to the latest Spring Boot or Apache Maven Surefire plugin versions.

Another quite common use case you may run into is mixing TestNG and JUnit 5 test suites in the scope of one project. The symptoms are pretty much the same: you are going to wonder why only the JUnit 5 test suites are being run. The treatment in this case is a bit different, and one of the options that seems to work pretty well is to enumerate the test engine providers explicitly.

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M5</version>
                <dependencies>
                    <dependency>                                      
                        <groupId>org.apache.maven.surefire</groupId>  
                        <artifactId>surefire-junit-platform</artifactId>      
                        <version>3.0.0-M5</version>                   
                    </dependency>
                    <dependency>                                      
                        <groupId>org.apache.maven.surefire</groupId>  
                        <artifactId>surefire-testng</artifactId>      
                        <version>3.0.0-M5</version>                   
                    </dependency>                          
                </dependencies>
            </plugin>
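
For reference, a TestNG counterpart of the tests above might look like the sketch below (purely illustrative; the complete project samples linked at the end contain the real suites):

package com.example;

import static org.testng.Assert.assertTrue;

import org.testng.annotations.BeforeMethod;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class TestNGTestCase {
    private UploadDestination destination;

    @BeforeMethod
    public void setUp() {
        destination = new UploadDestination();
    }

    @DataProvider(name = "locations")
    public Object[][] locations() {
        return new Object[][] { { "s3://test" }, { "http://host:9000" }, { "sftp://host/tmp" } };
    }

    @Test(dataProvider = "locations")
    public void testLocationIsSupported(String location) {
        assertTrue(destination.supports(location));
    }
}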

The somewhat undesired side effect in this case is that the test suites are run separately (there are other options to try out though), for example:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running com.example.JUnit5TestCase
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.074 s - in com.example.JUnit5TestCase
[INFO] 
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running TestSuite
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.315 s - in TestSuite
[INFO] 
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] 
[INFO] ------------------------------------------------------------------------

To be fair, I think JUnit 5 is a huge step forward towards having modern and concise test suites for your Java (and, in general, JVM) projects. These days there are seamless integrations available with nearly any other test framework or library (Mockito, Testcontainers, ...), and the migration path is not that difficult in most cases. Plus, as you have seen, co-existence of JUnit 5 with older test engines is totally feasible.

As always, the complete project samples are available on Github: JUnit 4/JUnit 5, TestNG / JUnit 5.

Wednesday, September 30, 2020

For gourmets and practitioners: pick your flavour of the reactive stack with JAX-RS and Apache CXF

When the JAX-RS 2.1 specification was released back in 2017, one of its true novelties was the introduction of the reactive API extensions. The industry has acknowledged the importance of the modern programming paradigms, and the specification essentially mandated first-class support of asynchronous and reactive programming for the Client API.

But what about the server side? It was not left out: the JAX-RS 2.1 asynchronous processing model has been enriched with Java 8's CompletionStage support, certainly a step in the right direction. Any existing REST web APIs built on top of a JAX-RS 2.1 implementation (like Apache CXF, for example) could benefit from such enhancements right away.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public CompletionStage<List<Person>> getPeople() {
        return CompletableFuture
        	.supplyAsync(() -> Arrays.asList(new Person("a@b.com", "Tom", "Knocker")));
    }
}

Undoubtedly, CompletionStage and CompletableFuture are powerful tools, but not without their own quirks and limitations. The Reactive Streams specification and a number of its implementations offer a considerably better take on how asynchronous and reactive programming should look on the JVM. With that, the logical question pops up: could your JAX-RS web services and APIs take advantage of modern reactive libraries? And if the answer is positive, what does it take?

If your bets are on Apache CXF, you are certainly well positioned. The latest Apache CXF 3.2.14 / 3.3.7 / 3.4.0 release trains bring comprehensive support of RxJava 3, RxJava 2 and Project Reactor. Throughout this post we are going to see how easy it is to plug your favorite reactive library in and place it at the forefront of your REST web APIs and services.

Since most applications and services on the JVM are built on top of the excellent Spring Framework and Spring Boot, we will be developing the reference implementations using those as a foundation. The Spring Boot starter which comes along with the Apache CXF distribution takes care of most of the boring wiring you would have needed to do otherwise.

<dependency>
	<groupId>org.apache.cxf</groupId>
	<artifactId>cxf-spring-boot-starter-jaxrs</artifactId>
	<version>3.4.0</version>
</dependency>

Project Reactor is the number one choice as the reactive foundation for Spring-based applications and services, so let us start with that.

<dependency>
	<groupId>org.apache.cxf</groupId>
	<artifactId>cxf-rt-rs-extension-reactor</artifactId>
	<version>3.4.0</version>
</dependency>

Great, believe it or not, we are mostly done here. In order to teach Apache CXF to understand Project Reactor types like Mono and/or Flux, we need to tune the configuration just a bit using the ReactorCustomizer instance.

import org.apache.cxf.Bus;
import org.apache.cxf.endpoint.Server;
import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
import org.apache.cxf.jaxrs.reactor.server.ReactorCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider;

@Configuration
public class AppConfig {
    @Bean
    public Server rsServer(Bus bus, PeopleRestService service) {
        final JAXRSServerFactoryBean bean = new JAXRSServerFactoryBean();
        bean.getProperties(true).put("useStreamingSubscriber", true);
        bean.setBus(bus);
        bean.setAddress("/");
        bean.setServiceBean(service);
        bean.setProvider(new JacksonJsonProvider());
        new ReactorCustomizer().customize(bean);
        return bean.create();
    }
}

With such customization in place, our JAX-RS web services and APIs can freely utilize Project Reactor primitives in a streaming fashion, for example:

import reactor.core.publisher.Flux;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Flux<Person> getPeople() {
        return Flux.just(new Person("a@b.com", "Tom", "Knocker"));
    }
}

As you probably noticed, the implementation purposely does not do anything complicated. However, once the reactive types are put to work, you can unleash the full power of the library of your choice (and Project Reactor is really good at that).
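
For example, a slightly richer (and purely illustrative) resource could combine several operators without any further changes on the Apache CXF side; the second Person and the artificial delay below are made up for the sake of the example:

import java.time.Duration;
import java.util.Arrays;

import reactor.core.publisher.Flux;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Flux<Person> getPeople() {
        return Flux
            .fromIterable(Arrays.asList(
                new Person("a@b.com", "Tom", "Knocker"),
                new Person("c@d.com", "Bob", "Bobber")))
            .delayElements(Duration.ofMillis(100)); // simulate a slow upstream source
    }
}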

Now that you understand the principle, it is the turn of RxJava 3, the latest generation of the pioneering reactive library for the JVM platform.

<dependency>
	<groupId>org.apache.cxf</groupId>
	<artifactId>cxf-rt-rs-extension-rx3</artifactId>
	<version>3.4.0</version>
</dependency>

The configuration tuning is mostly identical to the one we have seen with Project Reactor; the customizer instance, ReactiveIOCustomizer, is all that changes.

import org.apache.cxf.Bus;
import org.apache.cxf.endpoint.Server;
import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
import org.apache.cxf.jaxrs.rx3.server.ReactiveIOCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider;

@Configuration
public class AppConfig {
    @Bean
    public Server rsServer(Bus bus, PeopleRestService service) {
        final JAXRSServerFactoryBean bean = new JAXRSServerFactoryBean();
        bean.getProperties(true).put("useStreamingSubscriber", true);
        bean.setBus(bus);
        bean.setAddress("/");
        bean.setServiceBean(service);
        bean.setProvider(new JacksonJsonProvider());
        new ReactiveIOCustomizer().customize(bean);
        return bean.create();
    }
}

The list of supported types includes Flowable, Single and Observable; the equivalent implementation in terms of RxJava 3 primitives may look like this:


import io.reactivex.rxjava3.core.Flowable;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Flowable<Person> getPeople() {
        return Flowable.just(new Person("a@b.com", "Tom", "Knocker"));
    }
}

Pretty simple, isn't it? If you are stuck with an older generation, RxJava 2, there is nothing to worry about: Apache CXF has you covered.

<dependency>
	<groupId>org.apache.cxf</groupId>
	<artifactId>cxf-rt-rs-extension-rx2</artifactId>
	<version>3.4.0</version>
</dependency>

The same configuration trick with applying the customizer (which may look annoying at this point to be fair) is all that is required.

import org.apache.cxf.Bus;
import org.apache.cxf.endpoint.Server;
import org.apache.cxf.jaxrs.JAXRSServerFactoryBean;
import org.apache.cxf.jaxrs.rx2.server.ReactiveIOCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider;

@Configuration
public class AppConfig {
    @Bean
    public Server rsServer(Bus bus, PeopleRestService service) {
        final JAXRSServerFactoryBean bean = new JAXRSServerFactoryBean();
        bean.getProperties(true).put("useStreamingSubscriber", true);
        bean.setBus(bus);
        bean.setAddress("/");
        bean.setServiceBean(service);
        bean.setProvider(new JacksonJsonProvider());
        new ReactiveIOCustomizer().customize(bean);
        return bean.create();
    }
}

And we are good to go, ready to use the familiar reactive types Observable, Flowable and Single.

import io.reactivex.Flowable;
import io.reactivex.Observable;

@Service
@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Observable<Person> getPeople() {
        return Flowable
            .just(new Person("a@b.com", "Tom", "Knocker"))
            .toObservable();
    }
}

Last but not least, if you happen to be using the first generation of RxJava, it is also supported by Apache CXF but certainly not recommended for production (as it reached end of life a couple of years ago).

The reactive programming paradigm is steadily gaining more and more traction. It is great to see that the ecosystem embraces it, and frameworks like Apache CXF are no exception. If you are looking for a robust foundation to build reactive and/or asynchronous REST web APIs on the JVM, Apache CXF is worth considering; please give it a try!

The complete source code is available on Github.

Tuesday, July 28, 2020

It is never enough of them: enriching Apache Avro generated classes with custom Java annotations

Apache Avro, along with Apache Thrift and Protocol Buffers, is often used as a platform-neutral, extensible mechanism for serializing structured data. In the context of event-driven systems, Apache Avro schemas play the role of language-agnostic contracts, shared between loosely coupled components of the system that are not necessarily written in the same programming language.

Probably the most widely adopted reference architecture for such systems circles around Apache Kafka backed by Schema Registry and Apache Avro, although many other excellent options are available. Nevertheless, why Apache Avro?

The official documentation page summarizes pretty well the key advantages Apache Avro has over Apache Thrift and Protocol Buffers. But we are going to add another one to the list: biased (in a good sense) support of Java and the JVM platform in general.

Let us imagine that one of the components (or, it has to be said, microservices) takes care of payment processing. Not every payment may succeed, and to propagate such failures, the component broadcasts a PaymentRejectedEvent whenever such an unfortunate event happens. Here is its Apache Avro schema, persisted in the PaymentRejectedEvent.avsc file:

{
    "type": "record",
    "name": "PaymentRejectedEvent",
    "namespace": "com.example.event",
    "fields": [
        {
            "name": "id",
            "type": {
                "type": "string",
                "logicalType": "uuid"
            }
        },
        {
            "name": "reason",
            "type": {
                "type": "enum",
                "name": "PaymentStatus",
                "namespace": "com.example.event",
                "symbols": [
                    "EXPIRED_CARD",
                    "INSUFFICIENT_FUNDS",
                    "DECLINED"
                ]
            }
        },
        {
            "name": "date",
            "type": {
                "type": "long",
                "logicalType": "local-timestamp-millis"
            }
        }
    ]
}

The event is deliberately kept simple; you can safely assume that in a more or less realistic system it would have considerably more details available. To turn this event into a Java class at build time, we could use the Apache Avro Maven plugin; it is as easy as it gets.

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.10.0</version>
    <configuration>
        <stringType>String</stringType>
    </configuration>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.build.directory}/generated-sources/avro/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

Once the build finishes, you will get the PaymentRejectedEvent Java class generated. But a few annoyances are going to emerge right away:

@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.lang.String id;
   private com.example.event.PaymentStatus reason;
   private long date;
}

The Java types for the id and date fields are not really what we would expect. Luckily, this is easy to fix by specifying the customConversions plugin property, for example:

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.10.0</version>
    <configuration>
        <stringType>String</stringType>
        <customConversions>
            org.apache.avro.Conversions$UUIDConversion,org.apache.avro.data.TimeConversions$LocalTimestampMillisConversion
        </customConversions>
    </configuration>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.build.directory}/generated-sources/avro/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

If we build the project this time, the plugin generates the right types:

@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.util.UUID id;
   private com.example.event.PaymentStatus reason;
   private java.time.LocalDateTime date;
}

It looks much better! But what about the next challenge? In Java, annotations are commonly used to associate additional pieces of metadata with a particular language element. What if we have to add a custom, application-specific annotation to all generated event classes? It does not really matter which one; let it be @javax.annotation.Generated, for example. It turns out that with Apache Avro it is not an issue: it has a dedicated javaAnnotation property we could benefit from.

{
    "type": "record",
    "name": "PaymentRejectedEvent",
    "namespace": "com.example.event",
    "javaAnnotation": "javax.annotation.Generated(\"avro\")",
    "fields": [
        {
            "name": "id",
            "type": {
                "type": "string",
                "logicalType": "uuid"
            }
        },
        {
            "name": "reason",
            "type": {
                "type": "enum",
                "name": "PaymentStatus",
                "namespace": "com.example.event",
                "symbols": [
                    "EXPIRED_CARD",
                    "INSUFFICIENT_FUNDS",
                    "DECLINED"
                ]
            }
        },
        {
            "name": "date",
            "type": {
                "type": "long",
                "logicalType": "local-timestamp-millis"
            }
        }
    ]
}

When we rebuild the project one more time (hopefully the last one), the generated PaymentRejectedEvent Java class is going to be decorated with the additional custom annotation.

@javax.annotation.Generated("avro")
@org.apache.avro.specific.AvroGenerated
public class PaymentRejectedEvent extends ... {
   private java.util.UUID id;
   private com.example.event.PaymentStatus reason;
   private java.time.LocalDateTime date;
}

Obviously, this property has no effect if the schema is used to produce the respective constructs in other programming languages, but it still feels good to see that Java has privileged support in Apache Avro; thanks for that! As a side note, it is good to see that after quite a long period of inactivity the project is experiencing a second wind, with regular releases and new features delivered constantly.
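
As a closing illustration, the generated class can now be constructed with natural Java types, thanks to the builder Apache Avro generates for every record (the field values below are, of course, made up):

import java.time.LocalDateTime;
import java.util.UUID;

import com.example.event.PaymentRejectedEvent;
import com.example.event.PaymentStatus;

public class PaymentRejectedEventExample {
    public static void main(String[] args) {
        final PaymentRejectedEvent event = PaymentRejectedEvent.newBuilder()
            .setId(UUID.randomUUID())
            .setReason(PaymentStatus.INSUFFICIENT_FUNDS)
            .setDate(LocalDateTime.now())
            .build();

        System.out.println(event);
    }
}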

The complete source code is available on Github.

Thursday, April 30, 2020

The crypto quirks using JDK's Cipher streams (and what to do about that)

In our day-to-day job we often run into the recurrent theme of transferring data (for example, files) from one location to another. It sounds like a really simple task, but let us make it a bit more difficult by stating the fact that these files may contain confidential information and could be transferred over non-secure communication channels.

One of the solutions that comes to mind first is to use encryption algorithms. Since the files could be really large, hundreds of megabytes or tens of gigabytes, using a symmetric encryption scheme like AES would probably make a lot of sense. Besides just encryption, it would be great to make sure that the data is not tampered with in transit. Fortunately, there is a thing called authenticated encryption, which simultaneously provides confidentiality, integrity, and authenticity guarantees. Galois/Counter Mode (GCM) is one of the most popular modes that support authenticated encryption and could be used along with AES. These thoughts lead us to AES256-GCM128, a sufficiently strong encryption scheme.

In case you are on the JVM platform, you should feel lucky since AES and GCM are supported by the Java Cryptography Architecture (JCA) out of the box. With that being said, let us see how far we can go.

The first thing we have to do is to generate a new AES256 key. As always, OWASP has a number of recommendations on using JCA/JCE APIs properly.

final SecureRandom secureRandom = new SecureRandom();
        
final byte[] key = new byte[32];
secureRandom.nextBytes(key);

final SecretKey secretKey = new SecretKeySpec(key, "AES");
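
Alternatively, the same key could be obtained through javax.crypto.KeyGenerator, which is arguably a bit more idiomatic (an equivalent sketch; exception handling omitted for brevity):

final KeyGenerator keyGenerator = KeyGenerator.getInstance("AES"); // may throw NoSuchAlgorithmException
keyGenerator.init(256, secureRandom);
final SecretKey secretKey = keyGenerator.generateKey();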

Also, to initialize the AES/GCM cipher, we need to generate a random initialization vector (or IV for short). As per the NIST recommendation, its length should be 12 bytes (96 bits).

For IVs, it is recommended that implementations restrict support to the length of 96 bits, to promote interoperability, efficiency, and simplicity of design. - Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC

So here we are:

final byte[] iv = new byte[12];
secureRandom.nextBytes(iv);

With the AES key and IV ready, we can create a cipher instance and actually perform the encryption part. Dealing with large files implies reliance on streaming, therefore we use BufferedInputStream / BufferedOutputStream combined with CipherOutputStream for encryption.

public static void encrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    final GCMParameterSpec parameterSpec = new GCMParameterSpec(128, iv);
    cipher.init(Cipher.ENCRYPT_MODE, secretKey, parameterSpec);

    try (final BufferedInputStream in = new BufferedInputStream(new FileInputStream(input))) {
        try (final BufferedOutputStream out = new BufferedOutputStream(new CipherOutputStream(new FileOutputStream(output), cipher))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];

            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

Please note how we specify the GCM cipher parameters with a tag size of 128 bits and initialize the cipher in encryption mode (be aware of some GCM limitations when dealing with files over 64Gb). The decryption part is no different besides the fact that the cipher is initialized in decryption mode.

public static void decrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    final GCMParameterSpec parameterSpec = new GCMParameterSpec(128, iv);
    cipher.init(Cipher.DECRYPT_MODE, secretKey, parameterSpec);
        
    try (BufferedInputStream in = new BufferedInputStream(new CipherInputStream(new FileInputStream(input), cipher))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];
                
            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}
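
Putting the pieces together, a complete round trip over a sample file boils down to a couple of calls (the file names are arbitrary):

final File plain = new File("data.bin");
final File encrypted = new File("data.bin.enc");
final File decrypted = new File("data.bin.dec");

// encrypt the original file and restore it back from the ciphertext
encrypt(secretKey, iv, plain, encrypted);
decrypt(secretKey, iv, encrypted, decrypted);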

It seems like we are done, right? Unfortunately, not really: encrypting and decrypting small files takes just a few moments, but dealing with more or less realistic data samples gives shocking results.

Roughly 8 minutes to process a ~42Mb file (and as you may guess, the larger the file, the longer it takes); a quick analysis reveals that most of that time is spent decrypting the data (please note this is by no means a benchmark, merely a test). The search for possible culprits points to the long-standing list of issues with AES/GCM and CipherInputStream / CipherOutputStream in the JCA implementation here, here, here and here.

So what are the alternatives? It seems like it is possible to sacrifice the CipherInputStream / CipherOutputStream, refactor the implementation to use the ciphers directly and make the encryption / decryption work using JCA primitives (see the sketch right below). But arguably there is a better way: bringing in the battle-tested BouncyCastle library.
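
For the curious, here is roughly what "using the ciphers directly" could look like, assuming the Cipher instance is initialized exactly as before and the enclosing method declares the checked exceptions (the read loop stays the same, only the writing changes):

try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(input))) {
    try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {
        int length = 0;
        byte[] bytes = new byte[16 * 1024];

        while ((length = in.read(bytes)) != -1) {
            // feed the cipher manually instead of relying on CipherOutputStream
            final byte[] chunk = cipher.update(bytes, 0, length);
            if (chunk != null) {
                out.write(chunk);
            }
        }

        // flush the remaining block along with the GCM authentication tag
        out.write(cipher.doFinal());
    }
}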

From the implementation perspective, the solutions look mostly identical. Indeed, although the naming conventions are unchanged, the CipherOutputStream / CipherInputStream in the snippets below come from BouncyCastle.

public static void encrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
    cipher.init(true, new AEADParameters(new KeyParameter(secretKey.getEncoded()), 128, iv));

    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(input))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new CipherOutputStream(new FileOutputStream(output), cipher))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];

            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

public static void decrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
    cipher.init(false, new AEADParameters(new KeyParameter(secretKey.getEncoded()), 128, iv));

    try (BufferedInputStream in = new BufferedInputStream(new CipherInputStream(new FileInputStream(input), cipher))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];
                
            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

Re-running the previous encryption / decryption tests using the BouncyCastle crypto primitives yields a completely different picture.

To be fair, file encryption / decryption on the JVM platform looked like a solved problem at first but turned out to be full of surprising discoveries. Nonetheless, thanks to BouncyCastle, some shortcomings of the JCA implementation are addressed in an efficient and clean way.

Please find the complete sources available on Github.