Thursday, April 30, 2020

The crypto quirks using JDK's Cipher streams (and what to do about that)

In our day-to-day job we often run into the recurrent theme of transferring data (for example, files) from one location to another. It sounds like a really simple task, but let us make it a bit more difficult: these files may contain confidential information and could be transferred over non-secure communication channels.

One of the first solutions that comes to mind is encryption. Since the files could be really large, from hundreds of megabytes to tens of gigabytes, a symmetric encryption scheme like AES makes a lot of sense. Besides encryption, it would be great to make sure that the data has not been tampered with in transit. Fortunately, there is a thing called authenticated encryption, which simultaneously provides confidentiality, integrity, and authenticity guarantees. Galois/Counter Mode (GCM) is one of the most popular modes that supports authenticated encryption and can be used along with AES. These thoughts lead us to AES256-GCM128 (AES with a 256-bit key in GCM mode with a 128-bit authentication tag), a sufficiently strong encryption scheme.

If you are on the JVM platform, you should feel lucky, since AES and GCM are supported by the Java Cryptography Architecture (JCA) out of the box. With that being said, let us see how far we can go.

The first thing we have to do is to generate a new AES256 key. As always, OWASP has a number of recommendations on using JCA/JCE APIs properly.

final SecureRandom secureRandom = new SecureRandom();
        
final byte[] key = new byte[32];
secureRandom.nextBytes(key);

final SecretKey secretKey = new SecretKeySpec(key, "AES");
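As an alternative to filling a byte array by hand, the same kind of key could be produced with JCA's KeyGenerator. This is only a minimal sketch: the class name AesKeys and the method name newAes256Key are hypothetical and do not come from the original sources.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical helper, not part of the original post.
final class AesKeys {
    private AesKeys() {
    }

    // Generates a fresh 256-bit AES key using the JCA KeyGenerator API.
    static SecretKey newAes256Key() throws NoSuchAlgorithmException {
        final KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256, new SecureRandom());
        return generator.generateKey();
    }
}
```

Either approach yields a 32-byte key that can be passed to the cipher in the same way.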

Also, to initialize the AES/GCM cipher we need to generate a random initialization vector (IV for short). An IV must never be reused with the same key. As per NIST recommendations, its length should be 12 bytes (96 bits).

For IVs, it is recommended that implementations restrict support to the length of 96 bits, to promote interoperability, efficiency, and simplicity of design. - Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC

So here we are:

final byte[] iv = new byte[12];
secureRandom.nextBytes(iv);
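The decrypting side needs exactly the same IV, and the IV does not have to be kept secret. One common option (an assumption here, the snippets in this post simply pass the IV around as a parameter) is to write it in front of the ciphertext and read it back before decryption. A minimal sketch with hypothetical names IvHeader, writeIv and readIv:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: stores the 12-byte IV as a plain header before the ciphertext.
final class IvHeader {
    static final int IV_LENGTH = 12;

    private IvHeader() {
    }

    // Writes the IV to the stream; the ciphertext is expected to follow it.
    static void writeIv(OutputStream out, byte[] iv) throws IOException {
        if (iv.length != IV_LENGTH) {
            throw new IllegalArgumentException("IV must be " + IV_LENGTH + " bytes long");
        }
        out.write(iv);
    }

    // Reads exactly IV_LENGTH bytes; throws EOFException if the stream is shorter.
    static byte[] readIv(InputStream in) throws IOException {
        final byte[] iv = new byte[IV_LENGTH];
        new DataInputStream(in).readFully(iv);
        return iv;
    }
}
```

With this layout, the reader consumes the header first and then hands the rest of the stream to the cipher.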

With the AES key and IV ready, we can create a cipher instance and actually perform the encryption. Dealing with large files calls for streaming, therefore we use BufferedInputStream / BufferedOutputStream combined with CipherOutputStream for encryption.

public static void encrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    final GCMParameterSpec parameterSpec = new GCMParameterSpec(128, iv);
    cipher.init(Cipher.ENCRYPT_MODE, secretKey, parameterSpec);

    try (final BufferedInputStream in = new BufferedInputStream(new FileInputStream(input))) {
        try (final BufferedOutputStream out = new BufferedOutputStream(new CipherOutputStream(new FileOutputStream(output), cipher))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];

            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

Please note how we specify the GCM cipher parameters with a tag size of 128 bits and initialize the cipher in encryption mode (be aware of GCM limitations when dealing with files over 64 GB: a single GCM invocation can protect at most 2^39 - 256 bits of plaintext, which is just under 64 GiB). The decryption part is no different, except that the cipher is initialized in decryption mode.

public static void decrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    final GCMParameterSpec parameterSpec = new GCMParameterSpec(128, iv);
    cipher.init(Cipher.DECRYPT_MODE, secretKey, parameterSpec);
        
    try (BufferedInputStream in = new BufferedInputStream(new CipherInputStream(new FileInputStream(input), cipher))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];
                
            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}
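To make the integrity guarantee of authenticated encryption concrete, here is a minimal sketch that flips a single bit of the ciphertext and checks that decryption is rejected. It works on small in-memory arrays with Cipher.doFinal rather than on files, and the names TamperCheck, encrypt and isRejected are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class, not part of the original post.
final class TamperCheck {
    private TamperCheck() {
    }

    // Encrypts in one shot; the 16-byte tag is appended to the ciphertext.
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plaintext)
            throws GeneralSecurityException {
        final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(plaintext);
    }

    // Returns true when decryption fails because the authentication tag does not match.
    static boolean isRejected(SecretKey key, byte[] iv, byte[] ciphertext)
            throws GeneralSecurityException {
        final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        try {
            cipher.doFinal(ciphertext);
            return false;
        } catch (AEADBadTagException ex) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        final SecureRandom secureRandom = new SecureRandom();
        final byte[] key = new byte[32];
        final byte[] iv = new byte[12];
        secureRandom.nextBytes(key);
        secureRandom.nextBytes(iv);
        final SecretKey secretKey = new SecretKeySpec(key, "AES");

        final byte[] ciphertext = encrypt(secretKey, iv,
                "confidential".getBytes(StandardCharsets.UTF_8));
        System.out.println("untouched rejected: " + isRejected(secretKey, iv, ciphertext));

        ciphertext[0] ^= 0x01; // simulate tampering in transit
        System.out.println("tampered rejected: " + isRejected(secretKey, iv, ciphertext));
    }
}
```

The untouched ciphertext decrypts normally, while the modified one fails tag verification and no plaintext is returned.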

It seems like we are done, right? Unfortunately, not really. Encrypting and decrypting small files takes just a few moments, but dealing with more or less realistic data samples gives shocking results.

It takes roughly 8 minutes to process a ~42 MB file (and, as you may guess, the larger the file, the longer it takes). A quick analysis reveals that most of that time is spent decrypting the data (please note that this is by no means a benchmark, merely a test). The search for possible culprits points to a long-standing list of issues with AES/GCM and CipherInputStream / CipherOutputStream in the JCA implementation here, here, here and here.
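For readers who want to repeat this kind of check, a rough timing test could look like the sketch below. It is an assumption of how such a test might be written, not the code behind the numbers above: the class RoundTripTiming and the method roundTrip are hypothetical, and the timings depend heavily on the JDK version and the file size.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical timing test using the JDK cipher streams.
final class RoundTripTiming {
    private RoundTripTiming() {
    }

    // Copies the stream 16 KB at a time until end of stream.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        final byte[] bytes = new byte[16 * 1024];
        int length;
        while ((length = in.read(bytes)) != -1) {
            out.write(bytes, 0, length);
        }
    }

    private static Cipher gcm(int mode, SecretKey key, byte[] iv)
            throws GeneralSecurityException {
        final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv));
        return cipher;
    }

    // Encrypts and decrypts random data of the given size through temporary files.
    // Returns { encryption time, decryption time } in milliseconds.
    static long[] roundTrip(int sizeInBytes) throws IOException, GeneralSecurityException {
        final SecureRandom secureRandom = new SecureRandom();
        final byte[] key = new byte[32];
        final byte[] iv = new byte[12];
        final byte[] data = new byte[sizeInBytes];
        secureRandom.nextBytes(key);
        secureRandom.nextBytes(iv);
        secureRandom.nextBytes(data);
        final SecretKey secretKey = new SecretKeySpec(key, "AES");

        final Path plain = Files.createTempFile("plain", ".bin");
        final Path encrypted = Files.createTempFile("encrypted", ".bin");
        final Path decrypted = Files.createTempFile("decrypted", ".bin");
        try {
            Files.write(plain, data);

            final long encryptStart = System.nanoTime();
            try (InputStream in = new BufferedInputStream(Files.newInputStream(plain));
                 OutputStream out = new BufferedOutputStream(new CipherOutputStream(
                         Files.newOutputStream(encrypted),
                         gcm(Cipher.ENCRYPT_MODE, secretKey, iv)))) {
                copy(in, out);
            }
            final long encryptMillis = (System.nanoTime() - encryptStart) / 1_000_000;

            final long decryptStart = System.nanoTime();
            try (InputStream in = new BufferedInputStream(new CipherInputStream(
                         Files.newInputStream(encrypted),
                         gcm(Cipher.DECRYPT_MODE, secretKey, iv)));
                 OutputStream out = new BufferedOutputStream(Files.newOutputStream(decrypted))) {
                copy(in, out);
            }
            final long decryptMillis = (System.nanoTime() - decryptStart) / 1_000_000;

            if (!Arrays.equals(data, Files.readAllBytes(decrypted))) {
                throw new IllegalStateException("Round trip produced different bytes");
            }
            return new long[] {encryptMillis, decryptMillis};
        } finally {
            Files.deleteIfExists(plain);
            Files.deleteIfExists(encrypted);
            Files.deleteIfExists(decrypted);
        }
    }

    public static void main(String[] args) throws Exception {
        final long[] millis = roundTrip(1024 * 1024);
        System.out.println("encrypt: " + millis[0] + " ms, decrypt: " + millis[1] + " ms");
    }
}
```

Start with a small size and increase it gradually, since on affected JDK versions the decryption time may grow much faster than the file size.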

So what are the alternatives? It seems possible to sacrifice CipherInputStream / CipherOutputStream, refactor the implementation to use ciphers directly, and make the encryption / decryption work using JCA primitives. But arguably there is a better way: bringing in the battle-tested BouncyCastle library.
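For completeness, the first alternative (using the JCA cipher directly through Cipher.update and Cipher.doFinal) might be sketched as follows. The names DirectGcm, transform and writeIfPresent are hypothetical, and this is only one possible shape of such a refactoring. Note that on decryption a provider may hold back plaintext until doFinal has verified the tag, so memory usage can still grow with the input size, and any output should be discarded if doFinal fails.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper that bypasses CipherInputStream / CipherOutputStream.
final class DirectGcm {
    private DirectGcm() {
    }

    static void encrypt(SecretKey key, byte[] iv, InputStream in, OutputStream out)
            throws IOException, GeneralSecurityException {
        transform(Cipher.ENCRYPT_MODE, key, iv, in, out);
    }

    static void decrypt(SecretKey key, byte[] iv, InputStream in, OutputStream out)
            throws IOException, GeneralSecurityException {
        transform(Cipher.DECRYPT_MODE, key, iv, in, out);
    }

    private static void transform(int mode, SecretKey key, byte[] iv, InputStream in,
            OutputStream out) throws IOException, GeneralSecurityException {
        final Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv));

        final byte[] bytes = new byte[16 * 1024];
        int length;
        while ((length = in.read(bytes)) != -1) {
            // update() may return null when no output is available yet
            writeIfPresent(out, cipher.update(bytes, 0, length));
        }

        // doFinal() appends the tag on encryption and verifies it on decryption
        writeIfPresent(out, cipher.doFinal());
        out.flush();
    }

    private static void writeIfPresent(OutputStream out, byte[] chunk) throws IOException {
        if (chunk != null && chunk.length > 0) {
            out.write(chunk);
        }
    }
}
```

The rest of this post takes the BouncyCastle route instead.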

From the implementation perspective, the solutions look mostly identical. Indeed, although the class names are unchanged, the CipherOutputStream / CipherInputStream in the snippet below come from BouncyCastle (the org.bouncycastle.crypto.io package).

public static void encrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
    cipher.init(true, new AEADParameters(new KeyParameter(secretKey.getEncoded()), 128, iv));

    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(input))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new CipherOutputStream(new FileOutputStream(output), cipher))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];

            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

public static void decrypt(SecretKey secretKey, byte[] iv, final File input, 
        final File output) throws Throwable {

    final GCMBlockCipher cipher = new GCMBlockCipher(new AESEngine());
    cipher.init(false, new AEADParameters(new KeyParameter(secretKey.getEncoded()), 128, iv));

    try (BufferedInputStream in = new BufferedInputStream(new CipherInputStream(new FileInputStream(input), cipher))) {
        try (BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {
            int length = 0;
            byte[] bytes = new byte[16 * 1024];
                
            while ((length = in.read(bytes)) != -1) {
                out.write(bytes, 0, length);
            }
        }
    }
}

Re-running the previous encryption/decryption tests using BouncyCastle crypto primitives yields a completely different picture.

To be fair, file encryption / decryption on the JVM platform looked like a solved problem at first but turned out to be full of surprising discoveries. Nonetheless, thanks to BouncyCastle, some shortcomings of the JCA implementation are addressed in an efficient and clean way.

The complete sources are available on GitHub.