What is Elastic MapReduce (EMR) Encryption?

Nadtakan Futhoem
Jul 16, 2021

EMR is a managed service made up of a cluster of highly scalable EC2 instances that run big data frameworks to process data.

  • You can encrypt data at rest, in transit, or both
  • Encryption settings are defined in security configurations, which exist as a separate entity within EMR
  • By default, the instances within a cluster don’t encrypt data at rest
  • The instances within EMR are created from pre-configured AMIs (Amazon Machine Images)
  • To encrypt the root device volume (for example, for compliance reasons) you need a custom AMI, which requires EMR version 5.7.0 or later

EMR encryption with EBS

If you decide to use EBS as persistent storage, there are a number of encryption options that can work together.

  • Linux Unified Key Setup (LUKS): you can specify AWS KMS as your key management provider, or use a custom key provider
  • Open-source HDFS encryption: secure Hadoop RPC uses SASL, and data encryption of HDFS block transfer uses AES-256
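
As a rough illustration, the local-disk part of an EMR security configuration can be written as plain JSON. This is a minimal sketch, assuming the `LocalDiskEncryptionConfiguration` layout used by EMR security configurations; the helper name `local_disk_encryption` is hypothetical and the key ARN is a placeholder.

```python
import json

def local_disk_encryption(kms_key_arn: str) -> dict:
    """Build a security-configuration body that encrypts local volumes using a KMS key."""
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": {
                "LocalDiskEncryptionConfiguration": {
                    # "AwsKms" selects KMS as the key provider; "Custom" is the alternative
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                }
            },
        }
    }

# Placeholder ARN, for illustration only
local_disk_config = local_disk_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
)
print(json.dumps(local_disk_config, indent=2))
```

The resulting JSON string is the kind of document you would pass when creating a security configuration, for example through the console, the CLI, or an SDK.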

EMR encryption with S3

Encryption at rest

  • EMR supports SSE-S3 or SSE-KMS for server-side encryption
  • You can also use CSE-KMS or CSE-Custom (CSE-C) for client-side encryption before the data is stored
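
The S3 options above map to an `EncryptionMode` value in the security configuration. The sketch below is an assumption-labelled example, not a complete implementation: `s3_encryption` is a hypothetical helper, the key ARN is a placeholder, and the custom client-side mode is left out because it needs a key provider class of your own.

```python
from typing import Optional

def s3_encryption(mode: str, kms_key_arn: Optional[str] = None) -> dict:
    """Return an S3EncryptionConfiguration fragment for the chosen mode."""
    if mode == "SSE-S3":
        # S3-managed keys: no key reference is needed
        return {"EncryptionMode": "SSE-S3"}
    if mode in ("SSE-KMS", "CSE-KMS"):
        # Both KMS-backed modes need to know which key to use
        if not kms_key_arn:
            raise ValueError(f"{mode} needs a KMS key ARN")
        return {"EncryptionMode": mode, "AwsKmsKey": kms_key_arn}
    raise ValueError(f"mode not covered by this sketch: {mode}")

# Placeholder ARN, for illustration only
EXAMPLE_KEY = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
sse_s3 = s3_encryption("SSE-S3")
cse_kms = s3_encryption("CSE-KMS", EXAMPLE_KEY)
print(sse_s3)
print(cse_kms)
```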

Encryption in transit using TLS certificate provider

  • PEM: you create PEM certificates, package them in a zip file, and reference that zip file in S3
  • Custom: you supply a custom certificate provider implemented as a Java class
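
The two provider types can be sketched as the `TLSCertificateConfiguration` fragment of a security configuration. This is a minimal example under stated assumptions: the helper `tls_certificate_config` is hypothetical, and the S3 paths and Java class name are placeholders.

```python
from typing import Optional

def tls_certificate_config(provider_type: str, s3_object: str,
                           provider_class: Optional[str] = None) -> dict:
    """Build the in-transit fragment for a PEM or Custom certificate provider."""
    tls = {"CertificateProviderType": provider_type, "S3Object": s3_object}
    if provider_type == "Custom":
        # A custom provider also needs the fully qualified Java class name
        if not provider_class:
            raise ValueError("Custom provider needs a Java class name")
        tls["CertificateProviderClass"] = provider_class
    elif provider_type != "PEM":
        raise ValueError(f"unknown provider type: {provider_type}")
    return {"TLSCertificateConfiguration": tls}

# Placeholder bucket, object, and class names
pem = tls_certificate_config("PEM", "s3://example-bucket/certs/my-certs.zip")
custom = tls_certificate_config(
    "Custom",
    "s3://example-bucket/providers/my-provider.jar",
    "com.example.MyCertificateProvider",
)
print(pem)
print(custom)
```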

EMR Application-Specific Encryption

Once the TLS certificate provider has been configured, the following encryption features can be enabled:

Hadoop

  • Hadoop MapReduce Encrypted Shuffle uses TLS
  • Secure Hadoop RPC uses SASL
  • Data encryption of HDFS Block Transfer uses AES-256
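
To make the Hadoop items more concrete, the sketch below lists the underlying Hadoop properties in the classification format that EMR uses for application configuration. The property names are standard Hadoop settings; treating them as something you would set by hand is an assumption for illustration, since EMR is expected to apply equivalent settings itself when encryption is enabled through a security configuration.

```python
import json

# Hadoop properties behind the bullets above, grouped by configuration file
hadoop_encryption_settings = [
    {
        "Classification": "core-site",
        "Properties": {
            # "privacy" makes SASL encrypt RPC traffic, not just authenticate it
            "hadoop.rpc.protection": "privacy",
        },
    },
    {
        "Classification": "hdfs-site",
        "Properties": {
            # Encrypt block data moving between clients and DataNodes
            "dfs.encrypt.data.transfer": "true",
            "dfs.encrypt.data.transfer.cipher.suites": "AES/CTR/NoPadding",
            "dfs.encrypt.data.transfer.cipher.key.bitlength": "256",
        },
    },
]

print(json.dumps(hadoop_encryption_settings, indent=2))
```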

Presto

  • When using EMR version 5.6.0 or later, internal communication between Presto nodes uses SSL/TLS

Tez

  • Tez Shuffle Handler uses TLS

Spark

  • Akka protocol uses TLS
  • Block transfer service uses SASL and 3DES

EMR Encryption with KMS

When encrypting data at rest with KMS customer master keys (CMKs):

  • Ensure that the role assigned to your EC2 instances within the cluster has the relevant permissions to enable access to the CMK
  • Add the relevant role to the Key users for the CMK
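
Adding a role to the key users of a CMK corresponds to a statement in the key policy. The following is a sketch of what such a statement might look like, modelled on the usual key-user permissions; the helper name `key_user_statement` is hypothetical, and the account ID and role name are placeholders.

```python
import json

def key_user_statement(role_arn: str) -> dict:
    """Build a key-policy statement that lets a role use a CMK for data encryption."""
    return {
        "Sid": "AllowUseOfTheKey",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        # In a key policy, "*" refers to the key the policy is attached to
        "Resource": "*",
    }

# Placeholder ARN for the role attached to the cluster's EC2 instances
statement = key_user_statement("arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole")
print(json.dumps(statement, indent=2))
```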

EMR Transparent Encryption with HDFS

Transparent encryption in HDFS offers encryption both at rest and in transit

  • Data is encrypted and decrypted transparently without requiring changes to the application code
  • Each HDFS encryption zone has its own KMS Key; by default, EMR uses the Hadoop KMS, but you can also select an alternative
  • Each file is encrypted by a different data key, which is encrypted with the HDFS encryption zone key; it’s not possible to move files between encryption zones!
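
Setting up an encryption zone usually comes down to a few Hadoop commands run on the cluster. The sketch below only assembles and prints those commands rather than running them; the helper `encryption_zone_commands`, the key name, and the path are placeholders chosen for illustration.

```python
import shlex

def encryption_zone_commands(key_name: str, zone_path: str) -> list:
    """Return the commands that create a zone key, a directory, and the encryption zone."""
    return [
        # Create the zone key in the configured key provider (Hadoop KMS by default)
        ["hadoop", "key", "create", key_name],
        # An encryption zone must be created on an existing, empty directory
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # Tie the directory to the key; files written here get their own data keys
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]

commands = encryption_zone_commands("example-zone-key", "/data/secure")
for cmd in commands:
    print(shlex.join(cmd))
```

Because each zone has its own key, copying data into another zone means it is decrypted on read and encrypted again with the destination zone's key, rather than being renamed in place.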

Remember, encryption across EMR is not provided by default.

Nadtakan Futhoem — Sr. Software Engineer
