What is Elastic MapReduce (EMR) Encryption?

EMR is a managed service that runs big data frameworks on a cluster of highly scalable EC2 instances.

  • You can encrypt data at rest, in transit, or both
  • Encryption settings are defined in security configurations, which exist as separate entities within EMR
  • By default, the instances within a cluster don’t encrypt data at rest
  • The instances within EMR are created from pre-configured AMIs (Amazon Machine Images)
  • You must use EMR version 5.7.0 or later to use custom AMIs and encrypt the root device volume for specific compliance reasons.
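As a rough illustration of how the at-rest and in-transit options fit together, the sketch below assembles the JSON document that an EMR security configuration expects. The function name and its parameters are hypothetical; the top-level key names follow the EMR security configuration format as I understand it, so check them against the current AWS documentation before relying on them.

```python
import json
from typing import Optional


def build_security_configuration(at_rest: Optional[dict] = None,
                                 in_transit: Optional[dict] = None) -> str:
    """Return an EMR security configuration as a JSON string.

    Hypothetical helper: each kind of encryption is switched on only
    when the matching sub-configuration is supplied.
    """
    encryption = {
        "EnableAtRestEncryption": at_rest is not None,
        "EnableInTransitEncryption": in_transit is not None,
    }
    if at_rest is not None:
        encryption["AtRestEncryptionConfiguration"] = at_rest
    if in_transit is not None:
        encryption["InTransitEncryptionConfiguration"] = in_transit
    return json.dumps({"EncryptionConfiguration": encryption}, indent=2)


# With no arguments, both flags are False, which mirrors the default
# behaviour described above: nothing is encrypted unless you ask for it.
DEFAULT_CONFIG = build_security_configuration()
```

The resulting string could then be passed to boto3's `create_security_configuration(Name=..., SecurityConfiguration=...)` call on an EMR client; that step is left out here so the sketch runs without AWS credentials.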

EMR Encryption with EBS

  • Linux Unified Key Setup (LUKS): you can specify AWS KMS as your key management provider or use a custom key provider
  • Open-source HDFS encryption: Secure Hadoop RPC uses SASL, and data encryption of HDFS block transfer uses AES-256
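The two key provider choices for local disk encryption can be sketched as the fragment of a security configuration shown below. The helper name, the key ARN, the S3 path, and the class name are all made-up placeholders; the JSON field names are my best reading of the EMR security configuration format.

```python
from typing import Optional


def local_disk_encryption(kms_key_arn: Optional[str] = None,
                          provider_jar_s3: Optional[str] = None,
                          provider_class: Optional[str] = None) -> dict:
    """Build the local disk part of an at-rest encryption configuration.

    Pass kms_key_arn to use AWS KMS, or provider_jar_s3 plus
    provider_class to use a custom key provider (hypothetical helper).
    """
    if kms_key_arn is not None:
        settings = {
            "EncryptionKeyProviderType": "AwsKms",
            "AwsKmsKey": kms_key_arn,
        }
    elif provider_jar_s3 is not None and provider_class is not None:
        settings = {
            "EncryptionKeyProviderType": "Custom",
            "S3Object": provider_jar_s3,
            "EncryptionKeyProviderClass": provider_class,
        }
    else:
        raise ValueError("supply a KMS key ARN or a custom provider JAR and class")
    return {"LocalDiskEncryptionConfiguration": settings}


# Placeholder ARN for illustration only.
kms_example = local_disk_encryption(
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
```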

EMR Encryption with S3

  • EMR supports SSE-S3 or SSE-KMS for server-side encryption
  • You can also use CSE-KMS or CSE-Custom for client-side encryption, where data is encrypted before it is stored in S3
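The S3 options above can be illustrated with a small helper that picks the right fields for each mode. This is a sketch under assumptions: the helper name and example values are hypothetical, and the mode strings (with the custom client-side mode spelled `CSE-Custom`) reflect the security configuration JSON as far as I know.

```python
from typing import Optional

# Modes that take a KMS key, versus the mode that needs no key material.
KMS_MODES = {"SSE-KMS", "CSE-KMS"}


def s3_encryption(mode: str,
                  kms_key_arn: Optional[str] = None,
                  provider_jar_s3: Optional[str] = None,
                  provider_class: Optional[str] = None) -> dict:
    """Build the S3 part of an at-rest encryption configuration."""
    settings = {"EncryptionMode": mode}
    if mode == "SSE-S3":
        pass  # S3 manages the keys, so nothing else is required
    elif mode in KMS_MODES:
        if kms_key_arn is None:
            raise ValueError(f"{mode} needs a KMS key ARN")
        settings["AwsKmsKey"] = kms_key_arn
    elif mode == "CSE-Custom":
        if provider_jar_s3 is None or provider_class is None:
            raise ValueError("CSE-Custom needs a provider JAR and class name")
        settings["S3Object"] = provider_jar_s3
        settings["EncryptionKeyProviderClass"] = provider_class
    else:
        raise ValueError(f"unknown encryption mode: {mode}")
    return {"S3EncryptionConfiguration": settings}


sse_s3_example = s3_encryption("SSE-S3")
```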

Encryption in transit using TLS certificate provider

  • PEM: you need to create PEM certificates, zip them, and reference the zip file in S3
  • Custom — you need a custom certificate provider as a Java class
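Both certificate provider options might look like this inside a security configuration. The helper, bucket paths, and Java class name are invented for illustration, and the field names are an assumption based on the EMR security configuration format.

```python
from typing import Optional


def tls_certificate_config(s3_object: str,
                           provider_class: Optional[str] = None) -> dict:
    """Build the in-transit part of a security configuration.

    Without provider_class, s3_object is treated as a zip of PEM
    certificates. With provider_class, s3_object is treated as the JAR
    holding a custom certificate provider (hypothetical helper).
    """
    if provider_class is None:
        tls = {"CertificateProviderType": "PEM", "S3Object": s3_object}
    else:
        tls = {
            "CertificateProviderType": "Custom",
            "S3Object": s3_object,
            "CertificateProviderClass": provider_class,
        }
    return {"TLSCertificateConfiguration": tls}


# Placeholder S3 locations and class name.
pem_example = tls_certificate_config("s3://example-bucket/certs/my-certs.zip")
custom_example = tls_certificate_config(
    "s3://example-bucket/providers/cert-provider.jar",
    provider_class="com.example.MyCertProvider",
)
```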

EMR Application-Specific Encryption

Once the TLS certificate provider has been configured, the following encryption features can be enabled:

Hadoop

  • Hadoop MapReduce Encrypted Shuffle uses TLS
  • Secure Hadoop RPC uses SASL
  • Data encryption of HDFS Block Transfer uses AES-256

Presto

  • When using EMR version 5.6.0 or later, any internal communication between Presto nodes uses SSL/TLS

Tez

  • Tez Shuffle Handler uses TLS

Spark

  • Akka protocol uses TLS
  • Block transfer service uses SASL and 3DES

EMR Encryption with KMS

When encrypting data at rest with KMS CMKs:

  • Ensure that the role assigned to the EC2 instances within the cluster has permission to use the CMK
  • Add that role to the Key users for the CMK
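Adding a role to the Key users for a CMK roughly corresponds to a key policy statement like the one sketched below. The helper name, the Sid, and the role ARN are hypothetical, and the action list is an assumption modelled on the cryptographic permissions typically granted to key users, so trim or extend it to match your own policy.

```python
import json


def key_user_statement(role_arn: str) -> dict:
    """Return a KMS key policy statement letting role_arn use the key."""
    return {
        "Sid": "AllowEmrInstanceRoleToUseKey",  # hypothetical identifier
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


# Placeholder account ID and role name for illustration.
statement = key_user_statement(
    "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole"
)
statement_json = json.dumps(statement, indent=2)
```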

EMR Transparent Encryption with HDFS

Transparent encryption in HDFS offers encryption both at rest and in transit

  • Data is encrypted and decrypted transparently without requiring changes to the application code
  • Each HDFS encryption zone has its own KMS Key; by default, EMR uses the Hadoop KMS, but you can also select an alternative
  • Each file is encrypted by a different data key, which is encrypted with the HDFS encryption zone key; it’s not possible to move files between encryption zones!
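To make the encryption zone idea concrete, here is a minimal sketch that builds the Hadoop commands usually involved in setting one up: create a zone key in the Hadoop KMS, create an empty directory, then turn that directory into an encryption zone. The function name, key name, and path are hypothetical; the sketch only prints the commands and does not run them, since they need a cluster with the Hadoop KMS configured.

```python
import shlex
from typing import List


def encryption_zone_commands(key_name: str, zone_path: str) -> List[List[str]]:
    """Return the commands that would create an HDFS encryption zone."""
    return [
        # 1. Create the encryption zone key in the Hadoop KMS.
        ["hadoop", "key", "create", key_name],
        # 2. Create the directory; it must be empty to become a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Mark the directory as an encryption zone tied to that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


if __name__ == "__main__":
    # Placeholder key name and path.
    for command in encryption_zone_commands("example-zone-key", "/secure/data"):
        print(shlex.join(command))
```

Files written under the zone path are then encrypted with their own data keys, each wrapped by the zone key, as described above.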

Remember, encryption across EMR is not provided by default.

Nadtakan Futhoem — Sr. Software Engineer

Founder of Nadtakan.com & Serverless Cloud developer. Follow me on Twitter https://twitter.com/NadtakanF