What is Elastic MapReduce (EMR) Encryption?
EMR is a managed service that runs big data frameworks on a cluster of highly scalable EC2 instances.
- You can encrypt at rest or in transit or both
- Encryption settings are defined in security configurations, which exist as a separate entity within EMR
- By default, the instances within a cluster don’t encrypt data at rest
- The instances within EMR are created from pre-configured AMIs (Amazon Machine Images)
- You must use EMR version 5.7.0 or later to use custom AMIs and encrypt the root device volume for specific compliance reasons.
EMR encryption with EBS
If you decide to use EBS as persistent storage, there are a number of options that can work together.
- Linux Unified Key Setup (LUKS): you can specify AWS KMS as your key management provider or use a custom key provider
- Open-source HDFS encryption: secure Hadoop RPC uses SASL, and data encryption of HDFS block transfer uses AES-256
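The two key provider choices for LUKS can be sketched as the local-disk part of a security configuration. This is a minimal sketch, not a definitive template: the field names follow the EMR security configuration JSON as I recall it and should be checked against the current AWS documentation, and the key ARN, JAR path, and class name are placeholders.

```python
import json

# Placeholder ARN for illustration only; it does not refer to a real key.
EXAMPLE_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"


def local_disk_encryption(kms_key_arn=None, provider_jar=None, provider_class=None):
    """Build the local-disk (LUKS) section of an EMR security configuration."""
    if kms_key_arn:
        # AWS KMS acts as the key management provider.
        return {"EncryptionKeyProviderType": "AwsKms", "AwsKmsKey": kms_key_arn}
    if provider_jar and provider_class:
        # A custom key provider: a JAR stored in S3 plus the class that supplies keys.
        return {
            "EncryptionKeyProviderType": "Custom",
            "S3Object": provider_jar,
            "EncryptionKeyProviderClass": provider_class,
        }
    raise ValueError("pass kms_key_arn, or both provider_jar and provider_class")


if __name__ == "__main__":
    print(json.dumps(local_disk_encryption(kms_key_arn=EXAMPLE_KEY_ARN), indent=2))
```

The returned dictionary would sit under `AtRestEncryptionConfiguration` as `LocalDiskEncryptionConfiguration` in the full security configuration.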
EMR encryption with S3
Encryption at rest
- EMR supports SSE-S3 or SSE-KMS for server-side encryption
- You can also use CSE-KMS or CSE-Custom for client-side encryption before storage
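The S3 at-rest modes listed above can be sketched as a small helper that returns the S3 section of a security configuration. This is an illustration under assumptions: the field names and the mode string `CSE-Custom` (the custom client-side option) are taken from the EMR security configuration JSON as I recall it, and the helper name and arguments are hypothetical.

```python
def s3_encryption(mode, kms_key_arn=None, provider_jar=None, provider_class=None):
    """Build the S3 section of an EMR security configuration for one mode."""
    if mode == "SSE-S3":
        # S3 manages the keys, so no further settings are needed.
        return {"EncryptionMode": "SSE-S3"}
    if mode in ("SSE-KMS", "CSE-KMS"):
        if not kms_key_arn:
            raise ValueError(f"{mode} needs a KMS key ARN")
        return {"EncryptionMode": mode, "AwsKmsKey": kms_key_arn}
    if mode == "CSE-Custom":
        if not (provider_jar and provider_class):
            raise ValueError("CSE-Custom needs a provider JAR and class")
        # The JAR in S3 holds a class that supplies the client-side keys.
        return {
            "EncryptionMode": "CSE-Custom",
            "S3Object": provider_jar,
            "EncryptionKeyProviderClass": provider_class,
        }
    raise ValueError(f"unknown S3 encryption mode: {mode}")


if __name__ == "__main__":
    print(s3_encryption("SSE-S3"))
```

Server-side modes encrypt after the object reaches S3, while the client-side modes encrypt on the cluster before upload, which is why only the latter may need a provider class.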
Encryption in transit using TLS certificate provider
- PEM: you need to create PEM certificates and reference their zip file in S3
- Custom: you need a custom certificate provider implemented as a Java class
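The two certificate provider options could be expressed as the in-transit part of a security configuration. This is a hedged sketch: the field names reflect the EMR security configuration JSON as I recall it, while the function name, bucket paths, and class name are placeholders invented for illustration.

```python
def in_transit_encryption(provider_type, s3_object, provider_class=None):
    """Build the in-transit section of an EMR security configuration."""
    if provider_type == "PEM":
        # s3_object points at the zip file that contains the PEM certificates.
        tls = {"CertificateProviderType": "PEM", "S3Object": s3_object}
    elif provider_type == "Custom":
        if not provider_class:
            raise ValueError("a Custom provider needs the Java class name")
        # s3_object points at the JAR that contains the provider class.
        tls = {
            "CertificateProviderType": "Custom",
            "S3Object": s3_object,
            "CertificateProviderClass": provider_class,
        }
    else:
        raise ValueError(f"unknown certificate provider type: {provider_type}")
    return {"TLSCertificateConfiguration": tls}


if __name__ == "__main__":
    print(in_transit_encryption("PEM", "s3://example-bucket/certs/my-certs.zip"))
```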
EMR Application-Specific Encryption
Once the TLS certificate provider has been configured, the following encryption features can be enabled:
Hadoop
- Hadoop MapReduce Encrypted Shuffle uses TLS
- Secure Hadoop RPC uses SASL
- Data encryption of HDFS Block Transfer uses AES-256
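For orientation, the three Hadoop features above correspond to well-known open-source Hadoop properties. The sketch below lays them out as EMR-style configuration classifications. Whether EMR sets exactly these values when in-transit encryption is enabled is an assumption here, and the function name is hypothetical.

```python
def hadoop_in_transit_classifications(key_bits=256):
    """Return configuration classifications for Hadoop in-transit encryption."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192, or 256 bits")
    return [
        {   # Secure Hadoop RPC: 'privacy' adds encryption on top of SASL authentication.
            "Classification": "core-site",
            "Properties": {"hadoop.rpc.protection": "privacy"},
        },
        {   # HDFS block transfer encrypted with AES at the chosen key length.
            "Classification": "hdfs-site",
            "Properties": {
                "dfs.encrypt.data.transfer": "true",
                "dfs.encrypt.data.transfer.cipher.suites": "AES/CTR/NoPadding",
                "dfs.encrypt.data.transfer.cipher.key.bitlength": str(key_bits),
            },
        },
        {   # MapReduce encrypted shuffle runs over TLS.
            "Classification": "mapred-site",
            "Properties": {"mapreduce.shuffle.ssl.enabled": "true"},
        },
    ]


if __name__ == "__main__":
    for item in hadoop_in_transit_classifications():
        print(item["Classification"], item["Properties"])
```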
Presto
- When using EMR version 5.6.0 and later, any internal communication between Presto nodes uses SSL/TLS
Tez
- Tez Shuffle Handler uses TLS
Spark
- Akka protocol uses TLS
- Block transfer service uses SASL and 3DES
EMR Encryption with KMS
When encrypting data at rest with KMS CMKs:
- Ensure that the role assigned to your EC2 instances within the cluster has the relevant permissions to enable access to the CMK
- Add the relevant role to the Key users for the CMK
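Adding a role to the key users amounts to a key policy statement that grants that role the cryptographic actions on the CMK. The following is a minimal sketch of such a statement. The action list mirrors the usual key users statement, and the role ARN is a placeholder built from the default EMR instance role name, not a value from this document.

```python
import json

# Placeholder role ARN; replace with the role attached to the cluster's EC2 instances.
EXAMPLE_ROLE_ARN = "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole"


def key_user_statement(role_arn):
    """Build a key policy statement that lets a role use a CMK for encryption."""
    return {
        "Sid": "AllowEmrInstanceRoleToUseKey",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        # In a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


if __name__ == "__main__":
    print(json.dumps(key_user_statement(EXAMPLE_ROLE_ARN), indent=2))
```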
EMR Transparent Encryption with HDFS
Transparent encryption in HDFS offers encryption both at rest and in transit
- Data is encrypted and decrypted transparently without requiring changes to the application code
- Each HDFS encryption zone has its own key, held by a key management server; by default, EMR uses the Hadoop KMS, but you can also select an alternative
- Each file is encrypted by a different data key, which is itself encrypted with the HDFS encryption zone key; files cannot be moved (renamed) between encryption zones and have to be copied instead
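Setting up an encryption zone follows a fixed order: create the zone key, create an empty directory, then turn the directory into a zone. The sketch below only assembles the standard `hadoop` and `hdfs` command lines, so it can be read or tested without a cluster. The key name and path are placeholders, and the helper function is hypothetical.

```python
import shlex


def encryption_zone_commands(key_name, zone_path):
    """Return the commands that create an HDFS encryption zone, in order."""
    return [
        # 1. Create the zone key in the configured key provider (Hadoop KMS by default).
        ["hadoop", "key", "create", key_name],
        # 2. The zone directory has to exist and be empty before it becomes a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],
        # 3. Mark the directory as an encryption zone protected by that key.
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
    ]


if __name__ == "__main__":
    for argv in encryption_zone_commands("example-zone-key", "/data/secure"):
        # Printed rather than executed; on a cluster each list could go to subprocess.run.
        print(shlex.join(argv))
```

Files written under the zone path afterwards are encrypted and decrypted transparently, each with its own data key.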
Remember, encryption across EMR is not provided by default.