
fix: Replace plain SHA-256 with HMAC-SHA256 for remote function artifact integrity #5602

Draft
mollyheamazon wants to merge 1 commit into aws:master from mollyheamazon:hmac-key-new


Issue

PR #5379 replaced the original HMAC-based integrity check with plain SHA-256 hashing for remote function serialized artifacts. While this removed the prior
vulnerability of exposing the HMAC key via DescribeTrainingJob environment variables, it introduced a new one: an attacker with S3 write access can recompute
the SHA-256 hash for a malicious payload and replace both payload.pkl and metadata.json, bypassing integrity verification entirely and achieving arbitrary
code execution via cloudpickle.loads().
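
To make the weakness concrete, here is a minimal sketch of why an unkeyed hash gives no protection once both objects are writable. The metadata field name `sha256_hash` and both helper functions are assumptions for illustration, not the SDK's actual schema or API.

```python
import hashlib
import json


def make_metadata(payload: bytes) -> str:
    """Build metadata the way a plain SHA-256 scheme would (field name is hypothetical)."""
    return json.dumps({"sha256_hash": hashlib.sha256(payload).hexdigest()})


def verify_plain_sha256(payload: bytes, metadata_json: str) -> bool:
    """Unkeyed check: anyone who can write both objects can make this pass."""
    expected = json.loads(metadata_json)["sha256_hash"]
    return hashlib.sha256(payload).hexdigest() == expected


# The legitimate writer uploads payload.pkl and metadata.json.
original = b"legitimate pickled function"
metadata = make_metadata(original)

# An attacker with S3 write access swaps in a new payload and simply
# recomputes the hash. No secret is required at any point.
malicious = b"attacker-controlled pickle bytes"
forged_metadata = make_metadata(malicious)

# The forged pair is indistinguishable from a legitimate one.
forgery_accepted = verify_plain_sha256(malicious, forged_metadata)
```

The check only catches an attacker who replaces the payload but forgets to replace the metadata, which is not a realistic threat model.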

Solution

Re-introduce HMAC-SHA256 signing with the key stored in AWS Secrets Manager instead of environment variables. Add a Parameter Store trust anchor to prevent an
attacker from pointing metadata at an attacker-controlled secret.

Signing (serialization):

  1. Generate a random HMAC key and store it in Secrets Manager at sagemaker/remote-function/{job_name}/hmac-key
  2. Sign the payload with HMAC-SHA256
  3. Write the HMAC digest + secret ARN to metadata.json alongside the payload in S3
  4. Store the secret ARN in SSM Parameter Store at /sagemaker/remote-function/{job_name}/secret-arn as a trust anchor
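
The signing steps above could be sketched roughly as follows. This is not the code in this PR: the function name `sign_and_publish` and the metadata field names `hmac_digest` and `secret_arn` are hypothetical, and the two client arguments are assumed to behave like `boto3.client("secretsmanager")` and `boto3.client("ssm")`. Passing the clients in keeps the sketch testable without AWS credentials.

```python
import hashlib
import hmac
import json
import secrets


def sign_and_publish(payload: bytes, job_name: str, secrets_client, ssm_client) -> str:
    """Return the metadata.json body to upload next to payload.pkl."""
    # Step 1: random key, stored in Secrets Manager rather than in S3 or env vars.
    key = secrets.token_hex(32)
    created = secrets_client.create_secret(
        Name=f"sagemaker/remote-function/{job_name}/hmac-key",
        SecretString=key,
    )
    secret_arn = created["ARN"]

    # Step 2: keyed digest over the serialized payload bytes.
    digest = hmac.new(key.encode("utf-8"), payload, hashlib.sha256).hexdigest()

    # Step 3: metadata that travels with the payload (and is therefore untrusted).
    metadata_json = json.dumps({"hmac_digest": digest, "secret_arn": secret_arn})

    # Step 4: trust anchor, recorded where S3 write access alone cannot reach.
    ssm_client.put_parameter(
        Name=f"/sagemaker/remote-function/{job_name}/secret-arn",
        Value=secret_arn,
        Type="String",
    )
    return metadata_json
```

In real use the caller would pass boto3 clients and then upload the returned string as metadata.json.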

Verification (deserialization):

  1. Download metadata.json from S3 (untrusted)
  2. Validate secret_arn from metadata:
    • Account check: parse account ID from the ARN, compare against sts:GetCallerIdentity (blocks cross-account attacks)
    • Parameter Store check: compare against the value stored in SSM (blocks same-account secret substitution)
  3. Retrieve the HMAC key from Secrets Manager using the validated ARN
  4. Recompute HMAC and compare using hmac.compare_digest()
  5. Only then deserialize via cloudpickle.loads()
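
A minimal sketch of the verification order described above, under the same assumptions as the signing side: `verify_payload`, `IntegrityError`, and the metadata field names are hypothetical, and the three client arguments are assumed to behave like the boto3 `secretsmanager`, `ssm`, and `sts` clients. The point it illustrates is that the key is fetched only after the ARN from the untrusted metadata has passed both checks.

```python
import hashlib
import hmac
import json


class IntegrityError(Exception):
    """Raised when the payload must not be deserialized."""


def verify_payload(payload: bytes, metadata_json: str, job_name: str,
                   secrets_client, ssm_client, sts_client) -> bytes:
    """Return `payload` only if every check passes; otherwise raise."""
    metadata = json.loads(metadata_json)  # Step 1: untrusted input from S3
    secret_arn = metadata["secret_arn"]

    # Step 2a: account check. ARN layout: arn:partition:service:region:account:resource
    arn_parts = secret_arn.split(":")
    if len(arn_parts) < 6 or arn_parts[2] != "secretsmanager":
        raise IntegrityError("secret_arn is not a Secrets Manager ARN")
    if arn_parts[4] != sts_client.get_caller_identity()["Account"]:
        raise IntegrityError("secret_arn belongs to another account")

    # Step 2b: trust-anchor check against the value recorded at signing time.
    anchor = ssm_client.get_parameter(
        Name=f"/sagemaker/remote-function/{job_name}/secret-arn"
    )["Parameter"]["Value"]
    if secret_arn != anchor:
        raise IntegrityError("secret_arn does not match the Parameter Store anchor")

    # Step 3: only now fetch the key, using the validated ARN.
    key = secrets_client.get_secret_value(SecretId=secret_arn)["SecretString"]

    # Step 4: recompute and compare in constant time (bytes avoid non-ASCII errors).
    expected = hmac.new(key.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    claimed = str(metadata["hmac_digest"])
    if not hmac.compare_digest(expected.encode("utf-8"), claimed.encode("utf-8")):
        raise IntegrityError("HMAC mismatch")

    # Step 5: the caller may now hand these bytes to cloudpickle.loads().
    return payload
```

Returning the bytes instead of deserializing inside the function keeps the sketch free of a cloudpickle dependency; the ordering guarantee is the same.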

Why this is secure

An attacker with S3 write access can replace payload.pkl and metadata.json, but cannot:

  • Forge a valid HMAC without the key (stored in Secrets Manager, not S3)
  • Point to a cross-account secret (blocked by STS account validation)
  • Point to a different secret in the same account (blocked by Parameter Store trust anchor, which the attacker cannot write to with only S3 access)

Breaking change

This is intentionally a breaking change. The legacy plain SHA-256 path is removed. Existing remote function jobs in-flight during upgrade will fail
deserialization and need to be re-run. Given the small user base (~8 customers), this is acceptable to avoid maintaining a vulnerable fallback.

New IAM permissions required

Execution roles used with @remote / RemoteExecutor need:

| Permission | Resource | Used by |
|---|---|---|
| `secretsmanager:CreateSecret` | `arn:aws:secretsmanager:*:*:secret:sagemaker/remote-function/*` | Serializer (client + job) |
| `secretsmanager:GetSecretValue` | same | Deserializer (client + job) |
| `ssm:PutParameter` | `arn:aws:ssm:*:*:parameter/sagemaker/remote-function/*` | Serializer |
| `ssm:GetParameter` | same | Deserializer |
| `sts:GetCallerIdentity` | `*` | Deserializer |
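
As a starting point for updating execution roles, the permissions listed above could be expressed as an IAM policy document along these lines. This is a sketch, not a reviewed policy: the `Sid` values are made up, and the SSM resource is assumed to be wildcarded on region and account in the same way as the Secrets Manager resource.

```python
import json

SECRET_RESOURCE = "arn:aws:secretsmanager:*:*:secret:sagemaker/remote-function/*"
# Assumption: region and account are wildcarded like the Secrets Manager ARN.
PARAMETER_RESOURCE = "arn:aws:ssm:*:*:parameter/sagemaker/remote-function/*"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RemoteFunctionHmacKey",
            "Effect": "Allow",
            "Action": ["secretsmanager:CreateSecret", "secretsmanager:GetSecretValue"],
            "Resource": SECRET_RESOURCE,
        },
        {
            "Sid": "RemoteFunctionTrustAnchor",
            "Effect": "Allow",
            "Action": ["ssm:PutParameter", "ssm:GetParameter"],
            "Resource": PARAMETER_RESOURCE,
        },
        {
            "Sid": "RemoteFunctionAccountCheck",
            "Effect": "Allow",
            "Action": ["sts:GetCallerIdentity"],
            "Resource": "*",
        },
    ],
}

# JSON text suitable for attaching to a role as an inline policy.
policy_json = json.dumps(policy, indent=2)
```

Roles that only deserialize could drop `secretsmanager:CreateSecret` and `ssm:PutParameter`; scoping the region and account fields to concrete values would tighten this further.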

TODO before merge

  • AppSec review of Secrets Manager approach
  • Update integ test IAM roles with new permissions
  • Port to V2 (master-v2 branch)
  • Customer communication per CVE/MAPS process

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
