Designing Zero-Knowledge Architecture for Your Lab
Part 3 of the “Building Zero-Knowledge Biotech Infrastructure” series
In Part 1, we explained why standard cloud encryption leaves biotech data exposed during processing. In Part 2, we showed how AWS Nitro Enclaves provide hardware-enforced isolation with cryptographic attestation.
Now the practical question: how do you actually design a zero-knowledge confidential computing architecture for your biotech organization? The technology is available. The hard part is the architectural decisions that determine whether it actually protects your data or just adds complexity.
This post covers the five decisions that matter most.
Key Terms
Terms used throughout this post:
- Attestation: A cryptographic process that proves exactly what code is running inside an enclave. Before releasing a decryption key, the key management service verifies a signed attestation document from the hardware itself.
- PCR values: Platform Configuration Register values. Cryptographic hashes that measure the exact software stack inside an enclave: the image, the kernel, the application binary. If anything changes, the hash changes.
- vsock: The single local socket channel an enclave uses to communicate with its parent instance. There is no other network interface. All data in and out travels through this channel.
- KMS: Key Management Service. A cloud-hosted service for creating and controlling encryption keys. In a zero-knowledge architecture, KMS key policies use attestation values to ensure only verified enclave code can trigger decryption.
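To ground these terms, here is a simplified sketch of what a key-release service checks before trusting an attestation document. Field names follow AWS's published attestation document format, but the CBOR decoding and COSE signature validation against the Nitro root certificate chain are deliberately omitted, and all values are illustrative:

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class AttestationDocument:
    """Simplified model of a Nitro attestation document (really a
    CBOR-encoded, COSE_Sign1-signed structure produced by the hardware)."""
    module_id: str                    # identifies the enclave instance
    timestamp_ms: int                 # when the document was signed
    pcrs: Dict[int, bytes]            # PCR index -> SHA-384 measurement
    public_key: Optional[bytes] = None  # enclave's ephemeral public key
    nonce: Optional[bytes] = None       # caller-supplied freshness value

def verify(doc: AttestationDocument, expected_pcr0: bytes, nonce: bytes) -> bool:
    """Checks a key-release service makes before trusting the document.
    A real verifier also validates the COSE signature chain; omitted here."""
    if doc.nonce != nonce:                   # reject replayed documents
        return False
    return doc.pcrs.get(0) == expected_pcr0  # exact code measurement match
```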
Decision 1: Who Owns the Encryption Keys in Your Zero-Knowledge Architecture?
This is the single most important architectural choice. Everything else follows from it.
There are three models:
Platform-Managed Keys
The processing platform generates and stores encryption keys. Your data is encrypted, but the platform operator retains control of the keys. On most platforms, a sufficiently privileged administrator (one with root or hypervisor-level access) can potentially inspect process memory where key material is loaded during decryption operations. They may not exercise that access, and most platforms have policies against it. But the capability exists in the infrastructure, and no cryptographic control prevents it.
What you get: Encryption at rest, encryption in transit, and the operator’s promise not to look.
What you don’t get: Independence from the operator. If the platform is compromised, subpoenaed, or acquired, your data can be decrypted by someone other than you.
When this is acceptable: When the data isn’t sensitive enough to justify the operational overhead of customer-owned keys. De-identified datasets, published reference data, non-proprietary analyses.
Customer-Managed Keys (Same Account)
You create encryption keys in the platform’s cloud account, and IAM policies restrict access to you alone. The platform operator has administrative access to the account but commits not to use it.
What you get: Logical separation of key access. Audit trails showing who accessed what.
What you don’t get: Mathematical guarantees. An account administrator can always modify IAM policies. Your security depends on the operator’s discipline, not cryptographic enforcement.
When this is acceptable: When you trust the operator and need simpler onboarding. Internal platforms where the “operator” is your own DevOps team.
Customer-Owned Keys (Separate Account)
You create encryption keys in your own cloud account. The processing platform never has administrative access to your keys. Key access is gated by hardware attestation: the key will only decrypt data inside a verified, unmodified enclave.
What you get: Cryptographic proof that only attested code can access your data. The platform operator cannot decrypt your data even if they wanted to, even if they’re compromised, even if they’re compelled by court order. They don’t have the keys.
What you don’t get: Simplicity. Customer-owned keys require cross-account IAM roles, attestation-based key management policies, and a coordination process when the enclave code updates. Your team needs to understand key policies, or work with someone who does.
When this is necessary: When you’re processing data where unauthorized access has existential consequences. Patient genomics, clinical trial data, proprietary drug discovery pipelines, pre-publication research.
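To make the attestation gate concrete, here is a sketch of the key policy statement that enforces it. `kms:RecipientAttestation:PCR0` is the KMS condition key AWS provides for Nitro Enclaves; the role ARN and measurement value below are placeholders, not real identifiers:

```python
import json

def attestation_gated_policy(platform_role_arn: str, expected_pcr0: str) -> dict:
    """Key policy statement: kms:Decrypt is allowed for the platform's
    cross-account role ONLY when the request carries a valid attestation
    document whose PCR0 matches the expected enclave measurement.
    Everyone else, including the platform operator, is denied by omission."""
    return {
        "Sid": "AllowDecryptOnlyFromAttestedEnclave",
        "Effect": "Allow",
        "Principal": {"AWS": platform_role_arn},
        "Action": "kms:Decrypt",
        "Resource": "*",
        "Condition": {
            "StringEqualsIgnoreCase": {
                "kms:RecipientAttestation:PCR0": expected_pcr0
            }
        },
    }

# Placeholder ARN and measurement; real values come from your own account
# and from the enclave build output.
policy = attestation_gated_policy(
    "arn:aws:iam::111122223333:role/platform-enclave-role",
    "<expected-pcr0-sha384-hex>",
)
print(json.dumps(policy, indent=2))
```

This statement is the artifact you can show a customer who asks whether you can access their data: the denial is in the policy, not in a promise.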
Our Recommendation
For biotech data that matters, the customer-owned, separate-account model is the only one that provides guarantees rather than promises. Every other model has an administrator somewhere who can bypass the controls.
This is harder to set up. It creates operational friction when enclave code updates (because attestation hashes change and key policies need updating). But when a customer asks “can you access our data?” the answer is “no, and here’s the key policy that proves it.” That’s a different conversation than “no, we promise.”
Decision 2: Where Are Your Trust Boundaries?
A trust boundary is the line between “we trust the code on this side” and “we don’t trust anything on that side.” In a zero-knowledge architecture, you need to draw these lines explicitly.
The Zero-Knowledge Architecture in Practice
The diagram below shows a complete end-to-end flow for a confidential computing biotech workload: from your machine, through your own key management account, into the processing infrastructure, and back.
┌─────────────────────────────────────────────────────────────────────────────┐
│ YOUR MACHINE │
│ │
│ Client CLI │
│ 1. Encrypts file locally with a data key │
│ 2. Uploads ciphertext only (plaintext never leaves this machine) │
└──────────────────────────────┬──────────────────────────────────────────────┘
│ ciphertext only
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ YOUR CLOUD ACCOUNT │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ KMS Key (you own it and the platform has no administrative access) │ │
│ │ │ │
│ │ Policy: kms:Decrypt permitted ONLY IF the request carries a valid │ │
│ │ attestation document with a matching PCR0 hash. │ │
│ │ All other callers (including platform operators) are denied. │ │
│ └─────────────────────────────┬────────────────────────────────────────┘ │
│ │ attestation-gated decryption │
│ │ cross-account IAM role, PCR0 condition │
└────────────────────────────────┼────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROCESSING INFRASTRUCTURE │
│ │
│ ┌─────────────────────┐ vsock only ┌───────────────────────────────┐ │
│ │ Parent Instance │◄══════════════►│ Nitro Enclave │ │
│ │ (UNTRUSTED) │ │ (TRUSTED) │ │
│ │ │ │ │ │
│ │ Pulls ciphertext │ │ 1. Attests to your KMS key │ │
│ │ from object storage │ │ 2. Receives decryption key │ │
│ │ │ │ re-encrypted for enclave │ │
│ │ Proxies KMS calls │ │ public key only │ │
│ │ on behalf of │ │ 3. Decrypts data in memory │ │
│ │ enclave │ │ 4. Runs analysis pipeline │ │
│ │ │ │ 5. Re-encrypts results │ │
│ │ Writes encrypted │ │ 6. Returns ciphertext │ │
│ │ results to storage │ │ via vsock │ │
│ │ │ │ │ │
│ │ Never sees │ │ No network. No persistent │ │
│ │ plaintext │ │ storage. No SSH access. │ │
│ └─────────────────────┘ │ No operator access. │ │
│ └───────────────────────────────┘ │
│ Hypervisor enforces memory isolation at hardware level │
│ Platform operators cannot read enclave memory even with root access │
└────────────────────────────────┬────────────────────────────────────────────┘
│ ciphertext only
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ YOUR MACHINE │
│ │
│ Client CLI retrieves and decrypts results locally │
│ using your own cloud credentials. Plaintext never existed outside │
│ the enclave or your machine. │
└─────────────────────────────────────────────────────────────────────────────┘
The trusted zone is minimal: the hardware, the attested enclave code, and your own encryption key. Everything else (the parent instance, the network, the platform operator’s application code, cloud provider staff) is untrusted by design. The architecture assumes those layers are compromised and still protects your data.
What This Means for Your Architecture
When you draw trust boundaries this way, several things change:
Networking becomes simpler. The enclave has no network access. You don’t need complex VPN configurations, network segmentation, or bastion hosts to protect data during processing. The enclave’s network isolation is enforced by hardware, not firewall rules.
Monitoring changes. You can observe metadata (job timing, status, resource usage) but not data content. This requires different operational patterns than traditional infrastructure where you can SSH in and inspect files.
Updates require coordination. When the enclave code changes, the attestation hash changes. If your key policy references a specific attestation hash, you need to update the policy before the new code can access your data. This is a feature (you control when to trust new code), but it’s also operational overhead.
Incident response is different. If the infrastructure outside the enclave is compromised, your data is still encrypted. You can tell your customers exactly what was and wasn’t exposed, backed by cryptographic proof rather than forensic analysis.
Decision 3: Which Confidential Computing Technology?
Not all confidential computing is equal. The three major approaches have different properties, and the right choice depends on your workload.
AWS Nitro Enclaves
A stripped-down VM carved from your EC2 instance. No network, no persistent storage, no interactive access. Communication via a single vsock channel. Attestation via the Nitro Hypervisor’s PCR measurements.
Strengths: Minimal attack surface. Full VM-level isolation. No dependency on CPU-specific features. Works with any code that runs in a Linux environment.
Limitations: No network inside the enclave (all data must be passed via vsock). Available only on AWS. Memory and CPU are carved from the parent instance, so you need to size the parent for the enclave’s needs plus your own.
Best for: Batch processing workloads with clear input/output boundaries. This maps directly to proteomics and genomics workflows: a DIA proteomics run produces a bounded set of raw instrument files (typically 1–5 GB each) that flow through a search pipeline and yield quantified protein or peptide results. The data arrives, gets processed, and leaves. No persistent enclave state is needed between steps. The same model applies to WGS variant calling pipelines, where FASTQ inputs are processed against a reference and results are written out before the enclave terminates.
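Because every byte enters and leaves through vsock, both sides of the channel need an agreed framing convention. A minimal length-prefixed framing sketch, using a local socketpair as a stand-in for a real `AF_VSOCK` connection (which requires Linux and an actual enclave):

```python
import socket
import struct

def send_msg(sock: socket.socket, payload: bytes) -> None:
    # 4-byte big-endian length prefix, then the payload (always ciphertext
    # on the parent side of a zero-knowledge design)
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_msg(sock: socket.socket) -> bytes:
    def read_exact(n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed mid-message")
            buf += chunk
        return buf
    (length,) = struct.unpack(">I", read_exact(4))
    return read_exact(length)

# socketpair stands in for the vsock link; inside a real enclave you would
# use socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) instead.
parent, enclave = socket.socketpair()
send_msg(parent, b"ciphertext-blob")
assert recv_msg(enclave) == b"ciphertext-blob"
```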
Intel SGX
Application-level enclaves that protect specific code and data within a process. The CPU encrypts enclave memory, and only code running inside the enclave can access it.
Strengths: Granular protection at the application level. Smaller trusted computing base (TCB) than VM-level approaches.
Limitations: Limited enclave memory (historically capped at 128–256 MB enclave page cache; expanded in recent generations per Intel SGX documentation). Requires application code changes to use the SGX SDK. Side-channel attacks have been demonstrated against SGX enclave memory (Kocher et al., “Spectre Attacks,” IEEE S&P 2019; Van Bulck et al., “Foreshadow,” USENIX Security 2018). Performance overhead for enclave transitions.
Best for: Workloads where you need to protect specific secrets (keys, credentials) within a larger application. Less suited for processing large biotech datasets that don’t fit in enclave memory.
AMD SEV-SNP
Encrypts the entire VM’s memory using a key managed by the AMD Secure Processor. The hypervisor cannot read the VM’s memory.
Strengths: Transparent to application code. No code changes needed. Protects the entire VM memory space. Available on Azure (Confidential VMs) and GCP (Confidential Computing).
Limitations: The trust boundary is the entire VM, which has a larger attack surface than a minimal enclave. The VM has full network access, so data exfiltration through application-level bugs is still possible. Attestation model is less mature than Nitro’s.
Best for: Lift-and-shift scenarios where you want memory encryption without rewriting code. Workloads that need network access during processing.
Comparison
| Dimension | Intel SGX | AMD SEV-SNP | Nitro Enclaves |
|---|---|---|---|
| Isolation scope | Application partition | Full VM | Full VM (isolated from parent) |
| Memory limit | 128–256 MB EPC (larger in recent generations) | Entire VM | Allocated from parent (flexible) |
| Network access | Full | Full | None (vsock only) |
| Persistent storage | No | Full disk | None |
| Performance overhead | High for large datasets (memory paging) | 2–5% (per AMD SEV-SNP performance benchmarks) | Near-zero compute, vsock I/O overhead |
| Attack surface | Smallest (app-level) | Medium (full VM) | Small (no network, no disk) |
| Ease of migration | Difficult (requires app partitioning) | Easy (lift-and-shift VMs) | Medium (containerize, architect for vsock) |
Our Take
For biotech batch processing (which covers most genomics, proteomics, and clinical workloads), Nitro Enclaves’ restrictions are features. No network means no data exfiltration. No persistent storage means no data leakage between jobs. No interactive access means no operator snooping. The minimal attack surface is exactly what you want when the data is too sensitive for “trust me” security models.
If your workload requires interactive analysis (Jupyter notebooks, real-time dashboards, exploratory data science), Nitro Enclaves are the wrong tool. AMD SEV-SNP provides memory encryption without the network restriction, though with a larger trust boundary.
Decision 4: What Are Your Actual Threat Scenarios?
Security architecture without a threat model is just expensive plumbing. Before adopting confidential computing, be specific about what you’re protecting against.
Threats That Confidential Computing Addresses
Privileged insider access. Cloud provider employees, platform operators, or your own administrators with infrastructure access. Nitro Enclaves prevent memory inspection even with root access on the host.
Infrastructure compromise. An attacker gains access to your cloud account, object storage, or databases. With customer-owned keys and attestation, encrypted data remains encrypted. The attacker gets ciphertext they can’t decrypt.
Compelled disclosure. A court order or regulatory demand targeting the platform operator. If the operator doesn’t have the keys, they can’t comply with a decryption demand. They can produce ciphertext and metadata, but not plaintext.
Supply chain attacks. Compromised dependencies, malicious container images, or tampered build pipelines. Attestation detects code changes: if the enclave code is modified, the attestation hash changes, and the key management service refuses to release the key.
Threats That Confidential Computing Doesn’t Address
Application logic bugs and Iago attacks. Attestation proves that a specific, verified binary is running inside the enclave. It says nothing about whether that code is correct or resistant to manipulation. A class of attacks known as Iago attacks exploits this boundary directly: a compromised host manipulates the return values of system calls made by the enclave, causing the enclave application to behave against its own interests, without ever breaking hardware isolation. The enclave is running exactly the attested code. The attested code is being fed malicious inputs it doesn’t validate. For a genomics pipeline, this could mean corrupted input dimensions that produce silently wrong variant calls, with no cryptographic alarm raised.
Malicious or vulnerable code in the enclave itself. This is the logical complement to Iago attacks. Attestation proves what code is running, not whether that code is trustworthy. If vulnerable or malicious code is packaged into the enclave image, it will be attested and run with full access to decrypted data. The hardware did its job. The code didn’t. This shifts the security boundary in a fundamental way: confidential computing doesn’t eliminate the need to trust the enclave code. It makes that code the only thing you need to trust. Which means code provenance and verification become your new perimeter.
Before any enclave image is trusted, a rigorous verification pipeline should include:
- Formal code audit and thorough code review
- Static analysis (SAST) of the application code
- Security scanning of the Docker image
- Independent PCR0 hash verification against the published enclave image
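The PCR0 verification step can be scripted. In the sketch below, the measurement values would come from tooling such as `nitro-cli` rather than being computed inline; the SHA-384 demo simply illustrates why any code change invalidates the attestation:

```python
import hashlib
import hmac

def pcr0_matches(published_pcr0_hex: str, rebuilt_pcr0_hex: str) -> bool:
    """Compare the vendor-published PCR0 against one recomputed from an
    independent, reproducible build of the same enclave image."""
    return hmac.compare_digest(
        published_pcr0_hex.strip().lower(),
        rebuilt_pcr0_hex.strip().lower(),
    )

# PCR0 is a SHA-384 measurement, so any change to the image, however
# small, produces a completely different value:
v1 = hashlib.sha384(b"enclave-image-v1").hexdigest()
v1_tampered = hashlib.sha384(b"enclave-image-v1 plus one byte").hexdigest()
assert not pcr0_matches(v1, v1_tampered)
assert pcr0_matches(v1, v1.upper())
```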
Data at rest and in transit. Confidential computing protects data during processing. You still need encryption at rest and in transit. These are complementary, not alternatives.
Authorized misuse. A legitimate user with valid credentials who misuses their access. Confidential computing doesn’t solve authorization problems. If someone is authorized to submit jobs and receive results, they can process any data they have access to.
Side-channel attacks on hardware. Academic research has demonstrated side-channel attacks against some confidential computing technologies, notably Intel SGX (Spectre, Foreshadow, and subsequent variants). Nitro Enclaves have a smaller attack surface, but no hardware is immune to all possible physical attacks. The practical question is whether an attack requires physical hardware access (very difficult for cloud infrastructure) or can be mounted remotely.
Right-Sizing Your Security
Not every dataset needs zero-knowledge processing. The overhead (operational complexity, update coordination, debugging limitations) is justified when:
- Unauthorized access triggers legal and contractual exposure (consult your compliance team on applicable regulations and data processing obligations)
- The data has competitive value (proprietary sequences, drug candidates, unpublished research)
- Your customers require cryptographic proof of protection (pharma sponsors, clinical partners)
- You can’t afford a breach investigation that runs for months
For published reference genomes, synthetic test data, or de-identified population statistics, standard cloud security with proper access controls is sufficient and much simpler.
Decision 5: How Will You Handle the Operational Trade-offs?
Adopting zero-knowledge architecture means accepting specific constraints. Plan for them before you commit.
You Can’t Debug by Inspecting Data
When a job fails inside an enclave, you can see error codes, log messages, resource utilization, and timing information. You cannot inspect the input data, intermediate results, or output. This changes how you troubleshoot.
Practical approach: Build comprehensive logging into your pipeline code. Log data shapes (dimensions, row counts, file sizes) rather than data content. Create synthetic test datasets that exercise the same code paths as real data, so you can reproduce issues outside the enclave.
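Shape-only logging can be a small helper. The sketch below assumes tabular rows represented as dicts; the name `log_data_shape` is illustrative, not from any particular framework:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def log_data_shape(step: str, rows: list) -> dict:
    """Record enough metadata to debug a failed step without logging the
    data itself: counts, column names, and approximate size only."""
    shape = {
        "step": step,
        "row_count": len(rows),
        "columns": sorted(rows[0].keys()) if rows else [],
        "approx_bytes": len(json.dumps(rows)),
    }
    log.info("shape %s", json.dumps(shape))
    return shape
```

Note that no cell values ever reach the log line, so the same logging code is safe to run inside the enclave and ship out through vsock.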
Updates Require Coordination
Changing the enclave code changes the attestation hash. Every customer whose key policy references the old hash needs to update their policy before the new code can process their data.
Practical approach: Plan a transition window where both old and new enclave versions run simultaneously. Notify customers in advance. Provide tooling that makes the policy update a single command, not a manual console operation.
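IAM condition values accept a list, which is enough to express the transition window: allow both the old and the new measurement while customers migrate, then drop the old one. A sketch with placeholder hashes:

```python
def transition_policy_condition(old_pcr0: str, new_pcr0: str) -> dict:
    """During rollout, a single condition block accepts EITHER attested
    measurement; once customers confirm the new enclave version, the old
    hash is dropped from the list and the window closes."""
    return {
        "StringEqualsIgnoreCase": {
            "kms:RecipientAttestation:PCR0": [old_pcr0, new_pcr0]
        }
    }
```

Applying the updated policy is then a single `aws kms put-key-policy` call on the customer side, which is the one-command tooling this section recommends providing.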
Resource Sizing Is Inflexible
Enclave resources (CPU, memory) are allocated at launch and can’t grow. If your pipeline needs more memory than declared, the job fails.
Practical approach: Profile your workloads on representative data. In proteomics search pipelines (MSFragger, MaxQuant), peak memory during database search typically runs 2–3x the raw file size when searching against a full human proteome database: a 4 GB raw file will need 8–12 GB at peak, not 4 GB. WGS variant calling pipelines (BWA-MEM2, GATK HaplotypeCaller) against 30x coverage inputs commonly peak at 32–64 GB depending on the reference and ploidy. Size for your largest expected input with 25–50% headroom. Extreme outlier inputs may need a separate resource profile rather than sizing everything for the worst case.
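Those rules of thumb can be encoded as a pre-flight sizing check. The multipliers below are the heuristics from this section, not universal constants, and should be replaced with numbers from your own profiling:

```python
import math

def enclave_memory_gib(input_gib: float,
                       peak_multiplier: float = 3.0,   # 2-3x rule of thumb
                       headroom: float = 0.5) -> int:  # 25-50% headroom
    """Whole-GiB memory allocation for a batch job. Enclave memory is
    fixed at launch, so always round up rather than down."""
    return math.ceil(input_gib * peak_multiplier * (1.0 + headroom))

# A 4 GiB raw file at the top of the heuristic range -> 18 GiB allocation
assert enclave_memory_gib(4) == 18
```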
Vendor Lock-in Is Real
Nitro Enclaves are AWS-specific. Intel SGX is Intel-specific. AMD SEV-SNP runs on AMD hardware. Your confidential computing choice ties you to a platform.
Practical approach: Keep your pipeline code portable. Use standard tools (Nextflow, Snakemake, standard bioinformatics packages). The enclave is the execution environment; the pipeline logic should run anywhere. If you need to switch platforms, the migration cost should be the enclave wrapper, not the analysis code.
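One way to enforce that separation is to keep the enclave wrapper thin and inject everything environment-specific. The function names here are hypothetical and the analysis is a stand-in; the point is the boundary between portable pipeline logic and the platform-specific shell around it:

```python
import tempfile
from pathlib import Path

def run_pipeline(input_path: Path, output_path: Path) -> None:
    """Pure analysis logic: reads input, writes output. Knows nothing
    about enclaves, vsock, or KMS, so it runs identically on a laptop,
    in CI, or inside an attested enclave."""
    data = input_path.read_text()
    output_path.write_text(data.upper())  # stand-in for the real analysis

def enclave_main(recv_ciphertext, send_ciphertext, decrypt, encrypt) -> None:
    """Thin wrapper: the only code that changes if you switch platforms.
    Transport and crypto are injected as callables rather than hard-coded."""
    with tempfile.TemporaryDirectory() as tmp:
        inp, out = Path(tmp) / "in", Path(tmp) / "out"
        inp.write_bytes(decrypt(recv_ciphertext()))
        run_pipeline(inp, out)
        send_ciphertext(encrypt(out.read_bytes()))

# Exercised with identity "crypto" and an in-memory transport:
received = []
enclave_main(lambda: b"sample data", received.append, lambda b: b, lambda b: b)
assert received == [b"SAMPLE DATA"]
```

With this shape, migrating off Nitro Enclaves means rewriting `enclave_main`, not the pipeline.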
Putting Your Confidential Computing Architecture Together
Key ownership is the one decision you cannot patch later. Technology choice, threat model scoping, and operational trade-offs can all evolve as your architecture matures. Trust boundaries can be redrawn. But if the wrong party holds the encryption keys from the start, no amount of hardware isolation recovers from it. Get that decision right, and the rest of the architecture follows.