Why Biotech Can't Trust the Cloud (Confidential Computing)


Part 1 of 6 in the “Building Zero-Knowledge Biotech Infrastructure” series

Most biotech organizations can’t use cloud computing for their most valuable data. Not because the cloud isn’t technically capable (AWS could process 50TB of whole-genome sequencing data in days instead of months). Not because teams don’t want to (IT and bioinformatics groups know the operational benefits).

They can’t use it because legal and compliance teams won’t sign off, and they’re right to refuse.

The standard cloud security model (encrypt data at rest, encrypt data in transit, trust the provider’s compliance certifications) doesn’t actually protect data during processing. When your variant calling pipeline runs, or your mass spec analysis executes, or your imaging segmentation processes, the data sits in plaintext memory on infrastructure the cloud provider controls. Their hypervisor can access it. Their operations staff can access it. Anyone who compromises those systems can access it.

For genomics, proteomics, clinical trials, and manufacturing data, that’s not acceptable. Patient genomes are permanent identifiers. Biomarker signatures represent years of R&D investment. Manufacturing processes are core IP. Clinical trial data carries regulatory liability.

This post explains why the current cloud trust model fails for biotech. In Part 2, we’ll show you how confidential computing changes the equation by making data processing cryptographically verifiable and hardware-isolated.

Why Biotech Data Is Different

Biotech data isn’t like other enterprise data. The sensitivity spans every “-omics” domain and imaging modality, but for different reasons. What unites them: you can’t afford a breach, you can’t undo exposure, and standard cloud security doesn’t address the actual threats.

Genomics: The Permanent Identifier

Genomic data is the ultimate personally identifiable information. Unlike passwords or credit cards, a genome can’t be changed or reissued after a breach.

A leaked genome identifies you and your relatives, reveals disease predispositions, and remains compromised permanently. There’s no remediation path. The data exposure is irreversible.
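The re-identification risk is mechanical, not hypothetical: genotypes at even a few dozen SNP sites form a near-unique fingerprint, so a "de-identified" record can be linked back to a named individual in any other dataset that covers the same sites. A minimal sketch of such a linkage attack, using entirely made-up genotype data:

```python
# Illustrative only: linking a leaked, "de-identified" genotype record
# back to a named panel. All genotypes below are made up.
reference_panel = {
    "patient_A": ("AA", "AG", "CC", "GT", "TT", "CG"),
    "patient_B": ("AG", "AA", "CT", "GG", "TT", "CC"),
    "patient_C": ("AA", "AG", "CC", "GG", "AT", "CG"),
}

# A record leaked with no name or metadata attached.
leaked_record = ("AA", "AG", "CC", "GT", "TT", "CG")

def match_fraction(a, b):
    """Fraction of SNP sites at which two genotype vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Linkage: the best-scoring panel entry re-identifies the individual.
best = max(reference_panel,
           key=lambda name: match_fraction(reference_panel[name], leaked_record))
print(best)  # patient_A agrees at every site
```

Real linkage attacks work the same way at scale, which is why stripping names from genomic files does not anonymize them.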

This isn’t theoretical. In October 2023, 23andMe disclosed that hackers accessed genetic data from 6.9 million users. The attackers used credential stuffing to breach individual accounts, then exploited the “DNA Relatives” feature to access data from millions of connected profiles. The stolen data (including ancestry information, birth years, and genetic heritage) appeared on hacking forums within weeks.

Proteomics: The Billion-Dollar Biomarker

Mass spectrometry data from proteomics studies often represents years of biomarker discovery work. A leaked proteomic signature for early cancer detection could:

  • Give competitors a multi-year head start
  • Invalidate pending patents worth hundreds of millions
  • Compromise ongoing clinical validation studies

Unlike genomics, proteomics data changes over time, which makes longitudinal studies even more valuable and their protection more critical.

Imaging: Regulatory and Technical Challenges

Pathology slides, radiology images, and microscopy data present unique challenges:

  • Patient re-identification: AI can identify individuals from medical images even without metadata
  • Diagnostic liability: Leaked images could be used to challenge clinical decisions
  • GxP compliance: Images used in drug submissions must maintain chain of custody

A single high-resolution pathology slide from a clinical trial can be 2-5 GB and contain enough information to identify the patient, the disease state, and potentially compromise the trial’s blinding.

Cell and Gene Therapy: Manufacturing Secrets

Cell therapy combines multiple sensitivity concerns:

  • Patient-derived materials: Autologous therapies use the patient’s own cells
  • Manufacturing processes: Differentiation protocols and expansion conditions are core IP
  • Batch records: Single-cell sequencing of manufacturing intermediates reveals process performance

When your product literally contains patient cells, the line between patient data and manufacturing data disappears entirely.

Clinical Trials: The Regulatory Gauntlet

For research institutions and pharmaceutical companies, clinical data carries the highest stakes:

  • 21 CFR Part 11: FDA requires audit trails and access controls that most cloud deployments can’t prove
  • HIPAA Security Rule: Breach penalties up to $1.5 million per violation category
  • ICH E6 (GCP): Clinical trial data must be attributable, legible, contemporaneous, original, and accurate, with documented evidence
  • EU Clinical Trials Regulation: Requires specific technical measures for data protection
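One concrete control behind several of these requirements (Part 11 audit trails, ICH E6 attributability) is a tamper-evident log: each record's hash covers the previous record's hash, so any retroactive edit breaks the chain. A minimal sketch, not a Part 11 implementation; the record fields are illustrative:

```python
import hashlib
import json

def append_record(chain, user, action):
    """Append a record whose hash covers the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"user": user, "action": action, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify_chain(chain):
    """Recompute every hash; any edit to an earlier record is detected."""
    prev_hash = "0" * 64
    for rec in chain:
        body = {k: rec[k] for k in ("user", "action", "prev_hash")}
        if rec["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != rec["hash"]:
            return False
        prev_hash = rec["hash"]
    return True

log = []
append_record(log, "analyst_1", "locked case report form 0042")
append_record(log, "monitor_2", "reviewed adverse event entry")
assert verify_chain(log)

log[0]["action"] = "unlocked case report form 0042"  # retroactive edit
assert not verify_chain(log)                         # the chain exposes it
```

The point for the cloud discussion: this only proves integrity after the fact. It does nothing to stop someone with memory access from reading the data while it is processed.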

The Regulatory Landscape Is Getting More Complex

  • The EU AI Act includes provisions for high-risk AI systems in healthcare
  • China’s biosecurity laws restrict the export of human genetic data
  • State-level genetic privacy laws are multiplying faster than compliance teams can track
  • GDPR Article 9 treats genetic data as a “special category” requiring explicit consent and additional safeguards
  • Brazil’s LGPD and other emerging privacy regimes add more jurisdictional complexity

The Bottom Line: Every Domain Has Exposure

| Domain | Data Type | Primary Risk | Regulatory Pressure |
| --- | --- | --- | --- |
| Genomics | WGS, WES, RNA-seq | Patient re-identification, permanent PII | HIPAA, GDPR, state genetic laws |
| Proteomics | Mass spec raw files, peptide IDs | IP theft, biomarker leakage | Trade secrets, patent risk |
| Metabolomics | LC-MS/MS profiles | Diagnostic IP, patient health status | HIPAA, IVD regulations |
| Imaging | Pathology, radiology, microscopy | Patient ID from images, diagnostic liability | HIPAA, GxP for submissions |
| Cell Therapy | Single-cell seq, batch records | Manufacturing IP, patient-derived data | 21 CFR 1271, HIPAA |
| Clinical Trials | EDC data, adverse events | Trial integrity, patient safety | 21 CFR 11, ICH E6, GDPR |
| Drug Discovery | Compound libraries, screening data | Competitive intelligence loss | Trade secrets |
| Biologics | Sequence data, expression systems | Manufacturing process IP | 21 CFR 600s, biosimilar competition |

Current Cloud Security Solutions for Biotech (And Why They Fail)

Organizations facing these challenges typically choose from a menu of imperfect options:

Option 1: On-Premises Everything

The traditional approach: build your own data center, hire your own system administrators, maintain your own hardware refresh cycles.

This can work, but the economics are brutal:

  • A single high-memory node for genome assembly costs $50,000+ and sits idle most of the time between analysis runs
  • Recruiting bioinformatics talent is hard; finding people who can also manage HPC infrastructure is harder
  • Hardware procurement cycles mean you’re perpetually behind on the latest capabilities
  • Disaster recovery requires duplicating your entire infrastructure

For well-funded pharmaceutical companies, this works. For academic labs, startups, and smaller biotechs, the capital requirements are prohibitive.
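The arithmetic behind "brutal economics" is simple: a capital purchase costs the same whether it runs or sits idle, while cloud billing tracks utilization. A back-of-the-envelope sketch with made-up but plausible numbers (plug in your own quotes; none of these prices come from a real vendor):

```python
# Hypothetical numbers for illustration only.
node_price = 50_000          # on-prem high-memory node, USD
amortization_years = 3
runs_per_year = 24           # analyses actually executed
hours_per_run = 12
cloud_rate = 8.0             # USD/hour for a comparable cloud instance

onprem_per_year = node_price / amortization_years
cloud_per_year = runs_per_year * hours_per_run * cloud_rate

print(f"on-prem: ${onprem_per_year:,.0f}/year regardless of utilization")
print(f"cloud:   ${cloud_per_year:,.0f}/year at this utilization")
# 24 runs x 12 h = 288 busy hours out of 8,760 in a year (~3% utilization):
# at low utilization, the idle capital cost dominates.
```

The crossover flips for a facility that keeps the node busy most of the year, which is exactly why large pharma can justify on-premises while smaller labs cannot.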

Option 2: VPNs and Private Cloud

Connect your on-premises network to a cloud VPC via VPN. Run your workloads on dedicated instances. Treat the cloud as an extension of your data center.

This doesn’t solve the trust problem. You’re still trusting the cloud provider completely:

  • A compromised hypervisor or malicious insider can access memory on your “private” instances
  • Your encryption keys are managed by the same provider who manages your compute. They can access both
  • “Private” subnets still run on shared physical infrastructure
  • You have no visibility into who actually accessed your data

The Business Associate Agreements (BAAs) that cloud providers offer for HIPAA workloads are legal protection, not technical protection. They define who pays the fine when something goes wrong. They don’t prevent things from going wrong.

Option 3: Encrypt Everything

The most common approach: encrypt data at rest in S3, encrypt data in transit with TLS, and call it a day.

This protects against a narrow set of threats: someone stealing hard drives from the data center, or intercepting network traffic. It does nothing for the actual processing phase.

When your genomics pipeline runs, the data must be decrypted. It exists in plaintext in memory on a virtual machine running on shared infrastructure. Any of the following can access it:

  • Cloud provider employees with privileged access
  • Attackers who compromise the hypervisor
  • Nation-state actors with lawful intercept capabilities
  • Anyone who can execute a cold boot attack or memory dump

Encryption at rest and in transit protects data except when you’re actually using it. That exception is where all the interesting attacks happen.
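The gap is easy to see in code. In the sketch below, a toy stream cipher (SHA-256 in counter mode, standing in for real AES-GCM; do not use it in production) protects the stored file, but the moment the pipeline needs to compute on a record, a fully decrypted copy exists in process memory:

```python
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream; a stand-in for a real cipher."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    ks = _keystream(key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, ks))

key = secrets.token_bytes(32)
vcf_line = b"chr17\t43044295\t.\tG\tA\t99\tPASS"  # a fake variant call

stored = encrypt(key, vcf_line)   # "encrypted at rest": ciphertext on disk
assert vcf_line not in stored     # the stored blob reveals nothing directly

record = decrypt(key, stored)     # ...but to *process* it, we must decrypt
position = record.split(b"\t")[1]
# `record` is now plaintext in this process's memory -- readable by anything
# that can read that memory: the hypervisor, a memory dump, a privileged operator.
```

No choice of cipher fixes this; the vulnerability is in where the plaintext lives, not in how the ciphertext was made.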

Encryption Comparison: Where Your Data Is Vulnerable

| Security Approach | Data at Rest | Data in Transit | Data in Use (Processing) | Protection Level |
| --- | --- | --- | --- | --- |
| Standard Cloud | ✅ Encrypted (S3/EBS) | ✅ Encrypted (TLS) | ❌ Plaintext in memory | LOW (vulnerable during processing) |
| Private VPC | ✅ Encrypted | ✅ Encrypted | ❌ Plaintext in memory | LOW (provider still has access) |
| Customer-Managed Keys | ✅ Encrypted | ✅ Encrypted | ❌ Plaintext in memory | MEDIUM (key and compute separated, but still vulnerable) |
| Confidential Computing (TEE) | ✅ Encrypted | ✅ Encrypted | ✅ Encrypted in enclave | HIGH (protected at all stages) |

Option 4: Compliance Certifications

“But AWS is SOC 2 Type II certified! They’re HIPAA eligible! They have FedRAMP authorization!”

True. These certifications matter. They demonstrate that the cloud provider has security controls, follows processes, and submits to audits.

What they can’t do:

  • Prevent a malicious insider from accessing customer data
  • Stop a sophisticated attacker who compromises privileged credentials
  • Give you visibility into who actually accessed your data

Certifications are table stakes for running regulated workloads. They’re not a guarantee of security.

Why Cloud Security Trust Models Don’t Work for Biotech

Cloud marketing materials talk about security features. They don’t mention the trust model underneath.

When you use cloud computing, you’re trusting that the provider’s security team configured things correctly. You’re trusting that their employees won’t abuse privileged access. You’re trusting that contractors were vetted appropriately. You’re trusting that incident response will detect and disclose breaches. You’re trusting that key management infrastructure won’t be compromised. You’re trusting that legal teams won’t comply with overbroad government requests. You’re trusting that executives won’t change policies later.

That’s enormous trust to place in an organization you’ve never met, whose financial incentives don’t always align with your security requirements.

Who can theoretically access your “encrypted” data on a major cloud provider?

  • Cloud operations engineers with hypervisor access
  • Security team members investigating incidents on shared infrastructure
  • On-call personnel responding to alerts
  • Key management administrators who operate the HSMs backing your “customer-managed” keys
  • Government agencies with legal process (valid or otherwise)
  • Auditors reviewing security controls
  • Third-party contractors on specialized projects
  • Anyone who compromises credentials for any of the above

This isn’t about cloud providers being bad actors. They employ talented security professionals and invest billions in infrastructure protection. The problem is architectural: centralized computing requires centralized trust, and that centralized trust becomes a single point of failure. No amount of perimeter security can fully address it.

The Trust Hierarchy in Traditional Cloud Computing

┌─────────────────────────────────────────────────────────┐
│                  YOUR SENSITIVE DATA                    │
│         (genomics, proteomics, clinical trials)         │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │   Hypervisor Layer    │ ◄─── Cloud ops engineers
         │  (Full memory access) │ ◄─── Security team
         └───────────┬───────────┘ ◄─── On-call staff
                     │             ◄─── Contractors
                     ▼             ◄─── Government orders
         ┌───────────────────────┐
         │  Key Management (HSM) │ ◄─── Key admins
         │ (Decryption possible) │ ◄─── Auditors
         └───────────────────────┘

              SINGLE POINT OF FAILURE
         Every layer can access your data

Traditional cloud architecture creates a trust hierarchy where your sensitive biotech data sits at the top, but every layer below can access it. The hypervisor layer has full memory access during processing. Cloud operations engineers, security teams, on-call staff, and contractors all have potential access paths. Key management systems that handle your encryption keys are operated by the same administrators. Government orders can compel access at any layer.

This creates a single point of failure: the cloud provider’s access controls and employee vetting. Your data security depends entirely on trusting their processes.

With Confidential Computing (TEE):

┌─────────────────────────────────────────────────────────┐
│         ENCRYPTED ENCLAVE (Hardware-Isolated)           │
│         Your data + code in protected memory            │
│    ✓ Hypervisor CANNOT access                           │
│    ✓ Cloud provider CANNOT access                       │
│    ✓ Cryptographic attestation proves integrity         │
└─────────────────────────────────────────────────────────┘

              ZERO-TRUST ARCHITECTURE
         Hardware enforces isolation, not policies

Confidential computing eliminates the trust hierarchy. Your data and processing code run inside a hardware-isolated encrypted enclave. The hypervisor cannot access it. The cloud provider cannot access it. Cryptographic attestation proves the integrity of the execution environment. The hardware enforces isolation, not policies or promises.

The Cloud Provider Insider Threat in Biotech

In enterprise security discussions, “insider threat” usually means employees at your organization going rogue. But for cloud computing, the more relevant insider threat is employees at the cloud provider.

According to the 2024 Verizon Data Breach Investigations Report, insider threats and privilege misuse account for a significant portion of breaches. These aren’t typically malicious. They’re usually mistakes, like misconfigured permissions or misdirected emails. But they happen.

Now multiply that by the number of people with privileged access to cloud infrastructure. Major cloud providers employ thousands of operations staff with varying levels of access. Each one represents potential risk.

The Edward Snowden disclosures revealed that intelligence agencies specifically targeted cloud provider infrastructure for mass surveillance. The PRISM program collected data directly from the servers of major tech companies. Whether through legal compulsion or technical compromise, government access to cloud data is a documented reality.

For biotech specifically, this matters because:

  • Clinical trial data could be valuable for investment decisions or competitor intelligence
  • Drug discovery research (including compound libraries, screening data, and hit-to-lead optimization) represents huge R&D investments worth stealing
  • Biomarker signatures from proteomics or metabolomics studies could invalidate years of discovery work
  • Manufacturing processes for biologics and cell therapies are core IP that competitors would pay millions to access
  • Patient data can be used for blackmail, identity theft, or discrimination
  • Genetic data has implications for individuals and their relatives for generations
  • Proprietary algorithms for image analysis or variant interpretation represent significant competitive advantages

The “we’re not important enough to target” argument misses the point. Breaches don’t discriminate. You don’t need to be the intended target to have your data exposed when infrastructure is compromised.

What Secure Cloud Computing for Biotech Actually Requires

Given these challenges, what would a truly secure cloud model look like for biotech?

First: process data without anyone seeing it. Not “encrypted at rest and in transit” but encrypted during processing too. The compute infrastructure should be mathematically incapable of accessing plaintext data, even if every employee at the provider wanted to. This applies whether you’re running variant calling on whole-genome sequencing, peptide identification on mass spec raw files, image segmentation on pathology slides, or statistical analysis on clinical trial endpoints.

Second: prove mathematically that no one saw it. Trust but verify isn’t enough. You need cryptographic proof that the code processing your data is exactly the code you approved, running in an environment that prevents data exfiltration. Not a compliance certification. Not a contractual promise. Mathematical certainty. For regulated environments, this proof becomes audit evidence. For IP-sensitive work, it becomes competitive protection.
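Attestation is the mechanism behind that proof. Conceptually, the hardware measures (hashes) the exact code loaded into the protected environment and signs that measurement with a key rooted in the silicon; the client compares the measurement against the hash of the code it reviewed before releasing any data. A much-simplified sketch of the flow (real attestation documents, such as AWS Nitro's, are signed CBOR/COSE structures chained to a hardware certificate; the HMAC below merely stands in for that signature):

```python
import hashlib
import hmac

# The exact pipeline code the data owner reviewed and approved.
approved_code = b"def call_variants(reads): ...  # reviewed revision"
expected_measurement = hashlib.sha384(approved_code).hexdigest()

# Stand-in for the hardware vendor's signing key / certificate chain.
HARDWARE_ROOT_KEY = b"stand-in-for-hardware-root-of-trust"

def issue_attestation(loaded_code: bytes) -> dict:
    """What the TEE conceptually does: measure the loaded code, sign it."""
    measurement = hashlib.sha384(loaded_code).hexdigest()
    signature = hmac.new(HARDWARE_ROOT_KEY, measurement.encode(),
                         hashlib.sha256).hexdigest()
    return {"measurement": measurement, "signature": signature}

def verify_attestation(doc: dict, expected: str) -> bool:
    """What the client does before releasing any sensitive data."""
    good_sig = hmac.new(HARDWARE_ROOT_KEY, doc["measurement"].encode(),
                        hashlib.sha256).hexdigest()
    return (hmac.compare_digest(doc["signature"], good_sig)
            and hmac.compare_digest(doc["measurement"], expected))

# The environment runs the approved code: verification passes, data flows.
assert verify_attestation(issue_attestation(approved_code), expected_measurement)

# Anyone swaps in modified code: the measurement changes, verification fails.
tampered = approved_code + b"\nexfiltrate(reads)"
assert not verify_attestation(issue_attestation(tampered), expected_measurement)
```

The decisive property is that the measurement is produced by hardware, not by the operator: a provider cannot claim to run your approved pipeline while actually running something else.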

Third: scale like cloud, secure like on-prem.

The point of cloud computing is elasticity: spin up 1,000 cores for a day, pay for what you use, then spin them down. Security measures shouldn’t require giving that up. If your “secure” solution requires dedicated hardware with long procurement cycles, you’ve just reinvented on-premises computing with extra steps and higher costs. Proteomics core facilities need burst capacity for peak instrument output. Cell therapy manufacturers need to scale with production batches. Clinical trials need to process data from multiple sites simultaneously. Modern DevOps practices enable this elasticity, but only if the security model supports it.

Fourth: maintain regulatory compliance. Whatever solution exists must satisfy HIPAA, GDPR, 21 CFR Part 11, SOC 2, GxP, ICH E6. Not by adding more legal agreements, but by providing technical controls that exceed regulatory requirements. The goal isn’t just passing an audit. It’s having technical measures so strong that the audit becomes a formality.

Fifth: work with existing tools.

Scientists have workflows they know and trust:

  • Genomics: Nextflow/Snakemake pipelines, BWA, GATK, STAR
  • Proteomics: MaxQuant, Proteome Discoverer, custom R/Python scripts
  • Imaging: CellProfiler, QuPath, deep learning models
  • Statistics: R, SAS, specialized clinical trial analysis packages

A secure platform that requires rewriting everything in a custom framework is dead on arrival. The tools must come to the data, not the other way around.

Confidential Computing: The Technology That Changes Biotech Cloud Security

These requirements sound impossible. For most of computing history, they were. But a category of technology called confidential computing has emerged in the past few years that makes them achievable.

The basic idea: what if the hardware itself enforced security boundaries? What if there were a region of memory that even the operating system (even the hypervisor) couldn’t access? What if code running in that region could prove cryptographically that it hadn’t been tampered with?

Technologies like Intel SGX, AMD SEV, and AWS Nitro Enclaves provide exactly this. They’re called Trusted Execution Environments (TEEs), and they fundamentally change what’s possible for secure cloud computing.

In our next post, we’ll dive deep into how AWS Nitro Enclaves work, why we chose them over alternatives, and how we built a biotech data processing platform that uses cryptographic attestation to prove your data stays private (even from us).