Data Protection Governance for Biotechnology and Genomics Under India’s DPDP Regime

Introduction: Why Genomic Data Demands a Higher Legal Standard
India is rapidly positioning itself as a global centre for biotechnology, genomics, precision medicine and bioinformatics. Advances in next-generation sequencing (NGS), population genomics, pharmacogenomics, AI-driven diagnostics and personalised therapeutics are reshaping healthcare, drug discovery and disease prevention.
Unlike conventional clinical data, genetic and genomic data is permanent, predictive and relational. A single genomic sequence can reveal not only present medical conditions but also future disease risk, ancestry, behavioural traits and information about biological relatives, many of whom have not consented to any data processing. This calls for heightened vigilance in DPDP compliance.
The enactment of the Digital Personal Data Protection Act, 2023 (“DPDP Act”), together with the Digital Personal Data Protection Rules, 2025 (“DPDP Rules”), marks a decisive shift in how Indian law treats such data. While the statute does not create a separate category for “genetic data”, its risk-based architecture makes it clear that certain forms of data processing, particularly those involving irreversible harm, will attract heightened scrutiny.
For biotech companies, genomics platforms, diagnostics providers, population-scale research initiatives and AI health startups, the DPDP framework presents a profound governance challenge:
- How to enable data-driven innovation without violating consent, purpose limitation and proportionality; and
- How to manage long-term, cross-border and secondary uses of genetic data within a consent-centric legal regime.
This article examines how India’s data protection law applies to biotechnology and genomics, focusing on:
- The legal character of genetic and genomic data
- Consent, voluntariness and informational asymmetry
- Anonymisation myths in genomics
- Secondary use, AI training and function creep
- Cross-border data flows and global research
- Enforcement risk, penalties and mitigation strategies
Applicability of the DPDP Act to Biotechnology and Genomics
A. Entities Within Scope
The DPDP Act applies to any entity processing digital personal data, including:
- Genomics and sequencing companies
- Precision medicine and pharmacogenomics platforms
- Diagnostic laboratories and pathology chains
- Population genomics initiatives
- AI-based bioinformatics and health analytics firms
- Direct-to-consumer (DTC) genetic testing services
- Academic and institutional research labs
- Biobanks and biological sample repositories (where data is digitised)
Indian entities, and foreign entities that process genomic data in connection with offering goods or services to individuals in India, fall within the scope of the Act, whether the processing is for diagnostics, research or commercial purposes.
B. Data Fiduciaries, Joint Control and Ecosystem Complexity
In genomics, identifying the data fiduciary is often non-trivial. Depending on the model:
- A genomics platform may be the fiduciary if it determines testing purpose and analysis methods.
- A hospital or research institution may act as fiduciary in investigator-led studies.
- Multiple entities may jointly determine purposes, creating shared fiduciary exposure.
Sequencing vendors, cloud providers, AI analytics firms and CRO-like service providers typically act as data processors, but the data fiduciary remains primarily responsible for compliance, including for processing carried out on its behalf.
Large genomics initiatives, especially those involving population-scale data or AI-driven inference, may be designated as Significant Data Fiduciaries (SDFs) due to:
- Volume and sensitivity of data
- Risk of discrimination or long-term harm
- Use of new and opaque technologies
Genetic and Genomic Data as Personal Data: Beyond Formal Definitions
A. Why Genetic Data Is Uniquely Risky
Genetic data differs from most personal data in three critical ways:
- Permanence: Unlike passwords or identifiers, genetic information cannot be changed once compromised.
- Predictive Power: It reveals future health risks and traits, not merely past or present facts.
- Relational Impact: One individual’s genome inherently discloses information about relatives who may never have consented.
These attributes mean that misuse or breach of genetic data can cause irreversible harm, a factor likely to influence enforcement severity under the DPDP Act.
B. Absence of “Sensitive Data” Category Is Not a Safe Harbour
While the DPDP Act does not expressly define “sensitive personal data”, this does not imply equal treatment across data types. The Act’s penalty and enforcement framework is harm-based, allowing authorities to consider:
- Nature of the data
- Likelihood of discrimination or stigma
- Long-term consequences
Genetic data will almost certainly be treated as high-risk personal data in practice.
Consent in Genomics: Legal Fiction or Meaningful Choice?
A. Consent Under the DPDP Act
Under the DPDP Act, consent must be free, specific, informed, unconditional and unambiguous, and capable of being withdrawn. In the context of genomics, however, each of these elements is legally fragile, given the complexity of genetic data, its long-term and evolving uses, and the practical difficulties individuals face in fully understanding, limiting, or withdrawing consent once such data has been shared.
B. Informational Asymmetry and Comprehension
Genomic testing involves complex scientific concepts such as polygenic risk scores, variant interpretation and probabilistic outcomes. For most individuals, full comprehension is unrealistic, and future uses of the data are unknowable at the time of consent. Consent obtained through dense technical disclosures may be challenged as not truly informed.
C. Voluntariness and Power Imbalance
Consent may be legally vulnerable where:
- Testing is tied to access to treatment
- Employment or insurance benefits are indirectly affected
- Participation is framed as socially desirable or altruistic
The DPDP framework is likely to scrutinise such contexts closely.
Withdrawal of Consent: A Practical Impossibility?
The DPDP Act recognises the right to withdraw consent. In genomics, however:
- Data may already have been analysed
- Models may have been trained on aggregated datasets
- Results may have been shared across borders
While withdrawal does not affect the lawfulness of processing completed before it, and may not require that such processing be undone, failure to design realistic withdrawal frameworks exposes entities to compliance risk.
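One way to make a withdrawal framework concrete is to keep a register of processing activities per individual and, on withdrawal, separate what must stop from what was already completed. The Python sketch below is illustrative only: the `ProcessingActivity` structure, the activity names and the processor names are hypothetical, and which activities count as "completed" is a legal judgment the code does not make.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ProcessingActivity:
    """One use of a person's genomic data (hypothetical structure)."""
    name: str
    processor: str
    completed: bool  # True if the processing finished before withdrawal


def plan_withdrawal(activities: list[ProcessingActivity]) -> dict:
    """Split activities into those to stop and those already completed."""
    to_stop = [a for a in activities if not a.completed]
    return {
        "withdrawn_at": datetime.now(timezone.utc).isoformat(),
        "stop": [a.name for a in to_stop],
        # Each processor handling an ongoing activity must be told to cease.
        "notify_processors": sorted({a.processor for a in to_stop}),
        "completed_before_withdrawal": [a.name for a in activities if a.completed],
    }


activities = [
    ProcessingActivity("diagnostic_report", "in_house_lab", completed=True),
    ProcessingActivity("research_cohort_sharing", "overseas_consortium", completed=False),
    ProcessingActivity("next_model_training_run", "cloud_ml_vendor", completed=False),
]
plan = plan_withdrawal(activities)
print(plan["stop"])
print(plan["notify_processors"])
```

The design choice here is that withdrawal produces an auditable plan rather than a silent flag, so that processors and downstream recipients can be instructed and the outcome evidenced.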
The Anonymisation Myth in Genomic Data
A. Why True Anonymisation Is Rare
Genomic data is inherently identifiable: even partial genetic sequences can be sufficient to re-identify individuals. The risk of re-identification is amplified when such data is cross-linked with publicly available datasets, and ongoing advances in computing and data analytics continue to erode the effectiveness of traditional anonymisation techniques. Consequently, claims that genomic data can be “fully anonymised” are increasingly untenable.
B. Pseudonymisation Is Not Exemption
Coding, tokenisation or removal of direct identifiers does not take data outside the DPDP Act if re-identification is reasonably possible. Many genomics workflows depend on re-linkability, making full anonymisation incompatible with scientific objectives.
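The distinction can be illustrated with a minimal Python sketch of keyed tokenisation, a common pseudonymisation step. The key, field names and sample record are hypothetical. The point of the example is that whoever holds the key can recompute the token and re-link the record, so the coded record is pseudonymised rather than anonymised; the genetic content itself also remains potentially identifying.

```python
import hashlib
import hmac

# Hypothetical secret held by the data fiduciary; in practice it would sit in a key vault.
LINKAGE_KEY = b"example-key-not-for-production"


def pseudonymise(patient_id: str, key: bytes = LINKAGE_KEY) -> str:
    """Replace a direct identifier with a keyed token (HMAC-SHA256)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_direct_identifiers(record: dict) -> dict:
    """Return a coded copy of the record: name and ID removed, token added."""
    coded = {k: v for k, v in record.items() if k not in {"name", "patient_id"}}
    coded["token"] = pseudonymise(record["patient_id"])
    return coded


record = {"patient_id": "P-001", "name": "A. Sample", "variant": "hypothetical-variant"}
coded = strip_direct_identifiers(record)

# Re-linkage: whoever holds the key can recompute the token for a known ID.
relinked = hmac.compare_digest(coded["token"], pseudonymise("P-001"))
print(relinked)  # True
```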
Secondary Use, Function Creep and AI Training
A. From Diagnostics to Data Assets
Genomic data collected initially for diagnostic testing is frequently repurposed as a valuable data asset for research and discovery, AI model training, drug development partnerships, and commercial analytics. Under the DPDP Act, such secondary use requires clear legal justification and, in many cases, fresh and explicit consent.
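In engineering terms, purpose limitation can be enforced as a check that runs before any reuse of the data. The following Python sketch assumes a hypothetical `ConsentRecord` structure and illustrative purpose labels; it is not a statement of what the Act requires in any given case, only of how a purpose-specific gate might look.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Purposes a data principal has agreed to (hypothetical structure)."""
    principal_id: str
    purposes: set = field(default_factory=set)
    withdrawn: bool = False


def may_process(consent: ConsentRecord, purpose: str) -> bool:
    """Allow processing only for a purpose that was specifically consented to."""
    return not consent.withdrawn and purpose in consent.purposes


consent = ConsentRecord("DP-42", purposes={"diagnostic_testing"})

print(may_process(consent, "diagnostic_testing"))  # True
print(may_process(consent, "ai_model_training"))   # False

# Fresh consent for the new purpose is recorded explicitly, never inferred.
consent.purposes.add("ai_model_training")
print(may_process(consent, "ai_model_training"))   # True
```

Keeping purposes as explicit labels means that a new use, such as model training, fails the check by default until consent for it has actually been captured.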
B. AI and Machine Learning Risks
AI models trained on genomic data raise novel questions:
- Does model training constitute new processing?
- Can consent cover unknown future models?
- How is accountability allocated where outputs affect clinical decisions?
Assumptions that AI training automatically qualifies as “research” are legally unsafe.
Cross-Border Data Transfers and Global Genomics
A. Structural Dependence on Global Infrastructure
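Genomics workflows are structurally dependent on global infrastructure, often relying on overseas sequencing platforms, international research consortia, and global cloud services. Under Section 16 of the DPDP Act, cross-border transfers are not limited to a pre-approved list of countries; instead, the Central Government may by notification restrict transfers to specified countries or territories, and any other law that imposes stricter conditions on transfer continues to apply.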
B. Strategic Compliance Challenges
Genomics companies must now:
- Map all international data flows
- Monitor evolving notifications
- Design modular data architectures
- Anticipate localisation or access-restriction requirements
Failure to do so may disrupt ongoing research collaborations.
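A data-flow map of this kind can be kept in a simple, machine-checkable form. The Python sketch below is a minimal illustration: the `DataFlow` fields, the country codes and the status table are placeholders, and the real statuses would have to be drawn from the government notifications in force at the time. Destinations with no recorded status default to manual review rather than being silently allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    """One international transfer in the data-flow map (hypothetical fields)."""
    dataset: str
    recipient: str
    destination: str  # country code, illustrative only


# Placeholder statuses; real values must come from the notifications in force.
TRANSFER_STATUS = {"AA": "permitted", "BB": "restricted"}


def assess(flow: DataFlow, status: dict = TRANSFER_STATUS) -> str:
    """Return the flow's status, defaulting to manual review when unknown."""
    return status.get(flow.destination, "review")


flows = [
    DataFlow("raw_reads", "sequencing_vendor", "AA"),
    DataFlow("variant_calls", "research_consortium", "BB"),
    DataFlow("backups", "cloud_provider", "CC"),
]
for flow in flows:
    print(flow.dataset, flow.destination, assess(flow))
```

Because the status table is separate from the flow map, a new notification changes one entry and every affected flow is re-assessed on the next run.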
Family, Group and Community Harm
Genomic data implicates not only individuals but also families, ethnic or community groups, and indigenous or vulnerable populations. Misuse of such data can result in genetic discrimination, stigmatisation, and group-based profiling, thereby amplifying regulatory concern and increasing enforcement exposure.
Data Breaches: Irreversible Consequences
A. Mandatory Breach Notification
Under the DPDP Act and its Rules, breaches involving genomic data must be notified to the Data Protection Board of India as well as to affected individuals. The Act does not set a materiality threshold for notification, so even limited incidents involving genetic data should be treated as notifiable.
B. Long-Term Impact
Unlike credit card breaches, genomic breaches cannot be remedied by replacement. Loss of genetic data permanently compromises privacy, increasing the likelihood of:
- High penalties
- Litigation
- Loss of public trust
Penalties and Enforcement Exposure
The DPDP Act permits penalties of up to INR 250 crore per contravention, taking into account factors such as the nature of the data, the scale of processing, the irreversibility of harm, and the mitigation measures adopted. Given the severity and permanence of potential harm, genomics companies face disproportionate regulatory exposure even where the volume of data processed may be limited.
Compliance Roadmap for Biotech and Genomics Companies
1. Data Classification and Risk Mapping: Identify all genomic, phenotypic and derived datasets.
2. Consent Architecture Redesign: Move beyond generic consents to layered, purpose-specific frameworks.
3. Secondary Use Governance: Implement approval mechanisms for research, AI and commercial reuse.
4. Cross-Border Strategy: Design jurisdiction-aware data storage and access controls.
5. Governance and Ethics Integration: Align privacy compliance with ethics committees and scientific review boards.
Conclusion: Governing the Genome Responsibly
The DPDP Act and Rules mark a turning point for biotechnology and genomics in India. They do not prohibit innovation, but they reject the assumption that scientific progress justifies unlimited data exploitation.
For genomics companies, compliance is not merely a legal obligation but a precondition for legitimacy, public trust and sustainable innovation. Those who embed privacy-by-design, confront anonymisation myths and govern secondary use transparently will be best positioned to lead India’s precision medicine future.