The Next Semiconductor

Liz Maida
Tuesday, June 16, 2026, 1:00 PM

Biomanufacturing data is a critical strategic asset, and the U.S. is failing to use it.

A bioreactor used to ferment ethanol from corncob waste being loaded with yeast. (USDA, https://tinyurl.com/uxbd555m; CC BY 2.0, https://creativecommons.org/licenses/by/2.0/deed.en).

A quality associate analyzes hundreds of environmental monitoring alerts from a facility monitoring system, reviewing humidity, differential pressure, and temperature data. Each unresolved alert requires investigation. Most are due to routine practices such as opening an incubator or a room door. But there is no systematic way to know which alerts occurred during live manufacturing and could impact the final product.
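As a rough illustration of what a systematic check could look like, the sketch below cross-references alerts against batch execution windows for the same room. It assumes both sources are already digitized and share room identifiers, which is often not the case in practice; the Alert and BatchWindow record types, the room names, and the batch ID are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Alert:
    room: str
    parameter: str
    timestamp: datetime


@dataclass(frozen=True)
class BatchWindow:
    batch_id: str
    room: str
    start: datetime
    end: datetime


def alerts_during_production(alerts, windows):
    """Return (alert, batch_id) pairs for alerts raised in a room while a batch ran there."""
    hits = []
    for alert in alerts:
        for window in windows:
            same_room = alert.room == window.room
            in_window = window.start <= alert.timestamp <= window.end
            if same_room and in_window:
                hits.append((alert, window.batch_id))
    return hits


# Hypothetical data: one batch ran in Suite-1 from 08:00 to 18:00.
alerts = [
    Alert("Suite-1", "humidity", datetime(2026, 3, 2, 9, 15)),
    Alert("Suite-1", "temperature", datetime(2026, 3, 2, 22, 40)),
    Alert("Suite-2", "differential_pressure", datetime(2026, 3, 2, 10, 5)),
]
windows = [
    BatchWindow("B-001", "Suite-1", datetime(2026, 3, 2, 8, 0), datetime(2026, 3, 2, 18, 0)),
]

flagged = alerts_during_production(alerts, windows)
for alert, batch_id in flagged:
    print(batch_id, alert.room, alert.parameter, alert.timestamp.isoformat())
```

Only the humidity alert is flagged here, because it is the only one raised in the batch's room during the batch's run. The hard part in a real facility is not this join but obtaining trustworthy, machine-readable batch windows in the first place.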

A single 2,000-liter bioreactor can generate over a million data points, tracking critical process parameters such as temperature, pH, and dissolved oxygen. When combined with data such as metabolite concentration, cell density, and product titer, biomanufacturers can predict and optimize yield and proactively identify manufacturing issues. If this data is aggregated across manufacturers, it becomes a strategic asset that reduces U.S. dependence on foreign manufacturers and accelerates production for biodefense. Instead of aggregating and analyzing this data, the U.S. is allowing it to lie stagnant in disparate systems. In contrast, China is well positioned to exploit this data.

Manufacturing Biology

The pharmaceutical industry is experiencing a fundamental shift from chemical compounds to biologics and cell and gene therapies. Scientists are using their knowledge of biology and chemistry to develop life-saving therapies and treatments that significantly improve patients’ quality of life.

The attempt to replicate biological processes has made manufacturing increasingly complex. Production depends on living cells, and biological material is an inherently variable input. Anomalous or out-of-specification results lead to weeks of investigation that involve manually compiling data from multiple sources. For example, gene therapy manufacturers often rely on adeno-associated virus (AAV) vectors to deliver genetic payloads to cells.

In one investigation that we supported, four sequential AAV batches failed purity tests, indicating that the therapeutic dose might be diluted or trigger an immune response. The root cause investigation required reviewing data from the bioreactors, upstream cell culture records, downstream purification logs, chromatography results, and the material certificates of analysis (CoAs). A different system stored each data source, and most of the records and CoAs were on paper. Ultimately, an operator recalled that a component in the cell culture medium had been changed, but that information was only noted in a handwritten comment in the batch record. The learnings from this investigation exemplify the institutional knowledge that should be systematically captured and shared.

These real-world learnings about the optimal conditions for cell growth and how to quickly identify potential issues are invaluable. They inform late-stage process optimization such as clone selection, medium development, and ideal bioreactor conditions. They also establish the foundation for accelerating the development and creation of new modality platforms. A platform allows manufacturing to rapidly scale up because the ideal conditions and controls are already known.

Data is critical for the safe and cost-effective manufacturing of these therapies. If this information is so valuable, why is the data still so fragmented?

Fragmentation Is a Bug, Not a Feature

Three related obstacles limit the use of this data: manufacturing data resides in disparate systems; FDA software validation requirements complicate the adoption of new data systems; and companies are hesitant to share data for fear of eroding their competitive advantage.

Uncoordinated systems store the manufacturing data, and the actual production data is often handwritten. While modern applications integrate directly with other software via application programming interfaces (APIs), legacy software is not designed to make its data available programmatically. Even if all of the data is digitized and accessible, each system has its own data model and schema. Data definitions are not aligned and may not have a direct mapping—one provider’s definition of a “room” might map to a subset of “location” in another provider. The same software may experience schema drift as new versions are released and installations are customized to a customer’s needs. Even with SAP, a manufacturer’s facilities may each run a different version of SAP with nonharmonized table and field names.

These small naming discrepancies may seem inconsequential, but they mean that data cannot be automatically combined. Cleaning and normalizing this data is a painstaking activity. It involves tasks such as reconciling European and U.S. date formats and handling the difference between zero (the measured value is “0”) and null (the data is not in the database). Using the data typically requires human expertise. Institutional knowledge, such as equipment or personnel changes, is not tracked in any system. Because the data resides in so many places, is complex to clean, and must be thoughtfully curated to answer any given question, it is seldom used.
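The two cleaning tasks named above can be made concrete with a minimal sketch. It assumes each record carries a tag for the site that produced it; the SITE_DATE_FORMATS table and the function names are hypothetical, and a real pipeline would need per-system rules rather than a two-entry lookup.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical per-site conventions; real systems would record this per source.
SITE_DATE_FORMATS = {"US": "%m/%d/%Y", "EU": "%d/%m/%Y"}


def parse_site_date(raw: str, site: str) -> date:
    """Parse a date string using the convention of the site that produced it."""
    return datetime.strptime(raw, SITE_DATE_FORMATS[site]).date()


def parse_measurement(raw: Optional[str]) -> Optional[float]:
    """Keep a measured zero as 0.0, but map missing or blank entries to None."""
    if raw is None or raw.strip() == "":
        return None
    return float(raw)


def mean_of_recorded(values):
    """Average only the values that were actually recorded."""
    recorded = [v for v in values if v is not None]
    return sum(recorded) / len(recorded) if recorded else None


# The same string means two different days depending on where it was written.
us_day = parse_site_date("03/04/2026", "US")  # March 4, 2026
eu_day = parse_site_date("03/04/2026", "EU")  # April 3, 2026

# A blank cell is not a zero: treating it as one would drag the average down.
readings = [parse_measurement(r) for r in ["0", "", "4.0"]]
average = mean_of_recorded(readings)  # averages 0.0 and 4.0 only
```

If the blank reading were silently coerced to zero, the average would fall from 2.0 to roughly 1.33, which is the kind of quiet distortion that makes uncurated data hard to trust.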

The regulatory agencies overseeing the pharmaceutical industry recognize the importance of encouraging adoption of innovative technologies. The Food and Drug Administration (FDA) originally released its computer systems validation (CSV) guidance in 2002. This guidance was established to ensure the proper and precise functionality of custom-built equipment, but it has been extended to software. The FDA has updated this guidance to focus on computer software assurance (CSA), reflecting its recommendation of risk-based strategies. But despite this update, uncertainty persists, especially regarding data platforms where the software and the data require different validation approaches.

The original validation framework was designed for custom-built applications, not the current software-as-a-service (SaaS) model. The FDA’s transition from CSV to CSA reflects an acknowledgment that the original framework was too prescriptive for modern software development. The agency’s stated motivation was to help manufacturers keep pace with the rapidly changing technology landscape. CSA aimed to reduce the burden by shifting to a risk-based approach, allowing companies to rely more heavily on vendor documentation.

Today, vendors are constantly updating their software and deploying new features. A pharmaceutical company cannot validate SaaS as quickly as the updates are released, and it should not be required to. Vendors continuously validate software as part of their development process rather than only after installation at a client site. Modern software development has established best practices and tools that automate source code version control, unit and functional testing, deployment, and documentation generation. However, most pharmaceutical companies continue to maintain validation teams or engage external experts due to the potential liability and unclear expectations around deliverables. While organizations want to modernize their data infrastructure, validation teams often struggle to distinguish between validating a software platform and validating the data it contains. The result is an industry that understands the inherent value of its data but lacks the regulatory clarity required to adopt new technologies.

Companies are hesitant to share data because their process know-how is considered a competitive advantage and intellectual property (IP). This ranges from knowing how many times to swirl a flask to the ideal medium for optimizing cell growth to selecting the appropriate critical quality attributes. While there have been initiatives to encourage the sharing of manufacturing data, such as NIIMBL and the Massachusetts Institute of Technology’s BioMAN, there are limited incentives to collaborate. For a contract development and manufacturing organization (CDMO) or large manufacturer, the accumulated data from production runs and parameter adjustments is its institutional knowledge. From its perspective, sharing data risks revealing proprietary process details and undermining its competitive positioning.

These are valid concerns. There are benefits to strict IP protections, such as enabling companies to recoup their investments in research and development by ensuring they have control of the resulting IP. However, not all manufacturing data should be categorized as core IP, and in some cases the benefit to national security outweighs the value of maintaining proprietary data. For example, recommended process parameters for assessing cell growth are derived from proprietary data, but they are not as sensitive as the specific formulation of a proprietary medium designed for a particular molecule. The U.S. biopharmaceutical industry currently has a wealth of this manufacturing data, but the industry does not use it effectively. A state-directed competitor does not face these constraints.

Designed in the U.S., Made in China

China is well positioned to aggregate and exploit this data due to the state-directed coordination of its manufacturing sector. China has made biotechnology a strategic initiative for the past 20 years, and U.S. companies are heavily dependent on Chinese manufacturers. The National Security Commission on Emerging Biotechnology, a bipartisan, congressionally appointed body, released a report in April 2025 noting that Chinese state investment has enabled companies such as WuXi AppTec to dominate the pharmaceutical biomanufacturing industry. A 2024 survey by the Biotechnology Innovation Organization found that 79 percent of U.S. biotech companies have at least one contract with a Chinese-owned or China-based contract manufacturer.

While the original scientific discovery and process design might occur in the U.S., the data generated during manufacturing resides in China. Whoever controls the physical manufacturing process ultimately controls the data. CDMO contracts typically require data sharing, but in practice they usually involve sharing a subset of a customer’s own data as PDFs or manually compiled spreadsheets. The CDMO retains visibility across all of its customers’ data, and the value of that dataset is directly related to its market share. A small number of state-directed entities can accumulate cross-industry intelligence through their own operations without solving the coordination problem that limits U.S. companies from aggregating data across competitors.

The separation of design from manufacturing is similar to what occurred in the semiconductor industry. In the 1960s, companies such as Texas Instruments (TI) analyzed the impact of temperature, chemicals, and processes to standardize the quality of their integrated circuits. Morris Chang’s experience with integrated circuit manufacturing at TI was instrumental in establishing the Taiwan Semiconductor Manufacturing Company (TSMC) as the leader in semiconductor manufacturing. The process optimization knowledge helped TSMC manufacture higher quality, lower cost circuits and become a critical dependency in the semiconductor supply chain.

That same dynamic is already underway in pharmaceutical manufacturing. Since biological systems are unpredictable, this practical knowledge is arguably an even more significant asset than in semiconductors. Biomanufacturing data enables operational efficiencies, but the data also informs initial process development, enabling therapies to be designed for manufacturability at scale. If China aggregates this data, it gains supply chain leverage, accelerated scale-up capabilities for both new therapies and biodefense, and the ability to manipulate U.S. manufacturing processes. AI models trained on manufacturing data recommend target process parameters, flag potential anomalies or deviations, and predict yields. If the underlying data is selectively altered, the impact on the model’s outputs is difficult to detect or attribute.

Transforming Biopharma Data Infrastructure

Two interventions are required.

First, the FDA should issue clarifying guidance distinguishing software platforms from data curation. There is a distinction between confirming that software works as intended and determining whether the data is accurate. Testing frameworks can validate that an application has correctly implemented the formula for calculating yield. Those tests, however, assume that the data accurately reflects the production measurements—and that the measurements were correct.

These tests also overlook the most important aspect of assessing the correctness of analysis—the decisions about what data to include. Initial data curation significantly influences the outputs. For example, the inclusion or exclusion of a failed production run dramatically impacts aggregation statistics. Software validation and assurance apply to the prebuilt application and should be the software vendor’s responsibility. Pharmaceutical companies should review the data curation, given they have the subject matter expertise required to assess their own data. Without this clarification, companies treat data platforms as software systems subject to full validation requirements. Even worse, they continue to rely on spreadsheets compiled by manually transcribing data from paper records.
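The effect of a single curation decision is easy to show with arithmetic. The sketch below uses invented titer values (in g/L) for four hypothetical batches, one of which failed; the numbers are illustrative only and are not drawn from any real process.

```python
from statistics import mean

# Illustrative titers in g/L, invented for this sketch; B-104 is a failed run.
runs = [
    {"batch": "B-101", "titer": 4.0, "failed": False},
    {"batch": "B-102", "titer": 5.0, "failed": False},
    {"batch": "B-103", "titer": 6.0, "failed": False},
    {"batch": "B-104", "titer": 1.0, "failed": True},
]


def mean_titer(runs, include_failed):
    """Average titer, with the curation choice made explicit as a parameter."""
    selected = [r["titer"] for r in runs if include_failed or not r["failed"]]
    return mean(selected)


with_failed = mean_titer(runs, include_failed=True)      # (4 + 5 + 6 + 1) / 4
without_failed = mean_titer(runs, include_failed=False)  # (4 + 5 + 6) / 3

print(f"mean titer with failed run:    {with_failed:.1f} g/L")
print(f"mean titer without failed run: {without_failed:.1f} g/L")
```

The averaging code is identical and correct in both cases, yet the reported mean moves from 4.0 to 5.0 g/L. A software test can confirm the arithmetic; only someone who knows why B-104 failed can say which answer is the right one to report.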

Second, the U.S. should incentivize data sharing, establishing a federally coordinated consortium modeled after the cybersecurity Information Sharing and Analysis Centers. This consortium should define anonymization standards and boundaries between shared foundational knowledge and proprietary IP. There are established computer science principles for data anonymization and privacy-preserving computation that can protect proprietary information while enabling meaningful data sharing.

Consortiums often focus on establishing an ontology, but the coordination needed to define and enforce a shared ontology can create barriers to participation. Ontologies can be successful when there is a well-defined set of valuable data that is easily translatable into a common schema. In cybersecurity, these were indicators of compromise (IOCs), technical identifiers of malicious intent such as IP addresses, domains, and emails. Even these definitions were expanded to incorporate timeliness and sequencing relationships. In biomanufacturing, a recipe or specific sequence of process steps is more valuable than a standalone list of media. The consortium could rely on existing ontology efforts but also allow for non-ontology-compliant submissions provided they include sufficient metadata and context. This shifts the data transformation work to the consortium, but some techniques could reduce the burden, such as identifying semantically similar data rather than relying solely on static schema matching. The result would be more diverse data and a lower barrier to entry for participants.
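To give a sense of how submissions might be mapped without exact schema agreement, the sketch below scores each submitted field against consortium fields by comparing their free-text descriptions. It uses token-overlap (Jaccard) similarity as a deliberately crude stand-in for semantic matching; a production system would more likely use learned text embeddings. The field names, descriptions, and the 0.3 threshold are all hypothetical.

```python
import re


def tokens(text):
    """Lowercase a description and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two descriptions: shared tokens over all tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(description, target_fields, threshold=0.3):
    """Return the best-scoring target field, or None if nothing clears the threshold."""
    scored = [(jaccard(description, desc), name) for name, desc in target_fields.items()]
    score, name = max(scored)
    return name if score >= threshold else None


# Hypothetical consortium fields with plain-language descriptions.
TARGET_FIELDS = {
    "location": "physical room or area where the process step took place",
    "dissolved_oxygen": "dissolved oxygen level in the bioreactor as percent saturation",
    "culture_ph": "ph of the cell culture in the bioreactor",
}

# Hypothetical submission whose field names match nothing in the target schema.
submitted = {
    "room": "room where the step took place",
    "DO_pct": "bioreactor dissolved oxygen percent",
    "operator_badge": "badge number of the operator on shift",
}

mapping = {name: best_match(desc, TARGET_FIELDS) for name, desc in submitted.items()}
for source, target in mapping.items():
    print(f"{source} -> {target}")
```

Static name matching would map none of these fields. Matching on descriptions maps the first two and leaves the third unmapped for human review, which is why the metadata and context accompanying a submission matter as much as the values themselves.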

While the technical aspects must be addressed, the more complex challenge is convincing companies to contribute their IP to an initiative that might benefit their competitors. The system must be defined so that the risk of not participating outweighs the risk of contributing. Only those companies that contribute to the larger dataset should have access to the intelligence derived from the aggregated data. To encourage early participation, the government should seed the consortium with protected foundational datasets, providing immediate value before the consortium reaches scale. The shared learnings will have a compounding effect as the dataset grows, and the network effects will drive additional participation. Coordinated data sharing will improve manufacturing efficiency, but more importantly, it is a national security imperative. Companies need to weigh the risk of contributing data against the strategic risk of ceding this advantage to a state-directed competitor.

The separation of chip design from manufacturing transformed the semiconductor industry. The U.S. ultimately ceded production control to other countries as integrated circuit manufacturing became a commodity. Washington recognized the national security risk only after foreign dependence was deeply entrenched, then scrambled to address it with the CHIPS Act. The pharmaceutical industry is following a similar path. Manufacturing data is a critical asset that will guide the development of new therapies and enable a scalable biodefense. Unlike with semiconductors, the U.S. still has an opportunity to address the structural obstacles before the dependency becomes entrenched.


Liz Maida is the co-founder and CEO of Fathom, which builds AI data infrastructure for biomanufacturing. She writes on how strategic state objectives are encoded in technical infrastructure, with a focus on the national security implications of AI and biotechnology. Her background spans cybersecurity, biomanufacturing, and early-stage technology companies. She holds graduate degrees from MIT, where her research focused on graph algorithms inspired by biological networks.