Abstract
Biopharmaceuticals, or biologics, are a burgeoning sector in the pharmaceutical industry, predicted to reach $239.4 billion by 2025. This unparalleled growth is often attributed to the enhanced specificity offered by large molecules over small molecules. The large size of the constituent proteins necessitates the continuous implementation of big data predictive analytics to elucidate the most [...] Read more.
Biopharmaceuticals, or biologics, are a burgeoning sector in the pharmaceutical industry, predicted to reach $239.4 billion by 2025. This unparalleled growth is often attributed to the enhanced specificity offered by large molecules over small molecules. The large size of the constituent proteins necessitates the continuous implementation of big data predictive analytics to elucidate the most effective candidates in the lead optimization process. These same methodologies can be applied, and with the advent of machine learning and automated predictive analytics, this is becoming an increasingly facile task, to the augmentation and optimization of the downstream production processes that comprise the majority of the development cost of any biologic. In this work, big data from cell line generation, product and process design, and large-scale lead validation studies have been used to compare the applicability of simple statistical models against these black-box approaches for the rapid acceleration of enzymes to the pilot plant stage. This research can be expanded upon to exploit the big datasets generated as part of the progression of biologics through the development pipeline to further optimize production outcomes. Over the coming months, data from the project will be used to probe which approaches are amenable to which processes and, as a result, more amenable to various economic simulations. The computed optimization objective for the HIT must include the cost of acquiring, storing, and analyzing data to construct these predictive models, alongside the expected commercial reward of choosing an optimally ranked candidate. In this vein, perspective must be taken in the probable future price, capability outputs, and ownership issues of increasingly sophisticated data analysis software as superstructures become more frequent. It is frequently stated that decisions made to reduce production costs are data-driven, but that is not because more economically or energetically costly experiments or production methods are employed; to truly evaluate production steps, dynamic energy, and economic models need to become more commonplace. Conversion of process quality approaches from large questionnaires, risk analysis, and expert opinion-driven methods to statistical and thus more reliable approaches is an area of future research in analytics used herein.