2018 FALL - Data-readiness in a World of AI


The 2018 fall PRISME Forum Tech Meeting will be held Thursday, the 15th of November, 2018 and will be hosted by Takeda at 1 Takeda Pkwy, Deerfield, IL 60015.


PRISME Forum Chair: Olivier Gien

Fall Business Meeting of the PRISME Forum 2018

Tuesday and Wednesday, the 13th and 14th of November, 2018

Meeting hosted by Takeda Pharmaceuticals

Data-readiness in a World of AI

One of the key points of discussion at the last two PRISME Forum Technical Meetings on the topic of AI was that the limiting factor for AI/ML was not computing power, nor indeed algorithms, but rather the availability of high-quality, fit-for-purpose structured data sets labeled with both appropriate metadata and endpoints. The scarcity of data for training machine learning is a fundamental feature of AI in the Life Science industry. Living systems are complex and noisy and as such require a significant amount of data to model them accurately. While substantial amounts of in vitro experimental data exist, in vivo data is much more difficult to collect and, in the case of human data, use is limited by informed consent, privacy regulations and ethical considerations.

The idea that ‘data is more important than algorithms’ has been gaining support since 2001, when Banko and Brill published their paper “Scaling to Very Very Large Corpora for Natural Language Disambiguation”i, which demonstrated that several very different Machine Learning Algorithms performed almost identically well on the complex problem of natural language disambiguation once they were given enough data.

The idea was taken up more recently in the 2009 article “The Unreasonable Effectiveness of Data”ii by Peter Norvig et al., which showed (Figure 1) that it can be relatively easy to reach around 50% accuracy using a variety of algorithms but to improve further, the need for data grows logarithmically. For AI to be effective, a sufficient amount of high-quality data needs to be readily available.

The biopharmaceutical and healthcare industry as a whole has a great deal of data. However, this data is rarely in a form that can be used to train AI/ML methods without substantial cleanup and labeling with metadata and endpoints.

Additionally, this data is generally widely dispersed both within individual companies and between companies. This makes it difficult to gain access to the data and, given the diversity of data formats, to read and understand it. An individual biopharmaceutical company self-evidently has less data than the industry as a whole on which to train AI/ML systems that produce robust and generalizable results. If there were cross-company collaboration to merge data sets, then much larger, more diverse and more effective training data sets could be made available. Despite this, the industry is cautious about sharing its data, not least because companies fear they will compromise or lose their IP. Alternative ways to address the issue include methods that mitigate data shortage and overfitting, such as transfer learning, multi-task learning and the generation of synthetic data.

This PRISME Forum Technical Meeting will set out to explore opportunities for the biopharmaceutical industry to improve timely access to sufficient, high-quality data, on which AI systems can be trained (both within and beyond individual companies) and to best use the available data in the age of AI. A focus will be on practical examples that have been implemented at pharmaceutical companies along with efforts that have been attempted, but failed, and associated lessons learned.

Topics that will be addressed include:

  • The implementation and use of the FAIR data principles (Findable, Accessible, Interoperable, Reusable)iii in industry
  • Current tools and methods for metadata capture, end-state labeling and automated data preparation both at the point of creation and the time of use
  • Practical storage, management and access to data from every stage of the R&D process and examples of data re-use & models constructed with data federated across multiple domains.
  • Examples of the use of methods such as transfer learning to reduce the amount of directly relevant data required to build models for specific tasks.
  • Methods that would allow companies to share their data, including the use of “guest algorithms” that can train on data sets without exposing the IP
  • Identification of the most tractable domains within biopharma – both for internal development and where cross-industry data sets for AI training could be created


The PRISME Forum Technical Meeting Advisory Committee (listed below) is seeking contributions (e.g. plenary presentations, start-up company ‘pitches’, poster presentations, etc.) from any person or company with an informed and experienced contribution to make in this area.

  • Christian Baber (Chair), Head of R&D IT, Shire
  • Nick Brown, Head of Technology Incubation Lab, AstraZeneca
  • Dan Chapman, Head of IT New Med. Information Management, UCB
  • David Christie, Vice President, Enterprise Applications Group, CSL Behring
  • Lars Greiffenberg, Director – R&D IT and Translational Informatics, AbbVie
  • Carol Rohl, Executive Director, Scientific Information Management, Merck
  • Martin Romacker, Principal Scientist – Data and Information Architecture, Roche
  • Nico Stanculescu, Logistics, PRISME Forum 
  • Jianchao (JC) Yao, Associate Principal Scientist, Merck
  • TBD, Takeda

i https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/acl2001.pdf

ii https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf 

iii https://www.nature.com/articles/sdata201618 


The hotel for this meeting is the Hyatt Regency Deerfield, located at 1750 Lake Cook Rd, Deerfield, IL 60015.  The discounted room rate is $169 per night plus tax.  

Rates are valid ONLY through October 13, 2018.  

Reservations can be made online at https://book.passkey.com/go/PRISMEForum

When reserving a room, please remember to use “PRISME” for the above rate and appropriate allocation to our room block.  


O’Hare International Airport is a 20-minute ride (14.4 mi/24 km) from the meeting venue or conference hotel.


Uber and Lyft remain reliable sources for the transfer between O’Hare and the meeting venue/hotel.

Additional car services will be posted soon!


Morning and afternoon transfers will be offered between the hotel, the meeting venue and the social/networking events (per program outline).