PSI Webinar: Avoiding Pitfalls in Supervised/Unsupervised Learning

Alexander Schacht, Lilly
Not all patients are created equal, but are there subgroups that are more homogenous?

Abstract: Can I divide my overall patient population into meaningful segments? Do patients follow different patterns over time? We should ask these questions more often, and techniques of unsupervised learning, where the classification of a patient into a group is unknown, can answer them. We differentiate these approaches from supervised learning techniques, in which the classification of the patients is known. Typical questions for supervised learning algorithms include: Can I predict a patient's outcome given his or her baseline characteristics? Cluster analysis represents a class of approaches in unsupervised learning and helps to answer the questions above. It rests on the choice of metrics that measure the distances between patients in terms of their many different characteristics. In this presentation, I will present and discuss different approaches available in SAS. Determining the number of clusters is a classical bias-variance trade-off problem. The presentation will discuss various heuristics as well as practical considerations for arriving at a reasonable number of clusters. The practical implementation of cluster analyses comes with various challenges. I will discuss standardization of variables, weighting of variables, correlated data, outliers, spurious small clusters, and identification of relevant clusters. Finally, the communication of cluster analyses has its own unique challenges, and I will mention various approaches based on real case studies.

Bio: Alexander Schacht (PhD), Principal Research Scientist, Global Statistical Sciences, leads a group of five Europe-based statisticians driving the statistical activities around launch preparation, including HTA submissions to support access and commercialization in different auto-immune diseases. After 2 years at Boehringer Ingelheim, Alexander joined Lilly in 2004 and held various positions within statistics with a focus on neurosciences, working on phase I, III, and IV studies in areas such as Alzheimer's disease, schizophrenia, ADHD, depression, and pain. Alexander received his PhD in Biometrics in 2002 from the University of Göttingen for work related to non-parametric analysis of covariance. For the publication based on this work, he was awarded the 1st Gustav-Adolf-Lienert Prize in 2009 by the German Region of the International Biometric Society. He has published both methodological papers (e.g. on network meta-analysis and non-inferiority approaches for time-to-event data) and medical papers, including more than 60 papers in peer-reviewed biomedical journals. He is a regular speaker at both medical and statistical international conferences. As the chair of the special interest group on benefit-risk of the European Federation of Statisticians in the Pharmaceutical Industry, Alexander is leading and promoting research on quantitative assessments of benefit-risk. He is interested in all aspects of launching new treatments.

Ilya Lipkovich, IQVIA
Overview of methods for subgroup and biomarker identification from clinical data

Abstract: In this talk I will provide a high-level description of a broad class of statistical methods for subgroup/biomarker identification in early and late-phase clinical trials. First, I contrast "data-driven" subgroup analysis with the traditional "guideline-driven" approach and describe key elements of principled data-driven subgroup analysis. Then I review 4 classes of methods for subgroup identification that have emerged recently as a result of cross-pollination across machine learning, causal inference and multiple testing: global outcome modeling, global treatment effect modeling, modeling individual treatment regimes, and local treatment effect modeling. I also briefly review available software and key features of subgroup identification methods.

Bio: Ilya Lipkovich is a Sr. Research Advisor at Eli Lilly working in Real World Evidence. He received his Ph.D. in Applied Statistics from Virginia Polytechnic Institute and State University in 2002. He has more than 15 years of statistical consulting experience in the pharmaceutical industry. Dr. Lipkovich's research interests include subgroup identification in clinical data, analysis with missing data, and causal inference from observational data. He chairs a Subgroup Analysis Working Group sponsored by the Society of Clinical Trials. He has published widely, including co-authoring the book "Analyzing Longitudinal Clinical Trial Data. A Practical Guide".

Andy Nicholls, GSK
Using the SIDES algorithm to identify patient phenotypes that have the potential to benefit most from switching to Relvar

Abstract: In 2016 GSK successfully completed the Salford Lung Study, a 12-month, open-label, randomised effectiveness study evaluating fluticasone furoate (FF, GW685698)/vilanterol (VI, GW642444) Inhalation Powder, delivered once daily via a Novel Dry Powder Inhaler (NDPI), compared with existing COPD maintenance therapy alone in subjects with Chronic Obstructive Pulmonary Disease (COPD). Upon completion of the study, the Scientific Committee expressed an interest in using a data-driven approach to identify patient subgroups for which the treatment effect was strongest. In this presentation we will look at why SIDES was chosen for this analysis, the design parameters, and how it fared.

Bio: Andy is a Statistician with a strong interest in Data Science, having previously worked as a specialist R Consultant and Data Scientist for Mango Solutions. On re-joining GSK in 2017, Andy provided support to the Relvar project, for which he led an exploratory cluster analysis using Salford Lung Study data to try to identify patient subgroups that might experience an additional real-world benefit of Relvar. He now works in GSK's new Statistical Data Sciences division within BioStats and is Business Systems Owner for the BioStats HPC environment for R.

Registration
PSI Member	Free
Non-member	£20 (plus VAT)
