Developing Real-World Data and Evidence to Support Regulatory Decision-Making

– Good morning, everyone. Good morning. Good morning and welcome. I’m Mark McClellan. I’m the director of the
Duke-Margolis Center for Health Policy. And I’m really glad to see
so many of you with us today in person and on the webcast
for this public workshop on Developing Real-World
Data and Evidence to Support Regulatory
Decision-Making. This is being convened under a cooperative
agreement with the FDA. And this public workshop
comes at a very important time and obviously a time
with great interest among the stakeholder
communities on how to advance the use of real-world evidence
for regulatory decisions, and through that to
support better medical care and better health. This is evident by, again,
the number of people who are here in
the room with us. We’ve got more than
a thousand registered to join us online. Again, thank you all for
being with us online. We are having this
meeting because the FDA is currently developing its
Real-World Evidence program and actively engaging
the broader public, the stakeholder communities
and understanding how real-world evidence
can best be used to increase the
efficiency of research and complement or add to
the total body of evidence on medical products. As the FDA mentioned in the
Real-World Evidence framework that was published
late last year, this is an important
milestone towards establishing the FDA’s
expanded Real-World Evidence program. This type of evidence represents an important opportunity
to address questions that may not have
been answerable through traditional
clinical trials. The framework lays out key
regulatory considerations. These include evaluating
whether the real-world data involved is fit for use, ensuring that the trial
or study design used to turn that data into
real-world evidence can adequately answer
the question of interest. And then assessing whether the study meets
regulatory requirements for study conduct. These issues are going
to be at the heart of the discussions today. In our opening panel
and our first three sessions, we’ll take a careful look
at the reliability of real-world data. We’ll look at
opportunities to foster an ecosystem that enables
the systematic development of fit-for-purpose
real-world data. Then in panel sessions
four and five, we’ll consider some of the
key methods and study designs, along with endpoints
that could be used to support causal
inferences and frameworks to help answer
regulatory questions through real-world studies. The methods discussion in
today’s panels will focus on
non-randomized study designs. It is important to
remember that randomization is also possible in
the real-world setting. We had a public workshop
with FDA in July that looked at how study
designs using randomization could generate
real-world evidence within routine care settings. And the Duke-supported Clinical Trials
Transformation Initiative is also doing important
work on that topic. There was a lot of
stakeholder interest in that discussion. And some of you may
have attended it as well as we considered key
study design issues such as intervention selection, outcome measurement, blinding, and others as well
as the applicability of regulatory requirements. We’re gonna hear
more about key themes from Adrian Hernandez
and Peter Stein who both participated
in that event on randomized approaches to
real-world evidence generation. And their comments
will also provide some synergies that
we’ll try to pick up on with today’s agenda. We’re looking forward to
having a productive discussion throughout the day and
sharing lessons learned, emerging insights, and
experiences from across the stakeholder
community on generating and using real-world
evidence effectively. Your feedback will help
inform the agency’s continued thinking as it seeks
to develop guidance and explore opportunities,
support the development and use of real-world
data and evidence as part of this expanding
real-world evidence program. Before we get going, I
wanted to quickly go through a few housekeeping items. I wanna remind all of you that this is a public meeting, that it is being webcast online. The record of the
webcast will be available on the Duke-Margolis website
following today’s event. The meeting is intended
to spur discussion. We’re not looking to
achieve consensus today, but rather hear a variety
of important perspectives on these issues. To that end, we’ve set aside
some time during the day for audience participants
to make comments and ask questions
during the moderated discussions from
most of our panels. We’ve also included
a comment period at the end of the day
to provide feedback and raise important questions
that you didn’t have a chance to bring up earlier. And as a heads up to our
speakers and panelists, Kara Marcone and Joy
Eckard, can you guys wave? They are here in the front. They’re gonna help
us keep on schedule with some nice signs to indicate how much time you have
left for your presentation. Also during the day,
there’s coffee and beverages right outside the room. Please help yourself. Lunch is going to
be on your own. If you look at
this packed agenda, we’ve got 45 minutes set aside. I’ll try to make sure
you get all of that time. It’s a very busy day today. Our event staff at the
registration desk outside can help direct you to
some of the options nearby. And you should feel free
to bring food back here with you when we start
back up in the afternoon. That’s what I have to
say for introduction. At this point, I’m very
pleased to get us going by introducing Jacqueline
Corrigan-Curay. She is the director of
the Office of Medical Policy at the Center for Drug
Evaluation and Research at FDA for some opening
comments for the day. Thank you, Jacqueline. (audience applauds) – Thank you, Mark and thank
you everyone from Duke who’s really worked to bring
this together, especially Adam. And also I’d like to recognize
Captain Dion Perron from FDA who’s been our liaison to make
sure we all got here today. It’s a great day. We’re very excited. This is a really great agenda and we think it’s going
to be very helpful to us as we continue to work
on the RWE program. As Mark mentioned, we
published the framework in 2018 to lay out the program. We’re really well into
our program at this point. And I wanna note that
there were a number of very valuable and
insightful comments. And I wanna let you know that we’ve now taken
those comments. And they are really
informing our evaluation and guidance development
in all the areas that we outlined
in the framework,
including data quality, study design, and
regulatory considerations that we also mentioned
in the framework. And in the background, there’s also data
standards and implementation. A number of you gave us
comments on data standards which we’re also
taking into account as we move forward. As we explained
in our framework, our program is multifaceted. In addition to
working on guidance, we are conducting a number
of demonstration projects to inform us on all of
the areas in our program. And we’re excited now that the new
Sentinel contract was just let, and that will really enable FDA to continue to grow its
own RWE capabilities. And with that, develop
additional expertise and understand new methodologies
for evidence generation. In addition, we are actively
meeting with sponsors as they think through their
use of real-world evidence in the development programs. And this is being done in a
way to maximize shared learning across all the offices and
divisions in CDER and CBER. As we noted in our
framework of course, external stakeholder
engagements such as this are really critical
to our thinking. This is a field
that is just moving so very rapidly. And the opportunities to hear
from leaders in the field is invaluable to us and
really informs our thinking. As we stated, we need
data that’s fit for use for regulatory questions. And there are just a
number of questions that we need to unpack about
what it means to be fit for use. Certainly at the source,
what was collected, what clinical concept
does it represent, how complete it is for use
in evaluating a drug product. Those are all considerations. And today, we’re gonna
talk about how we translate some of those clinical endpoints
that we see in practice for use in regulatory decisions. Our focus is
oncology, but the goal is really to think
about this broadly and what does that
framework look like? That of course is just
one part of the picture. We know that today, data
does not seamlessly flow from the EHR or claims
data for use in research. Today, we’re gonna
hear maybe how we might get one step closer to that, whether that will require
reimagination of the EHR and the clinical
workflow to capture data that is relevant to both
quality practice and research, as we’ll perhaps
hear from OneSource, an initiative at
UCSF that we support. Or maybe it’s more reliant on
natural-language processing and artificial intelligence
and human curation to extract essential data. These are all approaches
that are being looked at. But for today, the data
does need to be extracted, it needs to be standardized. And along the journey
from the source to research-ready datasets,
there’s just many steps and quality checks that
need to be performed. FDA is familiar with this
in the Sentinel system and is indeed understanding
how that’s done. That gives us such
confidence in the quality of the data that we use. Earlier this year,
we asked whether there are some best practices that one should expect
if we’re considering that data might be fit for use. And we held a small workshop and engaged with Duke
on that question. We understand that
stakeholders have continued to work with Duke on that issue. So we look forward today to
hearing those discussions and help us better understand what does it mean to get
quality data curation, especially from EHRs? And while we spend a lot of
time talking about EHR data ’cause it’s just a sort of
difficult nut to crack still, we are also extremely
interested in the opportunities afforded by mobile
technologies and sensors to capture the
patient experience outside of the
healthcare system. Through the work of
Captain David Martin on the FDA MyStudies Apps and we have collaborations
with our CERSIs on use of sensors to
explore new endpoints. We are learning about the
value of this technology in clinical research. And I understand we’ll
hear a lot more today about a number of you who
are working in this area. Finally, of course, we need
to think about study design. And today’s agenda provides
an update on work being done on observational study
methods and causal inference. Equally important,
as we mentioned, is the use of randomized
controlled trials to generate RWE. But we had this two-day meeting. As Mark mentioned, it
was a great meeting. It’s available online. And so we’re not going to have
a session dedicated to that, but we are gonna start the
day, as it was mentioned, with a brief recap of the main
takeaways from that meeting. And we want to do that
to both bring everyone to the same page, but also because the issues
we will address around data and endpoints are highly
relevant to these designs and we need to keep in mind as we’re thinking
about these issues, how they fit into the
different study designs. So I’m gonna cede a
little bit of my time so we don’t get behind and
turn it over to Dr. Hernandez and Dr. Stein. Thanks.
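To make the “many steps and quality checks” on the journey from source data to research-ready datasets concrete, here is a minimal, hypothetical sketch of one such curation step: mapping a locally named EHR lab result to a standard code and flagging records that are incomplete or implausible. The field names, the LOINC mapping, and the plausibility range below are illustrative assumptions only, not a description of FDA’s, Sentinel’s, or any speaker’s actual pipeline.

```python
# Illustrative sketch only: hypothetical field names, codes, and thresholds.
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from local lab names to a standard code (LOINC 751-8 is
# used here for absolute neutrophil count; verify codes before real use).
LOCAL_TO_STANDARD = {
    "neutrophils abs": "751-8",
    "anc": "751-8",
}

@dataclass
class CuratedLab:
    patient_id: str
    standard_code: str
    value: float
    unit: str
    quality_flag: str  # "pass" or the reason the record is not fit for use

def curate_lab(raw: dict) -> Optional[CuratedLab]:
    """Standardize one raw EHR-style lab record and apply simple quality checks."""
    name = str(raw.get("test_name", "")).strip().lower()
    code = LOCAL_TO_STANDARD.get(name)
    if code is None:
        return None  # unmapped local test name; route to manual curation
    # Completeness check: required fields must be present and non-empty.
    for field in ("patient_id", "value", "unit", "collected_at"):
        if raw.get(field) in (None, ""):
            return CuratedLab(raw.get("patient_id", "UNKNOWN"), code,
                              float("nan"), str(raw.get("unit", "")),
                              f"missing:{field}")
    try:
        value = float(raw["value"])
    except ValueError:
        return CuratedLab(raw["patient_id"], code, float("nan"),
                          str(raw["unit"]), "non_numeric_value")
    # Plausibility check: assumed range for illustration only.
    flag = "pass" if 0.0 <= value <= 50.0 else "implausible_value"
    return CuratedLab(raw["patient_id"], code, value, str(raw["unit"]), flag)

if __name__ == "__main__":
    raw_record = {"patient_id": "P001", "test_name": "ANC", "value": "2.3",
                  "unit": "10*3/uL", "collected_at": "2019-09-01"}
    print(curate_lab(raw_record))
```

In a real pipeline these checks would be far more extensive (unit normalization, provenance and audit trails, reviewer sign-off), but the shape of the step, from local source record to standardized, quality-flagged research row, is the point being made above.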
– Thank you very much. (audience claps) Thanks and Adrian and Peter,
if you could come on up. So we’re gonna go right into
this recap of our recent event on Leveraging Randomized Designs to Generate Real-World Evidence. Adrian Hernandez will
be speaking first. He is the professor of medicine and vice dean for
clinical research at Duke University
and School of Medicine and Peter Stein is the director of the Office of New
Drugs at CDER at FDA. They’re going to cover
some of the key themes that came out of
that public workshop on leveraging randomized
clinical trials. And that was held, as
Jacqueline mentioned, on July 11th to 12th. And the full webcast
is available online. So let me turn right on
to Adrian, thank you. – Okay, so. This is Peter’s. Can we go to the. (laughing) – It’s probably that one. – Let me skip through and
see if I can get to mine. Let’s see if that’s. – You take all of
Peter’s thunder away. – Yeah, okay. All right.
(audience laughs) So thank you on behalf of Peter. So as you think about the day
today, there’s a lot of focus in terms of real-world data, primarily around
observational methods. But one thing that we’d
like to emphasize here is as people solve the problems
in terms of real-world data, that will be incredibly
useful for doing simple, pragmatic, randomized
clinical trials. And so in that context, we want
people to really think about real-world evidence
that can happen with randomized clinical trials. So first let me just
start off with two numbers that at least from my
perspective, I consider a couple problems that
we’re aiming to solve. The first is 2%. And that’s actually
about the percent, about the number of people
who actually participate in clinical trials in the US. It’s only about 2% who participate in clinical trials in an average healthcare system. Yet we’re actually
trying to deliver care for the 100%. And so sometimes
people have questioned whether we fully understand
how to generalize evidence from 2% to the other 98%. The other is 90%. And that’s actually
from surveys. When you ask a
question worldwide about how many people
would actually be willing to participate in
a clinical trial for a health problem
that they may have. And so you can see here
that there is a gap between the 2% that actually
participates in clinical trials and the 90% who actually say they would like to participate. And so if you think about
real-world evidence, it is actually in some ways
trying to close that gap, trying to get better evidence
for the broader group and also meet what people
want to do in terms of solving the problems together
by participating in
clinical research. Now this is the number of people who participate
at Duke, 21,000. And we’re very proud of that. We consider these pioneers. It’s about 2.8% of the
people that we see at Duke. So it’s better on average,
but we still are not perfect. And so our question all the
time is why did they agree to participate? Why are they, these pioneers in terms of donating their time, their effort, their data
toward generating answers that may not
actually affect them, but may affect people like them. And when you think about
it, it’s even harder. So just consider
the user experience of a research participant. And so at Duke, as an example
which is probably similar to many other complex
healthcare systems, people have to go park
in a large garage. They then have to find
their way to a research site or a clinic area. They often have confusing maps which are not easily
interpretable, unlike GPS. And then when they get there,
they have this experience in terms of what looks like
a very clinical experience that may not necessarily
be fun and exciting. And then they spend time, in terms of their
social interactions, watching someone type in data in through a case report form. So if you ask the question,
is this the ideal experience? I’ll offer that we’re
not quite there yet. And so thinking about how do
we get to better experience? Leveraging real-world data is actually making
it more seamless for the user experience or
the research participant. So again, asking,
why do people do it? And they really do do it
because they’re hoping for something better. They’re part of a community
that’s actually trying to change the game,
trying to generate answers for people like them. And when it gets even
further down the line, you actually look at
the total experience. Not only the research
participant experience, but actually the
team experience. And traditional clinical
studies often feel like work, filling out paper forms, filling out things like this. And so they actually
don’t have something that’s more seamless for
participants to do this online as they go through life. And what they’re used to
is something like this. So people really want
an experience like this that’s actually integrated with their life at
their convenience and being able to do it at home. So that’s convenient,
flexible and personalized. And so that’s the expectation from people on
the outside world. And so I do though have
hope for the real-world, so to speak, going from
that 2% to the 98%. And I do think that
this community of people who are really driving answers for what real-world data is
and the quality around that can change the game for
randomized clinical trials. And so this is in part due to
interest from health systems. They too want better data. And there are a lot
of reasons for that. They’re very interested in having live
diagnostics and analytics to help manage the patients
that they care for. They also want to
address quickly safety and harm reduction, similar themes to what
Sentinel is doing. They want to get to
preventive health. They want to get to
precision health. They want cost reduction
or value of care. And they also want to be better in terms of population health. And so as health systems
think about this, so too we should take advantage
of what they are doing and what can be possible
for clinical research. And a few key points that
I think are important regarding the opportunities
here is that certainly we’ve noted, data
are everywhere. But the more important
thing is all the efforts in terms of curating that
data is really growing from multiple dimensions. There are efforts
such as PCORnet, other efforts across the US. Sentinel is a grand example
in terms of curating data to be used. But individual health systems
are actually doing this in lots of different ways and
also in different consortia that provide answers for
people with research questions or healthcare questions. The second is actually
people-centeredness. If you take the world
outside of healthcare and think about what
they’ve been doing and those direct to
consumer products, they’re really focusing on
being flexible, frictionless and so-called fun. I mean, who would imagine that you would actually
buy furniture online without sitting on it? Who would imagine you
actually do exercise at home with a studio experience? So they brought that
experience into the home. And direct to participant
efforts for research have also kind of
thought about how to take those experiences and
translate that into research to make it more personalized,
streamlined and valuable, returning something
to that participant so that they stay
engaged continuously. The third thing is
mobile health technology. As people advance in terms
of user-reported data, other measures, passive sensing, this is only gonna grow as well
as being able to understand how it can be used and again, having a
more complete experience of a participant and
understanding their health. So let me end with a
real-world example here. And actually, I was stimulated
by Bob Temple’s example in our summer workshop about,
we should remember the basics. Simple trials were
invented awhile ago, and we’ve kind of lost
our way in some ways. And so I actually went
back a little older and looked at a very important
disruptive technology. So this is one of
the early trials with the GC Group in Italy. And without any funding
and a lot of skepticism, they were able to accomplish
a very important, simple trial in the cardiovascular space. They did it because they
had cardiologists engaged, actually motivated
to answer a question. And they also did it ’cause
they were very selective in terms of the data that they focused on,
very parsimonious. And so you now you look at,
say what’s possible now? We’ve been doing
the Adaptable study, asking a question about
two doses of aspirin, leveraging health systems and
creating health data, PCORnet, and randomizing 15,000 people. And we’re in follow-up now. So we’ll learn what is possible
in terms of real-world data. And we’ve done it
through designing this around the participants. So your Participant Portals and also thinking about
how this could go forward. And if you start thinking
about real-world evidence, real-world data, and how
it can be done in trials, you can think of
different categories and the elements in green are
things that have been done leveraging real-world data and real-world
systems so to speak, to do embedded clinical trials. Others are in the works. And then finally, just
and remember, quality and outcomes matter. You’ll hear more
about this from Peter, but really thinking
about what’s the purpose? And as we think about this in
regulatory enabling studies, thinking about what is needed, say for post-market commitments or expansions or revisions. And then thinking about core
principles, quality by design that applies to
any clinical study that gonna be
important for health. And also how to make decisions. The world is not perfect, so understanding that
is priority basic. And we need to think
about that balance between so-called
the ideal world of a traditional trial
and the real-world that we encounter people dealing
with their health problems. And so going through
that and really thinking about what’s the hybrid approach
to this is pretty critical that it’s not gonna
be a single answer. And so I think the puzzle
is coming together. And so as we think about that, understanding the
different elements from engagement, enrollment,
to actually real-world data and randomized studies. And then matching
all the unmet needs with all the advances around
real-world data and technology, including randomized
trials is important, but also making sure
that we engage patients, clinicians, and
systems and ensure that we have trustworthy data. So thank you, and I’ll
turn it over to Peter. – Thank you. (audience applauds) We do have your slides back up. – Well done. Well good morning. And this is really a
pleasure to be back here to talk about some of
the discussions we had a couple months
ago when we talked about the use of randomization in pragmatic trials. And what I’d like to
talk about and I think this will be, I hope, a nice
follow-up to Adrian’s comments, is the sort of what we are
talking about when we talk about pragmatic trials. ‘Cause I think to begin
with, I’m not entirely sure that we have a clear
definitions of pragmatic trials. What I’d say is
that the terminology has been a little
bit indefinite. I went back to try to see if I could find
definitions of this and there was definitions from
Schwartz and his colleague back originally in 1967 and
it was republished in 2009 where he talks
about the approach of explanatory versus pragmatic. And as I refer
that article to you ’cause I think it’s a good basis for some of these discussions because he talks more about
the explanatory trials as looking at hypothesis, trying to establish
biologic phenomenon. And pragmatic trials,
trying to answer questions for the real-world. And I would posit
that in many senses, what we’re trying to achieve
as we look at the data from our Phase 3 trials is to understand more of
that pragmatic questions than necessarily the explanatory or biologic phenomenon
in question. So I think there is a spectrum. But our intent is really
to understand the drug as it would be used
in the real world. And so I comment in terms of what I think
Adrian’s points are, that in fact, I think there’s
a lot of synergy there. I think it’s a somewhat
artificial distinction to think about trials that
are sort of traditional and trials that are
sort of real-world. Because the intent really
is for us to leverage data that can robustly
answer what we think are the important
clinical questions. So the idea of our
intent in supporting our regulatory decision-making, it was really to get data that provides
persuasive information. And so those elements
that we call real-world that can provide
persuasive information are elements that
were very interesting. It’s really a question
of how persuasive can we meet our standard
of substantial evidence that’ll come back
to in just a moment. I look at randomization and blinding really
as methodologies. They’re not sort of
absolute platforms, absolute requirements,
but really what they are are tools to develop
data that is persuasive. They allow us to sure
balance at the time that we randomize patients, at
the initiation of treatment, we can assure that the
patient comparisons are robust which is challenging
when we’re looking at observational data. Blinding allows us to ensure
that the interventions and the things that
happen to patients and the collection
of information is balanced after randomization. So these are tools to
generate persuasive data that we can rely on. I would also say that
the traditional trial infrastructure is
really intended to define a patient
population that we know is appropriate
for the treatment. We will also know that
we want to have careful and regular monitoring so
that there’s appropriate collection of
safety information. Well what’s appropriate? Depends on the situation. Certainly early in Phase 1, when we’re looking at
a drug introduce to man will be different
than in Phase 2 and will be
different in Phase 3. So it has to be adapted
to the circumstances and to the intent of this study. We wanted to make sure the
data is well-documented. Jacqueline talked
about that earlier. We wanna make sure that
we can rely on the data, that it has been
translated from the source into the trial dataset in
an appropriate fashion. That doesn’t exclude
the potential for using real-world data, it just says that the
data needs to be reliable, whatever the source
of that data is. Whether it’s in the
context of a randomized, traditional trial or
whether it’s in the context of a real-world
evidence-based trial. What are we looking for? Well at it’s core, the
elements that we look for from a regulatory perspective,
what the objectives we have, as I listed here, really relate to both approvability
and labeling. We need to make sure
that the drug works in the way that it’s
reported to work. We need to have
substantial evidence that the drug is effective. Typically, we say
that’s too adequate and well-control trials, but what is an adequate
and well-controlled trial? Well it’s define in regulation, but those regulations
are not highly detailed. They give the general rules
of what we all would construe as trials that can be
reliable, pre-specification, appropriate analysis,
definition of the population, definition of the endpoint, things that would remove bias. All of those things
I think we would say would be elements that
if you were trying to take data from the real world and say that it strongly
supports something, you would want to say
those elements were met. I think we would
share that perspective on what adequate and
well-controlled might mean. The drugs benefits have
to outweigh the risks. That’s how we interpret safe. So we have to understand
the safety characteristics of the drug. And again, if we’re
using real-world data, how reliable is that? How much information
has been gleaned in those settings to assure
that we’re getting what we need? We also have to properly
assess the dose, the regimen, the safety
profile and the risks so that we can have
appropriate labeling, instruction for use
for the physician so that they can safely and
appropriately use the drug and describe the evidence
from clinical trial. So that’s the information
that we’re seeking to get from trials,
whether they be conducted in real-world settings
or in traditional, clinical research settings, that’s the information
that we’re looking for. So I did wanna go back to
some of the definitions and descriptions
I’ve seen recently on what are sort of
pragmatic trials. And I took two quotes, one
from Dr. Califf and Sugarman back in 2015, and one from Taljaard
and his colleagues from trials back
just a year ago. And the comment there
was if the intervention works in the real-world settings so that it can be generalized
to everyday practice. The comment I’d make there and I think there’s a
similar theme from Dr. Califf which is informing
decision makers regarding the comparative benefits of
balance of benefits, burdens, and risks as the purpose of
these types of pragmatic trials. Well I think that does
make an assumption that traditional
randomized clinical trials don’t inform everyday practice, that results from such
randomized clinical trials are not generalizable. And I would challenge us all
to really think about that because there’s a sort
of general contention. It’s an introductory
to every article about pragmatic trials as well. Traditional randomized
clinical trials are done sort of in this
isolated ivory tower situation. Along the nice map that
Adrian showed at Duke which is a place I think I
would not wanna participate in a clinical trial ’cause
I would clearly get lost trying to get the–
– It’s better than DC. (audience chuckles) – [Peter] All right, score
one for Adrian on that one. But there’s a perspective that
there’s a sort of ivory tower type trial that aren’t
really real people. And then there’s
the pragmatic trials that are the real-world trials. And I’d say we should
pressure test that assumption ’cause I’m not so
sure it’s true. Certainly many years
ago, even Phase 3 trials were fairly
restrictive, age limits and very long, long
inclusion, exclusion criteria. Certainly more and more
we’ve been pushing sponsors not to have age limits
and to open up trials in all kinds of settings. And so yes, only 2% of
patients are participating, but are those 2% representative of the patients who
will get the drug? Is there a reason to
think that the information from a traditional trial that’s
used to support an approval isn’t going to represent the
response patients will have? Well, I think it’s fair to say
it won’t fully represent it. There are patients who
might not have participated, but I do think we have to
really think about that and just not accept
that as given, that these trials are not
somehow representative. And I think it
asks the question, what are we trying to find out when we’re thinking
about pragmatic trials versus traditional trials? I think we have to
also think about what underlies the differences. If we do see differences
between traditional, randomized clinical trials and pragmatic trials, what
underlies those differences? What are we trying to figure out about what those
differences might be due to? I also would just point out
the other comment that’s made which is from Taljaard who talks about how the intervention works under well-defined, highly
controlled conditions, referring to our
traditional trials relative to the pragmatic trials. Well, when trials are
less well-controlled and less well-defined,
what useful information can be gleaned from that? Certainly there are
pragmatic designs that can loosen many
of these criteria, but will that provide
reliable, robust information? I think that’s something
that we have to answer as we think about
the trial designs and the trial purpose. As we think about
these differences, we have to look at
all the differences that might occur as we broaden and use more pragmatic designs, adherence to the drug,
population studied, the co-interventions
that might occur, how patients are
monitored, the follow-up, the assessment of the end,
collection of endpoint and the quality and
reliability of the data which will be a point for
a lot of discussion today, I know, in future panels. I think we’ve shown
this diagram before, but it just points out
that we shouldn’t believe that we can think about
the traditional trial
designs on the left and the real-world data,
real-world evidence on the right and think about this
as one or the other, but really more of a spectrum. And I might point
out that in terms of pragmatic randomized
clinical trials, I think these really
also have a spectrum. We could think about
pragmatic trials more like sort of the
large, simple trials that we’ve thought about, where
there’s not a lot of excess collection of information, but really getting
right to the core of the critical questions
we’re trying to answer and collect very limited
endpoint information. And pragmatic trials where
we might think about adding additional elements such
as getting that data from health claims records or from EHRs rather than
from using case report form. So a wide spectrum,
even within the concept of a pragmatic trial. Well this is the,
I’ve sort of redone, if you will, the PRECIS diagram. That’s PRECIS-2 that was
published a few years ago and originally published
a few years before that. Sort of looking at the elements that make for pragmatic trials. And all I would say is that
as we think about this, I think we have to focus on
what we’re trying to figure out. If we broaden a
study population, change recruitment settings, do we know the patient
has the disease we’re trying to address? If we change the setting and
try to change the organization at the site level to make
it more patient-friendly, are we sure that we’ll
have well-controlled and reliable information? Will patient monitoring and
evaluation be appropriate? It certainly might be,
but we have to attend to that information. If we’re looking at
interventions that
are not blinded, we have to think about
the role of blinding. Will we be able to
adequately assure balance
between the groups if we have unblinded
information? If we’re not focusing
on adherence, do we want to
understand adherence as an objective of the study or do we wanna know
what happens to patients who are taking the drug? Those are questions
I think again, we have to ask when we think about what the
research question is. Are we able to detect
the efficacy endpoints and get adequate
safety information if the monitoring
isn’t in some way well-defined and assured? The outcome, are we sure that
it’s going to be accurate and reliable if
we’re translating from claims data, for example, into data that we’re using
to construct an endpoint? Will that be reliably assessed? And how well are
the analyses going to account for
missingness of data if we’re using a
pragmatic design and patients are moving in
and out of healthcare systems? Are they doing that
in a systematic way or are they doing that
in an imbalanced way? We have to understand that
if we’re going to do analysis of data that isn’t generated
for research purposes, but for use in other settings. I’ll skip through
this ’cause I think I’ve made these points, but I just wanna
end on commenting on some of the challenges
of pragmatic trials. And the message I think
I would wanna deliver is we have to think about what
the purpose of the trials is. If it’s for regulatory purposes, then the elements
have to be assured that allow us to assure
that the data is persuasive. So is the design
consistent with the purpose? If it’s for regulatory purposes, we have to be sure
that we’re able to assure that the
data is reliable and that we can use this to
make important decisions. If we use a broader
patient population, are we assured that
we’re getting patients who have the disease, that we’re targeting the
right patient population and not bringing patients who
might not have the disease or the indication? Are the interventions consistent with what we’d see
in clinical practice? If we’re not using blinding,
do we have the right kind of design? Are we using
objective endpoints? Are we assuring that
co-interventions are balanced? Are the endpoints meaningful? If we’re gonna extract
data from claims, are we actually measuring
what we think we’re measuring as the endpoint? Is the data reliable? Are we assured that
it’s coming from a source that is properly accrued
and that it’s translated into the datasets for analysis
in an appropriate way? And is patient follow-up
sufficient and appropriate to assure that missing
information, if imbalanced, isn’t confounding the results? These are the challenges
I think we face as we think about
using pragmatic trials to assure that we
can rely on them if they’re for
regulatory purposes. Well, I’ll stop there and
thank you for the invitation, and I’ll look forward to
some further discussions on all these issues
as we go forward. – Peter, thank you very much. (audience applauds) I’d like to thank
both Peter and Adrian. We’re gonna go right
into our first session. Again, for additional
detail on the issues, the key issues related
to randomization, real-world evidence,
pragmatic clinical trials, please look at the
information on our website from that summer workshop. This session is on
establishing a high-quality real-world data ecosystem. And come on up, panelists. As you heard in those
opening presentations, randomized real-world
evidence studies as well as observational
real-world evidence studies both depend on an ecosystem
and infrastructure to support the capture of
well-understood, quality data and its curation into fit-for-purpose
datasets to inform decision-making about
medical products. So systematic approaches can
help ensure interoperability while maintaining
patient privacy to realize the
potential of aggregating these diverse sources,
increasingly rich and large sources
of real-world data. In this session, because
of the importance of the data issues and the data ecosystem,
we’ll hear about strategies that some stakeholders are using to enhance data and
support the vision of an effective
real-world data ecosystem. The panelists include
Adam Asare who’s director of technology for
the University of California, San Francisco
Breast Care Clinic, and he’s also chief data officer for Quantum Leap
Healthcare Collaborative. Wendy Rubinstein is
deputy medical director of CancerLinQ which
is an initiative in the American Society
of Clinical Oncology. Nancy Yu is the CEO
and co-founder of RDMD. And Kevin Haynes, principal
scientist at HealthCore. So we’ll hear from
all four of them, starting with Adam. So please come on up. – Great. I’d like to thank the organizers for having me present today. I’m here to present
our approach at UCSF in terms of developing a
real-world data ecosystem. So as has already
been presented, there have been a
lot of challenges in doing source data capture in the management of
clinical trials data for Phase 2 studies such
as the I-SPY 2 trial which is a randomized, Bayesian
adaptive platform study. We currently have over
1,800 patients enrolled in that platform
across 18 sites, along with Athena program
which has over 100,000 women enrolled across the
UC centers and Sanford. What we’ve approached is
a unified architecture and platform that
envelopes or encompasses all stages of treatment
in the Breast Care Clinic, in terms of the different
stages and programs. And then creating
electronic platforms, bring your own mobile device, for patient-reported outcomes, providing discrete data capture
by clinicians and providers, and also providing
seamless integration of those elements for flow into
case report forms for study. And the way we are working
towards incentivizing providers is providing decision
support tools at the point of
care in real-time. So we have dashboards,
and then allowing for services that would
typically be covered by EHRs, to have those
seamlessly integrated within the platform itself, and having this continuous
learning process of quality improvement
so that there really is less and less siloing between
the actual trial work that’s being done and
the clinical processes. So one of the areas
that we’ve been told is that the EHRs themselves do not provide adequate
decision support. So one of the key takeaways
is what we hear constantly is where is this patient
in the order of treatment? So what we’ve created is really
clear, easy to use visuals in terms of where this patient is in their continuum of care. Another example is the nurse navigators’
involvement: they don’t have a way in the system, so they are offline,
managing the process as to where lesions or
biopsies are being managed. Being able to create
an integrative process for this data to be directly
entered at the point of care, and then having
providers or clinicians just simply sign off on
that and to visually see that they can easily access this and sign-off later on
down in the process. So what this allows
is the clinicians have more direct access and more time with
the patient as opposed to reviewing and manually
entering this data downstream. So our approach is to create
structured data elements that are integrated
in the EHR system. We’ve been working with UCSF IT; there’s definitely
governance issues, but there’s also technology
limitations as well. We’ve actually been working
as part of our FDA CERSI grant to actually create inroads. We’ve got an overlay of
forms that are seamless and work directly
within the EHR system. We’ve created a series of forms and within therapeutic areas that are clinically oriented. And we’re using
validated instruments for patient-reported outcomes such as the PROMIS
Health Measures and the NCI PRO-CTCAE surveys. Then all this is now
integrated in our repository that we’re calling the
OneSource data store. And then this in turn
can help support a number of different users
downstream in the process. What we’ve done in terms
of, we’ve had a number of different consultations
in terms of evaluating our clinic process. So there’s been a lot of
redundancy, repetition. What we’re hearing
over and over, people can’t find the data. There’s inconsistency in
terms of what is the source. So if you’re trying
to create source data, what is the source,
when it was collected and who signed off on that? So by streamlining our
process in the future state, it’ll be much more seamless so that we won’t be hearing
those exact questions over and over again. And we’ve prototyped
these systems in a number of different areas
within the Breast Care Center and they’re very well
received by the clinicians and the providers. So this gives you a visual in
terms of the different stages. So it’s not just enough
to capture this data, but where are we in the stage
in terms of who is creating the source of this data
across these different stages of the continuum
of clinical care? Who should just simply
review and sign off? So the clinicians would
be in that part, process of the nurse navigators
entering this information. And then downstream, there may be just
having this data locked in the clinical care process. And then when we actually
get to the research and the case report forms, there may be another
case of or another stage of simply reviewing this
data at the very end or supplementing that data
’cause there may be some other additional interpretation
or assessments that need to be done. So this is my one
technical slide. I’m the primary architect and
the technology individual. So this is the
last slide I have. So with our collaboration
with CERSI and the FDA, where we’re using SMART on FHIR for open-source
read access within our EHRs. Our goal is not to go
directly with the EHR to the case report forms, as we’ve discovered
in our first phase, a very limited amount
of data is discrete. We have some lab data, some
enrollment criteria data, but other than that, what
we’re doing is creating this intermediary quality layer. So we have the
decision support tools as this in-between layer. We’re using the
patient-reported outcomes as mobile devices. Then all this is then feeding
into the case report form, specifically this study. And then we’re mapping these
using the CDISC standards, and then we’re expanding upon the Therapeutic Area
User Guide for Breast Cancer, along with focusing
on adverse events, structured data, along
with the patient-reported adverse events data that
we’re also receiving as well. So thank you for the
opportunity to speak. – [Mark] Thank you very much. (audience applauds) – Good morning. Thank you for the
opportunity to come here and speak about CancerLinQ and some of the issues
with interoperability and especially about solutions that ASCO is coming up
with in the name of mCODE. So let me first lay out,
let’s say, the problem here at least briefly. I think you’ve heard
all about the problem. So CancerLinQ is primarily
a platform for clinicians, participating
clinicians, oncologists to reflect on the
quality of care that they’re delivering
and to benchmark it against all of the other
practices that participate. So we enable them to, we
automatically calculate quality measures that
they submit to payers. And they can also benchmark
against patients like mine apps. We are working with
over 100 organizations, but we’ve already
aggregated information from over 50 practices,
large and small. And you can see on the right, there’s sort of the
types, academic, large health systems and
also community practices. So what’s different
about CancerLinQ than some of the other
real-world evidence EHR collection systems is that we work with
any and all EHRs. We don’t own one. We don’t have a favorite one. And so we think that allows
us to get into really the complexity of
care out there. I think it’s probably
too small to read, but we have 10
supported EMRs so far. And that really tells you that
we’re in the sausage-making of real-world evidence, especially not only
collecting and aggregating, but really trying
to harmonize this. So I’ll try to give you, and
of course we do curation. The other numbers here are
over a million patient records, of which about 15% of
them have been curated at various depths. So to give you a sense of,
you don’t have to scratch the surface very hard
to find problems. I’m only gonna discuss a
couple of the top-level ones. But one of the abstracts
that we presented at the ASCO Annual Meeting
by Bernstein, et al., was to look at the
de-identified and aggregated CancerLinQ data at
47 of these practices to look at the
consistency and variation of structured data within EHRs. So you know, usual
things, race, diagnosis, encounters, things that you
need to support clinical trials. And we found actually only
one practice used LOINC codes for their labs. That already shows you what
a problem we have here. We saw no standards
for medications
ordered or biomarkers. And as you start to look at, if you think about it,
having 47 practices, when you start to look
at unique lab tests, for example, neutrophils, we found 81 different ways
of describing neutrophils in 47 different practices. And the problem actually seemed
to be less for biomarkers, but that’s just
because there’s less structured biomarker data in
electronic health records. So I think that makes
the point there. The other point I wanna make
is really for genomic content. And here, the problem of course is not having the
information structured. But I’ll give you
two examples, EGFR and the BRCA1 and 2 genes. So for EGFR, if you look at
natively structured information in EHRs, only 1%, 1.7%
of all of the records for advanced non-small cell
lung cancer had structured data. But if we look at
our curated set, 85% of those records
actually had EGFR tests. The situation for BRCA1
and 2 is somewhat similar. Again, 1.5% of the records
have structured data. But when you look on a
practice by practice basis, only a third of the
practices in CancerLinQ had any structured
BRCA data at all. And actually the
results that we found were coming primarily
from two practices among all these 50 or so. Whereas we know that
the testing is going on because if we look at
curated records for breast and ovarian cancer, we find
5,000 BRCA1 and 4,000 BRCA2. So before I get to
mCODE, I do wanna hammer specifically on the points
for the genomic data because I think it’s a
little bit different now. Remember that the genomic data actually come off the
sequencer in structured format. It’s fully digitized. And as the information
percolates through aligners and all of that,
it’s still digitized. In fact, even as it gets into reports that
go to clinicians, those are XML
reports and the like, those are still structured. Then they get turned into PDFs and then they get
sent to the EHRs. They’re more or less
buried some place as a scanned document. And then we at CancerLinQ
have the pleasure of trying to curate it out. And it’s, of course, expensive, but we see quite
a lot of the data and quality fall through there. So I think the information
here is really to get through and to get it from the source. So if you look at what the
21st Century Cures Act says, we think that it would
help laboratories meet the stipulations
by providing that structured data directly
to health facilities. And that would facilitate
health information exchange without special effort
on the part of the user while avoiding the
Act’s prohibition of information blocking. So let me get to
the meat of mCODE. You know, mCODE is an
attempt on the part of ASCO working with many
stakeholders to develop and maintain standard
computable data formats. It stands for Minimal Common
Oncology Data Elements, mCODE, to achieve data interoperability and enable progress in
quality initiatives, clinical research, healthcare
policy development. You can go to this URL
and learn more about it. The guiding principles of mCODE are that it is
highly collaborative. It is iterative
case development. I worked with Bob Miller who’s the medical director
at CancerLinQ, along with a large
group of stakeholders to develop two
specific use cases, that’s just the start,
for the mCODE 0.9 release. The maintenance is
reductionist, it’s parsimonious, and it’s developed by users. And importantly, it’s a
non-commercial data standard. It’s not something
that ASCO intends to license and profit from. So to give you an insight
into how this works, there is a governance structure. You can see that there are
other organizations involved such as MITRE, FDA, ASTRO. There’s an executive committee. And for those of you who
are interested in presenting new standards or helping
to build this out, there is a technical
review group which maintains the
mCODE dictionary. And that will go through
initial use case development and then working groups as
well that you can read about. So I like this slide
because it gives you a sense of what the actual
data elements are. Patient-related,
disease-related, genomics, treatment outcome
and lab and so forth. You can look at this
on the website as well. And importantly, this
has actually gone through a balloting process for a standard trial use and has been approved. And it will come out
in the version 1.0 once all of the comments
have been resolved. So I think that’s pretty
much all the comments. Thank you for your time. – [Mark] Great,
thank you very much. (audience applauds) – Hi everyone, I’m
Nancy Yu, CEO of RDMD. Thanks so much for the invite. We’re really excited
to be here today. So we’re playing the
field of rare disease RWE. And I think there’s a
lot of work and learning still to be done here. So we’re still in
the early stages. And I just wanna
let everyone know that we have not
figured it all out. So a bit about RDMD. We’re building a platform
to help identify patients and generate real-world
evidence by really involving the patients directly because our whole mission
is to accelerate treatments for rare disease patients. And we think it’s
really important to bring them into that process and at the same time
be able to supplement that with really robust
curated data from EHRs, from medical records as well. And so we’ve built a
platform that is two-sided. One side of the platform is
directly engaging with patients. So we directly consent them. We directly work with them
and gather survey information. We take their opinions and
their patient-reported outcomes, and then we’re able to actually
get their medical records from all different facilities. So we have about
1,000 facilities that
we’re working with on the platform right now. And we get the full,
unstructured clinic note. And on the other
side of the platform, that’s where we’re actually
going into those records and taking that
unstructured data and transforming that
into a structured format. And that is not easy. This shows you a bit
of a visual around what the patient
side looks like. Patients will log in. They eConsent. They give us
permission to go access all of their medical records from any facility. And then on our end, we will actually have
clinical researchers and nurse abstractors that
will structure the information into a de-identified way for
external researchers to access. So we work with life
sciences companies. We work with
academic researchers in order to provide this data. And our whole goal is to share
this data for research use at no cost to
academic researchers and to promote that
open access of data. So patients can down their data, they can use it to
facilitate care, they can provide data
to their caregivers. And so the whole
problem in rare disease, as we all know, is
that we just don’t know enough information about
the condition itself. We don’t understand
natural history, and as a result, it’s very
hard to design clinical trials. So we’re trying to
get information from
both the patients as well as from medical records and aggregate that in
a structured format. So this is the data flow. On one hand, we gather
data from patients. This is demographic information,
PROs, medical history. And then on the other end, we have all of
these clinic records that we will aggregate
and then structure. So in our platform,
we’re de-identifying, we’re audit trailing everything. We have the source documents so that we can structure
the information. And then from there,
we can analyze the data and provide that to
different researchers. And so as we know, there’s a lot of different
data management challenges in rare disease in particular. So in rare disease there often isn’t a well-defined
standard of care. And that makes it very difficult because we’re getting data from so many different
facilities and hospitals. And how do you
normalize all of that? So for example, in
one of our conditions, in neurofibromatosis type 2, we have data from over
129 facilities right now. How do you harmonize
all of that information when doctors and
researchers are collecting this data very differently? Clinical outcomes assessments are also very variably captured. They may be subjective. They may be inconsistent even. And so how can you use this
data to support the validation of measures and outcomes when
there is that inconsistency and there is potentially
missing data? There’s also just a very
limited understanding of the conditions itself. So how do you interpret
complex clinical information from these charts without
that understanding of that disease? So often, we’ll work the
top researchers, KOLs, in each condition
to help us define what those measures are
that are very relevant and potentially useful to go
deep and to dive deeper into. And it’s just a note
that curated data or structured data does
not mean standardized. So as we’re curating data
we really need to understand what those standards are and how we tie that
to different standards that already exist. And lastly, this is
the problem in rare. Patients are dispersed
throughout the country. And so if we’re working with
data from 150 facilities, that’s great because we can
get that quantity of data. But again, going back to
the original point around there are no standards, that
makes it extra difficult. So this is the flow of
data on our platform. We get a patient eConsent. We get their permission
to get their records from their hospitals. And all of this is on
our digital platform. And then we request the records from all the
different facilities through an automated fax API because all hospitals
are still using fax. And then we process
the records in house. So we organize this. We have an audit trail
around which processors are actually handling the data so that we can
check for legibility and make sure that the data’s
attributable to that patient. And then we abstract. So from there, we can
export and analyze. So I think it’s important to
touch on there’s two concepts, data quality control
and data relevancy. And so I’m starting with
data quality control which is almost
like table stakes. You need to have a
great audit trail and understand the providence of where this
data’s coming from. And so I won’t go
into detail on this. You can look in the slides. But at every step, we have
quality control measures to ensure that we know
where the data’s coming from and we know where it’s gonna go. And on top of that, abstractors, we have nurse abstractors
who will look into the data, look into the unstructured
notes and actually extract out information on
our software platform. But all of this is done under a centralized research protocol so that we can
actually make analysis and look at insights
across different diseases. So this is what it
actually looks like. You can see that there’s
pre-programmed forms that we develop with
these top researchers and life sciences companies. We have a structured
output that can be instant from the structured data
that we’re collecting. The nurses will capture the
data from the medical charts and we can actually link the evidence from where it
was captured in the record. Data relevancy on
the other hand, we’re basically growing our
clinical module library. And this is not easy because
there are a lot of standards currently in rare that
I’ve listed on the bottom. But not all of them are, they don’t exist
for every condition. So we’re still building this out and we’re just making
sure that we harmonize and that all of this
data maps to or conforms with all of these
other standards. The last point
around data relevancy is high confidence
in real-world data really requires
you to triangulate different pieces of
information from the entire medical history of a patient. And so when we’re
looking at data sources, sometimes in the record
there’s missing data. And so I won’t go into
too much detail here, but we’re tiering our
confidence of the data so that we can flag that for
different purposes of use. If it's being used in a regulatory setting, we need to know that we don't have the source information to back up a claim.
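As a rough illustration of that tiering idea, something like the following could flag how much weight a data element can bear for a given purpose of use; the tier labels and inputs are hypothetical, not the platform's actual rules.

```python
def confidence_tier(has_source_document: bool, corroborated_by_patient: bool) -> str:
    """Illustrative confidence tiering for a single data element."""
    if has_source_document and corroborated_by_patient:
        return "high"    # could support a claim in a regulatory setting
    if has_source_document or corroborated_by_patient:
        return "medium"  # usable, but flag it and follow up
    return "low"         # no source information to back up a claim
```

And the last point is just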
that we believe that patients are really partners in this. Patients can help us
minimize missing data and they can also be recontacted
for future follow-ups. And so they’re really key
partners in all of this because we can actually
go back to them and ask for missing
reports for example. Genetic reports aren’t
always in the EHRs. So we can go back to the patient and ask them for
more missing data. And then we can also recontact them for future
research opportunities. So I might have a couple,
maybe 30 seconds left, but I’ll just show a
case study of the depth of information that
we’re capturing. So this a data from an actual
patient with Hunter syndrome. We were able to look at
early natural history, diagnosis, and management
of the condition from the patient’s
full medical record. And we can update this
medical record over time. So it’s really important to
look at the entire context of a patient’s chart. For this patient, we were able to get
information from birth. They had a normal
newborn screen. And then at five
months, I think, they started having
chronic otitis media, had their first PE tubes placed. At four to 17 months, they had surgical evaluation
for hiatal hernias. And here, they started
losing their hearing. And here, you can already
see the first signs of suspected MPS. And so then this patient goes through the diagnostic journey, ordering the urine GAG tests. I won’t go through this
in too much detail. But then management of
the condition begins. So this patient’s
taking Elaprase, the dosage, the frequency. And what’s interesting here is we can see that
neurocognitive decline as well, measured in
validated scales. So I’ll end with just
it’s important to valuate all of this in context
because you need to know the full history of the
patient in order to make sense of the information. So when we were looking at the Wechsler’s
neurocognitive decline, we actually noticed
that in the note, the doctor’s note,
that they mentioned that this patient may have ADHD. And so that might have affected
the measures of that score. So it’s really important
to know those things before you use them in
a regulatory context or to base real-world
evidence decisions. So thank you. – Thank you. (audience applauds) Next is Kevin. – So I’ll thank the
organizers for the invitation. And I’ll say that, yeah, we’ve
been at this for a number of years and I think we still
struggle with data quality as well as application across. You know, so as epidemiologists, and many of us in the
room are epidemiologists, we’ve been utilizing
real-world data to generate real-world evidence
for quite some time, right. While John Snow’s
data partners Lambeth and Southwark, Vauxhall might not have utilized
a common data model with a standardized set
of verification checks, the assessment of reliability
and reliance of the data was of paramount importance. And that was certainly
utilizing real-world data to generate real-world evidence. And really because of
this data curation process being a dynamic process, I think we heard from
many of the panelists that close coordinated
and sometimes personal
relationships, as I think many of
us in the room have, with data partners is necessary. This is necessary for
both the ETL phases as we extract data into
research-ready environments, as well as then the
study-specific fit for purpose experiences. And so I think that
as we, as researchers and partners in
this research focus on stated research questions, really understanding
the reliability of the real-world data source, it’s applicability to
answer that question and understanding all of the
pieces of data conformance, completeness and plausibility to address that
research question is of paramount importance. And I think it’s a testament
to the Sentinel crew. Which I think this slide
will have to be updated as of Monday to have
the operations center and the innovation center and really the
collective thought of
multiple organizations coming together around
that central theme of creating and curating. It’s not just done at
curation and standardization into a common data model. Which does afford us
the opportunity to run standardized quality assurance
packages across these. There’s a whole lot of
effort that data partners do to get us to there. And that’s why our data partners are also really research
partners in this journey. And so you know, we
see people registered before they’re born, claims after you die, elevated circumcision codes in
females of reproductive ages. Why? Is that real? Why, how does that
happen, right? And so are these failures
indicative of poor quality, or are they reflective of the
real-world data environments that we live in? And I obviously
posit the latter.
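For illustration, checks like the ones just described are simple to write down once the expectation is explicit. A minimal sketch, assuming a claims table with hypothetical column names:

```python
import pandas as pd

def plausibility_flags(claims: pd.DataFrame) -> pd.DataFrame:
    """Flag rows that violate basic temporal or clinical plausibility.

    Assumes columns: birth_date, death_date, service_date, sex, procedure_code.
    A flag is a prompt to go back to the data partner, not proof of bad data.
    """
    flags = pd.DataFrame(index=claims.index)
    # Registered or billed before the person was born
    flags["service_before_birth"] = claims["service_date"] < claims["birth_date"]
    # Claims after the recorded date of death
    flags["service_after_death"] = (
        claims["death_date"].notna() & (claims["service_date"] > claims["death_date"])
    )
    # Sex-implausible procedure (illustrative circumcision code in a female member)
    flags["sex_implausible_procedure"] = (
        (claims["sex"] == "F") & (claims["procedure_code"] == "54150")
    )
    return flags
```

And so data owners really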
do understand the context. And so this is the slide of sort of the curation review process available out on the
Sentinel website. This is an iterative process. The team goes
through this monthly. I think we just got our
email that we’re approved. We celebrate every time we get
one of these ETLs approved. But are we done
at approval stage? I mean, absolutely not because
we have to be thinking about each query that we run and its ability to
be fit-for-purpose. As data partners, and I would
submit while I understand EMR data, I would partner
with EMR data folks to really understand
their curation practices. We understand and can
characterize our missingness. As a health plan,
we might understand, “Oh, based on this
benefit design, “this is an HMO capitated plan “and that’s why they
wouldn’t be billing us “for those types of things.” And so you get that
without that relationship with why the data was curated
in the first place, right? I like to say that
claims is really nice because health systems
like to get paid and payers like to
know who they’re paying and we like to know
what we’re paying for. So in theory, that should
be really good stuff. I’ve been a pharmacist in the
past and generated claims. I submit everybody
should generate data at some point in their life. (audience chuckles) I think that, you know,
recognizing that the real-world presents several challenges
for data curation and things across. You know, we go through
this curation process in several sort of Level
1, Level 2, Level 3 Checks. All of this is out there in
the Sentinel documentation and matches a lot of
the similar discussions with regards to the panel. But I’ll go through
a couple of examples. So I love zero to
two-year-olds because they, and so dovetailing
on the last example, they practically
have study visits ’cause they come back for
these well-child visits. But you know, even
in our real-world, my 21-month-old will have his
18-month visit this month. Right, so understanding, oh, right context. The first couple of
visits get pushed based on insurance stuff. You’re pushed into this,
even what you would assume would be real-world,
very regulated visits can get pushed. And while it’s plausible,
jumping into an EMR world, it’s plausible that
I could be five-foot. And in fact, in my Epic record,
I am five-foot at one point. And the doctor came in thinking that we were gonna have an
entirely different conversation because about 10
inches of my stature were going to be lopped off. So you know, understanding
the longitudinality of data and being able to
correct over time, but then you need that
sufficient amount of time within a given data
environment to help augment and correct those
types of things. And so we go through
this constantly. I think there was a slide, and I’m probably
getting ahead of myself ’cause I have some of
these other examples. I thought of different examples
on the train ride down. That’s what happens when
you have this long career of living with examples. There was this slide, I think
many have seen it in Sentinel where there’s this nice trend
of all the numbers in Sentinel and then there’s this big drop. Well who’s that big drop? It’s us. Is it a big concern? Well actually, we were able
to de-dupe across our 14 plans. So the numbers dropped
but person time increased. And we all like
person time, right, because that’s gonna allow us
and afford us the opportunity to follow populations
longitudinally over time. And in fact, just the other day, we were running queries
for privacy preserving record linkage and found
another 51,000 people who are the same human with
more than one ID, right?
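A minimal sketch of that kind of de-duplication, assuming enrollment spans keyed by member ID and a linkage table mapping member IDs to a common person ID; the column names are hypothetical:

```python
import pandas as pd

def deduplicated_person_time(enrollment: pd.DataFrame, id_links: pd.DataFrame) -> dict:
    """Collapse member IDs that belong to the same person, then recount
    persons and person-time. Columns assumed:
      enrollment: member_id, start_date, end_date
      id_links:   member_id, person_id (output of record linkage)
    A fuller version would also merge overlapping spans for the same person.
    """
    merged = enrollment.merge(id_links, on="member_id", how="left")
    # Any member ID the linkage did not map keeps itself as the person ID
    merged["person_id"] = merged["person_id"].fillna(merged["member_id"])
    days = (merged["end_date"] - merged["start_date"]).dt.days
    return {
        "persons": int(merged["person_id"].nunique()),
        "person_years": float(days.sum() / 365.25),
    }
```

So things that will continue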
to sort of plague us as observational researchers as
we help to narrow this down. And I think the most important
pieces of all of these is the ability in an
identifiable data environment to conduct validation
studies, member surveys, provider outreach,
and of course, linkage to close
these gaps in data. And so that’s what gets
at a lot of the abilities to really dig deep into the
claims data to understand its providence and its
ability to identify these cohort, identifications
and really run the descriptive analyses
that are necessary. And so with each
query that we run, we put our researcher hat on and understand why they’re
asking that question and what types of things
might be incorrect about that from code lists to data
anomalies, et cetera. And so I’ll close, I
sort of opened with this, but this is really building
off of the frameworks that have come out of the
Duke-Margolis collaborations and we do this somewhat daily. – Thank you. (audience applauds) I wanna thank all four
of you for your comments. And we’re gonna have
a chance for comments and questions from you
in just a few minutes. There are microphones
in the room. But maybe I could
start off by observing that data curation and
data quality control seems like a lot of hard work. And I'm wondering if this is
getting any easier over time or what your sense is about
how much you can build out the quality control
within a data ecosystem. So I think Wendy, you talked
about starting with some of the key data
elements in mCODE. Reminds me of Adrian’s comment
from the thrombolytic studies in Italy in the 1980s. Start with the most
critical elements, get those right,
build out from there. That seems like a good strategy. As you all develop
standards and tools to help clean the data, that’s work
that you don’t have to do again. Does this get easier? Are there any strategic
insights you have about how we could make
more progress faster on getting to a reliable
fit-for-purpose data ecosystem? Anyone wanna? – Well I’ll just say briefly, maybe the goal is not
to have to curate. In other words, to get
it right at the source and get it in a structured way. – Yeah, I mean some of the early distinctions between traditional
clinical trials and sort of real-world
data, real-world evidence, was well, for real-world
data, real-world evidence, you’re using data generated
in the course of care. But you’re not just using that. I mean, you do a
lot of work with it to get it to the point, at
least under current systems where it’s usable. – Yeah, so I mean,
one place where it’s, well it’s certainly gotten
easier since John Snow. But you know, I think that
one place where it begins to close gaps is
I think the step that we made as community
from going to Sentinel to PCORnet and from HMORN and other structured data
repositories to Sentinel. I think that evolution
curve has certainly assisted in sort of
getting those checks that can be routinely curated. As we as claims
partners begin to link to clinical data, then
we can begin to automate the process of validation
and those types of things. And so when you
have a really nice, say breast cancer registry or something that you can
then link into claims, then you sort of can at
scale do validation studies besides the sort
of 100-chart study here or there type thing. So yeah, I think through
the process of linkage and through the
process of automation it has gotten easier, but I think that’s opened
up more opportunities to go, “Oh, I need
to check that now “and that now and that now.” – I think in our
world in rare disease, it’s also very different,
so I’ll just preface that. But there’s not even a place
to get clinical information in one place in rare disease. And so you have to go to
all the different hospitals with the different formats to even understand what
the disease might be. But one thing we’re
doing is we’re creating these clinical modules
across diseases that can be usable across
different diseases. So whether that’s standardizing
or developing particles that can be broadly used
across different conditions, that definitely gets
easier over time. With every research
partner that we work with, it becomes easier because
we can leverage what we know from how to curate data from
another disease in audiology. I mean, audiology assessments, we have a clinical
module around that. That’s highly applicable
to many other conditions. So I think it does
get easier over time, but it is indeed a lot of work. – I wonder if another key
issue for the ecosystem is interoperability and that’s
been famously in the news over past decade, past years
as a lot of federal efforts. Is that getting easier as well? Any thoughts you all have
on breaking down the silos between some of
different data sources that you’re trying to pull
together for these studies? – So I can comment on, with our experience,
we’ve discovered that there are some technology
barriers, obviously, but the technology barriers are really not the
rate-limiting stuff. It’s really the institution
and their governance policies. So a lot of it is getting by it and having to get
with the bureaucracy within each of
those institutions. So I don’t need to
name EHR vendors, but with any institution,
with any vendor, the technology may
be there in place and we may have some
open source standards, but that doesn’t necessarily mean they’re gonna
flip the switch to allow you to access that data and definitely, if you want
to go through the hurdle of writing back. So we’re in the
process of actually trying to reengineer the process and some of it requires creating customized
templates and forms. Being able to go to that
and have that data written back to the EHR source
has its own hurdle as who actually wants
to own the source and where was that generated? We would like to say secondary
systems could do that and then by having
that agreed upon, we get a lot more
interoperability because then we aren’t reliant
on one monolithic system or several 800-pound
gorillas in the environment. We can actually then have a
number of different third party tools that are agreed upon. And then if we’re just
looking at the data and not the bureaucracy,
we can actually have some more transparency
and seamless integration. – I think where you’ll
see it is that you know, as you have less and less
residents and fellows doing research projects,
sitting in front of one computer with another computer,
entering in information from an electronic medical
record into REDcap or something, to do a research study. That will be your barometer
that things are at least getting somewhere
more interoperable. As a patient, I don’t think I
wanna leave my health systems because at least then,
everything is all there. I need another health app like
a hole in the head to manage. So there’s gonna be a more
need, you know, maybe there’ll be that Mint.com type approach
to have the app of all apps, to have that one place. – From what you said,
Adam, it really resonates in terms of it’s not some, I mean, there are
technical issues, but it’s not primarily
a technical problem. It’s a societal problem. And much of it is attitudinal. You know, I think increasingly
health systems recognize that they need aggregate data
with other health systems. You know, in the rare
disease world it’s obvious that sharing needs to go on
pretty much with everybody. And even within
the cancer world, you know if you slice and
dice and say that you have a non-small cell lung
cancer, but it’s EGFR mutated and then on and on and on, you know, there’s really
an absolute requirement for broad data sharing. And yet, it’s the
institutional aspect of it that limits it. – Great, I just wanna remind. So we do have a couple of
microphones set up on both sides of the room. If anyone does have a
question for the panel, please head on up. We do have a few
minutes for that. But in the meantime, I’ll
pick up on the discussion, the points that you all have
just made by what might make these efforts for
interoperability and
bring the ecosystem together more, sustainable
or more self-sustaining. And wonder if, you know,
a lot of our focus today is on regulatory uses
of real-world data for real-world evidence. Are you all finding other
kinds of applications in the work that you’re doing
around quality improvement, safety, other activities within
healthcare organizations, around helping patients
maybe understand their disease course, around issues with payers? How reusable is
the infrastructure that you’re creating? – I can comment on that. I think there’s definitely
a lot of use cases because companies are looking
for natural history data to even design their trials, much less submit that
as a support to the FDA. We’re seeing
healthcare utilization become really interesting because for many of
these rare diseases there’s not even a claims
code, an ICD-10 code. And so looking into
the hospitalizations, looking into the utilization
of certain assistive devices and things like that
becomes really useful too. So those are two areas
that we see a lot. – I think as you have, money
tends to drive quality of data. I remember that when, you
know, back in the ’90s it didn’t matter what
the day supply field was because it was never audited. So it was a random
number generator of where the pharmacist’s
hand hit the keyboard, right? But once that became
an auditable field, like if you put
in a 30-day supply for a once-weekly Fosamax, that was grounds for
rejection of the claim because seven times four
is 28 and not 30, right?
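As an illustration, an edit like that is just an arithmetic consistency check; the field names here are hypothetical:

```python
def days_supply_consistent(quantity_dispensed: int, doses_per_week: int,
                           days_supply: int) -> bool:
    """Check that the billed days supply matches the dispensed quantity.

    A once-weekly tablet dispensed as 4 tablets covers 4 * 7 = 28 days,
    so a claim billed with a 30-day supply fails the check.
    """
    if doses_per_week <= 0:
        return False
    expected_days = quantity_dispensed * 7 // doses_per_week
    return expected_days == days_supply

assert days_supply_consistent(quantity_dispensed=4, doses_per_week=1, days_supply=28)
assert not days_supply_consistent(quantity_dispensed=4, doses_per_week=1, days_supply=30)
```

So I think as you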
tie things to money and maybe as things shift
into the value-based designs and value-based frameworks in
order to adequately compensate for care delivered, the
integration of clinical data with claims data will
help to close the gap. And then because that
real-world gap closure happened, that data will then be available for high-quality
evidence generation. And I think the
whole system will, ’cause it’s not a learning
healthcare system. Every time I see learning
healthcare system, you gotta put a
parentheses, S, parentheses at the end of that
because there are hundreds of healthcare systems. – Well even within a
single healthcare system, I think there an average
of 18 different systems they need to integrate. – Adam, Kevin mentioned
this shift to value. And then from talking to some
of your colleagues at UC, they talk about the
systems movement into accountable care
provider approaches as being a driver too. Have you seen that have
an impact on the work that you’re doing? – In terms of? – Additional support, making
this more self-sustaining. I know that you’ve
gotten support for getting off the ground
from the CERSI programs, some of the steps like that. And just wondering if this
is, these kinds of efforts at some point are gonna
be more built into maybe not a learning
healthcare single system, but something that’s
more like that. – Yeah, so we’re starting off
at the Breast Cancer Center in terms of focusing on
the usability and testing in that environment. And then the hope
is to broaden it through the Athena
Breast Care Network which is already in the
five UC systems and Sanford. And then a big part
of it is considering for the I-SPY 2 trial,
consideration for
the next evolution of that having a real-world
data, real-world evidence arm for healthy controls
being actually, those that don’t
actually enroll, but are eligible, as the
healthy control arm being able to expand at the 18
sites there as well. But what we’re hoping
is to have the metrics and the usability
at the UC site first and demonstrate those
metrics of usability and to demonstrate
that this is a model that is effective that
can be more broadened. – And then expand it
out more widely, great. One of the things, and again,
anyone who has questions, I’ve got a lot, so
I can keep going. Oh, sorry please go ahead and
tell us who you are before you ask your question. – [Esther] Sure,
I’m Esther Bleicher and I was wondering,
listening to you about how there’s so much work about
the quality of real-world data and it seems like you’re
putting forth a lot of effort to making these
systems higher quality. Do you think that there
could be clinical trials using your data or
there wouldn’t need
to be a control arm because you would be able
to use your high-quality, reliable real-world evidence as that control arm? – And we’re gonna come back to
some of those kinds of issues later today when
we get to methods, but please go ahead. – I think in rare
disease it may be unique because in some
cases, it’s unethical to subject patients
to a control arm. And so in those instances,
it may be helpful, but it really depends
on that data quality and data relevancy. So it really depends
case-by-case. But I think there’s
certainly that opportunity, but we’re all
learning more about it from a regulatory landscape. – [Jenny] Hi, Jenny
Christian from IQVIA. And my comment really stems from some of the
earlier presentations. I think Peter challenged
us to think about the generalizability of trials and who they are
generalizable to. So I would say maybe we
need to be cautious there because some of I wanted
to share some work from a few weeks ago at the
Friends of Cancer Research where we were examining advanced
non-small cell lung cancer, looking at overall survival
in frontline therapies. And several of us have begun to apply inclusion,
exclusion criteria from a trial and found
that about 40% of people from the general population met those criteria. But your question is do
the results from the trial apply to the broader population? And we began to look
at overall survival, both at the broader
population which was far worse than the trial. And then in the 40%
sample it was much closer. And so maybe the
comment to you all is around generalizability
and to think about in a larger ecosystem
how that plays in. Thanks.
– Now you all are certainly thinking about generalizability with the breadth of patient
information, settings of care, and so forth that you’re
trying to collect. Any comments? – Well, I mean, you can’t
have generalizability without internal validity. So I think that there’s
the place for ensuring that you have high-quality
internal validity, so there always will be a
place to be living very close to the traditional RCT space. But then I think as we step
down the real-world data, real-world evidence space,
into the observational piece that’s going to, while
still attempting to maintain high-quality internal validity, you’re gonna begin to derive greater generalizability
over time. – It’s just hard to address
the generalizability question unless you take
care of the basics, like we’ve been talking
about in this session first. Just one more question
before we wrap up. Can you say a little
bit more about the role of patients in all of this? So Adrian emphasized
in his presentation the challenges from a patient
experience standpoint. Data curation and data quality
validation is hard work, but at least it’s not
on the patient directly for the most part. Although having a connection
and engaged patients helps you with follow-up
and validation, clearly, as you all emphasize. Any thoughts about how
important patients are or how much they could help with solving some
of these issues? – Yeah, absolutely. I think patients are, part
of it’s in the medical record which we can structure, but half if not more come from
the stories of the patients, at what point they
got diagnosed. We rely on them a lot. And when we talk about barriers and the bureaucracy
of health systems, by getting direct
consent from patients, you actually break
down those barriers. And that’s how we’re able
to just get information very quickly from all
different hospitals. In rare disease in particular,
it’s just impossible to integrate with thousands
of different facilities. Some of these
facilities don’t even have an electronic
health system. We’re getting paper boxes
in the mail occasionally. So it’s really important
to get patient input and marry that with what is in
the structured clinical data. – So Nancy has
kind of, is it B2C or it’s really C2B
business model? For those of you who are
more directly dealing with providers as
a source of data, is this gonna come together
with patients in some way? What would your thoughts be? – Well part of our Phase
2 project with CERSI is to look at the
patient-reported adverse events and look at alignment with
provider adverse events and see how much of this data, first off, at one
level, I don’t think it’ll completely replace that, but at least to see what are
the metrics of alignment? And then is there a point at which we can say
we’re confident, based on our data, that
patients can have much more of a direct source
of this information as opposed to relying
considerably on the provider having the final say
on their information. And also having it
in more real-time during the continuum of care as opposed to waiting
at those time points when their visits are occurring. – Great, well I wanna
thank all of you for a really
terrific discussion. You know, this is
a foundational area for doing effective, developing effective
real-world evidence. It is not easy, but
it’s really interesting to see the arc that’s
developing here for improving the quality of real-world
data, the reliability, the methods for doing so and helping to develop
that ecosystem. Thank you all very much. – Thank you.
– Thank you. (audience applauds) – Now I’m gonna make an
executive decision here. It’s about 10:34. We’re scheduled to
start again at 10:50. I’m gonna make that five
minutes earlier, 10:45, giving you all five
minutes more for lunch. So I hope people will
go along with that. But please try to be
back here at 10:45 and next panel up here at 10:45. Thank you. (audience chattering) Start the second session now. So if you could head
back to your seats. Those of you who are
joining us online, thanks again for
being with us today. We are going to
continue our focus on the importance
of data curation as part of a data ecosystem that supports
real-world evidence with this session two on
Curating and Assessing Fit For Use Real-World
Data Derived from Electronic Health Records. So this is gonna give us a
chance to get into more detail around the key practices
and emerging insights from a range of perspectives
on how to improve and evaluate the reliability
of real-world data for regulatory decision-making, especially around
electronic health records. This, as you’ve heard already, this work, improving reliability
requires careful thinking about how to curate data,
the steps taken to clean, to modify, to
transform, to track data to assure that it
is fit-for-purpose for regulatory uses. And again, we’re gonna focus
on electronic health records to illustrate these points. The following session, next
session is gonna discuss another source of
real-world data, and that’s data generated
by patients themselves. So we’ll get to that
in the next session. Claims data are also
an important source
of real-world data, but it’s a source
that, as you’ve heard, the stakeholder community
has a lot more experience with to date. So today, we are focusing on
developing and quality checking clinical data from
electronic records and patient-generated
health data. But you’ll also be hearing
about opportunities to link real-world data
sources, including claims, which will continue to
be an important aspect of real-world data for
regulatory purposes. So let me introduce our
four speakers real quickly. Keith Marsolo, the
associate professor in the Department of
Population Health Sciences at Duke University is gonna give the initial presentation here. Then we’re gonna have
a discussion that
includes Jeff Brown, associate professor
in the Department of
Population Medicine at Harvard Medical School and at Harvard Pilgrim
Health Care Institute, Kris Kahler, who’s
executive director and global head of
the Outcomes, Evidence and Analytics Group for
Real-World Evidence at Novartis, Shaun Grannis who’s director of the Regenstrief Center for
Biomedical and Informatics, Clement McDonald, scholar for
Biomedical and Informatics and associate professor
of Family Medicine at Indiana University
School of Medicine, and then Bob Ball
who’s deputy director of the Office of
Surveillance and Epidemiology at the Center for Drug
Evaluation and Research at FDA. So let's start with Keith for
his lead-off presentation. Thank you. – Great, so thanks and thanks
for having me here today. So I’m gonna talk about as
Mark mentioned, EHR data. So the examples
that I’m gonna give are gonna be taken from a
network that Adrian mentioned earlier today called PCORnet
which is sort of similar in sort of spirit and
philosophy to Sentinel in terms of the approach
where we’re taking data from centers around the country. They’re putting in
a common data model where we curate it and
then we run queries. This is sort of the portrait
version of the picture that Kevin showed in landscape
in his talk last session. So we do use a two-stage process to assess the data quality. And in the first stage is really what we call
foundational curation, to establish a baseline level of research readiness. That can also be thought of
as sort of the minimum necessary. And then we do
study-specific to ensure it’s fit-for-purpose
for a given question. But I think one of the things
that I’ll talk about today is really that this foundational curation
process is not static. We view it as a
continuous learning cycle. Continuous in that
we’re continuously
assessing performance. And then the other thing is
that we’re trying to close the gap between foundational
and study-specific curation so that we can add new data
checks based on study findings so we don’t have to
continually, cleaning it up at the study level. And so I’ve sort of highlighted
some of the key things that I’m gonna
talk about in bold. That wasn’t just
random formatting. There was a method to that. So why are we doing
foundational curation? I think for folks that
work with trial data or maybe claims data that
have been dealt with for years and years, you
know, you may wonder why do we need to do this? And so we heard
in the last panel that EHR data are messy. I think we all realize that. So we’re harmonizing
or standardizing these data for the first time. And so given the volume that
we’re having to deal with, it can be a little
bit overwhelming to try to tackle
both things at once. And so the graph
that you see here are essentially lab
data within PCORnet for each of the cycles. So within PCORnet, we ask the
partners to refresh the data every quarter and basically
every two quarters or every half-year, we sort
of group it into a cycle where they’re running
essentially the
same data checks. And so what you can see are
basically the growth of lab data over time within the network. The bars represent
the volume of results. And then the line is
essentially the number, median number of unique
LOINC codes that we see within the network. And so just a note about
the first bar there or the first cycle is we
were using the Sentinel model or a variant of the Sentinel
model in our initial lab data where we had an enumerated
list of lab tests. And so there was
only a few things that we could have in there. And so we relaxed that
with the later cycles which is why you see
the explosive growth. I think that the thing to take
home is we’ve got lab data that never was assigned
a LOINC code before. And you can see now,
we’re up to close to 10 billion lab results. So it’s hard enough
to just make sure that the LOINC code is correct and the values look
like they should, let alone, does it make sense for the population in question? And so we still have to do that. So this is another
example of study-specific. And so if we’re gonna run
a study on a population, do the variables, you
know, are they there? Are there issues with the data or is it sort of normal
practice variation? And so this is heart
failure patients. Looking at eGFR values, you could see there's
a nice distribution of the percentage of
patients within the cohort who have a value associated
with their record. It’s a question of
should there really be a nice distribution like that or should it be much
closer to the top? And so you can see at the end, on the right side of the figure, there’s a bunch of data marts that haven’t loaded data or they don’t have any values which you can figure
is because they haven’t loaded those results. But the ones that are sort
of more towards the middle where you’ve got between five
and 60% of the population with a lab value associated
with their record, is that normal? Is that standard practice or is there something
else going on there? So trying to answer
this type of question, if we go back to
the other example where we’re just looking at
1,500 different LOINC codes and 10 billion results,
trying to dig into that can be pretty complicated
for all the different things you wanna think about. So that’s why we still do
study-specific curation. Now last week, the Duke-Margolis
team released a white paper talking about essentially
can we start thinking about like a minimum set
of necessary data checks that we would wanna
apply to the data. And I certainly agree
that that’s something we should try and do. I think it’s a little bit
difficult in the sense that you wanna make sure that
you align your data checks with the purpose that you’re trying
to accomplish, you know, making sure the
data gonna be confirmatory, you know are they gonna serve as standalone
endpoints or outcomes? And I think the importance
is if the minimum threshold can’t be met,
what are you gonna use that dataset for? And so the example here, so these are some
of the data checks that we have within PCORnet. And so what you can see are
the different data checks and essentially the
percentage of data marts that are passing those checks. So the base check
that we have for labs are more than 80% of lab
results mapped to LOINC. And then you go down, do the 80% of labs that
are mapped to LOINC, do they have a lab result? And then do they specify
the normal range? Do they specify the source
and the result unit? And so you may ask
why we grouped things the way we did and there was
probably a reason at the time, but essentially,
enumerating all the things that you would want in
a complete lab value. Now I’ve gotten into lots
of arguments with folks in the network around, in particular specimen source, like why do we need
specimen source? It’s not useful. You’re never gonna
use it for anything. We actually needed it,
we’re gonna make sure that the LOINC code
that they assigned to the lab test is correct. The things that you need are,
you need the result units to make sure that
the result works and you need the specimen source to make sure that if
it’s a urine test, they don’t tell you that
there’s a blood sample, right? That’s a potential problem. And so what you can see is
what we tried to do in PCORnet, the figure, which would take
me 15 minutes to explain is essentially trying
to look at distributions of lab values
across the network. And we try to flag outliers. And so without having
the result units. So you could see at the bottom in sort of the highlighted area is a data mart that
has lab values. It’s supposed to be a percentage and the median is,
it’s about 35%. And the values that
they have are like .3 and .4 for their median. So you can think that
it’s probably a unit issue where basically
they divided by 100, but it might not
actually be that and so having those additional
fields are important for us to understand what
the data quality are. So I think again,
when we’re talking about the what’s the
minimum necessary, where you put that could
potentially disqualify people so I think it’s important
to just keep in mind. So the other sort of things
that I had highlighted, whereas you know, curation
as a learning process. And so this is the way
that we have been able to expand things over time. And so this was something that
came out of adaptable trial. If we refresh the data, and that’s great, then
there’s a date of refresh, but then the question becomes
sort of how up to date or how current are the data
in that particular refresh? And you can see in
the example there where there’s gonna
be some visits, and then there’s a gap. And does that mean
there’s no events or does that mean
there’s no data? And so what we ended up
trying to do is essentially create a calculation
that basically looked at a two-year period and then the first year
to try to establish the volume of expected records within the data. And then from
there, we could tell, like, what's the first month with any data and then what's the first
month with what we would say is complete data about 75%
or so of the benchmark. And then so that
can be a data check and we can use that
in our assessments.
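A minimal sketch of that kind of currency check, assuming a series of event dates for one data mart; the 75% figure follows the benchmark just described, and the details are illustrative rather than the network's actual specification:

```python
import pandas as pd

def refresh_latency_months(event_dates: pd.Series, refresh_date: pd.Timestamp,
                           completeness_fraction: float = 0.75):
    """Estimate how current a refresh is.

    Benchmark = average monthly record volume over the first year of a
    two-year lookback. Latency = months between the last month that reaches
    completeness_fraction of that benchmark and the refresh date.
    """
    window_start = refresh_date - pd.DateOffset(years=2)
    benchmark_end = (refresh_date - pd.DateOffset(years=1)).to_period("M")
    in_window = event_dates[(event_dates >= window_start) & (event_dates <= refresh_date)]
    monthly = in_window.dt.to_period("M").value_counts().sort_index()
    benchmark = monthly[monthly.index < benchmark_end].mean()
    complete_months = monthly[monthly >= completeness_fraction * benchmark]
    if complete_months.empty:
        return None
    last_complete = complete_months.index.max()
    return (refresh_date.to_period("M") - last_complete).n
```

But then I think the other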
thing to keep in mind is that this needs
to be continuous. And so these are some
results from January. And so this is showing the
variation by data mart, the latency for each
data mart is a bar. And most of ’em are
doing pretty good and then there’s some
that are not so great and they’ve got some
room for improvement. But the figure on
the right is actually for three data marts, showing their latency
values over time. And so you can see one
is doing pretty well. It’s pretty consistent
around two months. The other one
started around eight, and then they’ve
gotten to three. And then the one on
the far right has got, you know, it goes
from four to six, down to four to seven to four. And so I think this is sort
of the plug for data curation as continuous process
because if you just look at a point in time just once and then you think you’re done, you’re gonna get potentially
some wrong results in whatever type of analysis
that you need to do. So just sort of summarizing,
you know, data curation, it’s viewed as a process for
continuous quality improvement. We may not end up
with a single set of minimum necessary
data checks, and so maybe we need
a tiered approach. I think we’re all still
pretty new at this. And so I think as we develop
these best practices, we’re gonna have
to figure out ways that we can share
methods and results. But I think sort of the
final take-home is maybe, and Mark alluded to
this at the beginning is administrative claims
are sort of a known devil. We’ve spent years
understanding what’s good and bad about those. It’s gonna take
us time to develop that knowledge around
electronic health records. – Thank you very much, Keith. (audience applauds) Our panelists are gonna
give their comments from their chairs,
starting with Jeff. – So thanks Keith, so
thanks Mark and the team for inviting me. I love data quality stuff and I was wondering why. I think it’s something
because it’s a mix between science and art. And I promised myself
I wouldn’t say this, but you know what? In the early days, a
lot of data quality was, it looks funny to me. And that was the assessment. (audience laughs) And that’s not a great metric. Unfortunately, we’re still
there in a lot of the ways we do this. So I’m gonna try to frame
most of my comments towards what we should be doing. And Keith and Kevin
before kind of showed what we do now. And I think we know how to
make some advances there. I’m trying to think through
what the next generation should look like
is how I’m trying to, I need glasses to do this. So Keith mentioned data quality
being a continuous process. And that’s certainly true. I mean, the stories we
can tell of data sites, datasets that looked great and then after the 35th
refresh, it looks funny. These things happen. That’s just the nature
of the healthcare system. I think it’s also a continuous
process within each study. So the way we do
our research really, it’s a continuum of
you checking the data as it moves from, well,
I’ll just make it up, source to a data model. You’re checking it again when
you do your cohort extraction. Does this cohort make any sense? It says mostly women,
does that make sense? Yes, it makes sense, next one. And then you’re
actually doing it, in the output of the study, if you certainly look at, I
think most researchers do this. We’ve become more standardized
I think in Sentinel because everything’s public
and I think a lot of folks are doing this now,
is most of the output, if you look at a report
on the Sentinel website, most of that output is
data checking output. We won’t call it that. We’ll describe it, it’s
rates by age and sex and year and some trends. But you have to go, well
actually we did this on purpose. Now I’m having this flashback. Very early in
Sentinel, we on purpose didn’t put the
results until tab six. So you had to go through
all of the other data before you got the number
you really were looking for which was some rate, right? But we wouldn’t let
you see the rate. You had to go all the way
to the end of the table or all the way to the end of
that file until you saw rate. And we did that because we
wanted people to understand the data that got
put into that rate. So if I'm thinking about data quality, I guess I came up with two
kind of critical needs. It’s possible I just
came up with them this morning, so we’ll see. But if we wanna now build
a systematic approach, that’s what I was
trying to say earlier. If we’re moving from
a little bit ad hoc, and has a lot of people’s heads and some system to a
fully systematic approach, two of the critical
needs will be defining the actual metrics, what are we actually checking? So defining the
data quality metric. And then, this is kind
of a newer revelation that wasn’t from me, defining the expectation
of that metric. We expect it to be zero, one? What do we expect to see because having a
metric without knowing how to assess it
isn’t fully helpful. So that’s my thinking
about the future is working on those two pieces. And that’s something
the community can do together. It’s data model
agnostic, it’s EHR and claims and
registry agnostic. You can really develop metrics. You can develop the metric and then you have
to do it in detail and then describe what you
would expect out of that metric. We can start making
some progress. Most of that knowledge
is now in people’s heads. And I think that’s
part of the problem. Maybe a quick, I’ll
describe some of the work that we’ve just
finishing up with FDA, there’ll be a bunch of acronyms. This is a PCOR Trust Fund study
with ASPE now through FDA. But we called it the Database
Fingerprinting Project. Again, that wasn’t my
terminology, but I like it. And what we were doing is
developing a software package that allows people to
author a metric in detail. What do you mean? Per person per month is
actually meaningless. You have to be a little
more specific than that. So to author the metric and then a data quality
common data model. Huh, that’s agnostic to
any particular data model, PCORnet, Sentinel, OMOP,
doesn’t really matter. But is there a
data quality model that we can develop with metrics so that we can compare
across systems?
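As a sketch of that idea of pairing a metric with its expectation in a data-model-agnostic way; everything here is hypothetical, not the actual fingerprinting package:

```python
from dataclasses import dataclass
from typing import Callable, Optional
import pandas as pd

@dataclass
class DataQualityMetric:
    """One entry in a hypothetical data-quality 'common data model':
    the metric is specified in enough detail to compute it anywhere,
    and the expectation says how to judge the result, not just report it."""
    name: str
    description: str                  # e.g., "dispensings per person per month, ages 18-64"
    compute: Callable[[pd.DataFrame], float]
    expected_low: Optional[float] = None
    expected_high: Optional[float] = None

    def evaluate(self, data: pd.DataFrame) -> dict:
        value = self.compute(data)
        within = ((self.expected_low is None or value >= self.expected_low) and
                  (self.expected_high is None or value <= self.expected_high))
        return {"metric": self.name, "value": value, "within_expectation": within}
```

So these are all things that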
we can do as a community that I think is probably
worth undertaking. So in my last minute
or so, I think, and I’ll just
reflect on some EHR, kind of experience with EHR. And it goes back,
I mean I worked with Vaccine Safety Datalink and the Health Care
Systems Research Network and then Sentinel and
PCORnet and now others. The EHR data are just messy. I think everyone
agrees they’re messy. They’re obviously not
collected for research, but they’re not
standardized in anyway, unless it’s the data
for reimbursement. Those get standardized. There’s good reason for that. So you’re sitting in
a pretty messy space. Inside a single site,
you can make it work. When you go multi-site,
it gets pretty complicated because something’s
perfectly fine in site A and perfectly fine at
site B are different. So now you can’t
use them together which is kind of what we see. Clearly erroneous data
will never be fixed. People will have blood
pressures of 12,000 over 8,000. No one has any interest
in ever changing it. So the next refresh is gonna
have that same blood pressure, even if you change in the data. ‘Cause there’s no reason
to ever change it. So you’ll see all of this mess. Kevin will always be five-foot
at some point in his history. (audience laughs) He will. What he didn’t mention
is when his doc came in, he expected to be
talking about obesity because his BMI was way off. And that’s what happens. So we can probably
just leave it there. I think there’s a really
ripe field for this. And we can learn a lot from
what claims were doing, but the EHRs are
a different beast. And we have to figure this
out in order to do the work that we need done. – Thanks very much, Jeff. (audience applauds) Next is Kris. – Great, it’s a real pleasure to be here, part of the panel. I also had the
opportunity to participate in the work group that helped
prepare the white paper on data reliability. Which was actually a very
interesting experience, but also a bit
eye-opening for me because most of my experience
over the last 20-plus years has been working
with real-world data, but largely claims data. And only recently
starting to work with electronic
health record data. That experience has, in terms of being able
to really understand how important it is
that we get this right, it’s probably gonna be a
theme for the entire day that we make sure
we go out of our way to get this right in
terms of the data quality and data reliability. And I can’t support that anymore than I think anyone
else here in the room. But I do have some reservations. And some of the
reservations are around how do we do this
most efficiently? Mark, you asked the
question in the first panel, you know, this seems like
it’s very difficult to do. I think it probably is
very difficult to do, but I’m concerned that this could
potentially overburden our process for generating
real-world evidence. Which at its core,
one of the goals of using real-world evidence
was to make evidence generation as a whole more efficient. So my experience really isn’t
in the data curation process as Keith had mentioned, particularly around
the foundational
aspects of curation. I bring the perspective
here more as a researcher who needs to access data, needs to access some level
of quality or reliability of that data, and then ultimately
design a study that ideally is gonna provide
some meaningful information to help inform some decisions. So with that as a backdrop, I guess two main points
that I’d like to cover. One is related to transparency
and accountability around this whole process. So from an accountability
perspective, as a sponsor, if we’re
submitting real-world evidence to the FDA for decision-making, ultimately, we’re the ones
who are gonna be accountable to make sure all the
information in that submission is of high-quality as possible. Now unfortunately,
our experience, and I’m not gonna go into
too many details on this, but we certainly have instances
where this information from data holders
or data aggregators is actually quite
difficult to obtain, to really understand what
happened to each individual data element before it got into
this research-ready dataset that we now need to work with. And not only is it
difficult to obtain, but in some cases, it’s
been described to us as potentially even proprietary. So we can’t get access
to this information. So that, I see as a problem. So I think there is some
level of transparency, ultimately, that’s
needed for us to ensure that those of us that are doing
research with these datasets can feel confident with the
data that we’re working with to be able to draw
meaningful conclusions. Now ultimately, this
level of transparency between the data
holders, the researchers and ultimately the
decision-makers, I think is absolutely
necessary to ensure that we have a collaborative
research environment for drug development. And then so the second point
I wanna raise is around how much is enough? I mean, if we look
in the literature, we can see that
there are thousands of verification checks
that can be done on data. And we heard already from
Keith that this is not a one and done kind of opportunity. This is something that needs
to be done continuously. So now we’ve got the
dimension of thousands of verification checks. And now we’ve got the
duration, how often do we need to do this? And then another parameter
is when we start to think about the variables
that we actually use for a research study. So if you can think about
the example of you know, just an exposure in outcome. You know, clearly, I
think we would all agree, we need to get the
exposure and outcome right, absolutely, for a study. Question is, what about
inclusion, exclusion criteria? How much validation do
we need to do for that? There’s a recent study
published showing that it actually does
matter quite a bit to get the inclusion,
exclusion criteria right. But then what about
confounder variables? When we look at some of these
high-dimensional approaches, it’s quite possible you
can get into the thousands of variables just in a
propensity score model. So ultimately, I think it’s
the amount of information that’s in these databases, the complexity, particularly
if we start to bring multiple databases together, as well the number of
variables that could be used in a dataset that really may
beg the question, you know, again,
how much is enough? And will the FDA ultimately
be ready to review this volume of information
that may be coming their way? So I will pause there
and pass it over. – Thanks, Chris. (audience applauds) Great discussion which
we’re having, so Shaun? – Great, well thank you. It’s wonderful to be here. These lights are
incredibly bright, so you all should come up
and here and stare out. ‘Cause see, I think I see
some people out there. So I’m going to be talking
about identity management, patient matching,
record linkage. There’s lots of
different names for that. And I’m gonna be talking
about it in the context of our experience in Indiana, our experience in global
health where we’re working with different countries to
establish identity management. I’m gonna talk to you
about it in the context of the hundreds of
NIH-funded clinical trials that we’ve done using real-world electronic
health record data. Before real-world became
popular, I did a study. I didn’t realize
this, my first paper, and in about 70%
of all of my papers have the world real-world in it, words, real-world in it. So this is a topic near
and dear to my heart. Patient matching,
let’s talk about that. I do wanna talk to you
about the data quality, data model, that’s fascinating. And I think there’s
a there there. The United States is the
last developed country in the world without
a national patient ID. We are not going
to have a national patient ID any time soon. So patient matching techniques, processes, rules, metrics
need to be developed. In a fully robust
healthcare ecosystem, we standardize the physician. We have an MPI in this country. We standardize facilities. We facility codes. We know where patient
care was delivered. We standardize what kind
of care was delivered through clinical vocabularies, ICD, CPT, SNOMED,
LOINC, et cetera. That’s three of the four. We need to standardize
the patient. And we don’t do that yet. There is a gaping hole in our healthcare ecosystem right now, and that gaping hole is
around identity management. Now I can regale you
with the policy reasons why we’re not there. It has to do with
congressional funding. But congressional funding
has been a barrier, why? Because identity
management, patient matching is not just a technical problem. In fact, none of the
things we’re talking about here today
fundamentally, at their roots, are about technology. They’re about the
people underneath it and what value, what
meaning they apply to that. Identity is an incredibly, it is a lightening rod topic. Identity is linked to privacy, it’s linked to autonomy, it’s
linked to a lot of things, and so people care
very much around that. So it’s a complicated topic. Where are we today? We don’t have a
national patient ID. Everyone in the
country takes the risk and costs of creating
identities for their systems, whether you’re a clinic, whether you’re a large,
integrated delivery network, whether you are a
PCORnet research network, everybody’s doing
identity these days. There aren’t many standards
around identity today. So if you remember one
thing about my talk, remember that we need
to establish standards, common practices around
identity management because we’re not gonna have that unique patient
ID anytime soon. Whenever I go to a conference
and hear somebody call for a new set of
standards, I roll my eyes because the great
things about standards is there’s so many choose from. And another great, standards
are also like toothbrushes. Everybody has one, nobody
wants to use anybody else’s. (audience laughs) And so we need to, the place
that I’ve seen standards become successful is
when a group of people come together and recognize
the value in doing so. So we need to do establish
standards around process. So right now, I
couldn’t generate Keith’s graph of percentages. I couldn’t generate how
complete my dataset is. I can’t tell you
in my population, how accurate is my
matching algorithm. People don’t publish that. It’s hard to know. It differs in populations. Newborns are challenging. Adults are different. Public health data’s different. Linking in claims data
into clinical data introduces different
performance characteristics. We haven’t established
common metrics. It’s not hard, there’s only
about four or five metrics for patient matching, and they’re the common ones
that we all know about. But we also, in order
to improve matching, and we know there’s gaps,
we know there’s problems in the accuracy of
patient matching, we recently did a study. I call it measuring the
mass of an electron. Before we knew the
mass of an electron, we knew an electron had mass, we just didn’t know what it was. We did a study
recently where we said, “Hmm, if we improve the quality
of data used for matching, “we should see an improvement
in the accuracy of matching.” Makes sense. I don’t think anybody would
challenge that notion. Nobody had ever done it before. Nobody had ever measured
the mass of this electron, how accurate patient
matching was. We did a comprehensive study
within our information exchange that has hundreds of
different identity sources from lots of different
types of places. And we looked at standardizing different combinations of
our matching variables, the typical name, date
of birth, et cetera. What we found was that
standardizing address which can be a highly
variable field, and standardizing
last name improved certain performance metrics. It improved overall accuracy and it improved overall
sensitivity by 10
percentage points. That’s a big deal. We didn’t know how much juice there was going to be in that squeeze, but that’s important.
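As a rough illustration of the kind of standardization being described, here is a minimal sketch that normalizes last name and address before comparing two records. The field names and rules are hypothetical; this is not the speaker's actual matching algorithm.

```python
import re

# Illustrative only: simplified standardization of the kind described here,
# not the speaker's actual matching pipeline. Field names are hypothetical.
STREET_ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}

def normalize_last_name(name: str) -> str:
    # Uppercase and strip punctuation/spaces so "O'Brien, Jr." and "OBRIEN JR" agree,
    # then drop common generational suffixes.
    cleaned = re.sub(r"[^A-Z]", "", name.upper())
    return re.sub(r"(JR|SR|II|III|IV)$", "", cleaned)

def normalize_address(address: str) -> str:
    # Uppercase, drop punctuation, and map common street types to one abbreviation
    # so "123 Main Street" matches "123 MAIN ST".
    tokens = re.sub(r"[^A-Z0-9 ]", "", address.upper()).split()
    return " ".join(STREET_ABBREVIATIONS.get(token, token) for token in tokens)

def records_match(a: dict, b: dict) -> bool:
    # A toy deterministic rule: agreement on normalized last name, date of birth,
    # and normalized address. Real matchers weigh many more fields.
    return (
        normalize_last_name(a["last_name"]) == normalize_last_name(b["last_name"])
        and a["dob"] == b["dob"]
        and normalize_address(a["address"]) == normalize_address(b["address"])
    )
```

The point of the sketch is simply that cleaning highly variable fields before comparison is what moved the sensitivity number the speaker cites.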
So now we’re working with organizations like the Pew Charitable Trusts and others, like ONC, to think about how
can we introduce some sort of expectation
around standardizing, cleaning up our
data for matching? We’re at the beginning
of that journey. But keep in mind, and
I’ll leave it here, we have a gaping hole in our
electronic healthcare ecosystem around patient
matching and identity. And we’ve gotta start
going down that road to improve that process. So there’s lots of other things, and maybe we’ll take
some questions in
the discussion phase, but I’ll end it there. Thank you. – Great, thanks Shaun. (audience applauds) Bob? – Thanks for having
me on the panel. So we’ve heard a lot
about the challenges with quality in EHR data. And I just wanna begin my
comments by talking about the value that FDA has gotten
through the quality process that Sentinel put in
place for the claims data. Just as a reminder that
this is worth the effort even if it is a major effort. So FDA is now able to
answer numerous questions simultaneously in a matter
of weeks and months, rather than the
years it took before this system was put in place. But this ability relies
on a very sophisticated data quality assurance program with more than 1,400
data quality checks that examine completeness,
validity, accuracy, integrity and
consistency of the data. So it isn’t a small
lift, it’s a big lift. And this is done at every data partner, and it’s a vast operation.
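To make those dimensions concrete, here is a schematic sketch of the kinds of completeness, validity, and consistency checks being described, run against a hypothetical claims extract. It is illustrative only and is not Sentinel's actual check suite; table and column names are assumptions.

```python
import pandas as pd

# Schematic only: a few checks in the spirit of the completeness, validity, and
# consistency dimensions described above. Table and column names are hypothetical.
def run_checks(dispensings: pd.DataFrame, enrollment: pd.DataFrame) -> dict:
    merged = dispensings.merge(enrollment, on="patient_id", how="left")
    return {
        # Completeness: required fields should be populated.
        "missing_ndc_pct": dispensings["ndc"].isna().mean() * 100,
        # Validity: days supplied should be a positive number.
        "invalid_days_supply": int((dispensings["days_supply"] <= 0).sum()),
        # Consistency: a dispensing should fall within an enrollment span.
        "dispensed_outside_enrollment": int(
            (~merged["dispense_date"].between(merged["enroll_start"], merged["enroll_end"])).sum()
        ),
    }
```

A production system runs checks like these, in far greater number, at every data partner before the data are used.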
But what has FDA gotten out of this? At a very high level, having this ability to continuously monitor the safety of products is core to FDA’s mission. And FDA has done hundreds
of analyses over the last few years alone. And for example, the systems
provided important information about the use of opioids. It has contributed to numerous advisory
committee discussions and has, FDA’s found it feasible to study more than 18
different safety issues in the Sentinel system that
would have otherwise resulted in industry-required
post-marketing studies. So in sum, what’s been done
with claims data over the 10 years of the
Sentinel project is in a way, where
we’re starting. And maybe we’re starting
a little bit further along the line for EHR data, but it’s worth the effort. And just to echo, I think,
a few of the other points that have been already
made about EHR data, in terms of how FDA is
thinking about this. So FDA is working on identifying
the relevant standards and methodologies for collection and analysis of EHR data, as well as for submission
and regulatory review. The key with respect to EHR data that Shaun was mentioning, I think, in a way is going to
be a lot of the data linkage, particularly between
claims and EHR data and how this is done. Taking a little bit
more of a, I suppose, a regulatory
perspective on this, it’s already been mentioned
that in prior presentations audit trails and transparency are key. And I think FDA is very
much in line with that. And I think there’s
an expectation that documentation of
data accrual, curation, transformation and
quality controls will be essential
for the use of EHR and claims data for
regulatory purposes. And I just wanna, I guess, maybe in a sense put a plug
in for the afternoon session on validation because
I think validation, while we usually think
about it in terms of the higher-level features of a study, for example, the exposure or the outcome, it also provides an extra level of quality assurance, particularly in the context of EHR data. As we take the elements,
the clinical elements that are in the EHR and
combine them to, for example, identify certain outcomes. And we might wanna
think about validation in that context as well. That’s all, I’ll stop there. – Thank you. Thanks. All right. (audience applauds) A good bit to discuss. I think several
of you highlighted that for electronic
health records, we’re in the early
stages it seems like, around all of these
issues of data quality and reliability,
validation checks. And Kris said that, I think a statement that
all of you would agree with, that a high level of data
quality and reliability is critical for use of this
data for regulatory purposes, but we also need
to try to get there as efficiently as possible. And I’m wondering if we
could start off by building on some of the comments
that you all made about where we are
in this journey and perhaps how it
might be accelerated. As Jeff said, sort of
moving from art to science and maybe not all of that far
down the road of science yet. The Duke-Margolis paper
that we talked about, that several of you mentioned
about data standards and standard approaches to
these data quality issues envisions moving beyond
case by case uses. But it does seem
like we’re largely at kind of the fit-for-purpose at the individual
study level now in uses of EHRs. And I’m wondering if any
of you wanted to comment on your sense of where we are and what might help us get
to a more systematic approach as I’m struck by Bob’s
comment about how Sentinel, well, I remember the early days. Were there some early use
cases and fit-for-purpose was very much on
a one-off basis. But now for at
least a wide range of real-world evidence questions
that claims data network is used pretty much routinely with a pretty
reasonably high degree of certainty to understanding
of what kinds of analyses and conclusions that
the data will support based on the curation that’s
gone on in the network over the past decade. So any thoughts about, you
know, is that a good way to think about where
we are in this process for electronic health records and how might we
accelerate progress? – So just picking up on
the Sentinel example. So we have a couple of projects. Which really, I
could think in a way are these case-by-case projects, but they’re really
an attempt to begin to lay a foundation for
how one might generalize. So I’ll just talk
about two of them. To Kris’s point earlier
about like how important is it to have good
quality on confounders. Two of the biggest
confounders in many studies are BMI and smoking. But those data points
are notoriously
absent in claims data. So we have a project where
we’re trying to compare the amount of BMI and smoking data that’s available in PCORnet sites compared to what’s available in Sentinel for similar cohorts.
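A minimal sketch of that kind of comparison, assuming hypothetical cohort tables with bmi and smoking_status columns; it simply reports how complete those key confounders are in each source.

```python
import pandas as pd

# Illustrative sketch only: how complete are key confounders (BMI, smoking status)
# in each data source for a similar cohort? Column names are hypothetical.
def confounder_completeness(cohort: pd.DataFrame, source_name: str) -> pd.Series:
    return pd.Series({
        "source": source_name,
        "n_patients": len(cohort),
        "pct_with_bmi": cohort["bmi"].notna().mean() * 100,
        "pct_with_smoking": cohort["smoking_status"].notna().mean() * 100,
    })

# Example (hypothetical inputs):
# pd.DataFrame([confounder_completeness(pcornet_cohort, "PCORnet"),
#               confounder_completeness(sentinel_cohort, "Sentinel")])
```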
How does that turn into a generalizable project? Well one might think
that the starting point for saying, well which
are the features, which are the
confounders that we have to have high quality on? Well we think it’s smoking
and BMI for starters, but maybe there’s others. And the lessons learned
first that can be applied. And similarly on
outcomes, complex outcomes have always been
challenging in claims data. And anaphylaxis is one that
comes up often in drug safety. So we’ve done several
projects where we’re trying to get
information from EHRs. And we started out
at doing some pilots, but now we have a
project which is trying to apply NLP and machine
learning approaches, but also to build
into that particular, just for that
particular example, a generalizable framework for how one might
apply anaphylaxis, I’m sorry, NLP and
machine learning for complex,
clinical conditions. So we intend to
take that framework and apply it to pancreatitis
and in multiple sites to address the issues of
cross-site variations. So there’s some sort of
starting point ideas. – [Mark] Foundations, yeah. – The other thing, so I
do think that we can use what’s gone before to build that corpus of
what’s good enough. I think the other thing is
we have to not be afraid of setting a bar that
we can’t meet right now. I think there’s a tendency if you’re a part of a network, I think this is academically, we all wanna join a network and then we all wanna look
pretty good in that network. Like we all wanna be Lake Wobegon. And so I think we
have to realize the level of quality that
we need for decision-making, not everybody’s gonna clear
that bar and that’s okay. And there’s ways that
can mitigate that and improve over time. But it’s just that this is tough and so I think we just
have to acknowledge that because otherwise, we’re
just never gonna improve. We’re gonna continually
have that low bar that everybody walks over which
is probably not good enough. – So strategically, if
I think about where I am because I wanna
go somewhere else. And so that future where
we have all of the data completely there and accurate, that’s the vision,
that’s the hope. With my faculty in informatics
and health services research, I tell them to plan your career. Recognize that tomorrow, you are not going
to have less data, you are going to have more. So this problem of
smoking and BMI over time, because of a lot of pressures,
is going to improve. But then the question is not,
okay, well just let it happen. The question that I often
frame this as is okay, how are we gonna
get more and better? Well, we’re going to
share best practices with what we’re
doing now, right? And do as well as we
can with existing stuff. But then there is, what are
the future opportunities in this space, building off of where we are? We talked about the
patients already. The patients, I believe,
are going to be huge in the future of
completing this data, ensuring accuracy of this data. So that’s one area where we
haven’t touched on a lot yet, but I think we’re going
to touch on a lot more. Healthcare data’s becoming
increasingly standardized because we continue to
have standards meetings and standards conferences
over and over again. You know, we argue
over curly brackets and straight brackets. And eventually, those
arguments die down and I think they get
answered through fatigue and people wearing out. But the data is becoming
more standardized. This is going to be
a gradual process. But I do think the
two active things that I think about is
within the existing system, how can I do better, how
can I learn from others and what should I be
thinking about and planning for in the future? – Yeah, I mean, I think I’ll just add since I was there in the
beginning of Sentinel with the five of us, I suppose. We focused a lot on the
things we knew we could do and then tried to
build from there. So claims data have been
used since the ’70s. You know, there was a paper
in science in 1973, I think. So we knew what we could do well and then you work
on extending that. And that’s, FDA’s
been funding that and we’ve been building out. So I think when you’re
thinking about EHR data, what’s your value proposition? What kinds of things
can you do well? You know, claims data, well, again I
think, claims data, I’ll just leave
it at claims data. We do acute outcomes well. Broken bones are easy. Stuff that gets in a
hospital, they’re easy. We don’t know about
more subtle outcomes that take longer to show up. We’re really good at outpatient
pharmacy dispensings. They’re audited, they’re
traceable, they’re beautiful. We really know how to do those. Procedure codes are really good because that’s
how we pay people. That’s why they’re so good. But then you start
to get into a realm where the claims
data can’t handle it so you have to build on it. I think EHR, the folks
who are using EHR data or trying to link,
it’s what can you do with this data source alone? What do you need to link to? And it’s possible that
we’re gonna have to start building domain by domain. Maybe we just gotta
start in oncology. So someone might solve oncology and someone might solve
something else over here and it might not all be
solved at the same time. We don’t need every
EHR from every person in the country to
answer most questions. So little specialized groups
might actually be a way of at least moving
forward quickly ’cause then you’ve got a
lot of people doing it. – Marc, and just reminding. I see a couple of people
with the microphones. We do have some
time for questions. So Marc, please go ahead. – [Marc] Marc Berger. I wanna come back to something
that Kris mentioned earlier which is that there’s
a lack of transparency among data aggregators about
how they do their curation. And I was part of
the working group that worked on the
Duke-Margolis paper as well. And one of the things that
I think needs to happen is let’s be real, data is dirty. It will always be dirty. It will get cleaner, it’s
never gonna be pristine. You’re gonna always have
to do some data curation. That’s never gonna go away. What we need to know, and if
the FDA’s gonna comfortable with this kind of data they have to be able to see
what was the curation process from the data origin, to its coding, to its translation, to its linkage. All that stuff. Not everybody
wants to read that, but it should be required that
the SOPs of data aggregators be available so you can see what is the ongoing
process of curation. You can’t look at
every single item, every single field
in a dataset and say, “Let me know what the metrics
are of how good this is.” You are concerned with
the most relevant metrics and key metrics, particularly when you’re talking fit-for-use. In which case, you
need to be able to go back to the
original source if you want to and
say is this field actually representing what
you think it’s measuring? That needs to be
in place as well. But we can never
get to a standard. And by the way, the
data’s being used already. It’s being used extensively
by the FDA already, but it’s being used
more by many other parts of the ecosystem, why? Because even though it’s dirty, it’s good enough to inform
many kinds of decisions. So if I was an outsider
looking at all these talks, then I’d say, “Oh, my god. “We got so much to
do, it can’t be done. “It will never get there. “It’s an asymptote. “Maybe I’ll get there in 2050.” We can get there a lot quicker, but the first step
is transparency. Just saying we’re gonna
share best practices, nobody does it. I mean, there is some of that
in the academic community. There’s some of
that with PCORnet and with the Sentinel network. But the amount of sharing
that’s done by the majority of data aggregators of
real-world evidence, they do think of it as
proprietary frequently. This needs to be available and it should be an
absolute requirement if you wanna submit an analysis for consideration by the FDA, that document
should be available so the FDA itself
can look at it. The FDA has a lot of
experience with Flatiron. And they have more
confidence with Flatiron because, I’m
asserting that anyway, because they have experience
working with them. They have experience
working with Sentinel. It’s that experience over time that gives decision
makers the confidence that the curation is
good enough to be used for certain decisions. – So Marc’s comment
about transparency, I saw some heads nodding. First, anything that you
all would like to add or clarify about
the best practices around transparency and data
providence and the like? In then second,
Marc’s other point. Anyone wanna comment on
what they see as early, pre-2050 opportunities for use of electronic
medical record data for these kinds of
real-world evidence studies? Some of them are happening
already, as you pointed out. – I really agree with
everything you’ve said. SOPs, I should show Marc. I wrote SOPs over here
before you were speaking. So I think it’s actually
incredibly important. So how will it happen? If the folks writing
the checks demand it or the folks who are
the final stakeholders, if it’s going to FDA, if FDA
demands it, it’ll happen. Or a pharma says, “I’m
not paying for the study “unless I get this,” then things move. And I think it’s
incredibly important to develop that SOP and
probably a core set of metrics that just describe the data. I think we probably
do a lot of it, but it’s not standardized,
it’s not routine. So I’m, it’s like a checklist. The EMA has checklists
that you gotta go through before you
submit something. That kind of stuff we
actually have to get to or we’ll hide behind,
“It’s proprietary “or it’s hard to show you,
it’s really complicated.” And we to get past,
we absolutely have
to get past that. And we are using the data now. So I think it is a
warning to all of us who are a little
too deep in the data to have this pessimistic view because it’s so
messy, it’s so hard, but we use it and I think
we can do a lot with it. We just need to be careful about it. – [Mark] I appreciate
the transparency. Other comments? – In the patient matching realm, just to go another layer deeper, I said that there are metrics
that could be documented. We believe that it’s crazy that nobody shares their
matching accuracy today. We publish ours. We have a 99.9% specificity and a 98% sensitivity. And we monitor that. We can’t find anybody
else in the country who publishes that today. So that’s overall, so there’s
different populations, newborns, we have a
much lower accuracy because they’re
really hard to match. So that sort of
transparency does not exist today in the identity space,
just as a sort of a vertical, to be thinking about. But I agree with everything
the gentleman said. And to the point about
how do we move levers, we want there to be regulations that every EHR vendor, we think
that this might make sense, every EHR vendor should
have address standardization as part of their functionality. Now that’s a very tiny sliver,
but it’ll make a difference. And so if we can
get organizations
to grease the skids to help people do
things the right way, that’s how we get
there over time. But we are using the data today. We just wanna make sure that
people use it appropriately. And I think we all have
our different processes. Jeff and I could
probably describe what we do in very similar ways, but we’re gonna
use different words and probably have
lumped and split in slightly different ways. And that’s part of the problem, if the FDA wants
to use this data, Jeff and I are gonna tell
them a different story that probably underneath
is highly similar. But again, there’s
not consistency yet. And over time, there will
become more consistency. – Thank you. Over here? – [Cindy] Yes, Cindy Garmin
from Serbs Consulting. Wanted to go back to Keith
and ask you to elaborate a little bit more on what
you mentioned in your talk about PCORnet making
the decision early on to have the curation
be study-specific. Because in my mind, I
don’t think that we can curate, validate, qualify
a particular database at any given point in time for every single field in there. I think it really depends
on the research question. So I agree with that, but I
wondered if you could elaborate as you know, part of
that infrastructure, how that decision
was made and why? – Yeah, I mean, I think it was probably just
made pragmatically and since we had to
get started somewhere. So I agree that we do need
to do this study-specific in the sense that
if we’re gonna look at a heart failure population, can we start to describe all those different characteristics. And it might be that
we get to the point where we’ve sort of got
all of this defined, and then that can just be
something that gets computed. But actually defining all that
those study-specific criteria for every specific study
is pretty complicated. And depending on the study, it may be good enough
or not good enough. And so really what we
did was to try and say are the data in the
data model good enough? Do they have enough
that we can start to do some of the
initial questions that
we might wanna ask? And then when we really wanna
leverage it for a study, we can go from there. And so I think that
was part of it. And then again,
the other thing is it was based on, we
had to be practical with who the network partners
were and everything else, I think there’s lots of
institutions around the country who have been using EHR data. And it’s just maybe
not good enough for all regulatory decisions,
but they’re using it to identify patients and
all that kind of stuff. Trying to move people forward
a little bit at a time. And then, again, as we
start to get more studies, we can raise the bar a
little bit faster, right? So I think that was the
thinking behind the approach. – Picking upon
your study-specific fitness-for-purpose approach, I get that it’s
study-specific at this point, given all of the
issues that may relate to an individual question. Is it your sense, or is
it too early to tell, is that study-specific
fit-for-purpose
assessment getting any easier by the
experiences across– – It will, yeah. And so that’s the
other key point. I think is that it might be that the amount
of study-specific gets smaller as we go because again,
study-specific for ADAPTABLE included the latency check where we needed to see, again, if we’re gonna use the data to look at endpoints, are there any endpoints?
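As a rough sketch of what such a latency check might look like (the column names are hypothetical, not the network's actual metric): for a given endpoint, report whether any events are present at all and how stale the most recent one is relative to the extract date.

```python
import pandas as pd

# Rough sketch of an endpoint latency check. Column names are hypothetical.
def endpoint_latency(events: pd.DataFrame, extract_date: pd.Timestamp) -> dict:
    if events.empty:
        return {"n_events": 0, "days_since_last_event": None}
    last_event = events["event_date"].max()
    return {
        "n_events": len(events),
        # A long gap before the extract date can signal a lagging or stalled
        # data feed rather than a true absence of outcomes.
        "days_since_last_event": (extract_date - last_event).days,
    }
```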
And so now we have that particular metric so you don’t have to do that from scratch. You still may wanna look at it within your
certain population. And so again, as we define it, I think what we’re really
trying to do is close the gap. And then if we can get to
the point where we don’t need to do study-specific,
I think this is to Jeff’s point actually, if we said we were gonna
focus on a specific domain and just analyze
the heck out of it, then maybe it is good enough. But again, when you’re working
with a really broad network with a bunch of broad interests, you just kind of have
to start somewhere. – Did you wanna add
something, Kris? – Yeah, I just
wanted to comment. I think there is an opportunity
for pre-certification. And this is a topic
that was posted in the white paper as well. Clearly, there are some aspects that need to be
done study-specific, but I think there are some
things that can be done that are non study-specific. Specifically, the
transparency part. I mean, if we can identify
as part of pre-certification what needs to be reported
and how it needs be reported and make that available and have some
transparency around that, I think that would
go a long way. – [Mark] Great, thank you. Over here? – [Matthew] Hi there. Hi there, Matthew
Struck from SCIAPPS. I love a lot of themes
coming up here around this is a journey over time, how we can look to
things like claims data. How claims data, we started, we’ve been working
on that for decades and this use of EHR data is just kind of in
the earliest days when you look at the
history of claims data. And some of these themes
around data quality models and transparency
around this stuff. So my question to you is, kind of pairing those
two themes together, are there some
particularly effective or useful learnings and models
from the claims data world that we might be able to
use as a jumping off point to model some of these quality
models or quality metrics or ways we talk about quality when we start either
linking these together, or taking the EHR world? ‘Cause like you said, we’ve
got decades of experience in talking about
quality and claims data. – So besides transparency,
are there a key set of best practice dimensions
that can carry over from the claims data work? – So I’ll keep it
short, yeah, absolutely. You know, there are a
couple little modifications where in the claims, I’ve got a denominator and
the EHR, it’s is harder to find that denominator,
but you can do it. So I think that most
of them map directly. You trend over
time, distributions. It’s not actually
that complicated. You gotta do a lot of it and you have to figure
out what it means. But if it’s numeric,
you get a range and you do all your univariates. And if it’s categorical,
you do a distribution. And if it’s calendar,
you do it over time, and there’s subtlety to it, but I think everything that we’ve done carries over. Look, Sentinel copied from HCSRN and then PCORnet. The PCORnet checks were based on Sentinel checks, modified for EHRs. And then they went off and did a lot more interesting work because it was EHR focused. So absolutely.
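As a sketch of how those claims-style descriptive checks might carry over to EHR extracts, the function below profiles a column according to its type: range and univariate statistics for numeric fields, a frequency distribution for categorical fields, and volume over time for calendar fields. It is illustrative only, not the actual Sentinel or PCORnet check code.

```python
import pandas as pd

# A minimal sketch of the descriptive checks described above; not the actual
# Sentinel or PCORnet check suites. Column names are whatever is in the extract.
def profile_column(series: pd.Series) -> dict:
    summary = {"n": len(series), "pct_missing": series.isna().mean() * 100}
    if pd.api.types.is_numeric_dtype(series):
        # Numeric field: range and basic univariate statistics.
        summary.update(series.describe()[["min", "max", "mean", "std"]].to_dict())
    elif pd.api.types.is_datetime64_any_dtype(series):
        # Calendar field: volume over time, to spot gaps or sudden shifts.
        summary["records_per_month"] = (
            series.dt.to_period("M").value_counts().sort_index().to_dict()
        )
    else:
        # Categorical field: frequency distribution of observed values.
        summary["value_counts"] = series.value_counts(dropna=False).head(20).to_dict()
    return summary

# Example: profile every column of an extract before using it in a study.
# report = {col: profile_column(df[col]) for col in df.columns}
```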
– Great, great. So I see four more people at the microphones. I wanna get through
all of these in like the next eight minutes. So we’ll probably go quickly. – [Juliana] I’ll
do my best, thanks. I’m Juliana Kohler
from the US Agency for International Development. And I was very excited
by Shaun’s talk because I just got
back from Swaziland where I was deduplicating data. And if you think it’s awesome
to deduplicate data here, just imagine doing
it where no one knows what their birthday is. – See, Shaun? – [Juliana] It looks like
he’s had this experience. So my question for you is, and I’m trying to make something that’s reasonably useful
for the entire group. You talked a lot about the use of a universal patient
ID and the lack of one in the US. And you talked about a
couple of opportunities, particularly with address
with last name standardization for how to deal
with standardization
outside of a UPID. And it seems to me, based
on my stadium seating, third level row of watching
what’s happened with the UK and with Europe, it feels like we
pin a lot on UPIDs and we feel like it’ll
be an easy button. And I suspect it’s
not an easy button. So I’m interested
in your thoughts about what the
other opportunities outside of the UPID are? What can we do regardless
of the presence or absence of a UPID in order to
improve patient matching? – Sure, so absent a
universal patient ID, where I think the
strategy is going to go is improving what we
are all doing together. I think that there, items likes data standardization of the individual
fields can help, but to just cut
to the chase here, there’s projects like
CommonWell and Carequality where data is being
integrated across multiple EHRs today for
patient care purposes. They’re in the early days. The systems aren’t perfect and there’s opportunity
for improvement, but the framework there is that bring all of
your identifiers, whether it’s a driver’s
license number, whether it’s a social
security number. Anything that a
given source declares is authoritative and
unique will use that. And so I think that
there’s opportunity to begin aggregating
regional identity into a larger network
in the absence of a UPI, I think that’s our only recourse is to look at organizations that are managing
identity broadly. And if they do that more
efficiently and more accurately then the local systems,
they will buy into it. We know, Intermountain
publish the fact that they’re spending
millions of dollars on identity management and adjudicating those
missing patients. If somebody else can do
that for much cheaper at a lower cost
and more accuracy, then we will gravitate towards
those solutions over time. So that’s where we’re headed. – [Mark] Thank you,
and we got Nancy next. – [Nancy] Nancy Dreyer, IQVIA. My question is about motivation. We heard in the earlier
panel if the docs only, or whoever enters the
data only did it right, that we won’t have to spend
all this time on curation. And then, Kris,
we heard you say, “Where do we draw the line? “How much do we curate forever?” I wanted to know if we
thought about motivations or incentives in EHR
for core datasets for getting it right. And my background
comes from six years with injury surveillance work in a common EMR system across 32 professional athlete
teams in football. They are highly motivated. We spend a lot of time training all the data entry people and we still curate,
curate, curate and do fine tuning. So with a highly
motivated group like that where it’s not just
the push of a button, I think we need to think about in our broad
learning health systems, what we wanna
motivate the systems or what motivations could
there be put in place, either for the EMR companies
or for the health systems to motivate them
for data quality? – Yeah, I mean, I agree. I mean, I think the last panel
touched on value-based care as a potential lever. ‘Cause the reason
nothing happens it there’s not financial
incentive to do it. And so maybe it’s
value-based care. I think if we get to the point where we’re trying to
do precision medicine and we were gonna
use all these data for machine learning and
artificial intelligence, I mean, it’s the same junk data
that we’re working with now. And so you’re gonna
get much better answers if you’ve got better
data collection. And so it’s not that
you have 10,000 elements that are great, but if you’ve got 25, 50, you can do quite
a bit with that. And so I think that might
also drive some of you. – Value and hopefully feeding
back some of the analyses that you’re doing to actually
improve patient care. – Patrick? – [Patrick] Shaun
triggered a question that I’d like the
panel’s feedback on. He described that there was
sensitivity and specificity associated with
patient matching. And what he did was he said,
“The goal of the data quality “wasn’t to get perfect data, “it was actually to
quantify the error “because on the basis of
knowing with the error is, “one could decide whether
that error is acceptable.” And I would posit also
one could use that error in their understanding
of their analysis. So I’d be interested in
the panel’s discussion. We’ve been talking about
striving for good data, but to what extent is the
goal just to make data perfect versus the goal is for
us to just quantify how imperfect the data is and to use the measures
of imperfection in our understanding of the
effects of medical products. If we knew what the error was, we could just be less
certain in our estimates, but we still might be able to find meaningfully
impactful effects. So I’d be interested
in the panel’s– – Striving for good
understanding of imperfections. – Both can happen
at the same time. We can both document
what it currently is so that we have
something to work against and leverage that for
the current questions that we want to answer, but
also use it as a guide post and know where we need
to improve in the future. That’s how I think about that. – Are there examples of best
practices in doing that? – Documenting performance? I think that’s what we’re
here talking about right now? – [Mark] That’s what
we’re getting to, exactly. Great, great. Over here? – [Anne-Marie]
Anne-Marie Meyer, Roche. So we keep drawing this analogy between the claims and the EHR. In my role, I find myself
proselytizing all the great work from claims, from the DEcIDE program
and the IOM and the CER work. But at the same time, I feel
like we’re rushing ahead in EHR and pushing square
pegs into round holes. So my question is are we at
a point where we can sit down and look at the things, the
best practices from claims, like the previous question, and identify which are the
pieces we can shave off the square ends and still use and which ones we
have to just give away and start to reframe
from the EHR perspective, like adherence, persistence,
into time till last treatment. But functional status
and comorbidity, it’s just not, it’s a
square peg in a round hole. So where is that? Are we at that point and
how could we move forward so we’re being most
efficient with our time. – For some of this
data analysis, like
functional status, stay tuned for the next
panel, but comments? – So the examples of the
Sentinel projects that I gave I think are, in a
sense, an approach. We feel that the EHR
has definite advantages for obtaining certain
types of information about BMI and smoking, so we’re focused there. And on certain types
of complex outcomes, it’s clear that if you
wanted to study them, you’re gonna need EHR
data because the claims aren’t gonna be as sufficient. So I think the question is,
how for does that generalize? Is that a different approach than the fit-for-purpose
for each study approach? Or are those things that really
are part of the same story? – Greg, we are
gonna have to break in just a moment. Is a quick question or comment? – [Greg] No. (audience laughs)
– Okay, all right. In that case, save it
for this afternoon. I’m sure we’ll have more
discussion of the development of best practices, the
evolution of data quality and methods as we come
back this afternoon. Right now, I’d like to
thank our panel though for an excellent discussion around electronic
health records. (audience applauds) We are now going
to break for lunch for those of you in the room and those of you
joining us on the web. We’re gonna start
again at 12:45, sharp. For those of you who are
familiar with the area, there are a lot of
restaurants nearby. If you have any questions,
stop by our desk out front for information on
local restaurants. 12:45. Okay, all right, good
afternoon, everyone. I’d like to welcome you back
to the afternoon session for today’s meeting on
Developing Real-World Data and Evidence to Support
Regulatory Decision-Making. We had some great
sessions this morning. We appreciated all of the
participation in the room and all the people who
are joining us online and those of you who are joining
in conversation by Twitter. It’s #RWE2019. Thanks for the
participation there as well. So we’re gonna pick up
from where we left off. In our last session
before the break we discussed data considerations for electronic health records. Which is an emerging
source of evidence related to, an emerging source
of real-world evidence. This session, we’re gonna
turn to another very important and rapidly expanding
source of data, data coming directly
from people. Patient-generated health
data could provide valuable information, including
an important dimensions of care, outcomes, and
factors that matter a lot in terms of understanding
the safety, effectiveness and other regulatory issues
related to medical products, but up until now have
not been a huge part of evidence generation. And that could be changing with this important
emerging dimension of real-world evidence. So we’re gonna hear about
some pilot projects, some emerging insights
on how organizations are addressing the
foundational questions about the overall
quality and reliability of this type of data source,
a recurring theme here today, and some of the key
implications of these factors for regulatory decision-making. Very pleased to have
a great panel with us. Ernesto Ramirez is senior
data scientist at Evidation. Angela Dobes is senior director of Crohn’s & Colitis Foundation. Both of them will be doing
some brief presentations to kick us off. And we’re also joined
by John Reites, the chief project
officer at THREAD. Gracie Lieberman,
the senior director of regulatory
policy at Genentech. And Elizabeth Kunkoski
who’s a policy analyst at CDER at FDA. So Ernesto, pleased to
have you start the session. – All right, thank you. Thank you, Mark. Thanks everyone for coming
back from lunch so quickly. So hello, I’m Ernesto. I’m a senior data
scientist at Evidation. At Evidation, we focus
on providing insights about the interaction
of everyday behavior and health using
our deep experience with the latest in
inventions and innovations and sensing data capture
and analytical methods. Today, I’m going to give
just a brief overview of how we think of
PGHD at Evidation. Give you an example of
some of our recent work using
person-generated-health data and a few ways around how
we think about data quality with some specific
key takeaways. So what is PGHD? Building off the definition
provided by the ONC, the recent white paper
that’s been discussed today by the folks over
at Duke-Margolis which we had the opportunity to be a part of define PGHD as wellness
or health-related data, created, recorded or
gathered by individuals for themselves or
by family members or others who take care
of that individual. This type of data
recorded at high frequency and at high resolution
can help us understand an individual’s abilities,
behaviors and outcomes during the course of
their everyday life. These are your Apple Watches, your mobile health
applications, your MyFitnessPals as well as different ways
to interact with people during the course of
their everyday life through self-reported
measures like EMA. And we can use PGHD in a
variety of different ways. It can help us understand
the meaningful behavioral and physiological differences that typically lie outside
of the clinical experience. So things that may
not pop up in the EHR or in the claims record, but
still could be highly relevant, not only to clinical
trial outcomes, but relevant to the
patient themselves or the individual
enrolled in a trial. We’ve shown this time and
time again through our various collaborations with
different sponsors, but also with internal
data that we collect through our platform. An example we have
here up on the screen is some differences
in sleep outcomes related to diagnosis
with type 2 diabetes or multiple sclerosis. But let’s get to some fun stuff. How about a real example
from a real study that we just wrapped up
over the last few months and presented at the
recent Knowledge Discovery and Data Mining
Conference or KDD. It’s super fancy in the
data science community, not so well attended
by health folks, but hopefully those
two institutions and those two fields start
to combine a little bit. So we partnered recently
with Eli Lilly and Apple to conduct a study to test
the feasibility of collecting, processing, and
understanding real-world data from older adults with and
without cognitive impairment, and to see basically can
data that is captured during the course of everyday
lie over a 12-week period be useful for differentiating
healthy individuals from individuals with
cognitive impairment. We screened a little
over 200 individuals and then enrolled
82 healthy controls and 31 individuals with
diagnosed cognitive impairment across mild cognitive impairment and also Alzheimer’s disease. So what did these
people actually do? What kind of data
did we collect? The same stuff that a
lot of you in the room are probably collecting
everyday or contributing everyday through the normal
use of your mobile phones, like your iPhone,
your Apple Watch, potentially a sleep
monitoring system. We used the Beddit device, which
is another device from Apple. We also used iPad to do
different cognitive tests. And so all these individuals
obviously consented to participate in this study and we followed them
throughout the course of their everyday
life for 12 weeks. We collected over 16
terabytes of data across that entire study design. We processed that data
through our own data platform and tried to develop a
variety of different methods to really sync and
analyze that data across the various different
resolutions and frequencies at which it’s recorded. On an Apple Watch, the frequency at which your heart rate is captured is very different from the frequency at which you call, you know, your loved ones. And we had to figure out
a way to actually combine all of that data and align it
on a standardized time scale for not only
exploratory analysis, but also for our
model development. We call that method
of combining data and standardizing
it a behaviorgram. And this is a little small one and you can’t really
read everything, but this is our
method of taking all of that disparate data. Again, that’s captured in
a variety of different ways and combining to see
what kind of signals are actually there? What are people actually
doing in their real life that could be related to
the outcomes of interest, in our case, obviously
what was cognitive decline. This is a really
nice pretty picture, but it’s really useful for us within the data
science community as a first step of
understanding data quality because it’s readily apparent
when things go wrong, when you align your
data on a common scale. Okay. So some takeaways here. What do we actually do and how do we actually
think of things within our data science
group at Evidation? First key takeaway that
I want you to understand is that PGHD is
no different than any other clinical trial data, it’s no different from
claims data or EHR data. You have to really
live with the data to understand what is
there and what are issues that you need to understand. So it’s really important
to characterize your data. So we’ve shown using
the behaviorgram that it’s really
important to align things. When you can actually
start to see issues most commonly: missing data, it’s important to
determine whether or not that data is missing or those issues are
systematic or behavioral. Systematic issues are
really interesting because it comes down to
how systems actual talk to each other. Who here knows how your Apple
Watch actually gets data off the watch to your
phone to a third party? Okay, I saw a very timid hand. It’s really interesting. It’s really interesting
to see how it works, but you have to know
how that actually works because those things break
down from time to time. Everyone here has had a
computer system that’s failed. That’s, is going to
fail, computer systems and technological systems
will fail within PGHD trials. And it’s important to
know how they will. The other part that I
wanted to talk about is from the behavioral
perspective. You may have issues with data
that may not be related at all to how devices function
or how sensors might fail in some different ways. It could be related
to just how people act in the real world. Our unofficial data
science motto at Evidation is “Real-world data means
real-world problems.” And that becomes
really quickly evident when you work with
data off of devices that people just use
in their everyday life. And a good example
of this is we, in the cognitive decline study, we had this dataset that
there’s data that showed up where two individuals,
two participants, had exactly the same data. They woke up at the same time, they went to sleep
at the same time. They rolled over
at the same time, things that should
not happen when you have two different people wearing two different sensors. Turns out, like some people, some older adults, they
use the same email address to sign into their phone
for their iCloud account. And the way that data’s
actually is structured and the way it’s sent
over the systems, that caused a mismatch, caused basically
those two data streams to become one
single data stream. And so it had us
actually going out and talking to participants
about, “Okay, what are you doing “that’s different
than everyone else?” And figuring out, oh,
you’re just doing this one, crazy little thing that we never would have
accounted for unless we took the time to understand
both how people act, but also how systems
talk to each other. Okay. Second is how you actually deal with issues around data quality. Again, I always go
back to missingness because it’s a thing
that most people care about when they think
about data quality. Imputation’s a
really common tool. So if you’re trying
to fill in those gaps, figure out what
may have occurred. In terms of PGHD, we have
this really unique ability to use multi-sensor
streams to figure out what actually is going on. So an Apple Watch for instance, doesn’t give you
zero step counts. It gives you step counts
for when it thinks you’re actually taking steps. So you wanna understand
when zero step counts occur, you can use other signals
like the accelerometers and the heart rate to
zero-fill those gaps. In other instances,
you want to make sure that you also
conserve the amount of missing data that you do have because missing data
could be informative. In the cognitive decline study, one of the important features that come out in
our motto building was not just what people
answering the survey, but if people actually
did the survey at all. Which if you think about
it, makes a lot of sense for people with
cognitive impairment, remembering to do the
things you’ve asked them to do could
be a key feature. Okay, I have to hurry
here ’cause I only got, I think a minute left. So third, it’s really
important to create the appropriate
analytical methods that accurately characterize
your outcomes of interest. So understand your outliers. Understand where they come from and understand whether or
not they are true outliers that reflect real behavior or their issues due to,
let’s say, sensor failure. A good example of this quickly is in my dissertation research. I was using Fitbit devices. I saw a step count of 200 steps
per minute for four hours. And either I had the world’s
best ultra marathoner in my dataset or there
was a sensor failure. It turned out it was just
a simple sensor failure, a sensor reset. Perfectly fine after that. When you have outliers though
and you will have them, in PGHD, it’s important to use the correct analytical methods and use the statistical
aggregations that are robust to dealing
with those outliers. Okay. So I got two more. Wrap up quickly. Fourth in these key takeaways
is that once you’ve run down this path, you’ve
collected the data, used analyzed it,
you’ve characterized it, you’ve maybe done
some data imputation, you have a key
feature that you think is gonna be really important. It’s important to then
retest that feature to figure out is it sensitive
to actual data collection in the wild. This is a really great example. I encourage you to read it. Although it’s a case
study, it’s a case study of one individual wearing a
wrist-mounted accelerometer for 553 days, so 19 months, after a traumatic knee injury. And they developed a
feature called stance time from the accelerometer. And then instead of saying
hey, we got this great feature, we should all use it, they
down sampled that to say what is the appropriate
amount of data that actually reflects
what someone might really do in the wild, not just this single one person in terms of creating a
feature that’s robust to missing data and
actual data availability? Okay, the last
thing I’ll mention is that all of these
novel technologies are really great. They offer us some
really unique abilities. And I think it’s
really remarkable how we can use them to
maybe possibly improve data quality in real-time. And there’s two
ways of doing that. One is if you have
a priori hypotheses around what good data
quality standards are or how completeness or
compliance should be checked. You can use real-time streaming in order to generate
those actual tests. The second is if you’re
getting data in real-time, why not tell participants
about what they’re doing? We collaborated with the folks at the University of
Southern California and IARPA on a really interesting
study of workplace stress where participants,
nurses, were wearing, I think, six different sensors. Took a lot of effort
on their part, but we told them
exactly how they were doing every single day. You know, we also incented them, but having that
feedback mechanism was really important. So I’ll wrap up there. Obviously, you can tell we’re
really passionate about PGHD. It’s something that we
hope to continue furthering this conversation and
thank you for your time. (audience applauds) – And our next
opening presentation is from Angela. Thank you. – Okay, so today
I’m gonna discuss how the Crohn’s &
Colitis Foundation’s information exchange
platform, IBD Plexus, is integrating multi-dimensional
real-world data to accelerate research and
enhance patient-centricity. IBD Plexus is designed
to support activities across the research continuum
and product life cycle by providing
industry and academia expedited access to data and
samples to accelerate research. Through Plexus, we
centralize and link data across four diverse
prospective research cohorts. These are all
independent studies. However, with all unique goals, however we encourage
patients to sign up for multiple studies
when applicable to really generate this
robust, individual dataset, to get it to a critical
mass of information to advance the field forward. Through our research
cohorts, we are collecting primary real-world data. So we’re following patients
during their routine care. We’re collecting
patient surveys. We’re collecting electronic
case report forms. We’re also generating lab
data and molecular data, genotyping, transcriptomics,
microbiome data from the samples
that we’re collecting through these
prospective studies. In addition, two of our
studies are incorporating in this more traditional,
secondary real-world data, the medical record. Both inpatient and
outpatient data, historic and perspective data. And then what we’re
really striving for and what we’re able to do is really marry this
concept of clinical care and research where
one of our studies is leveraging what we
call an IBD SmartForm. And what this SmartForm
is is essentially an electronic health record and patient survey that’s
embedded directly into the electronic medical record and can be used for both
clinical care and research. So Plexus has a sophisticated information governance process. I’m gonna walk you through this. I’m gonna start with the
registration and go clockwise. So through Plexus
we’re able to register and authenticate
patients where we provide them a master ID
that lives with them. Our consent process is, it’s a dual consent and
HIPAA authorization. Some key terms
within that consent are reusability of
the data and samples, being able to link to
these different programs, but also external sources
such as claims data, and also the ability
to recontact. Standardization
and normalization is of the utmost
importance to Plexus. So we try as best as possible to leverage existing common
data models such as OMOP. However, in the IBD space,
there was no existing models. So when it came to those
really specific clinical and phenotypic information,
we ended having to create our own ontology and
common data models. When it comes to our
data integration engine and our data processing tools, we’re able to leverage
them specifically to mitigate against some
interoperability issues as we are incorporating
electronic medical record data from various EMR systems. We also have a master
patient index engine where we use deterministic
and probabilistic matching, again, to be able
to link the data across the data sources,
across research programs and also ensure we don’t
have duplicate patients within our system. We also have, for
our quality control, we leverage built-in
data quality checks and error report processing. And then all of our
data is anonymized and it’s pushed towards
an analytical platform where our prep to
research tools sit. And this is where
our researchers can gain access to these tools to query the database, but they can also gain
direct access to the data. And so we have
built-in automated data
provisioning process where we release both raw
and research-ready data. And because we release this data directly
to both our industry and our academic researchers, we ensure that we have
a white glove service. So that way, we can really
hand-hold the researchers in understanding
how to appropriately use this real-world data. So Plexus is breaking
down traditional barriers to gain access to data. We have, again, four
research study cohorts where over 70 sites
are participating through these cohorts. We currently have eight
pharmaceutical companies who are members of Plexus. Now another great
thing about Plexus is we’re able to leverage
the infrastructure from both a
technology perspective and a site network perspective to conduct ancillary studies. And what this does
is not only allow us to collect additional data, but new data points and new bio samples to really
grow this powerful resource. I wanted to talk a
little bit about impact. And I had mentioned before
that Plexus really spans the gamut of the
research continuum, from discovery to clinical
development to post-marketing. But I wanted to touch more
on some of the activities that were highlighted within the FDA real-world
evidence program framework, around these activities
that you can currently leverage real-world data for to improve research and
development and efficiencies. And so currently, our
members are using our data and samples for
exploratory purposes to expedite go, no-go decisions, developing drug development
tools, specifically biomarkers. We’re also able to develop
some complex algorithms to identify patients
specifically to enroll into our ancillary studies. And identification
of characteristics, specifically from a
clinical and molecular is really, really, important
to really advance the science. And so by doing that, we’re
able to stratify patients into sub-populations
for multiple activities, including enhancing the success
rate of clinical trials. And so what we really hope
though is we can leverage all of our success and
experience to ultimately be able to use real-world
data for regulatory decisions. But of ultimate more importance is really elevating the
stature and the importance of patient-generated
real-world evidence. And how do we get
to research outcomes that are most
meaningful to patients? So mindful of the
patient journey, the Crohn’s & Colitis
Foundation embraces a patient-centric approach
to all decision-making and mission delivery. We are really excited
for all the benefits that are gonna come out
of the 21st century cure, specifically that
will benefit patients. And we’re also very excited
to be up on the frontline and a part of a demonstration
project leveraging the MyStudies App. And for those who don’t know
about the MyStudies App, it was developed by the FDA
in private party sectors really around this goal
of facilitating the input of real-world data by patients. And so the foundation is
really interested in deepening engagement with our patients
that are participating in our research cohorts. And what we’re gonna
be doing is expanding our direct-to-patient
research capabilities to for the very first
time, include a mobile app. And some of the
demonstration project goals are to explore the
use of digital tools to fill in those data gaps. So those known-unknowns. Capture the patient
experience data. Be on the clinical
delivery system. Establish a more
comprehensive picture of how medical products really
function in the real-world, and help assess the use of
patient-generated health data to support real-world evidence, and also help establish
this high-quality, patient-generated
health data ecosystem. And so Ernesto talked a lot
about all the different types of patient-generated
health data from sensors, to task-based
activities, to surveys. And so what I’m gonna really
focus on patient-reported data through surveys and some
considerations to take. Specifically gonna
focus on three pillars: completeness, conformance,
and credibility. So if you’re studying
chronic diseases like IBD where the
disease state can change, so you can be active
and have flares or you can be in remission
and be asymptomatic. Depending on your
disease activity, it actually influences when patients will fill
out longitudinal surveys. Another aspect of completeness
is just design barriers. So we thankfully
learned early on that leveraging tokens
to be allowed patients to fill out surveys versus
them having to sign in and put in their
username and password greatly enhances
the response rate. And then just life in general. I love the behaviorgram
that Ernesto showed. Specifically,
there was one point where it was like when people
use their apps the most and it was right around bedtime. Right now, all of our
notifications are web-based. So you know, someone’s
in their beds, they’re not gonna scroll
through their email and be like, “I think
at 2:10 in the morning, “I was sent this survey.” But what they could do is
look at their notifications and more likely to
fill out a survey then. When it comes to conformance, standards, standards, standards. We’re not gonna
be able to assess the impact and the reliability of patient-generated health
data without standards. In addition, I wanna talk a
little bit about time points. So unlike clinical trials, real-world data don’t
have set visit schedules. And so you have to take
that in consideration when defining index dates. And lastly, credibility. So we know that there are things that patients are really
good at filling out and things that they’re not. So through validation
studies, we’ve been able to assess that patients can very accurately say whether or not they have
ulcerative colitis or Crohn’s disease. But they’re not as good
as defining exactly where that disease is located. For example, Crohn’s
disease can go anywhere from the mouth to the anus. When it comes to recall bias, we know people don’t
remember things. So we try to implement tactics
to not require patients to remember anything
more than a week away. And then business rules. So this really helps to get
at some of those outliers. So some of them can be disease-specific, but also some can just be applicable across disease areas, such as not allowing future dates to be inputted.
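A minimal sketch of such business rules, assuming a hypothetical survey item that records a symptom date alongside the survey completion date; it rejects future dates and anything outside a one-week recall window.

```python
from datetime import date, timedelta

# Sketch only: two business rules of the kind described here, with hypothetical
# field names (a patient-reported symptom date and the survey completion date).
RECALL_WINDOW = timedelta(days=7)

def validate_response(symptom_date: date, completed_on: date) -> list[str]:
    errors = []
    # Rule 1: don't allow future dates.
    if symptom_date > completed_on:
        errors.append("symptom_date is in the future")
    # Rule 2: don't ask patients to recall anything more than a week back.
    if completed_on - symptom_date > RECALL_WINDOW:
        errors.append("symptom_date falls outside the one-week recall window")
    return errors

# Example: validate_response(date(2020, 1, 10), date(2020, 1, 3))
#          -> ["symptom_date is in the future"]
```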
But more importantly, to really get to higher completion, conformance, and credibility rates, you need to provide value back to the patients. And so what’s next
for the foundation? We’re gonna be leveraging
the MyStudies App technology and we’re gonna be
launching IBD PROdigy in Q1 of 2020. Patients are natural experts
in knowing and listening to their minds and bodies and we wanna learn
from all patients. – Thank you very much. (audience applauds) Thanks, Angela. Now we’re gonna hear
from our other panelists, reactions, further
insights or perspectives, starting with John. – Sure. So John Reites from THREAD. And very simply, what
THREAD does is we built a technology that’s modernizing the way clinical
research is done. We’re moving traditional data that’s captured in the clinic and we’re having it
happen in the home and on the go. And that’s with a
combination of things like eConsent, ePRO, surveys,
telehealth and sensors. And those include
our medical devices, so your 510(k)-cleared, your CE Mark-approved devices, and consumer wearables like
we’re talking about today. And so when we
look at this space, and we’ve got nearly
a hundred studies done with these types of models. I’ll just tell you that
the people expect us to have learned
everything with sensors and we haven’t. There is so much to
learn with sensors. We’ve done so many
projects with Apple Watch. And I can tell you everything
that’s in that thing, Ernesto and still can’t figure
out why some people don’t do certain things. And so what I’ll tell you is
that when we look at the space and really our focus was to
really talk about quality. What are some ideas? How do we manage
and support quality? Quality truly is be design. It is not about technology. It’s about the support that
goes around technology. And so first and foremost,
if I could just sort of tell you the thing
that I feel like we continue to learn
and learn and learn is that patients, people
need to continually be trained how to
use those devices. They need to be continually
trained how to use mobile. Assumptions like
we have in the room about how to put on
a device or charge it are not assumptions that
we should take forward in our daily lives. Most people, and I’ve seen it, I have this one study and
we have moms waking up in the middle of the
night to track feedings in a mobile app. And then we’ve got 85-year-olds doing coached mobile
spirometry through telehealth in their homes. And when you look at the
span of these patients, they all have the
same characteristics which is if you don’t
teach them well, they don’t know how to
repeat how to do something. And so I just
wanna harp on that. That has nothing to
do with the tech. It has to do with us
understanding that
people are people and we’re behavioral
and we build habits and we have to be
trained to do something that’s out of the ordinary. You know, the second
thing that I wanna mention is when we talk, you
guys were talking a lot which is great, about
the device connectivity, association, authentication,
tokenization, these tools that we have to
make sure that the person that does something
is the person we think they are. When we used to have
these graphics of cats with Fitbits on their legs. You guys remember these, right? Am I the only one
that saw these? Okay, there used to be images
where people would say, “Well how do you
know if a Fitbit “didn’t get put on a dog,” and all these ridiculous things. And the reality is we
actually know that today. I mean, I can put a dashboard
up in 10 seconds and tell you, “Somebody put their
Fitbit on someone else.” Their gait moves, right? It’s not rocket science. But what is rocket science
is looking at the whole of all of this data coming in
and getting those actionable insights on the go because
you have to correct things on the fly. And so what I really liked
about what Ernesto said is this looking for the gaps. What I would tell you
is one of the things that I think we have to do a
better job at as an industry is not looking at
gaps as missing data. Like oh, you did
something wrong. Looking at gaps to figure out
what are they not reporting? Right, and being
able to do signaling. So one of the other
things that I’ll tell you that really supports
quality from our perspective is setting up trigger-based
notifications, not just for the patient, but that work in a
digital escalation route to get to a site
or a call center or somebody down the line. For instance, like
if I’m a patient and I was supposed to
have done something and I didn’t do it, the
app should auto-remind me to do something. If I wait three times
and I don’t do it, I want that escalated
to somebody. I don’t want just a
patient sitting there, not doing something. Right, that could be
very valuable data. That could be very
critical data. It could be part of
a secondary endpoint. So this is what we call
digital escalation. Just becomes a really
practical way to help people remember to do things, and only escalate to
people to intervene when it’s really necessary. ‘Cause the last thing we wanna do is fill
up people’s inboxes and notification boxes with there’s 75 people
you need to call today. The real nature of
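(Editorial illustration: a rough sketch of the trigger-and-escalation rule John describes — auto-remind the participant first, and only escalate to a person after repeated misses. The threshold of three misses and the roles involved are hypothetical.)

```python
# Rough sketch of a trigger-based "digital escalation" rule: remind the
# participant automatically, and only escalate to a site or call center after
# repeated misses. The threshold and roles are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class TaskStatus:
    participant_id: str
    missed_reminders: int   # consecutive reminders sent without the task being done

def next_action(status: TaskStatus, escalate_after: int = 3) -> str:
    """Decide whether to remind the participant again or escalate to a person."""
    if status.missed_reminders < escalate_after:
        return f"send in-app reminder #{status.missed_reminders + 1} to {status.participant_id}"
    return f"escalate {status.participant_id} to site/call center for follow-up"

# Example: two misses still gets a reminder; the third miss triggers escalation.
print(next_action(TaskStatus("P-001", missed_reminders=2)))
print(next_action(TaskStatus("P-001", missed_reminders=3)))
```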
The real nature of this quality initiative, and all the work that's going into the quality of this data, is frankly around scale. Right, a lot of these studies
that we’ve seen and done, you know, you have,
10, 20, 30 patients. Just to very honest, that’s
really easy for us to review. And we’ve got a study now that has 80,000
people, 16,000 people. You cannot scale that by
looking through spreadsheets. So analytical models to find the signaling is
really important. And I would love to tell you that somebody’s
figured that out, but I think the person
that figures that out will not be with us. They’ll be like on a
Caribbean cruise somewhere because it’s a really
hard nut to crack. So here’s what I would
say is when we look at all these aspects of
quality and quality control, the important piece, frankly, is that we remember that people are people and they're
behavioral by nature. And so we have to support them. But second is we
have to make sure that we white glove,
that we give patients and participants
support by escalating in the right ways to the
right people in the process. There’s more I’ll
answer in the questions, but hopefully I
gave you a summary. Thanks.
– Thanks very much, John. (audience applauds) Grace? – Well, thank you, John. I’ll use your comments
in my discussion. – Is that that cat? If the cat gets in there, you get two points.
– No, not gonna fit the cat. So I think that
the technology now offers us a really a great
opportunity of thinking, sort of reversing the questions
with real-world data, right? We’re trying to
generate endpoints within real-world data
that sort of conform with clinical trials. Well what if we use
technology to come up with new endpoints and
then actually apply them also to clinical trials and all of a sudden
level the playing field? Can we identify diseases where an app makes sense? And let's take what Roche, Genentech is working on in MS. So right now, the tool that is used in clinical trials is the Expanded Disability Status Scale (EDSS), which is used in
clinical practice. So there is a way to develop
by using mobile technology, something that could
actually very well be used to monitor disease progression
in clinical practice, but also inform
endpoints in studies in clinical trials. So there is sort of this
floodlight initiative that Roche has started
a few years ago. And the idea is to
always have two apps developing in parallel, sort of learning one from the other: a clinical trials app and a clinical practice app. And basically, it's an app
that collects raw sensor data from both active,
performing active tests and also the passive data. And then that data is then
transformed into certain scores. For example, then
areas like cognition, motor function, gait and balance and other things that are
important to the diseases. And the whole process
of how do we move from just collecting the data and then specifically validating
it both as an endpoint and in the clinical practice? What are the steps to do it? How do we start? I mean, you start really from
what is important to the patients? How do they think they would want to respond to these types of data? And then you take these pieces and then move to the next one. What would KOLs and experts say? What would they use? And then eventually, how
can we start summarizing this data and pre-specify
all that in the process so it’s not like,
“Okay, now I have data. “Now I’m gonna come up
with some measures.” But you have to pre-specify
the entire process. And meaning that,
okay, how do we test? Is it reliable? So do I test it, again, how many clinical
trials do I need to have just to test how the
app is performing in a more controlled way? And then test it in
parallel in the clinic, in a clinical setting
to also understand how patients respond to it, and then feed it back
to this clinical test and clinical trials, I’ll say. So it is a back
and forth process. It’s not like I’m just gonna
have an app, put it out there. So for example, things
that we wanna test in more of a clinical
setting, you know, how does it converge
with the established, the EDSS system? Then, how can it predict
change within a system? And also, is there a correlation
of how it predicts change? And then eventually, can it
predict the long-term outcomes that are captured? And again, then you take
that information and test it in clinical practice. Do we see the same response? Does the test predict that? But then you wanna
also look at things. You know, is the test
doing the same thing in consecutive days when there
really should be no changes into the disease? Because if it’s flop,
flopping all the time then or it’s something
wrong, patients are using it, or the test is not sensitive and it’s just all
over the place. And also then, you
know, it’s nice to see a whole trajectory
of what the test is doing, but especially for
research you want a score. So do you take a
score from a week, from a month of performance? Again, you have to
pre-plan it and come up with some ideas of analysis and think about it, what
is the right measure? And so it is a process ongoing. The thing that I would
also wanna mention here, you know, that we have great sort of both
regulatory guidances and frameworks for sort of the COAs and the biomarkers. But we are sort of in
this space in-between. And what is probably
even more challenging is the whole global
regulatory framework and how do you sort
of develop both, something that you could
use in clinical trials as a totally new endpoint? And at the same time,
something that patients and physicians will be able to use to monitor the disease. Thank you. – Thank you. And Beth? – Hi, my name is Beth Kunkoski. I work in CDER's Office
of Medical Policy and my primary role
is to figure out how to make this a reality. How can we use
these technologies in clinical investigations to support the safety and efficacy of new medical products? And I think my colleagues
have done a wonderful job of laying the groundwork. I don’t have too much to add. John focused a lot about
doing and Grace did as well. Is what’s your plans? So I think really
deciding upfront what your plan and
your hypothesis and how you’re gonna
collect this data and talking to all the
different stakeholders, talking to the patients, talking to, I’m an
engineer by training, so engineers play
a huge role here. The regulators, the clinicians, we really need to bring all
of these people together to be able to make
this a success. So Grace talked a
lot about endpoints. I think the endpoint is
really the driving factor here of are we measuring something
that we’ve always measured? It’s just a new way. Is this is a novel endpoint that we need to figure out is this what’s
important to patients? Is this clinically meaningful? All of these things
will drive your study. And I think then if you
follow the traditional process of a clinical investigation, you’ll be able to get your
answer at the end of the day. I think usability studies
play a huge role here of start up front, test this out before you launch into
your full-blown study so that you can
test and make sure that subjects are using
the app as intended because you’ll save the time and money if you
do that upfront. We’ve talked a lot about
how the volume of data, there needs to be a process
in place to manage that data. Are there algorithms
that are crunching it on a daily basis as it comes in? I think there’s a
lot of room for AI and machine learning
in this area. So laying that
framework as well. From the regulatory
compliance perspective, Part 11, Electronic
Records; Electronic Signatures. The audit trail, as long as
you are laying these plans out upfront, you’ll have success and it will
revolutionize our ability to collect data in
clinical investigations. This is something the FDA
has truly invested in. We’ve heard about some invested demonstration projects already, the MyStudies App. We have several CERSI
studies that we’re funding looking at these technologies
as well as being partnered with the Clinical Trials
Transformation Initiative, as well as we have
some legislation in patient-focused
drug development. We have a workshop in December
looking at the incorporation of clinical outcome
assessments and endpoints. We have made this commitment and I think that
working with all of you, we’ll start to see
this as a reality. – Great, thank you
very much, Beth. (audience applauds) Obviously, a lot going on here. And again, in just
a couple minutes, I wanna open up the
microphones too. But just to kick
off this discussion, we’ve talked today
about the early days of some types of data for use as real-world evidence
for regulatory purposes. And we’ve heard in this session
about an explosion of data that are being
used in many cases for preliminary studies
that you can see becoming on a pathway for use
for regulatory uses. I wonder if you all could
talk a little bit more about where you see the
earliest and most promising applications of
patient-generated health data in actual regulatory
decision-making? So we heard some of that in Beth's comments about the FDA patient-focused drug development work around having endpoints
that, as Grace said, get directly at what
matters to patients, Angela as well. John, Ernesto talked
about ways to collect data maybe more efficiently
than what could be done in a clinical trial
or certainly outside of a traditional
clinical trial design where you have to follow
Adrian’s diagram this morning. What do you see is the
earliest, most promising actual regulatory applications? – I can start. So I noticed through Plexus, we’re working with
some of our members early on in their drug development process to develop these
kind of companion biomarkers for their drugs that incorporate not only clinical biomarkers, but patient-reported
biomarkers as well. One of the key
endpoints for drugs in the IBD space
is mucosal healing. That doesn’t mean
anything to patients. I mean, patients care
about fatigue, pain. And so how do we
incorporate those early on and turn them into biomarkers? – From our perspective, I
think there’s two thing. One is the ability to
modernize these studies, like these hybrid models
where you’re not changing the what’s collected,
you’re changing the how. But the validation, the
endpoint, the reason it was created is done
easier for people, sites, and patients. But on the other side, I’ll
tell you about things we call eDROs, device-reported outcomes. You know, if you look at some
of the work with Novartis and Blackthorn and
other folks out, there’s some really
innovative work that’s happening in this space
around taking an assessment, an activity that patients do, making it digital,
wrapping Bluetooth training and everything into one package. So as what research kit
is to an active task. You can actually take that
quite a few steps further. And what I’ve seen is those
getting taken agencies to say this might
be the secondary or maybe even a primary
endpoint of the future. Very similar to what
Grace was mentioning about some of the
MS work that Kris and others are doing there. So I think that there’s
definitely models of this, but I think we, we’re a
measured industry, right? We start small. So we go get this
endpoint done this way and then that leads to this
endpoint done this way. And I think that’s really how
we get the momentum we need. – Yeah, I would, I don’t
think I have much to add, but I think there’s some
promising examples out there. You know, if you look at the
clinical outcome assessments that are being sent
through the FDA process, just search by other, you’ll
see the digital things there. There’s five of them, I believe
the last time I counted. And really, the
majority are saying, “Here’s the thing that’s
typically done clinic,” six-minute walk
test for example. What can be captured
in the real-world using a wearable
sensor that reflects a patient’s true
ability to do something that’s in a non-clinical
environment? And I think there’s a
lot of promise there to translate those first
steps that people are taking into the world into other areas that may not require
wearing four accelerometers on wrists and ankles,
but it’s coming along. – Okay, great, great. So some good early examples. And I think like we’ve heard
about in previous panels of you all starting in
some specific focused areas with an expectation
that will grow out, how are we doing in
terms of best practices for data quality and validation? It was a big topic of discussion
in the previous sessions that included elements like
transparency and provenance. You all have mentioned
some standards that I think applies
here as well. Also here, maybe there’s
some distinct issues like you all have
mentioned user training and the fact that there is a lot of continuous
quality improvement that goes along with data
collection and cleaning itself. So any thoughts about that, or are we getting there? Are we still at a stage where each potential application needs to be assessed
for fit-for-purpose, fit-for-use in its own right? Are there getting to be
some broader best practices that we can understand to maybe accelerate the
use of real-world data from patient-generated sources? – I think it’s gonna
depend also on the use. I mean, if we think about, you know, developing this for maybe a primary or even a secondary endpoint, I think it's still gonna be more of an individual application, because there are additional, excuse me, additional steps that are required. I can see that if we're thinking about just sort of using this for the patients' and physicians' benefit, there probably might be less of a structured process. But again, if there is a risk to the patient, then we go back to this: maybe, at least for a while, you want it to be more of an individual application. And you might want to comment on it. – I agree. I think there's so much
variability in disease area and the technology that is
gonna capture these endpoints that you have to start with
each one and build from there as you apply it in other areas. – [Mark] Thank you. A couple of questions. Please go ahead. – [Jack] Jack Snyder. I don’t know if,
oh, there it goes. Jack Snyder from Cato Research at Global Clinical
Research Organization. One, I think you may have
answered this question, but I’m not sure. From an international
perspective, do we have any examples of
the complete lifecycle here which would be envisioned
as mobile applications generating a dataset
that’s proposed as a primary endpoint
for clinical research in front of any regulatory
body in the world. And that regulatory
body says, “Yes, “go ahead and proceed with
your clinical investigations “using that particular
mobile app data “generated primary endpoint.” And then the trials
have been done and we have a verdict
on what’s happening. Do we have any examples
anywhere in the world of that kind of lifecycle yet? – There are a couple
that are getting ready to go into submission. So I think that’ll change. That’s a more of an
RCT question though. So if I think about my
real-world evidence hat, ’cause I’d love to go RCT
on everybody right now, but I won’t. Like if I think about RWE,
if I think about real-world, there’s definitely a lot
more usability for this. Here’s what I would
say is I think the key to the question is as you go
global, does this work scale? And so here’s the question
I’m gonna ask us all, do people everywhere
else have mobile phones and the ability to wear sensors? That’s what we’re asking. And I think we know
the answer to that. The difference, as we've gone global in Asia and Latin America and Europe, is the cultural differences
for how they use mobile and how they respond to
on-demand telehealth. Those things are very unique
and very culturally specific. So what I would say
is everybody’s a
rookie in this space. And we’re all learning,
but I will tell you that the one thing that I
think is really positive is that when you
look at these studies that people are trying to do, sites and patients are just so positively responsive
to this type of work because they understand,
they’re being taught this is why we’re doing this,
this is why it’s important. And by the way, it might
make it more convenient for you to be in a
research environment. – I will also say there’s
maybe one example. It hasn’t gone through
the full lifecycle, but there’s some recent
guidance by the EMA on the use of the 95th
percentile of stride velocity for patients with Duchenne
muscular dystrophy. And it’s a thing we
read and then reread and then reread time and
time again internally because I think it holds up
as a really great example for taking something
out of the clinic and into a patient’s home. It’s still quite intensive. It’s not like opening your
phone and walking around with it and then we can tell you
exactly what’s going on. You know, it requires some
intensive sensor wear, but it’s sensor wear
in your real life, not walking up and down
your physician’s hallway. So I recommend maybe
just looking into that. We’ll probably see
similar things. I believe one of those
is under the eCOA, but I’m not sure. – I could also say
that one of the even sort of how do you
test and validate this? You’re gonna have the
language barriers. And you’re gonna have to
see how you translate it. Do you still get the same sort of correlations with something? Can you still capture
the sort of, you know, the measure, the change
and are there cutoffs that you define in
one population that’s
gonna adequately describe or predict the
progression of disease? And is it because it’s
the different population or is it because
maybe your translation wasn’t as accurate? And in that part of the
world, you had to use a different language to
help people use the app. So I think there’s also a
lot of these kinds of steps that have to happen. – Good point. Over here?
Davis, Janssen R&D. I have a question for Angela. Angela, I was really
impressed with IBD Plexus. Could you tell us a little
bit more about how you chose what went into the MyStudies
App, which questions? How are the patients involved and the physicians? And did you pick
outcomes or endpoints that had already been
validated previously? – Sure, so for
the MyStudies App, we’re actually replicating
what we’re collecting in our SPARC IBD study,
just leveraging a mobile app to collect that. But we do have patient
governance committees that inform the
patient-reported data that we do collect. There are no standards, right, there are not validated
IBD-specific surveys. We are incorporating
promise measures. So we’re going to get that
fatigue, social isolation, pain. But when it comes to IBD, that’s something
that’s really important for our disease area to
develop, validated prose. – Sally? – [Sally] Hi, thank
you very much. Your presentations were great. Sally Okun from PatientsLikeMe. I’m interested in
the previous session. I think it was Shaun who
challenged us about thinking about curation of the future and really sort of
thinking about patients actually being the
ultimate curator. So I’m wondering, given
the opportunities we have with technology and the ability to go back and inquire
of the patients that we might be
doing some work with, how do you see that
in the future role? Can we start to think
about the use of patients as the validator of
information that either took place in the clinic and
now they were automatically, in real-time given
some of the insight from that data that was
generated to then confirm it? Yes, that’s the
experience that I had. And I was wondering whether
the technological capability of having mobile communications
like that could actually create a person-generated validation metric that would then
be able to suggest that the information
in the EHR confirms what the patient’s understanding of what that experience was. And then we also have
some way of continuing that validation through
sensors and other data. So anyway, just trying to
connect the dots, I think, as we go in the future and using patient
and person-generated information and
insight and validation to help us better
close some of the gaps. – Yes. (audience laughs) So no, I think what you’re
hitting on is really important and if you think about the
way that world is moving, that patients and
people in general are becoming more empowered
to become the stewards of their own data through a
variety of different means. By no means is it perfect
across all sectors, particularly in healthcare. But I think there are some
really interesting things that we as people are trying
to work with a variety of different medical and
health data can look to. The one, I can think
shining example that I always look to
is the OpenNotes project which is just fantastic. Like I wanna read
what my doctor writes. And if I can read it, then I can have a conversation about it and I can correct it and I
can validate that information. Should it be what happens
with the entire EHR system? Who knows, why not? I mean, people are smart
and patients are engaged. I don’t see why we
couldn’t use them as a key source of truth. – I think patient recall
has always played a role in clinical investigations. And when you go in and
you talk to your doctor and he ask you how you felt
over the last six weeks, you’re always gonna
forget something. So I think this only enables us to check this
information real-time which will only make
it more accurate. – All right, well I wanna
thank you all very much for a great discussion of what is unquestionably
a big frontier in real-world evidence and hopefully more
engagement of patients in research as well. Thank you. – Thanks so much. (audience applauds) – We’re gonna move right
into our next panel on Methodological and
Analytical Considerations for Observational Studies. So I’d like the next
panel to come on up while I introduce this session. In this session,
we’re gonna transition from our discussion
of data quality and data issues, sources,
and quality and reliability to consider how methods
and analytic approaches used in observational studies could inform research questions and enhance our ability
to make causal inferences, again, for regulatory purposes. We’ve got a number of
presentations lined up to highlight results
of recent projects that have looked to replicate
randomized controlled trials using real-world
data and methods as well as other
analytical approaches that could offer some insights
into the appropriate use of these real-world
study designs. So as you can see, I’ve
got a big panel here for this session. We’re joined by Til Sturmer, the Nancy Dreyer Distinguished
Professor and Chair of the Department
of Epidemiology at the Gillings School
of Global Public Health at University of North Carolina. Bill Crown, chief scientific
officer of OptumLabs. Sebastian Schneeweiss,
who’s professor of medicine in Epidemiology at
Harvard Medical School and chief of the Division
of Pharmacoepidemiology at the Department of Medicine in Brigham and
Women’s Hospitals. Lucinda Orsini, who’s associate chief
science officer at ISPOR. Patrick Ryan, senior
director of epidemiology and the head of
Epidemiology Analytics at Janssen Research
and Development. Kristin Sheffield,
research advisor in the Center of Expertise within the Global
Patient Outcomes in Real-World Evidence
Organization at Eli Lilly. And David Martin,
associate director for real-world
evidence analytics and the Office of
Medical Policy at CEDR. So a number of presentations. To begin with from our
first several presenters and then some
discussion as well. So let me turn it over
to Til to get started. – Thank you all. I’ve lots of slides, so I
will need to move quickly. That didn’t work already. Forward, okay. Better. I just wanted to upfront mention that this talk has been endorsed by the International Society
for Pharmacoepidemiology where all of the developments
that I will present today have been first
presented and discussed. This is my disclosure slide. I don’t want to spend a lot
of time on that one either. So let’s go into
where we come from. And the ones that are
old enough like myself, remember this article here. So when it comes to
evaluating evidence about medical interventions, the next question should
be whether it’s randomized. And if not, forget it. (audience laughs) I wanted to walk you through
the reasoning for this because these were smart people. So it’s not sarcastic
in that sense. So confounding by indication. Good prescribing leads to
confounding by indication and I’ll let you read the rest. Confounding by indication
generally let’s the drug look bad because the
underlying indication usually increases the
risk for the outcome. That’s why we treat. More recently, we
have discovered
confounding by frailty. Essentially, patients
close to death are less likely to be treated
with preventive therapies. And if mortality
is your outcome, then the drug would
suddenly look good. And the saying here is if
it’s too good to be true, it might not be true. And then now to these
two confoundings that go in different direction, you now add the
ignoring of the issues related to adherence and
persistence on treatment and time on treatment. And I just have two
publications here put up. An older one about
adherence effects, the other one, a more recent
on time varying hazards. And then you can understand that comparing
prevalent drug users with non-users is bound
to be biased and invalid. And that was the
standards design for nonexperimental research when these statements were
made that I started off with. So smart people saying
smart things about a design that was used at the time. Okay, so but where are we now? We are in a time
where we have realized that we need to do active
comparator, new user designs. And this is my latest
attempt in depicting this. So we start off with patients
with type 2 diabetes. The main indication
for adding insulin to the oral therapy is obesity. It’s just the main
indication for that. So if we compare
insulin initiators with non-insulin
initiators of insulin, we will have strong
confounding by BMI or obesity. If we do, however,
compare two insulins, initiating insulin glargine which is an insulin analog
versus NPH or human insulin, then we remove this confounding by indication because we condition on the indication: both cohorts that we compare have the indication. And this is how it works out. This is a study done in
electronic health record data from the Mass General Hospital taking all the patients
initiating insulin glargine and all the patients
initiating NPH insulin. And then you look at their BMI and you see that
BMI doesn’t affect the choice of insulin at all. There’s no difference. This is not randomized. This is just the baseline data. That’s how it works. Okay. So the active comparator,
So the active comparator, new user design dramatically reduces
the potential for bias due to these factors
that I mentioned. It also reduces the potential
for immortal time bias which I cannot go into. This idea is obviously not new. The first citation
is from ’87 here. But it has become
over the last 15 years the standard design for
nonexperimental comparative effectiveness research and
the focus on the intervention, the new user part of this
is obviously a prerequisite for causal inference. Now yes, the comparator
drug selection is important. So you need to
spend time on this because obviously, if it
doesn’t have the same indication then this will not work
to the same extent. Okay, so where are we going? These are just very quickly, and I’m not implying that
this is exhaustive list of what we’re
currently doing at UNC. And this is also not implying that you will be able
to follow all of this. This is just to give a preview. So stopping of
medication adherence, persistence, how
to deal with that. If we just censor people
when they stop medication, we can introduce bias
due to selection. And by the way, this
is the dark here, and that also is the same issue or a similar issue at
least in randomized trials. So we need to do
more work on this. As just a recent example
from the Medicare data, you have here the
adherence or persistence on the Dabigatran, and on the right side, the
persistence on Warfarin. And you see after two years, the persistence on
Warfarin is much higher than the persistence
of Dabigatran. And then when you look
at the on treatment versus the initial treatment, in fact, comparing
these two drugs you see striking differences that are mainly due to the non-persistence
on the Dabigatran. So you need to take
this into account, especially also when
you try to translate from a randomize trial
to real-world evidence. So the benefit and
harm of treatments may not be realized
in the real-world due to lack of adherence
or persistence. This complicates RCT
generalizability. It’s not the only issue,
we talked earlier today about other issues. We have the methods,
but they need the data. And here I think is the
linkage to the EHR data that will put us on a
different playing field because we will be
able to better predict
treatment changes. And this will also help to
strategize interventions. Second issue, single-arm
trials with control for confounding, but
using an external source where we now suddenly,
let’s say, have a preventive treatments that’s
preferentially given to smokers because they are at higher
risk for the outcome, but we don’t have
measures of smoking as has already been pointed out. And the claim status, we
only have some information on smoking that may have
a very high specificity, but it has a low sensitivity. So how does that
affect our analysis? So here again, the DAG helps. So if there is only confounding, only the smoking
affects treatment, then we cannot generate bias and we will have some
confounding control depending on how good our measure of the
actual smoking is. But if we do have differential
misclassification, then even in the absence of confounding, we actually generate
bias by controlling for the measured smoking. And obviously, if we have both
going on at the same time, then all bets are off. And that’s not entirely
true because we can actually identify the parameter
space where this works, or doesn’t work. This is my first attempt
of having an animated thing in a slideshow. (audience laughs) And you see in green that we
can identify parameter spaces where the confounding
control is sufficient. This is another thing. You might say, “Why
don’t we just restrict “to the smokers?” Which is usually seen as
because it’s a high specificity that this will work. But here, we only see
that it only works when the prevalence
of smoking is high because otherwise we
get the false positives, even with a close
to one specificity or abundant and
create confounding. So controlling
for differentially
misclassified variable does not always reduce bias. We can identify in
what situations we can. Control for confounding
and restricting to those with the variable
is not always going to work. Finding study populations
with treatment equipoise. I’m just showing
some titles here. This was the first paper from
a statistical perspective to increase efficiency
to treatment populations. We followed up with this paper at using the same approach
to control for confounding. Alex Walker et al, extended this to comparative
effectiveness research. And while Plinch has
recently published a paper on comparing these
different approaches and how they perform. This is how it works. We just trim off the tales of the propensity’s
core distribution to find an analysis, study population where we have
more equipoise in treatment and potentially reduce
unmeasured confounding. We need more guidance work
on actual amount of trimming. And this is ongoing. I think I want to quickly
mention the new idea to define bias versus
the treatment effect in the target population,
rather than sticking to internal and
external validity. This is a paper by
Daniel Westreich from UNC and I urge you all to read it. This is a really
interesting concept that will help us
to move forward. And this is the final slide
here by Michele Jonsson Funk, looking at sensitivity analyses under a new contract from the FDA. Obviously, no results yet, but we are trying to identify which sensitivity analyses to run, how we can interpret them, and whether we can find guidance on interpreting
them with respect to unmeasured confounding. Thank you very much. – Thank you. (audience applauds) Next is Bill. – We want to raise
our confidence level in quality of the
inferences that we can draw from observational data. The gold standard, of course,
is randomized clinical trials. Remarkably, there’s been
thousands, tens of thousands of observational studies and similarly, with
clinical trials. And the evidence of
comparing treatment effects from observational studies
and clinical trials is actually that there’s a
high degree of agreement. Now which to me is rather
surprising (laughs) ’cause you would think that
they would not be so similar in terms of their results
given some of the issues that we’ve heard about today and that Til was
just talking about. But that sort of very
broad brush comparison isn’t very useful. So how do we get to the
point where we’re actually trying to attempt to
draw casual inference in observational studies and something that
more closely mimics a randomized clinical trial. And we have no lack
of causal frameworks. There’s causal frameworks that
have come out of economics, simultaneous equation models
from agricultural economics, date back 100 years. More recently with
Heckman in the mid ’70s. He started looking
at the question of
sample selection bias in terms of labor
force participation which is, when you
think about it, exactly the same
statistical problem that we have in terms of
recruiting into clinical trials. More recently, oh and the
epidemiologist of course with Rubin and the
counterfactual model. More recently, Judea
Pearl has really laid down this groundwork
of thinking about, being really, really
explicit about defining the conditions under which you
can draw a causal inference and whatever all the
things that could go wrong and how confident are you in your assumptions
that your making? That sort of causal graph, do
calculus approach of Pearl. And then the machine
learning people, Mark van der Laan and Sherri
Rose, to name a couple, are putting machine learning
into a causal framework and using machine learning
as a statistical estimator. So we have lots of methods. And observational
researchers in economics and epidemiology just
don’t pull these methods out of the hat and use them
for a particular study. Methods are designed to
address very specific issues that a particular study
is likely to encounter, give the nature of
what the question is and what the data are. This is a great paper. I’ll snuggle up to Sebastian and say, this a great paper, Sebastian. (audience laughs) By Jesse Franklin and Sebastian. But it really hits some
of the same points. And Til it was just talking
about having active comparators, new users, high dimensional
proxy adjustment. But some of these things we
also have to be careful about. So the high dimensional
proxy adjustment can introduce colliders, we
have to worry about that. Controlling for
medication adherence, this is done, but
it’s often done wrong in observational studies. In particularly
the wide-spread use of medication possession
ratios is problematic and that’s not the way
that one should do it. So many different issues. But we need to be thinking
about these ahead of time. And that suggests that design maybe much more
important actually than the statistical methods that we’re using to estimate
the treatment effects. We don’t have a lot
of these comparisons, direct comparisons of
observational studies that have attempted to
replicate clinical trials, but we do have some. And this literature is growing. And I think one of the
things that’s interesting is that there’s an
increasing focus on doing the observational
study for products that are on the market for trials that ongoing
and doing the study before the trial
result is available. So you’re not sort of aiming at a target result,
but you’re trying to actually
duplicate the design. And the Noseworthy et
al, CABANA replication is a good example of that. All of these found
similar results. A classic example
of one that didn’t was the Nurses Health Study. And so we had 10
years of experience with the Nurses Health Study, indicating that hormone
replacement therapy was protective of
cardiovascular disease. And then the Women’s
Health Initiative came out and showed
exactly the opposite. And immediately, people said
it was ’cause of randomization, that randomization
was the problem. But it turned out that it
wasn’t so much randomization as it was the length of time since menopause for the women
and differential follow-up. And when that was adjusted
for in the observational studies, the results were very similar.
one of those situations where you had a
result to aim at. So I just wanna
tell you about this, several examples of
large-scale efforts to do clinical trial replication that are underway now. And one is going on at OptumLabs in collaboration
with a multi-regional
clinical trial center at Brigham and Women’s
is called OPERAND. And there’s a large
stakeholder group that’s associated with this. The idea of OPERAND is to have
two different academic groups replicating the same two trials, but have them be separate
from one another. And the two trials
are the ROCKET, atrial
fibrillation trial and the Lead2 diabetes trial. One with a clinical
endpoint, the diabetes one. And ROCKET AF one being
one that you could measure with claims data. The two universities
are Brown University and Harvard Pilgrim
Health Care Institute. And the idea is to have
them use different methods, different data, claims
data plus clinical data, and understand the
decision-making
of the researchers and how that influences things. So how did they decide,
how many, what variables to put in a propensity
score match? How did they code the endpoints for these trials, and the inclusion, exclusion
criteria and so forth? And these are like,
these are three of many different sources of variation in
observational studies. But the idea is to sort
of see if we can begin to understand the importance of these different variations. So we have two
measures of agreement. One is regulatory agreement, which is defined as having the same sign and statistical significance. And then the other is estimate agreement, where the point estimate of the observational study falls within the 95% confidence interval of the average treatment effect estimated in the trial.
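(Editorial illustration: the two agreement criteria just described can be written down directly. This is a small sketch of one reading of those definitions, not the project's actual code; the example numbers are hypothetical log hazard ratios.)

```python
# Small sketch of the two agreement metrics as described: "regulatory agreement"
# (same direction of effect and the same significance conclusion) and "estimate
# agreement" (the observational point estimate falls inside the trial's 95% CI).
# This illustrates one reading of the definitions, not the project's actual code.

def regulatory_agreement(rwe_est, rwe_ci, rct_est, rct_ci):
    same_sign = (rwe_est > 0) == (rct_est > 0)        # effects on the log scale, e.g. log hazard ratios
    rwe_signif = not (rwe_ci[0] <= 0 <= rwe_ci[1])    # CI excludes the null
    rct_signif = not (rct_ci[0] <= 0 <= rct_ci[1])
    return same_sign and (rwe_signif == rct_signif)

def estimate_agreement(rwe_est, rct_ci):
    return rct_ci[0] <= rwe_est <= rct_ci[1]

# Example with hypothetical log hazard ratios and 95% CIs
print(regulatory_agreement(-0.20, (-0.35, -0.05), -0.25, (-0.40, -0.10)))  # True
print(estimate_agreement(-0.20, (-0.40, -0.10)))                            # True
```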
And these were designed to sort of line up with the duplicate
study that’s going on at Brigham and Women’s,
funded by the FDA, so that we have some sort
of general similarity in terms of measurement, means of measurement
that we’re using. This is a result of just some of the preliminary results
from the trial from OPERAND. At the top, this is
for claims data alone. And looking just at methods, it’s not a different data
or the decision-making of the researchers. In fact, just sort of
avoid the potential of feeding information to
the different research groups about the methods. I’ve intentionally kept
this very non-descriptive in terms of what
the methods are, but I’ve interweaved the methods used by the different groups. So this reflects both
Brown and Harvard Pilgrim. The very top
estimate is the trial and the standard errors
around the trial. And you sort of see, going down, in general as you
move down towards the bottom of the page, the methods are getting
more and more sophisticated in terms of dealing with
things like sensoring and adherence confounding within
versus probability weights and the like. The conclusion is that every
one of these trial replications so far, using just
the claims data has replicated the trial
on the basis of both the policy replication and
the estimate replication. So just one last comment that I was asked to mention is what is the potential for
supervised machine learning methods in drawing
causal inference in observational data? And this is a very
interesting question because machine learning methods have not been used for causal
inference, generally speaking. They’re used for classification and prediction. We have, just as
in the epidemiology and econometrics literature,
we have tons of methods here for doing those predictions
and classification exercises. But there’s a couple of
ways forward I think. You know, one, sort of
the brute force method is to say, let’s just use
the machine learning methods “to identify new features
that we can then test.” ‘Cause, you know, if we also
test what we already know in terms of our hypotheses, we’ll never move
out beyond that. And so the idea of
using machine learning on the data to identify
the various features that are correlated
with the endpoint may give us new things
that we can test. But then we can
put those into our standard causal
modeling frameworks. The other is the
sort of Sherri Rose, Mark van der Laan
kind of approach where you’re using
machine learning, but you’re actually
putting it directly into a causal
modeling framework. And this is relatively new. And there is some evidence
here in this paper by Megan Schuler and Sherri Rose that provides some
simulation estimates on targeted
maximum likelihood. I recommend this paper
just as a really brief, step-by-step description of what targeted
maximum likelihood is, so if you really wanna
kind of learn about the intersection, potential
intersection of machine learning and causal inference. It’s certainly one of
the promising methods. So I’ll stop there, thanks. – Great, thank you. (audience applauds) Okay. Sebastian. – Thank you very much. Thank you very much
– Great, thank you. (audience applauds) Okay. Sebastian.
– Thank you very much. Thank you very much to Til and Bill
up so beautifully. This is equally endorsed
by the Executive Board
for Pharmacoepidemiology that is working for
decades already, on issues, how to analyze secondary data that is generated by
the healthcare system. I want to highlight
that all the talks you have heard before
on data quality are enormously important because all the analytics
that we’re doing is really based on
the underlying data. My mother taught
me very early on that my attention
span is way too short to be good at data generation, so I switched over to analytics. Trying to accept
there are issues with these data,
quite clearly, right? But understanding, Patrick
has said it already, right, understanding limitations
and then building analytics around this in order
to accommodate that. How far can we go
understanding that? We love randomized trials. We all love randomized trials because they’re based
on randomization because of a controlled
measurement of the outcome, a clear measurement,
adjudication, but also because they’re
easy to understand and easy to communicate. The sweat and tears
going into planning and doing a trial,
but once they’re done, they’re easy to communicate
and to interpret most of the time at least. So we thought through and is very much reflecting
what Til and Bill already said. What are the key characteristics in non-interventional studies that make us confident
that we will get to causal conclusions when we study the
secondary data. I will not dabble
on this very long as Til mentioned this already, that when you work in the
target trial paradigm, you envision, you
conceptualize the trial that you would want to do, but you can’t for
whatever reason. How would you best
emulate that trial? And that very
clearly, very quickly gets you to this
new user design. And also you will
realize quickly that you’re better off if
you have an active-comparator if you don’t have the luxury
of baseline randomization. Now here’s some examples. Now on the left side
is the column, RCT and then two types of
real-world evidence studies, one with new users, the
other with current users. Whoops. And then Bill mentioned already the big pain point
of epidemiology. The Women’s Health Initiative
hormone replacement therapy and coronary heart disease, incidents of coronary
heart disease, showing this increase of risk in the first two
years of treatment. Miguel Hernan did a reanalysis
of Nurses Health Study, applying the new
user study design rather than current user design and came to a qualitatively
similar conclusion as the randomized trial, which is increased risk of
CHD the first two years. And then on the very right side
you see the original papers using current users
rather than new users. And you see this tremendous
beneficial effect that led to cardiologists
happily prescribing HRT. So another example is here
Dabigatran versus Warfarin. The randomized trial
showing this reduced risk for stroke and systemic emboli. The new user design showed, kind of mimics that result quite nicely. And there’s this extreme
finding by a database study using current users that
shows this tremendous increase of risk of stroke
because of this risk selection of people who are
discontinuing one treatment and moving over to
the other treatment. Active comparators, we
spoke about that as well. It is an example, statin
all cause mortality in elderly patients, 65 plus. On the left side,
Pravastatin Pooling study. It’s about a 20%
reduction in mortality. With an active comparator,
we can kind of mimic that if you go to the
non-user comparator, you see an overly
optimistic result which is not really reflected
in the randomized trial. Similarly for hip
fracture statin. Hip fracture, there’s no
relationship between statin use and hip fracture shown
in multiple large
randomized trials. An active controlled against other lipid lowering medications mimics that finding. And then non-user studies show this enormously
beneficial finding. I don’t know how journal
editors don’t get nervous when they see effects
of larger than 50% because these things
should be clearly in the drinking water. So focusing on number
five here, avoid known design and analytic flaws. There are a bunch
of design flaws, and Til mentioned
some of them already that we know. We understand very well. They have published
many papers on this, yet we see them
over and over again. And this example here,
a more recent example on Canagliflozin
or SGLT2 inhibitors and all-cause mortality
where the CANVAS trial shows just a 13% reduction in all-cause mortality. The real-world evidence study done in Sweden with no immortal time mimics
that very nicely, but other studies like
the CVD-REAL or EASEL are suffering from this
immortal time bias, I cannot go into
the details here, and show this enormous,
again, mortality benefit of 50% or more. Again, we know what went wrong. I think what is
enormously dissatisfying if you have a non-interventional
and randomized trial, they’re different and you don’t
know why they’re different. In most cases, I would argue we actually know why
they’re different. In most cases, luckily, we
know what to do about it. In many cases, we
would also say, “Well, we need to do
a randomized trial “and we need to be honest
enough to ourselves “to know the difference.” And this gets me to the point, so how can we move beyond
that and help reviewers of this evidence like the FDA to quickly and confidently
and completely understand each of the studies
that are submitted? So we talk about transparency
of the implementation, the protocol,
possibly registration, the reproducibility of the
implementation of the study, and of course then, the
reliability and robustness of the findings. Nothing wrong with sharing
SAS code or data code, whatever, plus the
patient level data because what happens is you
get the data, you get the code, you execute it, you get
the same finding, right? I haven’t learned
anything with this because I cannot test whether what I wanted to
achieve with this study was actually implemented
the way it was because I cannot read that
code, that programming code. So it is disconnect between
the intention of the study and the actual
implementation preserves in this, even if you
share programming codes. So what we need is this
kind of visual description of the longitudinal
study designs to quickly give an overview of what the study design is,
the longitudinal study design, as well as recording of
the study parameters. What are the parameters
that you pick? And this is very much
motivated by Sentinel. There’s now an activity
going on that Shirley Wang is leading to get FDA
and industry consortion in order to identify
how could we structure such a table? Which is very much in line
of course with the joint task force between
ISPE and ISPOR. The other professional society that centers very much
around the analysis of secondary data. A complete list of the
parameters that we need in order to produce
real-world evidence findings.
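(Editorial illustration: a toy example of what recording those design parameters in a structured, machine-readable form could look like, so a reviewer can see exactly what was done. The fields and example values are made up for illustration; this is not a proposed standard.)

```python
# Toy illustration of recording key design parameters of a database study in a
# structured form. The fields and example values are made up; not a standard.
from dataclasses import dataclass, asdict
import json

@dataclass
class StudyDesignParameters:
    data_source: str
    study_period: tuple            # (start, end) of the accrual window
    exposure_definition: str
    comparator_definition: str
    washout_days: int              # look-back with no use of either study drug
    outcome_definition: str
    follow_up: str                 # e.g. "as-treated" or "intention-to-treat"
    covariate_assessment_days: int

example = StudyDesignParameters(
    data_source="US claims database, 2012-2018",
    study_period=("2013-01-01", "2017-12-31"),
    exposure_definition="first dispensing of drug A",
    comparator_definition="first dispensing of drug B",
    washout_days=180,
    outcome_definition="hospitalization for outcome X (inpatient diagnosis code)",
    follow_up="as-treated, censored 30 days after discontinuation",
    covariate_assessment_days=180,
)
print(json.dumps(asdict(example), indent=2))
```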
And Shirley and her team actually identified 150 published database studies and replicated them, 150 database studies, in the same data, the
same years of data, applying the same methods
that the authors reported in their articles, right? You should get exactly
the same findings. My friends from the
physics department they just say, “Sebastian,
it is nuts what you’re doing “because it should all be
around the green line, at zero.” They aren’t right? So there is something
going on, and we heard already why in all the previous
presentations. We need to understand, overall,
I think it’s not that bad. ‘Cause the question’s
what’s implication really? But we need to explain why do
we have these outliers here. And what is the implication
for the point estimates then? Again, these should
all be on a diagonal. We do have these outliers there. We need to understand what
exactly was not reported in these studies so that we could not replicate
these studies. In the same data,
in the same data. And then of course,
process, process, process. As academics, we don’t like
that, but appreciate that. Decision-makers need processes. we have exit paths, exit ramps. We say, “I cannot go forward
with real-world evidence here. “I need to have primary
data collection. “I need to randomize
a baseline.” You want to register your
study at clinicaltrials.gov. You want to enable the regulator to do sensitivity analysis on the patient-level
data in order to check the
robustness of the data. And then of course, validity. Bill already mentioned
the RCT duplicate. I’m not going to detail. I have done this so
many times already, where we are identifying
30 of our RCTs that are completed,
seven ongoing, and we are trying to replicate
them in real-world data. Oh, here is an example of
one of this predictions that we had made which
is now published, predicting the Carolina trial that is LINAgliptin
against Glimepiride, two second line
anti-diabetic medications. Main outcome was
MACE, three-point MACE, 3P-MACE. We predicted a
null finding there. We also predicted a substantial
benefit of LINAgliptin with regards to hypoglycemic
crises, hospitalizations, so severe hypoglycemia. The trial was then
a half-year later. It was published
at the ADA meeting. It’s now actually
published in JAMA, the same finding with regard
to the main outcome, 3P-MACE. The effect on hypoglycemia is even more extreme
in the trial. And when you look then, their
definition is very different. It’s very much driven
by lab test parameters and things like data that
we did not have available. So that explain, we can
now at least explain why we have come to the same
clinical recommendation, nothing wrong with that,
but the point is different because of the
measurement issues. And that leads me to
this proposal here. In order to kind
of this translation form the fairly complex
real-world evidence to the decision-making,
do we need something like evidence dossiers where a neutral
party kind of checks the relevancy of
the study questions? Is the study question reflected in the actual analysis? What about the validity?
sensitivity analysis, do we see the same findings? And then finally, are
the data transportable from one data source
to another data source? Thank you very much. – Thank you. (audience applauds) Okay, thank all three
of our presenters for those presentations. We have some comments now,
starting with Lucinda. – Oh, great. Thanks for having
me on this panel and I think what I’m
gonna say builds quite well off of what Sebastian
was just talking about. So when we think at ISPOR
and with our partners, good science in the real-world evidence space requires four
supportive elements. And we’ve talked about
some of these already. Adequate underlying
data quality, appropriate methods,
transparency of processes, and reproducibility
and replicability. And one of the places
we’re focusing a lot on now is in this transparency space, especially for these hypothesis-evaluating treatment
effect studies or studies looking
at causal inferences, especially in secondary data that are specified
for use in regulatory and other decision-making. You know, transparent data
practices include a priori enumeration of research questions, objectives and analysis plans that allow other researchers to have a better chance of
reproducing the result and then assessing the
quality of this study. And a lot of times, that
is missing, I think, from real-world evidence in a way that we see it
from clinical trials. Additionally publishing and
posting the results themselves, even from a negative
study can put that study in context with the results
so that you can understand where studies are
differing or converging. And over time, if we’re
able to see those results in the places where people
are able to access them, we can add to the totality
of evidence available to decision-makers so
that we can understand more clearly that totality
of evidence that we have. And while real-world evidence,
yeah, as we’ve said over and over today does not
replace clinical trials, it certainly is additive to our understanding of
the treatment effect. And transparent study practices, including registering studies in public-facing
databases can help elevate the credibility of that
real-world evidence. So while registering studies a priori is not a
new recommendation, we’re certainly realizing
that the registration process for these types of
studies is not really fit for purpose, especially for secondary
data use studies. And a high proportion of
real-world evidence research does fall into that category. We also have no real
standards or checklists in place to talk about what
parts should be registered or what variables of the
study should be enumerated, but I think we have
a pretty good idea. So we’re working with
current registry holders to understand how we
can increase usage or whether we actually
need new places for people to
register these studies so that we can engender
this culture of transparency within these studies so
that we can understand the methods, we can understand what the researchers were doing and we can see the
results in the end so we can put all of
this into context. So anyway, I’ll stop there, but I think that’s one place where we see there’s
a need and a lack. – All right, thank you. (audience applauds) Patrick. – So I was tasked with
being provocative, so I will try my best
at doing that. – Usual task, I think. – That’s usually why
you ask me here, Mark. So reflecting on the day, I really enjoyed Peter
Stein’s opening comments. And one of the things
that he drew us to was the notion that we
should be making sure that we establish
the appropriate
benchmark standards. And he specifically cited that
we already have regulation for adequate and
well-controlled investigations. And if you dig into those,
they really are just talking about how do you manage
threats to validity. And if you break down the
seven criteria that are listed, there’s investigator
bias, selection bias, measurement error
in the outcome, measurement error in the
indicated population, there’s confounding, and
there’s model misspecification. And what I would say is
that as Til laid out, as a pharmacoepidemiology
community, we’ve made tremendous progress on each of those
items individually. And we’ve seen
several presentations, and a lot of the discussion
really focused on threats of confounding as one
that we tend to jump on. But in fact, all of these issues are actually quite prevalent. And just to use the example
of investigator bias, when Sebastian talks about the
need for sharing source code, in part, this is really
about investigator bias as it relates to
prespecification and making sure that you’re
fully transparent and sharing. It’s not about that
everybody has to read code, so much as it’s about
that everybody needs to have confidence
that the result was prespecified and generated, just like any study would do. In that regard, one of the things that we’ve seen as a community
is that while we seem to be evolving towards standards for each of these elements, we don’t necessarily
see all studies following best practices for all of these threats to validity in a
consistent fashion. And when we, this morning,
we had two sessions, I see this happen quite a lot, we have separate conversations that talk about
quality of evidence, oh, I’m sorry, quality of data, and then we separately start
talking about evidence. And I think it’s actually time that we start trying to think about how do we actually
quantify quality of evidence? So that’s why I’m excited
to see projects like OPERAND and the Duplicate
Project going on. Bill highlighted
specifically that the reason we wanna compare real-world
evidence studies to RCTs is because we trust
the evidence from RCTs. And I think it’s important
for us to actually think about like why do we
actually trust the RCTs? And a lot of people
jump to the fact that we trust it because
it’s randomization. But in fact, we trust it because it addresses all those threats to validity. And in fact, if you would
just take away randomization, but actually
prospectively collect data in the regimented way
of a randomized trial, we’d get a probably
really good answer without the randomization. I also think it’s
really important from some of the earlier
conversations today to realize that if real-world
evidence was reliable, we do not expect it to
match randomized trials. So for example,
real-world data is bigger than randomized trials. That means that criteria
that Bill and Sebastian are using about statistical
significance agreement will not agree because
the observational study will be bigger and more
likely to be significant than the underpowered
randomized trial. Real-world data is going to be measuring
effectiveness, not efficacy. So this issue that Til brought
up about continuous length of adherent exposure,
that’s just different in randomized trials
versus real-world evidence. And it’s not that it’s wrong, it’s just that it’s different and should produce a
different estimator. And then of course, we talked
about generalizability. And I hope that
everybody took note of what Peter Stein
said this morning. He said, “We need to test this
question of generalizability “rather than just waving
our hands at saying “randomized trials
are not generalizable “and real-world data are.” We actually need to do some
more empirical research to prove that. So a couple points that
I just wanna highlight. Bill and Sebastian are
both doing projects that I think are gonna be
providing us useful information, but both of them
made an assertion that they’re selecting a metric for how to assess
study agreement. In work that we’re
doing within OHDSI, we’ve actually demonstrated
that the selection of what metric you use
actually matters quite a lot. And actually, I’ll highlight
that in Sebastian’s slides, I really liked how he tried to
provide examples of studies, and then he said, “This one
matches and this one doesn’t.” And he’s got his
little smiley faces. What’s notable though, is if
you look at those smiley faces, they do not agree
with these statistics of statistical
decision agreement, but instead as he rightfully
said, qualitatively similar. And so either we need to
think about empirical measures or we need to come up with
the Sebastian smiley face test for study agreement which I’m okay
with, that’s fine. I think it’s actually
really important not just to decide what
metric we’re gonna use, but also to a priori define what’s our expectation
of agreement? Because one of the things,
since we know that the studies shouldn’t exactly match the randomized trials, even if we’re doing the best job possible, we need to know that if we
use one of those metrics and it gives us a
number of 80% agreement, is 80% good? Is 80% bad? You know Bob Temple
is gonna look and say, “Oh, it’s not perfect, so that’s gonna be a problem.” Whereas Til might say,
“Oh, 80%, that’s better “than we should’ve
thought we could do.” Instead we need to actually
have a quantification of expected value of
this agreement statistic. So that then a priori when the
duplicate study finishes out, we can actually know whether or not we did a good job.
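As a concrete illustration of why the choice of metric matters, here is a minimal sketch with hypothetical numbers: the same trial and database emulation can disagree on statistical significance agreement while agreeing on an estimate-based metric, simply because the database study is larger and its interval narrower. The hazard ratios and standard errors are invented, not taken from any project discussed today.

```python
# A minimal sketch, with made-up numbers, of why the choice of agreement metric
# matters: the same RCT/RWE pair can "agree" on one metric and "disagree" on another.
import math

def ci(hr: float, se_log: float, z: float = 1.96) -> tuple[float, float]:
    """95% confidence interval for a hazard ratio given the SE on the log scale."""
    return math.exp(math.log(hr) - z * se_log), math.exp(math.log(hr) + z * se_log)

rct_hr, rct_se = 0.85, 0.12   # small trial: wide interval, crosses 1.0
rwe_hr, rwe_se = 0.88, 0.03   # large database study: narrow interval

rct_lo, rct_hi = ci(rct_hr, rct_se)
rwe_lo, rwe_hi = ci(rwe_hr, rwe_se)

# Metric 1: statistical significance agreement (do both exclude HR = 1.0 the same way?)
rct_sig = rct_hi < 1.0 or rct_lo > 1.0
rwe_sig = rwe_hi < 1.0 or rwe_lo > 1.0
significance_agreement = rct_sig == rwe_sig

# Metric 2: estimate agreement (does the RWE point estimate fall inside the RCT CI?)
estimate_agreement = rct_lo <= rwe_hr <= rct_hi

print(f"RCT HR {rct_hr:.2f} ({rct_lo:.2f}-{rct_hi:.2f}), significant: {rct_sig}")
print(f"RWE HR {rwe_hr:.2f} ({rwe_lo:.2f}-{rwe_hi:.2f}), significant: {rwe_sig}")
print(f"significance agreement: {significance_agreement}")   # False for these numbers
print(f"estimate agreement:     {estimate_agreement}")        # True for these numbers
```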
So from that perspective, I think benchmarking is useful. In OHDSI, we decided to
actually ask the question, how well do randomized
trials replicate each other? And we use that as a benchmark
’cause I don’t think it’s reasonable to expect that
RWE’s gonna replicate trials any better than trials
replicate trials. And so what we did was we
actually looked at hypertension as an example. We looked at all of trials
that actually contributed to the hypertension guidelines. We identified 31 pairs of
trials and observational studies that we could compare. And what was interesting
was that we actually saw that a priori, the
performance of, or replication of
real-world evidence was as good as trials
replicate each other. And we thought that was
actually encouraging, despite the fact that our agreement statistics did not show 100%. And so I think it’s really
important for us to frame what we’re trying to accomplish as we move into this space. So really trying to define
what does replication and agreement actually mean? – Thank you. (audience applauds) All right, Kristin, you’re next. You have a couple of slides? – I have two slides, yes. Should I go up there? – Or I can do it
for you if you want. – Okay, just two. Okay, so thank you for the
opportunity to participate. I wanna apologize for my voice. I’m recovering from a cold. I am an employee of Eli Lilly,
but the views expressed are mine as an
individual researcher. So I think we all know
that substantial evidence of effectiveness generally
consists of adequate and well-controlled
clinical investigations that show the drug will
have the claimed effect. And the essential
characteristics of adequate and well-controlled
investigations are described in the
CFR, section 314. It’s important to
note that regulations were written for
interventional studies, but the principles
may be applied to
observational studies. And I’ve bolded here the fourth
and seventh characteristics because they relate
to the comparability
of treatment groups and the adequacy of the analysis which are key concerns
in observational studies of treatment effectiveness. As Til and Sebastian noted,
in clinical practice, treatment decisions are
based on physician judgment, patient characteristics
and other factors. And observational databases
may have inaccurately recorded or incompletely
recorded information or other missing variables. And so the result can
be bias and confounding which compromises the internal
validity of the study. So in order to support
regulatory decisions, evidence needs to be of a
sufficiently high validity to allow causal determinations
with confidence. And there are formal
methods, as we’ve seen, to quantify causal effects
from observational data. However, the validity of
those causal inferences depends on the adequacy
of expert causal knowledge about the system under study and the credibility
of the assumptions that are needed to interpret
the estimate as causal. If you could go to
my second slide. So the standard
bias control methods rely on the assumption of
no unmeasured confounding which is often violated,
at least to some extent, in observational studies
of treatment effectiveness. Despite the importance of
unmeasured confounding, its impact is rarely
quantitatively assessed. My statistical colleagues at Lilly recently
published a paper that’s a review in collaboration
with Baylor and Stanford that identified 15
methods, analytical methods to address unmeasured
confounding. You might wanna note,
this actually builds on work from a paper
from Sebastian in 2006. And so first researchers
should determine if unmeasured confounders will be strong enough
to create bias. And if the answer
to that is yes, then the sensitivity
analysis should be performed. And the options for
sensitivity analyses depend on the availability
of information about the unmeasured
confounders. So whether that’s
no information, internal information or
external information. And it also depends on the goal of the unmeasured confounding assessment.
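As one concrete example of a sensitivity analysis that requires no information about the unmeasured confounder, here is a minimal sketch of the E-value of VanderWeele and Ding, with hypothetical numbers. It is offered as a general illustration of this kind of quantitative assessment, not as one of the specific methods from the review just mentioned.

```python
# A minimal sketch of one widely used quantitative sensitivity analysis for
# unmeasured confounding, the E-value of VanderWeele and Ding. The risk ratio
# and confidence limit below are hypothetical.
import math

def e_value(rr: float) -> float:
    """Minimum strength of association an unmeasured confounder would need
    (with both treatment and outcome) to fully explain away a risk ratio."""
    rr = max(rr, 1.0 / rr)               # work on the >= 1 side of the null
    return rr + math.sqrt(rr * (rr - 1.0))

observed_rr = 0.70          # hypothetical protective effect from a database study
ci_upper    = 0.85          # confidence limit closest to the null

print(f"E-value for the point estimate:   {e_value(observed_rr):.2f}")
print(f"E-value for the confidence limit: {e_value(ci_upper):.2f}")
# An unmeasured confounder associated with both treatment and outcome by less
# than these risk ratios could not, on its own, move the estimate to the null.
```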
So my comments thus far focus more on the validity
observational study. However, no single
study, whether randomized or non-randomized can produce
uncontroversial estimates of treatment effect or
causal effect rather. Researchers should use different
methods and data sources, and the estimates
from each can be used to bound the magnitude
of the true effect. So it’s more about
creating a body of evidence from studies that are
conducted according to established methodological
and procedural standards. And that show clinically
meaningful effect sizes that are robust to
variations in design choices and in sensitivity analyses. So understanding that
observational studies do not provide a
priori confidence that their estimates are causally interpretable, it’s necessary to identify appropriate circumstances where they can be used to support product effectiveness determinations and labeling changes. FDA has used observational
data in limited instances in the past to support product effectiveness determinations. And there may be
other circumstances where there’s an
acceptable level of risk in which observational studies could be used as primary
or supportive evidence needed for labeling change. And this depends on
the clinical context, the regulatory context and
the relevant prior evidence. So some potential examples where depending on
the clinical context, observational
studies may be used to support a labeling change could be long-term
effectiveness and safety within the approved indication, additional endpoints or claims within the approved indication, minor changes in a
patient population, or a change in the backbone therapy or combination therapy, particularly when there are backbone therapies that may be viewed as largely interchangeable. Maybe a change in
the prior therapy or therapy class
that’s specified in the indication statement. And even in some
appropriate clinical context that observational data could
be used to add an indication. We know that the FDA considers
the totality of evidence when evaluating a drug, with each successive piece of
data building on prior data to provide the quantity and
quality of evidence needed. So when considering a
potential label change that may rely on
real-world evidence, one approach would be
to determine the nature of the labeling change, identify the target
label language, and then identify and evaluate the relevant existing evidence from preclinical and clinical
pharmacology studies, clinical trials
and other sources. And then develop a plan
for the data package to support this labeling change. And you would propose
new observational studies and/or trials to provide that
additional evidence needed. Thank you. – Thank you. (audience applauds) David. – Okay, good. I was unprepared for the fact that this may have been
our first FDA, Duke real-world evidence meeting
to generate its own meme which I guess would be
the Sebastian smiley face turning into a frowny face and
back to a smiley face again. And I’ll know that
it’s actually caught on when the O in OHDSI
has been replaced at next year’s OHDSI
Symposium with that meme. So I expect to see
that next year. Okay. But you know, I’d actually like to
pick up on something that Mark said I think at
least twice this morning which is, and first, and
you referred to data, you know, “This is hard,” or “This sounds hard.” And I think we have the
same situation obviously with observational analyses, that you know, done well, they take resources. They take
multidisciplinary teams. They take time. And more and more, they take
significant infrastructure both to execute and to properly
report to decision-makers. And Duke is everywhere in DC. I was at a DCRI
meeting yesterday focused on clinical trials,
but we had an obligatory real-world evidence
session and Bob Temple and I had a little cross
fire situation last night which was fun. And it was actually more
aligned than less cross fire. But you know, I did mention
when the 21st Century Cures Act was first passed, actually it appeared
from the language that randomization
was sort of written out of real-world evidence and there was a lot of things
on the back end of that. And the legislation
was actually amended. And so obviously,
randomization is included in our RWE program. And but one of the
things I had mentioned was right after that sort of pre-amendment
language came out, there were a lot of
very excited people who sort of like, “Great,
we’re gonna whip together “some observational
studies from secondary data “and we’ll be sending
in supplemental NDAs “in like a few weeks.” You know? And this is wonderful. And you know, my first
thought at the time was, “That’s wonderful.” Right up until the point
that your competitor does the same thing and suddenly, there’s
an arms race of tons and tons of these types of frowny face studies
circulating around. So I think I will just
say a large amount of frowny face studies doesn’t help anyone. It might give the illusion
of helping the sponsor right at the beginning, and then it only
hurts them later. It doesn’t help payers
and HTAs, providers and most importantly, patients. So I think we all recognize that this won’t
and can’t be easy. Although, we
certainly wanna strive for efficiency and you know,
obviously, maximum utility. So Peter Stein did talk a
bit about persuasiveness. And obviously, you know, I’m
not making policy statements or anything like that, but we’ve sort of tried to model this in the Duplicate Project where we have the simulation
of protocol agreement with the regulator at the
pre-feasibility stage, then the feasibility stage. And that’s something that we
model in the duplicate project. And certainly, people
have heard me publicly say that it’s always
going to be fruitful in terms of actual submissions to actually engage
with FDA early on, just as in the trials world
where there’s tight engagement in the IND stage and then
the NDA stage or BLA stage. So the same thing holds true
on the observational side. And everybody here has
spoken, importantly, in fact it was really
interesting what Lucinda said about registration and
the need to right size registration solutions
for people engaged in this kind of work. And this is certainly
something we would like to see as well. And then the next
part of all of this is obviously is transparency. So you need the
appropriate reporting. And Sebastian referenced
some work going on with effectively
groups both inside and outside the agency thinking
about how do you report, so that the full
implementation parameters of the study are
really understood. And then there’s
sort of the trust, but verify aspect
of all of this. So even if all those things
at the beginning work, you rely on things
like replicability, sensitivity analyses
as Til mentioned so that you can go
through that sort of trust but verify process. And then of course,
the robustness checks like Sebastian mentioned with the even
transportability in some cases may be an issue. There’s a number
of things out there to think about. And again, I think that
while this sounds hard, I think in the
end, this is better for the field, ultimately, better for patients. I’ll just take a couple minutes just to give you some
very quick updates just kind of on the status of some of the FDA projects. So first of all, with the Duplicate Project. So five of seven
attempted trials have been completed. Those trials are
posted on CT.gov, the protocols are. We have not yet,
from the FDA side, released the results. But we will. We’ve had certainly some
interesting feedback from some pharma companies. I was just in the UK
a couple weeks ago and one of the companies
who’s had one of the trials said, “I would love to get
the people at my company “and we will do this. “The people who worked on
the trial that you replicated “and the people in our
pharmacopeia department “and stats department
and all other areas “and we really
wanna dig into this. “So we can’t wait till
you make this public “so we can really
think about this “because we operate on
both sides of the coin.” Just to say this is in process. There’s another set that the FDA has already looked
at the protocols and they’re in the
feasibility analysis stage and there’s a set after that where it’s entering
the feasibility, sorry, I take that back. We reviewed the protocols
and they’ve conducted the feasibility in
their proceeding. And then third is we’ve
reviewed the protocols and then they’re going
into feasibility. And then we’ve reviewed
the feasibility and they go out. So there’s basically another
couple sets of trials coming out soon after that. There is, Sebastian
briefly mentioned there’s a reporting
project and there was also, this was more driven by
the Sentinel program. And there was like a
visualizations paper that was focused around some
of these reporting issues. So just a couple extra
new demonstration projects that are going forward that
are potentially of interest to this area. So one, picking up on what
Bill Crown was talking about, we are going to
have a basically, an assessment of causal
effect estimation using machine learning methods and non-parametric
statistical methods. So actually, he mentioned TMLE, targeted maximum likelihood estimation, which will be a major focus of this. And the unique sort of twist on this is that we’re trying to use randomized trial data, in a sense, as a source of truth as we go through that process.
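For intuition about what such an assessment involves, here is a minimal sketch, on synthetic data, of a doubly robust (AIPW) estimate that combines an outcome model with a propensity model. TMLE builds on the same ingredients with an additional targeting step and is normally run with dedicated packages, so this is an illustration of the general idea rather than the project's actual methods; all variable names and data are invented.

```python
# A minimal sketch, not the project's actual code, of combining an outcome model
# with a propensity model: an augmented inverse probability weighting (AIPW)
# estimate of an average treatment effect, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 3))                                   # baseline covariates
p_treat = 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.3 * x[:, 1])))  # confounded treatment
a = rng.binomial(1, p_treat)
y = 1.0 * a + x @ np.array([0.8, -0.5, 0.2]) + rng.normal(size=n)  # true effect = 1.0

# Nuisance models (any flexible learner could be plugged in here).
ps = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
m1 = LinearRegression().fit(x[a == 1], y[a == 1]).predict(x)  # E[Y | A=1, X]
m0 = LinearRegression().fit(x[a == 0], y[a == 0]).predict(x)  # E[Y | A=0, X]

# AIPW (doubly robust) estimator of the average treatment effect.
aipw = np.mean(m1 - m0
               + a * (y - m1) / ps
               - (1 - a) * (y - m0) / (1 - ps))
print(f"doubly robust ATE estimate: {aipw:.2f} (simulated truth: 1.00)")
```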
And then Til mentioned the assessment of uncontrolled confounding work, which will kick off from UNC. And there again,
we’re going to try to use some trial data, as in a sense, a
source of truth. And there’ll be sort of a
team A and team B approach where data are perturbed
in certain ways that are knowable and
clinically relevant. And then team B, in a sense, works on the
sensitivity analyses and we see how things work. Because obviously, there’s
an endless universe of sensitivity analyses and we want to
try to find things that really relate to
the types of questions that we’re asking in this
SNDA, SBLA-type space, 21st Century Cures-type space and evaluate them. So I think that
that kind of covers a lot of what’s going on. We also have an additional
replication project kicking off with
the EMOA, CERSI. Just to add to the amount of replication, we’re obviously eagerly watching what comes out of OPERAND, what comes out of OHDSI. You know, a couple
learnings from this. So far, I think, from replication: comparators matter. It may be easier
to replicate trials where the trial itself
compared one agent to another versus what is often
common where you have what we call add-on trials, where there’s sort of
usual care, plus placebo. And that can be
more challenging. So this has been a great panel. It’s actually hard. There’s probably so
many things to react to. And I will, I think
I’ll just quickly react to one thing Patrick said because I don’t want
people to misinterpret how we set up the replication
projects internally or the kinds of conversations
we had with OPERAND. So first of all, number
one, we fully understand that even just for
statistical reasons alone and the absence of bias,
due to random error, we will not have
100% replication. And we do understand that. We were not necessarily going to have like an FDA
scorecard at the end where, like you know,
you see me next year and it’s like 29 out of
30 and everybody’s like, okay, let’s party
like it’s 1999, or two out of 30 or
something like that. This is actually a little
more nuanced than that. We think there may be
differences by therapeutic area. Sorry, the 1999 reference
is also due to the fact that guidance is due
by December 31st 2021. So someone at FDA
may still be working on December 31st, okay. But hopefully it will not be me. It should be a lawyer
in OCC, hopefully, who’s back doing that. I hope to be out in Times
Square or something. But anyway, so I think
there’ll be differences by different therapeutic areas. So Patrick, I’d like to
allay some of your fears, I suppose, from that standpoint. I think those are
valid concerns. But yeah, we don’t do
this as a scorecard. So I actually don’t
wanna take any more time so people can ask questions, but great panel. – Great, thanks David. Great job. Appreciate the comments
and the updates. So I wanna thank
all the panelists for presenting a tremendous
amount of methodologic work. And while clearly
this, like other things we’ve talked about today
is a work in progress, hard work in progress
I guess I would say, there was a lot here. And let me state a
hypothesis and see what some of the panelists
think about it. So with the emphasis that
you all put on design and pointing out
that the difference between randomized
controlled trial for evidence and
real-world evidence or observational studies isn’t
just randomization or not. It’s this whole set
of design features. And if you get design right, that in many cases, seems
to lead to a lot of matching between what happens
in clinical trials and the observational studies. So the hypothesis would be, it may be possible to
develop a set of procedures and techniques to use,
some best practices for conducting valid
evidence generation in the real-world context, in
the absence of randomization, with good attention to design, with attention to
things like transparency that’s meaningful, not just
reproducing the SAS code, reproducibility and
actual replications. The evidence dossier
with the smiley faces. The others, Patrick’s
description of the addressing the seven threats to validity. Is that a hypothesis that
you’d all agree with? And if so, what are the
next steps to get there? – Well certainly, we need to
do something in that direction. The thing is that when you set
out with a randomized trial, you have a blank sheet of paper and you can design and enforce all sorts of control mechanisms. And you have high confidence
when you have planned out the trial that at the
end, you will have a result that lends itself to
causal interpretations. This level of confidence,
how can we get that level of upfront confidence in
real-world evidence studies that are often built
on secondary data, data that you have
not collected, where you were not in
control of what to measure, when to measure and
how to measure things. This is where most of
the issues come from. And then there’s this added issue of the lack of randomization. But really in that order,
I would say, right? What we showed in this
2018 or ’19 paper in CPT kind of, some guiding
principles already. And I think we have
to fill that in with people like Til, like Bill and many others
in our community. The learnings that
we have done, OHDSI, to build a framework
that upfront provides us with confidence that we will get it right. Because if the decision upfront
is do a randomized trial versus do a real-world
evidence study, right? That you do one or the other, or maybe you do both, right? But that is what we
need to work towards. – And I just wanna
make the comment about the value, I think, of
the trial replication work doesn’t so much lie in the
replication activity itself, but rather this issue
of demonstrating
that whether or not observational data and design
and methods have the ability to be able to replicate the
treatment effect of a trial. So if you study the
same patient population with the same inclusion,
exclusion criteria. And so one thing I didn’t
mention about OPERAND is that the intent now is
once we’ve sort of finished this initial baseline
work is to then look at the patients that
were actually treated, and loosen the
inclusion, exclusion criteria and begin to understand
how different are those treatment effects
in the real-world population, that effectiveness estimate as opposed to something
that’s more like an efficacy estimate. I have a feeling that the ability of
observational studies to generate really
reliable inferences is not uniformly the same across disease states. And some disease areas
are likely to be better than others because the
data we have is better and we have more of those measures that are
really important in driving patient outcomes and the treatment
selection and so forth. And in others, cancer
is a great example, with claims data and
electronic health record data, it’s really hard to measure
things like progression and so forth. And it’s just really tough. And you really, you need
specially curated datasets to use observational
data for those. – I just wanted to add that we also need to move
into the direction of defining bias as
the difference of
the treatment effect with respect to
the true treatment effect in the population, in
the target population because it doesn’t really
help us to always talk about the potential, small
potential for bias in a randomized trial
and the large potential in the nonexperimental study. If we move away from that,
then we can see that, I mean, if you have a trial
in a very selective population where you whip everyone
into 100% adherence, then that estimate
will be biased with respect to the
treatment effect in the target population. And once we get
there, then this, we talk about the real issues
and the potential biases rather than just randomization
versus non-randomization. And then we can
actually start learning that transportability
of populations and of treatments matters tremendously
in many settings. And that’s pretty much what we’re ignoring currently.
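One standard way to act on that, sketched below with synthetic data, is to reweight a study sample so that its covariate mix matches the target population, using inverse odds of sampling weights. The variable names, numbers, and effect model are purely illustrative and are not drawn from any study discussed today.

```python
# A minimal sketch of taking transportability seriously: reweight a study sample
# so its covariate mix matches a target population, via inverse odds of sampling
# weights. Everything here is synthetic and illustrative of the technique only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_study, n_target = 2000, 2000

# One covariate (say, age) that both modifies the treatment effect and
# differs between the study sample and the target population.
age_study  = rng.normal(55, 8, n_study)     # younger, trial-like sample
age_target = rng.normal(68, 9, n_target)    # older, real-world population

# Individual treatment effects depend on age, so the study's naive average
# does not apply directly to the target population.
effect_study = 2.0 - 0.02 * (age_study - 55) + rng.normal(0, 0.5, n_study)

# Model the odds of being in the study sample versus the target population.
x = np.concatenate([age_study, age_target]).reshape(-1, 1)
s = np.concatenate([np.ones(n_study), np.zeros(n_target)])
p_study = LogisticRegression().fit(x, s).predict_proba(age_study.reshape(-1, 1))[:, 1]
weights = (1 - p_study) / p_study          # inverse odds of sampling

print(f"study-sample average effect:      {effect_study.mean():.2f}")
print(f"transported to target population: {np.average(effect_study, weights=weights):.2f}")
```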
– I’ll add a couple comments. So when we started our work in OMOP, we were doing experimental
tests to figure out if some of these factors matter. So we found that
study parameters, like Sebastian said the layout, we showed that if
you pick parameters, you get different answers. We showed that if you
choose different databases, you get different answers. And when we finished that work, we felt like we
had done something that was helpful from a
methodologic perspective. But the challenge was,
well how do you do something about those problems
that you’ve encountered? And so within the
OHDSI community, we’ve started by thinking,
“Okay, we need to continue “to do methodologic research, “but we also need to
develop open-source software “so everybody can follow
the same procedures.” And we’ve developed
a bunch of tools that are now publicly available. But then the problem we see is, well how do you know that people will follow the instructions
the exact same way? And so to answer your question, I’d say within the OHDSI community, we’re actually testing
that hypothesis. So just this last month, we
released “The Book of OHDSI”. (audience laughs) – [Mark] No copyright
infringement for Homer, but. – I would get violated
for pimping a product except the book’s free. If you just go to
book.ohdsi.org, it’s free. – [David] Patrick, you’re
now a true spiritual leader because you have a
book, right? (laughing) – There it is, OHDSI. But the test that I’d say
that we’re actually doing is we’re saying what if
we have the same tools and the same best practices and
the same guiding principles? Do we start to actually
generate reliable evidence? And that’s the hypothesis that we’re trying to
test as a community. So I hope that we’re
gonna start to see that the variation that’s
caused by researchers performing the task gets reduced so that we actually
try to get towards the variation that’s
just inherent to humanity which is the problem that
we’re actually trying to study scientifically. – And that researcher variation, that’s a big source
of variation. So I see no one at
the microphones. I’ll keep going. We got a couple more
minutes for this session. David talked about
kind of a set of steps going forward at
FDA to help address some of the questions that
we’ve just been talking about. So the replication studies, and then you were talking also about some perturbation studies where you have, you mess with the data and see if they can– – Right, right to assess some sensitivity analyses.
– That’s good, that makes this a little hard. I didn’t know what they
were quite shooting at. So those are potentially
good for situations, at least the former, where you can do a
randomized trial. Obviously, a lot of the
interest in the studies we’re talking about
here are for settings where traditional randomized
trials aren’t feasible. So is there, what’s
the lead pathway for taking some of these methods and approaches into
that territory? – And Mark, just to be clear, are you talking about
ICH E10 situations where there’s like
a single-arm trial? Or you are you talking
about something different than that?
– Potentially or others. Just extensions. And again, as you all know, FDA has done a lot of
work with real-world data in non-randomized setting where these kinds
of design issues are adequately addressed given
the potential differences in the population or the
effect sizes and so forth. So there’s a foundation
to build on here. But it seems like from
the discussion today, there’s an intent to
go a good deal further, or at least a possibility. – But clearly, you would
apply the same principles to these extreme cases, right? But still, we would like to know how well they perform, right? And we see these studies, a colleague from
Genentech for example, published a paper where they mimic a situation where they have a pilot
group of randomized trials, I think eight or nine of those. And then take away the
randomized comparison and would replace it with
the EHR comparison group and see how well that is
mimicking the situation. I think they need these
studies in order to anchor what we are preaching
so to speak, right, in these use cases. And the tricky part from
all of these exercises will be, okay, nice. We have done this now 10
times, or 30 times, whatever. What about number 31? What about number 40? How much can we extrapolate
from these findings? So exactly as David
had said already, this is not about can we
replicate, I don’t know, 25 out of 30 things. This is about the
learning experience there. Can we learn from
the mistakes way more than from the successes? – And one comment I’d like to
make, as Patrick mentioned and I think Til did too, is that in some sense, it doesn’t really make sense to compare a particular, even an observational study where you’re trying to replicate the inclusion, exclusion criteria of a trial, take that point estimate
and set of standard errors and compare it to the
results of that trial. ‘Cause what makes sense
is the kind of thing where if you were to do the
same trial a thousand times, you’d get a thousand estimates and you’d have a distribution. And the same way if
you did a similar set of observational studies, you’d get a distribution. And that’s the expectations
of those distributions that’s exactly the
statistical test. So we’re just sort
of approximating that by these one-off comparisons
that we’re making. – And I guess Mark, to sort of
answer your question broadly. You know, obviously the
original kind of remit in Cures really focused
on supplements rather than original approvals. And a lot of the
single-arm trial situations are original approvals. However, someone can, I
know there’s some people in the room, correct me if I’m misquoting the
framework right now. But it’s my recollection that we actually inserted
into the framework essentially that it’s sort of
like while we are evaluating observational methods
where there are two groups that are being compared, we might as well
simultaneously leverage all of this general effort we have in our RWE space to
also look at RWD and RWE methods in those type of settings, and certainly, a number of
RWD containing submissions to FDA have been in that space since the program was started. – [Mark] Label extension, yeah. – If I, just very briefly. I mean, we encountered this with the trial
transportability work. Do you have an
additional problem that the variables are not
measured in the same way and I presented some results. So we need more
work to figure out what are the effects of this
in the single-arm trials. – [Mark] And Patrick. – Lucinda said something
that was really important. I don’t think that our goal
should just be about seeing if real-world evidence can
replace clinical trials. When we talk about feasibility, obviously there’s some
ethical considerations, but there’s also the pragmatic reality that when we looked at
the hypertension space, there’s dozens of currently approved hypertension
treatments. And there’s only 15 pairs of trials that
had the same drugs and the same outcome that
we could actually quantify. So feasibility is
also about the fact that we’re just never
gonna do the trials for the vast majority
of comparisons that we’d like good answers for. And I think that it’s
actually important for us to think about
feasibility as it relates to given that it could be done, but it won’t be done,
so how do we actually overcome that real hurdle? – Well, I wanna thank
you all for the comments. And with apologies to
Marc, we are out of time for this session. We will have a comment period
later on this afternoon. But for now, can you join
me in thanking our panelists for a great discussion? (audience applauds) Very important issues. Okay, we’re going to
reconvene at 3:05. So that’s a little
less than 10 minutes to stay on time. See you all shortly. (people chattering) You’re doing a presentation… All right, good afternoon. I wanna welcome everyone
back from the break and welcome you to our
last big session of the day on Opportunities to
Ascertain Real-World Designs and Implications for
Causal Inference. So in this session, we’re
gonna build off the last one, actually build off most
of the sessions for today to discuss opportunities
for leveraging endpoints used in real-world studies. In this session, you’re
gonna hear from the presenters about ongoing work,
work that’s ongoing now in oncology to identify,
collect and validate real-world endpoints. And the panelists will also
talk about the applicability of these endpoints to
regulatory requirements and regulatory decision-making. We’re gonna start with a
presentation from Jeff Allen, the president and CEO of
Friends of Cancer Research. After Jeff, we’ll
hear from Sean Khozin, associate director of FDA’s
Oncology Center of Excellence. Then Nicole Mahoney, senior director of regulatory policy at Flatiron Health. Andrew Norden, the chief medical officer of Cota. And Jonathan Hirsch, the founder and president of Syapse. So we’ll start with Jeff and I’ll turn it over to
you right now, thanks. – Great, thank you Mark. And thanks to the
Duke and FDA teams for inviting us to
have the opportunity to present a project that we’ve been working
with pretty much representatives from
everyone on the panel and a few more today. So it’s a pleasure
to be able to share some of these early
findings with you. Essentially, this is a project where we wanted to
identify a case study to begin to look at the
potential for various endpoints that could be extracted
from a number of different electronic health databases that several of our partners
helped to contribute. And our goal was really
to try and understand the extent to which elements that could be readily
extracted from routine care and delivery could inform
post market research. So it wasn’t to necessarily
try and construct a perfect database, but
in order to really try and understand whether
there was potential for some of these nontraditional
endpoints that are more readily
available perhaps than trying to incorporate
traditional clinical measures into post-market databases. So this is a
continuation of a project that we initially started about, at this point,
probably two years ago. And this just gives a summary
of some of that initial information where we worked
with six of the partners that we’re involved
in some of the work that I’ll talk about
today to begin to examine what could be done from
looking at some of these secondary uses of
healthcare data and extractable information. So this just gives
a very quick summary of what we were able to do when we asked all
of them to answer specific questions around
the utilization patterns and outcomes associated with
using immuno-oncology agents, specifically PD-L1 inhibitors in advance non-small
cell lung cancer. And essentially what we found
through this first pilot was the ability for
each of these datasets to extract various
demographic characteristics about their population to begin to identify which
ones might be contributing to difference in outcomes. And then be able to look
at different endpoints, such as things like
time-to-treatment
discontinuation or time to next treatment which are more readily available from electronic
health data sources than more typical measures that are used in
oncology clinical trials, such as survival or progression, which, as it turns out,
can be quite difficult to accurately extract from
these types of data sources. So following that data or that initial
research project, which the full findings
of that can be found in the July 23rd, I believe, issue of the “Journal
of Clinical Oncology” and their “Clinical Cancer
Informatics Journal” for the full depth of all
of the various analyses. But what it left thinking and what this
secondary partnership was really built around was not just looking
at the feasibility of extracting these endpoints and how they related back
to what you might expect to see or how they related
to clinical trials, but to begin to do an additional
level of experimentation and think about can you use some of these nontraditional measures in order to
differentiate between two different
interventions, for example. So that’s what we embarked on
over the last several months in this sort of project overview which you can see here. Again, we looked at
the same scenario around advanced non-small
cell lung cancer due to the availability of
multiple different therapies for multiple different
manufacturers that enabled us to
conduct this research without having to
necessarily readjudicate any one clinical trial, but
also the large availability of datasets to try and do
some of this early work. But we wanted to look
at again at the ability to do this quasi randomization. So you can see from the
project objectives here, we again wanted to
try and characterize different factors within this that contributed to different outcomes at the end of the day, but not just look
in the PD-1 treated subset of the population, but to be able to repeat this
in two different options. So we looked at PD-1
treated patients, where those patients that
received general chemotherapy. And again, the
endpoints that are used were similar to the
ones that we extracted in the initial sort of
1.0 pilot to set the stage for this broader
body of work here. And you can see that for all of the 10 different
data partners that were listed on
the previous slide for both the PD-1
included regimens as well as the
chemotherapy based one. All of them performed analyses
on both of those datasets to look at real-world
overall survival, time to next treatment,
time to discontinuation, and for those that were relying on electronic health records instead of just pure claims, and so were able to extract different metrics around progression, they’ve also extracted from these datasets things like real-world progression-free survival.
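As a rough illustration of how such an endpoint can be derived from routine data, here is a minimal sketch that turns a hypothetical table of drug administration dates into time-to-treatment discontinuation and summarizes it with a Kaplan-Meier estimator. The column names and the 60-day gap rule are assumptions made for illustration, not any partner's actual definition or pipeline.

```python
# A minimal sketch, assuming a hypothetical table of drug administration dates,
# of deriving time-to-treatment discontinuation and summarizing it with a
# Kaplan-Meier estimator (using the lifelines package).
import pandas as pd
from lifelines import KaplanMeierFitter

admin = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3],
    "admin_date": pd.to_datetime([
        "2018-01-05", "2018-01-26", "2018-02-16",   # patient 1: regular dosing
        "2018-03-01", "2018-03-22",                  # patient 2
        "2018-04-10",                                # patient 3: single dose
    ]),
})
data_cutoff = pd.Timestamp("2018-08-01")
GAP_DAYS = 60   # a gap longer than this is treated as discontinuation (illustrative rule)

rows = []
for pid, grp in admin.sort_values("admin_date").groupby("patient_id"):
    first, last = grp["admin_date"].iloc[0], grp["admin_date"].iloc[-1]
    # Discontinued if no further administration within the gap before the cutoff.
    discontinued = (data_cutoff - last).days > GAP_DAYS
    end = last if discontinued else data_cutoff
    rows.append({"duration_days": (end - first).days, "event": int(discontinued)})

ttd = pd.DataFrame(rows)
km = KaplanMeierFitter().fit(ttd["duration_days"], ttd["event"],
                             label="time to treatment discontinuation")
print(km.median_survival_time_)
```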
So to dive into a little bit of some of these top-line findings of what the 10 different groups were able to do: all of these graphs sort of represent objective one, the different characteristics
of the dataset. So in each of these graphs, I know it’s hard to see
probably from some distance, but each of the points
along the way represents the 10 different datasets. Which it doesn’t really matter who they are for
this point in time. I think we wanted to
just see how they related to each other. And then as next
step, we’ll begin to dive in a little bit to see
what some of the differences were and how and what
factors contributed to them. But just to give a
couple of examples of consistencies and
perhaps differences that were obtained through
the different sources of data that
participated in this. You can see on the
left-hand side, the three different treatment
modalities that we explored. So number one, PD-1 plus
chemotherapy, that combination, versus PD-1 monotherapy versus the bottom, the
chemotherapy subset. So these were the three
scenarios that were explored. And you can see that
for some of these, there were differences. And most notable
is probably things like the inclusion of
stage three patients and how different that is
depending on the regimen for these particular
three scenarios here. And that scenario that
I think the groups are very interested in
diving further into, it shows the diversity
of what is included in an advance non-small
cell lung cancer patient when it’s identified outside
in a more real-world setting and how diverse that could be. But if you look, for
example, in the monotherapy on the top-right there, you can see that there’s
a pretty wide variation of about 20% of patients
that are stage three actually being lumped
in there together. So given that it’s an earlier sort differently treated
stage of disease, it’s no surprise
that that perhaps this contributes to a
difference in outcome. On the alternative side, you can see that across
the different populations for each of these three
areas as it relates to male or female patients that were included
in these datasets, that there’s also some
variability between those. Although the pattern
between each of the colors of dots are the same
as you go up and down through the different treatments
that were analyzed here. This particular
graph looks at age which I think is an
interesting characteristics as we think about what is
the relationship between some of the findings from
this real-world dataset versus what was observed
in the clinical trial? So again, we weren’t trying to replicate a
clinical trial here, but we wanted to use a scenario where we had sort of some idea of the outcome. So for each of these treatment scenarios, there have been
large-scale clinical trials that were used as
the pivotal basis for approving these
interventions. So that’s what’s listed
at the bottom here, was the median age across the various different
pivotal trials and the clinical trials. And you can see the independent
of any of the treatments used that in the
real-world settings, the age of the patients, particularly in the
PD-1 monotherapy was pretty significantly older which obviously, could be a
factor that will impact outcomes in the long run should one want to compare
them back to the experiences within the pivotal trials. Just skipping to one more,
I think interesting factor that we saw as pretty
differentiating fact between the demographics of these real-world datasets versus what was included in
the initial clinical trials. On the left-hand side here is
the histology of the patients. So the three predominant
histology in lung cancer. And what you can see from
these in the center peak was the nonsquamous or
adenocarcinoma patients. And on the far-right, for
each of those smaller graphs was the squamous cell patients. And you can see again, denoted
in the little caption below that in the pivotal trials, you saw less than 20% of
the patients being included were those with
squamous cell carcinoma. But in the real-world
populations, these tended to trend higher which is not
necessarily a surprise because if you look at the
incidence of squamous carcinoma in the larger non-small
cell lung cancer space, it is closer to 30% which is what you
see particularly in the PD-L1 monotherapy arm. But again, could be
a contributing factor that one would need
to take into account in trying to understand
whether these endpoints would carry true in comparing
to what we’re seeing in their original
clinical research studies. And then the final
demographic snapshot here on the right-hand side was just the dates of treatment that were included in here. And no surprise with this graph, the PD-1 patients that were
included in these cohorts tended to be more recent than the overall
chemotherapy cohort, largely because of when
these drugs were approved which was toward the right-hand
side of these graphs. But again, will raise
questions, I think, about what the appropriate
comparisons are and how recent datasets
need to be in order to draw any inferences or accurate
comparisons from them. So in terms of objective two which is work that is just
kind of getting underway, but we have a little
bit of a snapshot here to share and discuss. These are two of the
endpoints that were analyzed. So on the left-hand side,
the overall survival data for each of the 10
different data partners you can see overlayed
with one another here. And then on the right-hand side, the time-to-treatment
discontinuation. I think what’s important
to remember here is it’s hard to really
assess from this are these differences
or are these similar? I suppose you could say that in the Platinum
Doublet Chemotherapy time treatment discontinuation, those look like they
overlay really well. But I think what would
be an interesting sort of comparison here, if one were take all
of the control arms from the clinical
trials that were used to approve these
agents for example, I think you’d still see
this level of variability. So to expect these to overlay
based on real-world datasets would probably be entirely unrealistic. But looking at this information on endpoints a different way, this just sort of shows it
in a different visual here for each of the
different factors. I think one of the things that the groups are
beginning to think through and we need to dive in a
little bit further here, what are the contributing
factors that lead to differences in
overall survival? And I think they could be
very significant things that have been
readily identified. Things like missing
mortality data or having it come
from various sources that were very different
depending on each of the different partners
and how they’ve constructed their datasets can certainly
contribute to this variability. And completeness in
terms of follow-up time, or different follow-up time that was available for
particularly the older patients that were involved
in these cohorts, or even the likelihood
of patient crossover for some of these
treatment arms. So you can imagine that in
the lung cancer cohorts, this is a frequent phenomenon which certainly
blurs the ability to calculate overall survival. But perhaps on the flip
side a bit more definitively could be things
around the utilization of time-to-treatment
discontinuation as
a potential proxy to indicate this as well. And you see a fair amount of
at least greater consistency. I think of interest, and
we haven’t done these direct comparisons
yet, but even looking at, just sort of eyeballing
one of these graphs, this is something that drew
my attention pretty quickly. That when you look at things
like the lower right-hand side of the chemotherapy arm, you see around 2 1/2 months of time-to-treatment
discontinuation for the chemotherapies. But looking at the
PD-1 containing agents, you see that, in some
cases, almost double. Which perhaps may be an
artifact of clinical practice. But we saw the same
trends that occurred here with time-to-treatment
discontinuation, when the analysis was run around time to next
treatment as well. Which I think
leads us to believe that there could
be something here in terms of trying
to better quantify and thinking about
what factors contribute to some of these
differences in outcomes to further characterize
and potentially validate the use of some of these
alternative endpoints based on things like
time-to-treatment
discontinuation or time to next treatment as a potential proxy
moving forward. So you know, our
goal for this work, you know, just to wrap up. Obviously, additional
analysis is needed around the potential for
some of these measures to be able to be extracted
and what conclusions you can draw from them. But I think this was
a great collaboration from 10 different
partners who were willing to really dive deep
into their data and begin to look
at what factors were resulting in
differences in outcomes that were being
observed here to try and at least add some
clarity here on things, thinking about future
standards or even transparency about factors that
need to be related if you’re going to use
these types of datasets for this particular
intended use. Our preliminary results indicate that the framework
that was set up here, in order to look at
some of these endpoints and the strategy, do
enable the ability to distinguish between the
different treatment options, particularly looking at chemo
versus the monotherapy PD-1 versus the combo. And I think suggest a strong
rationale to dive into this a little bit further. So in terms of next steps,
we still have a lot of work to do to think about the
different contributors to outcomes here and
to further characterize the different endpoints
that were included in this particular
initial analysis. So that’s top of my number one. The second area that a
subgroup of these data partners are working on with
us here is to begin to look at some sort of
internal validation study where we’ve started to
construct an aggregate of the exclusion and
inclusion criteria that were used in the clinical
trials and are now trying to apply that to
these actual datasets to sort of modify
that population to be sure that the
population better reflects those that were included
in the clinical trials as a way of sort of
gut-checking the data. And if you’re able to
come relatively close in terms of that modified
exclusion criteria, defined dataset, would
it give you confidence in what you’re seeing in
the more diverse population once it’s built into
the overall analysis and thinking about these
potential endpoints. So I’ll stop there, but you
know, this took a large effort from a number of
different people within their own organizations who were willing to come
forward and collaborate here and think about what the
utility and potential of some of these different
endpoints could be, not necessarily with the
intention by any means to replace clinical trials, but in scenarios where a
clinical trial may otherwise not be able to be conducted or additional evidence is needed such as confirming benefit
for an initial approval of extreme magnitude
or something like that. Could there be things
that could be gleaned from additional
post market research that could still be an
indicator of benefit occurring such as maintaining a longer
duration on a treatment? But thanks to all of
them for their work and I look forward
to the discussion. – Great, thank you. (audience applauds) Gonna hear from some
of those partners and some other perspectives now. First, another perspective
from Sean Khozin at FDA. – Thank you, Mark. It’s great to be here. So there are a variety
of different ways to look at the utility
of real-world endpoints, but ultimately, the question
you see on the screen speaks to the fact that
we need to optimize our understanding
of what is happening to patients in the real-world? Because as we all know, the experience with
patients in clinical trials may not always reflect
the experience of patients in the real-world,
especially in adult oncology where only 3% of
patients are enrolled in traditional clinical trials. I’d like to connect
some of my comments to what Jeff presented today in terms of
developing a framework for the use of
real-world endpoints and what questions to consider. Really at the core of the issue is the fact that we
have to be confident with the ability of
these of endpoints to allow us to make
causal inference. And at the end of the day, that’s what this is all about. Jeff also mentioned that there’s
some nontraditional methods that need to be used when we
query electronic health record data to develop these
real-world endpoints. And nontraditional
in a lot of cases doesn’t mean inferior to how
we do things traditionally. In fact, engineered correctly, they could be superior to some of the
traditional methods for understanding the
patient’s experience and also drug activity for
making causal inference. So nontraditional is not
necessarily a bad thing. It could be an
improvement in some cases. I'm going to turn to another general concept. Consider endpoints like time-to-treatment discontinuation, or at least time-to-event analyses, and real-world progression-free survival, which is based on tumor dynamics. That is very hard to measure, because a radiology report, as we all know, is captured in a PDF file sitting in the electronic health record. So how can we have confidence that these modifications that we're making are appropriate for making valid causal
inference assessments? Well, these endpoints have
to be analytically robust. Meaning if we apply
the same method, we get the same answer. And this is the same phenomenon that we deal with all the time when we look at devices
and companion diagnostics. So are you measuring
things accurately if you have a real-world
progression-free survival? That end result, is it robust? Is it analytically valid? Can you measure it again
on different datasets and assuming that the
datasets are similar, get the same result? But that doesn’t tell you
if it’s clinically valid. Are the modifications
we’re making appropriate modifications to
make a clinical assessment? And again, that’s something
that we’re very used to dealing with in terms of
correlating these modified intermediate endpoints that
are analytically valid. Meaning the technology
and the processes, the business rules
are consistent. Then they can be correlated with more quote-unquote
traditional measures, such as overall survival. So these are all things
that were part of the pilot which was a great
learning experience for me personally. And generally, a lot of
interesting insights. The idea of
analytical validation and there are different
ways of phrasing that. It’s very critical to make sure that the business rules that
are applied are consistent. And for clinical validation, obviously having
a way to correlate these modified
endpoints to meaningful, clinically relevant endpoints
such as overall survival, but even patient-reported
outcome data is critical to gain
enough confidence that these are actually
clinically valid endpoints. And based on the
experience so far, things are certainly moving
in the right direction. And I’m not gonna go
through the details of the challenges of structuring
the unstructured content and applying NLP. And these business rules
can be very complex. But something came up
earlier today, this morning. I believe it was Dr.
Rubinstein from CancerLinQ that talked about, well, when it comes to genomic
data for example, we have a digital asset. Because if we go
very, very upstream, it’s all digitized. That because of our workflows, they’re essentially
converted to an analog format and at the end of the day,
they end up on a piece of paper or a PDF file
that’s scanned into the electronic health record. Well that’s a very
interesting way of looking at things. Because as we, if we start
to move things more upstream, we can start to collect
already digitized information, obviating the need for NLP,
natural language processing and human visual inspection. So the idea of moving
things upstream is something that, technologically, is feasible. And although there are
organizational barriers to that, something to think about. And why stop at genomic data? In medicine, we have a lot
of amazing digital assets that we for cultural
and workflow reasons convert to analog format. Waveforms, EKGs and EEGs, when they’re generated,
very, very much upstream. They’re digitized and we convert
them into an analog format and expose them to
human visual inspection. And also, DICOM images. They’re digitized currently. So figuring out ways to
move things upstream, correlating them with
the real-world endpoints that are being
generated right now is a huge opportunity that’s more
organizationally challenging and culturally challenging
than technically challenging. So I’d like to just
end my comments with kind of highlighting a
very interesting phenomenon. And a lot of what we’re
talking about right now, moving things upstream, is kind of like
back to the future. Because if we think
about the late 1800s, medicine and engineering
were intertwined. You know, the first EKG
was 1895, I believe. And very quickly after that, we started to work on
electronic transmission. The idea back then
was never to print out the EKGs and
have trained eyes to interpret them. It was really to use
that digital asset. So the first electronic EKG
transmission occurred in 1906. And that led to a series
of engineering innovations. Again, it was very
much intertwined with the practice of medicine that really peaked at the
first telemedicine expo. Which I don’t know if
people know when it was. It was in 1924. And then that led to
amazing engineering and data science
activities around analysis of these waveforms. And peaking in the
’60s with the AI that was being applied back then to a lot of these waveforms. So that is very
important to realize because we’ve been here before. Unfortunately, what happened, engineering and medicine split and now we have an opportunity
to reconnect the dots. So in that sense, these
nontraditional ways are very much ingrained in the way we develop
medical technologies. And again, the ECG is the
first biosensor, right? I’m fairly confident that
if we invented an EKG today, it would not scale. People would be very
much skeptical of it and it would be very abstract. You know, measuring electronic signals to
estimate cardiac function. And it was a very
abstract concept. However, because the orientation
back then was different, these things scaled
very, very quickly. And now we are at
a juncture right now with amazing
technology companies and a very innovation-driven
policy framework to start to integrate
these solutions, start to move things upstream as we validate these
real-world endpoints. And I believe this is something that is a very
interesting opportunity in designing the next generation of real-world studies and
developing the next generation of real-world endpoints. Thank you.
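To make the analytical-robustness idea concrete, here is a minimal sketch of a single shared business rule for one real-world endpoint, time-to-treatment discontinuation, that can be applied identically to any dataset exposing drug administration dates. The table layout and the 120-day gap rule are assumptions for illustration, not the pilot's or FDA's definitions; a clinical-validation-style step would then correlate the derived values with an outcome such as overall survival.

```python
# Hypothetical sketch: one shared business rule for real-world
# time-to-treatment discontinuation (TTD), applied identically to any dataset
# that exposes drug administration dates. The layout and the 120-day gap rule
# are illustrative assumptions, not the pilot's agreed definitions.
import pandas as pd

GAP_DAYS = 120  # assumed gap that counts as discontinuation

def derive_ttd(admins: pd.DataFrame) -> pd.Series:
    """TTD in days per patient: first dose to last dose before a >GAP_DAYS gap
    (or to the final recorded dose)."""
    out = {}
    for pid, grp in admins.sort_values("admin_date").groupby("patient_id"):
        dates = grp["admin_date"].tolist()
        end = dates[0]
        for prev, cur in zip(dates, dates[1:]):
            if (cur - prev).days > GAP_DAYS:
                break
            end = cur
        out[pid] = (end - dates[0]).days
    return pd.Series(out, name="ttd_days")

if __name__ == "__main__":
    site_a = pd.DataFrame({
        "patient_id": [1, 1, 1, 2, 2],
        "admin_date": pd.to_datetime(
            ["2023-01-01", "2023-01-22", "2023-02-12", "2023-03-01", "2023-08-15"]),
    })
    # Analytical-robustness style check: the same rule, run on a second dataset
    # with the same layout, should yield directly comparable results.
    print(derive_ttd(site_a))
```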
– Thank you. (audience applauds) And Nicole? – So I wanna start off
by saying thank you for this opportunity to be here. And as a regulatory
policy person, that’s the lens
that I use to look at these conversations
about real-world evidence. And I recognize that
opportunities for stakeholders to work together through
efforts like the effort with Friends that Jeff described or the Duke Real-World Evidence
Collaborative Agreement are really critical for
advancing the conversation, including on endpoints. And so this Friends pilot that
we heard a little bit about brought together
10 organizations with different data sources
and different types of data, from claims to EHR. Just to give a
little bit of context for people who are unaware,
Flatiron’s real-world data comes from the
electronic health record and it includes de-identified patient data from oncology practices
across the United States. And this includes
structured information, such as lab values and
unstructured information such as physician's notes or other reports that we've
heard a lot about today. And the way that we
get that information out of the unstructured data is by a clinical specialist using technology-enabled
abstraction. And so the panelists
today were asked to reflect a little
bit on our experience with the Pilot 2.0 project, and to frame the discussion or at least this part
of the discussion about one particular example of an endpoint,
overall survival. And as Jeff mentioned, we
aligned in this project on a very high-level
definition of overall survival. But differences occurred across
the different organizations. And how we actually
operationalize that, this comes down to
the business rules and the kind of data you
have available to you and how you implement
your business processes and get to the actual endpoint. And really, these differences
need to be considered when we’re interpreting
the results of the pilot. So we put some of the
preliminary results in slides, but we really have to think
about what we’re seeing. And overall survival is
a great starting point for discussion because
survival is clearly an important outcome
for patients, arguably one of the
most important outcomes of interest for cancer patients. It’s also important
to regulators because you can really
directly measure how a patient functions,
feels, or survives. So those are key words for
regulatory-minded people. And death is an
objective outcome, unlike some of the other
outcomes that we’re deriving. So this is a good
topic of discussion and a good tool for
us to use to talk about some of the
pluses and challenges of looking at
real-world endpoints. This sounds like it might be the low-hanging fruit and something easy
to talk about, but the hard part
that we realized is that death
information on patients is not captured readily
in routine care. And every organization
had a different way of capturing that information. So how does Flatiron capture high-quality
mortality endpoints? And what we do is we form a composite endpoint
for mortality. And we use a number
of different sources to build this endpoint. One is the electronic
health record, so yes, there is some
structured information about patient deaths that
is recorded in the patient charts. But we also realize
that some of it is found in unstructured information. So we are able to look
at a patient’s chart and find death
certificates, for example, or condolence letters. And that gives a
sense that a patient had died and when they had died. But on top of that,
we actually link our databases through
patient matching algorithms to other sources of information. One is the social
security death index, and the other is a commercially
available obituary database. And so importantly, that’s what makes
up our composite mortality endpoint.
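As a minimal sketch of how a composite date of death might be assembled from several sources and then checked for completeness against a gold-standard list, in the spirit of the benchmarking described next: the source tables and field names below are hypothetical stand-ins, not Flatiron's actual pipeline.

```python
# Hypothetical sketch: composite mortality from several sources, then a
# sensitivity (completeness) check against a gold-standard death list.
# Source tables and field names are illustrative, not Flatiron's pipeline.
import pandas as pd

def composite_death_date(ehr: pd.DataFrame, abstracted: pd.DataFrame,
                         ssdi: pd.DataFrame, obits: pd.DataFrame) -> pd.Series:
    """Earliest recorded death date per patient across all sources."""
    stacked = pd.concat([ehr, abstracted, ssdi, obits], ignore_index=True)
    stacked["death_date"] = pd.to_datetime(stacked["death_date"])
    return stacked.groupby("patient_id")["death_date"].min()

def sensitivity(composite: pd.Series, gold: pd.Series) -> float:
    """Of deaths in the gold standard, the share also captured by the composite."""
    return gold.index.isin(composite.index).mean()

if __name__ == "__main__":
    ehr = pd.DataFrame({"patient_id": [1], "death_date": ["2023-04-02"]})
    abstracted = pd.DataFrame({"patient_id": [2], "death_date": ["2023-06-15"]})  # e.g., condolence letter
    ssdi = pd.DataFrame({"patient_id": [1], "death_date": ["2023-04-03"]})
    obits = pd.DataFrame({"patient_id": [3], "death_date": ["2023-01-20"]})
    composite = composite_death_date(ehr, abstracted, ssdi, obits)
    gold = pd.Series(pd.to_datetime(["2023-04-02", "2023-06-14", "2023-01-20", "2023-02-01"]),
                     index=[1, 2, 3, 4], name="death_date")  # e.g., an NDI-style benchmark
    print(composite)
    print("Sensitivity vs gold standard:", sensitivity(composite, gold))
```

Taking the earliest date across sources is one possible rule; a real pipeline would also have to reconcile conflicting dates between sources.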
And importantly, what we do is another step after that: we benchmark
against the gold standard. And that’s the
National Death Index that’s put out by the CDC. And so that benchmarking allows
us to look at sensitivity and specificity of our
mortality endpoint. And the sensitivity is a
measure of the completeness of that endpoint. And high completeness
is really important when it comes to
overall survival. Because missing data leads
you to overestimate survival. And then all of the subsequent
analyses that you do on overall survival
will be skewed if there’s something up
with that information. So from the Flatiron
perspective, completeness is a good measure of quality for overall survival. And we can all do the experiment of producing this
metric across datasets by benchmarking to
the gold standard. And it probably is irrespective of how you actually
derived your endpoint. So in terms of
reflecting on next steps for the Friends of
Cancer Research, we saw preliminary
differences, as you heard, and what we need to do next
is a deeper dive on why. What is underlying
those differences? And we really need to tease out how much of the differences
are due to real differences in the populations that
each dataset captures? And how much is really
related to the methods that we use to derive the data? And so what was probably
missing from pilot 2.0, but what I’m hopeful that
we’re gonna experiment and work on moving forward is a discussion on the
quality assessments for different variables. And the quality aspects,
really as we heard, basically all day today, can really result in
variability across study design. So the things we might
wanna look at are how reliable or accurate
are the data elements that we use to
build our cohorts? That’s one thing. The other thing is how
reliable and accurate are the elements on exposure? And then how
reliable and accurate were the elements that we
used to make up our endpoints or to derive our endpoints? And we can ask the question,
have we benchmarked overall survival against
the gold standard? But of course, that
begs the question, what do you do when there
is no gold standard? And these are the
type of challenges that we really have
to face as a community by experimenting together. And so as a policy
person, I’d like to build in some more structure
around opportunities to do those kinds of experiments and to continue the discussion.
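A small synthetic simulation of that completeness point is sketched below under assumed numbers: if a share of deaths is never captured and those patients are instead treated as censored at an earlier last-contact date, a simple Kaplan-Meier estimate of median overall survival shifts upward.

```python
# Hypothetical simulation: missing death data biases real-world overall
# survival upward. Synthetic numbers; hand-rolled Kaplan-Meier for clarity.
import numpy as np

rng = np.random.default_rng(0)

def km_median(time, event):
    """Median survival time from a simple Kaplan-Meier estimator."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    surv, at_risk = 1.0, len(time)
    for t, e in zip(time, event):
        if e:
            surv *= (at_risk - 1) / at_risk
        at_risk -= 1
        if surv <= 0.5:
            return t
    return np.inf  # median not reached

true_death = rng.exponential(scale=12.0, size=2000)  # months
all_events = np.ones_like(true_death, dtype=bool)

# Suppose 30% of deaths are never captured; those patients look censored
# at an earlier "last contact" time instead of contributing a death.
missed = rng.random(true_death.size) < 0.30
obs_time = np.where(missed, true_death * rng.uniform(0.3, 0.9, true_death.size), true_death)
obs_event = all_events & ~missed

print("Median OS with complete deaths:", round(km_median(true_death, all_events), 1))
print("Median OS with 30% of deaths missed:", round(km_median(obs_time, obs_event), 1))
```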
(cell phone chattering) (laughing) – [David] Someone sent me the meme. (audience laughs) – [Mark] It is a meme. (laughing) It's okay, David. Oh, that was? Came at the right
time, your smiley face. Thank you. (audience applauds) Andrew? – That was a great segue, David. Thank you for that. – [Jonathan] Theme
music for all of us. – Yeah, that’s my walk up song. So Nicole really set
this up quite nicely because I think I wanted to
dive a little bit further into a few of the points that you made about
overall survival. So I’m the chief
medical officer of Cota. Cota’s a company that actually
has quite a similar approach which is we work with
providers to extract data from the EHR, cancer data only, structured and unstructured. We curate that and
make the data available in deidentified fashion for research by
various stakeholders. And we were a participant
in the Friends project that you heard about as well. You know, I think one of
the interesting things as we think about endpoints is which endpoints are most
optimal for real-world data? And I too settle on
real-world overall survival as perhaps the most objective. Although, it has challenges
that we heard about. You know, the reality is
that other oncology endpoints like progression-free
survival and response rate which are the typical endpoints that we look at
in clinical trials, require
measurements from images. And depending on the time points when those images are obtained,
depending on the observer who’s doing the measuring, depending on technical factors, a whole host of
different things, you get different answers. So there’s a fair
amount of subjectivity. And I’ve heard Sean speak
publicly about the fact that even when you look at carefully controlled
radiologist assessments of response in clinical trials, there tends to be
some wiggle room. And that’s problematic. Time to next treatment
and time-to-treatment
discontinuation are interesting
endpoints because we’re
able to assess them using claims or EHR data. That's why they're appealing. On the other hand, you
could have two treatments that result in the very
same time to next treatment, but one confers its
time to next treatment by virtue of being toxic, and another confers its
time to next treatment by virtue of being ineffective. In other words, why did
the doctor stop the drug? Was it because you
had an adverse event or was it because
it didn’t work? Those are pretty different and those things can
be hard to capture. So we like overall survival. It’s black and white, but
it’s very rarely in the EHR. People always ask me, “Why
wouldn’t a doctor note “if the patient died
or not in the EHR?” And you know, that’s
a great question. I practiced medicine
for a dozen years. The reason you don’t write it
down is because the patient’s not gonna come back
and see you again, and it just becomes
not that relevant anymore for taking care of that
individual patient. The reasons we use the EHR in
practice are for patient care and we use it for billing. And neither of those things
are relevant after death. We needed to move this ecosystem to one where we also
capture things in the EHR that are important for
improving quality of care. And we need to make that easy. And we need to maybe even
incentivize physicians to capture that
kind of information. But there’s very rarely
pressure to do that. Amy Abernethy, when
she was at Flatiron, talked a lot about this
condolence note example. In a practice where
condolence notes are part of the
expected process, and sadly, they’re
not everywhere, but where they are, you end
up having that documentation. And it becomes a good
indicator of death. But most of the time, you don’t have that
in a systematic way. And so you’re relying
on conscientiousness or empathy of
individual physicians. And that’s fundamentally
problematic. So we have to link. (laughing) (audience laughs) Some of us are empathetic. So we have to rely
on linked datasets. Like you just heard
about from Nicole. The linkable datasets
that are available to us are all imperfect
in their own way. I’m getting the two-minute mark. I thought I was being brief. The SSDI, the Social
Security Death Index, is incomplete. The National Death Index, people
consider the gold standard these days, but it’s
at least a year behind. So many of us now, I think,
are using obituary databases. And Flatiron’s done
pioneering work to show that those databases
are in fact quite complete. There is also credit
bureau information. The financial industry has
figured out how to record deaths better than the
healthcare industry. So whenever I speak publicly,
especially in Washington, I like to make a plea that the government
policymakers in the audience help us figure out how to
get reliable, complete, timely death data
because it will move this whole thing forward. There is a non-trivial
matching exercise that’s required
to link our data, even when we have identifiers to link our EHR data to these
proprietary death databases because of what
we heard earlier, that patients’ names are
captured in different ways. Patients' addresses are captured in different ways, et cetera.
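As a rough sketch of the matching exercise being described, the code below normalizes names and links an EHR roster to an external death file on a simple key. The fields and rules are hypothetical; production record linkage typically uses probabilistic methods and many more identifiers.

```python
# Hypothetical sketch: linking an EHR roster to an external death file when
# names are captured differently across systems. Fields and rules are
# illustrative; real record linkage is probabilistic and far more careful.
import re
import pandas as pd

def normalize_name(name: str) -> str:
    """Lowercase and strip punctuation/suffixes so "O'Neil, Jr." ~ "oneil"."""
    name = re.sub(r"[^a-z ]", "", name.lower())
    return re.sub(r"\b(jr|sr|ii|iii)\b", "", name).strip()

def link_deaths(ehr: pd.DataFrame, deaths: pd.DataFrame) -> pd.DataFrame:
    """Match on normalized last name plus date of birth."""
    ehr = ehr.assign(key=ehr["last_name"].map(normalize_name) + "|" + ehr["dob"])
    deaths = deaths.assign(key=deaths["last_name"].map(normalize_name) + "|" + deaths["dob"])
    return ehr.merge(deaths[["key", "death_date"]], on="key", how="left")

if __name__ == "__main__":
    ehr = pd.DataFrame({
        "patient_id": [1, 2],
        "last_name": ["O'Neil, Jr.", "Garcia"],
        "dob": ["1950-02-11", "1962-07-30"],
    })
    deaths = pd.DataFrame({
        "last_name": ["ONEIL", "Nguyen"],
        "dob": ["1950-02-11", "1948-05-01"],
        "death_date": ["2023-09-14", "2023-03-02"],
    })
    print(link_deaths(ehr, deaths))
```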
So I'll probably stop there and pass it to John. – [Mark] All right, thank you. (audience applauds) John? – So thanks so
much for having me. I'm the founder and president at Syapse. Pleasure to be here. And especially thank
you to Jeff and the team at Friends of Cancer Research who did the job of wrangling
10 competing organizations or semi-competing
organizations together to actually work on
a problem together, including sharing definitions and creating shared definitions. Someone made a comment
earlier about transparency. Well, it’s an extraordinary feat to get 10 groups together who are sometimes at each
other’s throats in the market and get us all to actually
get in the same room with not just the execs,
but the actual people doing the hard data work, and have those people
transparently talk about issues. It was an incredible feat. And all of the definitions
are so far, public. And there will be a
publication at some point. So we’re committed to
moving the field forward through that sort
of transparency. So quite an undertaking. So reflecting a little
bit on the process. You know, one of the
things that emerges is the differences
between organizations. So of course, there’s the claims versus clinical division
of organizations. And then even within those
different organizations, the way groups source
data and the relationships that they have to the data
is a little bit different. So speaking for us, we’re
actually a little bit different than the other
two organizations, in that our approach
tends to be multi-source in that we engage
with health systems. These groups are
multi-specialty, multi-practice, multi-hospital, multi-region, and necessarily have many,
many, many, different systems. The EHR being one of them. And typically, they
have many EHRs. And even when it’s
the same vendor, it’s still many EHRs
from that vendor, different configurations
of Epic and Cerner across their enterprises due to the mergers
and consolidations
across healthcare. So this multi-source approach
is incredibly difficult to gain a handle on,
but it bears benefit when you get to the
data comprehensiveness. In that, we might
have organizations who for example, have
palliative care facilities or they own hospices. So these organizations, when
it comes to overall survival, have more of a connection with the patient as they go
through their care journey. So in terms of capturing
something like mortality, as Andrew mentioned,
there needs to be a reason why it’s being documented. Well a very good reason is
you actually own the hospice or you have the
palliative care program and you have the software
systems to support that. So I can’t speak about
the results publicly yet, as we’re in the
publication process, but we too are doing a
similar validation work on our mortality data. And just the anecdote is that
the mortality information captured by an
integrated health system and integrated delivery network is significantly higher
than an outpatient practice due to that facet of having more and more of the care experience. So that’s one important nuance to our approach. The second one is
going upstream. So for example,
rather than trying to purely apply techniques to structure data
after the fact, we actually will go directly
to the genomic testing lab. So for example, Foundation
Medicine or Caris or the other common
clinical testing labs. And we actually work
with them to implement a standard
interoperability approach so that we get all of the data
coming in from those labs. And then the third element
is the closed loop nature. So the number one consumer
of the aggregated data, or of the integrated data is actually the physician
and the clinical staff. The number two consumer
are the regulators like sciences, companies
and others doing research. So if something is wrong, you’re gonna bet
that the physician’s actually gonna catch it and is going to
look at that data and say, “Wait a second,
I know this patient. “That’s not accurate. “Let’s go and fix this.” So the closed loop nature
actually has benefits towards improving the data. Speaking specifically now
about overall survival and the work that we
did together on Friends and the work that we’re going
to be doing going forward. A lot of the work, in terms
of looking at these endpoints, you really have to understand, especially with
overall survival, what is that
particular population? So a lot of the work that we did was we took a fairly
high-level population. We took those three
treatment regimens in the first line that you saw and we looked at OS according
to common definitions that we were using. But as Jeff and the other
speakers had pointed out, we didn’t really go and
whittle down that population and control for the
exact covariance. So one of the next
steps for this effort is looking very specifically
at how you narrow down to assure if you’re looking
at the same age range, the same histology,
et cetera, et cetera. What’s your OS? And how is that consistent
across the groups? Now speaking for us personally, one of the elements that
we are highly focused on as a company is the
validation of the measures that we have in place, both the underlying data as well as the calculation
methodologies. So it’s a publicly disclosed now that we have a
research collaboration with Sean and the
folks at the FDA. And one of the specific
pillars of that RCA that we’re focused
on is validation of the real-world endpoints
that we’re measuring in a specific set
of populations, both looking at OS and PFS as more traditional measures, but then looking at
the more real-world scalable nontraditional
measures. So TTD, TTNT and a
few others as well. And we’re in the
process right now of determining what does
validation actually mean? So does it actually
mean that you’re getting the same real-world OS
as the OS in the trial? Probably not, because there
are probably significant covariates that you
still can’t control for in the real-world
population. As Andrew was observing, if you look at TTD, is
the patient discontinuing because of toxicity or
are they discontinuing because of something else? Now of course in the
real-world setting, I know we said that OS, I’ll
disagree with you slightly just to spice up the panel. Real-world OS is not
totally objective, right? I mean the patient could be
hit by a truck, sorry to say. Now this happens, right. You need to understand what
is the mortality cause? And if you have this broad real-world
uncontrolled population, you have things
like that happen. And hate to break it to you, but SSDI and all
these other systems, they’re not terribly
great at capturing this. So you have to dig a
little deeper than that. So one of things that we’re
in the process of doing is looking at if
you have mortality and you have completeness
on mortality, do you know what the
cause of mortality is? Same thing with
discontinuation of treatment. Do you know what that
discontinuation is? I’ve been given the stop
sign, so I will stop. So thank you very much. – Thank you. (audience applauds) I wanna thank you all. It’s very impressive
work on an important area of improving
real-world endpoints. I’m gonna start out
with a question about, and again, we have a few
minutes for questions, so if anybody has one, please
head up to the microphone. We haven’t talked much
today about issues of governance or
pre-competitive issues in the real-world data and
real-world evidence space. But just as in, I think,
other areas of drug and medical product development, there are issues where
individual organizations may wanna compete and
do their own thing. Certainly, there’s a lot of
work that needs to be done to just figure out to put these real-world
endpoints together. As I think Nicole mentioned too, it’s not just the endpoints,
but the characteristics of patients, the exposures,
things like that too. But clearly you all found
some reasons to work together. I mean, maybe starting
with you, Jeff, but others, what are
the biggest motivators to a shared approach here? Is it identifying
best practices? Is it accomplishing things that couldn’t have
been done otherwise? Because it does seem important
to help make progress in the area with
so much uncertainty and opportunity for learning. – Yeah, while I
give a lot of credit to everyone who participated because they all sort of
went out on a limb here. Maybe it was hard for
all of you internally, but I think it speaks to the
importance of the opportunity at hand here, for one. And I actually think
that people came at this putting any competition aside. And the goal was
not to differentiate
between the datasets which was one reason
why you see all of them remaining anonymous. We didn’t want that to be, you know, this wasn’t
meant to be a trade show or a bake-off or
something like that. It was really meant to
be about the endpoints and the characteristics
that will allow this type of data to accelerate
in its use moving forward. I think the willingness of
Sean and his colleagues at FDA to be a part of this certainly helped us as well. It’s just the goal of being
able to try and do this faster, as opposed to having
each organization try and create this on their own which would just result in
even more mass confusion. So reaching definitions here
was actually pretty easy. And even to see the level of
collaboration around things like sharing strategy,
sharing code, a willingness to even, you know, this was the high-level data. You know, I think what
these folks were able to dive into and
really dissect out and will be doing
in the coming months about some of the
specific characteristics will be really important. But you know, I
think it’s heartening to see the willingness and the
effort that’s gone into this. – And I’ll just
add that, you know, if you look at the
broader industry trends, drug development, timelines
and costs keep increasing probably at an exponential rate. Though some experts in
the room can correct. Cancer remains one of the
top two leading causes of death in this country
and is only either staying the same or getting worse. And you just look
at this and say, “Something’s gotta give here.” Right, we are going
to have to look at new, innovative approaches. Real-world evidence is
one of the few approaches that has a chance
in the short-term to fundamentally
alter the dynamics of drug development and approval and making sure that especially
more rare populations get access to
innovative therapies. And the organizations
just looked at this and said, we need
to work together to establish some
common set of standards and fundamental
confidence in this area so that we can move it
forward and bring it forth as a regulatory position. – And Sean, you at FDA clearly put some effort into this too, must have been thinking
perhaps along the same lines? – Sure, our mandate is
to protect the wellbeing of the American public which goes beyond
individuals and patients who are enrolled
in clinical trials, especially in oncology
where the majority are not enrolled
in clinical trials. So some in fact argue that
showing real-world performance is a much higher bar
than showing activity in a well-controlled
traditional clinical trial with very narrow
eligibility criteria. And as Andrew mentioned, there is volatility
in how we assess images in traditional,
phase three clinical trials. The discordance is about 30%. Two different radiologists
in registrational studies looking at the same image, and then categorizing
that into RECIST. And if you look at basically
some of their recent reviews that FDA published
on our website, we now put both assessments, the investigator assessment and the independent
radiology review assessment. And there is a 30% discordance which is interesting because
that’s after categorization. And to resist which
already has a 50% margin of error built in. So it tells you that
the treatment estimates in traditional
clinical trials can, in some cases, be volatile. And already what the experience shows with this pilot, that there are opportunities
that exist today. For example, if
you have a therapy that has a large effect size. Using endpoints that
Jeff already showed, we’re probably going to see, most likely going to
see the signal emerge despite the slight variations. Because a large effect
size would overcome some of the technical and
data missingness challenges. If we have a drug, for example,
with an 80% response rate which is not that rare nowadays, that signal would emerge. There are definitely
opportunities today that we can start to
explore using the foundation that’s already been built. – Yeah, well that’s
a good model. And the engagement’s
really important. Paul? – [Paul] Hi, Paul
Bleicher from OptumLabs. And I’ve been involved in a
number of these discussions about replicating
real-world studies in several different databases which is obviously important if you’re gonna use it
to draw conclusions. So my question, and
I wanna preface this with I realize how hard this is to do, but have you
attempted to identify how much duplication
of patients there are between the databases? And have you thought about
whether there’s a methodology that we could all
use in the future to somehow make sure we’re
using non-overlapping patients in doing these kind of analyses? – That’s a really
interesting question. I’ll make a couple comments
and then maybe my colleagues here have something else to say. You know, we don’t like to
acknowledge it all the time, but there is a lift, on the EHR side, there
is a lift associated with sharing data with us. As a result, I think it would be
hard to find a provider in the country today
who is giving data to Flatiron and Cota and Syapse and a bunch of other datasets. Now to the extent that there
maybe overlap between claims and EHR, that’s you know,
with respect to patient, individual patients, that’s
maybe a different matter. But I do think on the EHR side, you know, the bulk
of Flatiron’s data comes from an EHR that they own. Syapse and Cota
have customer lists that are non-overlapping. So I mean, I think for
the most part today, these are unique data assets. – [Paul] So would argue
that you’d be surprised because once you’ve
done the work to make it available
for one data aggregator, it could be easier
to make it available. And let’s not just talk about
this particular indication of lung cancer, let’s just
think in general about this. It’s a problem. And I think we have
to think about how to methodologically
deal with it. – Yeah, I think it’s
fair that it’s a problem and one that is
probably unaddressed. – Well it sounds like
another good topic for cross-company,
industry-wide collaboration. Bob? – [Bob] I have a question
about the primacy and emphasize on overall
survival in these things. My recollection from
when I used to do these, and it’s still true in
the approvals I read, overall survival is often
less dramatically affected than things like
time-to-progression. And the reason is the people
cross over after they progress. It strikes me as pretty
good way to lose, to look at overall survival,
especially if you’re looking at first line therapy. So I just wondered what
you thought of that? – I mean, I would say
that I, we did not intend to assert as a panel
that OS is the primary and most important endpoint. I think it was the
straightforward ones used as an example due
to a fairly obvious start and a fairly obvious end. – [Bob] Nice endpoint. – Yeah, so let’s not read
this as OS is primary. In fact, if you, you know, when you
start digging in, which the groups
have done together, what you find is that it’s
actually a little easier. And maybe I’ll prematurely
make this comment, so someone tell me not to, but it’s a little bit easier
to look at approximate endpoint like time-to-treatment
discontinuation as long as you can dig into
the discontinuation reason. That’s significantly
more straightforward to do your validation on. Let’s, I would say from our
organization’s perspective, something like real-world
PFS is a measure that is so variable
and difficult due to the fact you’re looking
at a physician’s subjective documentation of progression, untied typically to
radiology assessment that who knows what
you’re gonna do with it? So we’re looking, we’re
capturing that right now and we have a
mechanism to do it, but I would not hold that
up and look at that measure and say that that’s something that at least our organization
is going to pursue at this stage as
validating measure. – Yeah, I think that’s right. You know, and I
would just point out that in the Friends pilot 1.0, we looked at correlation
of each of these endpoints independently with
real-world overall survival and it was pretty consistent that there was good correlation. Not perfect, but good. I agree with the
point that you make. Crossover’s a concern there. – And sounds like you have
some further work ahead potentially on
additional endpoints like time-to-treatment
discontinuation with the reason
attached and so forth. All right, we are in
the lightening round to finish up. Let me maybe start here. I think you were up first
and try to get the other two in quickly as well. – [Rebecca] Rebecca
Miksad from Flatiron. And Jeff, I wanna
congratulate you and Friends for pulling all these
groups together. Having seen some of
the background data, my question for the panel is there was a lot of nuance
in terms of applying or trying to align on criteria, as well as a lot
of nuance in trying to understand what
the cohorts were. And that doesn’t fit in the meme that David just got. How are you thinking
about trying to explain that or to communicate that
to the rest of the community? – Anyone else wanna start? – You ready?
– I’ll take it. Why not?
– Go for it. – And gotta be brief. – Yeah, so where we’ve got next
steps across the groups to whittle down the cohorts to basically make sure
that we’re looking at the exact same population. But my personal belief is
that we’re going to reach a limitation to what
we can do collectively, and then we’re going
to have to, effectively, as individual organizations,
start publishing on the exact cohorts
that we’re looking at. – I think the real issue
is as we move this stuff to regulatory decision-making, how are the poor
regulators gonna review all of that nuance and
use it to make sensible decisions about this? And I think that’s gonna be really hard–
– Well I hope you can collectively provide
some guidance for them and how to do that. – [Gracie] Hi, Gracie
Lieberman from Genentech. So absolutely great work. And I know way back when
first the whole thing started with real-world data and the efforts to
come up with pilots, I remember, it felt
like a little uphill. So great work. I have a question, well
one already came up with this time-to-treatment
discontinuation or the time to next treatment, especially when you compare
the PD-1, PD-L1 checkpoints to chemo or the chemo
plus PD-1 to chemo and placebo kind. So yes, you can see
that maybe patients stopped or changed therapy not because of
effectiveness, but safety. But there’s this other, when
you look at the other piece, this sort of buzz word about
the PD checkpoints, right? The time to sort of, or
treatment post-progression. How could that buzz word impact some of these endpoints
that we’re testing in the real-world so
that physician continue to test them? So is this actually the
right model to look at how sensitive these endpoints are? – Quick comment. So I think you know it’s useful to look at these
endpoints as a composite. That if all of these endpoints, thinking of them as vectors are moving in the
same direction, then that can give us
additional confidence that these are true
treatment effects. So at this point of, at this
stage, at this juncture, it’s best to, I think, to look at this endpoints holistically. And there’s a lot of information and each endpoint,
as was discussed, has pros and cons. And in terms of really
understanding progression and treatment
beyond progression, for example, for immune
checkpoint inhibitors, well progression is really
in the eye of the beholder because we know resistance, the best we can do
in clinical trials is very, very inadequate. And in the real-world,
chances are, and somebody can do
an informal survey, the majority of oncologists
have never even heard of RECIST. Because the way you assess
the, community oncologists, you just look at a patient and use your cortical
neural networks, look at the patient,
clinical and symptoms. Laboratory imaging is only
part of the assessment when patients are
treated in the community. Because the radiology
report, as we all know, and that’s why it’s
very hard to rely on the radiologist report
that’s scanned into the EHR because one lesion
maybe smaller, the other lesion might be larger and maybe a couple of
new lesions, maybe not. A lot of times, you don’t
have the prior scans when you do these assessments. And at the end of every radiology report,
correlate clinically. Because it’s really up
to the primary oncologist that knows about the
patient far beyond what’s on the scan which is not a very accurate
assessment in some cases. You wanna look at
adverse events. You wanna look at how
the patient’s feeling
and functioning. So it’s much more holistic
in the real-world. And in fact, I think
part of our challenge is to capture what the
oncologist is thinking. Some exercises we did, I don’t know if there’s
time, can talk about this. Some of the early work
that FDA did with Flatiron was looking at the
clinician’s assessment versus what’s in
the radiology report and how they compare and we
combine them, is it better? Or are they individually
giving us more. I don’t know if, Nicole,
if wanna talk about that. But the real-world is
a very holistic milieu versus the traditional
clinical trials just because the
need to standardize, we’ve become very reductionist. But that doesn’t
mean it’s really, these are endpoints that are the truth because the truth is
a lot more holistic. – I think Richard said
real-world endpoints could be very helpful
given the complex issues involved
for these patients. Diana? – [Diana] Yeah,
I’m Diana Zuckerman from the National Center
for Health Research. When we talk to patients,
obviously, overall survival is the ultimate
patient-centered outcome, although there are obviously
other important ones. Given that importance,
I just wondered if there’d been any efforts to reach out to electronic
health record companies or the medical community
to include death in electronic medical records? – Definitely. You know, I would just point out that a lot of the
key structured fields that we know we need
and are built in EHRs, tend not to be completed. You know, we’ve looked
in one of the EHRs that Cota works
with has a lovely and prominently placed
field for overall stage that’s completed
about 20% of the time. And I think, I don’t
know for certain, but I believe that
there are EHRs that do have a structured
field for vital status. But the issue is, what
we talked about earlier, giving doctors a reason
and making it easy for them to complete
that information in a timely way so that then it’s
then available. – [Nicole] When you
think about policy, maybe this is a place where
policy could be helpful. And maybe physicians
need some incentive to actually do that. So this is something
that should be part of broader conversation. Some of this very
practical things we can do as levers
to get the information that we would like to
see or that we need. – Yeah, I’ll just add
that this question comes up all the time. “Oh, let’s go change the EHR. “Let’s change the clinical
workflow and documentation.” By the way, we’ve done
a bunch of work on that. Spoiler, unless you
pay the physician or give them some incentive,
things don’t change. So we’re in DC. I’d actually flip
it around and say what are the policy
folks in the room doing to change physician compensation and incentive models
such that completion of critical quality
associated fields are actually filled out? It’s amazing what happens when you flip that
incentive model. – All right, and that
sounds like a good topic for one of our next meetings. But for now, I wanna
thank the panel for an excellent discussion for the some important
path-breaking work. (audience applauds) All right, we are
almost out of time, but we did wanna
leave a few minutes before we wrapped to see if
there were any other comments or questions, other input
from those of you here today with regard to the topics
that we’ve discussed. We’ve already had a
tremendous amount of input, very useful input which
was a main purpose of this meeting. But we do have a
few minutes left in case there are any
other comments, questions, points to add. It’s been a very productive day. So let me say before we
leave, just a few thank yous. So I guess our meme is the
smiley face, right, David? And I think everyone who
contributed to this effort, our speakers, the ones
who did the presentations, all the panelists, all
of you who attended and provided comments and input, you all made this possible. I do wanna give a special shout
out to a couple of groups. One is the planning
team at the FDA that spent a lot of effort
to pull all this together, provided all kinds of
insights, comments, and help in pulling together this event. Jacqueline Corrigan-Curay, her team at the Office
of Medical Policy, including Kiara
Elzarod, Diane Perron, and David, of
course, Leonard Sacks were really helpful
for this meeting. And I also wanna thank our team at the Duke-Margolis
Center for Health Policy, Adam Kroetsch, Nirosha
Mahendraratnam Lederer, and Christina Silcox, Adam Aten, Kerra Mercon, Joy
Eckert, Sarah Supsiri, and Elizabeth Murphy. All of them put a lot of
time and effort into this. So I wanna thank all
of you for being here and all of those as well
who put this together. Thank you. All right, we’re obviously
not done on this issue. We have more information on
our website at Duke-Margolis. And we hope you’ll stay in touch about future events
and activities related to real-world evidence. Thanks and safe travels home. (audience applauds)
