DCL Learning Series
Content From Multiple Clinical Sources
David Turner
Well, hello, everybody, and welcome to the DCL Learning Series. You're in for a real treat today. We've got a webinar that's entitled "Content From Multiple Clinical Sources: The Pitfalls and Solutions to Consistent Documents and Content." If you don't know me, my name is David Turner, and I'm a content consultant, and also Head of Partner Relations at Data Conversion Laboratory, and I'll be your moderator today. A couple of quick things, before we get started, I do want to just remind everybody the webinar is being recorded. It'll be available in the on-demand section of our website at dataconversionlaboratory.com. If you have any trouble with that, just reach out to me and I'll make sure that you get the right link. Second thing is I just want to invite you to submit any questions at any time during the conversation today. We're going to save some time at the end to answer questions, so if you've got them, just feel free to jump in and use the questions feature there.
All right, let me jump over to this next slide. Obviously, technology plays a really critical role in life sciences, because you have things like accuracy, traceability, and compliance to worry about, and of course speed to market is absolutely critical. So, improving content management and data management, along with your IT systems, compliance, and program management, all works to streamline both research and drug development. With that in mind, Data Conversion Laboratory, Court Square Group, and JANA Life Sciences have put together this Learning Series to really address how technology can contribute to your success. This is actually the fifth webinar, if you look here on this slide, in a series of seven, and the other webinar topics are listed here. I will say this: while today's webinar does build on our previous webinars, which were all about technology and content management, you will still gain a lot of insight today even if you haven't followed the entire series. But if you do want to go back and follow the entire series, it's easy to get to; we've got transcripts and videos on the Data Conversion Laboratory website, and we'll push a link to that page via the chat box, which I think has already been done by Leigh Anne. Thanks so much, Leigh Anne.
All right, just quickly about DCL. This webinar series is brought to you by Data Conversion Laboratory, or DCL, as we're also known. Our mission in the marketplace is to structure the world's content. Our services are all about things like converting content, structuring content, enriching content, and data. We're one of the leading providers of XML transformation services, XML conversion services, also content reuse analysis. We're an industry expert who has been a part of SPL conversion and the SPL working group for global pharma companies for many, many years. If you have any complex content and data challenges or initiatives, those are the kinds of things where we can help.
All right, let's jump in, and let me just introduce our panelists today. Three good friends of mine. First of all, we have Mr. Mark Gross, who is the President of Data Conversion Laboratory. We'll hear from him momentarily. We've also got Keith Parent, who's the CEO at Court Square Group, who I think a number of you know, and then out on the west coast, we've got Ron Niland, the President of JANA Life Sciences, whom I think some of you also know. Glad to have everyone here today. Let's get this thing kicked off. I'm going to start by passing it over to you first, Keith. Keith from Court Square Group, why don't you tell us a little bit about the work that you do?
4:03
Keith Parent
Hi, David, thank you very much, and I'm really happy to be on this webinar series for the fifth time; it's been a great series so far. Court Square Group is a provider of an audit-ready, compliant cloud environment. We host many applications in the life science space, particularly around document management, content management, and integrating multiple applications together. We also have our RegDocs365 platform, which hosts clinical content, and we host many different clinical trials using that platform. We have submission systems for people to submit to the FDA or other health authorities. We're heavily involved in AI research, in what's going on with AI and the content in these documents. Our goal is to really look at the cloud environments that are out there, work with them across the board, and connect those worlds where we can in the life science space.
David Turner
Excellent. All right. Well, let's move on over and let's let Ron jump in here and talk a little bit about JANA.
Ron Niland
Thank you very much, David. Like Keith, I would echo the same sentiment about taking part in this series. It's been a real learning opportunity for us, and we appreciate the partnership with both DCL and Court Square Group. JANA is a company that was formed in 1973, JANA, Inc., that is, and we're a division, JANA Life Sciences. We have not only 48 years of experience working on thousands of projects, but also experience with some of the world's major corporations. One of those is Boeing, and for a handful of years out of the last eight, we've been a performance excellence award winner with Boeing.
What we do is basically four things: technical documentation, what falls under the umbrella of operational excellence, IT systems, and program management. We work with a multitude of formats, including XML and DITA. We have a projectized organization; at the heart of each one of our efforts are our project managers, who work with teams. We are not only ISO 9001:2015 compliant, but we're also working toward ISO 13485 right now, and we'll probably be certified in the September or October timeframe. Basically, we do technical documentation, procedural documentation, how-to manuals, user manuals, and maintenance manuals, as examples, operating procedures and work instructions, and we do training associated with that. We move mountains of data. That's an overview of JANA.
David Turner
Wonderful. Wonderful. What both these guys didn't tell you is all the years of experience that they have in working in different parts of life sciences. They're really expert panelists here today, and glad to have them.
7:05
Before we get too much further on, I'm going to hand it back to you in just a second, but I thought I would start with a quick poll. I'm going to load this poll here that talks about Where is your organization in terms of consolidating content from multiple sources? And I'm hereby launching the poll. All right, it will take just a minute, stretch a little bit, think about where you are in terms of this process and fill out the answers. We'll collect the responses here. I think we've got about 25% in, so far. All right. Bumping up a little bit, about 50%. Those of you who are thinking, "ahh, I don't want to vote," go ahead, vote. Come on. That's a little longer here... All right, still time, about 10 more seconds. All right, then, I'm going to go ahead and close that poll. Thank you, everybody, for that.
Let's see here. In terms of sharing this, let me put that up here. All right. Our poll responses are really kind of interesting. As the poll was going out, at first almost everyone that answered was just starting to think about it. Then in the second half of the minute that we sat there, we got the "well on our way towards embracing it, with multiple processes" responses. It's interesting, we received no responses for either of the other options, so, very interesting. All right. Well, with that, let me close this poll back out... boom. And we'll get back to our webinar, where I'll pass it over to Ron, who can talk about the overview of the webinar today.
Ron Niland
Sure. Thanks, David. Basically, today we'll be talking about a half dozen or so concepts that range from accessing information, to managing it, analyzing it, and then ultimately presenting it. What we hope to do is give you some insight into what it means to do a mapping of that information, give you insight into the melding process, and then into how data is migrated, consolidated, and in that process harmonized, so that you can have that end game of, again, being able to look at your data, access it readily, manage it readily, analyze it, present it, and reuse it. That's one concept that, by the end of this webinar, we hope you'll appreciate: the potential further reuse of your information. Working with a multitude of vendors, sponsors usually have this role where they need to bring together that information from the vendors.
And then there's an aspect of not just bringing it together, but understanding who's the keeper of the keys for the information. Those are some of the elements that we'll talk about. We'll talk about some of the issues that invariably happen when you bring that data in, issues like PDFs that aren't necessarily searchable. Then we'll talk about the idea of the formatting that will lead to the harmonization. Then again, we'll put a capstone on this presentation by talking about the summary data.
11:14
Next slide, please. The first question is, what is all of this clinical content? It obviously depends on your organization, whether you're a producer of biologics or pharmaceuticals or medical devices. But this gives you a little sense, if you will, of those types of documents, and we broke them into three larger buckets here. The first is around study planning and execution. The second is around the study itself and its generation of data, either during the study or at the end of the study, and then the last bucket here is a miscellaneous bucket. Each one of these can be, and is, parsed to much more discrete levels. But if we just talk about, let's say, study planning and execution, this is the aspect of bringing in information associated with understanding the certifications of the bodies that are associated with the development efforts, and the contracts that may be associated with these parties. Then there could be either amendments to those contracts, or even amendments to the protocols that may be executed in doing that clinical research. All of this obviously has to fall under certain regulations; those would be good clinical practices and GxPs, more generally. You've got patient-related information, like informed consent, and general study information. That's the first bucket.
The outputs of this, then, if we think about clinical development: one of the end games is obviously that clinical study report. But in order to get there, there's a mountain of data that needs to be generated, and that includes things like data clarification forms and electronic data capture. Then there's information coming in from patients, like eDiaries or their electronic medical records; there's a whole host of imaging information that can come in, which can be embedded within the electronic medical record for sure, though sometimes it's managed separately. You've got laboratory findings, and then there are quality-of-life and disease-specific instruments that are being developed and basically run with patients, and those responses need to be gathered. Safety reports, obviously, throughout the life of a study and even thereafter. Then there are the SAS data sets and the generation of those. That's study-related outputs, and then the mixed bag of miscellaneous that really runs the gamut from email correspondence to references and reports, specifications and controls. But that is just some of the clinical content that we as a collective unit will work on with firms such as yours.
Next slide. So, where does the study information generate or originate? It's a multitude of sources. It could run from the study sponsors, to the sites at the site level, you can have the investigators and their respective information. You've got other clinicians that may be associated with the care of a patient that are providing information, maybe to an investigator. Perhaps it's a cardiology study, and that patient has a cardiologist that they've seen for a number of years and they're working together with the investigator for the trial. You've got contract research organizations, and researchers at both the academic and government level, you've got laboratory information coming through. Again, we talked about patient diaries, and you've got managed care organizations, acute care organizations, medical writers, funders, the list goes on and on, and we can't forget about the lawyers, as well as the governing bodies associated with the conduct of the study. Whether it's an institutional review board, or ethics committees, and for the safety, then, you've got the data monitoring committees as well. Next slide.
15:36
So, where does all of this information come from, right? It's a multitude of sources, and there's an aspect of the functional view of that information, too, that we need to think about. At the top of the wheel we put manufacturing, and it really depends on the company; if we're talking about, let's say, pharmaceuticals, it may start more on the discovery side. But the fact is, each one of these functions comes into play on the clinical side, because they're perhaps helping to grease the skids, if you will, of the clinical development, producing maybe the clinical supplies, or taking the information from that study and preparing it for regulatory bodies. Each one of these functions is working with a number of different types of documents, and the inputs and outputs are a little bit different.
But some of the systems that come into play: obviously, there's the QMS, first and foremost, just ensuring that you've got the right procedural framework for the work to be conducted; that's the SOPs, work instructions, and enabling documents. We talked about the idea of manufacturing and the clinical supply and having that ensured to be ready for the clinical program. Sometimes that actually takes years of pre-work, if you will, before you get into a phase one. You've got your general content management system, and then on the clinical side, you've got the more specific trial master file system, the eTMF, the electronic version of that. At the enterprise level, you may have your accounting systems, like your ERP, that need to be tapped into. If you're producing product, you might have a product lifecycle management system. Each one of these then might be generating information associated with analytics, so you need to think about all of these. Then from a regulatory perspective, you've got regulatory information management; perhaps you have a RIM system in place, and maybe you don't, maybe you've cobbled together a few different systems. Again, the lawyers with their legal management and eDiscovery elements. These are some of the sources and systems that we'll touch on as we go on.
Next slide, please. In this graphic, you're seeing three different studies. Basically, we've replicated the image of what's happening in each study, with an investigator in green interacting, perhaps, with that computer and providing information. In the lower left of each one of these images, you can see there are patient charts being referenced. Then in the lower right, you've got ancillary data, which could be the imaging data, and up in the upper left, you've got the electronic medical records. With this information coming together for a particular study, you may have a series of vendors, and again, those vendors may have a series of systems. You can imagine how this starts to mushroom out. Then if you've got a multitude of studies, like in this case, three, you may have 3X vendors and 3X systems that you need to think about for this continuum of data management, with the data management group working with that information, medical writers working with it, and then ultimately the regulatory group working with it and queuing it up for presentation to a regulatory authority. So, tracking and tracing this data all the way through is something that you need to safeguard, because ultimately, you need to make sure your information is validated, so that that compilation, that submission to regulatory bodies, is accepted as bona fide data that has been fully validated, where necessary.
19:43
Keith Parent
You know, Ron, it's interesting, when you're talking about the different amounts of data and where the data comes from. As I've been dealing with more organizations, and actually doing a lot with biologics and regenerative medicine, you're also getting to the point where some of this is more broadly stroked across a series of studies for a particular drug product. But when you start to get into regenerative medicine, where you're actually taking cells out of patients, dealing with biologics, and pulling in patient data, you're actually getting more distinct around that data for a particular patient, and making sure you're tracking and tracing those particular elements all the way through the process. That's particularly true if you're going into some kind of manufacturing, or dealing with cells that are going to be infected with a virus and then brought back into the patient: you're tracking all the way from where the material gets gathered from the patient through to when it gets put back into the patient.
So, the concept of a needle-to-needle study is something that's out there; needle-to-needle work is something that people are starting to think about more, and how they have to track even at that minute level, all the way through. So I think it's really interesting how we look at certain things in a much broader picture when it's maybe a large molecule or a small molecule, and then when we look at the large molecules of biologics, it's really changing over the course of time. It's not so much the type of data or how we handle the data, but it could be the volumes of data, or even how many people touch that data, whether it's a clinical supply chain or something like that that deals with it.
Ron Niland
Those are great points, Keith. What's happened over the years is, we went from blockbuster pharmaceutical agents to more targeted biologics and targeted pharmaceuticals. But now it's moved to the next stage, i.e., precision medicine, where it's very customized to that individual, it's based on their cellular makeup, their genetic makeup, and to your point, it's like a white-glove kind of service offering, if we're talking about cell therapy, that you need to be not just managing that logistical chain from the point of extracting cells, manufacturing them and re-infusing them, but just ensuring the safety, the integrity, the validity of that associated information.
Keith Parent
The inventory capabilities around that, too. We talk a lot about the clinical data and the actual documents coming in and the data coming in from some of those things, but even on the manufacturing side, the inventory capabilities around tracking and tracing of all those things all the way through. You know, when I first started, we used to have issues around tracking and tracing of big lots of pharmaceuticals going out there and who is getting what, and then how does it tie back to the manufacturing process, and which lot was out there. Now, it's just those minute areas. Again, it gets back to volumes of data and being more specific around where things are going and where they are in the supply chain as well.
Ron Niland
Great points.
22:41
Mark Gross
I think that's an excellent point, in terms of just the volumes of data that are coming along. You're tracing it bottle by bottle. So it's not just a little more – it's orders of magnitude more information that's being collected. Along with that come the issues of the quality of the data and making sure that it's being traced correctly, that it's going all the way across, and that it's auditable. So all the issues we've been talking about become more and more important over time, by orders of magnitude, not just a little bit.
Ron Niland
That's a great point, Mark, and just to build on that one further step: it's not just the amount of information increasing, but the expectation of the fidelity of that information. Meaning, if we talk about imaging as an example, a certain pixel resolution that was acceptable five years ago may no longer be acceptable for, let's say, looking at an oncology scan and doing that assessment. Not only is the fidelity of that scan so much greater, but we're also layering levels of information on top of these images, such as annotations by clinicians. So the volume is increasing, and the overall memory requirements are increasing, too.
Keith Parent
Why don't we move on to a couple more slides; that same point that I just made, I think we're going to bring up in a couple of other areas. So, as we get into the content mapping and the summary aspects of it, we'll talk a little bit more about cross-institutional factors, and even, within your own institution, cross-departmental ones. Let's pop onto that. Next, please.
When we talk about cross-institutional factors, we talk a lot about where that data is coming from and what systems people are using. One of the key things that I've seen over time is that it used to be that IT always drove everything that was happening. As we become more open to cloud-based systems, and different applications that are very specific to different areas within a company, you may be dealing with multiple cloud platforms. In the last webinar in this series, we talked about some of the cloud factors and how we dealt with those different things. Here, what we want to do is emphasize the fact that even within an organization, you may be dealing with lots of different cloud environments and the interaction of the same data going between multiple systems.
And for us in the life science world, we really have to worry about qualification and validation and how all that works together. So when we're talking about these, I want to make sure we look at who the major vendors are that we're dealing with. Do they understand life science, versus the regular commercial or consumer world out there, which doesn't have the rigor that we need? They can take downtime for certain things, or have different issues that they deal with, whereas for us, it has to be an end-to-end process between different environments and how we pull that material together. So our goal would be to make sure that we understand the vendors that we're working with and how they're working. You can go to the next slide, please.
26:03
In this example – we talked about this before, with another funnel design – you may have an eDMS system where lots of data is coming in from different sources. Going back to Ron's earlier slide, where we were talking about all the different areas the data comes from, there's a lot of data coming in, whether from CMC, preclinical, or clinical systems. All of that is going to be aggregated together: medical writers are going to put that material together, data management folks are going to put it into SAS data sets and then create tables out of those data sets. Those are going to be fed off to the medical writers, and everything is then pushed off to regulatory, who will put links between documents so they can run it through a validator, knowing that it's going to go to the FDA or the EU or Health Canada, or any other regulatory bodies that we're going to be talking to. Those are all interactions, internal or external, because you may have an external vendor, you may have a CRO doing some of that work, you may have a submission company doing some of that work for you; they may need to take in lots of data, and they may need to clean up data.
You know, I recently started working with a company that did an internal project just to find out how much of their time was spent searching for and reworking data, cleaning the data, doing things with the data, and they found that 40% of the time of the people they surveyed within their own organization was spent just searching for and cleaning data that they already had. So imagine if we were able to identify that data and set it up so that it was much more cleanly delineated and usable, and how much more easily it could then move between multiple vendors. Because part of that cleaning comes from the fact that different vendors do different things with the material, and we have to worry about how that data comes in. Are they keeping the metadata that's supposed to be there? Are they keeping track of all the issues that are there? So, Mark and Ron, you guys see this as well with the work that you're doing?
Ron Niland
Oh, absolutely. Mark?
Mark Gross
Yeah, and I think the numbers are consistent across many other places. I mean, that 40% – I'm not surprised that a number like 40% of time is spent just cleaning up data and figuring out how it belongs together. So that's why the standardization kinds of things that we're talking about are just so critically important. If you can cut out 40% of that process, imagine what kind of improvements you can put in there. Not only that, but it also reduces the chances of errors later on, because if you've got that kind of thing going on, with people spending so much time cleaning data, imagine what the risk is of missing something along the way. So...
Keith Parent
Yeah, absolutely.
Mark Gross
...you've got to go back to getting it right the first time.
Ron Niland
What I've found...
Keith Parent
That's a great point.
Ron Niland
...is that there's an increasing distribution of work product. Thirty years ago, there was this embrace of working with major CROs. Today, what I'm finding is that when it comes to studies, the companies we're interacting with are working with at least 15 or so parties. Maybe they do have a major CRO, and maybe that CRO is overseeing a handful or more of sub-vendors. But the hybrid model that I'm seeing increasingly is where the sponsor is interacting with a series of vendors, and then they also have a CRO working with a series of vendors. What gets muddied here is understanding who's got the parent data, who's got the child data, and who's really taking responsibility for doing the information architecting. And so, the idea of 40% doesn't surprise me.
Keith Parent
One of the things I want to make sure people realize is that we made this a series because our experiences over time show that there are lots of things that come together to drive solutions for anybody. Some of those things are how we define the data; some are how we put together what's known as an information architecture; some are the rules around how we deal with the data. We could sit here for a week going through all of those things with you. So by cutting it down into small pieces, and covering them over the course of this learning series, we're giving you a sense of how these things tie together. Today's discussion of how these things tie together builds upon some of the things that we talked about earlier, particularly around metadata and things like that. Now, Mark's going to go into a few of those things later on.
30:33
But right now, we just want to move on to the next topic, which is going to be migration of data. So, David, we can move that along; there we go. Looking at how we migrate the data between these systems, and how we can then consolidate some of that data, is going to be really important. Next. Content migration: making sure that it's a validated process of migrating data from any one place to another, maintaining the integrity of the documents themselves and what they do and how they work – that's going to be really important in understanding how we're dealing with that data along the way. And if it's large volumes of data, is somebody moving it by hand, or is it a bulk upload or download process? Are they retaining the metadata and the taxonomy?
Those are the things that we talk about, that we rely on for giving us that structure later on. If the processes you've settled on don't take those into account, then what will happen is you'll lose some of that. You may start out well early on in one section or one area within the company, but then, because you're using somebody else's system, as Ron was illustrating earlier, it may lose some of that. Our goal is to figure out how to maintain the fidelity of that data all the way through the process, and make sure it stays there, even through the migration aspect of things.
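To make that migration check concrete, here is a minimal sketch – not any particular eDMS product's API, and the field and file names are hypothetical – that compares a metadata manifest exported from the source system against one exported from the target after a bulk move, so dropped or altered values are caught before the source is retired.

```python
# A minimal sketch, not any specific eDMS product's API. It compares a metadata
# export (CSV) from the source system against one from the target system after a
# bulk migration, so dropped or altered fields are caught before the source is
# retired. The field names and file names are hypothetical.
import csv

REQUIRED_FIELDS = ["doc_id", "study_id", "doc_type", "version", "effective_date"]

def load_manifest(path):
    """Load a metadata export keyed by document ID."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["doc_id"]: row for row in csv.DictReader(f)}

def compare_manifests(source_path, target_path):
    source = load_manifest(source_path)
    target = load_manifest(target_path)
    issues = []
    for doc_id, src in source.items():
        tgt = target.get(doc_id)
        if tgt is None:
            issues.append(f"{doc_id}: missing from target system")
            continue
        for field in REQUIRED_FIELDS:
            if src.get(field, "") != tgt.get(field, ""):
                issues.append(f"{doc_id}: '{field}' changed "
                              f"({src.get(field)!r} -> {tgt.get(field)!r})")
    return issues

if __name__ == "__main__":
    for issue in compare_manifests("source_manifest.csv", "target_manifest.csv"):
        print(issue)
```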
Ron Niland
Yeah, just to further the point here, Keith, I'd seen a few situations where companies had moved data from the vendor and put it into a temporary holding area, and they might call it a repository or content management system. But the question is whether that was validated and safeguarded, right? Because if that data gets placed into a non-validated system, you know what happens here, Keith, right? Basically, you've got oil in a tank of water, and that water is no longer useful to drink. Meaning, you can't use that data for a submission if it was intended, and needed, to be kept in a validated environment throughout its life cycle, if you will.
Keith Parent
Absolutely. 100%. Next, Dave. So, a couple of things that we're going to talk about now are some of the things that we've seen across the board – whether it's through vendors, through individual groups, through sites, through CROs, or through sponsors themselves – some of the pitfalls that we've had to deal with, and how we've overcome some of them. One of the things that I've seen is loss of fidelity. Ron mentioned earlier scanned images and things coming in. Some people take for granted that everything's electronic these days. Well, the reality is, some of these sites are sending in CRF forms or whatever on a fax machine, or they're going to be scanned in. Somebody actually may have printed something out, scanned it in, and then sent it to somebody, and fidelity can get lost along the way. The other issue is that some of the vendors on the electronic side of things are coming out with new ways of embedding documents within other documents. Well, not every application can deal with that.
So what happens is, all of a sudden, you may get a document that has this really neat-looking little icon in it, but it's not actually the document itself that you need, because it needed to be expanded for what was there. We find emailing issues and things like that, or attachments coming in, or sometimes we'll get audited, or our clients will be audited, and they'll look and find there's this big email PST file that's got all these embedded emails. Well, how are you going to go through that, search all of them, and pull data out of it? Again, rendering issues: if somebody has rendered a document using the wrong PDF rendering solution, and it doesn't keep the right fonts or format that you need for an FDA submission, those are problems. Or if you've given up some of your controls to somebody else, are they actually going to be able to hit what you needed to do? You've got to make sure of that when you're dealing with people.
34:36
You guys can jump in at any time to talk about any of these real-world content issues. I know that you're seeing the same things that we see on a very regular basis. Some of the other things that we see: you can't search those images. If an image comes through, it needs to be OCR'ed – optical character recognition – so you can actually get at the data. Typically, when we have a document management system, we want to have something that is indexing everything all the way through the document, through the metadata, but then also through the content, so we can actually find those things. Well, if it's not searchable, if you can't OCR it, or you haven't set it up that way, it's just one image, and you can't get to it. So it's going to be important that you do that and understand how you can make sure that it's searchable. Or zip files are in there, or bundled documents, and you can't get those documents out of them.
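As a rough illustration of that searchability check – a minimal sketch only, assuming the PDFs have been exported to a local folder (the folder name and threshold are hypothetical) and that the pypdf library is installed – you can flag files with no extractable text layer so they get routed to OCR before indexing:

```python
# A minimal sketch, assuming PDFs have been exported to a local folder and that
# the pypdf library is installed. Files with no extractable text layer (scans that
# were never OCR'ed) are flagged so they can be routed to OCR before indexing.
# The folder name and the 20-character threshold are arbitrary assumptions.
from pathlib import Path
from pypdf import PdfReader

def is_searchable(pdf_path, min_chars=20):
    """Return True if any page yields a reasonable amount of extractable text."""
    reader = PdfReader(pdf_path)
    for page in reader.pages:
        text = page.extract_text() or ""
        if len(text.strip()) >= min_chars:
            return True
    return False

if __name__ == "__main__":
    for pdf in sorted(Path("incoming_docs").glob("*.pdf")):
        status = "searchable" if is_searchable(pdf) else "needs OCR"
        print(f"{pdf.name}: {status}")
```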
Then there are naming conventions – that was one thing Ron mentioned earlier when he talked about migrating the data. Sometimes you'll get the data in, somebody will start using that data, and they won't follow the naming conventions it started with, or they'll come up with a different naming convention. Now you can't tie some of those documents back together, and nobody knows where the source of truth is. The goal is to figure out how it traces back to the original document. Or even multiple copies of the same file in different folders: which one is the real one, and are you going to have to constantly be doing comparisons between multiple documents? Those are things that we're trying to avoid at all costs, because that's where that waste comes into what you're working on.
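A minimal sketch of a first pass at that duplicate-copy problem, assuming the documents sit on a file share (the folder name is hypothetical): hash every file so that byte-identical copies living in different folders surface automatically, before anyone has to argue about which one is the source of truth.

```python
# A minimal sketch: hash every file under a share (folder name is hypothetical) so
# byte-identical copies sitting in different folders surface automatically. It only
# finds exact duplicates; near-duplicates and renamed variants still need review.
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large documents don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(root):
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, copies in find_duplicates("study_share").items():
        print(f"Identical content ({digest[:12]}...):")
        for copy in copies:
            print(f"  {copy}")
```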
Mark Gross
Right, and all the version control issues that come along with multiple copies. You mentioned OCR before – yes, you could OCR it, but then what's the quality of the image, and how good is the OCR? Is it good enough for the kind of applications we're talking about over here? Usually, in these kinds of systems, it's not proofread to make sure it's all correct. If it came in as a copy of a fax that got re-faxed, you probably have very little that's accurate in it. These are all things we see all the time. Yeah.
Keith Parent
Particularly around somebody putting wet signatures on things – they feel like they've got to print a document out to get a wet signature, and then they scan it in to send it back to somebody. Those are very common issues that we run into in a big way. I know that Ron and I spoke a little bit earlier about vendors dropping or incorrectly setting metadata on some of the documents. They may send summarized files instead of the data that's in there. I've seen this before, where you have an outside vendor, they'll put material together, and all of a sudden it'll come with headers and footers, and watermarks, and all sorts of other things that make it almost impossible to use some of those documents that you're getting. They think they're doing a great job because they're protecting that document.
But the reality is, it makes that document less usable for what we're really trying to do with it, and we need to be able to facilitate the transfer and receipt of documents between systems without some of those extra characteristics being added in. So those are important things that we think about when we're dealing with integration, particularly when there are multiple systems, or one system is going to feed another system and drive some of that usability.
Ron Niland
Yeah. One situation I've seen a bit is that companies – from a sponsor perspective, anyway – have a tendency to work with certain vendors. If they're doing, let's say, clinical development, and they have a series of studies, the studies sometimes are cloned, and you get little variants in this study versus that one; maybe there's a Japan-specific arm to it, but they start to look a little bit alike, and the vendors sometimes literally get confused. I've seen a situation where a vendor presented data back to the sponsor for a trial, and it was not the correct data. It was for a similar study in the portfolio, but they got it wrong, and the submission ended up not being in the position they expected it to be. It required some rework, for sure.
38:39
But this is where, going back, there's that aspect of engaging with the vendors, and not only just talking about who's responsible for what, in terms of the execution of the trial, but there really needs to be a very high-level understanding of who's managing that data and the documents and the associated taxonomies and the architecting, if you will, so that it's very clear what data sets are associated with which studies and what have you. So that at the end of the day, with it being fast and furious and trying to get to market as quickly as possible, a mistake isn't made, a baton is not dropped.
Mark Gross
Right. Ron, what you mentioned earlier about slight differences between places is a major issue, not just over here, but all over, because we're so used to the concept of cut and paste – taking something over and changing it a little bit. We've built specific software for that: there's a product we have called Harmonizer that will go across a hundred documents specifically for that purpose and find all the similar paragraphs, because people are doing this all the time, and it's very easy for mistakes to creep in. I think it's interesting that you bring that up. What I've got here could probably take a whole session by itself, but really, taking off from what Keith has been talking about, it's about what happens as you move documents from place to place, and the importance of moving away from just having images and copies of faxes and things like that, into a more structured system.
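To make the similar-paragraph idea concrete – this is not DCL's Harmonizer, just a rough sketch of the concept using Python's standard difflib, with hypothetical file names and threshold – you can flag paragraph pairs across two documents that are nearly, but not exactly, alike, so a reviewer can decide whether the differences were intentional:

```python
# Not DCL's Harmonizer - just a rough sketch of the concept using Python's
# standard difflib. It flags paragraph pairs across two documents that are nearly,
# but not exactly, alike, so a reviewer can decide whether the differences were
# intentional. File names and the 0.9 threshold are hypothetical.
from difflib import SequenceMatcher

def paragraphs(text):
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def near_duplicates(doc_a, doc_b, threshold=0.9):
    """Yield (index_a, index_b, similarity) for similar-but-not-identical pairs."""
    for i, a in enumerate(paragraphs(doc_a)):
        for j, b in enumerate(paragraphs(doc_b)):
            ratio = SequenceMatcher(None, a, b).ratio()
            if threshold <= ratio < 1.0:
                yield i, j, ratio

if __name__ == "__main__":
    with open("protocol_a.txt", encoding="utf-8") as f:
        text_a = f.read()
    with open("protocol_b.txt", encoding="utf-8") as f:
        text_b = f.read()
    for i, j, ratio in near_duplicates(text_a, text_b):
        print(f"Paragraph {i} vs {j}: {ratio:.0%} similar - check for unintended edits")
```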
And the issue is, as people move things from one system to another, they lose a lot of information. This goes beyond the text not being exactly accurate when you're taking OCR of an image; you're also losing information ABOUT the documents. You have to make sure that you're keeping the taxonomy and metadata about documents – that tends to get lost when you just do a copy and paste from one place to another. I guess we call it drag and drop now; I call it cut and paste from the days when I was actually cutting and pasting. So, when we're dealing with large numbers of documents, as we're talking about over here, you have to make sure to engineer the process of taking it from one place to another so that you don't lose that metadata; you don't assume that it's going to be there. And if you lose it, you're going to end up having to do it again, with all the risks of making errors and things like that. Next slide, please.
Keith Parent
Well, Mark, on the point about losing that metadata: we talked about that 40% of wasted time. You could have used that metadata to set up views or to find that data, things like that, so when you lose it, that's where that time comes in. Because once you don't have the metadata, you've got to figure out a different way of finding that same document. That's a big part of that process.
41:47
Mark Gross
Absolutely. And I think, actually, that redaction is another example of losing information that's in a document. When you look at documents, because of PHI and PII issues, a lot of information gets redacted – sometimes the whole page becomes black. But realize the information behind that has to be somewhere; a mark has simply been put there to tell you that when it's displayed, that piece is redacted. In the system itself and in the way you use the information, you have to keep track of that, because the information is really still back there. So, how do you know that when you transfer that record over, somebody can't get into the information you intended to redact? And vice versa, how do you know that when you transfer information over that was redacted at a certain point, and you later need to get to it, you have the correct information behind there? These are all things that become much more important as you've moved to electronic methods of redaction, and as privacy has become such a large issue that it sometimes overwhelms a whole project, both here, in the legal aspects later on, and all along the way. Next slide, please.
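One minimal sketch of a QA check along those lines, with a hypothetical file name and hypothetical terms, using the pypdf library: if a PDF was "redacted" only by drawing black boxes over the text, the underlying text is usually still in the file, and extracting it and searching for values that should have been removed catches the problem before the document leaves your control.

```python
# A minimal QA sketch with a hypothetical file name and hypothetical terms, using
# the pypdf library: if a PDF was "redacted" only by drawing black boxes, the
# underlying text is usually still in the file. Extracting the text and searching
# for values that should have been removed catches that before release.
from pypdf import PdfReader

SHOULD_BE_GONE = ["John Doe", "1985-03-14"]  # examples of PHI expected to be removed

reader = PdfReader("redacted_case_report.pdf")
for page_number, page in enumerate(reader.pages, start=1):
    text = page.extract_text() or ""
    for term in SHOULD_BE_GONE:
        if term in text:
            print(f"Page {page_number}: '{term}' is still recoverable - redaction failed")
```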
Ron Niland
Oh, if I could just –
Keith Parent
Actually, Mark... Go ahead, Ron.
Ron Niland
Just real quick. The other element, and you were touching on it, is what happens later. The archiving of information is a requisite with regulatory bodies around the globe. They will often say you need to save this information for 20 or 25 years. The preservation of that metadata, and also the aspect of the redaction, come into play with what's been, sort of, hermetically sealed in an archive that needs to be available if that regulatory body wants to see it five, 10, or 20 years from now.
Mark Gross
How to keep both versions in sync.
Keith Parent
I was also going to mention, when you were talking about the redaction, the whole issue that as the industry starts to change, and we're doing more distributed trials and more adaptive trials, all of a sudden data is shifting over the course of a trial, and different people are getting involved. We're actually doing more patient-level data gathering as part of some of these. Sometimes that was basically for observational registries or things like that, but now some of that is being embedded right into the trials, and it's going to be important that we understand how we deal with that kind of data along the way, particularly when we're going to have multiple people dealing with that data and consolidating it.
Mark Gross
Right. It just seemed much easier when you had patient IDs that traveled all over, but now they're not really doing that anymore. So I was going to give what's probably a three-minute overview of XML and SPL. I think part of what this is leading to is that it's really important to get information as structured as possible, as early in the process as you can. To the extent you can, you can save a lot of that 40% that gets lost along the way. When we talk about structured data, XML is really the current state of the art in how you structure information. Actually, if you go to the next slide, like you just did: what is structured information, what is XML? Structured information means that instead of just having the text that you see in front of you, you've also gone in and tagged the specific things that are really important to you.
45:36
This is the back of a journal article, but what's important here is not just to read it straight across, because it's hard to tell what's what; you want to be able to identify: what is an author, what's the name of the journal, what is the date the journal was published? All those kinds of things are the metadata about that article, and they're embedded right over here. So what you're doing is separating the text itself from what identifies what the text is about. So, what does that mean? We have a slide over here showing you what XML looks like. Many of you have not seen XML; it's not really as scary as it looks, especially with all this color coding that has been put in over here. It's not just text that's been identified; now you've identified that Singh is a surname and Sanjay is the given name. That's one author, then there's a second author. And this is all a structure that's defined elsewhere by a map of what the structures look like, which is called a schema. So each item in there is identified in that way. What you looked at before was just a bunch of text that, if you didn't know what order it was in, you wouldn't really know what it was; here, everything is identified by what it is. So that's what XML looks like.
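As a minimal sketch of that tagging idea, using Python's built-in ElementTree: the element names below (journal-title, surname, given-names) follow a JATS-style convention for journal references and are illustrative rather than the exact schema on the slide, but they show how explicitly tagged text lets a program tell an author's name from a journal title without guessing from position or punctuation.

```python
# A minimal sketch using Python's built-in ElementTree. The element names
# (journal-title, surname, given-names) follow a JATS-style convention for journal
# references and are illustrative, not the exact schema shown on the slide.
import xml.etree.ElementTree as ET

snippet = """
<ref>
  <journal-title>Example Journal of Clinical Research</journal-title>
  <year>2021</year>
  <person-group>
    <name><surname>Singh</surname><given-names>Sanjay</given-names></name>
  </person-group>
</ref>
"""

root = ET.fromstring(snippet)
journal = root.findtext("journal-title")
year = root.findtext("year")
authors = [f"{name.findtext('given-names')} {name.findtext('surname')}"
           for name in root.iter("name")]

# Because every piece is explicitly tagged, a program can tell the journal title
# from an author name without guessing from position or punctuation.
print(journal, f"({year})", "-", ", ".join(authors))
```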
Let's go to the next slide. But that's not the way people read it; people will see it on a screen. Actually, go to the next slide, and we'll come back to this. So, taking that a step further: many of you are familiar with structured product labeling, SPL. Well, SPL is a form of XML where you're doing the same thing, not for a journal article, and not for the drug submission itself, but for the drug information. Going into that, you're identifying the information that's involved there. This is the standardized format for SPL, what an SPL filing would look like. It tells you what the DUNS number is, which is very critical, what the product code is, what kind of material it is, what the manufacturer sequence is – all the kinds of things that are involved, that are needed, that would otherwise be in text but not necessarily in a particular form; here they get put into these kinds of structures. That's really extending what we looked at before as a simple XML format into something that might have hundreds of identified items on it, all of them very critical, and now identified explicitly.
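As a minimal sketch only – SPL is HL7 XML in the urn:hl7-org:v3 namespace, the file name here is hypothetical, and a real pipeline would follow the full SPL schema rather than these few top-level fields – here is how a few values can be pulled out once the label is in structured form:

```python
# A minimal sketch only. SPL is HL7 XML in the urn:hl7-org:v3 namespace; the file
# name here is hypothetical, and a real pipeline would follow the full SPL schema
# rather than just these few top-level fields.
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

doc = ET.parse("example_spl_label.xml").getroot()  # hypothetical SPL file

title = doc.find("hl7:title", NS)
set_id = doc.find("hl7:setId", NS)
version = doc.find("hl7:versionNumber", NS)

print("Title:  ", "".join(title.itertext()).strip() if title is not None else "n/a")
print("Set ID: ", set_id.get("root") if set_id is not None else "n/a")
print("Version:", version.get("value") if version is not None else "n/a")
```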
Now, going back to the previous slide: this is not the way you would look at it. Once you've identified everything, this is the way you would end up looking at it – we'd want to enlarge that one. This is what it would look like in human-readable form on the screen. Because you've identified each of those items, the computer can then go and show you a representation. This is a rendering – Keith was talking about renderings before – a rendering of that same information, to show you what's really there. And not only can you render one SPL listing over here, you can pull together many of them and pull out the data from them. Once you've got it in that structured form, not only is it standardized, so it can move from place to place internally and then go to the FDA, but then it goes up on these public listings, and people can do all kinds of analytics, of course, not just on that one drug, but on many other drugs. That's really a lot of the value of being able to go to a structured format, in particular XML and SPL. Let's go to the next slide.
Ron Niland
Just, if I could build on that-
Mark Gross
It was supposed to be an eight-minute slide as a single introduction to XML. Ron, you want to go back to the previous slide? We now have time.
39:44
Ron Niland
Yeah, just to build on that, Mark, the aspect of XML: it's the universal translator, if you will, that enables the FDA not only to do cross-comparisons within your submission, but comparisons of your product, with its safety and efficacy, across a multitude of products. And that is exactly where they are going with the initiative called IDMP, the Identification of Medicinal Products. They want track-and-trace capabilities all the way down to this granular level, so they can better ascertain the true benefit of your product vis-à-vis a competitor, and the true risk of your product, all the way down to the very discrete safety level of side effects. So that's something that's here to stay, and it's something that sits more on the tail end, if you will, of a submission. But I think the theme of this program is that you really need to think about that long game of managing your data, so that you can present it to the regulatory bodies and ultimately get your product reviewed, and in certain parts of the world, reimbursed, which is ultimately the most critical element.
Mark Gross
Right, and just as you mention that: this is SPL, and it's mostly a US-based standard, but it's been in effect for, I guess, 15 or 18 years. Now other countries are going to go through the same process. Health Canada is going through a version of this, which I think is now voluntary, but will probably become mandatory within the next year. Europe is going through that, and they'll have somewhat different approaches to it. But as long as you've got your content structured in this way, it becomes an easy transition, an easy next step, to move to these other structured formats in other countries. And as Ron mentioned, more and more detailed information is going to be needed over time; the structure allows for all of that. With that, I'll turn it over to you, Ron, I think.
Ron Niland
Okay. What we wanted to do is summarize the hour, if you will, with this last slide – if you can advance to that, please, we'd appreciate it. When you look at this, there's one key word that we want you to think about, and that is reuse. Not only for you as an individual within your company, or the institution you're associated with, but also from the perspective of the regulatory bodies that we were just talking about a moment ago. The regulatory bodies want data presented, and you, within your company, want data that you can access. Then they want to be able to manage that data, and they want to be able to do some cutting, slicing, and dicing – working with the Ginsu knives – to present the data in a certain way, so they can do certain analyses, and then ultimately present it back to different bodies.
53:01
And so, think of this as the long game for you and your company, including the 40% that maybe haven't gone there yet. I wasn't surprised by this bimodal distribution: there are some people that haven't gone there yet, probably associated with earlier-stage companies, maybe, and then those that are more evolved. But this model is really predicated on, when you work with your team or your vendor, or you're thinking about that regulatory submission, saying: okay, how are we structuring this? How are we ensuring people can access the information and ultimately manage it? You can see, when we talk about managing sets of data and subsets of data, it sort of expands onward. This is just a take-home slide that may help to frame discussions with different parties, and may help you to better organize and manage your data and documents. Keith, your thoughts?
David Turner
We're almost –
Ron Niland
I'm sorry.
David Turner
I was going to say, thank you, everybody, on this. We are getting close to the end here, so I do want to run a quick poll and give our panelists a chance to get a quick drink so we can answer one or two questions. Let me get that launched here. Coming up against the hour here. Just a quick question: where does your organization struggle the most? I'm launching the poll now; we'll take about 30 seconds here to let you put in your responses. I do want to thank everybody who's attended – wow, 90% of our attendees have been here start to finish, which is not all that usual in webinars, so thank you so much for your attention and your participation. This is great. Going to give it about five more seconds here.
Oh, this is fascinating. This is fascinating. All right, we had about half of you vote, so let me share the results of the poll here. Big, big number on data consolidation and harmonization. Then equal numbers among the other three groups.
Panelists, do you want to comment really quickly on the poll?
Keith Parent
Yeah, actually, I'll jump on that for a second, David. One of the things that I get very involved in is industry working groups, particularly around trying to put together standards that we can use, because it's amazing how often different companies look at things differently because of the ways that they're working. By working with these different industry groups, you can actually help to drive those kinds of standards. Recently, I've been working with the DIA RIM working group to put together a RIM standard, and I think it's important for people to look at regulatory information management and understand that if they at least adhere to something, it's a good way to start. We've done a lot of work with the EDM and eTMF reference models that came out of the DIA. I think those are important. But I think it's really important to look out at the industry that you're in and understand: what is it that you can tie onto and use to help drive some of your decision making?
David Turner
Excellent.
Ron Niland
I would add to that-
David Turner
Go ahead, Ron.
Ron Niland
The content mapping made me think of this book that I just quickly pulled off my shelf; it's entitled "If You Don't Know Where You're Going, You'll Probably End Up Somewhere Else." I think the issue is, without a map, you're going to be in the dark. And if you're working with vendors, it will get really scary. So that's something that's definitely worth investing in. Sometimes it's nice to bring in a neutral third party to help work through that process. But starting with a map is always a great place to start, because that'll help people to understand the ultimate destination.
57:23
Mark Gross
The other variation of that I've heard is: if you don't have a map or a target of where you're going, how do you know you haven't gotten there yet? But I absolutely agree with you. There's a need to lay out where you're going, what the mapping needs to be, and how to get there. You can't just play it by ear anymore; there's too much information, it's too serious, and there's too much risk if you don't check every step of the way. Particularly when you're migrating content, consolidating content, moving it – how do you know that you're getting it right? You have to have the right QA procedures in place at each step of the way to make sure that it doesn't go off that path on that map you just talked about.
David Turner
Well, gentlemen, I wish that we'd had more time. There's just – ah! There's so many topics we could have just gone a lot deeper on, I think. Maybe next time we do one of these it'll be two hours. But this is great, and we thank you all for your expertise. Those of you who had submitted questions, we will follow up with you individually on those. And of course, if you think of additional questions, feel free to contact us. Leigh Anne, would you throw my email address in the chat for them? That way, anybody who has additional questions, they can contact me afterwards. We're really, really happy that you came in and we want to make sure that you get all your questions answered.
Anyway, I just want to finish up by saying thank you to everyone for attending. Thank you to our panelists, Ron, Keith, Mark – really good stuff. Just so everyone out there knows, the DCL Learning Series comprises not just webinars, but also a monthly newsletter, and we have a blog as well. You can access many other webinars related to this topic, content structure, and XML standards in the on-demand webinar section of our website, which is, I think, www.dataconversionlaboratory.com/on-demand-webinars. We hope to see you at future webinars. Everybody have a great day, and this concludes today's broadcast.
Mark Gross
I just want to thank David for his great work over here, and all the people behind the scenes who put on these webinars. Thank you all.
Ron Niland
Thank you.
Keith Parent
Take care, guys.
Ron Niland
Nice work.