DCL Learning Series
Content Reuse Strategies for Pharmaceutical Clinical Content
David Turner
Why, hello, everybody, and welcome to the DCL Learning Series. Today's webinar is actually titled "Content Reuse Strategies for Pharmaceutical Clinical Content," not a mouthful for anybody, I don't think, but in any case, my name is David Turner, and I am a content consultant here, also the head of partner relationships at Data Conversion Laboratory, and I just want to give a couple of quick things here before we begin. First of all, just to let everybody know, this webinar is being recorded and will be available in the on-demand webinar section of our website, which is at www.dataconversionlaboratory.com.
I also will say that we do invite you to submit any questions that you have at any time during the presentation, or conversation, really, is what it is today. And we'll save some time at the end to get those questions answered, so please take advantage of either the question feature or the chat feature to submit those questions at any time. All right? Couple quick things here at the beginning, we are talking about digital transformation today because it is happening across all industries in various degrees, from highly regulated information like drug labels or package inserts to scientific research articles.
Content comes in a lot of different forms, and simply having it in a digital format, like a Word file or an image-based PDF is a good thing, but it's really not enough for today's digital-savvy consumer. It's really not enough for the processes that we need to work in in these different industries. So it's really not enough for the computer systems that find and deliver information. So at DCL, or Data Conversion Laboratory, our services are all about converting, structuring, enriching content and data. So we're a leader in XML conversion services, DITA conversion, structured product labeling, or SPL, conversion, S1000D, you name it. And while we're best known for content conversion services, we actually do some other related work: semantic enrichment, entity extraction, data harvesting, et cetera. So if you've got complex content and data challenges, we can certainly help, but this is not intended to be a commercial for DCL here today.
We also today are featuring one of our partners. As I mentioned, I manage partnerships, and one of our best partners in the digital transformation realm is Content Rules. The team at Content Rules not only creates great books – one I constantly recommend to people, The Personalization Paradox – they not only create great books, but they also work with companies large and small on critical pieces related to digital transformation like content strategy, or globalization/localization strategy, reuse, personalization, change management, technology selection, all those different things. And I will say this, furthermore, Content Rules is the only end-to-end content service provider that has experience in pharma, and they're doing a lot in pharma right now.
All right. Pushing right along here. Our topic today is content reuse, and I realize that some people have some different perspectives on what reuse actually means. What we're talking about here today is about the idea of using existing content components as building blocks in multiple contexts, instead of repeatedly creating them, or copying them and pasting them. Right? The idea here is, instead of having to manage the same section, or paragraph, or some other size text block in dozens of documents, you store it in a central place, and then you dynamically insert it where it's needed.
4:07
So we're not really talking about working from a Word template or something like that. I mean, obviously that can be, quote-unquote, "reuse," but that's not really the context we're talking about today. We're not talking about doing a mail merge in a Word document. Again, that's a kind of reuse, but it's not really the focus. We're focusing on structured content management or structured content and data management, creating reusable components, managing documents not from the document level but more from the component level. And we're going to specifically hit on how it affects life sciences and pharma, although it does have an impact on other industries as well.
So we're dividing this kind of up into three big parts. The first part is kind of the biggest part. Where do we find this reuse? How much reuse do we have? How do we get into all of that? And we actually have a really cool demo here of a technology that you might not be aware of for helping to find this reusable text. The second section is all about the benefits of a strategy like this. We understand that moving to structured content authoring or structured content management, it can be a big change, but we're going to suggest some reasons why we think it's worth making the switch. And then finally, we're going to walk through kind of the basics of how to get started, how to start structuring your content to take advantage of these different benefits.
I am excited to have two actual experts on the call with me today. So let me introduce those to you. They're two people who I actually spend a lot of time with, and two of my favorite people in this industry. The first is Chris Hill, who is a veteran of the component content management space. He's a frequent author of a lot of white papers. In my last company, there was a white paper that Chris had written that I must have used 700 times when trying to explain concepts of XML to people. Anyway, Chris here at DCL today leads the Harmonizer solution and also serves as a project manager for some of our clients.
Joining Chris is the great Regina Preciado. Regina is out in California, and she does fantastic work with, really, she's worked with companies in every industry, helping them to transform to structured content, but she's really been focusing on pharma these last several years, and so we're excited to get her perspective as well. So welcome, Chris and Regina, say hello.
Regina Preciado
Hello.
Chris Hill
Hi, there.
David Turner
There you are.
Regina Preciado
[Laughing] I followed instructions, David.
David Turner
I love it. I love it. Well, thanks so much. So I'm going to click over here and try to get us kind of set up. I'm going to do a couple setup slides, and then let you guys talk and dig in and show the demo. So we're starting here with the easy button slide, right? So pretty much everybody knows how to reuse content, right? It's, you just copy and paste it, right, or maybe you copy and paste it, tweak a little bit. This is easy. It's built into your computer, and everybody knows how to do it.
Well, maybe not. Isn't really that easy. I mean the easy kind of starts to wear off. Well, actually it wears off kind of big time when things get part of a bigger context, right? What may be easy in the moment with one or two documents might create big headaches over time. So I came up with this example to kind of start our conversation where I was thinking in pharma, if you're a drug company, you might have three drugs in a family, and those three might have really similar content, maybe similar instructions for use, or similar legal jargon, or similar adverse reactions. And so maybe you might have a document, and you can copy and paste what you use from drug A over into drug B, or you can take what you did for drug B, and you can hit "Save as," and kind of create a drug C, and that may not be that big of a deal.
8:03
But then you start thinking about, well, for each of these three drugs, they might have five, or six, or maybe even a dozen different pack sizes or different presentations, and each one of those might have some required document that they have to submit to a health authority for each of those presentations, and there could be over a hundred of those health authorities.
And then, you think about that, when you're actually putting this thing out in the market you've got package inserts, you've got a carton. Maybe you've got some information on a mobile app or on a website. And while maybe you could copy and paste and tweak to create all these documents, what happens if there's a core change, right? I mean, if I make a core change back in drug A, that's not going to scale. So the idea is that if there's a change here to one of these drugs at the beginning that has to be then done across all these different documents, it might affect 50, a hundred, more documents, and I'm thinking it's going to take weeks, and months, and, just to be able to get through all of that. So I guess the question is, is there a way to get both easy and scalable? So enough me talking. Regina, let's start with you. You've worked with a number of different companies, both inside pharma and out. Help us to understand a little bit about how we can get both easy and scalable with a good reuse strategy.
Regina Preciado
Right. Thank you. I want to start and say that easy comes a little bit later, acknowledging that the change is hard. So at first, developing your reuse strategy and implementing your reuse strategy is not necessarily easy, and then it gets so much easier and faster over time that you can find a whole bunch of other things that are challenges. We take content reuse for granted very quickly. So a reuse strategy, to make it easy, you do need to start with a strategy, and a reuse strategy identifies what content can be reused, also prioritize what content will be reused, because there's a lot of content that can be reused, but you can't do everything all at once on the first day. In the strategy we talk about how we will reuse the contents, and a short description of that would be reusing it identically as it is. That's what most of us think about when we think about reuse. That is the true reuse. There are options for reusing the content, and then making some changes where it is reused. And the more we can automate that, the better. Same sentence, but the variable that holds the data, the data's a little different.
There's also some governance around content reuse, who is allowed to change the reusable content. If you are reusing the description of a method of administration in 140 documents, and someone wants to make a change to the wording somewhere, not everyone, really, should be allowed to just change any old thing, because it'll change in all 140 documents. Where to use planned reuse, and with pharma this is awesome, because we know what kind of content is required to be delivered in what is currently, we mostly deliver in documents. The industry is moving forward into "Hey, we don't need the document. We want this piece of information. Deliver it here." Either way, when you can plan ahead, that's where you can get a lot of automation. As soon as the content is created, it can automate, automatically appear wherever it is needed throughout the dossier, throughout websites, throughout the submissions, whatever, but you need to plan it upfront.
12:00
It will be smooth, and easy, and automatable when you have a plan, and not just tackling it with enthusiasm, but without a plan. And then your reuse strategy will also define when to use ad hoc, or we also call it manual reuse, and that is where you're empowering your authors, the medical writers, and SMEs, and everyone to they're putting together an assembly of content, and they go "Oh, you know what? I need a description of such and such. I know that exists. Let me go get that building block of content, and reuse it in my assembly." So it's not planned ahead of time, necessarily. The plan might be you might need one of any of 20 things, so go get the right piece of content to put there, rather than writing another redundant version or starting from scratch every single time. So reuse strategy in a nutshell, those are some of the pieces we want to plan for upfront, and design the reuse program upfront, so that then it can just work.
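The building-block idea in this part of the conversation can be sketched in a few lines of Python. This is only an illustration, not how any particular content management system works: the component IDs, document names, and placeholder wording below are all hypothetical. Each document is stored as a list of component IDs, and the text is looked up at assembly time, so a change to one stored component shows up in every document that uses it.

```python
from typing import Dict, List


def assemble(component_ids: List[str], library: Dict[str, str]) -> str:
    """Build a document by looking up each component when it is assembled."""
    return "\n\n".join(library[cid] for cid in component_ids)


# Hypothetical central library: one stored copy of each reusable block.
library = {
    "method-of-administration": "Method of administration: placeholder wording.",
    "consult-doctor": "Before taking this medicine, consult with your doctor.",
}

# Hypothetical documents: each one only references components by ID.
documents = {
    "package-insert": ["method-of-administration", "consult-doctor"],
    "website": ["consult-doctor"],
}

# A single change to the stored component...
library["consult-doctor"] = "Consult with your doctor before taking this medicine."

# ...appears in every document the next time it is assembled.
rendered = {name: assemble(ids, library) for name, ids in documents.items()}
```

Because the documents hold references rather than copies, the governance question raised above (who may change a shared component) matters: one edit to `library` affects every assembly that points at it.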
David Turner
Gotcha. Gotcha. Well, I think all of that sounds excellent, and I think a lot of people are probably interested in that strategy, but it kind of begs the question, when we get started, what's possible for reuse? And, Chris, I think this is where you're going to fit in a little bit here. As we deal with some companies who are trying to figure out "Do I have a business case for reuse? Do I have some sort of a means to attack this?" Talk to us a little bit about this tool that you're going to demo, and I think... oh, did... yeah, we've already made you the presenter. Fantastic. Give us a little bit of an intro here. Talk about what this tool is, and what we're going to be seeing, and how it applies to starting to move forward with this reuse strategy.
Chris Hill
Sure. So one of the ways we've developed over the years at DCL to sort of jumpstart the analysis of all these reuse questions, you can't even figure out answers to a lot of these till you have a good sense of what stuff is being reused already. And that reuse isn't necessarily through a formal process. It's through that deceptively easy copy-and-paste mechanism. So we've all been doing this for years, creating these documents, let's say Word documents, or different types of documents, whatever format I've chosen, and I may have done reuse by copying and pasting.
So one of the challenges in getting started with a reuse strategy is you're not starting with a clean slate. You're usually starting with a bunch of existing documents, and you're trying to figure out "Where do we even start?" Okay, so one way would be to go through each document one by one, and I can just read them, and try to make notes of where something matches something else, but when I've got more than one document, that quickly becomes kind of a hit-or-miss proposition, because it relies very heavily on your memory and really your knowledge of all these documents, and it's very time consuming to do. So harmonizing these –
David Turner
I can speak from experience on that.
Chris Hill
Yeah.
David Turner
Not too long ago, I had to do a project before I knew about this tool, and we just had four documents. They were like four 20-page documents, and so I pulled them up, and I got out a little Excel spreadsheet, and I started working, and it took me more than a week to really identify where all the pieces were, and that was just 80 pages of content.
Chris Hill
Yep.
David Turner
And, Regina, I think we were talking to one of your clients not too long ago in the financial services industry, and they were like "We've spent a month on this already, and we're not anywhere close." So I mean, it definitely is a lot of time. So anyway, I'll shut up and let you keep going there, Chris.
16:03
Chris Hill
Sure. No, that's a great example, and it just gets exponentially a bigger task every time you add a document, basically. So what we developed was a tool that creates a report, and this report is just an HTML delivery. That's just the format of the report. It's browser-based, so you can open it on your machine and look at the report. And what we did is we put together a tool that takes all of the content in whatever format you have it.
It pulls all of the text blocks out and calls them paragraphs, and that's just for convenience. Realize a paragraph could be a heading, or a table cell, or whatever else you want to call it, but it's a block of text, and we pull all of those out of the documents. We let the computer do all the comparing, and the computer figures out: where are all the matches? What already is, quote, being "reused" by copy and paste? And those would be exact matches, but then it goes a step further and does something that's almost impossible for most humans to do reliably.
And that is it does a lot of close matching, and it does that based on a natural language processing algorithm that basically can take two blocks of text and can say: are they similar to each other? And by "similar" I don't mean are all the words the same. I mean do we have the same character sequences embedded in them, so that we're talking about the same subjects? And I'll show you some examples of that in just a moment. So what we did for this particular demonstration of the product was I asked David to provide me with some pharmacy-related documents, and you'll see a list of them down there in the blue text. This is for Inlyta, and maybe, David, you can tell us a little bit about where you got these documents, what's in them.
David Turner
Yeah, absolutely. So I was thinking about that labeling example that I just showed the slide for a minute ago. And I got to thinking if I'm in a pharmaceutical company, and I've got a drug, what are all the different labels that I might have to have, and what are all the different documents? So I just went online, and I found a random drug. It's one of Pfizer's drugs. It's not a drug that I take; it's not something that we work with. I literally, I found it online, and then I just started looking at other websites around the world, and pulling down the documents.
So I tried to find some physician-facing materials. I tried to find some patient-facing materials. I tried to pull from different geographies. So you'll see there's US content. There's European content. I got something from the Australian health authority, Malaysia, Israel, and then I tried to pull some different pack sizes, presentations, things like that, with the idea that it would give us something where we could look at this. Again, this is all published information. If you are a pharma company, it doesn't have to just be published information. You can use this to look at information that hasn't been published yet. You can look at this to compare core company data sheets. You can look at it to compare whatever, but for this particular example, those are the kind of files that I found, and I tried to get a representative sample to see if we could find some potential reuse.
Chris Hill
Great. So what Harmonizer did is, David provided me with these Word documents, but again, they can be really any format. We are a conversion company after all, so we can get those all normalized, so they can all be compared regardless of what format you happen to have them in. And we went through, and we extracted the text blocks. So here for each document you'll see, we're telling you on this page of the report, how many paragraphs did we pull out of there? And then we do some filtering.
20:09
So one of the filters that we provide is a filter that's based on minimum words, and that just says "I only want to look at paragraphs that are five words or longer," and that's an adjustable number. You can adjust that down to one, if you want to compare every paragraph regardless of length, or you could raise it up to 10, or 20, or whatever to get the bigger text blocks, and that can be handy if you're doing, a lot of times you can't, if you've got a huge reuse problem that you're trying to solve, sometimes you've got to break it apart, and so you start with the bigger blocks first, and then maybe work your way down as you get more and more progress on it. So this just allows you to adjust that dial, so to speak. So that little number –
David Turner
And you typically, Chris, run this report a handful of times and see what you get.
Chris Hill
Yeah, yeah. It takes, depending on the size of the content, you can get a report back in under 24 hours. Oftentimes it's under an hour, and for a report this size, it's about 10 minutes, and the computer does all the work. So I just push the easy button, and away it goes, and then I get back this report and can provide it very quickly. And here you'll see for this particular document with that setting of five words, we're going to not look at 23 of the paragraphs.
So there's 23 blocks of text in here that are less than five words. Those aren't being included in the report. So of what remains of those paragraphs, we look at them and put them into one of three categories. The first category is if it has one or more exact matches. So we compare every paragraph that we've extracted from every document with every other paragraph, and if they're the same text, then they're an exact match, and we tell you that's an exact match, and we'll show you where that is, and I'll show you that in just a moment.
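The minimum-words filter and the exact-match category described here can be sketched as follows. This is a minimal illustration under stated assumptions, not Harmonizer's implementation: the function names are hypothetical, a "word" is taken to be anything separated by whitespace, and two blocks count as exact matches when their text is identical after collapsing whitespace. Grouping by text with a dictionary has the same effect as comparing every block with every other block.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (document name, position of the block within that document, starting at 1)
Location = Tuple[str, int]


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not hide a match."""
    return " ".join(text.split())


def filter_blocks(blocks: List[str], min_words: int = 5) -> Tuple[List[Tuple[int, str]], int]:
    """Keep blocks with at least min_words words; also count the skipped ones."""
    kept, skipped = [], 0
    for position, text in enumerate(blocks, start=1):
        if len(text.split()) >= min_words:
            kept.append((position, normalize(text)))
        else:
            skipped += 1
    return kept, skipped


def exact_match_groups(documents: Dict[str, List[str]], min_words: int = 5) -> Dict[str, List[Location]]:
    """Group identical blocks and return only the text that occurs more than once."""
    locations: Dict[str, List[Location]] = defaultdict(list)
    for name, blocks in documents.items():
        kept, _ = filter_blocks(blocks, min_words)
        for position, text in kept:
            locations[text].append((name, position))
    return {text: locs for text, locs in locations.items() if len(locs) > 1}


# Toy input; the document name and block positions are made up for illustration.
sample_documents = {
    "patient-leaflet": [
        "Package leaflet",
        "What is Inlyta and what is it used for?",
        "Consult with your doctor before taking Inlyta.",
        "What is Inlyta and what is it used for?",
    ],
}
groups = exact_match_groups(sample_documents)
```

Raising `min_words` limits the comparison to larger blocks, which mirrors the approach of starting with the bigger text blocks and working down.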
And then we have a category called close matches, and this is where that little natural language processing magic comes in to find the things that may not be obvious to the human eye, and we'll look at some examples of that, but that can be text like if I told you "Before taking this medicine, consult with your doctor." So if I told you that in a set of instructions, and then I turn around and elsewhere I write it "Consult with your doctor before taking Inlyta." Well, those are different, and they may not look obviously the same because all the words are in a different place, all the letters are rearranged, but there's a high, high similarity between them.
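One simple way to see why these two sentences count as similar even though the words are rearranged is to compare the short character sequences they share. The sketch below uses the Dice coefficient over character trigrams. That choice is an assumption made for illustration, since the algorithm Harmonizer uses is not described in detail here, and the function names and the default threshold are hypothetical.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Lowercase, replace punctuation with spaces, and collect 3-character sequences."""
    cleaned = re.sub(r"\W+", " ", text.lower()).strip()
    if len(cleaned) < 3:
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + 3] for i in range(len(cleaned) - 2)}


def similarity(a: str, b: str) -> float:
    """Dice coefficient on trigram sets: 0.0 means nothing shared, 1.0 means the same set."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    if not grams_a and not grams_b:
        return 1.0
    if not grams_a or not grams_b:
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


def is_close_match(a: str, b: str, threshold: float = 0.70) -> bool:
    """Different text whose similarity meets the (adjustable) threshold."""
    return a != b and similarity(a, b) >= threshold


first = "Before taking this medicine, consult with your doctor."
second = "Consult with your doctor before taking Inlyta."
unrelated = "What is Inlyta and what is it used for?"

score_reordered = similarity(first, second)
score_unrelated = similarity(first, unrelated)
```

Because the comparison works on sets of character sequences rather than word order, the reordered sentence scores far higher against the original than the unrelated heading does. Raising `threshold` toward 1.0 yields fewer close matches, and lowering it yields more.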
So those will fall into the close match group, and so that would be considered a close match, and you can adjust how close you want close matches to be with this similarity setting, and that's just a dial that we offer to let you dial up or down the number of similarities you get, so how sensitive or how close do they have to be? So if I raise that number up to, say, closer to 100%, then I'm going to see fewer matches that are close, and if I lower that number, I can lower that, right now it's set to 70%, but I can lower that down to any other figure if I want more matches. [Coughs] Excuse me. So let's go ahead and look at some of these matches, because –
David Turner
Yeah, let's do that.
Chris Hill
– that's really where everything interesting happens. So this is the paragraph match page of our report, and it's got this handy little document map on the side. What that document map does is it shows you each document that we processed. So here's that first one I was talking about.
24:00
And it shows you all the paragraphs we found in there, and these are the ones that exceeded that minimum paragraph length. So these are five or more in this case, and you can go through this one by one. This is the order that that text appeared in in the document, and it only shows me the first part of the text. Obviously, if I was doing some real work on this, I would probably open this in Word, and have this on my second monitor over here, and I would be able to go through this document, go through my Word document, maybe I'd have several documents open, and I could see where all the matches were between this document and all the others.
So let's have a look. This first one here just is package leaflet information for the patient, and that is a unique paragraph. It has a little star next to it, so all it's telling you is it's unique. But if we go look at one like, let's just go look at match group one over here, you'll see that this heading, "What is Inlyta and what is it used for?", appeared twice. It appeared in the same document at two different locations. And if I want to, I can click these numbers, and it will line it up with the document map, and you can see here at position 13, that's where that text occurred the first time, and at position 21, it occurred here the second time. So that is exactly the same text reused twice in the same document.
Now, I could obviously look at all these other exact matches, but what I'm going to do instead is take you down to some other examples that are maybe a little more interesting than just something that's easy to find like that. So let's go down here. I'm going to go down here to match group 27. You can see there's a lot of matches here, so I could go through this all day, but I'm going to just hit some highlights for you. This is a match group that would be very hard to find if you were just reading through each document one by one. So you can see that there's a paragraph here that is similar in a whole bunch of different ways. So what you see at the top are all the different variations of this text that's all similar.
So this is all talking about thyroid function, and all the black text is the same between all of these, but all the colored text varies, and may or may not be present in each variation of this paragraph. So I can see in the SMPC manual and the package insert, which I think is for Europe, that they talk about the thyroid function should be monitored before initiation of, and then they refer to the drug as axitinib. I can see why they use the trade name of Inlyta. That's easier for me to say.
But anyway, axitinib is what they're using in these two documents, but down here in the Australian document they're using Inlyta, and they rephrase this. They aren't saying "Thyroid function should be monitored." They're saying "Monitor thyroid functions." So this is a command versus a passive sort of thing. Now, there may be a good reason for this, but the point is, is that when you have these close matches, you can look at this, and make an intelligent choice, and say "Is this a real difference that we should maintain, or should we be more consistent in our language?" And you'll see some others. This may have a good reason, but as I go down, I might see some editorial choices that were made.
So "monitor thyroid function" or "monitoring for thyroid function," and then they added "is recommended." So here in Canada, for whatever reason, they've kind of changed the language a little bit. Now, that's a subtle change, but it may be important, and the other thing it does is if this is a subtle change that is not important, then this just triggers a whole wave of different texts now that I have to maintain.
28:03
So to Regina's earlier point, by getting rid of these arbitrary variations, I can reduce the amount of text I'm trying to maintain and then really implement a much better reuse strategy. So what Harmonizer –
Regina Preciado
And let me –
Chris Hill
Yeah, go ahead.
Regina Preciado
Can I say in here that one of the things that we do at Content Rules with our five dimensions of content standardization, note that's not the science. Science can't be standardized. The content can be standardized. That is what we do when we look at something, and you're thinking, why is one word – how do I have a business case for one word, or these slight variations in grammar and tone? But when you cost it all out in time, in money, in the fact that your writers are rewriting here or starting over there, and they don't know that they could have just used something, maybe there is a difference for different regions that you need to say "You should talk to your doctor" versus "Talk to your doctor," or whatever, but you come up with those standards ahead of time, standardize the content.
It's faster to put together. You get more accurate translations. It's more accurate. It's more comprehendible for people who maybe don't, they're reading the English source, but English isn't their first language. There's all kinds of benefits to standardizing, and running the Harmonizer report, and finding there are no variations in the sentence, and there's things you can do with variables and other technology to automate some differences that have to be there. Use the generic name here. Use the market name there and so on, and just, the benefits, I want to say they exponentiate up to a certain point of standardizing. So anyway, jumping in –
Chris Hill
No, that's great.
Regina Preciado
– on how we use this report in real life.
Chris Hill
So let's look at one of those examples that might have some places where you, again, here's some different documents, and in some places we're using that general name, axitinib, and then in other places we're using the trade name, Inlyta, and in one place we're adding a parentheses-R, which I think is supposed to be the registered trademark symbol. Maybe for Singapore that's how they do it. I don't know. But again, this is telling you, okay, are these choices made for a reason? And if so, do all our authors understand that reason? And are we being consistent in applying the different ways that we refer to this drug? Now, you mentioned something that I think is interesting. Regina, you mentioned something about being able to use different names in different publications, even if you're reusing the paragraph. Maybe you could talk a little bit about what you mean by that.
Regina Preciado
Sure. So a variable can be many, many things in content, and I use the word a lot. I just mean a placeholder. So the placeholder could be for data from a clinical trial. In this case, it's the name of the drug, and a drug, of course, can have different trademark names or registered names in different regions of the world, and very, very recently the world has agreed on one standard of naming for the generics, but I don't think everybody is there yet.
So you can put, as you're writing, let's say you're putting together the USPI, the label for the United States market, and you're putting together the SMPC, the label for the European market, and one needs to use the generic, and one needs to use the brand name. So you can put a placeholder in the sentence that's, very simply, I'm going to say "drug name," and when you compile, or publish, or deliver that content to the health authority or to the printer to be printed in your packaging, that variable, which is a placeholder, will fill in with the name.
32:08
You obviously have to tell it. Okay, for this set of documents, we want this name. For this other set, we want this other name, and over here, and if we didn't specify at all, use the generic. So there is a file somewhere in your ecosystem that has that mapping of "Okay, system, every time you find this placeholder, replace it with this value." And I'm sure in pharma, most of the authors are familiar with that sort of thing from the data world and the data standards world. How this works with content, especially when you have a longer narrative piece of content describing adverse events, and what happened, and how many people had them, and things like that, you do need to look at this to standardize the sentence a little bit, because using a placeholder will change when you translate.
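The placeholder-and-mapping idea can be sketched with Python's standard `string.Template`. Everything specific here is an assumption for illustration: the target names, the mapping, and the simplified sentence are hypothetical, and a real publishing system would keep this mapping in its own configuration rather than in code.

```python
from string import Template

# Fallback used when a publishing target has no entry of its own.
GENERIC_NAME = "axitinib"

# Hypothetical mapping file: which name each set of documents should use.
DRUG_NAME_BY_TARGET = {
    "eu-smpc": "axitinib",
    "au-label": "Inlyta",
}

# One stored sentence with a placeholder instead of a hard-coded name
# (wording simplified for the example).
SENTENCE = Template(
    "Thyroid function should be monitored before initiation of ${drug_name}."
)


def publish(template: Template, target: str) -> str:
    """Fill the placeholder for a target, falling back to the generic name."""
    drug_name = DRUG_NAME_BY_TARGET.get(target, GENERIC_NAME)
    return template.substitute(drug_name=drug_name)


eu_text = publish(SENTENCE, "eu-smpc")
au_text = publish(SENTENCE, "au-label")
unlisted_text = publish(SENTENCE, "some-unlisted-target")
```

The sentence is written once and the name is resolved at publish time, which is the "if we didn't specify at all, use the generic" behavior. As noted in the discussion, a substitution like this still needs care in translation, because word order and articles can change around the placeholder.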
Different languages have different syntax. The words come in different orders. Even in English, we have the difference "a" or "an" depending on the consonant or the vowel that the word is going to have. So there is some nuance that we content people get very excited about figuring that out. Once it's figured out and decided, you have your strategy, and you just do that. And of course, with our machines getting smarter and smarter, we have natural language processing that can probably figure out "Ooh, do I use an 'a' or an 'an' in this case?"
And expert translators will also know where to put the word in the other language, but it's just things to, the details to keep in mind when you're designing your content up front, so that it's smooth, and easy, and repeatable, and FAIR (findable, accessible, interoperable, and reusable), which is why for me, I love the tool of Harmonizer, because it does the boring work of finding all the stuff. And once we find the stuff, we figure out the solutions for that type of content.
Chris Hill
Yep. Yeah, so –
David Turner
And I'll point out too that this is not pie in the sky. This could work this way. This is being done. This is being done by leading organizations around the world.
Regina Preciado
Yeah, we're doing it.
David Turner
And they say "I need a document for Turkey." And it says "Okay," boom, boom, "What's the pack size?" Boom. Out it goes, and it's got the correct nuance. Everything is built in because you figure that out ahead of time with your strategy, identifying your reuse, getting your technology in place.
Regina Preciado
Yeah. Yeah.
Chris Hill
Sure. So you look at, to your point earlier too, here are some spelling variations. Even within English, we have different ways of saying "Haemoglobin" and "Haemo–", "Haematocrit." Excuse me for, this is all new stuff to me, but you can see that they've varied here, not only on the spelling, but on the capitalization. Now, again, whether Australia really wants to capitalize these or whether the authors just happen to do it is a question that would be asked at this point when you were doing the analysis.
Regina Preciado
And I would answer you that, yes, in some cases, the all caps is required, and sometimes it's not.
Chris Hill
There you go.
Regina Preciado
But there are fairly consistent rules about when it's required. We see a lot of variation because in the moment people don't necessarily know, and they hope the reviewer catches it, or it's not really important, so it doesn't matter. And when it doesn't matter, it doesn't matter.
Chris Hill
Yep. So here, it would be obvious you could skim through it, and even do a quality check and say, were we capitalizing in the right places and not capitalizing in the right places? And you can see real quick which documents were doing which thing.
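A capitalization quality check like the one described here could be sketched as follows. This is not how Harmonizer works internally; it is a minimal Python illustration, and the function names, document names, and sample sentences are all invented for the example.

```python
import re
from collections import defaultdict


def casing_style(word: str) -> str:
    """Describe how a single occurrence of a term is capitalized."""
    if word.isupper():
        return "ALL CAPS"
    if word.islower():
        return "lowercase"
    if word[0].isupper() and word[1:].islower():
        return "Capitalized"
    return "mixed"


def casing_report(documents: dict[str, str], term: str) -> dict[str, list[str]]:
    """Map each document name to the casing styles it uses for the term."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    report = defaultdict(set)
    for name, text in documents.items():
        for occurrence in pattern.findall(text):
            report[name].add(casing_style(occurrence))
    return {name: sorted(styles) for name, styles in report.items()}


# Invented sample text, for illustration only.
documents = {
    "doc_a": "HAEMOGLOBIN decreased in some patients.",
    "doc_b": "Haemoglobin decreased; monitor haemoglobin levels.",
}
print(casing_report(documents, "haemoglobin"))
```

A report like this shows at a glance which documents mix styles, which is the question to settle before deciding whether the variation is required or accidental.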
35:58
And there's a few other matches. I could walk you through this all day long, and we can talk about all kinds of different things. Here they're using the short acronym VEGF, and they spell it out sometimes, but, so down here in the Singapore document, it's spelled out in this paragraph, but it isn't in these other two. Now, I would probably want to look at this document and say, "Did we spell this out somewhere?" Because people may not know what VEGF is, or maybe they do know what VEGF is, and whoever the audience is for these documents doesn't require us to spell it out. I don't know. But again, these are proactive decisions.
David Turner
Chris, I think -
Chris Hill
The idea is to be proactive not reactive to your content. So, I think you were going to transition into the –
David Turner
In the interest of time, we certainly could look at a lot of, I was going to say, yeah, in the interest of time, I think we could look at a lot of examples, but if you would go ahead and talk a little bit about this overview page, and kind of the statistics and things, and then Regina, here in a second I'm going to have a question for you to kind of tie this together to how you take this information and actually use it when you move forward. But go ahead, Chris.
Chris Hill
Sure. So all of this gets summarized for you. There's a couple other pages that I can talk about. There's, real quick, a sequence match page, which just finds when groups of these things match up. So if you're looking for a bigger block to look at, I could look, here's six paragraphs that occur in these two locations. I could go look at those locations and see if maybe there's something bigger going on here that I want to look at as a group. So Harmonizer can help you find that. And then it summarizes it all for you in a concise, nice little chart here. And all this does is it tells you of all those text blocks I looked at, how many of them were exact matches? And you'll see it's about 30%, close to 30% here. About a third of it is a close match, and about a third of it is unique, a little more than a third. That's all summarized for you, and the numbers are presented here. That can help you get an idea of, for our content, is this a useful activity? Should we spend a lot of time on reuse? Because if I ran this, and 98% of the content was unique, I might tell you "Eh, you probably don't need to worry too much about the reuse," or if you do, it'll be real quick to go through because most of your content is unique.
Obviously, in publications like these that's not going to be the case, and so this gives you a good sense that there's a lot of productive activity to be done on this reuse front. And what we do here is also summarize for you how many blocks of text are potentially redundant, and all that does is it does a little calculation, and it says if I were to go into this paragraph match report that we were looking at, and I were to get rid of all of the duplicates and just use this once, so if I was to make this language consistent or reusable, and I reused it in all these four places, then I would go from four copies of this to one copy of this. And so, to show that map, the one copy is here in the needed category, and then there's three that could be potentially redundant. So this just tells me, hey, I've got a pretty good sized reuse problem. I can make a big dent in this content if I was to do some of these strategies and approaches that Regina was talking about. So that's really a quick overview of Harmonizer. Now, David can talk to you about, if you want, reach out to David, and we can show you a lot more, if you ever want a more detailed demo, for sure.
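The summary numbers described above (exact match, close match, unique, and "needed" versus "potentially redundant") can be approximated with a short Python sketch. This is not Harmonizer's algorithm: it assumes Python's standard `difflib.SequenceMatcher` as the similarity measure, an arbitrary 0.8 threshold for a "close" match, and invented sample text. It also compares every block with every other block, which is fine for a demonstration but slow for a large collection.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse whitespace so layout differences do not hide a match."""
    return " ".join(text.split())


def classify_blocks(blocks: list[str], close_threshold: float = 0.8) -> list[str]:
    """Label each block 'exact', 'close', or 'unique' relative to the others."""
    texts = [normalize(block) for block in blocks]
    counts = Counter(texts)
    labels = []
    for i, text in enumerate(texts):
        if counts[text] > 1:
            labels.append("exact")
            continue
        best = max(
            (SequenceMatcher(None, text, other).ratio()
             for j, other in enumerate(texts) if j != i),
            default=0.0,
        )
        labels.append("close" if best >= close_threshold else "unique")
    return labels


def redundancy(blocks: list[str]) -> tuple[int, int]:
    """Return (needed, potentially_redundant) counts for exact duplicates."""
    counts = Counter(normalize(block) for block in blocks)
    duplicated = {text: n for text, n in counts.items() if n > 1}
    needed = len(duplicated)
    return needed, sum(duplicated.values()) - needed


# Invented sample blocks: four identical copies, one near match, one unique.
blocks = ["Store below 25 C."] * 4 + ["Store below 30 C.", "Each vial contains one dose."]
print(Counter(classify_blocks(blocks)))
print(redundancy(blocks))  # (1, 3): four copies become one needed, three redundant
```

Only exact duplicates are counted as redundant here. Close matches would join that count once their wording has been harmonized, which is the editorial decision the report is meant to prompt.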
39:52
David Turner
Yeah. Yeah. Let me take over screen sharing again. I think I'm waiting for the little cue to come up here that says I can share again. And can you make me the presenter? Here we go. All right. Share my screen, and there. Okay. All right. So moving right along here. Obviously, like Chris said, if any of you're interested in getting more information, we do have a video about Harmonizer on our website. You can also request a customized Harmonizer demo where we can spend a lot of time and use your own content, and you can find that on this page as well. In fact, Leigh Anne, can you put the link for Harmonizer in the chat? That'd be awesome. All right. So, Regina, as I mentioned, now over to you. We've got this handy dandy slide with the Lego blocks, et cetera, here. You've worked with this Harmonizer data in the real world. How would you take this information from Harmonizer and put it to good use?
Regina Preciado
Right. Harmonizer is one of the most fabulous tools that we use in determining where is the reuse, and one thing I really like actually is that summary chart and that summary page to say, oh, you know what? There's almost no fuzzy match. There's almost no reuse in that set of content. Let's not prioritize that set. Now, that said, there could be a lot of reuse. It just wasn't, it was so wildly different when it was written originally that we don't start there. We also look at the contents in, at a much higher-level view, and with pharma, there's some obviously top-level views to start with. Well, the top top levels. Maybe a medium level, which would be, I mentioned the USPI, the United States label. The health authorities require certain types of information in a certain order, and there's a long history of documents that have outlines or templates, and so people writing this content and compiling this content already have a pretty good sense of structure on a high level, because they have worked in such a regulated and required, requirement-heavy field.
So what we do, then, is we look at the middle level. We take that top, here's what the document has to include in this order, and this document is part of a larger collection of documents that's part of a larger collection of documents, et cetera. And we can take information like Harmonizer, where you're really down to the one word in our five dimensions of content standards, which by the way, there's a white paper on that in the handout section of, I'm pointing. I'm pointing to my screen. In the handout section of the webinar, there is a white paper about the five dimensions of content standardization.
So the middle dimensions, we're really looking at paragraphs and components, and components are building blocks of text, or building blocks of tables, or building blocks of illustrations, whatever content you have. And this diagram is really showing, look, if we take, in this case, maybe we took a document, and we broke it out into building blocks of content, some are unique every time. A patient safety narrative has a lot of unique content for an individual patient. Some are not unique every time. The description of a contraindication maybe has some standardizable sentences. So we're looking at these blocks of content that you can use and reuse, and here we're showing you if we break the content into a repository of building blocks, we can create many different outputs with that content. So in this case, we have the package insert for the patient. We have a physician portal. We have mobile devices. There's electronic health records.
43:54
Obviously, there's the eCTD, which is currently, most eCTDs are all the PDFs of the reports included with a submission, and they're organized into a tree. And even that is changing and evolving to do more with reuse at a document level. So I mostly work personally in this middle level of where are our building blocks, and how do we standardize them, and what types of content do we have, and where is the unique content that would not have a lot of reuse? Because that's where we need our brilliant medical writers, and data scientists, and clinical trial writers working. We don't need them rewriting standardizable wrapper text. The end.
David Turner
I'm going to have you spend a lot more time on these. I'm going to have you spend a lot more time on those five aspects here in a second.
Regina Preciado
Yeah.
David Turner
I just slipped something into the slide deck on that, but I also got to point out here, we talked about labeling as an example earlier, and really our Harmonizer demo was related to labeling, but I think it's equally important to say reuse is not just limited to that part of the company. It could be used in other parts of the pharma company. It could be used in different parts of the drug development life cycle. So I'm thinking about like in clinical trials, right? Instead of copying and pasting pieces of your clinical trial's protocol and your SAP to make your CSR, you can basically, in that CSR, create references to the blocks of content that were important that need to be brought over, right?
Regina Preciado
Yes.
David Turner
Preserving that single source of truth, right? And then if it needs to be adjusted for a particular context, right, because in Turkey, it has to be this way, and in Mexico, it has to be that way, or for the patient, it needs to be this way, or for the doctor, it needs to be that way. We've got tools for that. We've got ways to handle that. If something needs to start with an "a" or "an," we've got ways to handle that. If it needs to be capitalized or not capitalized, we've got ways for that. And then from there, instead of copying and pasting again for your various outputs, you're using references again. And so if you need, I don't know, legal text for your website, you're not rewriting that. You're using what's already been authorized, what's already been approved. Maybe you need to create another CSR. You can take the same demographics descriptions that you used in this one, and reuse those across others without having to copy and paste, or you can put charts or tables in a presentation.
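The idea of referencing approved blocks instead of copying them, with adjustments for a particular context, can be sketched as a small data structure. This is an illustration only: the class name, component IDs, context codes, and sample wording are all hypothetical, and real component content management systems have their own reference mechanisms.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    """One approved block of content, with optional context-specific variants."""
    default: str
    variants: dict[str, str] = field(default_factory=dict)

    def resolve(self, context: Optional[str] = None) -> str:
        """Return the variant for the context, or the default wording."""
        return self.variants.get(context, self.default)


def assemble(component_ids: list[str], repository: dict[str, Component],
             context: Optional[str] = None) -> str:
    """Build an output by resolving references; no text is copied into it."""
    return "\n".join(repository[cid].resolve(context) for cid in component_ids)


# Hypothetical repository and outputs, with invented placeholder wording.
repository = {
    "demographics": Component("Approved demographics description."),
    "storage": Component("Store below 25 C.", {"TR": "Store below 25 C (Turkey wording)."}),
}
csr = ["demographics", "storage"]
website = ["storage"]

print(assemble(csr, repository, context="TR"))

# One change to the single source updates every output that references it.
repository["storage"].default = "Store below 30 C."
print(assemble(website, repository))
```

Because the outputs hold only IDs, the approved wording lives in exactly one place, and a context such as a country or an audience selects a variant at assembly time.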
Regina Preciado
And, David, I'm going to interrupt you. I want to interrupt and say yes, you can. Because I'm pretty sure a bunch of people just thought "No, we can't," but, yes, you can to a certain extent reuse demographic descriptions in another CSR, because we've done it successfully.
David Turner
Absolutely, and it doesn't stop there. Obviously, there's the core company data sheet. There's a big push, I think, towards putting content on mobile apps and making it more personalized. We can have a whole webinar just on personalization, I think, but let's take this, and let's get back and talk a little bit more about some of the benefits of content reuse. I want to make sure that we do hit on that, and then I'm going to ask you a little bit about the five aspects here in just a second as well. So talk a little bit about these benefits, and, Chris, if you wanted to chime in on this too, you can.
Regina Preciado
I will jump in and say saying the same thing everywhere you need to say it improves the accuracy of your contents, and we talk about consistency in the customer experience. Customers in this case include the 75+ health authorities around the world, and these are familiar concepts, especially to pharma writers, because this is what has been done and continues to be evolving on the data side.
48:03
The single source of truth for the data, the digital data flow initiatives that are happening that can bring the data through consistently, so you're not saying "Oh, there was one here, but there was five there. Wait, which one is true?" So that is very much the same principles for the content, the ability to create it, and approve it one time, and reuse it. If you get pushback or a query from a health authority, or they say "Well, we'll approve it, but we want this change," the ability to go make that change, and know exactly where throughout the whole body of content that message is, so you can change it in one place and have it automatically change or not, depending on how you've set up your system if you don't want automatic changes, but at least you will know: where is this information throughout this whole body of work without having to contact every subject matter expert or every, you've got the safety folks here, and the efficacy folks here, and you don't have to go back and sort of guess. And then, of course, reducing cost to create, manage, deliver, translate also comes from being faster at putting this content together.
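Knowing exactly where a message appears throughout a body of content amounts to a "where used" lookup. A minimal Python sketch, assuming each document is stored as a list of component IDs (the document names and IDs below are hypothetical):

```python
from collections import defaultdict


def where_used(documents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert document -> component IDs into component ID -> documents."""
    index = defaultdict(list)
    for doc_name, component_ids in documents.items():
        # dict.fromkeys drops repeated IDs while keeping their order.
        for component_id in dict.fromkeys(component_ids):
            index[component_id].append(doc_name)
    return dict(index)


# Hypothetical documents, each listing the components it references.
documents = {
    "protocol": ["objectives", "demographics"],
    "csr": ["demographics", "safety-summary"],
}
print(where_used(documents)["demographics"])  # ['protocol', 'csr']
```

With an index like this, a health authority query about one statement can be traced to every affected document without asking each subject matter expert.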
Chris Hill
Now, do you see one of the things I see in some other fields, and I'm not a, I haven't done big pharma projects personally, so I'm kind of bringing my background into a new world here, but one of the things that I know in some of the other worlds that I work in that are very useful is, they already have these data transmission initiatives where data gets sent instead of documents. Right?
Regina Preciado
Mm-hm.
Chris Hill
And you mentioned the move to that.
Regina Preciado
Yes.
Chris Hill
One of the things that happens, though, is you still need to be able to produce documents at times from the data, because humans want to look at the documents, or want to look at something that's coherent and not just bits of data randomly presented to them. Maybe you could talk a little bit about, do you, so this process doesn't preclude the use of documents.
Regina Preciado
Right.
Chris Hill
It really allows you to do both without any real extra effort; at least that's been my experience in these other worlds.
Regina Preciado
And in fact, one of the initiatives I've been part of a little bit, it's getting bigger, or my part is getting bigger, is the protocol digitization. And there is actually more than one group working on this cross pharma, but it's taking the protocol, which is at the beginning of your clinical trial. You write out your plan. That's a very simplistic definition of it, but I'm sure everyone on this call knows exactly what a protocol is, and chunking it into those blocks of information. And when we talk about content reuse, we often think, "Oh, in this document to that document or this document," but actually these components of content can be in a repository somewhere, a database, and, yes, we use them to put together a protocol document, but we're also working on digital information exchange, where those chunks of content, and it's not just data, there's some descriptions, or there's some narratives, or it's all kind of neatly merged together, can go directly from a sponsor to a health authority and sort of sit over there.
And ultimately, eventually someday, we may not need these highly structured documents at the document level, because we're going to be getting more granular in our component basis. We may have a point where even at the health authorities, they're like "Well, we review for the safety. So just give me all the components that the sponsor sent us that are about safety and compile that, but I actually would like to read it over the weekend in my lawn chair." So, click a button, and it puts a document together for you.
52:08
Now, that's a silly example because of a lot of reasons. One of which, it's Saturday. But the technology is already there. You can already do documents on demand, compile just what you need. You can actually do this on Wikipedia, by the way, go in and choose some articles, and make a PDF for yourself. So the technology is there. There's a lot more complication around things like privacy, for example, and data integrity, for another example, that make the strategy, the plan, and the rules a lot more important with pharma content than almost any other type of content, but that data exchange, and information exchange, and components of contents, exchangeable and reusable, is a huge piece, especially as we're looking toward natural language generation, and AI delivery, and intelligent automation, and all the amazing technology that we're developing that we can use, but you gotta get your content ready for that.
Chris Hill
Yeah. Okay.
David Turner
Which gets us into: how do you structure your content for reuse, and it brings us back to the five dimensions that you talked about a minute ago.
Regina Preciado
Yes. And I know we're short on time, and we want to get to the questions. So here they are on the screen –
David Turner
Well, we've got a couple of minutes. We do have –
Regina Preciado
– are what we call the output type. That's because we needed a very common word. An output type in this case could be a document like a protocol. It could be a webpage like the clinical trial summaries that are at clinicaltrials.gov. It could be a poster that is printed out and taped on the wall at the doctor's office. So the output type is your assembly of content, what components, what data goes in, and in what order, and you publish it to a design, let's say. That's the biggest dimension.
And the smallest dimension is words, and that is your terminology. Using one term to mean one meaning makes a huge difference in the findability, usability, and comprehensibility of your content. Sentences we talked about, and with sentences, you can really see in the Harmonizer report what a difference it makes, if you can standardize at the sentence level for the sentences that can be standardized, but also having standards. Writers are used to having style guides.
Your basic grammar. Your basic style decisions. Do you use the Oxford comma, also known as the serial comma, or not? And of course, yes, you do. But that's, making that consistent throughout your content makes a big difference when you're making the content building blocks smaller and smaller, so you can reuse them and digitally exchange them even without wrapping them in a document. So paragraphs, the paragraph dimension is often about voice and tone.
You want, David, you mentioned earlier, what do we say to the patient in patient education compared to what do we say to the physician or the pharmacist, compared to the health authority, compared to scientists communicating with other scientists? You may have a different set of standards for your paragraphs. And then finally, the components, which is your most powerful dimension for reuse. Most of us understand reusing the component, the building block of contents, and a lot of the tools that are out there today are built around this idea of a component being a collection of paragraphs, tables, illustrations, videos, whatever, that is reused together.
56:06
It can get very, very difficult to have a whole reuse strategy based on sentences. So we roll them up into components, and there again in the handouts, there's the "Five Dimensions of Content Standardization" white paper, which you can also get from our website, along with our white paper that's about content reuse and automation. And so at contentrules.com, you can find those to get more information about those two things.
David Turner
And we're also going to be talking about it –
Chris Hill
I do recommend that.
David Turner
– at our next webinar, which, so we've got here that we're going to be doing another webinar next month, and we've put a link to that, if you want to go ahead and register. And we also have a webinar we did last time that hits on some of this as well. So certainly are some resources. All right. We do have time for a couple of questions, and I want to make sure we hit those. And I think I can get the first question, which, somebody asked "My content collection comprises PDFs, Word, and HTML. Can Harmonizer look for duplication across my entire collection?" Absolutely. It does not have to be in the same format. We can look at Word, and PDF, and InDesign all at the same time and get that. Somebody else asked a similar question, "Can the tool also process PDFs?" Yes. So if you've got it in a textual format, then, yes. Chris, we have another question about Harmonizer that I'll throw to you. It asks about identifying tables. Can we identify tables with similar content? Any thoughts on that?
Chris Hill
Yeah. So tables are processed like anything else. Actually, we have a flag that we can optionally ignore them if you don't want the tables, but, yes, the text in tables, each cell is treated as a text block and is included in the analysis. So you will find the table data. And when we do a project for Harmonizer, we can even do some custom, some custom ingestion where we filter based on other dimensions. Those aren't strictly out of the box, but if you told me "I just want to look at tables today," that would be perfectly fine.
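The treatment of tables described here, each cell as its own text block with an option to ignore tables, could look roughly like this in Python. The input format, function name, and flag name are assumptions made for the illustration and do not reflect Harmonizer's actual interface.

```python
from typing import Union

# A document is modeled as a list of ("paragraph", text) or ("table", rows) items.
Item = tuple[str, Union[str, list[list[str]]]]


def extract_blocks(items: list[Item], include_tables: bool = True) -> list[str]:
    """Flatten a document into text blocks; each non-empty table cell is one block."""
    blocks = []
    for kind, content in items:
        if kind == "paragraph":
            blocks.append(content)
        elif kind == "table" and include_tables:
            for row in content:
                blocks.extend(cell for cell in row if cell.strip())
    return blocks


# Invented sample document.
document = [
    ("paragraph", "Adverse reactions are listed below."),
    ("table", [["Reaction", "Frequency"], ["Headache", "Common"]]),
]
print(extract_blocks(document))
print(extract_blocks(document, include_tables=False))
```

Once cells are ordinary text blocks, the same matching step applies to them as to paragraphs, so tables need no separate comparison logic.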
David Turner
Awesome. All right. Regina, I got one for you: "Is the same component content management system, or structured content management system, or whatever you're talking about, is that same system usable by the same organization in different parts to handle very different workflows and different use cases?"
Regina Preciado
Yes. All of the component content management systems, at least that I've worked with, so 15 or 20, have support for different content life cycles or different workflows. So your business process around how you create and manage this content maps to the content life cycle that's managed by the tool: draft, review, approval, whatever. That was very simplified, so typically the system can have a different workflow for clinical, a workflow for labeling, and there's translation workflows, and whatever. Some tools that are newer on the market, I'm not as familiar with their workflow management capabilities, but I'm certain that if they launched with just one, their customers or potential customers said "Oh, no. We need more than that." So that's a pretty standard benefit of a component content management system, is that you can make different workflows for different types of content.
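Different workflows for different types of content can be pictured as one life cycle definition per content type. The sketch below is a simplification in Python: the "draft, review, approval" states come from the answer above, while the extra states and the function name are invented for illustration and do not describe any particular system.

```python
from typing import Optional

# Hypothetical life cycles; only draft, review, and approval come from the discussion.
WORKFLOWS = {
    "clinical": ["draft", "review", "approval"],
    "labeling": ["draft", "review", "regulatory review", "approval"],
    "translation": ["draft", "translation", "review", "approval"],
}


def next_state(content_type: str, current: str) -> Optional[str]:
    """Return the state that follows `current`, or None at the end of the workflow."""
    states = WORKFLOWS[content_type]
    position = states.index(current)  # raises ValueError for an unknown state
    if position == len(states) - 1:
        return None
    return states[position + 1]


print(next_state("clinical", "review"))  # approval
print(next_state("labeling", "review"))  # regulatory review
```

The same component store serves every group, and only the life cycle attached to a content type differs.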
David Turner
Awesome. Well, that's brought us to the end of our time. If there are other questions, we're happy to answer those one on one.
Regina Preciado
Yeah.
David Turner
And I do just want to say thank you to our great speakers today. Thank you, guys, for being a part of this, and all the preparation you did. I also want to thank the attendees. Thank you for attending this webinar. Please know our DCL Learning Series actually comprises not just webinars, but also a monthly newsletter, our blog, and you can access this webinar and other webinars related to content structure, XML standards, and more from the on-demand section of our website, and we hope to see you at future webinars. So with that, have a great day, and this concludes today's broadcast.
Regina Preciado
Thank you, David, and Chris, and everybody.
Chris Hill
Thank you.
David Turner
Thank you.