A Podcast from The Content Strategy Experts — a Scriptorium podcast.
Christopher Hill is the Technical Product Manager for DCL's Harmonizer software. Chris recently sat down with The Content Strategy Experts Podcast and spoke with Scriptorium's COO Alan Pringle. They chatted about content reuse in learning content and there are so many kernels of wisdom in this conversation. Have a listen!
Transcript
Alan Pringle: Welcome to the Content Strategy Experts Podcast brought to you by Scriptorium. Since 1997, Scriptorium has helped companies manage, structure, organize, and distribute content in an efficient way. In this episode, we talk with guest Chris Hill of DCL about learning content and where you can find redundant duplicated content, what causes it, and how a reuse strategy can eliminate that duplication. Hey everyone, I am Alan Pringle and we have a guest here today, Chris Hill of DCL. Hey Chris, how are you doing?
Chris Hill: Doing well, thank you, Alan. It's nice talking to you.
AP: Great, yes as always. Chris, tell folks out there a little bit about yourself, DCL, and your role there if you would.
CH: Sure. DCL stands for Data Conversion Laboratory. And so we got our start doing data conversion, which is moving content between formats. And over the last, let's see, that started in the 80s, if you can imagine a tech company starting in the 80s.
AP: Yes, I can. I am of an age, yes.
CH: So since then, we've expanded out into lots of areas, but basically any kind of content transformation, workflows, content enrichment, all sorts of activities around content. So that's our key theme. I joined DCL about four years ago, and I've actually been in the content management space for a good more than 20 years now, and have a lot of experience with both migrating from, you know, using tools like Word and such, and then moving into a content management system. I actually managed, product managed a content management system and then got into conversion. And as part of my job here, I oversee a product called Harmonizer, which is our tool for doing content analysis and specifically reuse analysis to find places where content is redundant, duplicated, and help users figure out what they need to do to improve that situation.
AP: Well, in this conversation today, I think we're going to tap into all the wisdom that you bring to the table with your background in content and your experience at DCL identifying reuse. And let's start with just the concept of redundant content. And there are lots of ways to describe this. And I've heard it referred to several different ways. Redundant content, duplicated content, overlapping content. If you would, kind of give people a bird's eye view of what we're talking about here.
CH: So we're really talking about any place where you've got similar or exactly the same content reproduced. And you usually know you're doing this because anytime you hit that Control C and Control V or choose the copy paste menu, if you're a menu person, anytime you're doing that, you're creating redundant content. And, you know, it's usually the easiest way to get it done if you're working in a tool like Microsoft Word or a word processor or a desktop publishing environment or something like that. Generally, you copy stuff from one document to another. And that can be fine for version one of those documents. Where you start to really run into trouble is when you need to make version two or you discover a problem with version one. So if I'm making some marketing materials, maybe I need to use some information from the actual engineering team or from the manuals for whatever product I'm marketing. I might just copy that engineering data or whatever information over and put it into my marketing materials. And then when we go to produce our training for that particular product, we might say, okay, I need that stuff. I'm gonna copy that from wherever I can find it, which might be their marketing or it might be engineering depending on where I look and who I know better or which repository is easier for me to get to. And the problem with that is that if anybody's made any edits along the way, they have to ensure that those edits are propagated through all these departments. And that doesn't always happen.
AP: It usually does not happen. You're being kind.
CH: Yes, I am. So when the engineers find out, oops, we made an error here in the technical manual, we better fix that or somebody is gonna do the wrong procedure or come out with a bad result, they might fix their manual, but are they aware that there's all this marketing material with that stuff in it? Are they aware that the education team actually copied the stuff from marketing? They may not have even talked to the engineers to tell them they were using that content. And so what happens is you pretty soon have a sort of information entropy where things start to fall apart and the information gets out of sync and you may have inaccurate information through various departments that no one can really trace. So that's kind of where I see this grow and it's pretty much a natural feature of using computers and the more traditional desktop application approach to creating content.
AP: Absolutely, and you've really kind of covered the big picture here really well. And what I want to do is kind of move just a little bit away from that and talk now about, especially for people in the learning and training space, where they might see some of that content overlap. And you've kind of touched on one. Anytime that you have a new version of the product or service that you are creating content for, it's very common to just copy and paste the previous version, create a new file set, and then make your edits and updates in there. There's one scenario right there where you have used copy and paste and there's a very good chance there's a lot of overlapping information between those two versions that probably really should be maybe one set of files instead of duplicated content. So that's one example I can think of immediately off the top of my head. Where are some other places where learning and training people might see this duplication of content?
CH: Yeah, so, well, the copy and paste happens for a lot of different reasons. Sometimes you'll have product diagrams that somebody has or engineering schematics or something like that that need to be part of multiple divisions. And you'll see that stuff sort of get copied around, if you will. I think you make a very good point about new versions of the product, because even in the case where you wrote perfect content the first time, if you ever could do that, and you wrote it perfectly well for the current release of the product, when the product is upgraded or changed in some way or a new revision is released, that doesn't mean all the old product disappears. People are still accessing that older content.
And if you start to find issues in the older content that get addressed maybe through your user support, is that getting pushed up to the newer stuff? Because if the newer stuff did what you described, which is I copied it, I may not even know that that's now inaccurate in the new release of the manual. Or vice versa, it could be someone using the new product who identifies a problem with our documentation, and we go back and we neglect to fix the old ones, well then all the users of the older product are going to run into that issue sooner or later.
AP: Yeah, and then you've got this whole layer too. What if you were delivering to all of these different delivery targets, different delivery formats? You're using Microsoft Word over here to create perhaps more study guides or scripts or something like that. Then you're also using PowerPoint over here to create slides. You are copying and pasting perhaps into some kind of software that will help you with simulations or more audio video kinds of things. So in addition to what you and I have just talked about with the different versions, how those are out of sync, if you were copying and pasting content into all of these different tools that create these different delivery types, then this problem is multiplying rapidly because you're gonna have to go in and touch all of that source for all of those different delivery targets: your Word files, your PowerPoint files, your Articulate content, whatever else. So it kind of can explode pretty quickly in your face.
CH: It sure does. And we haven't even touched on if you're in different countries translating to different languages, what do you do about all the translated content? And that can quickly overwhelm you as well.
AP: Exactly. Yeah, so basically this problem becomes exponential, both from, say, the versions of your product or service, across the different delivery targets that you're dealing with. And then if you have to localize that content, all of the, shall we say, bad behaviors that are in your, or let's call them inefficient behaviors, that's less judgmental.
CH: There you are.
AP: Yeah, less judgy. These inefficient behaviors are then duplicated in every single language that you crank out. So yeah, it is very ripe for inefficiency. It's very ripe for errors because it is unfair to expect a human being to go through and keep track of all of this. So things go sideways and you even touched on something else a little earlier.
CH: For sure, yes.
AP: And then in some cases, you are pulling content from other departments, other content creating groups. And then that's another layer of this exponential explosion where if you've changed something and someone, quote, borrowed that from your group, are you sure they're gonna know that you changed that? Or you fixed it when they copied and pasted it into their version? And then think about the poor end users, the content consumers who were getting this information. They're probably not getting a consistent picture at all about what you're talking about because of all this copy and paste all over the place. It's a mess. Yeah. So let's go on and try and put a more positive spin on this mess and start talking about the process for identifying this duplicated content.
CH: That's good.
AP: So what can people do to start kind of taking the pulse of this problem?
CH: I think a lot of it depends on your resources and your organization's commitment, but there's always things you can do, whether they're smaller efforts or larger efforts. So, you know, at the very BMW view or let's say Cadillac view, if you're from the 80s like me, you would probably have a huge budget to be able to implement a whole new set of tools and workflows that allowed you to use all sorts of technologies to do what's called single-source publishing. And that's where you author in a format-neutral format. And then you take those pieces and really you're creating sort of Legos of content, you could imagine.
AP: I call them puzzle pieces, so yeah. Yep.
CH: There you go, puzzle pieces, Legos. And they fit together in lots of different ways. You can put them together for training. You can put the little pieces together for your user manuals. You can put some of the pieces together for your marketing materials. But the key is that you're using the same piece in all of those places. And what these sort of advanced tools allow you to do is keep track of all those pieces, use those pieces in all those multiple places, and then still create your deliverables out of those pieces. So instead of authoring directly in PowerPoint when you're writing a course, or writing in Word when you're writing a manual, or maybe working in HTML when you're creating your website, instead of creating content in those sort of single-use formats, you create your content in a neutral format and then you have it output to those formats.
AP: Exactly.
CH: And so you still can deliver those end formats that you need to actually put out in the world, but you're doing it from a single source of truth. And that's that single content repository. Now that's the ideal, that's the perfect one.
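As a rough illustration of the single-source idea Chris describes here, the following minimal Python sketch stores each piece of content once and lets every deliverable refer to it by ID. Everything in it is hypothetical and invented for illustration: the component IDs, the sample text, the "X100" product, and the `build` and `where_used` helpers. It is not how any particular content management system works.

```python
# Hypothetical content components, keyed by ID. Each one is written once.
COMPONENTS = {
    "install-warning": "Disconnect power before opening the case.",
    "install-steps": "Remove the four screws, then lift the cover.",
    "product-summary": "The X100 is a compact desktop unit.",
}

# Each deliverable is only a list of component IDs, not a copy of the text.
DELIVERABLES = {
    "user-manual": ["product-summary", "install-warning", "install-steps"],
    "training-course": ["install-warning", "install-steps"],
    "marketing-sheet": ["product-summary"],
}


def build(deliverable: str) -> str:
    """Assemble a deliverable from the shared components."""
    return "\n".join(COMPONENTS[cid] for cid in DELIVERABLES[deliverable])


def where_used(component_id: str) -> list[str]:
    """List every deliverable that pulls in a given component."""
    return sorted(name for name, ids in DELIVERABLES.items() if component_id in ids)
```

Because the manual and the training course both point at `install-warning`, editing that one entry in `COMPONENTS` changes both outputs the next time `build` runs, and `where_used("install-warning")` shows which deliverables are affected.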
AP: Yeah, and you're not, and you are not going to get to what you just described overnight. You are not going to snap your fingers and that's going to happen. So yeah, I think you're headed kind of where my brain was. And that is you can start small with this and you don't even have to think about tools. You can start very small and start thinking about, you know, where is this duplicated information? Just trying to ferret it out. And one way you can do that, you as a content creator have a very good idea of what is in your set of training content. You're the people who are creating it. You know where the bodies are buried, where things are going wrong, where you've noticed that there's this duplication. You could also work with a consultant like me who has been doing this kind of stuff for years and can help you by asking the right questions and maybe trigger some things in your brain. Oh yeah, I didn't think about that. But there is also technology out there like your Harmonizer tool that can help people start to identify that reuse. And I think it's worth noting it doesn't have to be things that are exactly the same. Your tool can help find things that are fuzzy matches that are sort of the same because that's equally valuable as well. And I want you to talk a little bit about how that process works because I think that's important.
CH: Sure. So the tool we developed was actually kind of a companion to our conversion work, because we kept seeing the same problems. People would come to us and bring those content reuse problems, and they would ask us if we could help them in some way when they were converting content, whether they were going to move to some neutral format or they were just moving from, say, Word to FrameMaker or FrameMaker to something else. That was a lot of the work we were doing, but they would bring us lots of duplicated content. And sometimes that conversion stage is a good time to nip that a little bit or make some headway against those duplications. So we developed Harmonizer as a tool that was very format neutral. It just basically extracts all the text from whatever content you have and puts it into blocks, and then it compares every single text block to every other text block and it'll tell you which ones are all the same and which ones are close, and that close can be pretty far apart actually. So I could do things like if I had a sentence and I told you when you go to the store pick up some milk and then somewhere else I tell you to pick up some milk when you are at the store. Those aren't exactly the same. And in fact, if you do a word-by-word analysis, they're completely different. But if you do a Harmonizer-style analysis, we use some linguistic algorithms to be able to tell that linguistically, those are essentially the same thing, or at least very close in what they're describing, even though the words and the letters are all in a different order.
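The block-to-block comparison Chris describes can be sketched in a few lines of Python. This is not Harmonizer's algorithm, which the conversation only characterizes as "linguistic algorithms". As a stand-in, the sketch uses a much simpler order-insensitive measure (Jaccard similarity over word sets), and the function names and the 0.6 threshold are assumptions chosen for illustration.

```python
import re
from itertools import combinations


def tokens(block: str) -> frozenset[str]:
    """Lowercase a text block and reduce it to its set of words."""
    return frozenset(re.findall(r"[a-z0-9']+", block.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets: shared words / all words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def find_matches(blocks: list[str], threshold: float = 0.6):
    """Compare every block to every other block and report close pairs."""
    matches = []
    for (i, a), (j, b) in combinations(enumerate(blocks), 2):
        score = similarity(a, b)
        if score >= threshold:
            kind = "exact" if a == b else "close"
            matches.append((i, j, kind, round(score, 2)))
    return matches


blocks = [
    "When you go to the store, pick up some milk.",
    "Pick up some milk when you are at the store.",
    "Disconnect power before opening the case.",
]
print(find_matches(blocks))  # [(0, 1, 'close', 0.67)]
```

The two milk sentences share 8 of their 12 distinct words, so they are reported as a close match even though the word order differs, while the unrelated third block is not paired with either. A word-set measure like this knows nothing about meaning, so it is only a rough approximation of the linguistic matching described in the conversation.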
AP: It's the intent of that sentence, basically. Yeah. Yeah.
CH: Very much, yeah. So we detect that as well and put that into groups. So then you can look and you can say, okay, I've got this block of text. It says this. Here's all the places Harmonizer will highlight where they're different, sort of like a diff tool so that you can see, oh, I use the word "or" here and I use the word "and" in this other place. Or maybe I used one version of our product name in some of the content and I am using a different version of the product name in another part of the content. Or maybe I'm comparing two products and their manuals are 75% the same content, just every now and then the product name is mentioned and that has to be different. All of those things can really illuminate why you have duplication. It can also help you find those places where maybe you've made corrections in one place and haven't got to those other places because you might see, oh, this paragraph is the same except we added a warning at the bottom, do not do something. We better tell everyone else that warning in all the other formats that we've created. So that's kind of what Harmonizer does. It's not a magic bullet. It gives you a very large, well, if you have a lot of content, it'll give you a large report; if you've got a modest amount, you'll get a modestly sized report. It'll scale to whatever amount of content you want to feed it. What we do is we use it very strategically. And for instance, we can use it to identify just why you have maybe close but not matching content. So maybe you're using inconsistent wording in different places. We can identify maybe if you have already some standard content, we can identify if there are places where maybe it varies in ways you didn't expect. So you can check your standard content libraries if you need to. There's all kinds of ways it can be used, but at its core, it's again, just giving you those matches and helping you see, really shining a light on where your content is as far as redundancy.
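The "sort of like a diff tool" view can be approximated with Python's standard `difflib` module. This is a minimal sketch of word-level difference highlighting between two near-duplicate blocks, not a reproduction of Harmonizer's report; the `word_diff` helper and the sample sentences are made up for illustration.

```python
import difflib


def word_diff(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return (operation, words_in_a, words_in_b) for each differing span."""
    wa, wb = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=wa, b=wb, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, " ".join(wa[i1:i2]), " ".join(wb[j1:j2])))
    return changes


original = "Press the red or green button to start."
revised = "Press the red and green button to start. Do not open the cover."
for change in word_diff(original, revised):
    print(change)
```

For this pair the sketch reports two changes: "or" replaced by "and", and the sentence "Do not open the cover." inserted at the end. That second change is the kind of added warning Chris mentions, which would still need to reach the other copies.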
AP: And one point I want to make here is it really doesn't matter what tools you're using to create content. This work you can do is not dependent necessarily on those tools. Like I said, you yourself can kind of do a self-service thing where you start to think more deliberately about where you think content is. You can work with a consultant who can help you figure this out. You can use a tool like Harmonizer to help dive deeper and really find this content. So there are all these layers that you can do. And the first layer is you can start thinking about that yourself. So there's a lot of options there. So once you have started to identify this duplicated content through whatever those methods are that we just talked about, it's time to get into a reuse strategy. And you've already touched on this really well. The core of that reuse strategy is you have a single source of truth for every piece of content, every piece of information. There is one version, one format-neutral version that you can then pull into all your different delivery targets and all your different types of content. So that's kind of the core of that. Once you know where that duplication is, you can start coming up with this more formal reuse strategy. And I think you also pointed out that the copy and paste is like the warning light going off: you've got duplicated content. There's copying and pasting going on. That's what you want to try to eliminate with the single source of truth. Give people a little idea of the benefit of the single source of truth. And I'm talking about both for content creators and for the content consumers because it falls on both sides, the benefit of that single source of truth.
CH: For sure it does. Content creators know this. We've already touched on when there's a problem found or a change needed in the documentation. Maybe the product's changing or was updated. If it's software, who knows? Maybe we've added a new menu item. So we need to add that to the documentation. Well, if we've got a single source for everything and everyone draws from that source, we update that source and it will flow out into all the other channels without any real effort. Now, you can simulate this with your copy-paste activities if you can't actually implement a true single-source tool chain, but you have to really formalize how you do copy-paste. So you've got to only copy from, say, the source of truth, not from each other or something like that, as a starting point. Another area though where this really has an impact is on the quality of the content you're delivering to your readers and your consumers. We all know, and I deal with this all of the time, and this should make everyone feel a little bit better, that even a giant company like Microsoft has this problem. I work in SharePoint quite a lot. And SharePoint has a lot of different versions. It's been around forever. One of the biggest challenges I have is when I go look for answers, and this isn't to pick on Microsoft, by the way. Every software company, you could probably find some of this.
AP: Absolutely.
CH: But I go looking in the content for something, I'll read it one way in one place. I'll read something a little bit different about the same feature in another place, and sometimes they're just describing them in two different ways because maybe one was written by the engineering team and one was written by the marketing department. Maybe another version was written by the training department. So that's going to happen. But then I also run into a lot of places where it's not easy to tell when this stuff was even created. So it might be very old stuff that I'm looking at that no longer is even applicable. All these issues become simplified if you're doing that single source of truth because you can start tying together a strategy to deal with that. When you just publish stuff out there and it all gets sort of thrown out in a fire hose to your consumers, that can become a very big challenge for them when there are all these inconsistencies in different language styles or different ways of writing the information.
AP: And if people are using all the different content that's available out on your website to make a purchasing decision, and it doesn't just have to be the marketing content, they can be looking at the product content. They can be looking at the publicly available training content. If they are getting mixed messages, different information that should be talking basically about the same thing, that can be a huge turnoff and it can hurt you financially because people will be like, I'm not comfortable buying this product or service because I'm getting mixed messages in the content that's available out here. The bottom line is people don't care what department or what your organization is like, what your hierarchy is, what your tree is, whatever you want to call it for your different departments and your management. They don't care about that. They just want a consistent message, consistent information and they want to get it from wherever they find it and they want to be sure that it's the same message they get regardless of what quote, department's content that they're touching.
CH: Yeah, when I'm working with your product, your product is really how I see you. I see you through the product. I don't see you through your departments and your channels and whatever organizational structure you've created to manage your company. So I think that's a really important point you make to really make sure that that product experience is consistent and clean. And doing this, you know, even if all you're doing is just trying to make things more consistent, then addressing the redundancy issue can help just in ensuring that we're presenting that unified view of our product to the world.
AP: This conversation really has probably given people a lot of food for thought. There's a lot to think about when we're talking about this duplicated content, redundant content. If we want to kind of back up a little bit and give people maybe one or two pieces of advice on where to get started, even if it's starting small, what are some things that people can start to do now to start thinking about this bigger picture of duplication, reuse, single source of truth? Any recommendations there?
CH: For sure. So the first thing is you've got to tame a little bit of your Wild West of content, if you've got that. So if I can just go on the corporate network and start willy-nilly looking around and copying and pasting stuff out of anywhere I can find it, which is sometimes the case, that's probably a big area where you're creating a lot of content entropy. So you need to think about that. And it may just be even a training issue. It may be a network organization issue. But you should start considering how you can make the authoritative repository accessible to everyone and then limit where they're getting stuff to that authoritative repository. You don't have to implement a whole new content management system and tool chain to do that. You can do that using permissions, using training, and having regular contact between the groups that create the content that's getting copied around. So making sure there's some interface between them so that they can coordinate and know that they need to coordinate these content activities. A lot of times that simple piece just gets overlooked because a lot of companies really treat content as kind of the afterthought. I've built the product, okay, hurry up, make a manual, do some training, do whatever, because we're product-focused. And so it's kind of natural, but you really need to see your content as an integral part of that product that you're delivering. So you can get started with just using the tools that you have and working on the processes and the consistency with which you apply those tools. You can also start strategizing for the future. So you can look at, okay, maybe we need to figure out, first of all, how much money is it costing us to copy and paste a lot?
Again, knowing how much duplication you have, and then you can put estimates on, okay, if I've got all this amount of duplication, how much does it cost to make a change to this manual if we are to ensure that it gets to all the delivery channels, including marketing, training, all the languages, all the manuals? How much does a change cost? Once you start quantifying that, you might find out there's a better budget than you think for working on this problem.
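To make the "how much does a change cost?" question concrete, here is a back-of-the-envelope Python sketch. The formula is an assumption for illustration, not a method from the conversation: it simply multiplies the number of duplicated copies by the number of languages, the hours needed to update one copy, and an hourly rate. All figures in the example are hypothetical.

```python
def change_cost(copies: int, languages: int, hours_per_copy: float, hourly_rate: float) -> float:
    """Estimated cost of pushing one edit to every copy in every language."""
    return copies * languages * hours_per_copy * hourly_rate


def single_source_cost(languages: int, hours_per_copy: float, hourly_rate: float) -> float:
    """Same estimate when the text lives in one place per language."""
    return change_cost(1, languages, hours_per_copy, hourly_rate)


# Hypothetical numbers: one paragraph copied into 6 documents, 5 languages,
# half an hour to find and fix each copy, at 60 per hour.
duplicated = change_cost(copies=6, languages=5, hours_per_copy=0.5, hourly_rate=60)
single = single_source_cost(languages=5, hours_per_copy=0.5, hourly_rate=60)
print(duplicated, single, duplicated - single)  # 900.0 150.0 750.0
```

With these made-up inputs, one small correction costs 900 when the paragraph is duplicated and 150 when it is single-sourced. Multiplying the difference by the number of changes per year gives a rough figure to weigh against the cost of tools or process changes.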
Again, you're going to have to look at your content itself and figure out how much redundancy there is. So planning that strategy, figuring out how much it's costing you, all of that can be very helpful, I think. And then ongoing maintenance. How are we going to maintain it? I've been to a lot of organizations where they'll do a big push to clean things up and they'll say, okay, we're going to hire someone, we're gonna get some new tools going and man, we've fixed it, right? And so they fix it. And then three years later, they're in the same boat they were in because they didn't really follow up on that. They didn't plan to maintain the content. Nobody was charged with the duty to ensure that we were adhering to the strategies that the tools were providing and nobody really had the responsibility to look at that stuff. So if you're going to make the investment, you have to also have the follow-through. And a lot of times that involves consultants, because let's be honest, if this is the first time I ever do this, I'm not going to do it very well. And I usually don't get a chance to do this 100 times. I'm not going to do this over and over in my organization.
AP: Yeah.
CH: But if you find a consultant, you can find someone that's done this 100 times for a lot of different organizations. And they already know where all the pitfalls are and where all the trouble is. And they'll help steer you in the right direction the first time. Because you don't get a lot of bites at this apple. Like, your company's not going to say, oh, just keep working on content reuse for the rest of time. They're going to want to see some progress.
AP: Exactly. Yeah, and it comes down to return on investment. You do not do these kinds of things for fun. You're doing them for business reasons, and business reasons include making money and getting a return on investment on any kind of investment in technology, and that includes content technology.
CH: Absolutely.
AP: Chris, this has been very helpful. I think this is a good place to wrap up. Thank you so much for your insights. I think you've given people a whole lot to think about.
CH: Well, I appreciate the conversation.
AP: Thank you for listening to the Content Strategy Experts Podcast brought to you by Scriptorium. For more information, visit Scriptorium.com or check the show notes for relevant links.