Each time a new project is kicked off this question is on the table. Content publishers ask how much are we expected to document. Localizers ask how many new terms will be used.
Who knows these things when each project is different, deadlines and scopes change, everyone understands “new term” to mean something else, etc. And yet, there is only the need to agree on a ballpark volume and schedule. With a bit of experience and a look at some key criteria, expectations can be set for the project team.
In a Canadian study, shared by Kara Warburton at TKE in Dublin, authors found that texts contain 2-4% terms. If you take a project of 500,000 words, that would be roughly 15,000 terms. In contrast, product glossary prepared for end-customers in print or online contain 20 to 100 terms. So, the discrepancy of what could be defined and what is generally defined for end-customers is large.
A product glossary is a great start. Sometimes, even that is not available. And, yet, I hear from at least one customer that he goes to the glossary first and then navigates the documentation. Ok, that customer is my father. But juxtapose that to the remark by a translator at a panel discussion at the ATA about a recent translation project (“aha, the quality of writing tells me that this falls in the category of ‘nobody will read it anyway’”), and I am glad that someone is getting value out of documentation.
In my experience, content publishing teams are staffed and ready to define about 20% of what localizers need. Ideally, 80% of new terms are documented systematically in the centralized terminology database upfront and the other 20% of terms submitted later, on an as-needed basis. Incidentally, I define “new terms” as terms that have not been documented in the terminology database. Anything that is part of a source text of a previous version or that is part of translation memories cannot be considered managed terminology.
Here are a few key criteria that help determine the number of terms to document in a terminology database:
- Size of the project: small, medium, large, extra-large…?
- Timeline: Are there five days or five months to the deadline?
- Version: Is this version 1 or 2 of the product? Or is it 6 or 7?
- Number of terms existing in the database already: Is this the first time that terminology has been documented?
- Headcount: How many people will be documenting terms and how much time can they devote?
- Level of complexity: Are there more new features? Is the SME content higher than normal?
These criteria can serve as guidelines, so that a project teams knows whether they are aiming at documenting 50 or 500 terms upfront. If memory serves me right, we added about 2700 terms to the database for Windows Vista. 75% was documented upfront. It might be worthwhile to keep track of historic data. That enables planning for the next project. Of course, upfront documentation of terms takes planning. But answering questions later is much more time-consuming, expensive and resource-intense. Hats off to companies, such as SAP, where the localization department has the power to stop a project when not enough terms were defined upfront!