To ensure the highest quality and efficiency in a translation, proper file preparation is crucial and should not be overlooked. PDF to Word conversions can introduce various formatting issues and unnecessary elements, such as excessive spaces, segmentation issues, and erroneous characters. These elements need to be eliminated or adjusted before the translation process begins, so that they do not compromise the accuracy and fluency of the translated text. If the document contains images or non-editable fields, these need to be prepared as well so that all the text is accessible for translation. Furthermore, properly preparing your documents before translation will significantly facilitate the post-translation DTP process, especially if the document is translated into multiple languages.
To address these challenges, we have prepared a comprehensive guide to help you identify key considerations when preparing your Word files for translation. This guide covers the essential aspects you need to be aware of after converting from PDF to Word. By following these guidelines, you can ensure your document is in the best possible condition for translation.
Displaying hidden characters
Before starting the file preparation process, it's important that paragraph marks and other hidden characters are visible. To display them, click the Show/Hide ¶ button on the Home tab of the Word ribbon.
Apply correct language to the document
Ensure that the correct language settings are applied to the document so that spelling and other proofing tools work accurately.
When preparing your document for translation, keep an eye out for the following things:
Misplaced paragraph marks
Paragraph marks can appear in unsuitable places, disrupting the natural flow of the text by splitting sentences. It is essential to identify and correct these misplaced paragraph marks to ensure the continuity and coherence of the document.
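As a rough illustration of how such breaks could be flagged automatically, the sketch below assumes the paragraphs have already been extracted from the Word file as a list of plain strings. The function name `find_suspect_breaks` and the punctuation list are hypothetical choices for this example. The heuristic only suggests candidates, and each one still needs to be checked by eye.

```python
# Punctuation that normally closes a sentence or introduces a list (an assumption).
TERMINAL = (".", "!", "?", ":", ";")


def find_suspect_breaks(paragraphs):
    """Return indexes of paragraphs that probably end in the middle of a sentence."""
    suspects = []
    for i in range(len(paragraphs) - 1):
        current = paragraphs[i].rstrip()
        following = paragraphs[i + 1].lstrip()
        if not current or not following:
            continue  # Empty paragraphs are not evaluated by this heuristic.
        # No closing punctuation, and the next paragraph starts in lowercase.
        if not current.endswith(TERMINAL) and following[0].islower():
            suspects.append(i)
    return suspects


sample = ["The report was sent to", "the client on Monday.", "Next steps"]
print(find_suspect_breaks(sample))
```

Headings and list items often lack closing punctuation too, which is why the check also requires the following paragraph to start with a lowercase letter.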
Soft returns
Soft returns control line breaks within a paragraph, moving the cursor to the next line without starting a new paragraph. While they don't completely split sentences the way paragraph marks do, they can sometimes cause confusion when working in the CAT tool. Therefore, it's recommended to avoid soft returns unless absolutely necessary.
Tabs
Remove any tabs that appear in places where they disrupt the natural text flow.
- Tabs used to create indents:
- Tabs in the middle of the text:
- Tabs used to create tables:
This can happen frequently after automatic conversions. When an OCR tool attempts to replicate a table's layout, it often uses tabs to build the table structure. If you encounter tables created with tabs, it's advisable to reformat them into a standard table format to ensure proper text alignment and avoid complications during and after translation.
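To show what rebuilding a tab-based table involves, here is a minimal sketch that turns tab-separated lines into rows of cells, assuming the lines have been copied out of the document as plain strings. The helper name `tabs_to_rows` is hypothetical. It treats a run of several tabs as one separator, so a table with genuinely empty cells would need a more careful approach.

```python
def tabs_to_rows(lines):
    """Split tab-aligned lines into lists of cell values."""
    rows = []
    for line in lines:
        # Repeated tabs used only for alignment produce empty pieces; drop them.
        cells = [cell.strip() for cell in line.split("\t") if cell.strip()]
        if cells:
            rows.append(cells)
    return rows


lines = ["Item\tQty\tPrice", "Pen\t\t2\t1.50"]
for row in tabs_to_rows(lines):
    print(row)
```

Rows that end up with different numbers of cells are a sign that the tab layout was irregular and that the table should be rebuilt by hand.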
Manual hyphenation
Manual hyphens are hyphens inserted by hand within a word at the end of a line, which end up splitting the word.
They should be replaced with automatic hyphenation, which does not segment the word.
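One way to locate candidates is a regular expression that looks for a hyphen directly followed by a line break and a lowercase continuation. The sketch below assumes the text has been exported with line breaks represented as `\n`, and the name `find_manual_hyphens` is hypothetical. It only reports candidates, because a legitimate compound word can also break after its hyphen.

```python
import re

# Word fragment, hyphen, line break, then a continuation starting in lowercase.
# The [a-z] class covers ASCII letters only, which is a simplifying assumption.
LINE_END_HYPHEN = re.compile(r"(\w+)-\n([a-z]\w*)")


def find_manual_hyphens(text):
    """Return (first_part, second_part) pairs for words split by a line-end hyphen."""
    return [(m.group(1), m.group(2)) for m in LINE_END_HYPHEN.finditer(text)]


text = "The trans-\nlation is ready.\nA well-known tool."
for first, second in find_manual_hyphens(text):
    print(first + second)
```

Hyphens inside a line, such as the one in "well-known", are left alone because they are not followed by a line break.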
Section breaks
Avoid Section breaks at the end of the page (unless they are strictly necessary for layout purposes) and replace them with Page breaks instead. Avoiding Section breaks helps ensure a smoother translation process and more consistent formatting in the final translated document.
Text control
Checking the text carefully while preparing the document is also an important step, especially if the source PDF is scanned or of poor quality. Here is what to watch out for when reviewing the text after a conversion:
Make sure styles are correctly applied
Ensure that the correct styles are consistently applied to the text to avoid numerous tags in the CAT tool. If multiple fonts are applied to a paragraph that should use only one, it can lead to many unnecessary and disruptive tags during translation. Be sure to watch out for this!
Multiple spacings
Excessive spacing can disrupt the document's readability and layout. It's important to carefully review and adjust the spacing in the converted Word file to ensure a clean and professional appearance. Removing unnecessary spaces and correcting formatting inconsistencies will help maintain the document's intended structure and improve overall clarity.
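The same clean-up can be expressed as a simple pattern: any run of two or more spaces is collapsed into one. The sketch below uses Python's `re` module on a plain string, and the function name `collapse_spaces` is hypothetical. It deliberately leaves tabs and line breaks untouched.

```python
import re

# Two or more consecutive space characters.
MULTIPLE_SPACES = re.compile(r" {2,}")


def collapse_spaces(text):
    """Replace every run of repeated spaces with a single space."""
    return MULTIPLE_SPACES.sub(" ", text)


print(collapse_spaces("Proper  file   preparation is crucial."))
```

Before replacing everything in one pass, check whether repeated spaces were used to align text, since collapsing them would change that layout.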
No spacing between words
Sometimes the spaces between words are not detected during the conversion, resulting in text where words are joined together without proper separation. This can occur due to various factors such as poor-quality scans, unclear fonts, or the presence of noise in the original document.
Space between characters
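This is another spacing issue to watch out for: the conversion can insert a space between the individual characters of a word, so that a word such as "file" appears as "f i l e".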
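Letter-spaced words of this kind can be flagged with a pattern that looks for four or more single characters separated by single spaces. This is a heuristic sketch on plain strings, and the name `find_spaced_words` and the minimum length of four are assumptions made for the example.

```python
import re

# At least three "character + space" pairs followed by one final character.
SPACED_WORD = re.compile(r"\b(?:\w ){3,}\w\b")


def find_spaced_words(text):
    """Return fragments that look like a word with a space between every character."""
    return [m.group(0) for m in SPACED_WORD.finditer(text)]


for fragment in find_spaced_words("The f i l e is ready."):
    # Show the fragment next to its probable intended form.
    print(fragment, "->", fragment.replace(" ", ""))
```

A one-letter word standing right next to the affected word would be pulled into the match, so each hit should be reviewed before it is corrected.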
Erroneous characters
If the PDF being converted is scanned or of very poor quality, it may result in various corrupted characters during the conversion. These sections will need to be manually typed.
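Some corrupted characters can be located programmatically before retyping. The sketch below, which uses Python's standard `unicodedata` module, flags the Unicode replacement character, private-use, unassigned, and surrogate code points, and stray control characters. The function name and the choice of categories are assumptions. Characters that were misread as other valid letters cannot be detected this way and still require a visual check against the PDF.

```python
import unicodedata

# Private-use, unassigned, and surrogate code points rarely belong in body text.
SUSPECT_CATEGORIES = {"Co", "Cn", "Cs"}
# Control characters that are expected in ordinary text.
ALLOWED_CONTROLS = {"\n", "\t", "\r", "\v"}


def find_suspect_characters(text):
    """Return (index, character) pairs for characters that look corrupted."""
    hits = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        if char == "\ufffd" or category in SUSPECT_CATEGORIES:
            hits.append((index, char))
        elif category == "Cc" and char not in ALLOWED_CONTROLS:
            hits.append((index, char))
    return hits


for index, char in find_suspect_characters("Caf\ufffd menu\x07"):
    print(index, repr(char))
```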
Image Text Preparation
The document might contain images with non-editable text that you may want to localize. There are several methods available for preparing image text for translation. Check them out here!