Abstract
An approach is presented for mining repositories of web-based user documentation for patterns of evolutionary change in the context of internationalization and localization. Localized web documents that are frequently co-changed during the natural language translation process (i.e., that exhibit an evolutionary dependency) are uncovered to support the future evolution of the system. A sequential-pattern mining technique is used to uncover these patterns from version histories. Characteristics of the uncovered patterns, such as size, frequency, and occurrence within a single natural language or across multiple languages, are discussed. Such patterns help provide insight into the retranslation effort required when the documentation changes. The approach is validated on the open source K Desktop Environment (KDE) system. KDE maintains documentation in over 50 natural languages and is a prime example of the problem. The technique accurately predicts which documents in KDE are retranslated or updated in future versions.