Unicode Project

Goal

The goal of the Unicode Project is to replace the current Big5 and UTF-8 mapping tables with EACC-based one-to-one mapping tables. This avoids the inconsistencies produced by the vendor's software when it maps internal codes to Big5 and UTF-8. It also enables the Library to adopt Millennium modules and prepares the systems for eventual full conversion to Unicode.

History

Innovative approached the Library in January 2003 and recommended running a patching program on the internal codes of CJK characters in the records. After studying the issue and obtaining more information from other local sites, the Library concluded that the fundamental problem lies in the many-to-one nature of the internal codes and the mixed use of CCCII and EACC. The Library decided to take this opportunity to fix these long-term problems. At the Library's request, Innovative sent the Big5 and UTF-8 mapping tables in late May.

Methodology

First, dubious cases are extracted from the existing mapping tables and re-mapped. Second, revised tables are created, and the many-to-one pairs are extracted from them for use in data conversion. Finally, pure CCCII codes are removed and duplicate entries among the many-to-one pairs are consolidated. A new one-to-one mapping table is then created for each of Big5 and UTF-8.

Study and Findings
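The methodology described above can be illustrated with a minimal sketch. It is not the Library's in-house program (which used Perl and MarcEdit); it is a Python illustration under stated assumptions. The function name `build_one_to_one`, the `is_preferred` predicate (standing in for "this internal code is EACC rather than pure CCCII"), and all sample codes are hypothetical placeholders, not real EACC, CCCII, or Big5 values.

```python
from collections import defaultdict


def build_one_to_one(table, is_preferred):
    """Split a many-to-one mapping into a one-to-one table plus re-mappings.

    table        -- list of (internal_code, external_code) pairs, in table order
    is_preferred -- hypothetical predicate: True if an internal code should be
                    kept (for example, because it is an EACC code)
    """
    # Group internal codes by the external code they map to.
    groups = defaultdict(list)
    for internal, external in table:
        groups[external].append(internal)

    one_to_one = {}  # kept internal code -> external code
    remap = {}       # discarded internal code -> kept internal code

    for external, internals in groups.items():
        # Assumption: keep the first preferred code, else the first listed.
        keep = next((c for c in internals if is_preferred(c)), internals[0])
        one_to_one[keep] = external
        for code in internals:
            if code != keep:
                # Storing in a dict consolidates duplicate entries.
                remap[code] = keep

    return one_to_one, remap


# Made-up placeholder codes: "C-..." stands for pure CCCII, "E-..." for EACC.
sample = [("C-001", "X1"), ("E-001", "X1"), ("E-002", "X2")]
one_to_one, remap = build_one_to_one(sample, lambda c: c.startswith("E-"))
```

In this sketch, `remap` plays the role of the many-to-one pairs used for data conversion, and `one_to_one` plays the role of the new mapping table. How the preferred code is actually chosen for each group is a policy decision that the sketch leaves to the caller.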
Conversion test run

An in-house conversion program (using Perl and MarcEdit) was developed and test-run on 183398 Bib records with 880 tags.

Outstanding

The many-to-one re-mapping tables used in data conversion list each re-mapping in the format "{convert this}=>{to this}". This format should be easy for III's patching program to read. In our test run, the in-house program successfully converted all the Bib records using these re-mappings.

Although a local site could handle the conversion itself with some effort, it is hoped that III will still provide a patching program as initially proposed. This patching program should be able to read the format described above, so that each local site can easily generate its own re-mapping table for III to patch. The remaining issue to be resolved between III and the local sites is who will maintain the new mapping tables once they are replaced, and how.

Library, City University of Hong Kong. Last revised: 4 July 2003