fix(qwen-mt): support zh-CN and region-qualified language codes in lang_mapping #1120

Open
octo-patch wants to merge 1 commit into PDFMathTranslate:main from octo-patch:fix/qwen-mt-lang-mapping-zh-cn

@octo-patch
Contributor

Fixes #951

Problem

QwenMtTranslator.lang_mapping() raised KeyError: 'zh-CN' when users passed region-qualified language codes like zh-CN via the CLI:

pdf2zh paper.pdf -s qwen-mt -lo zh-CN

The root cause is that langdict contained only "zh" as the Simplified Chinese key. Because QwenMtTranslator has no lang_map class variable, BaseTranslator.__init__ does not normalise the code, so self.lang_out stays "zh-CN" and the strict langdict[input_lang] lookup fails.
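
The failure mode can be reproduced in isolation. The sketch below is a minimal stand-in rather than the project's code: LANGDICT is an abbreviated, hypothetical copy of the mapping, and strict_lang_mapping and try_mapping are illustrative names for the pre-fix lookup and a small helper that reports the outcome.

```python
# Abbreviated, hypothetical stand-in for the pre-fix langdict.
LANGDICT = {"zh": "Chinese", "en": "English"}


def strict_lang_mapping(input_lang: str) -> str:
    # Strict lookup: any code not present verbatim raises KeyError.
    return LANGDICT[input_lang]


def try_mapping(code: str) -> str:
    # Report either the mapped name or the error the lookup raised.
    try:
        return strict_lang_mapping(code)
    except KeyError as exc:
        return f"KeyError: {exc}"


print(try_mapping("zh"))     # Chinese
print(try_mapping("zh-CN"))  # KeyError: 'zh-CN'
```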

Solution

  1. Add "zh-CN" explicitly to langdict so the most common Simplified Chinese code is recognised directly.
  2. Add a prefix-based fallback: if an exact match is not found, try matching on the BCP-47 primary subtag (e.g. "zh-SG" → "zh" → "Chinese"). This future-proofs the method against other region variants without requiring exhaustive enumeration.
  3. Replace the bare KeyError with an informative message that lists supported codes, making debugging easier for users who pass unsupported languages.
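
The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the diff in this PR: LANGDICT is an abbreviated, hypothetical mapping, the function is written at module level rather than as a method of QwenMtTranslator, and the wording of the error message is invented for the example.

```python
# Abbreviated, hypothetical mapping; the real langdict covers more languages.
LANGDICT = {
    "zh": "Chinese",
    "zh-CN": "Chinese",
    "zh-TW": "Chinese",
    "en": "English",
}


def lang_mapping(input_lang: str) -> str:
    # Step 1: exact match, which now includes "zh-CN".
    if input_lang in LANGDICT:
        return LANGDICT[input_lang]
    # Step 2: fall back to the primary subtag, e.g. "zh-SG" -> "zh".
    primary = input_lang.split("-", 1)[0]
    if primary in LANGDICT:
        return LANGDICT[primary]
    # Step 3: fail with a message that lists the supported codes.
    supported = ", ".join(sorted(LANGDICT))
    raise KeyError(
        f"Unsupported language code {input_lang!r}. Supported codes: {supported}"
    )


print(lang_mapping("zh-CN"))  # Chinese (exact match)
print(lang_mapping("en-US"))  # English (prefix fallback)
```

Checking the exact match first means an explicit entry such as "zh-TW" always takes precedence over the prefix fallback. The sketch compares codes case-sensitively; whether the real method should also accept forms like "zh-cn" is left open here.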

Testing

Verified by unit-level inspection that:

  • lang_mapping("zh-CN") now returns "Chinese"
  • lang_mapping("zh") still returns "Chinese"
  • lang_mapping("zh-TW") still returns "Chinese"
  • lang_mapping("en-US") returns "English" via the prefix fallback
  • lang_mapping("xx") raises a KeyError with a helpful message

…ng_mapping

The lang_mapping method in QwenMtTranslator raised a KeyError when region-
qualified language codes like "zh-CN" were passed via the CLI (e.g.
`pdf2zh file.pdf -s qwen-mt -lo zh-CN`). The langdict only had "zh" as a
key, so any variant with a region subtag failed.

Changes:
- Add "zh-CN" explicitly to langdict so the most common Simplified Chinese
  code works out of the box
- Add a prefix-based fallback so other region-qualified codes (e.g. "zh-SG",
  "en-US") resolve gracefully through their base language code
- Replace the bare KeyError with an informative error message listing the
  supported codes

Fixes PDFMathTranslate#951

Development

Successfully merging this pull request may close these issues.

bug: qwenmt does not work properly in 2.0
