Add cjk_friendly_emphasis extension for CJK underscore emphasis by sotanengel · Pull Request #1599 · Python-Markdown/markdown

sotanengel · 2026-04-18T20:47:55Z

Summary

Adds a new cjk_friendly_emphasis extension that enables underscore emphasis (_em_, __strong__) to work correctly adjacent to CJK (Chinese, Japanese, Korean) characters.

Motivation

Python-Markdown's underscore emphasis patterns guard both delimiters with \w lookarounds, (?<!\w) before the opening underscores and (?!\w) after the closing ones, to prevent intraword emphasis in ASCII text like foo_bar_baz. However, Python 3 str patterns are Unicode-aware by default, so CJK characters also match \w, and underscore emphasis fails when it is directly adjacent to CJK text:

>>> markdown.markdown('これは__重要__です')
'<p>これは__重要__です</p>'  # Expected: <p>これは<strong>重要</strong>です</p>

Note: Asterisk emphasis (*/**) already works with CJK text because it has no word-boundary check. This extension only needs to fix underscore behavior.
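
The failure can be reproduced with the standard library alone. The sketch below is only an illustration: SMART_STRONG_SKETCH and ASTERISK_STRONG_SKETCH are simplified, hypothetical stand-ins, not the patterns Python-Markdown actually ships, and find_strong is a helper invented for this example.

```python
import re

# In Python 3, str patterns are Unicode-aware by default, so \w matches
# CJK ideographs, kana, and Hangul syllables as well as ASCII letters.
assert re.fullmatch(r"\w", "重") is not None
assert re.fullmatch(r"\w", "は") is not None
assert re.fullmatch(r"\w", "강") is not None

# Simplified stand-in for an underscore strong pattern guarded by \w
# lookarounds (an assumption for illustration, not the library's regex).
SMART_STRONG_SKETCH = re.compile(r"(?<!\w)__(.+?)__(?!\w)")

# Asterisk pattern with no word-boundary check (also illustrative).
ASTERISK_STRONG_SKETCH = re.compile(r"\*\*(.+?)\*\*")


def find_strong(pattern, text):
    """Return the emphasized text of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Adjacent CJK characters count as \w, so the underscore pattern is blocked.
print(find_strong(SMART_STRONG_SKETCH, "これは__重要__です"))
# Surrounding spaces satisfy the lookarounds, so the same text matches.
print(find_strong(SMART_STRONG_SKETCH, "これは __重要__ です"))
# The asterisk pattern has no boundary check and matches either way.
print(find_strong(ASTERISK_STRONG_SKETCH, "これは**重要**です"))
```

The first call finds nothing, while the other two find the emphasized text, which mirrors the difference between underscore and asterisk emphasis described above.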

Background

This is part of a broader effort to improve CJK emphasis handling across Markdown implementations. The root issue is documented in commonmark-spec#650. The markdown-cjk-friendly project provides a formal specification and implementations for CommonMark-based parsers.

While Python-Markdown follows Gruber's original Markdown rather than CommonMark, the CJK underscore emphasis issue is the same fundamental problem: word-boundary assumptions designed for space-separated languages fail for CJK text.

Changes

New file: markdown/extensions/cjk_friendly_emphasis.py

Defines CJK-aware boundary patterns: (?:(?<!\w)|(?<=CJK_CHAR)) instead of (?<!\w)
Creates CJKUnderscoreProcessor that overrides UnderscoreProcessor with CJK-friendly regex patterns
CJK character class covers: CJK Unified Ideographs, Hiragana, Katakana, Hangul Syllables, fullwidth forms, and related blocks
Follows the same pattern as legacy_em.py for extension structure
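
The relaxed boundary described in this list can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the code in cjk_friendly_emphasis.py: the ranges in CJK_CHAR are a guess at the blocks named above (the PR's actual class may differ), and OPEN_BOUNDARY, CLOSE_BOUNDARY, CJK_STRONG_SKETCH, and find_strong are hypothetical names.

```python
import re

# Hypothetical character class; the exact ranges used by the PR are not
# given in this description and may differ.
CJK_CHAR = (
    r"["
    r"\u3000-\u303f"  # CJK Symbols and Punctuation
    r"\u3040-\u309f"  # Hiragana
    r"\u30a0-\u30ff"  # Katakana
    r"\u4e00-\u9fff"  # CJK Unified Ideographs
    r"\uac00-\ud7a3"  # Hangul Syllables
    r"\uff00-\uffef"  # Halfwidth and Fullwidth Forms
    r"]"
)

# Either the usual "no word character here" check passes, or the
# neighbouring character is CJK.
OPEN_BOUNDARY = rf"(?:(?<!\w)|(?<={CJK_CHAR}))"
CLOSE_BOUNDARY = rf"(?:(?!\w)|(?={CJK_CHAR}))"

# Simplified strong-emphasis pattern built from the relaxed boundaries.
CJK_STRONG_SKETCH = re.compile(
    OPEN_BOUNDARY + r"__(?!_)(.+?)(?<!_)__" + CLOSE_BOUNDARY
)


def find_strong(text):
    """Return the emphasized text of the first match, or None."""
    match = CJK_STRONG_SKETCH.search(text)
    return match.group(1) if match else None


print(find_strong("これは__重要__です"))      # CJK on both sides
print(find_strong("이것은__강조__입니다"))    # Hangul on both sides
print(find_strong("foo__bar__baz"))           # ASCII intraword case
```

In this sketch the two CJK inputs match and the ASCII intraword input does not, because the alternation only relaxes the check when the neighbouring character falls inside CJK_CHAR.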

New file: tests/test_syntax/extensions/test_cjk_friendly_emphasis.py

14 test cases covering:

Japanese: __重要__, __「異常」__, __重要。__
Chinese: __强调__
Korean: __강조__
Mixed CJK/Latin
ASCII intraword protection preserved (foo_bar_baz, foo__bar__baz)
Asterisk emphasis unchanged
Without-extension baseline verification

Usage

import markdown
html = markdown.markdown('これは__重要__です', extensions=['cjk_friendly_emphasis'])
# '<p>これは<strong>重要</strong>です</p>'

Design decisions

Extension, not core change — opt-in via extensions=['cjk_friendly_emphasis'], no change to default behavior
Underscore only — asterisk emphasis already works with CJK; only _/__ need fixing
ASCII protection preserved — foo_bar_baz remains unaffected because the boundary relaxation only applies to CJK characters
Follows legacy_em.py pattern — minimal code, subclasses UnderscoreProcessor, same registration mechanism

Test plan

All 14 CJK-specific tests pass
All 1099 existing tests pass (0 failures, 110 skipped as before)
ASCII intraword underscore protection unchanged

🤖 Generated with Claude Code

Python-Markdown's underscore emphasis (`_em_`, `__strong__`) uses `\w` word boundaries which fail with CJK text because CJK characters match `\w` in Python 3, preventing emphasis adjacent to CJK characters.

This extension relaxes the boundary check so CJK characters are treated as valid emphasis boundaries while preserving ASCII intraword protection (e.g., `foo_bar_baz` remains unaffected).

Usage: `markdown.markdown(text, extensions=['cjk_friendly_emphasis'])`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sotanengel mentioned this pull request Apr 18, 2026

Emphasis with CJK punctuation commonmark/commonmark-spec#650

Open

sotanengel wants to merge 1 commit into Python-Markdown:master from sotanengel:feat/cjk-friendly-emphasis
