
Non-Latin characters (such as Japanese) and Markdown preview anchor #20626

@satokaz

  • VSCode Version: Code 1.9.1 (f9d0c68, 2017-02-08T23:31:51.320Z)
  • OS Version: Darwin x64 16.4.0

I recognize that this is not a problem with VS Code itself but with a library that the Markdown extension uses.

PR: markdown-it-named-header custom slugify for non-latin characters #20628

PROBLEM:

I want to create in-page links using anchors in Markdown Preview, but this does not work for headers written in non-Latin characters.
If a header is written in non-Latin characters (such as Japanese), those characters are treated as invalid and removed, so the id attribute ends up empty.

BACKGROUND:

markdown-it-named-headers (mdnh) generates an id attribute for each Markdown header element and uses the slug function provided by string.js to convert the header text into a valid URL slug.

example:

# Example Header   -->   <h1 id="example-header">Example Header</h1>

Non-Latin characters are a problem here: the slug function treats them as invalid and strips them, so a header written in non-Latin characters (such as Japanese) gets an empty id.

On investigation, the slug function does not handle non-Latin characters correctly.
(Arguably this is correct behavior, in the sense that it removes characters that cannot be used in a URL.)
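To illustrate why the id ends up empty, here is a minimal sketch of a slug function that keeps only ASCII letters, digits, whitespace and hyphens. `latinOnlySlug` is a hypothetical approximation of the behavior described above, not the actual string.js implementation.

```typescript
// Hypothetical approximation of a Latin-only slug function.
// This is NOT the string.js source; it only mimics the behavior described in this issue.
function latinOnlySlug(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "") // drop everything except ASCII letters, digits, whitespace, hyphens
    .trim()
    .replace(/[\s-]+/g, "-"); // collapse runs of whitespace/hyphens into one hyphen
}

for (const header of ["Example Header", "test", "さくら", "さくら 桜", "🌸"]) {
  console.log(JSON.stringify(header), "->", JSON.stringify(latinOnlySlug(header)));
}
```

Every header made only of Japanese characters or emoji collapses to an empty string, which matches the empty `id=""` attributes in the rendered HTML.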

Sample Markdown:

* [test](#test)
* [さくら](#さくら)
* [さくら 桜](#さくら-桜)
* [🌸](#🌸)

## test
## さくら
## さくら 桜
## 🌸

Rendered HTML:

<li data-line="0" class="code-line"><a href="#test">test</a></li>
<li data-line="1" class="code-line"><a href="#%E3%81%95%E3%81%8F%E3%82%89">さくら</a></li>
<li data-line="2" class="code-line"><a href="#%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C">さくら 桜</a></li>
<li data-line="3" class="code-line"><a href="#%F0%9F%8C%B8">🌸</a></li>
</ul>
<h2 data-line="5" class="code-line" id="test">test</h2>
<h2 data-line="6" class="code-line" id="">さくら</h2>      <--- here
<h2 data-line="7" class="code-line" id="">さくら 桜</h2>   <--- here
<h2 data-line="8" class="code-line" id="">🌸</h2>         <--- here

When a link points to an anchor (#anchor), the Markdown extension renders an <a href=""> tag for it.
In doing so, the non-Latin characters in the href appear to be percent-encoded, as encodeURI would do.
I think this is a very good approach.
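The href values in the rendered HTML match what `encodeURI` produces for the link targets. A quick check, assuming the extension's encoding behaves like `encodeURI` for these inputs (the issue only says it appears to):

```typescript
// Link targets from the sample Markdown above.
const targets: string[] = ["#test", "#さくら", "#さくら-桜", "#🌸"];

// encodeURI leaves '#' and '-' untouched and percent-encodes
// the UTF-8 bytes of every non-ASCII character.
for (const target of targets) {
  console.log(target, "->", encodeURI(target));
}
// #test -> #test
// #さくら -> #%E3%81%95%E3%81%8F%E3%82%89
// #さくら-桜 -> #%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C
// #🌸 -> #%F0%9F%8C%B8
```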

markdown-it-named-headers has an option for supplying a custom slugify function.
So, to handle headers written in non-Latin characters, I decided to define a custom slugify function.

.use(mdnh, {
	slugify: function (header: string) {
		return encodeURI(header.trim()
			.toLowerCase()
			.replace(/[\]\[\!\"\#\$\%\&\'\(\)\*\+\,\.\/\:\;\<\=\>\?\@\\\^\_\{\|\}\~]/g, '') // Remove symbols
			.replace(/\s+/g, '-') // Replace whitespace runs with hyphens
			.replace(/\-+$/, '')); // Remove trailing hyphens
	}
})

HTML rendered with custom slugify applied:

<li data-line="0" class="code-line"><a href="#test">test</a></li>
<li data-line="1" class="code-line"><a href="#%E3%81%95%E3%81%8F%E3%82%89">さくら</a></li>
<li data-line="2" class="code-line"><a href="#%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C">さくら 桜</a></li>
<li data-line="3" class="code-line"><a href="#%F0%9F%8C%B8">🌸</a></li>
</ul>
<h2 data-line="5" class="code-line" id="test">test</h2>
<h2 data-line="6" class="code-line" id="%E3%81%95%E3%81%8F%E3%82%89">さくら</h2>
<h2 data-line="7" class="code-line" id="%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C">さくら 桜</h2>
<h2 data-line="8" class="code-line code-active-line" id="%F0%9F%8C%B8">🌸</h2>
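Pulling the slugify function out of the `.use(mdnh, ...)` call makes it easy to check on its own that each generated id equals the fragment of the corresponding href. `customSlugify` is just a name chosen for this sketch; the regular expressions are copied from the snippet above.

```typescript
// Standalone copy of the custom slugify passed to markdown-it-named-headers above.
function customSlugify(header: string): string {
  return encodeURI(
    header
      .trim()
      .toLowerCase()
      .replace(/[\]\[\!\"\#\$\%\&\'\(\)\*\+\,\.\/\:\;\<\=\>\?\@\\\^\_\{\|\}\~]/g, "") // remove symbols
      .replace(/\s+/g, "-") // replace whitespace runs with hyphens
      .replace(/-+$/, "") // remove trailing hyphens
  );
}

// Each header paired with the href rendered for its link in the HTML above.
const cases: Array<[string, string]> = [
  ["test", "#test"],
  ["さくら", "#%E3%81%95%E3%81%8F%E3%82%89"],
  ["さくら 桜", "#%E3%81%95%E3%81%8F%E3%82%89-%E6%A1%9C"],
  ["🌸", "#%F0%9F%8C%B8"],
];

for (const [header, href] of cases) {
  const id = customSlugify(header);
  console.log(`${header}: id="${id}", matches href: ${"#" + id === href}`);
}
```

Because the id is produced with the same percent-encoding that the link href gets, clicking `[さくら](#さくら)` lands on the matching header.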

Screenshot: anchor (image not included)

This works best for me. Please let me know if there is a better way.

Labels: verification-needed (Verification of issue is requested), verified (Verification succeeded)