Custom Internationalization (i18n) with Jekyll

One blog and two languages: bringing a better reading experience to the readers.

This article is translated with www.DeepL.com/Translator (free version) and reviewed by Mincong.

Introduction

During my visit to my family back home in April this year, I came across a lot of technical content in Chinese when I was looking for information, but I also found that not many of them have good quality. So I started to write Chinese articles in my blog, hoping to contribute to the Chinese developer community. More concretely, I wrote in two languages: Chinese and English. But in practice, I found that it is not a good experience for reader to see two languages in one blog.

After I started writing in Chinese, several times my colleagues found my popular English articles in Google Search and started reading them. This thing made me embarrassed, because since April, my articles are all in Chinese. Let me think about it from their point of view: what happens if after reading, they are curious to read more articles? They may click on the home page and then surprised to see a bunch of Chinese blogs: they may feel confused and feel like they are in the wrong place. For someone who doesn’t know another language, the experience can be very bad. The opposite also holds true: when a Chinese friend sees my blog and sees a bunch of English articles, it feels hard to get interested in reading them. If the articles are written in their native language, it will be much more user-friendly.

That’s why I want to do internationalization: I want to provide a comfortable reading experience for every reader. I want to have a clear distinction between the different languages in the blog, so that when people visit, they can read the content in the language they are familiar with, no matter which page they click on. Then the blog itself can also provide options for people to switch to another language.

This post will share with you the internationalization of my blog.

Proposals

There are serveral proposals for internationalization, and I’ll discuss their feasibility below.

  1. Provide translation feature. Embed translation button in the article and use third-party translators (Bing, Google, DeepL, etc.) to translate when user clicks the translation button.
  2. Interlink English and Chinese articles. Embed a link to English article in Chinese article and another link to Chinese article in English article.
  3. Introduce the concept of page key. Chinese and English articles share the same page key.
  4. Use two collections: posts and cn.

Final plan: Use two collections.

Proposal 1: Provide Translation Feature

Embed a translation button inside the article and use a third-party translator (Bing, Google, DeepL, etc.) to translate when the translation button is clicked. The rationale is that my blog is not very visited, with about 18,000 visitors per month. And I am not a professional writer, purely writing for fun and not making money. There is no need to be so serious. This feature is inspired by Chrome’s Translate button, which allows you to translate a page in a non-frequently used language by clicking the Translate button in the URL input field, or by right-clicking on the page content.

The advantages of this proposal are:

  • It’s easy to implement

The disadvantages of this proposal are:

  • There are no two articles, there is only one.
  • No correction for translation results
  • Without articles, we can’t attract readers through articles

Proposal 2: Interlinking English and Chinese Posts

Add a link to the corresponding article at the beginning of each article in order to switch languages. That is, embed a link to an English article in a Chinese article and a link to a Chinese article in an English article. On the web page, add a button or an icon to achieve language switching. This way, readers can access another language version of the article by clicking this button or this icon while reading.

For example, for the article “Implementing backward-compatible schema changes in MongoDB”, the switch between the English and Chinese versions of the article can be implemented in the following form.

English version.

  ---
  layout: post
  title: Making Backward-Compatible Schema Changes in MongoDB
  date: 2021-02-27 17:07:27 +0100
  categories: [java-serialization, reliability]
  tags: [java, mongodb, serialization, jackson, reliability]
  comments: true
+ lang: en
+ version:
+ en-CN: 2021-04-30-mongodb-schema-compatibility.md
  ...
  ---

Chinese version.

  ---
  layout: post
  title: Is it really that easy to add and delete fields in MongoDB?
  date: 2021-04-30 23:09:38 +0800
  categories: [java-serialization, reliability]
  tags: [java, mongodb, serialization, jackson, reliability]
  comments: true
+ lang: zh
+ version:
+ en-US: 2021-02-27-mongodb-schema-compatibility.md
  ...
  ---

The advantages of this proposal are:

  • The links to existing articles remain unchanged and do not affect SEO

The disadvantages of this proposal are:

  • It is not possible to know the link of another version of the other article through the article link.
  • If you change the link to the article, you should remember to change the referer, i.e. in the other language page.

Proposal 3: Shared Page Key between Chinese and English

Introduce the concept of page key in each article. When a user accesses the article, the page URL contains both the language and the page key. More precisely, it follows the following expression.

https://mincong.io/{lang}/{page-key}
https://mincong.io/en/mongodb-schema-compability
https://mincong.io/cn/mongodb-schema-compability

English version (ideal state).

  ---
  layout: post
  title: Making Backward-Compatible Schema Changes in MongoDB
  date: 2021-02-27 17:07:27 +0100
  categories: [java-serialization, reliability]
  tags: [java, mongodb, serialization, jackson, reliability]
  comments: true
+ key: mongodb-schema-compatibility
+ lang: en
  ...
  ---

Chinese version (ideal state).

  ---
  layout: post
  title: Is it really that easy to add and delete fields in MongoDB?
  date: 2021-04-30 23:09:38 +0800
  categories: [java-serialization, reliability]
  tags: [java, mongodb, serialization, jackson, reliability]
  comments: true
+ key: mongodb-schema-compatibility
+ lang: zh
  ...
  ---

The advantage of this proposal are:

  • The article link contains the language. So you can also know the link of the other version by naming convension.
  • Hide date from URL
  • Thanks to the permalink pattern, you can write two articles a day (Chinese + English), which was not possible before when using date + title as URL.

The disadvantages of this proposal are:

  • May affect SEO

Originally this was my perferred solution. Unfortunately it is not possible to implement. Because not all variables are available as part of Jekyll’s permalink. For example, Jekyll does not support custom variable lang as part of a link. See the official documentation Permalinks for variables supported by Permalinks.

Proposal 4: Use Two Collections

The first collection is the default posts and the second collection is cn.

The advantage of this is the same as proposal 3:

  • The article link contains the language. So you can also know the link of the other version by naming convension.
  • Hide date from URL
  • Thanks to the permalink pattern, you can write two articles a day (Chinese + English), which was not possible before when using date + title as URL.

The downside of this are:

  • The default plugin jekyll-paginate only supports pagination for the default collection posts. If you need to paginate another collection, you need to use the jekyll-paginate-v2 plugin. However, GitHub Pages does not officially support the jekyll-paginate-v2 plugin.

Other Considerations

  • Whether the theme you are currently using has support for internationalization? For example, I use Jekyll TeXt Theme, which has some support for internationalization itself. The information in the header and footer of the browsing page can be automatically adjusted according to the language of the page. However, it does not translation for the page content directly.
  • If you’re using GitHub Pages, consider whether GitHub Pages has support for the plugins that you use. Only some of the Jekyll plugins are officially supported by GitHub, and others won’t work even if you install them. This will affect you unless you don’t use the official site generation, you can generate pages locally yourself or generate them from your custom CI pipeline.
  • Consider using another Jekyll internationalization plugin, such as jekyll-multiple-languages-plugin. I didn’t look into it at the time I wrote the proposals, and only found out about this plugin after the project was done… But this plugin is not supported by GitHub Pages neither.

Other Websites

How do other blogs work? Is there anything we can learn from them?

Elasticsearch Blog

Elastic’s blog is internationalized, and each article is available in multiple languages, such as the following article: How to design a scalable Elasticsearch data storage architecture.

Elasticsearch blog i18n

It is listed below in three languages.

Language Links
How to design your Elasticsearch data storage architecture for scale https://www.elastic.co/blog/how-to-design-your- elasticsearch-data-storage-architecture-for-scale
How to design your Elasticsearch data storage architecture for scale https://www.elastic.co/cn/blog/how-to-design-your-elasticsearch-data-storage-architecture-for- scale
スケーラブルなElasticsearchデータストレージを设计する https://www.elastic.co/jp/blog/how-to-design-your-elasticsearch-data-storage-architecture-for -scale

It is named in the following way.

https://www.elastic.co/blog/{post}
https://www.elastic.co/{country}/blog/{post}

English blogs do not have EN prefix, other languages use country abbreviation as prefix, for example, CN for China, JP for Japan.

TeXt Theme

Jekyll TeXt Theme is a highly customizable Jekyll theme for personal or team websites, blogs, projects, documents and more. It references the iOS 11 style with big and prominent headers and rounded buttons and cards. It was written by Alibaba’s engineer Tian Qi (kitian616). This theme supports internationalization. In fact, the documentation of this theme itself is internationalized. If you don’t believe me, see this table:

Language Links
Quick Start https://tianqi.name/jekyll-TeXt-theme/docs/en/quick-start
Quick Start https://tianqi.name/jekyll-TeXt-theme/docs/zh/quick-start

It is named in the following way.

https://tianqi.name/jekyll-TeXt-theme/docs/{lang}/{post}

Whatever the language, the language abbreviation is used as the prefix, for example, zh for Chinese, en for English.

Final solution

The final solution is proposal 4: use two collections. The first collection is the default posts and the second collection is cn. The main goal is to modify the article links to the following format.

https://mincong.io/{country}/{post}
https://mincong.io/{country}/{page}

The two parts of the link here.

  • country is the country, EN for English-speaking countries and CN for China. This expression was better than using locale en/zh because it’s not only a matter of language, but also the components loaded by the page: for example, the Chinese page will suggest WeChat but not the English version. In the future, I’ll also consider splitting the other components into two different versions: Chinese and English pages load different comment systems, different SEO scripts, etc.
  • The post or page is the ID of the blog post or the ID of another page.

Next, I want to share with you the specific tasks that need to be done when implementing internationalization.

Tasks

This section is a detailed explanation of the main tasks that need to be done. This section may be a bit long, it’s mainly for those who are interested in changing their blogs for real. If you don’t want to internationalize your site, I suggest avoid reading it into details.

Task 1: Modify Chinese Articles

Modify the article link to the following format.

https://mincong.io/cn/{post}

Since most of the Chinese articles were written after April this year, there is no need to keep the original links. At the beginning of each article, add two pieces of information: language and link redirection.

+ lang: zh
  date: 2021-04-20 11:21:16 +0800
  categories: [java-core]
  tags: [java, akka]
@@ -13,6 +14,8 @@ excerpt: >
  image: /assets/bg-ocean-ng-L0xOtAnv94Y-unsplash.jpg
  cover: /assets/bg-ocean-ng-L0xOtAnv94Y-unsplash.jpg
+ redirect_from:
+ - /2021/04/20/exponential-backoff-in-akka/
  article_header:

Then create a new collection called cn. Store it in the folder _cn according to Jekyll naming requirements, then put all Chinese articles in that folder and remove the “year, month and day” part of the file name.

Changes in article links.

In addition, in the global configuration file (_config.yml), configure the information about the cn collection, such as the permalink, whether to display the table of contents, etc. For details, see: https://github.com/mincong-h/mincong-h.github.io/pull/31

Task 2: Modify English Articles

I have 168 English articles on my blog, some of which have important page views. I don’t want them to lose any information because of the internationalization, such as comments and likes on Disqus. So my strategy for English articles is to not make any changes to existing articles and only change the new articles. For new articles, I use the new naming convention https://mincong.io/en/{post}. In the following paragraphs, let’s discuss it further.

For all existing articles, explicitly mark the article language as English in the front matter at the article level.

find _posts -type f -exec sed -i '' -E 's/date:/i lang: en' {} +

And after adding permalink so that they are not interfered with by the global configuration.

#! /bin/bash
paths=($(find "${HOME}/github/mincong-h.github.io/_posts" -type f -name "*.md" | tr '\n' ' '))
i=0
for path in "${paths[@]}"
do
    filename="${path##*/}"
    year=$(echo $filename  | sed -E 's/^([[:digit:]]+)-([[:digit:]]+)-([[:digit:]]+)-(. *)\.md/\1/')
    month=$(echo $filename | sed -E 's/^([[:digit:]]+)-([[:digit:]]+)-([[:digit:]]+)-(. *)\.md/\2/')
    day=$(echo $filename   | sed -E 's/^([[:digit:]]+)-([[:digit:]]+)-([[:digit:]]+)-(. *)\.md/\3/')
    name=$(echo $filename  | sed -E 's/^([[:digit:]]+)-([[:digit:]]+)-([[:digit:]]+)-(. *)\.md/\4/')
    permalink="/${year}/${month}/${day}/${name}/"
    echo "${i}: year=${year}, month=${month}, day=${day}, name=${name}, permalink=${permalink}"
    sed -i '' -E '/comments:/i\
permalink: PERMALINK
' "$path"
    sed -i '' "s|PERMALINK|${permalink}|" "$path"
    i=$((i + 1))
done

For new articles, use the new naming convention (_config.yml).

- permalink: /:year/:month/:day/:title/
+ permalink: /en/:title/

Also you need to modify the post generation script newpost.sh to make it generate both Chinese and English posts. Here is an excerpt from the script: we generate the paths for both Chinese and English posts, confirm that they do not exist, and then add new content.

title="${*:1}"

if [[ -z "$title" ]]; then
    echo 'usage: newpost.sh My New Blog'
    exit 1
fi

bloghome=$(cd "$(dirname "$0")" || exit; pwd)
url=$(echo "$title" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
filename="$(date +"%Y-%m-%d")-$url.md"
filepath_en="${bloghome}/_posts/${filename}"
filepath_cn="${bloghome}/_cn/${filename}"

if [[ -f "$filepath_en" ]]; then
    echo "${filepath_en} already exists."
    exit 1
fi

if [[ -f "$filepath_cn" ]]; then
    echo "${filepath_cn} already exists."
    exit 1
fi

append_metadata_en "$filepath_en" "$title"
append_metadata_cn "$filepath_cn" "$title"

# Not for EN, because EN post is translated.
append_content "$filepath_cn"

echo "Blog posts created!"
echo " EN: ${filepath_en}"
echo " CN: ${filepath_cn}"

For more details, see: https://github.com/mincong-h/mincong-h.github.io/pull/37

Task 3: Adding a Chinese Homepage

Adding a Chinese homepage sounds easy, as if all you need to do is copy index.html from the blog home page to cn/index.html and translate a few words. Actually, it is way more complex than that. I use the official Jekyll plugin jekyll-paginate (v1) for my home page. But this plugin only supports pagination for the default set posts, not for other sets, such as cn. So the real meaning of adding a Chinese homepage is to upgrade the plugin to jekyll-paginate-v2 to support pagination for the Chinese collection cn.

Install and use the new plugin in the site configuration (_config.yml) at

- paginate: 8
- paginate_path: /page:num # don't change this unless for special need
+ pagination:
+   enabled: true
+   per_page: 8


  ## => Sources
@@ -238,7 +240,7 @@ defaults:
  ##############################
  plugins:
    - jekyll-feed
-   - jekyll-paginate
+   - jekyll-paginate-v2

Modified the paginator of the TeXt Theme theme itself to avoid using site.posts directly as a source for posts. And also add a specific prefix to the homepage, so that English and Chinese have their own homepage, i.e. https://mincong.io/ and https://mincong.io/cn/.


- {%- assign _post_count = site.posts | size -%}
+ {%- assign _post_count = paginator.total_posts -%}
      {%- assign _page_count = paginator.total_pages -%}
      <p>{{ _locale_statistics | replace: '[POST_COUNT]', _post_count | replace: '[PAGE_COUNT]', _page_count }}</p>
      <div class="pagination__menu">
@@ -51,7 +51,7 @@
              </li>

            {%- elsif page == 1 -%}
-             {%- assign _home_path = site.paths.home | default: site.data.variables.default.paths.home -%}
+             {%- assign _home_path = site.paths.home | default: site.data.variables.default.paths.home | append: include.baseurl -%} 
              {%- include snippets/prepend-baseurl.html path=_home_path -%} 

There are actually some other modifications to consider, but I won’t expand on them due to the timing. Here is the final result for the home page, a comparison between English and Chinese:

comparison for the home page between English and Chinese

For more details see: https://github.com/mincong-h/mincong-h.github.io/pull/32

Task 4: Modifying Build and Deployment

You can no longer use the old automatic deployment method because of jekyll-paginate-v2, a plugin that is not officially supported by GitHub. Now you need to deploy it manually or via the CI. That is, you no longer deploy from the master branch. After the code is merged into master, the new pages are generated manually or by CI (core command: jekyll build). Then, the generated content, which is in the folder _site, is uploaded to the gh-pages branch for deployment.

To do it manually, the main steps are as follows: create a new, master-independent branch gh-pages, add an empty commit as the start of the branch, then empty the local Jekyll generated files folder _site and connect it to the new branch gh-pages

git checkout --orphan gh-pages
git commit --allow-empty -m "Initialize gh-pages"
rm -rf _site
git worktree add _site gh-pages
# "jekyll build" or equivalent commands

To implement this task, you also need to change the branch to deploy from master to ph-pages in the GitHub project settings.

Change the deployment method: no longer use master branch but gh-pages branch deployment

For more information, see: Sangsoo Nam, Using Git Worktree to Deploy GitHub Pages, 2019.

To do it via the CI (GitHub Actions in my case), you can use the following workflow:

name: Deploy to GitHub Pages
on:
  push:
    branches:
      - master
      - docker # testing

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    env:
      JEKYLL_ENV: production
    steps:
    - name: Checkout source code
      uses: actions/checkout@v2
      with:
        persist-credentials: false
    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: 2.6 # Not needed with a .ruby-version file
        bundler-cache: true # runs 'bundle install' and caches installed gems automatically
    - name: Install dependencies in the Gemfile
      run: |
         bundler install --path vendor/bundle
    - name: Build Jekyll website
      run: |
         bundle exec jekyll build
    - name: Deploy GitHub Pages
      uses: JamesIves/github-pages-deploy-action@4.1.4
      with:
        branch: gh-pages
        folder: _site

Task 5: Modifying More Pages

In the task above, we mainly mentioned modifications for Chinese articles and English articles. But a website has many other pages besides articles, such as categories, series, archives, about, etc. These pages also need to be modified before they can be used properly.

The main objective is to ensure a consistent user-experience for browsing. When navigating between pages, all links in English pages will lead to English pages, and all links in Chinese pages will lead to Chinese pages. This creates a comfortable reading experience for the user: because all pages are in a language they are familiar with. As for the pages that already exist, we need to redirect them to the new links. The following is a list of new pages and the redirection of existing pages.

Home page:

https://mincong.io/
https://mincong.io/cn/

Category pages:

https://mincong.io/en/categories/
https://mincong.io/en/categories/{category}/
https://mincong.io/cn/categories/
https://mincong.io/cn/categories/{category}/

https://mincong.io/categories/ -> https://mincong.io/en/categories/
https://mincong.io/categories/{category} -> https://mincong.io/en/categories/{category}/

Series pages:

https://mincong.io/en/series/
https://mincong.io/en/series/{serie}/
https://mincong.io/cn/series/

https://mincong.io/series/ -> https://mincong.io/en/series/
https://mincong.io/series/{serie} -> https://mincong.io/en/series/{serie}/

About page:

https://mincong.io/en/about/
https://mincong.io/cn/about/

https://mincong.io/about/ -> https://mincong.io/en/about/

Archived pages:

https://mincong.io/en/archive/
https://mincong.io/cn/archive/

https://mincong.io/archive/ -> https://mincong.io/en/archive/

For more information, see: https://github.com/mincong-h/mincong-h.github.io/pull/34

Task 6: Language Switching Button

Provide a language switch button in the website to enable languange switching. There are two main buttons here: one in the top right corner of the page, displayed as a flag, and another button in the title section of the article, highlighted in red for the current language and white for the optional other languages. The difference between these two buttons is that the top-right button will switch to the home page in another language when clicked, while the language button on the page will make the page jump directly to another version of the same article. I call them “global switching” and “article switching” feature.

English post page example

Chinese article page example

For the global switching feature, the main tasks are to write the flag, link, and other information of another language in the configuration file of the page navigation, and then use these information when the page is generated.

Register information to the data file of the page navigation (_data/navigation.yml).

site:
  ...
  # switch to the other langage
  urls2:
    en : /cn/
    zh : /
  urls2_src:
    en : /assets/flag-CN.png
    zh : /assets/flag-US.png
  urls2_alt:
    en : "Switch to Chinese"
    zh : "Switch to English"

The header (_includes/header.html) should include this element as well.

<li>
  <a href="{{ _site_root2 }}">
    <img src="{{ _site_root2_src }}"
         alt="{{ _site_roo2_alt }}"
         class="navigation__lang_img">
  </a>
</li>

For the local toggle feature, the implementation is quite different. This is achieved by looking for articles with the same name in a collection of other languages. Here, articles in different languages must use the same filename, otherwise they cannot be found. Specifically, we first get the article ID, then extract the characters after the last slash / (with the slash /), then take this information to traverse other collections and return the corresponding link:

{% assign _id = include.article.id %}
{% assign _filename = _id | split:"/" | last %}
{% assign _suffix = _filename | prepend: "/" %}
{% assign _matched = include.collection | where_exp: "item", "item.id contains _suffix" | first %}

{% if _matched %}
  {% assign __return = _matched.url %}
{% else %}
  {% assign __return = nil %}
{% endif %}

For more information, see.

Remaining Tasks

Having done this, the entire internationalization task is basically done. The following tasks can be addressed in the future to improve the situation:

  1. Implement both Chinese and English RSS feeds.
  2. Load more Chinese components on Chinese pages, such as loading WeChat’s SDK for sharing, loading Baidu’s SDK to improve search presence on Chinese search engine, replacing Disqus with another commenting system that can be loaded in mainland China without VPN, and links to other Chinese developer platforms.
  3. Automate the Chinese-to-English translation by scripting translation requests directly to third-party translation platforms, such as Google Translate, DeepL, etc.
  4. Fix the tag-cloud feature in the archive page. The tag-cloud currently uses site.tags for tag-related statistics. However, all the tags of Chinese articles (under the cn collection) are not taken into account.
  5. Fix the article category pages. Now the category page can show text in Chinese, but the actual article list is retrieved from English collection posts rather than Chinese collection cn.

If you have other suggestions, please feel free to leave a comment!

Going Further

How to go further from this article?

  • If you’ve never heard of Jekyll, you can visit official website to learn about this great blogging engine.
  • If you’ve never tried the free GitHub Pages, visit the official website and try to build your and host your own personal blog for free!
  • If you haven’t tried Jekyll TeXt Theme by Qi Tian, maybe you would like to try it.
  • If you want to learn more about jekyll-paginate-v2, you can visit their GitHub project.

Conclusion

In this article, we have seen the process of internationalizing of this site https://mincong.io, an internationalization based on Jekyll and TeXt Theme. We compared the pros and the cons of the four proposals; we looked at other blogs’ implementations of internationalization; we listed the six of the more important tasks; and we looked further into the next steps for internationalization. Finally, I also share some resources for you to going further from this article. I hope this article has given you some insights. If you’re interested in more information and advice, please follow me on GitHub mincong-h. Thank you all!

References