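[Ultra-High Performance] Running "FuguMT", a Free and Lightweight Japanese-English Translation AI, Locally on Windows

This post explains how to run "FuguMT", a lightweight yet highly accurate translation AI specialized for English and Japanese, and includes copy-paste code that I modified to make it easier to use.

The model size is only about 150 MB, so anyone can run it locally!!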

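1. Install the requirements
2. Try running it
- Code for English → Japanese translation
- Code for Japanese → English translation
3. Automatically detect whether the input is English or Japanese, then translate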

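1. Install the requirements

Get Python working, then run the two commands below. The pipeline also needs a deep learning backend such as PyTorch, so install that too if you do not already have it.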

pip install transformers sentencepiece

pip install sacremoses
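
If you want to confirm that the installation worked before downloading any model, a small check like the one below may help. This is only a sketch: the helper name missing_packages is made up for illustration, and it only checks that the three packages from the commands above can be found, not that their versions are compatible.

```python
import importlib.util

# Packages installed by the two pip commands above.
REQUIRED = ["transformers", "sentencepiece", "sacremoses"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```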

2. Try running it

Following the official Hugging Face usage, we use pipeline here.

Code for English → Japanese translation

from transformers import pipeline

input_text = """Kyoto is one of the tourist destinations where you can feel the traditional culture of Japan. 
There are many famous shrines and temples, and you can enjoy the beautiful scenery of the four seasons."""

ej_translator = pipeline("translation", model="staka/fugumt-en-ja")
result = ej_translator(input_text)
print(result)

As you can see, translation takes only a few lines. It took a few seconds to run.

Running the code above prints the following.

[{'translation_text': '京都は、日本の伝統的な文化を感じることができる観光地の一つで、有名な神社や寺院が多く、四季折々の美しい景色を楽しむことができます。'}]
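
The printed result is a list containing one dict per input, with the text stored under the 'translation_text' key, so the translated string can be pulled out by indexing rather than printing the whole structure. A minimal sketch, where extract_translations is a hypothetical helper name and the sample is the output copied from above:

```python
def extract_translations(pipeline_output):
    """Pull the translated strings out of a translation pipeline result."""
    return [item["translation_text"] for item in pipeline_output]


# Sample output copied from the run above, so no model download is needed here.
sample = [{'translation_text': '京都は、日本の伝統的な文化を感じることができる観光地の一つで、有名な神社や寺院が多く、四季折々の美しい景色を楽しむことができます。'}]

print(extract_translations(sample)[0])
```

With a real translator, the same thing would be written as result[0]["translation_text"].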

Code for Japanese → English translation

from transformers import pipeline

input_text = "京都は、日本の伝統的な文化を感じることができる観光地の一つで、有名な神社や寺院が多く、四季折々の美しい景色を楽しむことができます。"
je_translator = pipeline("translation", model="staka/fugumt-ja-en")
result = je_translator(input_text)
print(result)

Running it the same way gave the translation below.

[{'translation_text': 'Kyoto is one of the tourist destinations where you can feel the traditional culture of Japan, there are many famous shrines and temples, and you can enjoy the beautiful scenery of each season.'}]

3. Automatically detect whether the input is English or Japanese, then translate

Copy-paste code

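import string
from transformers import pipeline

# Characters treated as "half-width": ASCII letters, digits, punctuation, and space.
HALFWIDTH_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " ")


def is_halfwidth(s):
    """Return True if s consists only of half-width (ASCII) characters."""
    return all(c in HALFWIDTH_CHARS for c in s)


# Load both models once, up front, so each later translation is fast.
ej_translator = pipeline("translation", model="staka/fugumt-en-ja")
je_translator = pipeline("translation", model="staka/fugumt-ja-en")

while True:  # Press Ctrl+C to exit.
    # Prompt: "Enter text to translate (en or ja is detected automatically): "
    input_text = input("翻訳したい文章入力(enかjaか自動判別します): ")
    # Half-width-only input is treated as English; anything else as Japanese.
    if is_halfwidth(input_text):
        result = ej_translator(input_text)
    else:
        result = je_translator(input_text)

    print("=" * 15, "翻訳後", "=" * 15)  # "翻訳後" means "Translated"
    # The pipeline returns a list of dicts; take the translated string directly.
    print(result[0]["translation_text"] + "\n")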

Copy and paste the code above and you can translate repeatedly in the terminal.
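
One caveat with the half-width check: English text that contains a character outside plain ASCII, such as a curly apostrophe or an accented letter, gets treated as Japanese. An alternative is to look for Japanese characters directly. The sketch below is one possible approach, not part of the original code; the name contains_japanese and the chosen Unicode ranges (hiragana, katakana, and the main CJK ideograph block) are my own assumptions, and kanji-only text could of course also be Chinese.

```python
import re

# Hiragana (U+3040-309F), katakana (U+30A0-30FF), and CJK unified ideographs (U+4E00-9FFF).
JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(text):
    """Return True if text has at least one hiragana, katakana, or kanji character."""
    return JAPANESE_CHARS.search(text) is not None


for sample in ["I love watching anime", "アクション映画が好きです", "I\u2019d like a caf\u00e9 latte"]:
    direction = "ja -> en" if contains_japanese(sample) else "en -> ja"
    print(direction, "|", sample)
```

To try it in the loop, the condition would become "if not contains_japanese(input_text)" for the English → Japanese branch.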

Apart from the initial model load, it translates a sentence in under a second!!
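
If you want to check the speed on your own machine, you can wrap the call in a timer. The following is a rough sketch: timed_call is a hypothetical helper, and fake_translator is a stand-in so the snippet runs without downloading a model. In practice you would pass ej_translator or je_translator instead.

```python
import time


def timed_call(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in for ej_translator / je_translator; it mimics the pipeline's output shape.
def fake_translator(text):
    return [{"translation_text": text.upper()}]


result, seconds = timed_call(fake_translator, "hello")
print(result[0]["translation_text"], f"({seconds:.4f} s)")
```

Timing the first call separately from later ones should make the difference between model loading or warm-up and steady-state translation visible.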

How it behaves and how accurate the translations are

Here is how it goes.

翻訳したい文章入力(enかjaか自動判別します): I love watching anime
=============== 翻訳後 ===============
私はアニメを見るのが大好きです

翻訳したい文章入力(enかjaか自動判別します): アクション映画が好きです
=============== 翻訳後 ===============
i like action movies.

翻訳したい文章入力(enかjaか自動判別します): みかんは、日本でよく食べられる柑橘類です。冬になると、スーパーなどで売られるようになります。皮をむいて食べるのが一般的ですが、ジュースにしたり、煮物やマーマレードにしたりすることもできます。
=============== 翻訳後 ===============
Mikan is a citrus that is often eaten in Japan. In winter, it is sold at supermarkets, and although it is common to eat with peeled skin, it can also be made into juice or boiled or marmalade.

翻訳したい文章入力(enかjaか自動判別します): Mikan is a citrus that is often eaten in Japan. In winter, it is sold at supermarkets, and although it is common to eat with peeled skin, it can also be made into juice or boiled or marmalade.
=============== 翻訳後 ===============
みかんは日本でよく食べられる橘類で、冬はスーパーで売られ、皮をむいた状態で食べるのが一般的ですが、ジュースや煮物、マーマレードにすることもできます。

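Amazing!!!!

It correctly renders みかん as "Mikan"!!!

To put this in perspective, even NLLB-200 (3.3B), which needs as much as 13.6 GB of VRAM, turns みかん into "Miken", so this is excellent for a translator that runs locally.

NLLB-200 is a translation AI that supports more than 200 languages, so FuguMT loses on versatility, but in practice Japanese ↔ English is enough.

Of course DeepL and Google Translate also output "Mikan", but this level of accuracy from a 150 MB model is downright abnormal.

This is the true essence of a specialized AI!!!!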