Diagnosing and fixing mojibake: a reverse lookup by garbling pattern (「縺」, 「ã」, 「□」)

Last updated: 2026-06-11 | Published: 2026-06-11 | Written by: DevToolBox editorial team

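The way text is garbled almost always identifies the cause. Runs of 「縺」 mean UTF-8 was misread as Shift_JIS, runs of 「ã」 and 「æ」 mean it was misread as Latin-1, and 「?」 or 「�」 mean the data has already been lost. This article provides a pattern lookup table and a procedure for checking four places in order: the HTTP header, the HTML meta tag, the database, and the saved file.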

Garbled Japanese text (mojibake) reveals its own cause: 縺/繧/繝 sequences mean UTF-8 bytes were decoded as Shift_JIS, ã/æ/è sequences mean they were decoded as Latin-1/Windows-1252, while "?" and U+FFFD mean the data was destroyed by a lossy conversion. This guide maps each pattern to its cause, recoverability, and fix.

TL;DR

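「縺薙ｓ縺ｫ縺｡縺ｯ」 = UTF-8 decoded as Shift_JIS. Re-reading the bytes as UTF-8 recovers the text
「æ—¥æœ¬èªž」 = UTF-8 decoded as Latin-1/CP1252. Usually recoverable as well
「??????」 = characters that could not be represented were replaced with ? during conversion. Not recoverable
「�」 (U+FFFD) = invalid bytes were replaced. Not recoverable. 「□」 is a font problem and the data is intact
Check order: HTTP header → HTML meta → database charset → file encoding at save time
Unidentified characters can be classified at a glance by looking at their code points in Unicode Inspector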

Unicode Inspector

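Paste a garbled string and it shows the code point (U+XXXX) and character name of each character. You can tell immediately whether a character is 「�」 (U+FFFD, not recoverable) or a normal character that merely lacks a font glyph.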

1. Pattern lookup table: identify the cause from the pattern

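Pattern	Cause	Recoverable?
縺薙ｓ縺ｫ縺｡縺ｯ (縺/繧/繝 appear often)	UTF-8 bytes decoded as Shift_JIS (CP932)	Yes. Re-read as UTF-8
æ—¥æœ¬èªž (ã/æ/è appear often)	UTF-8 decoded as Latin-1 / Windows-1252	Usually (exceptions in §2)
Runs of unfamiliar kanji such as 譁 or 蟄	UTF-8 kanji (lead bytes E4 to E9) decoded as Shift_JIS	Yes. Same as above
??????	Characters that the legacy encoding cannot represent were replaced with ? during Unicode → legacy conversion	No. Re-fetch the original data
�� (U+FFFD)	The decoder substituted the replacement character for invalid bytes	No
□□□ (tofu)	The data is fine. The font has no glyph	Nothing is broken. Changing the font displays it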

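When in doubt, paste the garbled string into Unicode Inspector and check the code points. If you see U+FFFD, the text is not recoverable. If the text has turned into ordinary kanji such as U+7E3A (縺), it is a wrong decode and can be recovered. If the code points are correct but the characters do not render, it is a font problem.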

The garbled shape tells you the failure mode. Wrong-decode patterns (縺/ã) are reversible; "?" and U+FFFD mean bytes were already destroyed; tofu boxes mean the font lacks a glyph.
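To make the table rows concrete, the following minimal Python sketch reproduces each failure mode from clean text. The helper name `garble` is hypothetical and introduced only for illustration; the codecs are the standard `cp932`, `cp1252`, and `ascii` codecs that ship with Python.

```python
def garble(text, wrong_codec):
    """Encode as UTF-8, then decode with the wrong codec (a reversible wrong decode)."""
    return text.encode("utf-8").decode(wrong_codec)

# Wrong decode: the original bytes survive inside the garbled string.
sjis_style = garble("こんにちは", "cp932")
latin_style = garble("日本語", "cp1252")

# Lossy conversion: characters the target encoding cannot represent become "?".
question_marks = "日本語".encode("ascii", errors="replace").decode("ascii")

# Lossy decode: Shift_JIS bytes are not valid UTF-8, so they become U+FFFD.
replacement = "日本語".encode("cp932").decode("utf-8", errors="replace")

print(sjis_style)               # 縺薙ｓ縺ｫ縺｡縺ｯ
print(latin_style)              # æ—¥æœ¬èªž
print(question_marks)           # ???
print("\ufffd" in replacement)  # True
```

The first two results still contain every original byte, which is why they can be undone; the last two have discarded information at the moment of conversion.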

2. Reversing wrong-decode mojibake (縺 and ã patterns)

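A wrong decode can be undone simply by encoding the garbled string back to bytes with the wrong encoding and then decoding those bytes with the correct one. Python supports CP932 and CP1252 out of the box, which makes it convenient for checking.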

>>> # 縺 pattern: UTF-8 was read as Shift_JIS (CP932)
>>> "縺薙ｓ縺ｫ縺｡縺ｯ".encode("cp932").decode("utf-8")
'こんにちは'

>>> # ã pattern: UTF-8 was read as Windows-1252
>>> "æ—¥æœ¬èªž".encode("cp1252").decode("utf-8")
'日本語'

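The mechanism is simple. Hiragana and katakana are 3 bytes each in UTF-8, of the form E3 81 xx / E3 82 xx / E3 83 xx (hiragana use E3 81 and E3 82, katakana use E3 82 and E3 83), and reading the first 2 bytes as a CP932 double-byte character gives 「縺」, 「繧」, and 「繝」 respectively. Note that CP1252 leaves some bytes unassigned, such as 0x81 and 0x8D, and a character that hits one of them may not come back completely with the method above. Any part already replaced with 「?」 or U+FFFD cannot be restored this way either.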

Encode the garbled string back with the wrong codec, then decode as UTF-8. The UTF-8 lead bytes of kana, E3 81/82/83, map to 縺/繧/繝 in CP932, which is why those characters dominate.
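The round trip above can be wrapped in a small helper that tries each candidate codec in turn. This is a minimal sketch rather than a complete repair tool: the function name `repair_mojibake` and the default codec order are assumptions made for illustration, and a result that is unchanged from the input is treated as "not repaired".

```python
def repair_mojibake(garbled, wrong_codecs=("cp932", "cp1252")):
    """Try to undo a wrong decode.

    Returns (fixed_text, codec) on success, or (None, None) when no
    candidate codec round-trips to valid, different UTF-8 text.
    """
    for codec in wrong_codecs:
        try:
            fixed = garbled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this codec cannot explain the garbling
        if fixed != garbled:
            return fixed, codec
    return None, None


print(repair_mojibake("こんにちは".encode("utf-8").decode("cp932")))
print(repair_mojibake("日本語".encode("utf-8").decode("cp1252")))
print(repair_mojibake("\ufffd\ufffd"))  # lossy input: nothing to recover
```

Strings containing U+FFFD or only question marks come back as `(None, None)`, which matches the rule that lossy conversions have to be re-fetched from the source.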

3. On the web, check the HTTP charset and meta

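When a page is garbled in the browser, the first suspect is the Content-Type header returned by the server. The charset in the HTTP header takes precedence over the HTML <meta charset>, so even if the meta says UTF-8, the page stays garbled when the header says Shift_JIS.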

$ curl -sI https://example.com/
HTTP/2 200
content-type: text/html; charset=Shift_JIS   ← this is the cause if the body is UTF-8

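If the header has no charset, the browser falls back to <meta charset="utf-8">, but the HTML specification requires this declaration to appear within the first 1024 bytes of the file. Do not put a huge comment or an SVG in front of it. On the server side, standardize on charset utf-8; for nginx, AddDefaultCharset UTF-8 for Apache, or Content-Type: text/html; charset=UTF-8 when the application sets the header itself.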

The HTTP Content-Type charset overrides the meta tag. Check it with curl -sI first, and keep <meta charset> within the first 1024 bytes of the document.
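Both checks can be scripted. The sketch below parses the charset out of a Content-Type header value with the standard library and looks for a meta charset declaration that is fully contained in the first 1024 bytes. The function names are hypothetical, and the regular expression is a rough approximation for illustration, not the HTML specification's actual prescan algorithm.

```python
import re
from email.message import Message

# Rough pattern: a <meta ...> tag that contains charset=... and is closed.
META_RE = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w-]+)[^>]*>', re.IGNORECASE)


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def meta_charset_in_prescan(html_bytes, limit=1024):
    """Return the meta charset if the whole tag sits within the first `limit` bytes."""
    match = META_RE.search(html_bytes[:limit])
    return match.group(1).decode("ascii").lower() if match else None


print(charset_from_content_type("text/html; charset=Shift_JIS"))  # shift_jis
print(meta_charset_in_prescan(b'<!doctype html><meta charset="utf-8">'))  # utf-8

# A long comment pushes the declaration past the 1024-byte window.
late = b"<!--" + b"x" * 2000 + b"-->" + b'<meta charset="utf-8">'
print(meta_charset_in_prescan(late))  # None
```

If the header charset and the meta charset disagree, the header value is the one to fix first, since it takes precedence.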

4. Database charset pitfalls: MySQL utf8 vs utf8mb4

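MySQL is the most common source of database-related mojibake. For historical reasons, utf8 is an alias for utf8mb3, which stores at most 3 bytes per character, so saving a 4-byte emoji or certain kanji fails with the following error.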

ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x80' for column 'body' at row 1

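The fix is to standardize on utf8mb4. A nastier problem is a mismatched connection charset: if the client stays on latin1, double encoding occurs, where the already-garbled byte sequence is what gets stored in the database, so the text is still garbled when you SELECT it. The following one-liner shows the current settings.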

mysql> SHOW VARIABLES LIKE 'character_set%';
-- character_set_client / _connection / _results should all be utf8mb4

ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4;  -- table side
-- connection side: specify charset=utf8mb4 in the DSN (e.g. mysql://...?charset=utf8mb4)

MySQL's utf8 is 3-byte utf8mb3; emoji need utf8mb4. Verify client, connection, and results charsets with SHOW VARIABLES LIKE 'character_set%' — a latin1 connection silently stores double-encoded garbage.
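The two failure modes can be illustrated without a database. The sketch below checks whether a string needs utf8mb4 and simulates what a latin1 connection does to UTF-8 bytes. The function names are hypothetical, and Python's `cp1252` codec is used as an assumed stand-in for MySQL's latin1; the two differ for a few unassigned bytes, so this is an approximation of the server's behavior, not a reproduction of it.

```python
def needs_utf8mb4(text):
    """True if any character is above U+FFFF, i.e. takes 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


def simulate_double_encoding(text):
    """Mimic a UTF-8 client whose connection is declared as latin1."""
    misread = text.encode("utf-8").decode("cp1252")  # bytes misread as latin1
    return misread.encode("utf-8")                   # then stored as UTF-8


def undo_double_encoding(stored):
    """Reverse the simulation above: one extra encode/decode round trip."""
    return stored.decode("utf-8").encode("cp1252").decode("utf-8")


print("😀".encode("utf-8"))    # b'\xf0\x9f\x98\x80', the bytes in the error message
print(needs_utf8mb4("😀"))     # True
print(needs_utf8mb4("日本語"))  # False

stored = simulate_double_encoding("日本語")
print(stored.decode("utf-8"))        # æ—¥æœ¬èªž
print(undo_double_encoding(stored))  # 日本語
```

The stored bytes are valid UTF-8, which is why the database raises no error and the garbling only shows up when the data is read back.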

5. File-save checklist

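VS Code: the status bar at the bottom right shows the current encoding. If the text looks garbled, use "Reopen with Encoding" to change only the interpretation and check the result (non-destructive). "Save with Encoding" rewrites the file, so do not run it while the text is still displayed garbled.
Excel and CSV: Japanese-language Excel opens CSV files as CP932 by default. A UTF-8 CSV is recognized as UTF-8 if it starts with a BOM (EF BB BF). Conversely, remember to strip the BOM when reading the file in a program (a typical cause of JSON parse errors at position 0).
Last resort, inspect the bytes: dump the file as hex with Hex Converter. Runs of E3 81 through E3 83 indicate hiragana or katakana in UTF-8, while 2-byte sequences starting with 82 or 83 indicate Shift_JIS.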

In VS Code use "Reopen with Encoding" (non-destructive) before "Save with Encoding". Excel needs a UTF-8 BOM for CSV; when in doubt, inspect the raw bytes in hex.
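The BOM handling and the byte-level check can be sketched in Python as follows. This is a minimal illustration that assumes the data is already in memory as bytes; `utf-8-sig` is Python's standard codec that writes a BOM when encoding and strips one when decoding, and the sample payload is made up for the example.

```python
import codecs
import json

payload = '{"name": "日本語"}'

# Writing: "utf-8-sig" prepends the BOM (EF BB BF) that Excel looks for.
with_bom = payload.encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))  # True

# Reading: plain "utf-8" keeps the BOM as U+FEFF at position 0 ...
kept = with_bom.decode("utf-8")
print(kept[0] == "\ufeff")  # True

# ... while "utf-8-sig" strips it, so the JSON parser sees clean text.
data = json.loads(with_bom.decode("utf-8-sig"))
print(data["name"])  # 日本語

# Hex check: the same kana looks different in each encoding.
utf8_hex = "こ".encode("utf-8").hex(" ")
sjis_hex = "こ".encode("cp932").hex(" ")
print(utf8_hex)  # e3 81 93
print(sjis_hex)  # 82 b1
```

Decoding with `utf-8-sig` is safe for files without a BOM too, so it is a reasonable default when the origin of a UTF-8 file is unknown.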

Summary

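Investigating mojibake proceeds in three stages: guess the cause from the garbling pattern, decide whether the text is recoverable, then locate where along the path (HTTP, meta, database, file) the encodings diverged. For 縺 or ã patterns there is no need to panic: the data is intact and only needs to be re-read with the correct encoding. If the text has become 「?」 or 「�」, it must be re-fetched from the original source. To prevent recurrence, unify the whole path on UTF-8 (utf8mb4 in MySQL).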

Mojibake debugging is pattern matching. 縺/繧/繝 sequences mean UTF-8 bytes were decoded as Shift_JIS; ã/æ/è sequences mean Latin-1/Windows-1252. Both are usually reversible by re-encoding with the wrong codec and re-decoding as UTF-8, except where a byte has no mapping in the wrong codec (for example 0x81 in Windows-1252). Question marks and U+FFFD replacement characters mean a lossy conversion already destroyed the bytes, and tofu boxes simply mean the font lacks a glyph while the data is intact. Walk the pipeline in order: the HTTP Content-Type charset (which overrides the meta tag), the meta declaration within the first 1024 bytes, the database charset (use utf8mb4 in MySQL, and check the connection charset too), and finally the editor's file encoding. Unify everything on UTF-8 and the problem stays fixed.

Unicode Inspector

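Visualizes invisible characters such as the BOM (U+FEFF), zero-width characters, and the replacement character (U+FFFD), one code point at a time. A good first step when investigating mojibake.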

FAQ

What causes mojibake full of 「縺」, 「繧」, and 「繝」?: UTF-8 bytes were interpreted as Shift_JIS (CP932). The UTF-8 lead bytes of hiragana and katakana, E3 81/E3 82/E3 83, correspond to 縺/繧/繝 in CP932. Re-reading the bytes correctly as UTF-8 recovers the text.
What is the difference between 「�」 (replacement character) and 「□」 (tofu)?: 「�」 (U+FFFD) marks a place where invalid bytes were replaced during decoding; the original bytes are lost and cannot be recovered. 「□」 only means the font has no glyph, and the data is intact. Changing the font displays it.
Saving emoji in MySQL fails with an Incorrect string value error: MySQL's utf8 (utf8mb3) stores at most 3 bytes per character and cannot hold 4-byte emoji. Change the charset of both the table and the connection to utf8mb4.
Why does my text turn into sequences like 'ã' or 'æ—¥'?: UTF-8 bytes were decoded as Latin-1 / Windows-1252. Each multi-byte UTF-8 character becomes 2 to 4 single-byte characters, mostly accented Latin letters and symbols. Re-decode the same bytes as UTF-8 to recover the original text.
Can mojibake always be reversed?: Only if no lossy step happened. Wrong-decode mojibake (縺/ã patterns) is usually reversible; characters already replaced with '?' or U+FFFD are gone and must be re-fetched from the source.
