How to track down bugs caused by invisible characters (ZWSP, BOM, NBSP)

Last updated: 2026-06-11 | Published: 2026-06-11 | Author: DevToolBox Editorial Team

Code copy-pasted from the web dies with SyntaxError: Invalid or unexpected token. Two strings look identical on screen, yet === returns false. grep misses a line that definitely exists. The culprit behind these "invisible to the eye" bugs is almost always an invisible Unicode character. Characters such as the zero-width space (U+200B), the BOM (U+FEFF) and the no-break space (U+00A0) sneak into code via copy-paste and break parsers, string comparison, grep searches and CSV headers. This guide explains how they get in and how to pinpoint and remove them with an editor, regular expressions and od/hexdump.

TL;DR

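A SyntaxError right after a copy-paste: suspect ZWSP (U+200B) or NBSP (U+00A0) first
"Looks the same but does not match": compare length first. A length mismatch points to an invisible character (equal lengths do not rule one out, since an NBSP in place of a space keeps the length)
Only the first CSV column is unreadable: that is the BOM (U+FEFF). The header name is actually "\uFEFFid"
Detection regex: /[\u00A0\u200B-\u200D\u2060\uFEFF]/g
VS Code's editor.unicodeHighlight.invisibleCharacters (enabled by default) flags them with a yellow box
trim() removes NBSP and U+FEFF but not ZWSP. An explicit replace is required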

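Unicode Inspector

Paste a suspicious string and the tool shows the code point (U+200B and so on) and name of every character, so any invisible character stands out at a glance. It runs entirely in the browser and sends nothing to a server.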

1. The four usual suspects

Character	UTF-8 bytes	How it gets in
U+200B ZERO WIDTH SPACE (ZWSP)	`E2 80 8B`	Line-break control on web pages; copy-paste from blogs, chat apps and spreadsheets
U+FEFF BOM / ZERO WIDTH NO-BREAK SPACE	`EF BB BF`	Start of UTF-8 files saved by Excel or Notepad; ends up at the head of CSV / JSON data
U+00A0 NO-BREAK SPACE (NBSP)	`C2 A0`	HTML `&nbsp;`, Option+Space on a Mac, copy-paste from Word
U+200C / U+200D ZERO WIDTH NON-JOINER / ZERO WIDTH JOINER (ZWNJ / ZWJ)	`E2 80 8C` / `E2 80 8D`	Emoji joining sequences, typesetting of Arabic and similar scripts; copy-paste from social media

All of them render either with zero width or exactly like an ordinary space, so you cannot spot them visually. U+FEFF acts as a BOM at the start of a file and as a zero-width no-break space elsewhere (mid-text use is deprecated; U+2060 WORD JOINER is the current replacement).
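The byte sequences in the table can be reproduced with a few lines of JavaScript. This is a minimal sketch: utf8Hex is a hypothetical helper name, and it assumes a runtime where TextEncoder is available as a global (current browsers and Node.js).

```javascript
// Hypothetical helper: return the UTF-8 bytes of a string as uppercase hex.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

// Print each suspect as "U+XXXX <bytes>".
const suspects = ["\u200B", "\uFEFF", "\u00A0", "\u200C", "\u200D"];
for (const ch of suspects) {
  const label =
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(label, utf8Hex(ch));
}
// U+200B E2 80 8B
// U+FEFF EF BB BF
// U+00A0 C2 A0
// U+200C E2 80 8C
// U+200D E2 80 8D
```

Knowing these byte signatures is what makes a raw hex dump readable later on.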

2. Four classic symptoms

2-1. Syntax errors from copy-pasted code

The code looks entirely correct, yet the parser fails. The actual error messages look like this.

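// JavaScript (Node.js / Chrome): a U+200B has slipped into the line
SyntaxError: Invalid or unexpected token

# Python 3: this one names the code point
SyntaxError: invalid non-printable character U+200B

# Shell: an NBSP between the command and its argument fuses them into one word
bash: curl -X: command not found

Python 3 states the offending code point explicitly, whereas JavaScript's V8 does not tell you which character is at fault. Deleting the failing line and retyping it by hand is the quickest stopgap.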

2-2. String comparison mismatch

const a = "admin";        // 5 characters
const b = "admin\u200B";   // trailing ZWSP; renders the same as a
console.log(a === b);      // false
console.log(a.length, b.length); // 5 6  <- the length difference gives it away
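When the lengths differ, the next question is where the strings diverge. The sketch below is one way to answer it; firstDifference is a hypothetical helper, and the index it reports counts code points rather than UTF-16 units.

```javascript
// Hypothetical helper: find the first position where two strings diverge.
function firstDifference(a, b) {
  const left = [...a];  // spread iterates by code point
  const right = [...b];
  const hex = (ch) =>
    ch === undefined
      ? "(end)"
      : "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  const max = Math.max(left.length, right.length);
  for (let i = 0; i < max; i++) {
    if (left[i] !== right[i]) {
      return { index: i, left: hex(left[i]), right: hex(right[i]) };
    }
  }
  return null; // the strings are identical
}

console.log(firstDifference("admin", "admin\u200B"));
// → { index: 5, left: "(end)", right: "U+200B" }
console.log(firstDifference("admin", "admin"));
// → null
```

This also catches the case a length check misses, such as an NBSP standing in for a regular space.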

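2-3. grep and editor search find nothing

When you type the search term by hand and the target file contains something like adm\u200Bin, with an invisible character wedged inside the word, a line that "definitely exists" never shows up in the results. The reverse also happens: the invisible character can sit in the search term instead.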
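The same effect is easy to reproduce in JavaScript. The following is a rough sketch, not a general-purpose search: looseIncludes is a hypothetical helper that strips the zero-width characters and maps NBSP to a space on both sides before comparing.

```javascript
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

// Hypothetical helper: substring test that ignores invisible characters.
function looseIncludes(haystack, needle) {
  const strip = (s) => s.replace(ZERO_WIDTH, "").replace(/\u00A0/g, " ");
  return strip(haystack).includes(strip(needle));
}

const line = "user = adm\u200Bin"; // ZWSP hidden inside "admin"
console.log(line.includes("admin"));       // → false
console.log(looseIncludes(line, "admin")); // → true
```

A mismatch between the two results is a strong hint that the file, not the search term, needs cleaning.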

2-4. CSV header mismatch (BOM)

// Reading a UTF-8 CSV exported by Excel gives...
Object.keys(rows[0]);  // ["\uFEFFid", "name", "price"]
rows[0]["id"];          // undefined <- only the first column is unreachable

# Python (pandas) strips the BOM automatically with utf-8-sig
df = pd.read_csv("data.csv", encoding="utf-8-sig")

Symptoms: parsers reject visually correct code, identical-looking strings compare unequal, grep misses lines that clearly exist, and the first CSV column comes back undefined because the header is actually "\uFEFFid".
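On the JavaScript side, the CSV case can be handled by dropping a leading U+FEFF before splitting the header. The sketch below uses a deliberately naive parser (parseSimpleCsv is a hypothetical name, and it does not handle quoted fields) just to show where the BOM removal belongs.

```javascript
// Hypothetical minimal parser: splits on newlines and commas only.
// It does not support quoted fields; a real CSV library should be used for those.
function parseSimpleCsv(text) {
  const body = text.replace(/^\uFEFF/, ""); // drop a leading BOM, if present
  const [headerLine, ...lines] = body
    .split(/\r?\n/)
    .filter((l) => l !== "");
  const headers = headerLine.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(headers.map((h, i) => [h, cells[i]]));
  });
}

// Sample input shaped like an Excel "CSV UTF-8" export: BOM + CRLF line endings.
const excelCsv = "\uFEFFid,name,price\r\n1,apple,120\r\n";
const rows = parseSimpleCsv(excelCsv);
console.log(Object.keys(rows[0])); // → ["id", "name", "price"]
console.log(rows[0]["id"]);        // → "1"
```

Stripping only at position 0 keeps any U+FEFF that appears later in the data untouched, which mirrors what utf-8-sig does in Python.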

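3. How to detect them

3-1. Make them visible in the editor

Since VS Code 1.63, editor.unicodeHighlight.invisibleCharacters is enabled by default and invisible characters are highlighted with a yellow box (if nothing shows up, check that the setting is true in settings.json). Setting "editor.renderWhitespace": "all" as well makes the rendering difference between an NBSP and a regular space visible.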

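3-2. Find them mechanically with a regex

// JavaScript: detect the main invisible characters in one pass.
// No g flag here: a global regex keeps lastIndex between test() calls,
// so repeated test() calls on the same object can return alternating results.
const INVISIBLE = /[\u00A0\u200B-\u200D\u2060\uFEFF]/;
const suspiciousText = "adm\u200Bin"; // sample input
console.log(INVISIBLE.test(suspiciousText)); // true means something is in there

// Hex-dump the code point of each character for inspection
[..."adm\u200Bin"].map((c) => c.codePointAt(0).toString(16));
// → ["61", "64", "6d", "200b", "69", "6e"]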
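Knowing that a string is contaminated is only half the job; the position matters too. The following sketch reports line and column for each hit. findInvisible and the NAMES table are hypothetical, the names are the Unicode character names for these six code points only, and columns are counted in UTF-16 units.

```javascript
// Names for the code points covered by the detection regex.
const NAMES = {
  0x00a0: "NO-BREAK SPACE",
  0x200b: "ZERO WIDTH SPACE",
  0x200c: "ZERO WIDTH NON-JOINER",
  0x200d: "ZERO WIDTH JOINER",
  0x2060: "WORD JOINER",
  0xfeff: "ZERO WIDTH NO-BREAK SPACE (BOM)",
};

// Hypothetical helper: list every invisible character with its position.
function findInvisible(text) {
  const hits = [];
  text.split("\n").forEach((lineText, lineIndex) => {
    // matchAll needs the g flag and works on a fresh iterator each time
    for (const m of lineText.matchAll(/[\u00A0\u200B-\u200D\u2060\uFEFF]/g)) {
      const cp = m[0].codePointAt(0);
      hits.push({
        line: lineIndex + 1,
        column: m.index + 1,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        name: NAMES[cp],
      });
    }
  });
  return hits;
}

console.log(findInvisible("const a = 1;\nconst\u00A0b = adm\u200Bin;"));
// → [
//   { line: 2, column: 6,  codePoint: "U+00A0", name: "NO-BREAK SPACE" },
//   { line: 2, column: 14, codePoint: "U+200B", name: "ZERO WIDTH SPACE" }
// ]
```

The line and column can be pasted straight into an editor's "go to line" prompt.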

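3-3. Find them on the command line

# Scan the whole repository with GNU grep (-P: PCRE)
grep -rnP "[\x{200B}\x{FEFF}\x{00A0}\x{200C}\x{200D}]" src/

# Inspect raw bytes with od (E2 80 8B = U+200B, C2 A0 = U+00A0)
# The octal escapes \342\200\213 emit the three UTF-8 bytes of U+200B
printf 'adm\342\200\213in' | od -An -tx1
#  61 64 6d e2 80 8b 69 6e

# If the first 3 bytes of the file are EF BB BF, it is UTF-8 with a BOM
head -c 3 data.csv | od -An -tx1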

Detect them with VS Code's built-in Unicode highlighting, a regex like /[\u00A0\u200B-\u200D\u2060\uFEFF]/g, grep -P with hex escapes, or by dumping raw bytes with od -An -tx1 and looking for E2 80 8B / EF BB BF / C2 A0.
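The three-byte BOM check can also be done in JavaScript before any decoding happens. This is a small sketch; hasUtf8Bom is a hypothetical helper that accepts any Uint8Array.

```javascript
// Hypothetical helper: true when the bytes start with EF BB BF.
function hasUtf8Bom(bytes) {
  return (
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf
  );
}

// Sample byte arrays standing in for file contents.
const withBom = new TextEncoder().encode("\uFEFFid,name\n");
const withoutBom = new TextEncoder().encode("id,name\n");
console.log(hasUtf8Bom(withBom));    // → true
console.log(hasUtf8Bom(withoutBom)); // → false
```

In Node.js, the Buffer returned by fs.readFileSync without an encoding argument is a Uint8Array subclass, so it can be passed to the same function.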

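4. Removal & prevention

// Delete every zero-width character and replace NBSP with a regular space
const clean = s
  .replace(/[\u200B-\u200D\u2060\uFEFF]/g, "")
  .replace(/\u00A0/g, " ");

// Before JSON.parse, stripping only the leading BOM is the safe choice
JSON.parse(text.replace(/^\uFEFF/, ""));

Limits of trim(): U+00A0 and U+FEFF count as whitespace in JS and are removed, but U+200B / U+200C / U+200D remain
normalize("NFKC") converts NBSP to a regular space but leaves ZWSP / ZWJ / BOM as they are. Normalization alone is not enough
ZWJ is required for emoji sequences (family emoji and the like). Stripping it indiscriminately from all user input is dangerous; limit cleanup to identifiers, keys and code
Handle CSV on the reading side: encoding="utf-8-sig" in Python, and in JS strip the leading U+FEFF from the first header name
Save files explicitly as "UTF-8 (without BOM)". Check your editor's default encoding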

Strip zero-width characters explicitly — trim() removes NBSP and U+FEFF but not U+200B, and NFKC normalization only fixes NBSP. Never strip ZWJ from arbitrary user input, because emoji sequences depend on it; restrict cleanup to identifiers, keys and source code.
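The behaviour described above can be checked directly. The snippet below is a sketch under simple assumptions: sanitizeKey is a hypothetical helper meant only for identifiers and keys, and the family emoji is written with escapes so the joiners are visible in the source.

```javascript
// 1. trim() strips NBSP and U+FEFF at the edges but leaves ZWSP in place.
const trimmed = "\u00A0\uFEFFid\u200B".trim();
console.log(trimmed.length); // → 3 ("id" plus the trailing U+200B)

// 2. NFKC turns NBSP into a regular space but keeps ZWSP.
const normalized = "a\u00A0b\u200Bc".normalize("NFKC");
console.log(normalized === "a b\u200Bc"); // → true

// 3. Hypothetical sanitizer intended for identifiers and keys only.
function sanitizeKey(key) {
  return key
    .replace(/[\u200B-\u200D\u2060\uFEFF]/g, "") // drop zero-width characters
    .replace(/\u00A0/g, " ")                     // NBSP becomes a plain space
    .trim();
}
console.log(sanitizeKey("\uFEFFuser\u200B id\u00A0") === "user id"); // → true

// 4. Stripping ZWJ from display text breaks emoji sequences.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // man + ZWJ + woman + ZWJ + girl
console.log([...family].length);                        // → 5 code points in one sequence
console.log([...family.replace(/\u200D/g, "")].length); // → 3 separate emoji
```

Keeping the sanitizer as a separate, explicitly named function makes it harder to apply it to free-form user text by accident.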

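Summary

When something "looks right but does not work", the shortest route is to compare length and dump the code points before changing code on a guess. Once you have identified the source (Excel, Word, web pages, social media), building regex-based stripping into the point of ingestion also prevents the problem from recurring.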

Invisible Unicode characters are a recurring source of "impossible" bugs. A zero-width space pasted from a blog post makes V8 throw SyntaxError: Invalid or unexpected token on a line that looks perfectly fine; Python at least names the culprit with invalid non-printable character U+200B. The same characters make equal-looking strings fail ===, hide lines from grep, and break CSV parsing when Excel prepends a BOM so the first header becomes "\uFEFFid". Detection is mechanical once you know the byte signatures: E2 80 8B (U+200B), EF BB BF (U+FEFF) and C2 A0 (U+00A0). Use VS Code's Unicode highlighting, scan with grep -rnP "[\x{200B}\x{FEFF}\x{00A0}]", or pipe the text through od -An -tx1. For cleanup, remember that trim() removes NBSP and U+FEFF but not the zero-width space, and that NFKC normalization only converts NBSP, so an explicit replace(/[\u200B-\u200D\u2060\uFEFF]/g, "") is required. Do not blanket-strip ZWJ from user input, since emoji sequences rely on it. Finally, save files as UTF-8 without BOM and read Excel CSVs with utf-8-sig to stop the bug at the source.

FAQ

Why does copy-pasted code throw a SyntaxError?: Text copied from web pages or PDFs often carries zero-width spaces (U+200B) or no-break spaces (U+00A0). The code looks normal, but many parsers treat these as unknown characters and fail. Delete the affected line and retype it by hand, or strip the invisible characters with a regex.
Why does comparing identical-looking strings return false?: If one of the strings contains an invisible character such as a zero-width space, the two render the same but differ in length and in their code point sequence, so they do not match. Compare length, or hex-dump each character with codePointAt, to confirm.
Why is row["id"] undefined for only the first CSV column?: A UTF-8 CSV exported by Excel starts with a BOM (U+FEFF), so the typical cause is that the first header name is actually "\uFEFFid". Strip the leading U+FEFF when reading, or specify encoding='utf-8-sig' in Python.
Does trim() remove zero-width spaces?: No. JavaScript's trim() removes U+00A0 and U+FEFF, but U+200B (zero-width space) is not treated as whitespace and stays. You need an explicit replace(/[\u200B-\u200D\u2060]/g, '').
How do I find invisible Unicode characters in a file?: Use grep -P "[\x{200B}\x{FEFF}\x{00A0}]", pipe the text through od -An -tx1 to inspect raw bytes, or enable VS Code's built-in Unicode highlighting (editor.unicodeHighlight.invisibleCharacters).
