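Last week I was helping someone load data from a CSV file into Neo4j, and we ran into trouble filtering out rows that contained a null value in one of the columns.
The data looked like this:
LOAD CSV WITH HEADERS FROM "file:///foo.csv" AS row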
RETURN row
╒══════════════════════════════════╕
│row │
╞══════════════════════════════════╡
│{key1: a, key2: (null), key3: c}│
├──────────────────────────────────┤
│{key1: d, key2: e, key3: f} │
└──────────────────────────────────┘
We wanted to filter out any rows that had 'key2' set to null, so let's tweak the query to do that:
LOAD CSV WITH HEADERS FROM "file:///foo.csv" AS row
WITH row WHERE row.key2 IS NOT NULL
RETURN row
(no rows)
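Hmm, that's odd: it got rid of both rows. We expected to see the second row, since its 'key2' value isn't null.
At this point we might suspect that what we see on screen isn't exactly what the data looks like. Let's write the following query to check the header values:
LOAD CSV WITH HEADERS FROM "file:///foo.csv" AS row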
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, SIZE(key)
╒═════╤═════════╕
│key │SIZE(key)│
╞═════╪═════════╡
│key1 │4 │
├─────┼─────────┤
│ key2│5 │
├─────┼─────────┤
│ key3│5 │
└─────┴─────────┘
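The second column tells us that the 'key2' and 'key3' headers, or rather ' key2' and ' key3', each contain an extra character. That explains the empty result: the map has no 'key2' key, so row.key2 evaluates to null for every row. In this case the extra characters are spaces, but they could just as easily be something else:
LOAD CSV WITH HEADERS FROM "file:///foo.csv" AS row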
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, replace(key, " ", "_SPACE_") AS spaces
╒═════╤═══════════╕
│key │spaces │
╞═════╪═══════════╡
│key1 │key1 │
├─────┼───────────┤
│ key2│_SPACE_key2│
├─────┼───────────┤
│ key3│_SPACE_key3│
└─────┴───────────┘
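The same check can also be made outside Neo4j, which helps when the stray character is not a plain space. Below is a minimal Python sketch. The original file isn't shown, so the contents of `SAMPLE` are an assumption, and `describe_headers` and `suspicious_chars` are hypothetical helper names introduced here for illustration.

```python
import csv
import io
import unicodedata

# Assumed contents of foo.csv; the real file is not shown in the post.
SAMPLE = "key1, key2, key3\na,,c\nd,e,f\n"


def suspicious_chars(header):
    """List (position, Unicode name) for whitespace or non-printable characters."""
    return [
        # Characters without a Unicode name (e.g. tab) fall back to a code point label.
        (index, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for index, ch in enumerate(header)
        if ch.isspace() or not ch.isprintable()
    ]


def describe_headers(csv_text):
    """Return (header, suspicious characters) for each header on the first CSV line."""
    # csv.reader keeps leading spaces by default (skipinitialspace=False).
    headers = next(csv.reader(io.StringIO(csv_text)))
    return [(header, suspicious_chars(header)) for header in headers]


if __name__ == "__main__":
    for header, found in describe_headers(SAMPLE):
        print(repr(header), found)
```

Printing each header with `repr` makes invisible characters stand out, and the Unicode names distinguish an ordinary space from look-alikes such as a no-break space.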
If we clean up the CSV file and try again, everything works as expected:
LOAD CSV WITH HEADERS FROM "file:///foo.csv" AS row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, SIZE(key)
╒════╤═════════╕
│key │SIZE(key)│
╞════╪═════════╡
│key1│4 │
├────┼─────────┤
│key2│4 │
├────┼─────────┤
│key3│4 │
└────┴─────────┘
LOAD CSV WITH HEADERS FROM "file:///foo.csv" AS row
WITH row WHERE row.key2 IS NOT NULL
RETURN row
╒═══════════════════════════╕
│row │
╞═══════════════════════════╡
│{key1: d, key2: e, key3: f}│
└───────────────────────────┘
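The post doesn't say how the file was cleaned up. One possible approach, sketched in Python, is to trim whitespace from the header line only and leave the data rows alone. The function name `strip_header_whitespace` is hypothetical and the sample input is an assumed version of foo.csv, not the actual file.

```python
import csv
import io


def strip_header_whitespace(csv_text):
    """Return the CSV text with whitespace trimmed from each header cell.

    Data rows are passed through unchanged.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row (the header) is modified.
    rows[0] = [header.strip() for header in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Assumed file contents, for illustration only.
    dirty = "key1, key2, key3\na,,c\nd,e,f\n"
    print(strip_header_whitespace(dirty))
```

Note that `str.strip()` removes only whitespace, so other invisible characters, such as a byte order mark at the start of the file, would need separate handling.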
Translated from: https://www.javacodegeeks.com/2016/10/neo4j-detecting-rogue-spaces-csv-headers-load-csv.html