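I was recently working with a graph and ran into a problem: because I hadn't applied any unique constraints, I had managed to create duplicate nodes.
I wanted to remove the duplicates and came across Jimmy Ruts' excellent post, which shows a few ways to do it.
Let's first create a graph that contains some duplicate nodes: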
UNWIND range(0, 100) AS id
CREATE (p1:Person {id: toInteger(rand() * id)})
MERGE (p2:Person {id: toInteger(rand() * id)})
MERGE (p3:Person {id: toInteger(rand() * id)})
MERGE (p4:Person {id: toInteger(rand() * id)})
CREATE (p1)-[:KNOWS]->(p2)
CREATE (p1)-[:KNOWS]->(p3)
CREATE (p1)-[:KNOWS]->(p4)
Added 173 labels, created 173 nodes, set 173 properties, created 5829 relationships, completed after 408 ms.
How do we find the duplicate nodes?
MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes
WHERE size(nodes) > 1
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
╒══════════════════════╤═════════════╕
│"ids" │"size(nodes)"│
╞══════════════════════╪═════════════╡
│[1,1,1,1,1,1,1,1] │8 │
├──────────────────────┼─────────────┤
│[0,0,0,0,0,0,0,0] │8 │
├──────────────────────┼─────────────┤
│[17,17,17,17,17,17,17]│7 │
├──────────────────────┼─────────────┤
│[4,4,4,4,4,4,4] │7 │
├──────────────────────┼─────────────┤
│[2,2,2,2,2,2] │6 │
├──────────────────────┼─────────────┤
│[5,5,5,5,5,5] │6 │
├──────────────────────┼─────────────┤
│[19,19,19,19,19,19] │6 │
├──────────────────────┼─────────────┤
│[11,11,11,11,11] │5 │
├──────────────────────┼─────────────┤
│[25,25,25,25,25] │5 │
├──────────────────────┼─────────────┤
│[43,43,43,43,43] │5 │
└──────────────────────┴─────────────┘
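If only the scale of the problem matters, the nodes don't need to be collected at all. The following is a minimal sketch, not part of the original post, that assumes the same `Person` label and `id` property and uses the `count(*)` aggregate; the aliases `copies`, `duplicatedIds`, and `surplusNodes` are names chosen here for illustration.

```cypher
// Group Person nodes by their id property and count the copies of each id
MATCH (p:Person)
WITH p.id AS id, count(*) AS copies
// Keep only the ids that occur more than once
WHERE copies > 1
// duplicatedIds: how many ids have duplicates
// surplusNodes: how many nodes are extra if one node is kept per id
RETURN count(*) AS duplicatedIds, sum(copies - 1) AS surplusNodes
```

`surplusNodes` is the number of nodes that a keep-one-per-id cleanup would have to remove.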
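Let's zoom in on all the people with "id: 1" and work out how many relationships each of them has. Our plan is to keep the node with the most connections and get rid of the others.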
MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes
WHERE size(nodes) > 1
WITH nodes ORDER BY size(nodes) DESC
LIMIT 1
UNWIND nodes AS n
RETURN n.id, id(n) AS internalId, size((n)--()) AS rels
ORDER BY rels DESC
╒══════╤════════════╤══════╕
│"n.id"│"internalId"│"rels"│
╞══════╪════════════╪══════╡
│1 │175 │1284 │
├──────┼────────────┼──────┤
│1 │184 │721 │
├──────┼────────────┼──────┤
│1 │180 │580 │
├──────┼────────────┼──────┤
│1 │2 │391 │
├──────┼────────────┼──────┤
│1 │195 │361 │
├──────┼────────────┼──────┤
│1 │199 │352 │
├──────┼────────────┼──────┤
│1 │302 │5 │
├──────┼────────────┼──────┤
│1 │306 │1 │
└──────┴────────────┴──────┘
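So in this example we want to keep the node with 1284 relationships (internalId 175) and delete the rest.
To make things easy, we need the node with the most relationships to be either first or last in the list. We can make sure of that by sorting the nodes before grouping them.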
MATCH (p:Person)
WITH p
ORDER BY p.id, size((p)--()) DESC
WITH p.id as id, collect(p) AS nodes
WHERE size(nodes) > 1
RETURN [ n in nodes | {id: n.id,rels: size((n)--()) } ] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
╒══════════════════════════════════════════════════════════════════════╤═════════════╕
│"ids" │"size(nodes)"│
╞══════════════════════════════════════════════════════════════════════╪═════════════╡
│[{"id":1,"rels":1284},{"id":1,"rels":721},{"id":1,"rels":580},{"id":1,│8 │
│"rels":391},{"id":1,"rels":361},{"id":1,"rels":352},{"id":1,"rels":5},│ │
│{"id":1,"rels":1}] │ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":0,"rels":2064},{"id":0,"rels":2059},{"id":0,"rels":1297},{"id":│8 │
│0,"rels":1124},{"id":0,"rels":995},{"id":0,"rels":928},{"id":0,"rels":│ │
│730},{"id":0,"rels":702}] │ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":17,"rels":153},{"id":17,"rels":105},{"id":17,"rels":81},{"id":1│7 │
│7,"rels":31},{"id":17,"rels":15},{"id":17,"rels":14},{"id":17,"rels":1│ │
│}] │ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":4,"rels":394},{"id":4,"rels":320},{"id":4,"rels":250},{"id":4,"│7 │
│rels":201},{"id":4,"rels":162},{"id":4,"rels":162},{"id":4,"rels":14}]│ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":2,"rels":514},{"id":2,"rels":329},{"id":2,"rels":318},{"id":2,"│6 │
│rels":241},{"id":2,"rels":240},{"id":2,"rels":2}] │ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":5,"rels":487},{"id":5,"rels":378},{"id":5,"rels":242},{"id":5,"│6 │
│rels":181},{"id":5,"rels":158},{"id":5,"rels":8}] │ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":19,"rels":153},{"id":19,"rels":120},{"id":19,"rels":84},{"id":1│6 │
│9,"rels":53},{"id":19,"rels":45},{"id":19,"rels":1}] │ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":11,"rels":222},{"id":11,"rels":192},{"id":11,"rels":172},{"id":│5 │
│11,"rels":152},{"id":11,"rels":89}] │ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":25,"rels":133},{"id":25,"rels":107},{"id":25,"rels":98},{"id":2│5 │
│5,"rels":15},{"id":25,"rels":2}] │ │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":43,"rels":92},{"id":43,"rels":85},{"id":43,"rels":9},{"id":43,"│5 │
│rels":5},{"id":43,"rels":1}] │ │
└──────────────────────────────────────────────────────────────────────┴─────────────┘
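Before removing anything, a dry run can confirm which node in each group would survive. This is a sketch rather than something from the original post: it reuses the sort-then-group steps shown above, relies (as those queries do) on `collect()` keeping the sorted order, and introduces the illustrative aliases `keptInternalId` and `nodesToDelete`.

```cypher
// Same sort-then-group steps as above: the best-connected node comes first
MATCH (p:Person)
WITH p
ORDER BY p.id, size((p)--()) DESC
WITH p.id AS id, collect(p) AS nodes
WHERE size(nodes) > 1
// head(nodes) is the node that would be kept; nodes[1..] is everything after it
RETURN id,
       id(head(nodes)) AS keptInternalId,
       size(nodes[1..]) AS nodesToDelete
ORDER BY nodesToDelete DESC
```

The list slice `nodes[1..]` skips the element at index 0, so whatever sorts first within each `id` group is the node that stays.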
Now it's time to delete the duplicates:
MATCH (p:Person)
WITH p
ORDER BY p.id, size((p)--()) DESC
WITH p.id as id, collect(p) AS nodes
WHERE size(nodes) > 1
UNWIND nodes[1..] AS n
DETACH DELETE n
Deleted 143 nodes, deleted 13806 relationships, completed after 29 ms.
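The queries in this post use `size((p)--())`, which works on the Neo4j versions available when it was written. Neo4j 5 no longer accepts a pattern expression inside `size()` and offers `COUNT { }` subquery expressions instead. The following is a sketch of the same deletion under the assumption that it runs on Neo4j 5; the alias `rels` is introduced here for illustration.

```cypher
// Assumes Neo4j 5, where COUNT { ... } subquery expressions are available
MATCH (p:Person)
// Count each node's relationships once and carry the result along
WITH p, COUNT { (p)--() } AS rels
ORDER BY p.id, rels DESC
WITH p.id AS id, collect(p) AS nodes
WHERE size(nodes) > 1
// Skip the first (best-connected) node in each group and delete the rest
UNWIND nodes[1..] AS n
DETACH DELETE n
```

On those versions `id(n)` is also deprecated in favour of `elementId(n)`, which returns a string rather than an integer.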
Now if we run the duplicates query again:
MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes
WHERE size(nodes) > 1
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
(no changes, no records)
What if we remove the WHERE clause?
MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
╒═════╤═════════════╕
│"ids"│"size(nodes)"│
╞═════╪═════════════╡
│[23] │1 │
├─────┼─────────────┤
│[86] │1 │
├─────┼─────────────┤
│[77] │1 │
├─────┼─────────────┤
│[59] │1 │
├─────┼─────────────┤
│[50] │1 │
├─────┼─────────────┤
│[32] │1 │
├─────┼─────────────┤
│[41] │1 │
├─────┼─────────────┤
│[53] │1 │
├─────┼─────────────┤
│[44] │1 │
├─────┼─────────────┤
│[8] │1 │
└─────┴─────────────┘
Hurrah, no more duplicates! Finally, let's check that we kept the node we expected to keep. We expect it to have an "internalId" of 175:
MATCH (p:Person {id: 1})
RETURN size((p)--()), id(p) AS internalId
╒═══════════════╤════════════╕
│"size((p)--())"│"internalId"│
╞═══════════════╪════════════╡
│242 │175 │
└───────────────┴────────────┘
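Which it does! It has far fewer relationships than before, because many of them pointed at duplicate nodes that we have now deleted.
If we wanted to go a step further we could "merge" the duplicate nodes' relationships onto the node we kept, but that's for another post!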
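With the duplicates gone, the missing unique constraint mentioned at the start can be put in place so they don't come back. This is a sketch that assumes Neo4j 4.4 or later; the constraint name `person_id_unique` is made up for illustration.

```cypher
// Assumes Neo4j 4.4 or later; person_id_unique is an illustrative name
CREATE CONSTRAINT person_id_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE

// Older 3.x releases use a different form of the same statement:
// CREATE CONSTRAINT ON (p:Person) ASSERT p.id IS UNIQUE
```

Creating the constraint fails while duplicates still exist, which is why it comes after the cleanup. Once it is in place, a `CREATE` that would introduce a second `Person` with an existing `id` is rejected.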
Translated from: https://www.javacodegeeks.com/2017/10/neo4j-cypher-deleting-duplicate-nodes.html