Can you give me an example to query parent-child hierarchy using Spark SQL recursive CTE?
ChatGPT said:
Sure! Here are clear, complete Spark SQL examples that show how to query a parent-child hierarchy using a recursive CTE.
(Supported in Spark 3.4+.)
⭐ Example Scenario
Suppose you have a table:
CREATE TABLE org (id INT,parent_id INT,name STRING );
Example data:
| id | parent_id | name |
|---|---|---|
| 1 | NULL | CEO |
| 2 | 1 | VP Eng |
| 3 | 1 | VP Sales |
| 4 | 2 | Manager A |
| 5 | 2 | Manager B |
| 6 | 4 | Engineer |
This forms a tree:
CEO├─ VP Eng│ ├─ Manager A│ │ └─ Engineer│ └─ Manager B└─ VP Sales
✅ Spark SQL Recursive CTE Example
WITH RECURSIVE org_tree AS (-- 1. Anchor: all root nodesSELECTid,parent_id,name,CAST(name AS STRING) AS path,0 AS levelFROM orgWHERE parent_id IS NULLUNION ALL-- 2. Recursive: join children to parentsSELECTc.id,c.parent_id,c.name,CONCAT(p.path, ' > ', c.name) AS path,p.level + 1 AS levelFROM org cJOIN org_tree pON c.parent_id = p.id )SELECT * FROM org_tree ORDER BY path;
✔ Output
| id | parent_id | name | path | level |
|---|---|---|---|---|
| 1 | NULL | CEO | CEO | 0 |
| 2 | 1 | VP Eng | CEO > VP Eng | 1 |
| 3 | 1 | VP Sales | CEO > VP Sales | 1 |
| 4 | 2 | Manager A | CEO > VP Eng > Manager A | 2 |
| 5 | 2 | Manager B | CEO > VP Eng > Manager B | 2 |
| 6 | 4 | Engineer | CEO > VP Eng > Manager A > Engineer | 3 |
🔍 How it works
1. Anchor query
Gets root rows (parent_id IS NULL).
2. Recursive query
Keeps joining child rows to the current “tree”.
3. Spark handles recursion
Spark automatically expands:
level 0 → level 1 → level 2 → …
Until no more child rows exist.
⭐ Bonus: Query all descendants of a specific id
WITH RECURSIVE sub AS (SELECT *FROM orgWHERE id = 2 -- start from VP EngUNION ALLSELECT c.*FROM org cJOIN sub p ON c.parent_id = p.id ) SELECT * FROM sub;
⭐ Bonus: Query all ancestors of a specific id
WITH RECURSIVE ancestors AS (SELECT *FROM orgWHERE id = 6 -- EngineerUNION ALLSELECT p.*FROM org pJOIN ancestors c ON p.id = c.parent_id ) SELECT * FROM ancestors;