Hive SQL - CTE

Does Hive SQL support recursive CTE?

Short answer: Apache Hive does not support true recursive CTEs (e.g., WITH RECURSIVE ...), unlike PostgreSQL, SQL Server, or Oracle.

✔ What Hive supports

Hive supports non-recursive CTEs, for example:
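A minimal sketch of a non-recursive CTE in Hive. The `employees` table and its columns are hypothetical, chosen only for illustration:

```sql
-- Hypothetical table: employees(id INT, name STRING, dept STRING, salary DOUBLE)
WITH dept_avg AS (
  -- The CTE is computed from other tables only; it never refers to itself.
  SELECT dept, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY dept
)
SELECT e.name, e.salary, d.avg_salary
FROM employees e
JOIN dept_avg d ON e.dept = d.dept
WHERE e.salary > d.avg_salary;
```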

✘ What Hive does not support

Hive does not support:
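For contrast, this is roughly what a self-referencing CTE looks like in PostgreSQL-style syntax. It is shown only to illustrate the unsupported form; the `nodes` table is hypothetical, and Hive is expected to reject the statement:

```sql
-- NOT valid in Hive: the CTE "tree" refers to itself.
-- Hypothetical table: nodes(id INT, parent_id INT)
WITH RECURSIVE tree AS (
  SELECT id, parent_id, 1 AS depth
  FROM nodes
  WHERE parent_id IS NULL
  UNION ALL
  SELECT n.id, n.parent_id, t.depth + 1
  FROM nodes n
  JOIN tree t ON n.parent_id = t.id
)
SELECT * FROM tree;
```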

✔ Workarounds

Depending on what you need, common solutions are:

1. Use a series of self-joins (if depth is fixed)

For small, fixed-depth hierarchies, manually unroll the recursion.

2. Use a UDTF / explode approach

For hierarchical paths stored as arrays or nested structures.

3. Use a MapReduce / Spark job instead of SQL

If the recursion depth is dynamic, a Spark job (for example in PySpark) is usually more practical than trying to express the traversal in pure Hive SQL.

4. Use a loop at the application level

Run the same query repeatedly from a script, storing each level's results in a temporary table, until a pass returns no new rows.

If your environment includes Spark, this is often the easiest alternative.

If you tell me your use case (e.g., "query parent-child hierarchy", "traverse tree", "generate date range", etc.), I can help you write a Hive-compatible solution.

Query parent-child hierarchy

Hive SQL can query a parent-child hierarchy, but because Hive does not support recursive CTEs, you must use one of the following approaches, depending on your data and requirements.

Below are the three most practical solutions with examples.
If you tell me your table schema, I can tailor the SQL exactly.

✅ Scenario 1 — Depth is small and known (e.g., max 5 levels)

You can unroll the recursion manually.

Suppose the table is:
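A sketch of the unrolled approach, assuming a hypothetical `org` table in which root rows have a NULL `parent_id`. Each additional level needs one more LEFT JOIN:

```sql
-- Hypothetical schema for illustration.
CREATE TABLE IF NOT EXISTS org (
  id        INT,
  parent_id INT,     -- NULL for root nodes
  name      STRING
);

-- One LEFT JOIN per level; this example covers three levels below the root.
SELECT
  l0.id   AS root_id,
  l0.name AS level0,
  l1.name AS level1,
  l2.name AS level2,
  l3.name AS level3
FROM org l0
LEFT JOIN org l1 ON l1.parent_id = l0.id
LEFT JOIN org l2 ON l2.parent_id = l1.id
LEFT JOIN org l3 ON l3.parent_id = l2.id
WHERE l0.parent_id IS NULL;
```

Branches shorter than three levels show NULL in the deeper columns; anything deeper than the last join is silently cut off, which is why this only suits a known maximum depth.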

✅ Scenario 2 — Unknown depth (dynamic)

Approach: iterative self-join with temp tables (the loop runs outside Hive SQL)

You run these statements in a loop from a script or application:

Step 1 — Select root nodes
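A possible starting point, assuming a hypothetical `org(id, parent_id, name)` table. The names `frontier` and `hierarchy` are also made up for this sketch: `frontier` holds the level found most recently, and `hierarchy` accumulates every level:

```sql
-- Level 0: rows with no parent.
CREATE TEMPORARY TABLE frontier AS
SELECT id, parent_id, name, 0 AS depth
FROM org
WHERE parent_id IS NULL;

-- The accumulated result starts out as just the roots.
CREATE TEMPORARY TABLE hierarchy AS
SELECT * FROM frontier;
```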

Step 2 — Iteratively find children

Loop:
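One pass of the loop might look like the sketch below. It assumes the hypothetical tables `org`, `frontier`, and `hierarchy` already exist, and that the driver script passes the current level through a substitution variable, for example `--hivevar depth=1`:

```sql
-- Children of the nodes found in the previous pass.
CREATE TEMPORARY TABLE next_frontier AS
SELECT c.id, c.parent_id, c.name, ${hivevar:depth} AS depth
FROM org c
JOIN frontier f ON c.parent_id = f.id;

-- Add this level to the accumulated result.
INSERT INTO TABLE hierarchy
SELECT * FROM next_frontier;

-- The driver reads this count and stops the loop when it is 0.
SELECT COUNT(*) FROM next_frontier;

-- The children just found become the parents for the next pass.
INSERT OVERWRITE TABLE frontier
SELECT * FROM next_frontier;

DROP TABLE next_frontier;
```

If the data can contain cycles, the count never reaches 0, so the driver should also cap the number of passes.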

This simulates a recursive CTE: each pass adds one level, and the loop ends when a pass finds no new children.

✅ Scenario 3 — Use Hive UDF/UDTF (works when hierarchy stored as paths)

If the table already stores a path string or a list of ancestors, you can expand it with split() and explode().

Example:
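A sketch assuming a hypothetical `node_paths` table whose `path` column holds slash-separated ancestor ids such as `'1/4/9'`:

```sql
-- Hypothetical table: node_paths(id INT, path STRING)
-- Produces one output row per (node, ancestor) pair.
SELECT
  p.id,
  a.ancestor_id
FROM node_paths p
LATERAL VIEW explode(split(p.path, '/')) a AS ancestor_id;
```

Note that `split()` returns strings, so `ancestor_id` needs a CAST before it is joined to an integer id column.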

🔥 MOST COMMON PRACTICAL SOLUTION IN INDUSTRY

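If you need true tree traversal and you run on Hadoop/YARN, teams usually switch to Spark and drive the level-by-level loop from PySpark or Scala.

Recursive CTEs (WITH RECURSIVE) are a recent addition to Spark SQL and are not available in Spark 3.x, so check the documentation for your version before relying on them.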

If you want, tell me:

Your table structure
Sample data
Desired output

I’ll write fully working Hive SQL for your hierarchy query.

Does Hive SQL have a structure to temporarily store a list of values without creating a table?

Short answer: Hive SQL has no session variables that can hold a query result, the way MySQL user variables can.
It does, however, have structures that hold a list for the duration of a single query, without creating a table.

Here are all the options you can use.

✅ 1. Use `array()`, `map()`, `struct()` — stored only inside the query

These are row-level data structures, not tables.

Example:
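A small sketch of the three constructors, with `array_contains()` added to show a list being used. The values are arbitrary:

```sql
SELECT
  array(10, 20, 30)                     AS ids,      -- array<int>
  map('a', 1, 'b', 2)                   AS lookup,   -- map<string,int>
  struct(1, 'x')                        AS rec,      -- fields are named col1, col2
  array_contains(array(10, 20, 30), 20) AS has_20;   -- membership test on the list
```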

✅ 2. Use `INLINE` or `VALUES` to create a temporary in-query table

Sometimes you just want a list of values as a virtual table:

Option A — `VALUES`
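A sketch of the table-value-constructor form. This assumes a Hive release that accepts `TABLE(VALUES ...)` in the FROM clause (Hive 3 or later, as far as I know); older releases only allow VALUES inside INSERT, and the exact syntax has varied between versions, so treat it as an assumption to verify:

```sql
-- Assumed syntax for newer Hive releases; columns default to col1, col2, ...
SELECT t.col1 AS id, t.col2 AS name
FROM TABLE(VALUES (1, 'a'), (2, 'b'), (3, 'c')) AS t;
```

If your version rejects this, the `inline()` form is the more portable choice.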

Option B — `inline()`
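A sketch using `inline()`, which expands an array of structs into rows and columns. The values and column aliases are arbitrary:

```sql
-- Each struct becomes one row; the aliases name the struct fields.
SELECT inline(array(
  struct(1, 'a'),
  struct(2, 'b'),
  struct(3, 'c')
)) AS (id, name);
```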

✅ 3. Use `explode()` to expand arrays into rows
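For example, a literal array can be turned into a single-column result like this:

```sql
-- One output row per array element.
SELECT explode(array(10, 20, 30)) AS value;
```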

Produces:

value
10
20
30

❌ What Hive does not have

Hive does not support:

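Variables that hold a query result, like SET @x = ... in MySQL (Hive's SET hivevar:name=value is plain text substitution)
Temporary in-memory structures shared across multiple statements
A structure that persists beyond a single SQL statement (unless you create a table, which can be a session-scoped temporary table)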

🧩 Summary

| Need | Hive supports? | How |
|---|---|---|
| Store list inside a query | ✔ Yes | `array()`, `inline()`, `VALUES` |
| Create a temporary virtual table | ✔ Yes | `VALUES` or `inline()` |
| Persist list across queries without a table | ❌ No | Must create a table (a temporary table is enough) |
| SQL variables | ❌ No | Only text substitution (`${hivevar:name}`), which cannot hold a query result |

If you tell me what you want to do with the list (filter a table? join? loop?), I can show you the cleanest Hive solution.

I want to store a column of values selected from a table.

In Hive SQL you cannot store a column of values in a variable, because Hive variables are text substitutions and cannot hold a query result. You can, however, keep the list inside your query as an ARRAY, a subquery, or a VALUES block, without creating a table.

Below are the 3 practical ways to “store” a column of values from a table without creating any table.

✅ Method 1 — Convert the column into an ARRAY (inside a single query)

If you only need the list inside the current query, use collect_list():
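A sketch assuming hypothetical `customers` and `orders` tables. The subquery collapses the column into one array, and the outer query tests membership with `array_contains()`:

```sql
-- Hypothetical tables: customers(id INT, country STRING)
--                      orders(customer_id INT, amount DOUBLE)
SELECT o.*
FROM orders o
CROSS JOIN (
  -- One row holding the whole column as array<int>.
  SELECT collect_list(id) AS ids
  FROM customers
  WHERE country = 'DE'
) c
WHERE array_contains(c.ids, o.customer_id);
```

This is meant for small lists; for large ones a join or LEFT SEMI JOIN against the subquery is the usual choice, since the whole array is built by a single reducer.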

✔ Works inside one query
✔ No table created
✔ Common solution for “store list temporarily”

✅ Method 2 — Use a subquery as a temporary virtual table

If you want to reuse the list inside the same SQL:
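One way to sketch this is a CTE, which is a named subquery that can be referenced more than once in the same statement. The `customers`, `orders`, and `refunds` tables are hypothetical:

```sql
WITH de_customers AS (
  SELECT id FROM customers WHERE country = 'DE'
)
-- The same list filters two different tables.
SELECT 'orders' AS src, COUNT(*) AS n
FROM orders o
LEFT SEMI JOIN de_customers d ON o.customer_id = d.id
UNION ALL
SELECT 'refunds' AS src, COUNT(*) AS n
FROM refunds r
LEFT SEMI JOIN de_customers d ON r.customer_id = d.id;
```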

✅ Method 3 — Use `VALUES` with dynamic SQL (if list is small)

You can generate a VALUES block:
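A sketch of the generation step, again assuming a hypothetical `customers` table. The query returns a single string that the calling script can splice into a later statement:

```sql
-- Builds text such as (1),(2),(3); the order of elements is not guaranteed.
SELECT concat(
  '(',
  concat_ws('),(', collect_list(CAST(id AS STRING))),
  ')'
) AS values_block
FROM customers
WHERE country = 'DE';
```

The script should handle the empty case itself, because with no matching rows the result is `()`, which is not a usable VALUES list.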

❌ What Hive cannot do

Hive does not allow:

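Storing a query result in a session variable
Reusing a list across statements without a table
Creating purely in-memory temporary tables: CREATE TEMPORARY TABLE exists and is dropped at the end of the session, but its data is still written to files in the session's scratch directory, normally on HDFS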