Hive SQL - CTE

2025/11/17 9:13:18 / Source: https://www.cnblogs.com/zhangzhihui/p/19229993
Does Hive SQL support recursive CTE?
 
ChatGPT said:

Short answer: Apache Hive does not support true recursive CTEs (i.e., WITH RECURSIVE ...) like PostgreSQL, SQL Server, or Oracle.

✔ What Hive supports

Hive supports non-recursive CTEs, for example:

WITH t AS (
  SELECT * FROM my_table WHERE col > 10
)
SELECT * FROM t;

This is just syntactic sugar—no recursion.

✘ What Hive does not support

Hive does not support:

WITH RECURSIVE t AS (
  SELECT ...
  UNION ALL
  SELECT ... FROM t
)

Attempts to run such SQL will give a parse error.

✔ Workarounds

Depending on what you need, common solutions are:

1. Use a series of self-joins (if depth is fixed)

For small, fixed-depth hierarchies, manually unroll the recursion.

2. Use a UDTF / explode approach

For hierarchical paths stored as arrays or nested structures.

3. Use a MapReduce / Spark job instead of SQL

If recursion depth is dynamic, using PySpark or Hive + Tez UDTFs is more practical.

4. Use a loop at the application level

Iteratively run queries and store intermediate results into temp tables:

level 0 → table0
level 1 → table1
level 2 → table2
...
until no new rows

5. Use Spark SQL (recursive CTEs are not available in Spark 3.x; support arrived in the Spark 4.x line, so check your version)

If your environment includes Spark, this is often the easiest alternative.


If you tell me your use case (e.g., "query parent-child hierarchy", "traverse tree", "generate date range", etc.), I can help you write a Hive-compatible solution.

 
You said:
Query parent-child hierarchy
 
ChatGPT said:

Hive SQL can query a parent-child hierarchy, but because Hive does not support recursive CTE, you must use one of the following approaches depending on your data and your requirements.

Below are the three most practical solutions with examples.
If you tell me your table schema, I can tailor the SQL exactly.


✅ Scenario 1 — Depth is small and known (e.g., max 5 levels)

You can unroll the recursion manually.

Suppose the table is:

id    parent_id    name

Example (3 levels max):

WITH
lvl1 AS (
  SELECT id, parent_id, name, CAST(name AS string) AS path
  FROM org
  WHERE parent_id IS NULL
),
lvl2 AS (
  SELECT c.id, c.parent_id, c.name,
         CONCAT(p.path, ' > ', c.name) AS path
  FROM org c
  JOIN lvl1 p ON c.parent_id = p.id
),
lvl3 AS (
  SELECT c.id, c.parent_id, c.name,
         CONCAT(p.path, ' > ', c.name) AS path
  FROM org c
  JOIN lvl2 p ON c.parent_id = p.id
)
SELECT * FROM lvl1
UNION ALL
SELECT * FROM lvl2
UNION ALL
SELECT * FROM lvl3;
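
Writing the unrolled query by hand gets tedious beyond a few levels. A minimal sketch, assuming the same org table and column names as above, that generates the unrolled SQL text for a chosen depth in Python; the function name unrolled_hierarchy_sql is hypothetical and the output is only a string to submit to Hive, not something this snippet executes:

```python
def unrolled_hierarchy_sql(depth, table="org"):
    """Return SQL text that unrolls a parent-child hierarchy to `depth` levels."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Level 1: root rows, whose path is just their own name.
    ctes = [
        "lvl1 AS (\n"
        "  SELECT id, parent_id, name, CAST(name AS string) AS path\n"
        f"  FROM {table}\n"
        "  WHERE parent_id IS NULL\n"
        ")"
    ]
    # Each further level joins the table to the previous level's CTE.
    for n in range(2, depth + 1):
        ctes.append(
            f"lvl{n} AS (\n"
            "  SELECT c.id, c.parent_id, c.name,\n"
            "         CONCAT(p.path, ' > ', c.name) AS path\n"
            f"  FROM {table} c\n"
            f"  JOIN lvl{n - 1} p ON c.parent_id = p.id\n"
            ")"
        )
    selects = [f"SELECT * FROM lvl{n}" for n in range(1, depth + 1)]
    return "WITH\n" + ",\n".join(ctes) + "\n" + "\nUNION ALL\n".join(selects) + ";"


# Print the three-level query for inspection.
print(unrolled_hierarchy_sql(3))
```

Rows deeper than the chosen depth are silently left out, so the depth should be at least the deepest level you expect in the data.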

 


✅ Scenario 2 — Unknown depth (dynamic)

Approach: Iterative self-join with temp tables (loop outside Hive SQL)

You run these statements in a loop (in script or application):

Step 1 — Select root nodes

INSERT OVERWRITE TABLE hierarchy_level0
SELECT id, parent_id, name, CAST(name AS string) AS path
FROM org
WHERE parent_id IS NULL;

Step 2 — Iteratively find children

Loop:

-- N and N-1 are placeholders: substitute the real table names on each iteration
INSERT OVERWRITE TABLE hierarchy_levelN
SELECT c.id, c.parent_id, c.name,
       CONCAT(p.path, ' > ', c.name) AS path
FROM org c
JOIN hierarchy_level(N-1) p
  ON c.parent_id = p.id;

Stop when no new rows are produced.

This simulates recursive CTE.
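
The driver loop itself lives outside Hive. A minimal sketch of that loop in Python, using the standard-library sqlite3 module purely as a runnable stand-in for a Hive connection (an assumption; against Hive you would send equivalent statements through whatever client you use). For brevity it writes every level into one hierarchy table with an lvl column instead of one table per level, and the max_levels guard against cyclic data is an addition that is not part of the steps above:

```python
import sqlite3


def expand_hierarchy(conn, max_levels=100):
    """Fill a `hierarchy` table level by level; return the deepest level reached."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS hierarchy")
    cur.execute(
        "CREATE TABLE hierarchy "
        "(id INTEGER, parent_id INTEGER, name TEXT, path TEXT, lvl INTEGER)"
    )
    # Level 0: root nodes, i.e. rows without a parent.
    cur.execute(
        "INSERT INTO hierarchy "
        "SELECT id, parent_id, name, name, 0 FROM org WHERE parent_id IS NULL"
    )
    level = 0
    while level < max_levels:
        # Join the previous level to its children, extending the path.
        cur.execute(
            "INSERT INTO hierarchy "
            "SELECT c.id, c.parent_id, c.name, p.path || ' > ' || c.name, ? "
            "FROM org c JOIN hierarchy p ON c.parent_id = p.id "
            "WHERE p.lvl = ?",
            (level + 1, level),
        )
        cur.execute("SELECT COUNT(*) FROM hierarchy WHERE lvl = ?", (level + 1,))
        if cur.fetchone()[0] == 0:
            break  # no new rows: every reachable node has been visited
        level += 1
    conn.commit()
    return level


# Tiny sample data set (hypothetical) to exercise the loop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE org (id INTEGER, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO org VALUES (?, ?, ?)",
    [(1, None, "CEO"), (2, 1, "CTO"), (3, 2, "Engineer"), (4, 1, "CFO")],
)
deepest = expand_hierarchy(conn)
for row in conn.execute("SELECT lvl, path FROM hierarchy ORDER BY lvl, id"):
    print(row)
```

The stop condition is the same as in the description: the loop ends as soon as an iteration adds no rows.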


✅ Scenario 3 — Use Hive UDF/UDTF (works when hierarchy stored as paths)

If the table already stores a path string or a list of ancestors, you can use split() and explode().

Example:

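/A/B/C  →  ['', A, B, C]   (the leading '/' produces an empty first element)

Query:

-- Subtract 1 so the empty element before the leading '/' is not counted
SELECT id,
       parent_id,
       name,
       size(split(path_str, '/')) - 1 AS depth
FROM org_paths;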
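
Counting path segments is easy to get wrong by one when the path starts with the separator. A small Python sketch of the same calculation, assuming paths shaped like /A/B/C; the helper name path_depth is hypothetical, and it counts only non-empty segments so that leading or trailing separators do not inflate the depth:

```python
def path_depth(path_str, sep="/"):
    """Count the non-empty segments of a separator-delimited path."""
    return len([segment for segment in path_str.split(sep) if segment])


# A leading separator yields an empty first element when splitting.
segments = "/A/B/C".split("/")
print(segments)
print(path_depth("/A/B/C"))
```

It is worth running the equivalent check on a few real rows before relying on a computed depth column.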

 


🔥 MOST COMMON PRACTICAL SOLUTION IN INDUSTRY

If you need true tree traversal and you run on Hadoop/YARN, teams usually switch to:

Spark SQL (recursive CTEs are not available in Spark 3.x; support arrived in the Spark 4.x line, so check your version):

WITH RECURSIVE t AS (
  SELECT id, parent_id, name
  FROM org
  WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.parent_id, c.name
  FROM org c
  JOIN t p ON c.parent_id = p.id
)
SELECT * FROM t;

This runs directly and is simplest.


If you want, tell me:

  1. Your table structure

  2. Sample data

  3. Desired output

I’ll write fully working Hive SQL for your hierarchy query.

 
You said:
Does Hive SQL have a structure to temporarily store a list of values without creating a table?
 
ChatGPT said:

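Short answer: Hive SQL does not have typed SQL variables or temporary in-memory lists like PostgreSQL arrays or MySQL session variables. Its SET hivevar:name=value / ${hivevar:name} mechanism is plain text substitution, not a data structure.
But yes, Hive has structures inside a query that can temporarily hold lists without creating tables.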

Here are all the options you can use.


✅ 1. Use array(), map(), struct() — stored only inside the query

These are row-level data structures, not tables.

Example:

SELECT array(1, 2, 3, 4) AS my_list;

Or:

SELECT map('a', 1, 'b', 2);

These are temporary values inside the SQL statement; no table is created.


✅ 2. Use INLINE or VALUES to create a temporary in-query table

Sometimes you just want a list of values as a virtual table:

Option A — VALUES

SELECT * FROM (
  VALUES (1), (2), (3)
) AS t(id);

Option B — inline()

SELECT x
FROM inline(array(
  struct(1),
  struct(2),
  struct(3)
)) AS t(x);

This effectively creates a list in memory only for the query.


✅ 3. Use explode() to expand arrays into rows

SELECT explode(array(10, 20, 30)) AS value;

 

Produces:

value
10
20
30

❌ What Hive does not have

Hive does not support:

  • Temporary variables like SET @x = ...

  • Temporary in-memory structures shared across multiple statements

  • A structure that persists beyond a single SQL statement (unless you create a table)


🧩 Summary

Need                                        | Hive supports? | How
Store list inside a query                   | ✔ Yes          | array(), inline(), VALUES
Create a temporary virtual table            | ✔ Yes          | VALUES or inline()
Persist list across queries without a table | ❌ No           | Must create a table
SQL variables                               | ❌ No           | Hive does not support SQL variables

If you tell me what you want to do with the list (filter a table? join? loop?), I can show you the cleanest Hive solution.

 
You said:
I want to store a column of values selected from a table.
 
ChatGPT said:

In Hive SQL, you cannot store a column of values into a variable (Hive has no arrays/variables outside a single SELECT), but you can keep the list inside your query using ARRAY or VALUES without creating a table.

Below are the 3 practical ways to “store” a column of values from a table without creating any table.


Method 1 — Convert the column into an ARRAY (inside a single query)

If you only need the list inside the current query, use collect_list():

WITH t AS (
  SELECT collect_list(id) AS id_list
  FROM my_table
)
SELECT id_list FROM t;

This gives:

[1, 5, 7, 10, ...]

Now you can use it immediately:

WITH tmp AS (
  SELECT collect_list(id) AS ids
  FROM my_table
)
SELECT *
FROM other_table
WHERE other_table.id IN (SELECT explode(ids) FROM tmp);

 

✔ Works inside one query
✔ No table created
✔ Common solution for “store list temporarily”


Method 2 — Use a subquery as a temporary virtual table

If you want to reuse the list inside the same SQL:

SELECT *
FROM target t
JOIN (
  SELECT id
  FROM my_table
) AS v
ON t.id = v.id;

This creates an in-memory result set only for the duration of the query.


Method 3 — Use VALUES with dynamic SQL (if list is small)

You can generate a VALUES block:

SELECT *
FROM target t
JOIN (
  SELECT id
  FROM (VALUES (1), (5), (7), (10)) AS x(id)
) v
ON t.id = v.id;

But this requires generating the VALUES list outside Hive.
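
A minimal sketch of that outside step in Python, assuming the list is a small collection of integer IDs already fetched by the application; the function name values_clause is hypothetical. It accepts only integers, so arbitrary text cannot end up spliced into the generated SQL:

```python
def values_clause(ids, alias="x", column="id"):
    """Build a '(VALUES (1), (2), ...) AS alias(column)' fragment from integers."""
    rows = []
    for value in ids:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"only integers are accepted, got {value!r}")
        rows.append(f"({value})")
    if not rows:
        raise ValueError("at least one value is required")
    return f"(VALUES {', '.join(rows)}) AS {alias}({column})"


# Assemble the full statement text around the generated fragment.
query = (
    "SELECT *\n"
    "FROM target t\n"
    f"JOIN {values_clause([1, 5, 7, 10])}\n"
    "ON t.id = x.id;"
)
print(query)
```

For string values, a generated literal list would need proper escaping, which this sketch deliberately does not attempt.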


❌ What Hive cannot do

Hive does not allow:

  • Storing a list in a session variable

  • Reusing a list across statements without a table

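  • Creating purely in-memory temporary tables like PostgreSQL TEMP TABLE (CREATE TEMPORARY TABLE exists and is dropped when the session ends, but by default its data is still written to a scratch directory on the file system)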


✅ Recommended Best Practice

If your use case is:

“I want to use a list of values later in the same SQL.”

→ Use collect_list() + explode().

Example:

WITH ids AS (
  SELECT collect_list(user_id) AS id_list
  FROM users
)
SELECT *
FROM log_table
WHERE uid IN (SELECT explode(id_list) FROM ids);

 
