Hive Tuning: Turning a Tall Table into a Wide Table

Date: 2014-05-16

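It has been a long time since I last updated this blog.

I have been at the company for a little over three months. I have not learned much else; I just write SQL for Hive every day. I used to resent this kind of repetitive work, but since I am here, I may as well make the best of it.

At work I ran into the following requirement.

There is a table named t_buy_buyer_time_hongbao_asc: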

User ID (uid)    Sequence (row_num)    Purchase time (order_time)
25560            1                     1325345254
25560            2                     1331043510
25560            3                     1331999999
25720            1                     1320381121
25720            2                     1320461154
25720            3                     1320639271
26840            1                     1337214675
26840            2                     1337214694
26840            3                     1337214768
37160            1                     1328583075

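The requirement is to produce a table that lists, for each user, the time of the first purchase, the second purchase, and the third purchase.

For example: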

User ID    First purchase    Second purchase    Third purchase
25560      1325345254        1331043510         1331999999
25720      1320381121        1320461154         1320639271
26840      1337214675        1337214694         1337214768
......

Put more vividly, the task is to turn a tall (vertical) table into a wide (horizontal) one.

I used two Hive scripts to run this query.

hive1

select
    tb1.uid as uid,
    tb1.order_time as s1t_deal_time,
    tb2.order_time as c2d_deal_time,
    tb3.order_time as r3d_deal_time
from
    -- one subquery per purchase sequence number, all joined on the same key (uid)
    (select * from t_buy_buyer_time_hongbao_asc where row_num=1 and pt='20121010000000') tb1
    left outer join
    (select * from t_buy_buyer_time_hongbao_asc where row_num=2 and pt='20121010000000') tb2
    on tb1.uid=tb2.uid
    left outer join
    (select * from t_buy_buyer_time_hongbao_asc where row_num=3 and pt='20121010000000') tb3
    on tb1.uid=tb3.uid

This Hive script needs only one job; the execution time was 376.005 s.

hive2

select
    tb1.uid as uid,
    s1t_deal_time,
    c2d_deal_time,
    r3d_deal_time
from
    -- single scan; a missing Nth purchase yields 0 here, not NULL as in hive1
    (select uid,
            sum(if(row_num=1, order_time, 0)) as s1t_deal_time,
            sum(if(row_num=2, order_time, 0)) as c2d_deal_time,
            sum(if(row_num=3, order_time, 0)) as r3d_deal_time
     from t_buy_buyer_time_hongbao_asc
     where pt='20121010000000'
     group by uid) tb1

This Hive script also needs only one job; the execution time was 328.733 s.
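The two query shapes can be checked against each other on the sample rows shown earlier. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive, the table name `t` and the helper `pivot` are made up for illustration, the `pt` partition filter is omitted, and `case when` replaces Hive's `if()` because SQLite may not provide that function.

```python
import sqlite3

# Sample rows from the article: (uid, row_num, order_time).
ROWS = [
    (25560, 1, 1325345254), (25560, 2, 1331043510), (25560, 3, 1331999999),
    (25720, 1, 1320381121), (25720, 2, 1320461154), (25720, 3, 1320639271),
    (26840, 1, 1337214675), (26840, 2, 1337214694), (26840, 3, 1337214768),
    (37160, 1, 1328583075),
]

# Join-based pivot, same shape as hive1 (partition filter omitted).
JOIN_SQL = """
select tb1.uid, tb1.order_time, tb2.order_time, tb3.order_time
from (select * from t where row_num = 1) tb1
left outer join (select * from t where row_num = 2) tb2 on tb1.uid = tb2.uid
left outer join (select * from t where row_num = 3) tb3 on tb1.uid = tb3.uid
order by tb1.uid
"""

# Aggregation-based pivot, same shape as hive2; CASE stands in for Hive's if().
AGG_SQL = """
select uid,
       sum(case when row_num = 1 then order_time else 0 end),
       sum(case when row_num = 2 then order_time else 0 end),
       sum(case when row_num = 3 then order_time else 0 end)
from t
group by uid
order by uid
"""


def pivot(sql):
    """Load the sample rows into an in-memory table and run one pivot query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (uid integer, row_num integer, order_time integer)")
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


join_rows = pivot(JOIN_SQL)  # user 37160 gets NULL (None) for the missing purchases
agg_rows = pivot(AGG_SQL)    # user 37160 gets 0 for the missing purchases
```

For users with all three purchases the two queries return identical rows. They differ only for a user such as 37160, who has a single purchase: the join version leaves the missing columns NULL, while the sum version fills them with 0. If NULL is preferred in the aggregation version, one option is `max(case when row_num=2 then order_time end)` in place of the `sum(if(...))` expression.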

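Well, don't complain that this is slow; running data jobs on Hadoop really is very, very slow.

The second script cut the running time by about 47 s (from 376.005 s to 328.733 s). The open question is why hive1 produces only one job. The reason is that both joins use the same key, the uid of the same table; when every join in a query uses the same join column, Hive merges the joins into a single MapReduce job as an optimization.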
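To make the single-job explanation more concrete, here is a toy in-memory model of a reduce-side join in plain Python. It is not Hive's actual implementation; the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical, and it assumes at most one row per (uid, row_num). The point it illustrates is that when all three inputs are keyed by uid, one shuffle on uid is enough to bring every row a reducer needs together.

```python
from collections import defaultdict

# Sample rows from the article: (uid, row_num, order_time).
ROWS = [
    (25560, 1, 1325345254), (25560, 2, 1331043510), (25560, 3, 1331999999),
    (25720, 1, 1320381121), (25720, 2, 1320461154), (25720, 3, 1320639271),
    (26840, 1, 1337214675), (26840, 2, 1337214694), (26840, 3, 1337214768),
    (37160, 1, 1328583075),
]


def map_phase(rows):
    """Emit (join key, (source tag, value)); the tag marks which subquery the row feeds."""
    for uid, row_num, order_time in rows:
        if row_num in (1, 2, 3):
            yield uid, (row_num, order_time)


def shuffle(pairs):
    """Group the emitted values by key, as a single shuffle on uid would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Build one wide row per uid; a uid without a first purchase is dropped,
    mirroring tb1 as the driving side of the left outer joins."""
    wide = []
    for uid in sorted(groups):
        by_tag = dict(groups[uid])
        if 1 in by_tag:
            wide.append((uid, by_tag.get(1), by_tag.get(2), by_tag.get(3)))
    return wide


wide_rows = reduce_phase(shuffle(map_phase(ROWS)))
```

If the second join used a different key than the first, the intermediate result would have to be regrouped on that new key, which in this model means a second shuffle and, in MapReduce terms, a second job.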
