Spark执行计划分析与研究

在学习、使用和研究spark的过程中,逐渐会发现:单纯看官方文档对spark参数调优只能解决一小部分的问题,要想进一步的学习spark,进一步调优甚至在spark源码的基础上二次开发,我觉得收益最高的应该是学习执行计划了。

因此在研究spark源码之前,学习执行计划 可以对整个spark执行过程、架构设计都有一个初步的认识。

然而国内网站各大博客,都没有找到一个相关入门教程,笔者打算利用空余时间,在学习的同时做一个笔记,分享一下。

 

-------

基础篇:

Spark常用算子:https://blog.csdn.net/zyzzxycj/article/details/82706233

内存模型:https://blog.csdn.net/zyzzxycj/article/details/82740733

RDD&Partition:https://blog.csdn.net/zyzzxycj/article/details/82754411

DAG:https://blog.csdn.net/zyzzxycj/article/details/82734453

hive文件存储类型:https://blog.csdn.net/zyzzxycj/article/details/79267635

-------

Spark执行流程与原理:

https://blog.csdn.net/zyzzxycj/article/details/83788647

-------

1、先从最简单的例子开始 select *

场景:select * from test1(普通表 TextFile、一个partition)

SELECT self_entity_id entity_id
FROM DW_DTL.DTL_T_CM_OD_TP_JOIN a
JOIN(SELECT entity_id,code,nameFROM ods_shop_org.shopWHERE code IN('C8007797')) b ON(a.self_entity_id = b.entity_id)
WHERE curr_date BETWEEN '2018-07-22' AND '2018-08-22'
GROUP BY a.self_entity_id

执行计划:

SELECT bb.customer_register_id,bb.time,cc.mobile from(SELECT a.user_id,b.customerregister_id,c.mobileFROM (SELECT split(split(split(split(recored,' HTTP')[0],'seat_code=')[1],'&')[0],'_')[1] AS user_idFROM dw_ubt.meal_2dfire_comWHERE recored LIKE '%async_modify%'AND recored LIKE '%presell%'AND pt>='20180905'AND pt<='20180911'GROUP BY  split(split(split(split(recored,' HTTP')[0],'seat_code=')[1],'&')[0],'_')[1]) aLEFT JOIN (SELECT customerregister_idFROM ods_order_org.waitingorderdetailWHERE pay_status=2AND status=4AND kind=7AND pt>'20180501'GROUP BY  customerregister_id) bON a.user_id=b.customerregister_idLEFT JOIN ods_member_org.customer_register cON a.user_id=c.idWHERE b.customerregister_id is null) aaRIGHT JOIN (SELECT customer_register_id,from_unixtime(int(create_time/1000),'YYYY-MM-DD HH:MM:SS') AS timeFROM ods_presell_market_org.coupon_customerWHERE activity_type=1AND channel != 4AND status=1AND from_unixtime(int(expire_time/1000),'yyyyMMdd') > from_unixtime(unix_timestamp(),'yyyyMMdd')AND is_valid =1AND create_time <=1536681600000) bbON aa.user_id=bb.customer_register_id
LEFT JOIN ods_member_org.customer_register ccON bb.customer_register_id=cc.id
WHERE aa.user_id is null

执行计划:

create table active_shop_week ASSELECT 2 AS dim_time,t1.curr_week AS curr_date,total_active_shop,(SELECT curr_week,count(distinct entity_id) AS total_active_shopFROM totGROUP BY  curr_week) t1
LEFT JOIN (SELECT a.curr_week,count(distinct a.entity_id) AS catering_active_shopFROM tot aJOIN catering bON a.entity_id = b.entity_idGROUP BY  a.curr_week) t2ON t1.curr_week = t2.curr_week
LEFT JOIN (SELECT a.curr_week,count(distinct a.entity_id) AS retail_active_shopFROM tot aJOIN retail bON a.entity_id = b.entity_idGROUP BY  a.curr_week) t3 ON t1.curr_week = t3.curr_week
LEFT JOIN (SELECT curr_week,count(distinct entity_id) AS total_online_orderFROM online_orderGROUP BY  curr_week) o1ON t1.curr_week = o1.curr_week
LEFT JOIN (SELECT curr_week,count(distinct entity_id) AS total_online_payFROM online_payGROUP BY  curr_week) o2ON t1.curr_week = o2.curr_week
LEFT JOIN (SELECT a.curr_week,count(distinct a.entity_id) AS catering_order_activeFROM order_active aJOIN catering bON a.entity_id = b.entity_idGROUP BY  a.curr_week) d1ON t1.curr_week = d1.curr_week
LEFT JOIN (SELECT curr_week,count(distinct entity_id) AS retail_order_activeFROM (SELECT curr_date,entity_id,concat(year(date_sub(next_day(curr_date,'MON'),7)),'-',weekofyear(date_sub(next_day(curr_date,'MON'),7))) AS curr_weekFROM olap_analysis.tmp_retail_active_shopUNIONallSELECT curr_date,a.entity_id,concat(year(date_sub(next_day(curr_date,'MON'),7)),'-',weekofyear(date_sub(next_day(curr_date,'MON'),7))) AS curr_weekFROM order_active aJOIN retail bON a.entity_id = b.entity_id)uGROUP BY  curr_week ) d2ON t1.curr_week = d2.curr_week
LEFT JOIN (SELECT curr_week,count(distinct entity_id) AS catering_online_orderFROM online_orderGROUP BY  curr_week) w1ON t1.curr_week = w1.curr_week
LEFT JOIN (SELECT curr_week,sum(if(order_from = 40,1,0)) AS catering_weixin_order,sum(if(order_from = 41,1,0)) AS catering_zhifubao_orderFROM (SELECT entity_id,order_from,concat(year(date_sub(next_day(curr_date,'MON'),7)),'-',weekofyear(date_sub(next_day(curr_date,'MON'),7))) AS curr_weekFROM (SELECT from_unixtime(int(op_time/1000),'yyyy-MM-dd') AS curr_date, entity_id, order_fromFROM dw_order_org.waitingorderdetail_cleanWHERE pt = '20180925'AND from_unixtime(int(op_time/1000), 'yyyy-MM-dd')BETWEEN '2018-08-01'AND '2018-09-25') xGROUP BY  concat(year(date_sub(next_day(curr_date,'MON'),7)),'-',weekofyear(date_sub(next_day(curr_date,'MON'),7))), entity_id, order_from) ttGROUP BY  curr_week) w2ON t1.curr_week = w2.curr_weekLEFT JOIN (SELECT curr_week,count(distinct a.entity_id) AS catering_online_payFROM online_pay aJOIN catering bON a.entity_id = b.entity_idGROUP BY  curr_week) op1ON t1.curr_week = op1.curr_week
LEFT JOIN (SELECT curr_week,count(distinct a.entity_id) AS retail_online_payFROM online_pay aJOIN retail bON a.entity_id = b.entity_idGROUP BY  curr_week) op2ON t1.curr_week = op2.curr_week
LEFT JOIN (SELECT curr_week,sum(if(pay_client_type = 1,1,0)) AS catering_weixin_pay,sum(if(pay_client_type = 2,1,0)) AS catering_zhifubao_payFROM (SELECT a.entity_id,a.curr_week,a.pay_client_typeFROM paymentflow aJOIN catering bON a.entity_id = b.entity_idGROUP BY  a.entity_id,a.curr_week,a.pay_client_type) rGROUP BY  curr_week ) p1ON w1.curr_week = p1.curr_week
LEFT JOIN (SELECT curr_week,sum(if(pay_client_type = 1,1,0)) AS retail_weixin_pay,sum(if(pay_client_type = 2,1,0)) AS retail_zhifubao_payFROM (SELECT a.entity_id,a.curr_week,a.pay_client_typeFROM paymentflow aJOIN retail bON a.entity_id = b.entity_idGROUP BY  a.entity_id,a.curr_week,a.pay_client_type) rGROUP BY  curr_week ) p2ON w1.curr_week = p2.curr_week

 


本文来自互联网用户投稿,文章观点仅代表作者本人,不代表本站立场,不承担相关法律责任。如若转载,请注明出处。 如若内容造成侵权/违法违规/事实不符,请点击【内容举报】进行投诉反馈!

相关文章

立即
投稿

微信公众账号

微信扫一扫加关注

返回
顶部