大数据项目之电商数仓(脚本篇)
我的脚本存放目录 /user/local/bin/
注意:
有些脚本涉及到用户和密码记得要改成自己的!路径有的也需要留意,这些都是我亲测跑过的脚本。
大数据项目之电商数仓(脚本篇)
- 1.xsync集群分发脚本
- 2.zookeeper集群起动关闭脚本
- 3.日志启动脚本
- 4.时间同步脚本(用于造假数据)
- 5.集群各个节点查看状态脚本
- 6.日志采集 Flume 启动停止脚本
- 7.Kafka 集群启动停止脚本
- 8.日志消费 Flume 启动停止脚本
- 9.采集通道启动/停止脚本
- 10.mysql业务数据通过sqoop导入hdfs脚本
- 11.数仓搭建-ODS 层(用户行为数据)加载数据脚本
- 12.数仓搭建-ODS 层(业务数据)加载数据脚本
- 13.数仓搭建-DWD 层启动表加载数据脚本
- 14.数仓搭建-DWD 层数据解析脚本
- 15.数仓搭建-DWD 层事件表加载数据脚本
- 16.数仓搭建-DWD 层数据导入脚本
- 17.数仓搭建-DWS 层数据导入脚本
- 18.数仓搭建-DWT 层数据导入脚本
- 19.数仓搭建-ADS 层会员主题信息导入脚本
- 20.数仓搭建-ADS 层导入脚本
- 21.hdfs_to_mysql传输脚本
1.xsync集群分发脚本
参考这篇博客安装
https://blog.csdn.net/yhblog/article/details/84066218
#!/bin/bash
#获取输入参数的个数.没有参数直接退出
pcount=$#
if((pcount==0));then
echo no args;
exit;
fi#2.获取文件名称
p1=$1
fname=`basename $p1`
echo fname=$fname#3.获取上级目录到绝对路径
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir#4.获取当前用户名称
user=`whoami`#5.循环
for ((host=102; host<105;host++));doecho ----------------hadoop$host------------------rsync -av $pdir/$fname $user@hadoop$host:$pdir
done
2.zookeeper集群起动关闭脚本
#!/bin/bashcase $1 in
"start"){for i in hadoop102 hadoop103 hadoop104dossh $i "/opt/module/zookeeper-3.4.10/bin/zkServer.sh start"done
};;
"stop"){for i in hadoop102 hadoop103 hadoop104dossh $i "/opt/module/zookeeper-3.4.10/bin/zkServer.sh stop"done
};;
"status"){for i in hadoop102 hadoop103 hadoop104dossh $i "/opt/module/zookeeper-3.4.10/bin/zkServer.sh status"done
};;
esac
3.日志启动脚本
#!/bin/bashfor i in hadoop102 hadoop103
doecho ---------$i 生成日志----------ssh $i "java -jar /opt/module/log-collector-1.0-SNAPSHOT-jar-with-dependencies.jar $1 $2 >/dev/null 2>&1 &"
done
4.时间同步脚本(用于造假数据)
#!/bin/bashfor i in hadoop102 hadoop103 hadoop104
doecho ---------$i 同步时间----------ssh -t $i "sudo data -s $1"
done
5.集群各个节点查看状态脚本
#!/bin/bashfor i in hadoop102 hadoop103 hadoop104
doecho ---------$i ----------ssh $i "$*"
done
6.日志采集 Flume 启动停止脚本
#!/bin/bash
case $1 in
"start"){ for i in hadoop102 hadoop103 do echo " --------启动 $i 采集 flume-------" ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/file-flume-kafka.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/opt/module/flume/test1 2>&1 &"done
};;
"stop"){for i in hadoop102 hadoop103 do echo " --------停止 $i 采集 flume-------" ssh $i "ps -ef | grep file-flume-kafka | grep -v grep |awk '{print \$2}' | xargs kill" done
};;
esac
7.Kafka 集群启动停止脚本
#! /bin/bash case $1 in
"start"){ for i in hadoop102 hadoop103 hadoop104 do echo " --------启动 $i Kafka-------" ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties " done
};;
"stop"){for i in hadoop102 hadoop103 hadoop104 do echo " --------停止 $i Kafka-------" ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh stop" done
};;
esac
8.日志消费 Flume 启动停止脚本
#! /bin/bashcase $1 in
"start"){ for i in hadoop104 do echo " --------启动 $i 消费 flume-------" ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/kafka-flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/opt/module/flume/log.txt 2>&1 &"done
};;
"stop"){for i in hadoop104 do echo " --------停止 $i 消费 flume-------" ssh $i "ps -ef | grep kafka-flume-hdfs | grep -v grep |awk '{print \$2}' | xargs kill" done
};;
esac
9.采集通道启动/停止脚本
#! /bin/bashcase $1 in
"start"){ echo " -------- 启动 集群 -------" echo " -------- 启动 hadoop 集群 -------" /opt/module/hadoop-2.7.2/sbin/start-dfs.sh ssh hadoop103 "/opt/module/hadoop-2.7.2/sbin/start-yarn.sh" #启动 Zookeeper 集群 zk.sh start sleep 4s; #启动 Flume 采集集群 f1.sh start #启动 Kafka 采集集群 kf.sh start sleep 6s; #启动 Flume 消费集群 f2.sh start
};;
"stop"){ echo " -------- 停止 集群 -------"#停止 Flume 消费集群 f2.sh stop #停止 Kafka 采集集群 kf.sh stop sleep 6s; #停止 Flume 采集集群 f1.sh stop #停止 Zookeeper 集群 zk.sh stop echo " -------- 停止 hadoop 集群 -------" ssh hadoop103 "/opt/module/hadoop-2.7.2/sbin/stop-yarn.sh" /opt/module/hadoop-2.7.2/sbin/stop-dfs.sh
};;
esac
10.mysql业务数据通过sqoop导入hdfs脚本
#! /bin/bash sqoop=/opt/module/sqoop/bin/sqoop
do_date=`date -d '-1 day' +%F` if [[ -n "$2" ]]; then do_date=$2
fiimport_data(){
$sqoop import --connect jdbc:mysql://hadoop102:3306/gmall --username root --password 123456 --target-dir /origin_data/gmall/db/$1/$do_date --delete-target-dir --query "$2 and \$CONDITIONS" --num-mappers 1 --fields-terminated-by '\t' --compress --compression-codec lzop --null-string '\\N' --null-non-string '\\N'hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /origin_data/gmall/db/$1/$do_date
}import_order_info(){ import_data order_info "select id, final_total_amount, order_status, user_id, out_trade_no, create_time, operate_time, province_id, benefit_reduce_amount, original_total_amount, feight_fee from order_info where (date_format(create_time,'%Y-%m-%d')='$do_date' or date_format(operate_time,'%Y-%m-%d')='$do_date')"
}import_coupon_use(){ import_data coupon_use "select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time from coupon_use where (date_format(get_time,'%Y-%m-%d')='$do_date' or date_format(using_time,'%Y-%m-%d')='$do_date' or date_format(used_time,'%Y-%m-%d')='$do_date')"
}import_order_status_log(){ import_data order_status_log "select id, order_id, order_status, operate_time from order_status_log where
date_format(operate_time,'%Y-%m-%d')='$do_date'"
}import_activity_order(){ import_data activity_order "select id, activity_id, order_id, create_time from activity_order where
date_format(create_time,'%Y-%m-%d')='$do_date'"
}import_user_info(){ import_data "user_info" "select id, name, birthday,gender, email, user_level, create_time, operate_time from user_info where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' or DATE_FORMAT(operate_time,'%Y-%m-%d')='$do_date')"
}import_order_detail(){ import_data order_detail "select od.id, order_id, user_id, sku_id, sku_name, order_price, sku_num, od.create_time from order_detail od join order_info oi on od.order_id=oi.id where
DATE_FORMAT(od.create_time,'%Y-%m-%d')='$do_date'"
}import_payment_info(){ import_data "payment_info" "select id, out_trade_no, order_id, user_id, alipay_trade_no, total_amount, subject, payment_type, payment_time from payment_info where
DATE_FORMAT(payment_time,'%Y-%m-%d')='$do_date'"
}import_comment_info(){ import_data comment_info "select id, user_id, sku_id, spu_id, order_id, appraise, comment_txt, create_time from comment_info where date_format(create_time,'%Y-%m-%d')='$do_date'"
}import_order_refund_info(){ import_data order_refund_info "select id, user_id, order_id, sku_id,refund_type, refund_num, refund_amount, refund_reason_type, create_time from order_refund_info where
date_format(create_time,'%Y-%m-%d')='$do_date'"
}import_sku_info(){ import_data sku_info "select id, spu_id, price, sku_name, sku_desc, weight, tm_id, category3_id, create_time from sku_info where 1=1"
}import_base_category1(){ import_data "base_category1" "select id, name from base_category1 where 1=1"
}import_base_category2(){ import_data "base_category2" "select id, name, category1_id from base_category2 where 1=1"
}import_base_category3(){ import_data "base_category3" "select id, name, category2_id from base_category3 where 1=1"
}import_base_province(){ import_data base_province "select id, name, region_id, area_code, iso_code from base_province where 1=1"
}import_base_region(){ import_data base_region "select id, region_name from base_regionwhere 1=1"
}import_base_trademark(){ import_data base_trademark "select tm_id, tm_name from base_trademark where 1=1"
}import_spu_info(){ import_data spu_info "select id, spu_name, category3_id, tm_id from spu_info where 1=1"
}import_favor_info(){ import_data favor_info "select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from favor_info where 1=1"
}import_cart_info(){ import_data cart_info "select id, user_id, sku_id, cart_price, sku_num, sku_name, create_time, operate_time, is_ordered, order_time from cart_info where 1=1"
}import_coupon_info(){ import_data coupon_info "select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, spu_id,tm_id, category3_id, limit_num, operate_time, expire_time from coupon_info where 1=1"
}import_activity_info(){ import_data activity_info "select id, activity_name, activity_type, start_time, end_time, create_time from activity_info where 1=1"
}import_activity_rule(){ import_data activity_rule "select id, activity_id, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level from activity_rule where 1=1"
}import_base_dic(){ import_data base_dic "select dic_code, dic_name, parent_code, create_time, operate_time from base_dic where 1=1"
}case $1 in "order_info") import_order_info
;;"base_category1") import_base_category1
;;"base_category2") import_base_category2
;;"base_category3") import_base_category3
;;"order_detail") import_order_detail
;;"sku_info") import_sku_info
;;"user_info") import_user_info
;;"payment_info") import_payment_info
;;"base_province") import_base_province
;;"base_region") import_base_region
;;"base_trademark") import_base_trademark
;;"activity_info") import_activity_info
;;"activity_order") import_activity_order
;;"cart_info") import_cart_info
;;"comment_info") import_comment_info
;;"coupon_info") import_coupon_info
;;"coupon_use") import_coupon_use
;;"favor_info") import_favor_info
;;"order_refund_info") import_order_refund_info
;;"order_status_log") import_order_status_log
;;"spu_info") import_spu_info
;; "activity_rule") import_activity_rule
;;"base_dic") import_base_dic
;;
"first") import_base_category1 import_base_category2 import_base_category3 import_order_info import_order_detail import_sku_info import_user_info import_payment_info import_base_provinceimport_base_region import_base_trademark import_activity_info import_activity_order import_cart_info import_comment_info import_coupon_use import_coupon_info import_favor_info import_order_refund_info import_order_status_log import_spu_info import_activity_rule import_base_dic
;;
"all") import_base_category1 import_base_category2 import_base_category3 import_order_info import_order_detail import_sku_info import_user_info import_payment_info import_base_trademark import_activity_info import_activity_order import_cart_info import_comment_info import_coupon_use import_coupon_info import_favor_info import_order_refund_info import_order_status_log import_spu_info import_activity_rule import_base_dic
;;
esac
11.数仓搭建-ODS 层(用户行为数据)加载数据脚本
#!/bin/bash # 定义变量方便修改
APP=gmall
hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then do_date=$1
elsedo_date=`date -d "-1 day" +%F`
fiecho "===日志日期为 $do_date==="
sql="
load data inpath '/origin_data/gmall/log/topic_start/$do_date' overwrite
into table ${APP}.ods_start_log partition(dt='$do_date'); load data inpath '/origin_data/gmall/log/topic_event/$do_date' overwrite
into table ${APP}.ods_event_log partition(dt='$do_date');
"$hive -e "$sql" hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_start_log/dt=$do_date
hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_event_log/dt=$do_date
12.数仓搭建-ODS 层(业务数据)加载数据脚本
#!/bin/bash APP=gmall
hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$2" ] ;then do_date=$2
elsedo_date=`date -d "-1 day" +%F`
fisql1="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/activity_order/$do_date' OVERWRITE into table ${APP}.ods_activity_order partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');
"sql2="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;
"
case $1 in
"first"){ $hive -e "$sql1" $hive -e "$sql2"
};;
"all"){ $hive -e "$sql1"
};;
esac
13.数仓搭建-DWD 层启动表加载数据脚本
#!/bin/bash # 定义变量方便修改
APP=gmall
hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then do_date=$1
elsedo_date=`date -d "-1 day" +%F`
fisql="
set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table "$APP".dwd_start_log
PARTITION (dt='$do_date')
select get_json_object(line,'$.mid') mid_id, get_json_object(line,'$.uid') user_id, get_json_object(line,'$.vc') version_code, get_json_object(line,'$.vn') version_name, get_json_object(line,'$.l') lang, get_json_object(line,'$.sr') source, get_json_object(line,'$.os') os, get_json_object(line,'$.ar') area, get_json_object(line,'$.md') model, get_json_object(line,'$.ba') brand, get_json_object(line,'$.sv') sdk_version, get_json_object(line,'$.g') gmail, get_json_object(line,'$.hw') height_width, get_json_object(line,'$.t') app_time, get_json_object(line,'$.nw') network, get_json_object(line,'$.ln') lng, get_json_object(line,'$.la') lat, get_json_object(line,'$.entry') entry, get_json_object(line,'$.open_ad_type') open_ad_type, get_json_object(line,'$.action') action, get_json_object(line,'$.loading_time') loading_time, get_json_object(line,'$.detail') detail, get_json_object(line,'$.extend1') extend1
from "$APP".ods_start_log
where dt='$do_date';
"$hive -e "$sql"
14.数仓搭建-DWD 层数据解析脚本
#!/bin/bash# 定义变量方便修改
APP=gmall
hive=/opt/module/hive/bin/hive# 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then do_date=$1
elsedo_date=`date -d "-1 day" +%F`
fi
sql="
use gmall;
insert overwrite table "$APP".dwd_base_event_log partition(dt='$do_date') select base_analizer(line,'mid') as mid_id, base_analizer(line,'uid') as user_id, base_analizer(line,'vc') as version_code, base_analizer(line,'vn') as version_name, base_analizer(line,'l') as lang, base_analizer(line,'sr') as source, base_analizer(line,'os') as os, base_analizer(line,'ar') as area,base_analizer(line,'md') as model, base_analizer(line,'ba') as brand, base_analizer(line,'sv') as sdk_version, base_analizer(line,'g') as gmail, base_analizer(line,'hw') as height_width, base_analizer(line,'t') as app_time, base_analizer(line,'nw') as network, base_analizer(line,'ln') as lng, base_analizer(line,'la') as lat, event_name, event_json, base_analizer(line,'st') as server_time from "$APP".ods_event_log lateral view flat_analizer(base_analizer(line,'et')) tem_flat as event_name,event_json where dt='$do_date' and base_analizer(line,'et')<>'';
"
$hive -e "$sql"
15.数仓搭建-DWD 层事件表加载数据脚本
#!/bin/bash# 定义变量方便修改
APP=gmall
hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then do_date=$1
elsedo_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table "$APP".dwd_display_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.action') action, get_json_object(event_json,'$.kv.goodsid') goodsid, get_json_object(event_json,'$.kv.place') place, get_json_object(event_json,'$.kv.extend1') extend1, get_json_object(event_json,'$.kv.category') category, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='display';
insert overwrite table "$APP".dwd_newsdetail_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.entry') entry, get_json_object(event_json,'$.kv.action') action, get_json_object(event_json,'$.kv.goodsid') goodsid, get_json_object(event_json,'$.kv.showtype') showtype, get_json_object(event_json,'$.kv.news_staytime') news_staytime, get_json_object(event_json,'$.kv.loading_time') loading_time, get_json_object(event_json,'$.kv.type1') type1, get_json_object(event_json,'$.kv.category') category, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='newsdetail';
insert overwrite table "$APP".dwd_loading_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.action') action, get_json_object(event_json,'$.kv.loading_time') loading_time, get_json_object(event_json,'$.kv.loading_way') loading_way, get_json_object(event_json,'$.kv.extend1') extend1, get_json_object(event_json,'$.kv.extend2') extend2, get_json_object(event_json,'$.kv.type') type, get_json_object(event_json,'$.kv.type1') type1, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='loading';
insert overwrite table "$APP".dwd_ad_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.entry') entry, get_json_object(event_json,'$.kv.action') action, get_json_object(event_json,'$.kv.contentType') contentType, get_json_object(event_json,'$.kv.displayMills') displayMills, get_json_object(event_json,'$.kv.itemId') itemId, get_json_object(event_json,'$.kv.activityId') activityId, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='ad';
insert overwrite table "$APP".dwd_notification_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.action') action, get_json_object(event_json,'$.kv.noti_type') noti_type, get_json_object(event_json,'$.kv.ap_time') ap_time, get_json_object(event_json,'$.kv.content') content, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='notification';
insert overwrite table "$APP".dwd_active_background_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.active_source') active_source, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='active_background';
insert overwrite table "$APP".dwd_comment_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.comment_id') comment_id, get_json_object(event_json,'$.kv.userid') userid, get_json_object(event_json,'$.kv.p_comment_id') p_comment_id, get_json_object(event_json,'$.kv.content') content, get_json_object(event_json,'$.kv.addtime') addtime, get_json_object(event_json,'$.kv.other_id') other_id, get_json_object(event_json,'$.kv.praise_count') praise_count, get_json_object(event_json,'$.kv.reply_count') reply_count, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='comment';
insert overwrite table "$APP".dwd_favorites_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.id') id, get_json_object(event_json,'$.kv.course_id') course_id, get_json_object(event_json,'$.kv.userid') userid, get_json_object(event_json,'$.kv.add_time') add_time, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='favorites';
insert overwrite table "$APP".dwd_praise_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.id') id, get_json_object(event_json,'$.kv.userid') userid, get_json_object(event_json,'$.kv.target_id') target_id, get_json_object(event_json,'$.kv.type') type, get_json_object(event_json,'$.kv.add_time') add_time, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='praise';
insert overwrite table "$APP".dwd_error_log PARTITION (dt='$do_date') select mid_id, user_id, version_code, version_name, lang, source, os, area, model, brand, sdk_version, gmail, height_width, app_time, network, lng, lat, get_json_object(event_json,'$.kv.errorBrief') errorBrief, get_json_object(event_json,'$.kv.errorDetail') errorDetail, server_time from "$APP".dwd_base_event_log where dt='$do_date' and event_name='error';
"
$hive -e "$sql"
16.数仓搭建-DWD 层数据导入脚本
#!/bin/bash APP=gmall
hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$2" ] ;then do_date=$2
elsedo_date=`date -d "-1 day" +%F`
fi
sql1="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_dim_sku_info partition(dt='$do_date') select sku.id, sku.spu_id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.tm_id, ob.tm_name, sku.category3_id, c2.id category2_id, c1.id category1_id, c3.name category3_name, c2.name category2_name, c1.name category1_name, spu.spu_name, sku.create_time from ( select * from ${APP}.ods_sku_info where dt='$do_date' )sku join ( select * from ${APP}.ods_base_trademark where dt='$do_date' )ob on sku.tm_id=ob.tm_id join ( select * from ${APP}.ods_spu_info where dt='$do_date' )spu on spu.id = sku.spu_id join ( select * from ${APP}.ods_base_category3 where dt='$do_date' )c3 on sku.category3_id=c3.id join ( select * from ${APP}.ods_base_category2 where dt='$do_date' )c2 on c3.category2_id=c2.id join ( select * from ${APP}.ods_base_category1 where dt='$do_date' )c1 on c2.category1_id=c1.id;
insert overwrite table ${APP}.dwd_dim_coupon_info partition(dt='$do_date') select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, spu_id, tm_id, category3_id, limit_num, operate_time, expire_time from ${APP}.ods_coupon_info where dt='$do_date';
insert overwrite table ${APP}.dwd_dim_activity_info partition(dt='$do_date') select info.id, info.activity_name, info.activity_type, rule.condition_amount, rule.condition_num, rule.benefit_amount, rule.benefit_discount, rule.benefit_level, info.start_time, info.end_time, info.create_time from ( select * from ${APP}.ods_activity_info where dt='$do_date' )info left join ( select * from ${APP}.ods_activity_rule where dt='$do_date' )rule on info.id = rule.activity_id; insert overwrite table ${APP}.dwd_fact_order_detail partition(dt='$do_date') select od.id, od.order_id, od.user_id, od.sku_id, od.sku_name, od.order_price, od.sku_num, od.create_time, oi.province_id, od.order_price*od.sku_num from ( select * from ${APP}.ods_order_detail where dt='$do_date' ) od join ( select * from ${APP}.ods_order_info where dt='$do_date' ) oi on od.order_id=oi.id; insert overwrite table ${APP}.dwd_fact_payment_info partition(dt='$do_date') select pi.id, pi.out_trade_no, pi.order_id, pi.user_id, pi.alipay_trade_no, pi.total_amount, pi.subject, pi.payment_type, pi.payment_time, oi.province_id from ( select * from ${APP}.ods_payment_info where dt='$do_date' )pi join ( select id, province_id from ${APP}.ods_order_info where dt='$do_date' )oi on pi.order_id = oi.id; insert overwrite table ${APP}.dwd_fact_order_refund_info partition(dt='$do_date') select id, user_id, order_id, sku_id, refund_type, refund_num, refund_amount, refund_reason_type, create_time from ${APP}.ods_order_refund_info where dt='$do_date';
insert overwrite table ${APP}.dwd_fact_comment_info partition(dt='$do_date') select id, user_id, sku_id, spu_id, order_id, appraise, create_time from ${APP}.ods_comment_info where dt='$do_date';
insert overwrite table ${APP}.dwd_fact_cart_info partition(dt='$do_date') select id, user_id, sku_id, cart_price, sku_num, sku_name, create_time, operate_time, is_ordered, order_time from ${APP}.ods_cart_info where dt='$do_date';
insert overwrite table ${APP}.dwd_fact_favor_info partition(dt='$do_date') select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from ${APP}.ods_favor_info where dt='$do_date';
insert overwrite table ${APP}.dwd_fact_coupon_use partition(dt) select if(new.id is null,old.id,new.id), if(new.coupon_id is null,old.coupon_id,new.coupon_id), if(new.user_id is null,old.user_id,new.user_id), if(new.order_id is null,old.order_id,new.order_id), if(new.coupon_status is null,old.coupon_status,new.coupon_status), if(new.get_time is null,old.get_time,new.get_time), if(new.using_time is null,old.using_time,new.using_time), if(new.used_time is null,old.used_time,new.used_time), date_format(if(new.get_time is null,old.get_time,new.get_time),'yyyy-MM-dd') from ( select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time from ${APP}.dwd_fact_coupon_use where dt in ( select date_format(get_time,'yyyy-MM-dd') from ${APP}.ods_coupon_use where dt='$do_date' ) )old full outer join ( select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time from ${APP}.ods_coupon_use where dt='$do_date' )new on old.id=new.id;
insert overwrite table ${APP}.dwd_fact_order_info partition(dt) select if(new.id is null,old.id,new.id), if(new.order_status is null,old.order_status,new.order_status), if(new.user_id is null,old.user_id,new.user_id), if(new.out_trade_no is null,old.out_trade_no,new.out_trade_no), if(new.tms['1001'] is null,old.create_time,new.tms['1001']), if(new.tms['1002'] is null,old.payment_time,new.tms['1002']), if(new.tms['1003'] is null,old.cancel_time,new.tms['1003']), if(new.tms['1004'] is null,old.finish_time,new.tms['1004']), if(new.tms['1005'] is null,old.refund_time,new.tms['1005']), if(new.tms['1006'] is null,old.refund_finish_time,new.tms['1006']), if(new.province_id is null,old.province_id,new.province_id), if(new.activity_id is null,old.activity_id,new.activity_id), if(new.original_total_amount is null,old.original_total_amount,new.original_total_amount), if(new.benefit_reduce_amount is null,old.benefit_reduce_amount,new.benefit_reduce_amount), if(new.feight_fee is null,old.feight_fee,new.feight_fee), if(new.final_total_amount is null,old.final_total_amount,new.final_total_amount), date_format(if(new.tms['1001'] is null,old.create_time,new.tms['1001']),'yyyy-MM-dd') from ( select id, order_status, user_id, out_trade_no, create_time, payment_time, cancel_time, finish_time, refund_time, refund_finish_time, province_id, activity_id, original_total_amount, benefit_reduce_amount, feight_fee, final_total_amount from ${APP}.dwd_fact_order_info where dt in( select date_format(create_time,'yyyy-MM-dd') from ${APP}.ods_order_info where dt='$do_date' ) )old full outer join ( select info.id, info.order_status, info.user_id, info.out_trade_no, info.province_id, act.activity_id, log.tms, info.original_total_amount, info.benefit_reduce_amount, info.feight_fee, info.final_total_amount from ( select order_id, str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',',' =') tms from ${APP}.ods_order_status_log where dt='$do_date' group by order_id )log join ( select * from ${APP}.ods_order_info where dt='$do_date' )info on log.order_id=info.id left join ( select * from ${APP}.ods_activity_order where dt='$do_date' )act on log.order_id=act.order_id )new on old.id=new.id; insert overwrite table ${APP}.dwd_dim_user_info_his_tmp select * from ( select id, name, birthday, gender, email, user_level, create_time, operate_time, '$do_date' start_date, '9999-99-99' end_date from ${APP}.ods_user_info where dt='$do_date' union all select uh.id, uh.name, uh.birthday, uh.gender, uh.email, uh.user_level, uh.create_time, uh.operate_time, uh.start_date, if(ui.id is not null and uh.end_date='9999-99-99', date_add(ui.dt,-1), uh.end_date) end_date from ${APP}.dwd_dim_user_info_his uh left join ( select * from ${APP}.ods_user_info where dt='$do_date' ) ui on uh.id=ui.id )his order by his.id, start_date;
insert overwrite table ${APP}.dwd_dim_user_info_his select * from ${APP}.dwd_dim_user_info_his_tmp;
"
sql2="
insert overwrite table ${APP}.dwd_dim_base_province select bp.id, bp.name, bp.area_code, bp.iso_code, bp.region_id, br.region_name from ${APP}.ods_base_province bp join ${APP}.ods_base_region br on bp.region_id=br.id;
"
case $1 in
"first"){ $hive -e "$sql1" $hive -e "$sql2"
};;
"all"){ $hive -e "$sql1"
};;
esac
17.数仓搭建-DWS 层数据导入脚本
#!/bin/bash APP=gmall
hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then do_date=$1
elsedo_date=`date -d "-1 day" +%F`
fisql="
insert overwrite table ${APP}.dws_uv_detail_daycount partition(dt='$do_date')
select mid_id, concat_ws('|', collect_set(user_id)) user_id, concat_ws('|', collect_set(version_code)) version_code, concat_ws('|', collect_set(version_name)) version_name, concat_ws('|', collect_set(lang))lang, concat_ws('|', collect_set(source)) source, concat_ws('|', collect_set(os)) os, concat_ws('|', collect_set(area)) area, concat_ws('|', collect_set(model)) model, concat_ws('|', collect_set(brand)) brand, concat_ws('|', collect_set(sdk_version)) sdk_version, concat_ws('|', collect_set(gmail)) gmail, concat_ws('|', collect_set(height_width)) height_width, concat_ws('|', collect_set(app_time)) app_time, concat_ws('|', collect_set(network)) network, concat_ws('|', collect_set(lng)) lng, concat_ws('|', collect_set(lat)) lat, count(*) login_count
from ${APP}.dwd_start_log
where dt='$do_date'
group by mid_id; with
tmp_login as
( select user_id, count(*) login_count from ${APP}.dwd_start_log where dt='$do_date' and user_id is not null group by user_id
),
tmp_cart as
( select user_id, count(*) cart_count, sum(cart_price*sku_num) cart_amount from ${APP}.dwd_fact_cart_info where dt='$do_date' and user_id is not null and date_format(create_time,'yyyy-MM-dd')='$do_date' group by user_id
),
tmp_order as
( select user_id, count(*) order_count, sum(final_total_amount) order_amount from ${APP}.dwd_fact_order_info where dt='$do_date' group by user_id
) ,
tmp_payment as
( select user_id, count(*) payment_count, sum(payment_amount) payment_amount from ${APP}.dwd_fact_payment_info where dt='$do_date' group by user_id
)insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
select user_actions.user_id, sum(user_actions.login_count), sum(user_actions.cart_count), sum(user_actions.cart_amount), sum(user_actions.order_count), sum(user_actions.order_amount), sum(user_actions.payment_count), sum(user_actions.payment_amount)
from
( select user_id, login_count, 0 cart_count, 0 cart_amount, 0 order_count, 0 order_amount, 0 payment_count, 0 payment_amount from tmp_login union all select user_id, 0 login_count, cart_count, cart_amount, 0 order_count, 0 order_amount, 0 payment_count, 0 payment_amount from tmp_cart union all select user_id, 0 login_count, 0 cart_count, 0 cart_amount, order_count, order_amount, 0 payment_count, 0 payment_amount from tmp_order union all select user_id, 0 login_count, 0 cart_count, 0 cart_amount, 0 order_count, 0 order_amount, payment_count, payment_amount from tmp_payment
) user_actions
group by user_id; with
tmp_order as
( select sku_id, count(*) order_count, sum(sku_num) order_num, sum(total_amount) order_amount from ${APP}.dwd_fact_order_detail where dt='$do_date' group by sku_id
),
tmp_payment as
( select sku_id, count(*) payment_count, sum(sku_num) payment_num, sum(total_amount) payment_amount from ${APP}.dwd_fact_order_detail where dt='$do_date' and order_id in ( select id from ${APP}.dwd_fact_order_info where (dt='$do_date' or dt=date_add('$do_date',-1)) and date_format(payment_time,'yyyy-MM-dd')='$do_date' )group by sku_id
),
tmp_refund as
( select sku_id, count(*) refund_count, sum(refund_num) refund_num, sum(refund_amount) refund_amount from ${APP}.dwd_fact_order_refund_info where dt='$do_date' group by sku_id
),
tmp_cart as
( select sku_id, count(*) cart_count, sum(sku_num) cart_num from ${APP}.dwd_fact_cart_info where dt='$do_date' and date_format(create_time,'yyyy-MM-dd')='$do_date' group by sku_id
),
tmp_favor as
( select sku_id, count(*) favor_count from ${APP}.dwd_fact_favor_info where dt='$do_date' and date_format(create_time,'yyyy-MM-dd')='$do_date' group by sku_id
),
tmp_appraise as
( select sku_id, sum(if(appraise='1201',1,0)) appraise_good_count, sum(if(appraise='1202',1,0)) appraise_mid_count, sum(if(appraise='1203',1,0)) appraise_bad_count, sum(if(appraise='1204',1,0)) appraise_default_count from ${APP}.dwd_fact_comment_info where dt='$do_date' group by sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
select sku_id, sum(order_count), sum(order_num), sum(order_amount), sum(payment_count), sum(payment_num), sum(payment_amount), sum(refund_count), sum(refund_num), sum(refund_amount), sum(cart_count), sum(cart_num), sum(favor_count), sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count)
from
( select sku_id, order_count, order_num, order_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_count, 0 refund_num, 0 refund_amount, 0 cart_count, 0 cart_num, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_order union all select sku_id, 0 order_count, 0 order_num, 0 order_amount, payment_count, payment_num, payment_amount, 0 refund_count, 0 refund_num, 0 refund_amount, 0 cart_count, 0 cart_num, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_payment union all select sku_id, 0 order_count, 0 order_num, 0 order_amount, 0 payment_count, 0 payment_num, 0 payment_amount, refund_count, refund_num, refund_amount, 0 cart_count, 0 cart_num, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_refund union all select sku_id, 0 order_count, 0 order_num, 0 order_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_count, 0 refund_num, 0 refund_amount, cart_count, cart_num, 0 favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_cart union all select sku_id, 0 order_count, 0 order_num, 0 order_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_count, 0 refund_num, 0 refund_amount, 0 cart_count, 0 cart_num, favor_count, 0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count from tmp_favor union all select sku_id, 0 order_count, 0 order_num, 0 order_amount, 0 payment_count, 0 payment_num, 0 payment_amount, 0 refund_count, 0 refund_num, 0 refund_amount, 0 cart_count, 0 cart_num, 0 favor_count, appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count from tmp_appraise
)tmp
group by sku_id; insert overwrite table ${APP}.dws_coupon_use_daycount partition(dt='$do_date')
select cu.coupon_id, ci.coupon_name, ci.coupon_type, ci.condition_amount, ci.condition_num, ci.activity_id, ci.benefit_amount, ci.benefit_discount, ci.create_time, ci.range_type, ci.spu_id, ci.tm_id, ci.category3_id, ci.limit_num, cu.get_count, cu.using_count, cu.used_count
from
( select coupon_id, sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) get_count, sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) using_count, sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) used_count from ${APP}.dwd_fact_coupon_use where dt='$do_date' group by coupon_id
)cu
left join
( select * from ${APP}.dwd_dim_coupon_info where dt='$do_date'
)ci on cu.coupon_id=ci.id; insert overwrite table ${APP}.dws_activity_info_daycount partition(dt='$do_date')
select oi.activity_id, ai.activity_name, ai.activity_type, ai.start_time, ai.end_time, ai.create_time, oi.order_count, oi.payment_count from
( select activity_id, sum(if(date_format(create_time,'yyyy-MM-dd')='$do_date',1,0)) order_count, sum(if(date_format(payment_time,'yyyy-MM-dd')='$do_date',1,0)) payment_count from ${APP}.dwd_fact_order_info where (dt='$do_date' or dt=date_add('$do_date',-1)) and activity_id is not null group by activity_id
)oi
join
( select * from ${APP}.dwd_dim_activity_info where dt='$do_date'
)ai
on oi.activity_id=ai.id; insert overwrite table ${APP}.dws_sale_detail_daycount partition(dt='$do_date')
select op.user_id, op.sku_id, ui.gender, months_between('$do_date', ui.birthday)/12 age, ui.user_level, si.price, si.sku_name, si.tm_id, si.category3_id, si.category2_id, si.category1_id, si.category3_name, si.category2_name, si.category1_name, si.spu_id, op.sku_num, op.order_count, op.order_amount
from
( select user_id, sku_id, sum(sku_num) sku_num, count(*) order_count, sum(total_amount) order_amount from ${APP}.dwd_fact_order_detail where dt='$do_date' group by user_id, sku_id
)op
join
( select * from ${APP}.dwd_dim_user_info_his where end_date='9999-99-99'
)ui on op.user_id = ui.id
join
( select * from ${APP}.dwd_dim_sku_infowhere dt='$do_date'
)si on op.sku_id = si.id;
"
$hive -e "$sql"
18.数仓搭建-DWT 层数据导入脚本
#!/bin/bash APP=gmall
hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then do_date=$1
elsedo_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table ${APP}.dwt_uv_topic
select nvl(new.mid_id,old.mid_id), nvl(new.user_id,old.user_id), nvl(new.version_code,old.version_code), nvl(new.version_name,old.version_name), nvl(new.lang,old.lang), nvl(new.source,old.source), nvl(new.os,old.os), nvl(new.area,old.area), nvl(new.model,old.model), nvl(new.brand,old.brand), nvl(new.sdk_version,old.sdk_version), nvl(new.gmail,old.gmail), nvl(new.height_width,old.height_width), nvl(new.app_time,old.app_time), nvl(new.network,old.network), nvl(new.lng,old.lng), nvl(new.lat,old.lat), nvl(old.login_date_first,'$do_date'), if(new.login_count>0,'$do_date',old.login_date_last), nvl(new.login_count,0), nvl(new.login_count,0)+nvl(old.login_count,0)
from
( select * from ${APP}.dwt_uv_topic
)old
full outer join
( select * from ${APP}.dws_uv_detail_daycount where dt='$do_date'
)new
on old.mid_id=new.mid_id; insert overwrite table ${APP}.dwt_user_topic
select nvl(new.user_id,old.user_id), if(old.login_date_first is null and new.login_count>0,'$do_date',old.login_date_first), if(new.login_count>0,'$do_date',old.login_date_last), nvl(old.login_count,0)+if(new.login_count>0,1,0), nvl(new.login_last_30d_count,0), if(old.order_date_first is null and new.order_count>0,'$do_date',old.order_date_first), if(new.order_count>0,'$do_date',old.order_date_last), nvl(old.order_count,0)+nvl(new.order_count,0), nvl(old.order_amount,0)+nvl(new.order_amount,0), nvl(new.order_last_30d_count,0), nvl(new.order_last_30d_amount,0), if(old.payment_date_first is null and new.payment_count>0,'$do_date',old.payment_date_first), if(new.payment_count>0,'$do_date',old.payment_date_last), nvl(old.payment_count,0)+nvl(new.payment_count,0), nvl(old.payment_amount,0)+nvl(new.payment_amount,0), nvl(new.payment_last_30d_count,0), nvl(new.payment_last_30d_amount,0)
from
( select * from ${APP}.dwt_user_topic
)old
full outer join
( select user_id, sum(if(dt='$do_date',login_count,0)) login_count, sum(if(dt='$do_date',order_count,0)) order_count, sum(if(dt='$do_date',order_amount,0)) order_amount, sum(if(dt='$do_date',payment_count,0)) payment_count, sum(if(dt='$do_date',payment_amount,0)) payment_amount, sum(if(order_count>0,1,0)) login_last_30d_count, sum(order_count) order_last_30d_count, sum(order_amount) order_last_30d_amount, sum(payment_count) payment_last_30d_count, sum(payment_amount) payment_last_30d_amount from ${APP}.dws_user_action_daycount where dt>=date_add( '$do_date',-30) group by user_id
)new
on old.user_id=new.user_id; with
sku_act as
(
select sku_id, sum(if(dt='$do_date', order_count,0 )) order_count, sum(if(dt='$do_date',order_num ,0 )) order_num, sum(if(dt='$do_date',order_amount,0 )) order_amount , sum(if(dt='$do_date',payment_count,0 )) payment_count, sum(if(dt='$do_date',payment_num,0 )) payment_num, sum(if(dt='$do_date',payment_amount,0 )) payment_amount, sum(if(dt='$do_date',refund_count,0 )) refund_count, sum(if(dt='$do_date',refund_num,0 )) refund_num, sum(if(dt='$do_date',refund_amount,0 )) refund_amount, sum(if(dt='$do_date',cart_count,0 )) cart_count, sum(if(dt='$do_date',cart_num,0 )) cart_num, sum(if(dt='$do_date',favor_count,0 )) favor_count, sum(if(dt='$do_date',appraise_good_count,0 )) appraise_good_count, sum(if(dt='$do_date',appraise_mid_count,0 ) ) appraise_mid_count , sum(if(dt='$do_date',appraise_bad_count,0 )) appraise_bad_count, sum(if(dt='$do_date',appraise_default_count,0 )) appraise_default_count,sum( order_count ) order_count30 , sum( order_num ) order_num30, sum(order_amount ) order_amount30, sum(payment_count ) payment_count30, sum(payment_num ) payment_num30, sum(payment_amount ) payment_amount30, sum(refund_count ) refund_count30, sum(refund_num ) refund_num30, sum(refund_amount ) refund_amount30, sum(cart_count ) cart_count30, sum(cart_num ) cart_num30, sum(favor_count ) favor_count30, sum(appraise_good_count ) appraise_good_count30, sum(appraise_mid_count ) appraise_mid_count30, sum(appraise_bad_count ) appraise_bad_count30, sum(appraise_default_count ) appraise_default_count30
from ${APP}.dws_sku_action_daycount
where dt>=date_add ( '$do_date',-30)
group by sku_id
),
sku_topic
as
(
select sku_id, spu_id, order_last_30d_count, order_last_30d_num, order_last_30d_amount, order_count, order_num, order_amount , payment_last_30d_count, payment_last_30d_num, payment_last_30d_amount,payment_count, payment_num, payment_amount, refund_last_30d_count, refund_last_30d_num, refund_last_30d_amount, refund_count, refund_num, refund_amount, cart_last_30d_count, cart_last_30d_num, cart_count, cart_num , favor_last_30d_count , favor_count , appraise_last_30d_good_count , appraise_last_30d_mid_count , appraise_last_30d_bad_count , appraise_last_30d_default_count , appraise_good_count , appraise_mid_count , appraise_bad_count , appraise_default_count
from ${APP}.dwt_sku_topic
)
insert overwrite table ${APP}.dwt_sku_topic
select nvl(sku_act.sku_id,sku_topic.sku_id) , sku_info.spu_id, nvl (sku_act.order_count30,0) , nvl (sku_act.order_num30,0) , nvl (sku_act.order_amount30,0) , nvl(sku_topic.order_count,0)+ nvl (sku_act.order_count,0) , nvl(sku_topic.order_num,0)+ nvl (sku_act.order_num,0) , nvl(sku_topic.order_amount,0)+ nvl (sku_act.order_amount,0), nvl (sku_act.payment_count30,0), nvl (sku_act.payment_num30,0), nvl (sku_act.payment_amount30,0), nvl(sku_topic.payment_count,0)+ nvl (sku_act.payment_count,0) , nvl(sku_topic.payment_num,0)+ nvl (sku_act.payment_count,0) , nvl(sku_topic.payment_amount,0)+ nvl (sku_act.payment_count,0) , nvl (refund_count30,0), nvl (sku_act.refund_num30,0), nvl (sku_act.refund_amount30,0), nvl(sku_topic.refund_count,0)+ nvl (sku_act.refund_count,0), nvl(sku_topic.refund_num,0)+ nvl (sku_act.refund_num,0), nvl(sku_topic.refund_amount,0)+ nvl (sku_act.refund_amount,0), nvl(sku_act.cart_count30,0) , nvl(sku_act.cart_num30,0) , nvl(sku_topic.cart_count ,0)+ nvl (sku_act.cart_count,0), nvl( sku_topic.cart_num ,0)+ nvl (sku_act.cart_num,0), nvl(sku_act.favor_count30 ,0) , nvl (sku_topic.favor_count ,0)+ nvl (sku_act.favor_count,0), nvl (sku_act.appraise_good_count30 ,0) , nvl (sku_act.appraise_mid_count30 ,0) , nvl (sku_act.appraise_bad_count30 ,0) , nvl (sku_act.appraise_default_count30 ,0) , nvl (sku_topic.appraise_good_count ,0)+ nvl (sku_act.appraise_good_count,0) , nvl (sku_topic.appraise_mid_count ,0)+ nvl (sku_act.appraise_mid_count,0) , nvl (sku_topic.appraise_bad_count ,0)+ nvl (sku_act.appraise_bad_count,0) , nvl (sku_topic.appraise_default_count ,0)+ nvl (sku_act.appraise_default_count,0)
from sku_act
full outer join sku_topic
on sku_act.sku_id =sku_topic.sku_id
left join
(select * from ${APP}.dwd_dim_sku_info where dt='$do_date') sku_info
on nvl(sku_topic.sku_id,sku_act.sku_id)= sku_info.id; insert overwrite table ${APP}.dwt_coupon_topic
select nvl(new.coupon_id,old.coupon_id), nvl(new.get_count,0), nvl(new.using_count,0), nvl(new.used_count,0), nvl(old.get_count,0)+nvl(new.get_count,0), nvl(old.using_count,0)+nvl(new.using_count,0), nvl(old.used_count,0)+nvl(new.used_count,0)
from
( select * from ${APP}.dwt_coupon_topic
)old
full outer join
( select coupon_id, get_count, using_count, used_count from ${APP}.dws_coupon_use_daycount where dt='$do_date'
)new
on old.coupon_id=new.coupon_id; insert overwrite table ${APP}.dwt_activity_topic
select nvl(new.id,old.id), nvl(new.activity_name,old.activity_name), nvl(new.order_count,0), nvl(new.payment_count,0), nvl(old.order_count,0)+nvl(new.order_count,0), nvl(old.payment_count,0)+nvl(new.payment_count,0)
from
( select * from ${APP}.dwt_activity_topic
)old
full outer join
( select id, activity_name, order_count, payment_count from ${APP}.dws_activity_info_daycount where dt='$do_date'
)new
on old.id=new.id;
"
$hive -e "$sql"
19.数仓搭建-ADS 层会员主题信息导入脚本
#!/bin/bash APP=gmall
hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then do_date=$1
elsedo_date=`date -d "-1 day" +%F`
fisql="
with
tmp_day_users as
(
select '$do_date' dt, count(*) day_users
from${APP}.dwt_user_topic
wherelogin_date_last='$do_date'
),
tmp_day_new_users as
(
select '$do_date' dt, count(*) day_new_users
from${APP}.dwt_user_topic
wherelogin_date_last='$do_date' and login_date_first='$do_date'
),
tmp_day_new_payment_users as
(
select '$do_date' dt, count(*) day_new_payment_users
from${APP}.dwt_user_topic
wherepayment_date_first='$do_date'
),
tmp_payment_users as
(
select '$do_date' dt, count(*) payment_users
from${APP}.dwt_user_topic
where payment_date_first is not null
),
tmp_users as
(
select '$do_date' dt, count(*) users
from${APP}.dwt_user_topic
tmp_users
)insert into table ${APP}.ads_user_topic
select '$do_date' dt, day_users, day_new_users, day_new_payment_users, payment_users, users, day_users/users, payment_users/users, day_new_users/users
fromtmp_day_users
jointmp_day_new_users
on tmp_day_users.dt=tmp_day_new_users.dt
jointmp_day_new_payment_users
on tmp_day_users.dt=tmp_day_new_payment_users.dt
jointmp_payment_users
on tmp_day_users.dt=tmp_payment_users.dt
jointmp_users
on
tmp_day_users.dt=tmp_users.dt;
"
$hive -e "$sql"
20.数仓搭建-ADS 层导入脚本
#!/bin/bash hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天
if [ -n "$1" ] ;then do_date=$1
elsedo_date=`date -d "-1 day" +%F`
fi
sql="use gmall;
insert into table ads_uv_count
select '$do_date', sum(if(login_date_last='$do_date',1,0)), sum(if(login_date_last>=date_add(next_day('$do_date','monday'),-7) and login_date_last<=date_add(next_day('$do_date','monday'),-1) ,1,0)), sum(if(date_format(login_date_last,'yyyy-MM')=date_format('$do_date','yyyy-M M'),1,0)), if('$do_date'=date_add(next_day('$do_date','monday'),-1),'Y','N'), if('$do_date'=last_day('$do_date'),'Y','N')
from dwt_uv_topic; insert into table ads_new_mid_count
select '$do_date', count(*)
from dwt_uv_topic
where login_date_first='$do_date';
insert into table ads_silent_count
select '$do_date', count(*)
from dwt_uv_topic
where login_date_first=login_date_last and login_date_last<=date_add('$do_date',-7); insert into table ads_back_count
select '$do_date', concat(date_add(next_day('2020-03-10','MO'),-7),'_',date_add(next_day('2020- 03-10','MO'),-1)), count(*)
from
( select mid_id from dwt_uv_topic where login_date_last>=date_add(next_day('$do_date','MO'),-7) and login_date_last<= date_add(next_day('$do_date','MO'),-1) and login_date_first$do_date ','MO'),-7) )current_wk
left join
( select mid_id from dws_uv_detail_daycount where dt>=date_add(next_day('$do_date','MO'),-7*2) and dt<= date_add(next_day('$do_date','MO'),-7-1) group by mid_id
)last_wk
on current_wk.mid_id=last_wk.mid_id
where last_wk.mid_id is null; insert into table ads_wastage_count
select '$do_date', count(*)
from dwt_uv_topic
where login_date_last<=date_add('$do_date',-7); insert into table ads_user_retention_day_rate
select '$do_date', date_add('$do_date',-3), 3,sum(if(login_date_first=date_add('$do_date',-3) and login_date_last='$do_date',1,0)), sum(if(login_date_first=date_add('$do_date',-3),1,0)), sum(if(login_date_first=date_add('$do_date',-3) and login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date', -3),1,0))*100
from dwt_uv_topic
union all
select '$do_date', date_add('$do_date',-2), 2,sum(if(login_date_first=date_add('$do_date',-2) and login_date_last='$do_date',1,0)), sum(if(login_date_first=date_add('$do_date',-2),1,0)), sum(if(login_date_first=date_add('$do_date',-2) and login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date', -2),1,0))*100
from dwt_uv_topic
union all
select '$do_date', date_add('$do_date',-1), 1,sum(if(login_date_first=date_add('$do_date',-1) and login_date_last='$do_date',1,0)), sum(if(login_date_first=date_add('$do_date',-1),1,0)), sum(if(login_date_first=date_add('$do_date',-1) and login_date_last='$do_date',1,0))/sum(if(login_date_first=date_add('$do_date', -1),1,0))*100
from dwt_uv_topic; insert into table ads_continuity_wk_count
select '$do_date', concat(date_add(next_day('$do_date','MO'),-7*3),'_',date_add(next_day('$do_d ate','MO'),-1)), count(*)
from
( select mid_id from ( select mid_id from dws_uv_detail_daycount where dt>=date_add(next_day('$do_date','monday'),-7) and dt<=date_add(next_day('$do_date','monday'),-1) group by mid_id union all select mid_id from dws_uv_detail_daycount where dt>=date_add(next_day('$do_date','monday'),-7*2) and dt<=date_add(next_day('$do_date','monday'),-7-1) group by mid_id union all select mid_id from dws_uv_detail_daycount where dt>=date_add(next_day('$do_date','monday'),-7*3) and dt<=date_add(next_day('$do_date','monday'),-7*2-1) group by mid_id )t1 group by mid_id having count(*)=3
)t2; insert into table ads_continuity_uv_count
select '$do_date', concat(date_add('$do_date',-6),'_','$do_date'), count(*)
from
( select mid_id from ( select mid_id from ( select mid_id, date_sub(dt,rank) date_dif from ( select mid_id, dt, rank() over(partition by mid_id order by dt) rank from dws_uv_detail_daycount where dt>=date_add('$do_date',-6) and dt<='$do_date' )t1 )t2 group by mid_id,date_dif having count(*)>=3 )t3 group by mid_id
)t4; insert into table ads_user_topic
select '$do_date', sum(if(login_date_last='$do_date',1,0)), sum(if(login_date_first='$do_date',1,0)), sum(if(payment_date_first='$do_date',1,0)), sum(if(payment_count>0,1,0)), count(*), sum(if(login_date_last='$do_date',1,0))/count(*), sum(if(payment_count>0,1,0))/count(*), sum(if(login_date_first='$do_date',1,0))/sum(if(login_date_last='$do_date',1, 0))
from dwt_user_topic; insert into table ads_user_action_convert_day select '$do_date', uv.day_count, ua.cart_count, ua.cart_count/uv.day_count*100 visitor2cart_convert_ratio, ua.order_count, ua.order_count/ua.cart_count*100 visitor2order_convert_ratio, ua.payment_count, ua.payment_count/ua.order_count*100 order2payment_convert_ratio
from
( select '$do_date' dt, sum(if(cart_count>0,1,0)) cart_count, sum(if(order_count>0,1,0)) order_count, sum(if(payment_count>0,1,0)) payment_count from dws_user_action_daycount where dt='$do_date'
)ua join ads_uv_count uv on uv.dt=ua.dt; insert into table ads_product_info
select '$do_date' dt, sku_num, spu_num
from
( select '$do_date' dt, count(*) sku_num fromdwt_sku_topic
) tmp_sku_num
join
( select '$do_date' dt, count(*) spu_num from ( select spu_id fromdwt_sku_topic group by spu_id ) tmp_spu_id
) tmp_spu_num
on tmp_sku_num.dt=tmp_spu_num.dt; insert into table ads_product_sale_topN
select '$do_date', sku_id, payment_amount
from dws_sku_action_daycount
where dt='$do_date'
order by payment_amount desc
limit 10; insert into table ads_product_favor_topN
select '$do_date', sku_id, favor_count
from dws_sku_action_daycount
where dt='$do_date'
order by favor_count
limit 10; insert into table ads_product_cart_topN
select '$do_date' dt, sku_id, cart_num
from dws_sku_action_daycount
where dt='$do_date'
order by cart_num
limit 10; insert into table ads_product_refund_topN
select '$do_date', sku_id, refund_last_30d_count/payment_last_30d_count*100 refund_ratio
from dwt_sku_topic
order by refund_ratio desc
limit 10; insert into table ads_appraise_bad_topN
select '$do_date' dt, sku_id,
appraise_bad_count/(appraise_bad_count+appraise_good_count+appraise_mid_count+appraise_default_count)*100 appraise_bad_ratio
from dws_sku_action_daycount
where dt='$do_date'
order by appraise_bad_ratio desc
limit 10; insert into table ads_order_daycount
select '$do_date', sum(order_count), sum(order_amount), sum(if(order_count>0,1,0))
from dws_user_action_daycount
where dt='$do_date'; insert into table ads_payment_daycount
select tmp_payment.dt, tmp_payment.payment_count, tmp_payment.payment_amount, tmp_payment.payment_user_count, tmp_skucount.payment_sku_count, tmp_time.payment_avg_time
from
( select '$do_date' dt, sum(payment_count) payment_count, sum(payment_amount) payment_amount, sum(if(payment_count>0,1,0)) payment_user_count from dws_user_action_daycount where dt='$do_date'
)tmp_payment
join
( select '$do_date' dt, sum(if(payment_count>0,1,0)) payment_sku_count from dws_sku_action_daycount where dt='$do_date'
)tmp_skucount on tmp_payment.dt=tmp_skucount.dt
join
( select '$do_date' dt, sum(unix_timestamp(payment_time)-unix_timestamp(create_time))/count(*)/60 payment_avg_time from dwd_fact_order_info where dt='$do_date' and payment_time is not null
)tmp_time on tmp_payment.dt=tmp_time.dt; insert into table ads_sale_tm_category1_stat_mn
select mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name, sum(if(mn.order_count>=1,1,0)) buycount, sum(if(mn.order_count>=2,1,0)) buyTwiceLast, sum(if(mn.order_count>=2,1,0))/sum( if(mn.order_count>=1,1,0)) buyTwiceLastRatio, sum(if(mn.order_count>=3,1,0)) buy3timeLast , sum(if(mn.order_count>=3,1,0))/sum( if(mn.order_count>=1,1,0)) buy3timeLastRatio , date_format('$do_date' ,'yyyy-MM') stat_mn, '$do_date' stat_date
from
( select user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name, sum(order_count) order_count from dws_sale_detail_daycount sd where date_format(dt,'yyyy-MM')=date_format('$do_date' ,'yyyy-MM') group by user_id, sd.sku_tm_id, sd.sku_category1_id, sd.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
"
$hive -e "$sql"
21.hdfs_to_mysql传输脚本
#!/bin/bash hive_db_name=gmall
mysql_db_name=gmall_report export_data() {
/opt/module/sqoop/bin/sqoop export --connect "jdbc:mysql://hadoop102:3306/${mysql_db_name}?useUnicode=true& characterEncoding=utf-8" --username root --password 123456 --table $1 --num-mappers 1 --export-dir /warehouse/$hive_db_name/ads/$1 --input-fields-terminated-by "\t" --update-mode allowinsert --update-key $2 --input-null-string '\\N' --input-null-non-string '\\N'
}case $1 in "ads_uv_count") export_data "ads_uv_count" "dt"
;;"ads_user_action_convert_day") export_data "ads_user_action_convert_day" "dt"
;;"ads_user_topic") export_data "ads_user_topic" "dt"
;; "all") export_data "ads_uv_count" "dt" export_data "ads_user_action_convert_day" "dt" export_data "ads_user_topic" "dt"
;;
esac
本文来自互联网用户投稿,文章观点仅代表作者本人,不代表本站立场,不承担相关法律责任。如若转载,请注明出处。 如若内容造成侵权/违法违规/事实不符,请点击【内容举报】进行投诉反馈!
