通过hive元数据查询hive库和表的总条数
1、传统方式获得总条数例如以下:
select count(*) from ods.tracklog;
2、通过元数据获取方式如下:
A、查指定库
不含分区select FORMAT(sum(tb.PARAM_VALUE),0) from TBLS t
left join DBS d
on t.DB_ID = d.DB_ID
left join TABLE_PARAMS tb
on t.TBL_ID = tb.TBL_ID
where d.NAME='hive_ods' and tb.PARAM_KEY='numRows'; 查询含有分区表的总条数
select FORMAT(sum(a.PARAM_VALUE),0)
from TBLS t
left join DBS d
on t.DB_ID = d.DB_ID
left join PARTITIONS p
on t.TBL_ID = p.TBL_ID
left join PARTITION_PARAMS a
on p.PART_ID=a.PART_ID
where d.NAME='hive_ods_partition' and a.PARAM_KEY='numRows';
B、查指定表
不含分区
select d.NAME,t.TBL_NAME,t.TBL_ID,p.PART_ID,p.PART_NAME,a.PARAM_VALUE
from TBLS t
left join DBS d
on t.DB_ID = d.DB_ID
left join PARTITIONS p
on t.TBL_ID = p.TBL_ID
left join PARTITION_PARAMS a
on p.PART_ID=a.PART_ID
where t.TBL_NAME='tracklog' and d.NAME='ods' and a.PARAM_KEY='numRows';含分区
select FORMAT(sum(a.PARAM_VALUE),0)
from TBLS t
left join DBS d
on t.DB_ID = d.DB_ID
left join PARTITIONS p
on t.TBL_ID = p.TBL_ID
left join PARTITION_PARAMS a
on p.PART_ID=a.PART_ID
where t.TBL_NAME='tracklog' and d.NAME='ods' and a.PARAM_KEY='numRows';
3、通过元数据获取所有库的总条数和总大小
A、获取所有不分区库的总条数和总大小
SELECTa.TBL_ID,d.`NAME` dbName,a.TBL_NAME,b.PARAM_VALUE numRows,c.PARAM_VALUE totalSize
FROMTBLS AS aleft JOIN TABLE_PARAMS AS b on a.TBL_ID = b.TBL_IDleft JOIN TABLE_PARAMS AS con a.TBL_ID = c.TBL_IDleft JOIN DBS as don d.DB_ID = a.DB_ID
where b.PARAM_KEY = "numRows"and c.PARAM_KEY = "totalSize"
B、获取所有分区的总条数和总大小
SELECT main.*,b.numRows,b.totalSize FROM (SELECT main.TBL_ID AS metaObjectId,main.TBL_NAME AS tabName,b.PART_ID AS partId,b.PART_NAME AS partitionName,b.CREATE_TIME AS createTime,b.LAST_ACCESS_TIME AS updateTimeFROM TBLS mainINNER JOIN PARTITIONS b ON main.TBL_ID = b.TBL_IDWHERE b.PART_NAME IS NOT NULL
) main
LEFT JOIN
(
SELECT PART_ID AS partId,
MAX(CASE PARAM_KEY WHEN 'numRows' THEN PARAM_VALUE ELSE 0 END) AS numRows,
MAX(CASE PARAM_KEY WHEN 'totalSize' THEN PARAM_VALUE ELSE 0 END) AS totalSize
FROM PARTITION_PARAMS
GROUP BY PART_ID
) b ON main.partId=b.partId;
重点:对应hive分区的几个比较重要的元数据表
TBLS:hive所有表的基础信息,包括id,表名,等等
PARTITIONS:跟分区有关的表, 包括分区id,创建时间,分区表名,以及TBLS主表关联id等等
PARTITION_PARAMS:分区表比较重要的一些数据
主要字段:
numfiles:该分区下的文件数,
numRows:该分区下记录数,
rawDataSize是指原始数据的大小,
totalSize是指占用HDFS存储空间大小,
transient_lastDdlTime最后一次ddl时间
本文来自互联网用户投稿,文章观点仅代表作者本人,不代表本站立场,不承担相关法律责任。如若转载,请注明出处。 如若内容造成侵权/违法违规/事实不符,请点击【内容举报】进行投诉反馈!
