Setting up and running the Python Luigi framework
Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
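In short, Luigi is a Python framework for batch processing of complex jobs: it manages job dependencies and workflows, provides visualization and error reporting, and can be run from the command line.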
It includes support for running Python MapReduce jobs in Hadoop, as well as Hive and Pig jobs. It also comes with file system abstractions for HDFS and local files that ensure all file system operations are atomic.
It also supports scheduling Spark, Hive, and similar jobs.
1. Installing Luigi
Luigi can be installed with pip (installing pip itself is not covered here):
pip install luigi
2. Luigi command-line parameter reference
luigi AllTask --module all_task
e.g.: luigi LuigiJobTask --module Luigi_info_Job --date `date +"%Y-%m-%d"`
-- LuigiJobTask is the task class name, luigi_job_task is the module (file) name, and --date is the parameter passed to the task:
luigi LuigiJobTask --module luigi_job_task --date `date +"%Y-%m-%d"`
master = 'spark://127.0.0.1:7077'
deploy_mode = 'cluster'
driver_memory = '3g'
executor_memory = '3g'
executor_cores = 4
num_executors = 6
Highly recommended reference: https://marcobonzanini.com/2015/10/24/building-data-pipelines-with-python-and-luigi/
