Tuning XGBoost with XGBoost? Me Tuning Myself?
The previous post, 《深恶痛绝的超参》 ("Hyperparameters We Love to Hate"), already covered plenty of practical tuning techniques. Today we look at a more entertaining way to tune: using ML to tune an ML model, i.e. using a model we know well to tune a model we know well. Sounds circular? Read on and see how XGBoost can tune XGBoost.
Model-based HP Tuning
The idea behind model-based tuning is simple: we need something to guide the hyperparameter search toward the best result. Training sets are large these days, each training run is expensive, and the space of possible configurations is huge, so why not learn an estimator that scores a configuration directly? Every completed training run then gives us a hint about where to explore next.
Model-based hyperparameter optimization can be summarized as the following loop:
- Randomly sample n candidate configurations
- Score these configurations with the estimator
- Pick the configuration with the highest predicted score
- Train the model with that configuration
- Add the configuration and the model's actual performance to the estimator's training data
- Retrain the estimator
- Go back to the first step unless the stopping criterion has been met
Sampling the parameter space
How do we sample from the parameter space? There is already a library for that:
>>> import ConfigSpace as CS
>>> import ConfigSpace.hyperparameters as CSH
>>> cs = CS.ConfigurationSpace(seed=1234)
>>> a = CSH.UniformIntegerHyperparameter('a', lower=10, upper=100, log=False)
>>> b = CSH.CategoricalHyperparameter('b', choices=['red', 'green', 'blue'])
>>> cs.add_hyperparameters([a, b])
[a, Type: UniformInteger, Range: [10, 100], Default: 55,...]
>>> cs.sample_configuration()
Configuration:
  a, Value: 27
  b, Value: 'blue'

"Me" Tuning "Myself"
Gaussian processes were originally the estimator of choice for this kind of tuning, but recent work shows that tree models also make good estimators, and Gaussian processes do not handle categorical features natively, so XGBoost is a natural fit for the estimator.
Next, we build the hyperparameter optimizer:
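For instance, here is a minimal sketch of how an XGBoost regressor could serve as that estimator. The class name XGBConfigEstimator is purely illustrative (not from any library), and it assumes a reasonably recent XGBoost where the scikit-learn wrapper supports enable_categorical together with tree_method='hist'; the only job of the wrapper is to cast the categorical columns coming from ConfigSpace into pandas 'category' dtype so the trees can split on them directly.

import pandas as pd
from xgboost import XGBRegressor

class XGBConfigEstimator:
    """Illustrative wrapper: an XGBoost regressor that consumes the
    configuration DataFrame used by the optimizer below."""

    def __init__(self):
        # enable_categorical needs a recent XGBoost and the 'hist' tree method
        self.model = XGBRegressor(
            n_estimators=50,
            tree_method="hist",
            enable_categorical=True,
        )
        self.categories_ = {}

    def fit(self, dtf, y):
        dtf = dtf.copy()
        # ConfigSpace categorical hyperparameters arrive as object columns;
        # cast them to 'category' and remember the categories seen at fit time
        for col in dtf.select_dtypes(include="object").columns:
            dtf[col] = dtf[col].astype("category")
            self.categories_[col] = dtf[col].cat.categories
        self.model.fit(dtf, y)
        return self

    def predict(self, dtf):
        dtf = dtf.copy()
        # reuse the fit-time categories so the encoding stays consistent
        for col, cats in self.categories_.items():
            dtf[col] = pd.Categorical(dtf[col], categories=cats)
        return self.model.predict(dtf)

An alternative, if your XGBoost version lacks categorical support, is to one-hot encode the categorical columns before fitting; the wrapper shape stays the same.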
import pandas as pd
import numpy as np
class Optimizer:
    """
    This class optimises an algorithm/model configuration with respect to a given score.
    """

    def __init__(self, algo_score, max_iter, max_intensification, model, cs):
        """
        :param algo_score: the function called to evaluate the algorithm/model score
        :param max_iter: the maximal number of trainings to perform
        :param max_intensification: the maximal number of candidate configurations to sample randomly
        :param model: the class of the internal model used as score estimator
        :param cs: the configuration space to explore
        """
        self.algo_score = algo_score                    # scoring function for the model being tuned
        self.max_iter = max_iter                        # iteration budget; change the stopping criterion as needed
        self.max_intensification = max_intensification  # number of candidate configurations sampled per iteration
        self.internal_model = model()                   # estimator used to score configurations
        self.trajectory = []                            # configuration chosen at each optimization step
        self.cfgs = []
        self.scores = {}
        self.best_cfg = None
        self.best_score = None
        self.cs = cs

    def cfg_to_dtf(self, cfgs):
        """
        Convert a list of configs into a pandas DataFrame to ease learning
        """
        cfgs = [dict(cfg) for cfg in cfgs]
        dtf = pd.DataFrame(cfgs)
        return dtf

    def optimize(self):
        """
        Optimize the algo/model using the internal score estimator
        """
        # initial run
        cfg = self.cs.sample_configuration()
        self.cfgs.append(cfg)
        self.trajectory.append(cfg)
        score = self.algo_score(cfg)
        self.scores[cfg] = score
        self.best_cfg = cfg
        self.best_score = score
        dtf = self.cfg_to_dtf(self.cfgs)

        for i in range(0, self.max_iter):
            # We need at least two datapoints before the estimator can be trained
            if dtf.shape[0] > 1:
                scores = np.array([val for key, val in self.scores.items()])
                self.internal_model.fit(dtf, scores)
                # intensification: sample candidates and keep the one the estimator scores highest
                candidates = [self.cs.sample_configuration() for _ in range(0, self.max_intensification)]
                candidate_scores = [self.internal_model.predict(self.cfg_to_dtf([cfg])) for cfg in candidates]
                best_candidate = np.argmax(candidate_scores)
                cfg = candidates[best_candidate]
                self.cfgs.append(cfg)
                score = self.algo_score(cfg)
                self.scores[cfg] = score
                if score > self.best_score:
                    self.best_cfg = cfg
                    self.best_score = score
                self.trajectory.append(cfg)
                dtf = self.cfg_to_dtf(self.cfgs)
                self.internal_model.fit(dtf, np.array([val for key, val in self.scores.items()]))
            else:
                cfg = self.cs.sample_configuration()
                self.cfgs.append(cfg)
                score = self.algo_score(cfg)
                self.scores[cfg] = score
                if score > self.best_score:
                    self.best_cfg = cfg
                    self.best_score = score
                self.trajectory.append(cfg)
                dtf = self.cfg_to_dtf(self.cfgs)

Plug the XGBoost you want to tune into algo_score, use another XGBoost as the internal_model doing the tuning, and the parameter search runs by itself. What are you waiting for? Go give it a try!
