--Random Forest--
Random forest: a model built from many decision trees. Each tree works exactly as the decision tree described in the previous article; "random forest" simply ensembles them together, which makes it an "ensemble algorithm".
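To make the "ensemble of decision trees" idea concrete, here is a minimal sketch of what a random forest does internally: train several trees on bootstrap samples (and random feature subsets) and combine their predictions by majority vote. This is illustrative only; in practice sklearn's RandomForestClassifier handles all of this for you.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)

trees = []
for _ in range(10):
    # bootstrap sample: draw len(X) rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt": consider a random feature subset at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# majority vote across the 10 trees
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print((ensemble_pred == y).mean())  # training accuracy of the hand-rolled ensemble
```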
Random Forest API
Random forest API: class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, bootstrap=True, random_state=None)
- n_estimators: integer, optional (default=10), the number of trees in the forest
- criterion: string, optional (default='gini'), the function measuring split quality; 'entropy' is also available. As noted in the previous article, the two are computed slightly differently but behave similarly
- max_depth: integer or None, optional (default=None), the maximum depth of a tree
- bootstrap: boolean, optional (default=True), whether to use sampling with replacement when building trees
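The parameters listed above can be passed explicitly when constructing the classifier; a minimal sketch (the wine dataset and the seed value are just for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

clf = RandomForestClassifier(
    n_estimators=10,   # number of trees in the forest
    criterion="gini",  # split-quality measure; "entropy" also works
    max_depth=None,    # grow each tree until its leaves are pure
    bootstrap=True,    # sample with replacement when building each tree
    random_state=0,    # fix the seed so results are reproducible
)
clf.fit(X, y)
print(clf.score(X, y))  # accuracy on the training data
```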
Advantages of Random Forest
- High accuracy
- Resistant to overfitting (averaging many trees reduces variance, though it is not immune)
- Works well on large datasets; widely used
Random Forest in Practice
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = datasets.load_wine()
About the wine dataset:
The target has three classes: target = 0, 1, 2; target_names = class_0, class_1, class_2
There are many features, feature_names: 'alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash'......13 in total
data has shape (178, 13)
X = wine['data']
y = wine['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = RandomForestClassifier()
clf.fit(X_train, y_train)
y_ = clf.predict(X_test)

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_)
# 1.0
The random forest above reaches an accuracy of 1.0 on this test split.
Now the decision tree approach:
dt_clf = DecisionTreeClassifier()
dt_clf.fit(X_train, y_train)
dt_clf.score(X_test, y_test)
# 0.944444444

The decision tree reaches an accuracy of about 0.944.
A single train/test split is noisy, so average the accuracy over 100 random splits:

score = 0
for i in range(100):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    dt_clf = DecisionTreeClassifier()
    dt_clf.fit(X_train, y_train)
    score += dt_clf.score(X_test, y_test) / 100
print('decision tree accuracy over 100 runs:', score)

decision tree accuracy over 100 runs: 0.909166666666666
score = 0
for i in range(100):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    score += clf.score(X_test, y_test) / 100
print('random forest accuracy over 100 runs:', score)

random forest accuracy over 100 runs: 0.9808333333333332
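As an alternative to the hand-written 100-run loop, sklearn's cross_val_score gives a similar repeated-split comparison in a couple of lines. This is a sketch, not part of the original walkthrough; the fold count and random seeds are arbitrary choices.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# 5-fold cross-validation: each model is trained and scored on 5 different splits
dt_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
rf_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
)
print('decision tree mean accuracy:', dt_scores.mean())
print('random forest mean accuracy:', rf_scores.mean())
```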
