Multivariate time-series forecasting of the 2020 5G market with a Keras LSTM model
Problem background
Based on the given data, forecast the size of the 5G market and of the overall wireless market for each region.
The complete data and code are available via the link.
Note: handle anomalous data according to your own judgment, and base the forecast only on the data provided; there is no need to look for additional public data.
Region definitions:
| Region | Abbreviation |
|---|---|
| North America | NA |
| Europe, Middle East, and Africa | EMEA |
| Asia Pacific | AP |
| Caribbean and Latin America | CALA |
Abbreviations:
| Abbreviation | Meaning |
|---|---|
| Q1 | first quarter |
| Q2 | second quarter |
| Q3 | third quarter |
| Q4 | fourth quarter |
Forecasting approach
In this problem, the 5G market must be forecast from the 2G, 3G, and 4G markets. All of these series change over time and certainly influence one another, so a simple linear regression will not fit them; we need a model that can learn both how features evolve over time and how multiple features affect each other.
Since my abilities are limited, I simplified the problem: rather than modelling the interactions among the 2G, 3G, 4G, and 5G series, I forecast, as functions of time alone, the 2G, 3G, and 4G market sizes for the four quarters of 2020, along with the total wireless market for those quarters. Subtracting the 2G, 3G, and 4G forecasts from the total then gives the 5G market size for each quarter of 2020, as sketched below. If the subtraction yields a negative number, the forecasts are rerun until the difference is non-negative. The drawback of this model is obvious: it ignores the interactions among 2G, 3G, 4G, and 5G and only captures how 2G, 3G, 4G, and the overall market vary with time. If any reader has a better idea, please leave it in the comments; I would be most grateful.
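To make the subtraction step concrete, here is a minimal sketch of it. The numbers and the `rerun` callable are hypothetical stand-ins for the actual per-series LSTM forecasts, not part of the original code:

```python
import numpy as np

def forecast_5g(total, g2, g3, g4, rerun):
    """Derive 5G as total minus 2G/3G/4G; rerun() re-forecasts all four
    series (a hypothetical callable here) until no quarter is negative."""
    g5 = total - (g2 + g3 + g4)
    while (g5 < 0).any():
        total, g2, g3, g4 = rerun()
        g5 = total - (g2 + g3 + g4)
    return g5

# Hypothetical per-quarter forecasts for one region (2020 Q1-Q4):
total = np.array([42.6, 46.1, 49.8, 52.0])  # overall wireless market
g2 = np.array([5.1, 4.8, 4.5, 4.2])         # 2G forecast
g3 = np.array([8.9, 8.2, 7.6, 7.1])         # 3G forecast
g4 = np.array([21.7, 26.4, 31.1, 34.2])     # 4G forecast

# with these numbers no rerun is needed, so pass a dummy rerun callable
print(forecast_5g(total, g2, g3, g4, rerun=lambda: (total, g2, g3, g4)))
```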
Data preprocessing
- Missing values are filled with the average of the four surrounding values.
- When the sub-market sizes do not sum to the overall market size, the sum of the sub-markets is taken as authoritative.
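A minimal pandas sketch of these two rules, assuming "two values before and two after" as the reading of the neighbour rule; the series values and column names are hypothetical:

```python
import pandas as pd

# Hypothetical quarterly series with gaps.
s = pd.Series([10.0, None, 12.0, 13.0, None, 15.0])

# Fill each gap with the mean of the surrounding values
# (a centered window of two before and two after).
s = s.fillna(s.rolling(window=5, center=True, min_periods=1).mean())

# When the sub-markets do not sum to the reported total,
# overwrite the total with the sum of the sub-markets.
df = pd.DataFrame({'2g': [1.2], '3g': [2.3], '4g': [3.1], 'total': [6.9]})
df['total'] = df[['2g', '3g', '4g']].sum(axis=1)
```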
Building the model
The LSTM network we build has three layers:
- an LSTM layer with 50 units;
- a second LSTM layer with 50 units;
- a fully connected (Dense) layer with a single output unit.
```python
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(units=50))
model.add(Dense(1))
```
Compiling the model
In the compile step, we configure the following:
- Loss function: we use Mean Squared Error (MSE), which measures how far the model's predictions are from the targets during training. We want to minimize this function to steer the model in the right direction.
- Optimizer: we use Adam, which determines how the model's weights are updated.
```python
model.compile(loss='mean_squared_error', optimizer='adam')
```
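For reference, the mean squared error being minimized is

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the observed (scaled) market value and $\hat{y}_i$ is the model's prediction.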
Training the model
Training the LSTM model involves the following steps:
- Feed the training data to the model.
- The model learns to map x to y (see the windowing sketch after the training code below).
- Use the model to make predictions.
Experiments showed that the mean squared error between the predictions and the actual values is smallest with epochs (number of training passes) set to 3 and batch_size (number of samples per gradient update) set to 3.
```python
history = model.fit(x_train, y_train, epochs=3, batch_size=3, verbose=0)
```
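The mapping from x to y mentioned above is a sliding window over each series: every sample consists of the previous six quarters, and the target is the next quarter. A minimal sketch of this construction follows; the `make_windows` helper is illustrative only, as the complete code below inlines the same loop:

```python
import numpy as np

def make_windows(series, window=6):
    # Turn a univariate series into (past window, next value) pairs.
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i])  # the past `window` quarters
        y.append(series[i])             # the quarter to predict
    # reshape to the 3D input the LSTM expects: [samples, timesteps, features]
    return np.array(x).reshape(-1, window, 1), np.array(y)

x_train, y_train = make_windows(np.arange(20.0), window=6)
print(x_train.shape, y_train.shape)  # (14, 6, 1) (14,)
```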
Training results
Forecast results
2020 Q1-Q4 forecast of regional 5G market size (unit: 100 million USD)
| Region | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| AP 5G | 6.890 | 6.717 | 6.581 | 6.493 |
| CALA 5G | 0.819 | 0.634 | 0.474 | 0.369 |
| EMEA 5G | 2.405 | 2.236 | 2.140 | 2.115 |
| NA 5G | 1.899 | 1.766 | 1.562 | 1.694 |
2020 Q1-Q4 forecast of regional overall wireless market size (unit: 100 million USD)
| Region | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| AP total | 42.613 | 46.132 | 49.805 | 52.012 |
| CALA total | 6.057 | 6.454 | 6.783 | 6.968 |
| EMEA total | 20.841 | 22.501 | 24.007 | 24.778 |
| NA total | 17.465 | 19.062 | 20.677 | 21.415 |
Comparison of predicted and actual results
(Figures: predicted vs. actual curves for each region's total, 2G, 3G, and 4G markets.)
Model evaluation
We evaluate forecast accuracy by computing the coefficient of variation (CoV) of each region's market forecast, i.e. the RMSE of the predictions divided by the mean of the actual values. The higher the CoV, the less accurate the forecast.
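Concretely, with $\hat{y}_i$ the predictions and $y_i$ the actual values,

$$\mathrm{CoV} = \frac{\mathrm{RMSE}}{\bar{y}} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}}{\frac{1}{n}\sum_{i=1}^{n} y_i}$$

which normalizes the RMSE by the mean of the actual series, so that regions with very different market sizes can be compared.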
(Figure: CoV bar charts by region.)
Complete code
```python
# importing required libraries
import numpy as np
import pandas as pd
from keras.layers import Dense, LSTM
from keras.models import Sequential
from matplotlib import pyplot
from numpy import mean
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler


def total_prediction(file_path, col_g, num):
    """
    :param file_path: path of the .csv file to read
    :param col_g: the column to read from the .csv file
    :param num: the number of rows in the .csv file
    :return: (Coefficient of Variation, predicted values, actual values)
    """
    # read from csv and sort by row index
    df = pd.read_csv(file_path)
    data = df.sort_index(ascending=True, axis=0)

    # create a data frame holding only the date and the target column
    new_data = pd.DataFrame(index=range(0, len(df)), columns=['Date', col_g])
    for i in range(0, len(data)):
        new_data.loc[i, 'Date'] = data['Date'][i]
        new_data.loc[i, col_g] = data[col_g][i]
    # use the date as index and drop the 'Date' column
    new_data.index = new_data.Date
    new_data.drop('Date', axis=1, inplace=True)

    # create train and validation sets
    dataset = new_data.values
    # the first num-20 rows form the training set
    train = dataset[0:num - 20, :]
    # look-back window: each sample uses the previous 6 quarters
    test_size = 6
    # everything after the first window serves as the validation set
    valid = dataset[test_size:, :]

    # normalize the data into [0, 1]
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(dataset)

    # build supervised pairs: the past test_size values -> the next value
    x_train, y_train = [], []
    for i in range(test_size, len(train)):
        x_train.append(scaled_data[i - test_size:i, 0])
        y_train.append(scaled_data[i, 0])
    x_train, y_train = np.array(x_train), np.array(y_train)
    # reshape x_train to 3D: [samples, timesteps, features]
    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

    # create and fit the LSTM network
    model = Sequential()
    model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
    model.add(LSTM(units=50))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    history = model.fit(x_train, y_train, epochs=3, batch_size=3, verbose=0)
    # plot the training loss if needed
    # pyplot.plot(history.history['loss'], label='train')
    # pyplot.legend()
    # pyplot.show()

    # predict each point from the 'test_size' samples preceding it
    inputs = new_data[len(new_data) - len(valid) - test_size:].values
    inputs = inputs.reshape(-1, 1)
    # scale the inputs with the scaler fitted above
    inputs = scaler.transform(inputs)
    x_test = []
    for i in range(test_size, inputs.shape[0]):
        x_test.append(inputs[i - test_size:i, 0])
    x_test = np.array(x_test)
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))

    # make predictions and map them back to the original scale
    predict_result = model.predict(x_test)
    predict_result = scaler.inverse_transform(predict_result)
    # the last 4 predictions are the 2020 quarters, which have no actuals
    predict_2020 = predict_result[num - test_size - 4:]
    predict_result = predict_result[0:num - test_size - 4]
    valid = valid[:, 0]
    valid = valid[0:num - test_size - 4]

    # plot the predicted and the actual values
    pyplot.plot(predict_result, label='predict')
    pyplot.plot(valid, label=col_g)
    pyplot.title(col_g)
    pyplot.legend()
    pyplot.show()
    print(col_g)

    # calculate Root Mean Squared Error and Coefficient of Variation
    rmse = np.sqrt(mean_squared_error(valid, predict_result))
    cov = rmse / mean(valid)
    print('Coefficient of Variation:', cov)
    for i in predict_result:
        print(float(i))
    print("\n")
    return cov, predict_result, valid


def draw_bar(x_index, data_list, xticks, title, x_label, y_label):
    """
    draw a bar plot
    :param x_index: x positions of the bars
    :param data_list: bar heights
    :param xticks: tick labels of the x-axis
    :param title: plot title
    :param x_label: x-axis label
    :param y_label: y-axis label
    """
    pyplot.bar(x_index, data_list)
    pyplot.xlabel(x_label)
    pyplot.ylabel(y_label)
    pyplot.xticks(x_index, xticks)
    pyplot.title(title)
    pyplot.show()


if __name__ == "__main__":
    # CoV bar charts for the overall, 2G, 3G, and 4G markets (uncomment to run):
    # index = np.arange(4)
    #
    # print('\nTotal prediction:\n')
    # cov_total = [total_prediction('AP.csv', 'total_ap', 76)[0],
    #              total_prediction('CALA.csv', 'total_cala', 76)[0],
    #              total_prediction('EMEA.csv', 'total_emea', 76)[0],
    #              total_prediction('NA.csv', 'total_na', 76)[0]]
    # x_ticks = ('total_ap', 'total_cala', 'total_emea', 'total_na')
    # draw_bar(index, np.array(cov_total), x_ticks, 'Total', 'area', 'CoV')
    #
    # print('\n2g prediction:\n')
    # cov_2g = [total_prediction('AP.csv', '2g_ap', 76)[0],
    #           total_prediction('CALA.csv', '2g_cala', 76)[0],
    #           total_prediction('EMEA.csv', '2g_emea', 76)[0],
    #           total_prediction('NA.csv', '2g_na', 76)[0]]
    # x_ticks = ('2g_ap', '2g_cala', '2g_emea', '2g_na')
    # draw_bar(index, np.array(cov_2g), x_ticks, '2g', 'area', 'CoV')
    #
    # print('\n3g prediction:\n')
    # cov_3g = [total_prediction('AP.csv', '3g_ap', 76)[0],
    #           total_prediction('CALA.csv', '3g_cala', 76)[0],
    #           total_prediction('EMEA.csv', '3g_emea', 76)[0],
    #           total_prediction('NA.csv', '3g_na', 76)[0]]
    # x_ticks = ('3g_ap', '3g_cala', '3g_emea', '3g_na')
    # draw_bar(index, np.array(cov_3g), x_ticks, '3g', 'area', 'CoV')
    #
    # print('\n4g prediction:\n')
    # cov_4g = [total_prediction('AP.csv', '4g_ap', 76)[0],
    #           total_prediction('CALA.csv', '4g_cala', 76)[0],
    #           total_prediction('EMEA.csv', '4g_emea', 76)[0],
    #           total_prediction('NA.csv', '4g_na', 76)[0]]
    # x_ticks = ('4g_ap', '4g_cala', '4g_emea', '4g_na')
    # draw_bar(index, np.array(cov_4g), x_ticks, '4g', 'area', 'CoV')

    # predicted vs. actual curves for the four regional totals in one figure
    fig = pyplot.figure()

    cov, predict_result, valid = total_prediction('AP.csv', 'total_ap', 76)
    ax1 = fig.add_subplot(221)
    ax1.plot(predict_result, label='predict')
    ax1.plot(valid, label='total_ap')
    ax1.legend()

    cov, predict_result, valid = total_prediction('CALA.csv', 'total_cala', 76)
    ax2 = fig.add_subplot(222)
    ax2.plot(predict_result, label='predict')
    ax2.plot(valid, label='total_cala')
    ax2.legend()

    cov, predict_result, valid = total_prediction('EMEA.csv', 'total_emea', 76)
    ax3 = fig.add_subplot(223)
    ax3.plot(predict_result, label='predict')
    ax3.plot(valid, label='total_emea')
    ax3.legend()

    cov, predict_result, valid = total_prediction('NA.csv', 'total_na', 76)
    ax4 = fig.add_subplot(224)
    ax4.plot(predict_result, label='predict')
    ax4.plot(valid, label='total_na')
    ax4.legend()

    pyplot.show()
```
References
https://blog.csdn.net/qq_28031525/article/details/79046718