Paper Translation: "Apple Leaf Diseases Recognition Based on An Improved Convolutional Neural Network"

2023-08-28 06:00:37

Paper title: Apple Leaf Diseases Recognition Based on An Improved Convolutional Neural Network
Authors: Yan Q, Yang B, Wang W, et al.
Journal: Sensors, 2020, 20(12): 3535.
Summary:

Research Gap:
An improved VGG16 (VGG16 + BN + GAP) is used to identify apple leaf diseases.
Importance:
The model has 89% fewer parameters than the original VGG16.
The accuracy (ACC) on the 4 classes of apple leaves is 99.01%.

Paper link: https://www.researchgate.net/publication/342401878_Apple_Leaf_Diseases_Recognition_Based_on_An_Improved_Convolutional_Neural_Network
Contents
- Abstract
- 1.Introduction
- 2.Methods
  - 2.1 Data
  - 2.2. VGG16 and Transfer Learning
    - 2.2.1. VGG16
    - 2.2.2. Transfer Learning
  - 2.3. Improved CNNs Based on VGG16
    - 2.3.1. Global Average Pooling Layer (GAP)
    - 2.3.2. Batch Normalization (BN)
    - 2.3.3. Adaptive Moment Estimation
- 3.Results and Discussion
  - 3.1.Comparison of Model Performance
  - 3.2.Convergence Rate Analysis
  - 3.3.Training Time and Parameters
  - 3.4.Comparison of Optimal Algorithms
  - 3.5.Data Augmentation
- 4.Conclusions

Abstract

Abstract: Scab, frogeye spot, and cedar rust are three common types of apple leaf diseases, and their rapid diagnosis and accurate identification play an important role in the development of apple production. In this work, an improved model based on VGG16 is proposed to identify apple leaf diseases, in which a global average pooling layer replaces the fully connected layers to reduce the number of parameters and a batch normalization layer is added to improve the convergence speed. A transfer learning strategy is used to avoid a long training time. The experimental results show that the overall accuracy of apple leaf classification based on the proposed model reaches 99.01%. Compared with the classical VGG16, the model parameters are reduced by 89%, the recognition accuracy is improved by 6.3%, and the training time is reduced to 0.56% of that of the original model. Therefore, the deep convolutional neural network model proposed in this work provides a better solution for the identification of apple leaf diseases, with higher accuracy and a faster convergence speed.

Keywords: apple leaf diseases; transfer learning; deep learning; convolutional neural networks

1.Introduction

Leaf diseases are one of the main obstacles to apple production. Among them, scab, frogeye spot, and cedar rust are the three most common types, and they harm apple growing. The detection of apple leaf diseases has therefore attracted more and more attention, and early identification is very important for timely treatment. In the past, disease identification generally relied on manual identification or expert systems. However, both depend heavily on fruit growers and experts, are time-consuming, and usually generalize poorly.

With the development of machine learning methods, computational models based on different algorithms have been proposed for plant disease diagnosis. Some studies have located diseased regions by K-means clustering-based segmentation and built disease recognition models using supervised learning methods, including random forest, support vector machine (SVM), and K-nearest neighbor methods [1–3]. Rothe et al. used an active contour model for image segmentation and extracted Hu's moments as features for training an adaptive neuro-fuzzy inference system, which achieved a classification accuracy of 85% [4]. Gupta et al. proposed an autonomously modified SVM-CS model in which an SVM model was trained and optimized using the concept of cuckoo search [5]. However, these classification features depend heavily on manual selection, and the recognition rates are not satisfactory.

In recent years, convolutional neural networks (CNNs) have shown good results in recognition tasks by reducing the need for image preprocessing and improving the identification accuracy [6–13]. Leaf disease recognition based on CNNs has become a new hotspot in agricultural informatization [14–16]. Lu et al. proposed a rice disease identification method based on deep CNN techniques and achieved an accuracy of 95.48% on a dataset of 500 natural images of diseased and healthy rice leaves [17]. Zhang et al. proposed improved GoogLeNet and Cifar10 models and obtained average identification accuracies of 98.9% and 98.8%, respectively [18]. Liu et al. designed a novel AlexNet-based architecture to detect apple leaf diseases, and the experimental results showed an overall accuracy of 97.62% for disease identification [19]. Although the recognition accuracy of these CNN models is higher than that of traditional machine learning methods, they still have shortcomings, such as high model complexity, a large number of parameters, and a long training time, which prevent their application in real environments.

In this work, we propose a method for apple leaf disease identification based on an improved deep convolutional neural network architecture that effectively reduces the model complexity and training time. The proposed network adopts transfer learning, starting from a pre-trained VGG16 network, and adjusts the network structure by removing the three fully connected layers and adding a global average pooling layer, a batch normalization layer, and a fully connected layer. On a benchmark dataset, the proposed model, which has 89% fewer parameters than the original VGG16 model, greatly reduces the training time and achieves a higher accuracy.

2.Methods

2.1 Data

The dataset in this work is from the "2018 'AI Challenger' Global Challenge" and includes 10 kinds of plants with 27 categories of diseases. This work addresses the automatic identification of apple leaf diseases, so only apple leaves are selected from this dataset. There are four categories of apple leaf images within the dataset, and Figure 1 shows some of them. Besides healthy leaves, the dataset contains images of three types of disease: scab, frogeye spot, and cedar rust. Typically, the lesions on scab leaves are gray-brown and nearly round or radial, frogeye spot lesions are tan and shaped like flakes or dots, and cedar rust leaves have round orange-yellow lesions with red edges. Some frogeye spot and cedar rust lesions are similar in color and shape, which makes recognition by computational methods more difficult.

Our dataset contains 2446 pictures: 1340 healthy, 411 scab, 487 frogeye spot, and 208 cedar rust. The original dataset was divided into two subsets: 2141 pictures for model training and the remaining 305 for testing. The details of the dataset are shown in Table 1.
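
The counts quoted above can be checked with a few lines of arithmetic. The sketch below only restates the numbers from the text; the variable and function names are illustrative and not taken from the paper. It confirms that the class counts and the train/test split both add up to 2446, and it shows how unbalanced the classes are.

```python
# Class counts and train/test split exactly as quoted in the text.
class_counts = {"healthy": 1340, "scab": 411, "frogeye spot": 487, "cedar rust": 208}
split = {"train": 2141, "test": 305}


def class_shares(counts):
    """Return each class's share of the dataset as a percentage."""
    n = sum(counts.values())
    return {name: 100.0 * c / n for name, c in counts.items()}


total = sum(class_counts.values())
shares = class_shares(class_counts)

print(total)                  # 2446
print(sum(split.values()))    # 2446
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

Healthy leaves make up more than half of the images while cedar rust accounts for less than a tenth, which is worth keeping in mind when reading overall accuracy figures.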

2.2. VGG16 and Transfer Learning

2.2.1. VGG16

With the rapid development of deep learning, CNNs have been applied widely in different fields, especially in image classification and recognition and in target localization and detection [20]. A CNN is a special multi-layer perceptron (MLP), or multilayer feed-forward neural network, which generally consists of an input layer, convolution layers, pooling layers, fully connected layers, and an output layer. The convolution layer achieves dimensionality reduction and feature extraction through two design concepts: local perception and parameter sharing. The pooling layer reduces the size of the data, and its sampling is also invariant to local linear transformations, which enhances the generalization ability of convolutional neural networks. The fully connected layer acts as the classifier of the whole network. It is common to use multiple fully connected layers after several rounds of convolution, with the output of the last convolutional layer flattened first [21,22].

The VGG16 contains 16 weight layers: 13 convolutional layers with very small 3 × 3 receptive fields, interleaved with five 2 × 2 max-pooling layers for spatial pooling, followed by three fully connected layers. A classical VGG16 model involves 144 million parameters, where rectified linear unit (ReLU) activation is applied to all hidden layers and the softmax function is applied in the final layer [23]. The model also uses dropout regularization in the fully connected layers. A schematic of the VGG16 architecture is shown in Figure 2, where the red box marks the classifier consisting of three fully connected layers.
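
To see where the parameters of VGG16 sit, the layer sizes can be summed directly. The sketch below assumes the standard VGG16 configuration (13 convolutional layers with 3 × 3 kernels, three fully connected layers, and a 1000-class ImageNet head); the function names are illustrative. Under these assumptions the total comes to about 138 million, somewhat below the 144 million quoted in the text, and this sketch does not explain the difference.

```python
# Standard VGG16 layout (assumed): output channels of the 13 conv layers,
# with "M" marking each 2x2 max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(cfg, in_channels=3, kernel=3):
    """Weights plus biases of all convolutional layers."""
    total = 0
    for item in cfg:
        if item == "M":
            continue  # pooling layers have no parameters
        total += kernel * kernel * in_channels * item + item
        in_channels = item
    return total


def dense_params(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def vgg16_params(input_size=224, num_classes=1000):
    # Each of the five poolings halves the spatial size: 224 -> 7.
    spatial = input_size // 2 ** VGG16_CFG.count("M")
    flat = spatial * spatial * 512
    return conv_params(VGG16_CFG) + dense_params([flat, 4096, 4096, num_classes])


print(conv_params(VGG16_CFG))                    # 14714688
print(dense_params([25088, 4096, 4096, 1000]))   # 123642856
print(vgg16_params())                            # 138357544
```

Roughly nine tenths of the parameters belong to the three fully connected layers, which is why replacing them has such a large effect on model size.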

2.2.2. Transfer Learning

CNNs typically require a large annotated image dataset to achieve high predictive accuracy. However, such data are difficult to acquire and costly to label in many areas. In light of these challenges, many previous studies have adopted transfer learning to solve cross-domain image classification problems, and it has proven very useful: the "off-the-shelf" features of well-established CNNs, such as VGG16, AlexNet, and GoogLeNet, are pre-trained on large-scale annotated natural image datasets such as ImageNet, which involves 15 million images [24–27].

One common strategy of transfer learning is feature transfer, which removes the last layer of the pre-trained network and feeds the preceding activation values, regarded as feature vectors, into classifiers for training. Another is parameter transfer, which re-initializes only a few layers of the network, such as the last layer, while the other layers directly use the weight parameters of the pre-trained network; a new dataset is then used to fine-tune the network parameters [28–30].

Because of the small amount of data in this work, training a neural network from scratch would take a long time, and the lack of data easily causes over-fitting, which makes the model less robust. Therefore, we use transfer learning: a model pre-trained on ImageNet is used to optimize the classification and recognition of apple leaf diseases. Herein, VGG16 is fine-tuned to fit our own data, which saves a lot of training time.

2.3. Improved CNNs Based on VGG16

A classical VGG16 network has a strong ability for image feature extraction and recognition. Its core idea is to use smaller convolution kernels to increase the depth of the network, which was the key to its runner-up position in the localization and classification tasks of the ILSVRC Challenge in 2014. However, the VGG16 model has a huge number of parameters, which causes slow convergence, a long training time, and large storage requirements in practical applications.

To address these problems, this work improves the VGG16 model by replacing the three fully connected layers of the original model with a global average pooling layer, a batch normalization layer, and a single fully connected layer. The global average pooling layer takes the place of the fully connected layers to reduce the number of parameters, and the batch normalization layer is added to improve the convergence speed. To avoid a long training time, the weights of the convolution layers are taken from a VGG16 pre-trained on ImageNet. The stochastic gradient descent (SGD) optimizer is replaced by adaptive moment estimation (Adam) to accelerate the convergence of the network. The network structure is shown in Figure 3, where the green box marks the improved classifier consisting of a global average pooling layer, a batch normalization layer, and a fully connected layer.
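
The effect of the new classifier head on model size can be estimated with simple arithmetic. The sketch below rests on assumptions the text does not spell out: the convolutional base is the standard VGG16 one (14,714,688 parameters), batch normalization is applied to the 512-dimensional pooled vector, the final fully connected layer maps 512 features to 4 classes, and the original model is counted with its 1000-class ImageNet head. All names are illustrative.

```python
CONV_BASE = 14_714_688                       # standard VGG16 conv layers (assumed)
ORIGINAL_HEAD = [25_088, 4_096, 4_096, 1_000]  # flatten -> FC -> FC -> FC


def dense_params(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


def improved_head_params(channels=512, num_classes=4):
    """GAP (no parameters) + batch normalization + one fully connected layer."""
    bn = 4 * channels  # gamma, beta, moving mean, moving variance
    fc = channels * num_classes + num_classes
    return bn + fc


original = CONV_BASE + dense_params(ORIGINAL_HEAD)
improved = CONV_BASE + improved_head_params()
reduction = 1 - improved / original

print(improved_head_params())   # 4100
print(improved)                 # 14718788
print(f"{100 * reduction:.1f}% fewer parameters")
```

Under these assumptions the reduction comes out at a little over 89%, in line with the figure reported in the text.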

2.3.1. Global Average Pooling Layer (GAP)

Global average pooling regularizes the whole network structure to prevent over-fitting and reduces the dimensions from 3D to 1D [31,32]. In this work, each feature map of the last convolution layer is averaged into a single value, giving a 1D output, as shown in Figure 4. GAP avoids flattening the feature maps into vectors and passing them through fully connected layers, and therefore greatly reduces the number of parameters. The advantage of GAP over a fully connected layer is that it better preserves the convolution structure by strengthening the correspondence between feature maps and categories, which makes the classification of the feature maps more credible and easier to interpret.
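
The operation itself is small enough to write out by hand. The following is a minimal pure-Python sketch of global average pooling on nested lists; the function name and the toy feature maps are illustrative, and a real implementation would operate on tensors.

```python
def global_average_pool(feature_maps):
    """Average each H x W feature map down to a single value.

    feature_maps: list of C maps, each a list of H rows of W floats.
    Returns a list of C floats (one per channel).
    """
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Three toy 2 x 2 feature maps (hypothetical values).
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
    [[5.0, 5.0], [5.0, 5.0]],
]
print(global_average_pool(maps))  # [2.5, 2.0, 5.0]

# For a 7 x 7 x 512 output, flattening yields 25088 features,
# while GAP yields one value per channel.
print(7 * 7 * 512, "vs", 512)
```

The layer has no weights of its own; the saving comes from the much shorter vector handed to the next layer.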

2.3.2. Batch Normalization (BN)

In deep learning, because the network has many layers, if the data distribution at a certain layer starts to deviate significantly, the problem intensifies as the network deepens, which makes the model harder to optimize. Normalization helps to alleviate this problem. Batch normalization divides the data into several groups (batches) and updates the parameters batch by batch. The data in one batch jointly determine the direction of the gradient, which reduces the randomness of the descent. Moreover, because the number of samples in a batch is much smaller than the entire dataset, the amount of calculation also drops significantly. The batch normalization layer normalizes the inputs to a layer before the activation function is applied, which addresses the shift and growth of the input data distribution [33].

Based on the BN algorithm, the parameters of the input layer are normalized, so the activation function cannot distort the distribution of neuron outputs. The importance of individual neurons is weakened, and some of them may be removed automatically. Because normalization is applied in each epoch, the risk of parameter changes caused by differing data distributions is reduced and convergence is accelerated.
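
As a rough illustration of the normalization step, the sketch below implements the training-mode forward pass of batch normalization in pure Python: each feature is standardized over the mini-batch and then scaled and shifted by learnable parameters. The function name, the `eps` value, and the toy batch are assumptions for illustration; at inference time, frameworks use running statistics instead of batch statistics.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    batch: list of N samples, each a list of D feature values.
    gamma, beta: per-feature scale and shift (lists of length D).
    """
    n = len(batch)
    d = len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # biased variance
        for i in range(n):
            x_hat = (batch[i][j] - mean) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * x_hat + beta[j]
    return out


# Two features on very different scales (hypothetical values).
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization both features have approximately zero mean and unit variance within the batch, regardless of their original scale.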

2.3.3. Adaptive Moment Estimation

Adam is an extension of the stochastic gradient descent algorithm that iteratively updates the neural network weights based on training data [34,35]. The method keeps an exponentially decaying average of past squared gradients (the second-moment estimate) as well as an exponentially decaying average of past gradients (the first-moment estimate). It also computes a separate adaptive learning rate for each parameter. Optimization algorithms such as SGD maintain a single learning rate during training, whereas Adam adjusts the learning rate as the parameters are updated through backpropagation. As a result, Adam converges quickly and learns effectively. It can also correct problems found in other optimization techniques, such as loss function fluctuation caused by a vanishing learning rate, slow convergence, or parameter updates with high variance.
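
The update rule described above can be sketched in a few lines. This is a minimal pure-Python version of one Adam step with bias correction, using the commonly cited default hyperparameters; the function names and the toy quadratic objective are illustrative assumptions, not the training setup of the paper.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new (params, m, v). The step count t starts at 1."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.05):
    """Minimize f(x, y) = (x - 3)^2 + 10 * (y + 1)^2 with Adam."""
    params, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2 * (params[0] - 3), 20 * (params[1] + 1)]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params


print(minimize_quadratic())  # approaches [3, -1]
```

Because each parameter's step is divided by the square root of its own second-moment estimate, the two coordinates move at a similar pace even though their gradients differ by an order of magnitude.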

3.Results and Discussion

In this work, the proposed model was implemented with the Keras deep learning framework using an Intel® Core™ i7-8750H GPU (LENOVO, Jiangsu, China). The ImageNet pre-trained VGG16 CNN implemented within Keras Applications takes in a default image input size of 227 × 227. Therefore, all the pictures in our dataset were cut to the same size of 227 × 227.

The proposed CNN is trained on 2141 training pictures and tested on 305. The confusion matrix is almost entirely accurate: only one healthy picture is misclassified as scab, and one picture in each of the scab and frogeye spot categories is misclassified as healthy.
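
The overall accuracy follows directly from the test set size and the three errors described above. The short sketch below only restates those numbers; the names are illustrative, and the per-class test counts are deliberately left out because the text here does not give them.

```python
TEST_SIZE = 305

# (true class, predicted class) for the three errors described in the text.
ERRORS = [
    ("healthy", "scab"),
    ("scab", "healthy"),
    ("frogeye spot", "healthy"),
]


def overall_accuracy(n_samples, errors):
    """Fraction of test samples that were classified correctly."""
    return (n_samples - len(errors)) / n_samples


acc = overall_accuracy(TEST_SIZE, ERRORS)
print(f"{100 * acc:.3f}%")  # 99.016%
```

With 302 of 305 pictures correct, the result matches the reported 99.01% when truncated to two decimals.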

For the three misclassified pictures in the original dataset, Figure 5 shows each original picture, the visualization of its last convolution layer, and the heat map superimposed on the original picture. Some insights can be drawn from these pictures. In Figure 5b, the strong light and small disease features may lead to inaccurate extraction of disease features by the model. The frogeye spots in Figure 5c are small and light in color, which leads to prediction errors relative to the dark area, because light is strongly learned by the network and therefore has a larger weight.

3.1. Comparison of Model Performance

To evaluate the performance of the proposed VGG model, four typical convolutional neural networks (AlexNet, GoogleNet, ResNet-34, and VGG16) are also implemented. Another apple leaf disease recognition structure presented by Liu et al., in which the inception structure was added to the AlexNet framework, is also compared. The recognition accuracy of the different models is shown in Figure 6.

The accuracy of AlexNet and of the original VGG16 is 93.11%, that of ResNet-34 is 95.73%, and GoogleNet reaches 97.70%. When the inception structure is combined with AlexNet, the identification accuracy increases to 97.05%, which is higher than that of the original AlexNet. Our model achieves the highest accuracy in identifying apple leaf diseases, 99.01%, which demonstrates the effectiveness of the proposed model. Compared to the other five models, our model also achieved the highest precision, recall, and F1-score.

Table 3 shows the precision, recall, F1-score, and accuracy that the different models achieved for the four categories of apple leaf images. It shows that AlexNet does not learn the features of scab well enough, so its detection performance is poor; the improved Alex + Inception model recognizes better than the original AlexNet; and the original VGG16 network learns every feature worst. Across the four leaf types, all the networks have the highest recognition rate for healthy leaves and the lowest for scab. Whether measured by overall accuracy or by the detection metrics of each leaf type, our model achieved the best results.
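
The per-class metrics named in this paragraph can all be derived from a confusion matrix. The following is a minimal sketch, assuming the common convention that rows are true classes and columns are predicted classes; the two-class matrix is a toy example and is not taken from the paper's tables.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1) per class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, but wrong
        fn = sum(cm[k]) - tp                       # true k, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        results.append((precision, recall, f1))
    return results


# Toy matrix (hypothetical): one sample of class 0 is predicted as class 1.
toy_cm = [[9, 1],
          [0, 10]]
for label, (p, r, f1) in enumerate(per_class_metrics(toy_cm)):
    print(f"class {label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

A class with low recall, as reported for scab, is one whose row contains many off-diagonal entries.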

3.2. Convergence Rate Analysis

The loss values in this work are calculated with cross entropy. Figure 7 shows the accuracy and loss values of the six models during training. The experimental results show that AlexNet, ResNet-34, GoogleNet, Alex + Inception, and our convolutional neural network converge within 60 training epochs, while VGG16 converges slowly. The proposed network converges in 10 training epochs, faster than the other five CNN models. The training processes of GoogleNet and ResNet-34 are similar, and both converge after 20 training epochs; AlexNet and the Alex + Inception model become stable after 40 epochs.
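
For reference, the cross-entropy loss for a single sample can be sketched as follows. This is a plain-Python illustration rather than the Keras implementation used in the paper, and the probability vectors are hypothetical softmax outputs for the four classes.

```python
import math


def cross_entropy(true_one_hot, predicted_probs, eps=1e-12):
    """Categorical cross entropy for one sample: -sum(y * log(p))."""
    return -sum(
        y * math.log(max(p, eps))  # eps guards against log(0)
        for y, p in zip(true_one_hot, predicted_probs)
    )


# Hypothetical outputs; the true class is index 1 in both cases.
confident = cross_entropy([0, 1, 0, 0], [0.05, 0.85, 0.05, 0.05])
unsure = cross_entropy([0, 1, 0, 0], [0.30, 0.40, 0.20, 0.10])
print(confident, unsure)
```

The loss falls as the probability assigned to the true class rises, which is why a decreasing loss curve accompanies the rising accuracy curve during training.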

3.3. Training Time and Parameters

Table 4 shows the number of parameters of each model and the training time required for the model to become stable. The classical VGG16 model has the most parameters and the longest training time, the Alex + Inception model has the fewest trainable parameters, and AlexNet has the shortest training time. Our improved model has 119,534,592 fewer trainable parameters than the original VGG16 model. The convolutional neural network proposed in this work has fewer trainable parameters than AlexNet, ResNet-34, and VGG16. The training time of the proposed model is 692 s, which is similar to that of ResNet-34 and GoogleNet.
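
To see why replacing the fully connected layers saves so many parameters, the sketch below counts the weights of the classical VGG16 classifier head and of a simplified head that uses global average pooling. The simplified head (pooling followed by one output layer) is an assumption made for illustration and is not the exact architecture of the paper.

```python
def dense_params(inputs, outputs):
    """Weights plus biases of one fully connected layer."""
    return inputs * outputs + outputs


NUM_CLASSES = 4  # healthy, scab, frogeye spot, cedar rust

# Classical VGG16 head: flatten the 7 x 7 x 512 feature maps,
# then FC-4096, FC-4096, and the output layer.
flattened = 7 * 7 * 512
fc_head = (
    dense_params(flattened, 4096)
    + dense_params(4096, 4096)
    + dense_params(4096, NUM_CLASSES)
)

# Assumed simplified head: global average pooling has no parameters and
# leaves 512 values, which feed a single output layer.
gap_head = dense_params(512, NUM_CLASSES)

print(f"FC head:  {fc_head:,} parameters")
print(f"GAP head: {gap_head:,} parameters")
```

These counts do not reproduce the 119,534,592 figure exactly, because the proposed head also contains other layers such as batch normalization; they only show that nearly all of the saving comes from dropping the two 4096-unit layers.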

3.4. Comparison of Optimal Algorithms

The optimization algorithm is of great importance for model performance. In this work, the SGD optimization algorithm of the original VGG16 is replaced by the Adam optimization algorithm to improve the convergence rate. Figure 8 shows the training process of the two optimization algorithms with the same learning rate of 1 × 10⁻⁵. The results show that the model using the Adam algorithm converges faster. The testing accuracy is 98.03% with the SGD algorithm and 99.01% with the Adam algorithm. The loss curve in Figure 8 shows that the Adam algorithm converges quickly and is more stable than SGD.
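
The difference between the two update rules can be sketched on a single scalar parameter. This is a toy illustration, not the training loop of the paper: the loss f(x) = x² and the starting point are hypothetical, and the Adam constants (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly used defaults, assumed here because the section does not state them.

```python
import math


def sgd_step(x, grad, lr):
    """Plain SGD: move against the gradient, scaled by the learning rate."""
    return x - lr * grad


def adam_step(x, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `state` holds the step count and moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)


lr = 1e-5        # learning rate quoted in the text
x0 = 0.01        # hypothetical starting point with a small gradient
grad = 2 * x0    # gradient of the toy loss f(x) = x**2

x_sgd = sgd_step(x0, grad, lr)
x_adam = adam_step(x0, grad, {"t": 0, "m": 0.0, "v": 0.0}, lr)
print("SGD step: ", x0 - x_sgd)
print("Adam step:", x0 - x_adam)
```

With a small gradient, the SGD step shrinks in proportion to it, whereas the first Adam step stays close to the learning rate because the gradient is normalized by its own magnitude. This normalization is one plausible reason for the faster convergence reported with such a small learning rate.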

3.5. Data Augmentation

The dataset used in this work includes only 2446 pictures, which is very small in comparison to the dataset on which VGG16 was pre-trained. To evaluate the proposed method further, a data augmentation strategy is adopted to enlarge the original dataset, and the classification performance is tested on the result. The augmented dataset is generated from the original dataset by geometric transformation, color changing, and noise adding, which increases the size of the training dataset from 2141 to 21,410 pictures.

Image rotation and flipping are two types of geometric transformation in which only the location of each pixel is changed. Rotating the pictures by different angles and flipping them expands the diversity of directions. It is generally difficult to capture each picture from different directions. To simulate this situation and eliminate the effect of direction on picture recognition, we rotated the original image around its center point by 90°, 180°, and 270° and flipped it horizontally. As shown in Figure 9, rotation and flipping increased the number of pictures by four times the size of the original dataset.
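
The four geometric variants described above can be sketched on a tiny image stored as a nested list. The helper names are illustrative, and the clockwise rotation direction is an assumption, since the text does not specify it.

```python
def rotate90(image):
    """Rotate a 2-D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def geometric_variants(image):
    """Return the 90, 180, and 270 degree rotations and the horizontal flip."""
    r90 = rotate90(image)
    r180 = rotate90(r90)
    r270 = rotate90(r180)
    return [r90, r180, r270, flip_horizontal(image)]


tiny = [[1, 2],
        [3, 4]]
for variant in geometric_variants(tiny):
    print(variant)
```

Each original picture yields four new ones, and only pixel positions change, never pixel values.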

Adjusting the brightness, contrast, and hue of an image is another common augmentation method widely used in image processing. During image acquisition, pictures may be affected by different weather and exposed to different intensities of light, which may affect the experimental results. To simulate image collection under different lighting conditions, we adjusted the brightness and contrast, as shown in Figure 10, and the data was expanded by four times.
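
Brightness and contrast adjustment can be sketched in the same style on 8-bit grayscale values. The factors below are hypothetical, as is the split into two brightness and two contrast variants; the text only states that the data was expanded by four times.

```python
def clamp(value):
    """Keep a pixel value inside the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def adjust_brightness(image, factor):
    """Scale every pixel; factor > 1 brightens, factor < 1 darkens."""
    return [[clamp(p * factor) for p in row] for row in image]


def adjust_contrast(image, factor, pivot=128):
    """Stretch (factor > 1) or compress (factor < 1) values around a pivot."""
    return [[clamp(pivot + factor * (p - pivot)) for p in row] for row in image]


gray = [[100, 200],
        [50, 128]]

# Four hypothetical variants per picture.
variants = [
    adjust_brightness(gray, 1.5),
    adjust_brightness(gray, 0.5),
    adjust_contrast(gray, 2.0),
    adjust_contrast(gray, 0.5),
]
for variant in variants:
    print(variant)
```

Unlike rotation and flipping, these transformations change pixel values and leave positions untouched, so the two families of augmentation complement each other.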

In the same experimental setup, the proposed model is trained on the 21,410 augmented images, and the final classification accuracy reaches 99.34%. When the original dataset is used to train the model, the accuracy reaches 99.01%. Figure 12 shows the recognition accuracy. After the data expansion, all metrics improve slightly for the proposed model.

4. Conclusions

An improved convolutional neural network model based on VGG16 is proposed in this work. The classifier of the classical VGG16 network is modified by adding a batch normalization layer, a global average pooling layer, and a fully connected layer to accelerate convergence and reduce the number of trainable parameters. The proposed model is trained on the 2141 apple leaf pictures in the training set to identify apple leaf diseases. The experimental results show that the test accuracy of the model reaches 99.01% after 692 s of training. Compared with the classical VGG16 network, the model has 119,534,592 fewer parameters, and the accuracy is improved by 6.3%.

Although the training time is longer than that of AlexNet and ResNet, our model has fewer parameters and a higher accuracy. Compared with GoogleNet and Alex + Inception, some parameters and training time are sacrificed, but our model has the highest accuracy, 99.01%. After data expansion, the accuracy of the model increases to 99.34%. The convolutional neural network proposed in this work can identify apple leaf diseases quickly and accurately and provides a feasible scheme for identifying them.

In the future, our work can be improved in the following aspects: (1) collecting more kinds and quantities of apple disease pictures to enrich the datasets and train better models, (2) trying other deep convolutional neural networks to improve the accuracy and speed of recognition, and (3) trying other deep learning methods and applying them to the real-time detection of apple disease.
