L5W1 Assignment 2: Character-Level Language Model - Dinosaurus land
Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task: leading biology researchers are creating new breeds of dinosaurs and plan to bring them to Earth, and your job is to give names to these new dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely!
![](https://img-blog.csdnimg.cn/946a9f7bdc354494a69765bbed893745.png)
Luckily you know some deep learning, and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find and compiled them into this dataset. (Click the previous link to take a look.) To create new dinosaur names, you will build a character-level language model that generates new names. Your algorithm will learn the different name patterns and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath!
By completing this assignment you will learn:
- How to store text data for processing with an RNN
- How to synthesize data by sampling predictions at each time step and passing them to the next RNN cell
- How to build a character-level text generation recurrent neural network
- Why clipping the gradients is important
We will begin by loading some functions provided for you in rnn_utils. Specifically, you have access to functions such as rnn_forward and rnn_backward, which are equivalent to those you implemented in the previous assignment.
In [1]:
cd /home/kesci/input/deeplearning133797
/home/kesci/input/deeplearning133797
In [2]:
import numpy as np
from utils import *
import random
from random import shuffle
1 Problem Statement
1.1 Dataset and Preprocessing
Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size.
In [3]:
data = open('dinos.txt', 'r').read()
data= data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))
There are 19909 total characters and 27 unique characters in your data.
The characters are a-z (26 characters) plus '\n' (newline), which in this assignment plays a role similar to the EOS (end of sentence) token discussed in lecture, except that here it marks the end of a dinosaur name rather than the end of a sentence. In the cell below, we create a Python dictionary (i.e., a hash table) that maps each character to an index from 0-26. We also create a second Python dictionary that maps each index back to its corresponding character. This will help you figure out which index corresponds to which character in the probability distribution output of the softmax layer. Below, char_to_ix and ix_to_char are Python dictionaries.
In [4]:
char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)
{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
1.2 Overview of the Model
Your model will have the following structure:
- Initialize parameters
- Run the optimization loop
  - Forward propagation to compute the loss function
  - Backward propagation to compute the gradients with respect to the loss function
  - Clip the gradients to avoid exploding gradients
  - Update the parameters using gradient descent
- Return the learned parameters
![](https://img-blog.csdnimg.cn/c19b4e073b724411ba41f94c662df1f1.png)
Figure 1: A recurrent neural network, similar to the one you built in the previous notebook "Building your Recurrent Neural Network - Step by Step".
At each time step, the RNN predicts the next character given the previous characters. The dataset $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters in the training set, while $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is such that at every time step $t$ we have $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.
2 Building Blocks of the Model
In this part, you will build two important blocks of the overall model:
- Gradient clipping: to avoid exploding gradients
- Sampling: a technique used to generate characters
You will then apply these two functions to build the model.
2.1 Clipping the Gradients in the Optimization Loop
In this section you will implement the clip function that you will call inside your optimization loop. Recall that your overall loop structure usually consists of a forward pass, a loss computation, a backward pass, and a parameter update. Before updating the parameters, you will perform gradient clipping when needed to make sure that your gradients do not "explode", meaning take on excessively large values.
In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of the gradients if needed. There are different ways to clip gradients; here we will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie in the range [-N, N]. More generally, you will provide a maxValue (say 10). In this example, if any component of the gradient vector is greater than 10, it is set to 10; if any component is less than -10, it is set to -10; and if it lies between -10 and 10, it is left alone.
![](https://img-blog.csdnimg.cn/ec40bc6422cd4738b52f2179ad76e809.png)
Figure 2: Visualization of gradient descent with and without gradient clipping, in a case where the network runs into slight "exploding gradient" problems.
Exercise: Implement the function below to return the clipped gradients of your dictionary gradients. Your function takes in a maximum threshold and returns the clipped versions of your gradients. You can check out this hint for an example of how to clip in numpy. You will need to use the argument out = .... A small standalone demonstration of in-place clipping follows.
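To illustrate the out= argument (this demo is not part of the graded function), clipping a numpy array in place looks like this:

```python
import numpy as np

a = np.array([-12.5, 3.0, 42.0])
np.clip(a, -10, 10, out=a)   # writes the clipped values back into a
print(a)                     # [-10.   3.  10.]
```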
In [5]:
### GRADED FUNCTION: clip

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']

    ### START CODE HERE ###
    # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for gradient in [dWax, dWaa, dWya, db, dby]:
        np.clip(gradient, -maxValue, maxValue, out=gradient)
    ### END CODE HERE ###

    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}

    return gradients
In [6]:
np.random.seed(3)
dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
gradients["dWaa"][1][2] = 10.0
gradients["dWax"][3][1] = -10.0
gradients["dWya"][1][2] = 0.2971381536101662
gradients["db"][4] = [10.]
gradients["dby"][1] = [8.45833407]
Expected output:
gradients[“dWaa”][1][2] = 10.0
gradients[“dWax”][3][1] = -10.0
gradients[“dWya”][1][2] = 0.2971381536101662
gradients[“db”][4] = [10.]
gradients[“dby”][1] = [8.45833407]
2.2 Sampling
Now assume that your model is trained, and you would like to generate new text (characters). The process of generation is explained in the picture below:
![](https://img-blog.csdnimg.cn/bf90c9b7f69645f2ac1f6531cd2d8710.png)
Figure 3: In this picture, we assume the model is already trained. We pass in $x^{\langle 1\rangle} = \vec{0}$ at the first time step, and have the network then sample one character at a time.
Exercise: Implement the sample function below to sample characters. You need to carry out 4 steps:
- Step 1: Pass the network the first "dummy" input $x^{\langle 1 \rangle} = \vec{0}$ (the vector of zeros). This is the default input before we have generated any characters. We also set $a^{\langle 0 \rangle} = \vec{0}$.
- Step 2: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:

$$a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t \rangle} + b)\tag{1}$$

$$z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y\tag{2}$$

$$\hat{y}^{\langle t+1 \rangle} = \mathrm{softmax}(z^{\langle t+1 \rangle})\tag{3}$$

Note that $\hat{y}^{\langle t+1 \rangle}$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by "i" is the next character. We have provided a softmax() function that you can use.
- Step 3: Carry out sampling: pick the index of the next character according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle}$. This means that if $\hat{y}^{\langle t+1 \rangle}_i = 0.16$, you will pick the index "i" with 16% probability. To implement it, you can use np.random.choice.
Here is an example of how to use np.random.choice():
np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())
This means that you will pick the index according to the distribution:
$P(index = 0) = 0.1,\ P(index = 1) = 0.0,\ P(index = 2) = 0.7,\ P(index = 3) = 0.2$.
- Step 4: The last step to implement in sample() is to overwrite the variable x, which currently stores $x^{\langle t \rangle}$, with the value of $x^{\langle t+1 \rangle}$. You will represent $x^{\langle t+1 \rangle}$ by creating a one-hot vector corresponding to the character you have chosen as your prediction. You will then forward propagate $x^{\langle t+1 \rangle}$ in Step 1 and keep repeating the process until you get a "\n" character, indicating you have reached the end of the dinosaur name. (A short standalone sketch of Steps 3 and 4 follows this list.)
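As a small, self-contained illustration of Steps 3 and 4 (the uniform y_hat here is made up purely for demonstration; it is not the graded code):

```python
import numpy as np

vocab_size = 27
y_hat = np.ones((vocab_size, 1)) / vocab_size               # stand-in for a softmax output
idx = np.random.choice(range(vocab_size), p=y_hat.ravel())  # Step 3: sample an index from the distribution

x = np.zeros((vocab_size, 1))                               # Step 4: one-hot encode the sampled character;
x[idx] = 1                                                   # this becomes the input at the next time step
```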
In [7]:
# GRADED FUNCTION: sample

def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """

    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]

    ### START CODE HERE ###
    # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
    x = np.zeros((vocab_size, 1))
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((n_a, 1))

    # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
    indices = []

    # Idx is a flag to detect a newline character, we initialize it to -1
    idx = -1

    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well
    # trained model), which helps debugging and prevents entering an infinite loop.
    counter = 0
    newline_character = char_to_ix['\n']

    while (idx != newline_character and counter != 50):

        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)

        # for grading purposes
        np.random.seed(counter + seed)

        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        idx = np.random.choice(range(len(y)), p=y.ravel())

        # Append the index to "indices"
        indices.append(idx)

        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        x = np.zeros((vocab_size, 1))
        x[idx] = 1

        # Update "a_prev" to be "a"
        a_prev = a

        # for grading purposes
        seed += 1
        counter += 1
    ### END CODE HERE ###

    if (counter == 50):
        indices.append(char_to_ix['\n'])

    return indices
In [8]:
np.random.seed(2)
n, n_a = 20, 100
a0 = np.random.randn(n_a, 1)
i0 = 1 # first character is ix_to_char[i0]
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}indices = sample(parameters, char_to_ix, 0)
print("Sampling:")
print("list of sampled indices:", indices)
print("list of sampled characters:", [ix_to_char[i] for i in indices])
Sampling:
list of sampled indices: [18, 2, 26, 0]
list of sampled characters: ['r', 'b', 'z', '\n']
Expected output:
Sampling:
list of sampled indices: [18, 2, 26, 0]
list of sampled characters: [‘r’, ‘b’, ‘z’, ‘\n’]
3 Building the Language Model
It is time to build the character-level language model for text generation.
3.1 Gradient Descent
In this section you will implement a function that performs one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, so the optimization algorithm is stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN:
- Forward propagate through the RNN to compute the loss
- Backward propagate through time to compute the gradients of the loss with respect to the parameters
- Clip the gradients if necessary
- Update your parameters using gradient descent
Exercise: Implement this optimization process (one step of stochastic gradient descent).
We provide you with the following functions:
def rnn_forward(X, Y, a_prev, parameters):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
    ....
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It returns also all the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters
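For reference, a plausible implementation of update_parameters consistent with the docstring above might look like the following sketch (the version actually provided in utils may differ in details):

```python
def update_parameters(parameters, gradients, learning_rate):
    # Plain gradient descent: parameter <- parameter - learning_rate * gradient
    parameters['Wax'] += -learning_rate * gradients['dWax']
    parameters['Waa'] += -learning_rate * gradients['dWaa']
    parameters['Wya'] += -learning_rate * gradients['dWya']
    parameters['b']  += -learning_rate * gradients['db']
    parameters['by'] += -learning_rate * gradients['dby']
    return parameters
```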
In [9]:
# GRADED FUNCTION: optimize

def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    """
    Execute one step of the optimization to train the model.

    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b -- Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.

    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """

    ### START CODE HERE ###

    # Forward propagate through time (≈1 line)
    loss, cache = rnn_forward(X, Y, a_prev, parameters)

    # Backpropagate through time (≈1 line)
    gradients, a = rnn_backward(X, Y, parameters, cache)

    # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
    gradients = clip(gradients, 5)

    # Update parameters (≈1 line)
    parameters = update_parameters(parameters, gradients, learning_rate)

    ### END CODE HERE ###

    return loss, gradients, a[len(X)-1]
In [10]:
np.random.seed(1)
vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4,14,11,22,25, 26]
loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])
Loss = 126.50397572165363
gradients["dWaa"][1][2] = 0.19470931534719205
np.argmax(gradients["dWax"]) = 93
gradients["dWya"][1][2] = -0.007773876032003275
gradients["db"][4] = [-0.06809825]
gradients["dby"][1] = [0.01538192]
a_last[4] = [-1.]
Expected output:
Loss = 126.50397572165363
gradients[“dWaa”][1][2] = 0.19470931534719205
np.argmax(gradients[“dWax”]) = 93
gradients[“dWya”][1][2] = -0.007773876032003275
gradients[“db”][4] = [-0.06809825]
gradients[“dby”][1] = [0.01538192]
a_last[4] = [-1.]
3.2 Training the Model
Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 2000 steps of stochastic gradient descent, you will sample several names to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order.
Exercise: Follow the instructions and implement model(). When examples[index] contains one dinosaur name (string), to create an example (X, Y), you can use this:
index = j % len(examples)
X = [None] + [char_to_ix[ch] for ch in examples[index]]
Y = X[1:] + [char_to_ix["\n"]]
Note that we use index = j % len(examples), where j = 1....num_iterations, to make sure that examples[index] is always a valid index (i.e., smaller than len(examples)).
The first entry of X being None will be interpreted by rnn_forward() as setting $x^{\langle 0 \rangle} = \vec{0}$. Further, this ensures that Y is equal to X but shifted one step to the left, with an additional "\n" appended to signify the end of the dinosaur name. A concrete (hypothetical) example of this construction follows.
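For instance, with the char_to_ix dictionary built earlier ('\n' -> 0, 'a' -> 1, ..., 'z' -> 26) and a made-up name "rex" (used here only for illustration):

```python
name = "rex"                                   # hypothetical example name
X = [None] + [char_to_ix[ch] for ch in name]   # [None, 18, 5, 24]
Y = X[1:] + [char_to_ix["\n"]]                 # [18, 5, 24, 0]
```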
In [11]:
# GRADED FUNCTION: model

def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
    """
    Trains the model and generates dinosaur names.

    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- number of dinosaur names you want to sample at each iteration.
    vocab_size -- number of unique characters found in the text, size of the vocabulary

    Returns:
    parameters -- learned parameters
    """

    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size

    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)

    # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
    loss = get_initial_loss(vocab_size, dino_names)

    # Build list of all dinosaur names (training examples).
    with open("dinos.txt") as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]

    # Shuffle list of all dinosaur names
    shuffle(examples)

    # Initialize the hidden state of your LSTM
    a_prev = np.zeros((n_a, 1))

    # Optimization loop
    for j in range(num_iterations):

        ### START CODE HERE ###

        # Use the hint above to define one training example (X,Y) (≈ 2 lines)
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]]
        Y = X[1:] + [char_to_ix["\n"]]

        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate=0.01)

        ### END CODE HERE ###

        # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
        loss = smooth(loss, curr_loss)

        # Every 2000 Iteration, generate "n" characters thanks to sample() to check if the model is learning properly
        if j % 2000 == 0:

            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')

            # The number of dinosaur names to print
            seed = 0
            for name in range(dino_names):

                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix, seed)
                print_sample(sampled_indices, ix_to_char)

                seed += 1  # To get the same result for grading purposes, increment the seed by one.

            print('\n')

    return parameters
Run the following cell. You should observe that the model outputs random-looking characters at the first iteration. After a few thousand iterations, the model should learn to generate reasonable-looking names.
In [12]:
parameters = model(data, ix_to_char, char_to_ix)
Iteration: 0, Loss: 23.070859

Nkzxwtdmfqoeyhsqwasjkjvu
Kneb
Kzxwtdmfqoeyhsqwasjkjvu
Neb
Zxwtdmfqoeyhsqwasjkjvu
Eb
Xwtdmfqoeyhsqwasjkjvu

Iteration: 2000, Loss: 27.986441

Lhusluinasaus
Hiba
Hvrosaurus
Lacalosalapsauruskolaybhis
Xusganclolveros
A
Tos

Iteration: 4000, Loss: 25.995355

Onytosaurus
Klecahus
Lytosaurus
Oia
Wusmcheopeuroshaschitochushelamalue
Ca
Toraperohurus

Iteration: 6000, Loss: 24.776055

Phyusodonlonunosiargilus
Llecakptia
Lyussaurus
Pecahosaperthuranus
Xustaokoraurus
Da
Trrasaurus

Iteration: 8000, Loss: 24.122363

Nhyusiandopeunoshapkoptoa
Klecaisaurus
Lwusoceosaurus
Ndaaerka
Xusraohoraviraucorantrantixalapelus
Daaerokachusheiivia
Trraohopeurosarres

Iteration: 10000, Loss: 23.825352

Niwussaurus
Kieeahosaurus
Lustreolopeus
Necberte
Xussaurosaurus
Daberteg
Troenesaurus

Iteration: 12000, Loss: 23.432078

Niwusialfsegyhustatloptochustnhaleitanbaphaer
Klecaertegaosaurus
Kutrochesteurortathonnochustomalelugamang
Ngcagosaurus
Xustameptius
Dabbosaurus
Tosaurus

Iteration: 14000, Loss: 23.387299

Nhysscanborex
Inee
Iusosaurus
Necagosaurus
Xprodonophus
Ca
Trodonophus

Iteration: 16000, Loss: 23.158160

Nhyusia
Licaaisil
Lustolmashauhorratosaurus
Ola
Xstreolosaurus
Daalosaurus
Trocheosaurus

Iteration: 18000, Loss: 23.023754

Ontosaurus
Licechosaurus
Lustononio
Oncalosaurus
Xstononiobus
Daakosaurus
Toraposaurus

Iteration: 20000, Loss: 22.963849

Phyusbceismeulosparnimus
Lideberon
Lustrhong
Padagosaurus
Xusphelosaurus
Edalosaurus
Trodonis

Iteration: 22000, Loss: 22.914431

Onyxinaphosaurus
Kcacaitia
Kusssaurus
Ona
Yusianguravarisaurus
Ca
Trocemptotaururus

Iteration: 24000, Loss: 22.790483

Ngyxnmangnictitrs
Klacalosaurus
Kutqpangosaurus
Nabadps
Xusmandosaurus
Daadosaurus
Torandos

Iteration: 26000, Loss: 22.786011

Ngytosaurus
Jiccalosaurus
Kuspramanopuosaurus
Nec
Xprocheptes
Ca
Torapiosaurus

Iteration: 28000, Loss: 22.737789

Nixrsialgosaurus
Llecalosaurus
Lussperatops
Neeahosaurus
Xushanfosaurus
Daaisul
Trodon

Iteration: 30000, Loss: 22.673867

Mawtosaurus
Inga
Jusspanchodus
Macaesmekanosaurus
Xosiangosaurus
Daalosaurus
Torbikosaurus

Iteration: 32000, Loss: 22.391556

Phustonghoratermteranosatrus
Lelbakus
Musurepiordus
Pehaeropeltylurenus
Xusterissaurus
Elaeosaurus
Torclisaurus

Iteration: 34000, Loss: 22.615709

Pettosaurus
Lidacerosaurus
Lurosaurus
Paiaeosaurus
Xuspanasaurus
Dabasoma
Trodonsbhunosianeosaurus
Conclusion
You can see that your algorithm has started to generate plausible dinosaur names towards the end of training. At first it was generating random characters, but towards the end you can see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with the hyperparameters to see if you can get even better results. Our implementation generated some really cool names like "maconucon", "marloralus" and "macingsersaurus". Your model hopefully also learned that dinosaur names tend to end in saurus, don, aura, tor, etc.
If your model generates some non-cool names, don't blame the model entirely; not all actual dinosaur names sound cool. (For example, dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest!
This assignment used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the English language requires a much bigger dataset and usually much more computation, often running for many hours on GPUs. We have used the dinosaur names for quite some time, and so far our favorite name is the great, undefeatable, and fierce: Mangosaurus!
![](https://img-blog.csdnimg.cn/97a2b037fc824e668d3d982416172036.png)
4 Writing like Shakespeare
The rest of this notebook is optional and is not graded, but we hope you'll do it anyway since it's quite fun and informative.
A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespearian poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text, for example where a character appearing somewhere early in a sequence can influence a character much later in the sequence. These long-term dependencies were less important for dinosaur names, since the names are quite short.
![](https://img-blog.csdnimg.cn/584a378ff8f64706a929a78af23d1bd3.png)
Let's become poets!
We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and the model. This may take a few minutes.
In [13]:
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from shakespeare_utils import *
import sys
import io
Using TensorFlow backend.
Loading text data...
Creating training set...
number of training examples: 31412
Vectorizing training set...
Loading model...
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
/opt/conda/lib/python3.6/site-packages/keras/engine/saving.py:327: UserWarning: Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.
  warnings.warn('Error in loading the saved optimizer '
To save you some time, we have already trained a model for about 1000 epochs on a collection of Shakespearian poems called "The Sonnets".
Let's train the model for one more epoch; this will also take a few minutes. You can then run generate_output, which will prompt you for an input sentence of fewer than 40 characters. The poem will start with your sentence, and our RNN-Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense " (don't enter the quotation marks). Depending on whether you include the space at the end, your results might differ; try it both ways, and try other inputs as well.
In [14]:
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)
model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])
Epoch 1/1
31412/31412 [==============================] - 138s 4ms/step - loss: 2.7218
Out[14]:
<keras.callbacks.History at 0x7fea566cf278>
In [17]:
# Run this cell to try with different inputs without having to re-train the model
generate_output()
Here is your poem: to be or not to be.
,
thu all the dase widh more manthle to doing,
dethought mine sunde thes it youl has lone love,
thas nother miunter habll i proy my tond,
astore self-efany nath's wordd,
by holl give for true every brifl to thee,
the hatth love thoughtrild
shy bist the eyes in my sorled not see of,
shy with rove mayst as my me whom must she trise.
his night bit mas my praire fired reon me.
do you khom whee a s
The RNN-Shakespeare model is very similar to the one you built for dinosaur names. The only major differences are:
- LSTMs instead of the basic RNN to capture longer-range dependencies
- The model is a deeper, stacked LSTM model (2 layers)
- Using Keras instead of raw Python/numpy to simplify the code
If you want to learn more, you can also check out the Keras Team's text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py. A sketch of what such a stacked architecture can look like in Keras is shown below.
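As an illustration only (the actual architecture loaded from shakespeare_utils may differ), a 2-layer stacked character-level LSTM can be defined in Keras roughly like this; Tx and vocab_len are placeholder values:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

Tx, vocab_len = 40, 38          # placeholder sequence length and character-set size

model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape=(Tx, vocab_len)))  # first LSTM layer returns the full sequence
model.add(LSTM(128))                                                      # second, stacked LSTM layer
model.add(Dense(vocab_len, activation='softmax'))                         # softmax over the characters
model.compile(loss='categorical_crossentropy', optimizer='adam')
```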
Congratulations on finishing this notebook!
References:
- This exercise took inspiration from Andrej Karpathy’s implementation: https://gist.github.com/karpathy/d4dee566867f8291f086. To learn more about text generation, also check out Karpathy’s blog post.
- For the Shakespearian poem generator, our implementation was based on the implementation of an LSTM text generator by the Keras team: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py
