Machine Learning: RBF Neural Networks

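Concept

An RBF neural network (radial basis function neural network) is a special kind of neural network model. It usually has two fully connected layers and uses the Gaussian radial basis function as the activation function of the hidden layer, as shown in the figure:

Figure: RBF neural network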

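Gaussian Radial Basis Function

The Gaussian radial basis function is derived from the Gaussian distribution.

The Gaussian distribution is one of the most common distributions in statistics. Its density can be written as \[ \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \] where \(\mu\) is the mean of the distribution and \(\sigma\) is its standard deviation.

A continuous function on a bounded interval can be approximated to any desired accuracy by a linear combination of Gaussian functions, provided enough of them are used. As the figure shows, a combination of Gaussian functions with different means and standard deviations can fit a nonlinear curve.

Figure: fitting a nonlinear function with Gaussian functions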
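To make the idea concrete, here is a minimal sketch that fits one period of a sine wave with a linear combination of Gaussian bumps. The number of Gaussians (10), their evenly spaced means, the shared width `sigma = 0.1`, and the use of ordinary least squares for the coefficients are all assumptions made for illustration; they are not taken from the article.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Unnormalized Gaussian bump centered at mu with width sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Target: one period of a sine wave sampled on [0, 1].
x = np.linspace(0.0, 1.0, 200)
target = np.sin(2 * np.pi * x)

# Assumed setup: 10 evenly spaced means and one shared width.
mus = np.linspace(0.0, 1.0, 10)
sigma = 0.1

# Design matrix with one column per Gaussian, shape (200, 10).
Phi = gaussian(x[:, None], mus[None, :], sigma)

# Least-squares coefficients of the linear combination.
coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)
approx = Phi @ coef

max_err = np.max(np.abs(approx - target))
print(f"max abs error: {max_err:.4f}")
```

In the RBF network described below, the means come from clustering and the coefficients are learned by gradient descent instead of being solved for directly.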

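Obtaining the Means

The means of the Gaussian functions can be obtained with the k-means clustering algorithm.

K-means clustering is an algorithm that partitions sample data into clusters. The clustering splits the samples into several subsets, which determines where each Gaussian is placed. The algorithm can be expressed as \[ \underset{\mathbf{S}}{\mathrm{argmin}}\sum_{i=1}^k\sum_{\boldsymbol{x}\in S_i}\lVert \boldsymbol{x}-\boldsymbol{\mu}_i\rVert^2 \] where \(\mathbf{X} = (\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}, \dots, \boldsymbol{x}^{(m)})\) is the set of observed samples, \(\mathbf{S} = \{S_1, S_2, \dots, S_k\}\) is a partition of the samples into \(k\) clusters, and \(\boldsymbol{\mu}_i\) is the mean of all points in cluster \(S_i\).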

The main steps of k-means clustering are:

  1. Randomly generate \(k\) cluster centroids \(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \dots, \boldsymbol{\mu}_k \in \Bbb{R}^n\);
  2. Assign every sample \(i\) to a cluster according to \[c^{(i)} := \underset{j}{\mathrm{argmin}}\lVert \boldsymbol{x}^{(i)} - \boldsymbol{\mu}_j\rVert^2\]
  3. Compute the new center of every cluster \(j\) according to \[\boldsymbol{\mu}_j := \frac{\sum_{i=1}^m1\{c^{(i)} = j\}\boldsymbol{x}^{(i)}}{\sum_{i=1}^m1\{c^{(i)} = j\}}\]
  4. Repeat steps 2 and 3 until convergence.
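The four steps can be sketched for samples in \(\Bbb{R}^n\) as follows. The helper name `kmeans_step`, the toy data, and the fixed (rather than random) initial centroids are assumptions chosen so the result is reproducible.

```python
import numpy as np

def kmeans_step(X, centroids):
    """One assignment step and one update step of k-means.

    X: (m, n) samples; centroids: (k, n) current centers.
    Returns the cluster index of each sample and the updated centers.
    """
    # Step 2: squared distance from every sample to every centroid, shape (m, k).
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dist.argmin(axis=1)

    # Step 3: each centroid becomes the mean of its assigned samples.
    new_centroids = centroids.copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members) > 0:  # leave an empty cluster where it is
            new_centroids[j] = members.mean(axis=0)
    return labels, new_centroids

# Two well-separated groups of points in the plane (made-up data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]])
# Step 1, with fixed starting centroids instead of random ones.
centroids = np.array([[0.0, 0.0], [10.0, 9.0]])

# Step 4: repeat until the centroids stop moving.
while True:
    labels, updated = kmeans_step(X, centroids)
    if np.allclose(updated, centroids):
        break
    centroids = updated

print(labels)     # cluster index of each sample
print(centroids)  # final cluster centers
```

On this data the first three points end up in cluster 0 and the last three in cluster 1, and each final centroid is the mean of its three points.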

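Determining the Standard Deviations

The standard deviations of the Gaussian functions can be determined in two ways:

  • use the standard deviation of each cluster individually;
  • use a single shared standard deviation \(\sigma = \frac{d_{\max}}{\sqrt{2k}}\), where \(d_{\max}\) is the largest distance between any two cluster centers and \(k\) is the number of cluster centers.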
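As a small worked example of the second option, the sketch below computes the shared standard deviation from a set of cluster centers. The function name `shared_sigma` and the sample centers are hypothetical.

```python
import numpy as np

def shared_sigma(centers):
    """Shared width sigma = d_max / sqrt(2k) for k cluster centers."""
    centers = np.asarray(centers, dtype=float)
    if centers.ndim == 1:
        centers = centers[:, None]  # treat 1D centers as k points in R^1
    k = len(centers)
    # Pairwise Euclidean distances between all centers, shape (k, k).
    diffs = centers[:, None, :] - centers[None, :, :]
    d_max = np.sqrt((diffs ** 2).sum(axis=2)).max()
    return d_max / np.sqrt(2 * k)

# Three 1D centers: d_max = 8 and k = 3, so sigma = 8 / sqrt(6).
sigma = shared_sigma([0.0, 3.0, 8.0])
print(sigma)
```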

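Backpropagation in the RBF Network

For each input \(\boldsymbol{x}\), the output of the RBF network is \[ F(\boldsymbol{x}) = \sum_{j=1}^k\omega_j\phi_j(\boldsymbol{x}, \boldsymbol{c}_j) + b \] where \(\omega_j\) are the weights, \(b\) is the bias, \(k\) is the number of clusters, \(\boldsymbol{c}_j\) is the center of cluster \(j\), and \(\phi_j(\cdot)\) is the Gaussian RBF: \[ \phi_j(\boldsymbol{x}, \boldsymbol{c}_j) = \exp\left(-\frac{\lVert\boldsymbol{x} - \boldsymbol{c}_j\rVert^2}{2\sigma^2_j}\right) \]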
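The output formula can be evaluated directly for a single input. The sketch below assumes vector inputs; the function name `rbf_forward` and all numeric values are made up for illustration.

```python
import numpy as np

def rbf_forward(x, centers, sigmas, w, b):
    """F(x) = sum_j w_j * exp(-||x - c_j||^2 / (2 sigma_j^2)) + b."""
    sq_dist = ((centers - x) ** 2).sum(axis=1)   # ||x - c_j||^2, shape (k,)
    phi = np.exp(-sq_dist / (2 * sigmas ** 2))   # hidden activations, shape (k,)
    return phi @ w + b, phi

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # k = 2 centers in R^2
sigmas = np.array([1.0, 1.0])
w = np.array([2.0, -1.0])
b = 0.5

# The input coincides with the first center, so phi_1 = 1 and phi_2 = exp(-1).
F, phi = rbf_forward(np.array([0.0, 0.0]), centers, sigmas, w, b)
print(phi, F)
```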

Cost function: \[ C = \sum_{i=1}^N(y^{(i)}-F(\boldsymbol{x}^{(i)}))^2 \]

Update rule for \(\omega_j\): \[ \begin{aligned} \frac{\partial C}{\partial\omega_j} &= \sum_{i=1}^N\frac{\partial C}{\partial F(\boldsymbol{x}^{(i)})}\frac{\partial F(\boldsymbol{x}^{(i)})}{\partial\omega_j} \\ &= \sum_{i=1}^N -2(y^{(i)}-F(\boldsymbol{x}^{(i)})) \cdot \frac{\partial}{\partial\omega_j}\left[\sum_{l=1}^k\omega_l\phi_l(\boldsymbol{x}^{(i)}, \boldsymbol{c}_l) + b\right] \\ &= -2\sum_{i=1}^N(y^{(i)}-F(\boldsymbol{x}^{(i)}))\,\phi_j(\boldsymbol{x}^{(i)}, \boldsymbol{c}_j) \end{aligned} \] With online updates, which use one sample \(i\) at a time, and with the constant factor 2 absorbed into the learning rate \(\eta\), the rule becomes \[ \omega_j\leftarrow\omega_j + \eta(y^{(i)}-F(\boldsymbol{x}^{(i)}))\phi_j(\boldsymbol{x}^{(i)}, \boldsymbol{c}_j) \]

Similarly, the update rule for \(b\): \[ \begin{aligned} \frac{\partial C}{\partial b} &= \sum_{i=1}^N\frac{\partial C}{\partial F(\boldsymbol{x}^{(i)})}\frac{\partial F(\boldsymbol{x}^{(i)})}{\partial b} \\ &= \sum_{i=1}^N -2(y^{(i)}-F(\boldsymbol{x}^{(i)})) \cdot \frac{\partial}{\partial b}\left[\sum_{l=1}^k\omega_l\phi_l(\boldsymbol{x}^{(i)}, \boldsymbol{c}_l) + b\right] \\ &= -2\sum_{i=1}^N(y^{(i)}-F(\boldsymbol{x}^{(i)})) \cdot 1 \end{aligned} \] For a single sample \(i\), again absorbing the factor 2 into the learning rate \(\eta\): \[ b\leftarrow b + \eta(y^{(i)}-F(\boldsymbol{x}^{(i)})) \]
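One way to sanity-check such derivatives is to compare them with finite differences. The sketch below does this for the full-batch gradients of the sum-of-squares cost, \(\partial C/\partial\omega_j = -2\sum_i (y^{(i)}-F(\boldsymbol{x}^{(i)}))\phi_j\) and \(\partial C/\partial b = -2\sum_i (y^{(i)}-F(\boldsymbol{x}^{(i)}))\). The 1D data, centers, widths, and initial parameters are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 1D problem: N samples, k hidden units with fixed centers and widths.
N, k = 20, 3
x = rng.uniform(0.0, 1.0, N)
y = np.sin(2 * np.pi * x)
centers = np.array([0.2, 0.5, 0.8])
sigmas = np.full(k, 0.2)
w = rng.normal(size=k)
b = 0.1

# Hidden activations phi[i, j] = phi_j(x_i, c_j), shape (N, k).
phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigmas ** 2))

def cost(w, b):
    """C = sum_i (y_i - F(x_i))^2."""
    return np.sum((y - (phi @ w + b)) ** 2)

# Analytic gradients of C with respect to w and b.
residual = y - (phi @ w + b)
grad_w = -2 * phi.T @ residual
grad_b = -2 * residual.sum()

# Central finite differences for comparison.
eps = 1e-6
num_w = np.array([(cost(w + eps * e, b) - cost(w - eps * e, b)) / (2 * eps)
                  for e in np.eye(k)])
num_b = (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)

print(np.max(np.abs(grad_w - num_w)), abs(grad_b - num_b))
```

Both differences should be close to zero. The online rule simply applies the same gradient one sample at a time.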

Python Implementation of the RBF Network

RBF kernel function:

import numpy as np

def rbf(x, c, s):
    # Gaussian RBF for a scalar input x, center c and standard deviation s
    return np.exp(-1 / (2 * s**2) * (x-c)**2)

K-means clustering and computation of the standard deviations:

def kmeans(X, k):
    """Performs k-means clustering for 1D input

    Arguments:
        X {ndarray} -- A length-M array of inputs
        k {int} -- Number of clusters

    Returns:
        tuple -- (clusters, stds): a length-k array of final cluster centers
                 and a length-k array of per-cluster standard deviations
    """

    # randomly select k distinct input points as the initial cluster centers
    clusters = np.random.choice(np.squeeze(X), size=k, replace=False)
    prevClusters = clusters.copy()
    stds = np.zeros(k)
    converged = False

    while not converged:
        # compute distances for each cluster center to each point
        # (distances[i, j] represents the distance between the ith point and jth cluster)
        distances = np.squeeze(np.abs(X[:, np.newaxis] - clusters[np.newaxis, :]))

        # find the cluster that's closest to each point
        closestCluster = np.argmin(distances, axis=1)

        # update clusters by taking the mean of all of the points assigned to that cluster
        for i in range(k):
            pointsForCluster = X[closestCluster == i]
            if len(pointsForCluster) > 0:
                clusters[i] = np.mean(pointsForCluster, axis=0)

        # converge if clusters haven't moved
        converged = np.linalg.norm(clusters - prevClusters) < 1e-6
        prevClusters = clusters.copy()

    # recompute the final assignment with the converged centers
    distances = np.squeeze(np.abs(X[:, np.newaxis] - clusters[np.newaxis, :]))
    closestCluster = np.argmin(distances, axis=1)

    clustersWithNoPoints = []
    for i in range(k):
        pointsForCluster = X[closestCluster == i]
        if len(pointsForCluster) < 2:
            # keep track of clusters with no points or 1 point
            clustersWithNoPoints.append(i)
        else:
            stds[i] = np.std(pointsForCluster)

    # if there are clusters with 0 or 1 points, take the mean std of the other clusters
    # (the original code pooled the points of the other clusters and took a single std,
    # which did not match this comment)
    if len(clustersWithNoPoints) > 0:
        stds[clustersWithNoPoints] = np.mean(np.delete(stds, clustersWithNoPoints))

    return clusters, stds

Define the RBF network class (single output):

For multiple outputs, the weights \(\omega\) should become a matrix and the bias \(b\) a vector.

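class RBFNet(object):
    """Implementation of a Radial Basis Function Network"""
    def __init__(self, k=2, lr=0.01, epochs=100, rbf=rbf, inferStds=True):
        self.k = k                  # number of hidden units (clusters)
        self.lr = lr                # learning rate
        self.epochs = epochs        # number of passes over the training data
        self.rbf = rbf              # kernel function
        self.inferStds = inferStds  # True: per-cluster stds, False: one shared std

        self.w = np.random.randn(k)
        self.b = np.random.randn(1)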

Training method:

    def fit(self, X, y):
        if self.inferStds:
            # compute stds from data
            self.centers, self.stds = kmeans(X, self.k)
        else:
            # use a fixed std
            self.centers, _ = kmeans(X, self.k)
            dMax = max([np.abs(c1 - c2) for c1 in self.centers for c2 in self.centers])
            self.stds = np.repeat(dMax / np.sqrt(2*self.k), self.k)

        # training
        for epoch in range(self.epochs):
            for i in range(X.shape[0]):
                # forward pass
                a = np.array([self.rbf(X[i], c, s) for c, s in zip(self.centers, self.stds)])
                F = a.T.dot(self.w) + self.b

                loss = (y[i] - F).flatten() ** 2
                print('Loss: {0:.2f}'.format(loss[0]))

                # backward pass
                error = -(y[i] - F).flatten()

                # online update
                self.w = self.w - self.lr * a * error
                self.b = self.b - self.lr * error

Prediction method:

    def predict(self, X):
        y_pred = []
        for i in range(X.shape[0]):
            # hidden activations for sample i, then the weighted sum plus bias
            a = np.array([self.rbf(X[i], c, s) for c, s in zip(self.centers, self.stds)])
            F = a.T.dot(self.w) + self.b
            y_pred.append(F)
        return np.array(y_pred)

Full example:

import matplotlib.pyplot as plt

# sample inputs and add noise
NUM_SAMPLES = 100
X = np.random.uniform(0., 1., NUM_SAMPLES)
X = np.sort(X, axis=0)
noise = np.random.uniform(-0.1, 0.1, NUM_SAMPLES)
y = np.sin(2 * np.pi * X) + noise

# train an RBF network with 2 hidden units
rbfnet = RBFNet(lr=1e-2, k=2)
rbfnet.fit(X, y)

y_pred = rbfnet.predict(X)

# compare the noisy targets with the network's predictions
plt.plot(X, y, '-o', label='true')
plt.plot(X, y_pred, '-o', label='RBF-Net')
plt.legend()

plt.tight_layout()
plt.show()

Training result:

Figure: training result