Reference:

https://pythonmachinelearning.pro/using-neural-networks-for-regression-radial-basis-function-networks/

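Concept

An RBF neural network (Radial Basis Function Neural Network) is a special kind of neural network model. It typically has two fully connected layers and uses the Gaussian radial basis function as the activation function of its hidden layer, as shown in the figure: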

Gaussian Radial Basis Function

The Gaussian radial basis function is derived from the Gaussian distribution.

The Gaussian distribution is the most common distribution in statistics. Its density is given by: \[ \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \] where \(\mu\) is the mean of the distribution and \(\sigma\) is its standard deviation.
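As a quick sanity check on the formula, the density can be written directly in NumPy. This is only a minimal sketch; the function name `gaussian_pdf` and the integration grid are illustrative choices, not part of the original post.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Numerically check that the density integrates to roughly 1 (Riemann sum).
xs = np.linspace(-10.0, 10.0, 20001)
dx = xs[1] - xs[0]
area = float(np.sum(gaussian_pdf(xs, mu=1.0, sigma=2.0)) * dx)

# The standard normal peaks at 1 / sqrt(2 * pi).
peak = float(gaussian_pdf(0.0))
print(area, peak)
```

The hidden units of the RBF network described later drop the normalizing constant and keep only the exponential term.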

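Any continuous function on a bounded interval can be approximated by a linear combination of Gaussian functions. As the figure shows, combining Gaussians with different means and standard deviations can fit a nonlinear function.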
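To illustrate the idea, the sketch below fits a sine curve with a weighted sum of Gaussian bumps. The centers and the shared width are hypothetical, hand-picked values, and the weights are found by ordinary least squares rather than by the gradient training described later.

```python
import numpy as np

# Target: a nonlinear function sampled on [0, 1].
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x)

# Hypothetical choice: 8 evenly spaced centers sharing one width.
centers = np.linspace(0.0, 1.0, 8)
sigma = 0.15

# Design matrix: column j holds the j-th Gaussian evaluated at every x.
Phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

# Solve for the weights of the linear combination by least squares.
weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
y_hat = Phi @ weights
max_err = float(np.max(np.abs(y - y_hat)))
print(max_err)
```

Adding more centers, or tuning the width, generally tightens the fit.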

Obtaining the Means

The means of the Gaussian functions can be obtained with k-means clustering.

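K-means clustering is an algorithm that partitions the sample data into clusters. The clustering task divides the samples into several subsets, which determines where each Gaussian is placed. The objective can be written as: \[ \underset{\mathbf{S}}{\mathrm{argmin}}\sum_{i=1}^k\sum_{\boldsymbol{x}\in S_i}\lVert \boldsymbol{x}-\boldsymbol{\mu}_i\rVert^2 \] where \(\mathbf{X} = (\boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_m)\) is the set of observed samples, \(\mathbf{S} = \{S_1, S_2, \dots, S_k\}\) is the partition of the samples into \(k\) clusters, and \(\boldsymbol{\mu}_i\) is the mean of all points in \(S_i\).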

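The main steps of k-means clustering are:

1. Randomly initialize \(k\) cluster centroids \(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \dots, \boldsymbol{\mu}_k \in \mathbb{R}^n\);

2. Assign each sample \(i\) to its nearest centroid: \[c^{(i)} := \underset{j}{\mathrm{argmin}}\lVert \boldsymbol{x}^{(i)} - \boldsymbol{\mu}_j\rVert^2\]

3. Recompute the centroid of each cluster \(j\): \[\boldsymbol{\mu}_j := \frac{\sum_{i=1}^m1\{c^{(i)} = j\}\boldsymbol{x}^{(i)}}{\sum_{i=1}^m1\{c^{(i)} = j\}}\] where \(m\) is the number of samples and \(1\{\cdot\}\) is the indicator function;

4. Repeat steps 2 and 3 until convergence.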
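The steps above can be sketched for general \(n\)-dimensional data as follows. This is a minimal illustration, not the post's own implementation: the name `kmeans_nd`, the fixed seed, the iteration cap, and the choice to draw the initial centers from the samples are all assumptions.

```python
import numpy as np

def kmeans_nd(X, k, max_iter=100, tol=1e-6, seed=0):
    """Lloyd's algorithm for an (m, n) data array; returns centers and labels."""
    rng = np.random.default_rng(seed)
    # Step 1: initialise centers from k distinct samples (an assumption;
    # the text only asks for random centers).
    centers = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Step 3: move each center to the mean of its assigned samples.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        # Step 4: stop once the centers no longer move.
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centers, labels = kmeans_nd(X, k=2)
print(np.sort(centers[:, 0]))
```

On this toy data the two recovered centers should sit near 0 and 5 along each axis.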

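Determining the Standard Deviations

There are two ways to determine the standard deviations of the Gaussian functions:

1. Use the standard deviation of each cluster;
2. Use a single shared standard deviation \(\sigma = \frac{d_{\max}}{\sqrt{2k}}\), where \(d_{\max}\) is the maximum distance between any two cluster centers and \(k\) is the number of cluster centers.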
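The shared-width formula is easy to check by hand. The sketch below computes it for a small array of centers; the helper name `shared_sigma` is hypothetical.

```python
import numpy as np

def shared_sigma(centers):
    """sigma = d_max / sqrt(2k) for a (k, n) array of cluster centers."""
    centers = np.asarray(centers, dtype=float)
    k = centers.shape[0]
    # Pairwise Euclidean distances between all centers.
    diffs = centers[:, None, :] - centers[None, :, :]
    d_max = np.linalg.norm(diffs, axis=2).max()
    return d_max / np.sqrt(2 * k)

centers = np.array([[0.0, 0.0], [3.0, 4.0]])
sigma = shared_sigma(centers)
print(sigma)  # d_max = 5 and k = 2, so sigma = 5 / sqrt(4) = 2.5
```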

Backpropagation in the RBF Network

For each input \(\boldsymbol{x}\), the output of the RBF network is \[ F(\boldsymbol{x}) = \sum_{j=1}^k\omega_j\phi_j(\boldsymbol{x}, \boldsymbol{c}_j) + b \] where \(\omega_j\) are the weights, \(b\) is the bias, \(k\) is the number of clusters, and \(\phi_j(\cdot)\) is the Gaussian RBF: \[ \phi_j(\boldsymbol{x}, \boldsymbol{c}_j) = \exp\left(-\frac{\lVert\boldsymbol{x} - \boldsymbol{c}_j\rVert^2}{2\sigma^2_j}\right) \]
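The forward pass can be written in a few lines of NumPy. This is a sketch under stated assumptions: the function name `rbf_forward` and the toy centers, widths, and weights are made up for illustration.

```python
import numpy as np

def rbf_forward(x, centers, sigmas, w, b):
    """F(x) = sum_j w_j * exp(-||x - c_j||^2 / (2 sigma_j^2)) + b."""
    sq_dist = np.sum((centers - x) ** 2, axis=1)   # ||x - c_j||^2 for each j
    phi = np.exp(-sq_dist / (2 * sigmas ** 2))     # hidden-layer activations
    return float(phi @ w + b), phi

# Toy network: two 1-D centers with unit widths (illustrative values).
centers = np.array([[0.0], [1.0]])
sigmas = np.array([1.0, 1.0])
w = np.array([2.0, -1.0])
b = 0.5
F, phi = rbf_forward(np.array([0.0]), centers, sigmas, w, b)
print(F, phi)
```

For the input \(x = 0\) the activations are \(1\) and \(e^{-1/2}\), so the output is \(2 - e^{-1/2} + 0.5\).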

Cost function: \[ C = \sum_{i=1}^N(y^{(i)}-F(\boldsymbol{x}^{(i)}))^2 \] where \(N\) is the number of training samples.

Update rule for \(\omega_j\), using the contribution of a single sample \(i\) to the cost, \(C^{(i)} = (y^{(i)}-F(\boldsymbol{x}^{(i)}))^2\) (online update): \[ \begin{aligned} &\begin{aligned} \frac{\partial C^{(i)}}{\partial\omega_j} &= \frac{\partial C^{(i)}}{\partial F}\frac{\partial F}{\partial\omega_j} \\ &= \frac{\partial}{\partial F}\left[(y^{(i)}-F(\boldsymbol{x}^{(i)}))^2\right] \cdot \frac{\partial}{\partial\omega_j}\left[\sum_{l=1}^k\omega_l\phi_l(\boldsymbol{x}^{(i)}, \boldsymbol{c}_l) + b\right]\\ &= -2(y^{(i)}-F(\boldsymbol{x}^{(i)})) \cdot \phi_j(\boldsymbol{x}^{(i)}, \boldsymbol{c}_j) \end{aligned}\\ &\omega_j\leftarrow\omega_j + \eta(y^{(i)}-F(\boldsymbol{x}^{(i)}))\phi_j(\boldsymbol{x}^{(i)}, \boldsymbol{c}_j) \end{aligned} \] where \(\eta\) is the learning rate and the constant factor 2 is absorbed into it.

Similarly, the update rule for \(b\), again for the per-sample cost \(C^{(i)} = (y^{(i)}-F(\boldsymbol{x}^{(i)}))^2\): \[ \begin{aligned} &\begin{aligned} \frac{\partial C^{(i)}}{\partial b} &= \frac{\partial C^{(i)}}{\partial F}\frac{\partial F}{\partial b} \\ &= \frac{\partial}{\partial F}\left[(y^{(i)}-F(\boldsymbol{x}^{(i)}))^2\right] \cdot \frac{\partial}{\partial b}\left[\sum_{l=1}^k\omega_l\phi_l(\boldsymbol{x}^{(i)}, \boldsymbol{c}_l) + b\right]\\ &= -2(y^{(i)}-F(\boldsymbol{x}^{(i)})) \cdot 1 \end{aligned}\\ &b\leftarrow b + \eta(y^{(i)}-F(\boldsymbol{x}^{(i)})) \end{aligned} \] with the constant factor 2 again absorbed into the learning rate \(\eta\).
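One way to gain confidence in these derivatives is to compare them against finite differences. The sketch below does this for a single sample with made-up activations, and then applies one online update. The analytic gradient of the per-sample squared error carries a factor 2, which the update folds into the learning rate. All numeric values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
phi = rng.uniform(0.1, 1.0, k)   # activations phi_j for one fixed sample
w = rng.normal(size=k)
b = 0.3
y = 1.5                           # target for that sample

def sample_cost(w, b):
    """Squared error of the single sample."""
    F = phi @ w + b
    return (y - F) ** 2

# Analytic gradients of the per-sample squared error.
F = phi @ w + b
grad_w = -2 * (y - F) * phi
grad_b = -2 * (y - F)

# Central finite differences for comparison.
eps = 1e-6
I = np.eye(k)
num_w = np.array([
    (sample_cost(w + eps * I[j], b) - sample_cost(w - eps * I[j], b)) / (2 * eps)
    for j in range(k)
])
num_b = (sample_cost(w, b + eps) - sample_cost(w, b - eps)) / (2 * eps)

# One online update; the factor 2 is folded into the learning rate eta.
eta = 0.1
w_new = w + eta * (y - F) * phi
b_new = b + eta * (y - F)
print(np.allclose(grad_w, num_w, atol=1e-5),
      sample_cost(w_new, b_new) < sample_cost(w, b))
```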

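Python Implementation of the RBF Network

RBF kernel function:

import numpy as np

def rbf(x, c, s):
    # Gaussian RBF with center c and standard deviation s
    return np.exp(-1 / (2 * s**2) * (x - c)**2)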

K-means clustering and computation of the standard deviations:

def kmeans(X, k):
    """Performs k-means clustering for 1D input

    Arguments:
        X {ndarray} -- A Mx1 array of inputs
        k {int} -- Number of clusters

    Returns:
        tuple -- (clusters, stds): a length-k array of final cluster centers
                 and a length-k array of per-cluster standard deviations
    """

    # randomly select initial clusters from input data
    clusters = np.random.choice(np.squeeze(X), size=k)
    prevClusters = clusters.copy()
    stds = np.zeros(k)
    converged = False

    while not converged:
        # compute distances from each cluster center to each point
        # (distances[i, j] is the distance between the ith point and the jth cluster)
        distances = np.squeeze(np.abs(X[:, np.newaxis] - clusters[np.newaxis, :]))

        # find the cluster that's closest to each point
        closestCluster = np.argmin(distances, axis=1)

        # update clusters by taking the mean of all of the points assigned to that cluster
        for i in range(k):
            pointsForCluster = X[closestCluster == i]
            if len(pointsForCluster) > 0:
                clusters[i] = np.mean(pointsForCluster, axis=0)

        # converge if clusters haven't moved
        converged = np.linalg.norm(clusters - prevClusters) < 1e-6
        prevClusters = clusters.copy()

    distances = np.squeeze(np.abs(X[:, np.newaxis] - clusters[np.newaxis, :]))
    closestCluster = np.argmin(distances, axis=1)

    clustersWithNoPoints = []
    for i in range(k):
        pointsForCluster = X[closestCluster == i]
        if len(pointsForCluster) < 2:
            # keep track of clusters with no points or 1 point
            clustersWithNoPoints.append(i)
            continue
        else:
            stds[i] = np.std(X[closestCluster == i])

    # if there are clusters with 0 or 1 points, take the mean std of the other clusters
    if len(clustersWithNoPoints) > 0:
        pointsToAverage = []
        for i in range(k):
            if i not in clustersWithNoPoints:
                pointsToAverage.append(X[closestCluster == i])
        pointsToAverage = np.concatenate(pointsToAverage).ravel()
        stds[clustersWithNoPoints] = np.mean(np.std(pointsToAverage))

    return clusters, stds

Define the RBF network class (single output):

For multiple outputs, the weight \(\omega\) should become a matrix and the bias \(b\) a vector.

class RBFNet(object):
    """Implementation of a Radial Basis Function Network"""
    def __init__(self, k=2, lr=0.01, epochs=100, rbf=rbf, inferStds=True):
        self.k = k
        self.lr = lr
        self.epochs = epochs
        self.rbf = rbf
        self.inferStds = inferStds

        self.w = np.random.randn(k)
        self.b = np.random.randn(1)

Training function (a method of RBFNet):

def fit(self, X, y):
    if self.inferStds:
        # compute stds from data
        self.centers, self.stds = kmeans(X, self.k)
    else:
        # use a fixed std
        self.centers, _ = kmeans(X, self.k)
        dMax = max([np.abs(c1 - c2) for c1 in self.centers for c2 in self.centers])
        self.stds = np.repeat(dMax / np.sqrt(2*self.k), self.k)

    # training
    for epoch in range(self.epochs):
        for i in range(X.shape[0]):
            # forward pass
            a = np.array([self.rbf(X[i], c, s) for c, s, in zip(self.centers, self.stds)])
            F = a.T.dot(self.w) + self.b

            loss = (y[i] - F).flatten() ** 2
            print('Loss: {0:.2f}'.format(loss[0]))

            # backward pass
            error = -(y[i] - F).flatten()

            # online update
            self.w = self.w - self.lr * a * error
            self.b = self.b - self.lr * error

Prediction function (a method of RBFNet):

def predict(self, X):
    y_pred = []
    for i in range(X.shape[0]):
        a = np.array([self.rbf(X[i], c, s) for c, s, in zip(self.centers, self.stds)])
        F = a.T.dot(self.w) + self.b
        y_pred.append(F)
    return np.array(y_pred)
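The class above handles a single output. For several outputs, the weights become a \(k \times p\) matrix and the bias a length-\(p\) vector, as noted earlier. The following is a minimal standalone sketch of that variant for scalar inputs; the function names and the toy values are assumptions for illustration and are not part of the original RBFNet class.

```python
import numpy as np

def rbf_forward_multi(x, centers, stds, W, b):
    """Forward pass for scalar input x and p outputs.

    centers, stds: shape (k,);  W: shape (k, p);  b: shape (p,).
    """
    a = np.exp(-(x - centers) ** 2 / (2 * stds ** 2))  # activations, shape (k,)
    return a @ W + b, a                                 # output has shape (p,)

def online_update(x, y, centers, stds, W, b, lr=0.01):
    """One per-sample gradient step, mirroring the single-output update."""
    F, a = rbf_forward_multi(x, centers, stds, W, b)
    error = -(y - F)                      # shape (p,)
    W = W - lr * np.outer(a, error)       # each output column gets its own error
    b = b - lr * error
    return W, b

rng = np.random.default_rng(0)
k, p = 4, 2
centers = np.linspace(0.0, 1.0, k)
stds = np.full(k, 0.3)
W = rng.normal(size=(k, p))
b = rng.normal(size=p)

x, y = 0.25, np.array([1.0, -1.0])
before, _ = rbf_forward_multi(x, centers, stds, W, b)
W, b = online_update(x, y, centers, stds, W, b, lr=0.1)
after, _ = rbf_forward_multi(x, centers, stds, W, b)
print(np.sum((y - before) ** 2), np.sum((y - after) ** 2))
```

After one update on the same sample, the squared error on that sample should be smaller than before.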

Putting it all together:

import numpy as np
import matplotlib.pyplot as plt

# sample inputs and add noise
NUM_SAMPLES = 100
X = np.random.uniform(0., 1., NUM_SAMPLES)
X = np.sort(X, axis=0)
noise = np.random.uniform(-0.1, 0.1, NUM_SAMPLES)
y = np.sin(2 * np.pi * X)  + noise

rbfnet = RBFNet(lr=1e-2, k=2)
rbfnet.fit(X, y)

y_pred = rbfnet.predict(X)

plt.plot(X, y, '-o', label='true')
plt.plot(X, y_pred, '-o', label='RBF-Net')
plt.legend()

plt.tight_layout()
plt.show()

Training result:
