Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue
- DOI
- 10.1080/18756891.2016.1256570How to use a DOI?
- Keywords
- Support vector machines; Generalized eigenvalues; Locality preserving projections; Manifold regularization
- Abstract
Proximal support vector machine via generalized eigenvalue (GEPSVM) is a recently proposed binary classification technique which aims to seek two nonparallel planes such that each one is closest to one of the two datasets while furthest away from the other. In this paper, we propose a novel method called Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue (MRGEPSVM), which incorporates local geometry information within each class into GEPSVM by a regularization technique. Each plane is required to fit its dataset as closely as possible and to preserve the intrinsic geometric structure of its class via manifold regularization. MRGEPSVM is also extended to the nonlinear case by the kernel trick. The effectiveness of the method is demonstrated by tests on some examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation speed and test set correctness.
- Copyright
- © 2016. the authors. Co-published by Atlantis Press and Taylor & Francis
- Open Access
- This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).
1. Introduction
Standard SVMs1, which are powerful tools for data classification and regression, have come to play a dominant role in the machine learning and pattern recognition communities. The approach is systematic and motivated by Statistical Learning Theory2: it seeks the optimal separating hyperplane by minimizing the structural risk instead of the empirical risk. In binary classification, it assigns a point to one of two disjoint half-spaces, either in the original input space or in a higher-dimensional feature space. To find the optimal separating hyperplane, SVMs need to solve a quadratic programming problem (QPP), which is time-consuming for large sample sizes.
Recently, several simpler classification methods have been proposed3,4. In contrast to SVMs, in binary classification they all aim at finding two planes such that each plane is closest to the points of its own class and furthest from the points of the other class. This idea not only leads to faster and simpler algorithms than conventional SVMs but also avoids the unbalanced sample distribution problem of SVMs. Among these methods, the generalized eigenvalue proximal SVM (GEPSVM)5,6,7 performs binary classification by solving two generalized eigenvalue problems instead of a quadratic programming problem to generate two nonparallel planes, which makes it extremely fast in real-world applications. The eigenvectors corresponding to the smallest eigenvalues determine the two planes, as shown in Fig. 1.
The last decade has witnessed the emergence of manifold learning as a vigorous paradigm for dimensionality reduction and data visualization. The basic assumption of manifold learning is that, although each data point may consist of thousands of features, it may actually be sampled from a low-dimensional manifold embedded in a high-dimensional space. Popular manifold learning methods, including Locally Linear Embedding (LLE)8, ISOMAP9, Laplacian Eigenmaps10 and their various extensions11,12, have been applied successfully in many fields. Beyond dimensionality reduction, manifold regularization13 has been proposed as a new form of regularization that allows us to exploit the geometry of the distribution of the data points. It has also been used in SVMs, e.g., Laplacian SVMs14,15, to take advantage of unlabeled data points. In the literature, manifold learning, and especially the locality preserving projections criterion16, is widely used to reduce the dimensionality of the original data in feature extraction, or to penalize a classifier with unlabeled samples in semi-supervised learning. However, there is little evidence showing whether a supervised classifier can benefit from the manifold structure of labeled samples directly. Recently, the idea of manifold learning has been applied in GEPSVM by considering the local information within each class to eliminate the influence of outliers17.
With the geometric intuition of GEPSVM and the essence of manifold learning in mind, we argue that manifold regularization is particularly suitable for GEPSVM and its variants. In contrast to conventional SVMs, GEPSVM and other recently proposed classifiers, e.g., Twin SVMs4,18, are all based on seeking planes that best fit the data points of the same class rather than merely separating points of different classes. Twin SVMs are nonparallel planar classifiers, illustrated in Fig. 2, which construct one hyperplane for each of the two classes so that each hyperplane is as close as possible to the samples of its own class and as far as possible from the samples of the other class. In other words, these methods are not only “discriminative” but also “expressive”. It is therefore natural to attempt to combine manifold structure with these methods. Another reason is that the numerators of the objective functions of GEPSVM have a formulation similar to least squares regression, which is known to tend to overfit the data points. That is why Tikhonov regularization terms19 are integrated into GEPSVM to enhance its generalization ability and avoid overfitting; but they merely penalize the numerators while ignoring the denominators. Furthermore, in many cases, manifold regularization is more powerful because it takes the intrinsic geometric structure of the data points into account. In this paper, manifold regularization, specifically locality preserving projections (LPP)20,21, is first extended to our context and then incorporated into classical GEPSVM. LPP is a linear projective map obtained by solving a variational problem that optimally preserves the neighborhood structure of the data set22,23. It can be seen as an alternative to Principal Component Analysis (PCA), a classical linear technique that projects the data along the directions of maximal variance. When high-dimensional data lie on a low-dimensional manifold embedded in the ambient space, the locality preserving projections are obtained by finding the optimal linear approximations to the eigenfunctions of the Laplace–Beltrami operator on the manifold24,25. LPP shares many of the data representation properties of nonlinear techniques such as Laplacian Eigenmaps or Locally Linear Embedding, yet it is linear and, more crucially, is defined everywhere in the ambient space rather than only on the training data points. LPP may be conducted in the original space or in the reproducing kernel Hilbert space into which the data points are mapped, which gives rise to kernel LPP. We should emphasize that the purpose of manifold regularization in our context differs from the classical one, whose primary aim is to utilize the distribution information of unlabeled samples. Unlike semi-supervised learning26, the intention of the proposed approach is to demonstrate that the manifold structure of labeled samples is also beneficial to supervised learning, even when unlabeled samples are available.
In this paper, MRGEPSVM is derived by designing a modified locality preserving projections criterion and incorporating it into GEPSVM. Unlike conventional LPP, we project the data points onto planes and require the local geometric structure to be preserved. This demonstrates that manifold regularization is suitable for GEPSVM and its variants.
The remainder of this paper is organized as follows. In Section 2, we give a brief overview of GEPSVM, followed by the construction of the locality preserving projections criterion in Section 3. Section 4 presents manifold regularized GEPSVM based on the novel regularization, and Section 5 extends it to the nonlinear case. In Section 6, detailed experimental results are given, and we conclude the paper in Section 7.
2. A brief review of GEPSVM
Suppose we are given m1 training samples belonging to class 1 and m2 training samples belonging to class 2 in the n-dimensional real space, with m1 + m2 = m. Let A and B be the two matrices whose rows are the samples of class 1 and class 2, respectively. Letting w1 and w2 be the normals to the two planes, GEPSVM seeks two nonparallel planes given by:
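For reference, the two planes and the Tikhonov-regularized GEPSVM objective for the first plane take the following standard form5,19; this is a sketch in the usual notation (e a vector of ones, δ the regularization parameter) rather than a reproduction of the original numbered equations.

```latex
% Nonparallel proximal planes sought by GEPSVM
x^{\top} w_1 + b_1 = 0, \qquad x^{\top} w_2 + b_2 = 0

% Regularized Rayleigh quotient for the first plane
% (the second plane is obtained by interchanging the roles of A and B)
\min_{(w_1,\, b_1) \neq 0}\;
\frac{\| A w_1 + e\, b_1 \|^{2} + \delta\, \| [\, w_1 ;\, b_1 \,] \|^{2}}
     {\| B w_1 + e\, b_1 \|^{2}}
```

Its minimizer is the eigenvector associated with the smallest eigenvalue of the corresponding generalized eigenvalue problem.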
3. Locality Preserving Projections on a Plane
In this section, the idea of locality preserving projections is extended to the case in which we expect the intrinsic geometric structure of the original data samples of the same class to be preserved after they are projected onto a plane.
Firstly, in order to preserve the original local geometric structure within each class, we need to construct a graph for each class, named the within-class graph G(c) (c = 1, 2). Each vertex in G(c) corresponds to a sample point belonging to class c, and an edge between a vertex pair is added when the corresponding sample pair are each other’s k-nearest neighbors. After the structure of G(c) is fixed, we need to determine its weight matrix S(c). In the literature, there exist several methods for weight calculation, e.g., the RBF kernel and the inverse Euclidean distance. In our experiments, the RBF kernel is used as follows:
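A minimal Python sketch of this graph construction follows (the authors report Matlab experiments; this is an illustrative reimplementation). The mutual k-nearest-neighbor rule matches the description above, while the exact form of the RBF weight and the bandwidth sigma are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def within_class_graph(X, k=5, sigma=1.0):
    """Build the within-class k-NN graph and its RBF weight matrix S.

    X     : (m, n) array holding the samples of one class
    k     : number of nearest neighbors (mutual k-NN rule, as described above)
    sigma : RBF bandwidth (assumed value)
    """
    m = X.shape[0]
    D = cdist(X, X)                      # pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)          # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]   # indices of the k nearest neighbors

    adj = np.zeros((m, m), dtype=bool)
    rows = np.repeat(np.arange(m), k)
    adj[rows, knn.ravel()] = True
    adj = adj & adj.T                    # keep an edge only if the pair are mutual k-NN

    S = np.zeros((m, m))
    S[adj] = np.exp(-D[adj] ** 2 / (2 * sigma ** 2))   # assumed RBF weight form
    return S
```

For MRGEPSVM this routine would be applied once to the samples of class 1 (the rows of A) and once to those of class 2 (the rows of B).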
After the within-class graphs have been constructed in the input space, we project the points of each class onto their corresponding fitting planes and extend the LPP criterion to this setting. The detailed procedure is given as follows.
Given an arbitrary point x in ℝn, let y be its orthogonal projection onto the fitting plane w^T x + b = 0; we can obtain the following equality from Fig. 3.
After a simple calculation as described in Ref. 15, the value of d can be expressed as:
On substituting (7) into (6), the orthogonal projection of x can be formulated as:
According to our idea, the local geometric structure within the points of each class should be preserved after the projection. Taking class 1 as an example, this leads to the following optimization problem.
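In standard notation, the orthogonal projection and the resulting criterion can be sketched as follows; this is a sketch consistent with the description above, not a reproduction of the original numbered equations.

```latex
% Orthogonal projection of x onto the fitting plane w^T x + b = 0
y \;=\; x \;-\; \frac{w^{\top} x + b}{\| w \|^{2}}\, w

% LPP-style criterion for class 1: projections of within-class neighbors
% should remain close on the fitting plane
\min_{w_1,\, b_1}\;
\sum_{i,j} S^{(1)}_{ij}\, \| y_i - y_j \|^{2},
\qquad
y_i \;=\; x_i \;-\; \frac{w_1^{\top} x_i + b_1}{\| w_1 \|^{2}}\, w_1
```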
By dropping the terms irrelevant to the optimization variables and making use of the expressions of w1 and b1, the criterion can be rewritten in matrix form.
Obviously, the vector that best preserves the local structure of the data points corresponds to the plane that preserves it worst. As the left panel of Fig. 4 shows, points a and b are projected while the local structure is best preserved; in the right panel of Fig. 4, however, a and b are projected but the local structure is not preserved.
4. Incorporating local geometric structure into GEPSVM
We are now in a position to introduce the new criterion (18) into classical GEPSVM. As mentioned above, GEPSVM aims to find two distinct, nonparallel planes such that each is closest to the points of one class and furthest from the points of the other. Recalling the objective function (5) of GEPSVM, its numerator is clearly similar to least squares regression, which tends to produce overfitting solutions; this is why a regularization technique is needed. In the original GEPSVM, Tikhonov regularization plays this role. In this paper, besides the Tikhonov term, manifold regularization, i.e., criterion (19), is also used to handle it.
Our idea is to seek two planes such that each one is required not only to be closest to its own data set and furthest from the other data set, but also to preserve the local geometric structure of its own class. To achieve this, (10) is used to penalize the original GEPSVM, leading to the following modified objective function for class 1.
Making use of the result of (19), (21) can be expressed as follows:
We can view the second function as a manifold regularizer incorporated into the first function, as in (23). At the same time, in order to balance the effect of the two objectives, a trade-off factor δM is introduced, which leads to the following optimization problems.
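As a sketch, a plausible form of the combined problem for the first plane, consistent with the description above, is given below; here z1 = [w1; b1], e denotes a vector of ones, δ is the Tikhonov parameter, and M1 is the matrix induced by the manifold criterion for class 1 (so that the criterion equals z1^T M1 z1). The exact matrices in the original Eqs. (24)–(28) may differ in detail.

```latex
\min_{z_1 \neq 0}\;
\frac{z_1^{\top}\!\left( [A \;\; e]^{\top} [A \;\; e] \;+\; \delta I \;+\; \delta_M M_1 \right) z_1}
     {z_1^{\top}\, [B \;\; e]^{\top} [B \;\; e]\; z_1}
```

Its minimizer is the eigenvector of the associated generalized eigenvalue problem corresponding to the smallest eigenvalue.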
Following a similar procedure, we define a minimization problem analogous to (25) for determining z2 = [w2; b2], i.e., the coefficients of the second plane.
The minimum of (27) is achieved at the eigenvector corresponding to the smallest eigenvalue of the following generalized eigenvalue problem.
Once we obtain the eigenvectors corresponding to the smallest eigenvalues of (26) and (28), the two planes are determined simultaneously. Therefore, for a new sample x, we first calculate the distances from x to the two planes, given by (29):
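For a linear plane this is the usual point-to-plane distance:

```latex
d_i(x) \;=\; \frac{\left| x^{\top} w_i + b_i \right|}{\| w_i \|}, \qquad i = 1, 2
```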
Then we classify x as belonging to class 1 if d1 < d2, and to class 2 otherwise.
We now describe a simple algorithm for implementing linear MRGEPSVM.
Given m1 samples in ℝn represented by A ∈ ℝm1×n and m2 samples represented by B ∈ ℝm2×n:
Linear Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue
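A minimal Python sketch of one possible implementation of the training and prediction steps is given below. It assumes the manifold matrices M1 and M2 (built from the within-class LPP criterion) have already been computed; the parameter names delta and delta_M and their default values are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.linalg import eig

def train_linear_mrgepsvm(A, B, M1, M2, delta=1e-3, delta_M=1.0):
    """Linear MRGEPSVM sketch: solve two generalized eigenvalue problems.

    A, B    : (m1, n) and (m2, n) arrays of class-1 and class-2 samples
    M1, M2  : (n+1, n+1) manifold regularization matrices (assumed precomputed)
    delta   : Tikhonov regularization parameter (assumed value)
    delta_M : manifold trade-off factor (assumed value)
    Returns z1 = [w1; b1] and z2 = [w2; b2].
    """
    Ae = np.hstack([A, np.ones((A.shape[0], 1))])
    Be = np.hstack([B, np.ones((B.shape[0], 1))])
    G, H = Ae.T @ Ae, Be.T @ Be
    I = np.eye(G.shape[0])

    def smallest_eigvec(P, Q):
        # generalized eigenvalue problem P z = lambda Q z
        vals, vecs = eig(P, Q)
        vals = np.where(np.isfinite(vals.real), vals.real, np.inf)
        return vecs[:, np.argmin(vals)].real

    z1 = smallest_eigvec(G + delta * I + delta_M * M1, H)  # plane of class 1
    z2 = smallest_eigvec(H + delta * I + delta_M * M2, G)  # plane of class 2
    return z1, z2

def predict_linear_mrgepsvm(X, z1, z2):
    """Assign each row of X to the class whose fitting plane is nearer."""
    Xe = np.hstack([X, np.ones((X.shape[0], 1))])
    d1 = np.abs(Xe @ z1) / np.linalg.norm(z1[:-1])
    d2 = np.abs(Xe @ z2) / np.linalg.norm(z2[:-1])
    return np.where(d1 <= d2, 1, 2)
```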
Originally, GEPSVM was proposed to tackle XOR-type problems, which are difficult for conventional linear classifiers such as SVMs, and on which it can obtain nearly 100% accuracy. A “cross planes” example is shown in Fig. 5 for GEPSVM and the proposed MRGEPSVM. We observe that MRGEPSVM obtains nearly the same accuracy as GEPSVM, which implies that the ability of GEPSVM to solve XOR problems is not degraded in MRGEPSVM by introducing the manifold regularization technique.
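For illustration, synthetic cross-planes data of this XOR type can be generated as follows; the sample size and noise level are arbitrary choices, not those behind Fig. 5, and the resulting A and B could then be passed to the linear training sketch above once the manifold matrices are built.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=(100, 1))
noise = 0.05 * rng.standard_normal((100, 1))

# Class 1 clusters along the line y = x, class 2 along y = -x:
# linearly inseparable, but well fitted by two crossing planes.
A = np.hstack([t, t + noise])      # class 1
B = np.hstack([t, -t + noise])     # class 2
```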
5. Nonlinear MRGEPSVM
In this section, we extend our results to nonlinear classifiers. Suppose that the Euclidean space ℝn is mapped to a Hilbert space H, called the feature space (as opposed to the input space ℝn), through a nonlinear mapping function ϕ : ℝn → H. Let K(xi, xj) be a kernel function in the feature space satisfying K(xi, xj) = ϕ(xi)^T ϕ(xj). The data samples A and B are mapped to ϕ(A) and ϕ(B), respectively. Let the fitting planes in the feature space be denoted by:
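In the usual kernel-generated-surface notation, with C = [A; B] denoting all training samples stacked, these planes and their finite expansion can be sketched as follows (a sketch, not the original numbered equations):

```latex
% Fitting planes in the feature space H
\phi(x)^{\top} w_1 + b_1 = 0, \qquad \phi(x)^{\top} w_2 + b_2 = 0

% Expanding the normals over the mapped training samples,
% w_i = \phi(C)^{\top} u_i with C = [A; B], yields the kernelized surfaces
K(x, C)\, u_1 + b_1 = 0, \qquad K(x, C)\, u_2 + b_2 = 0
```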
Substituting (32) into (31), we obtain:
Similarly, by combining (33) with Tikhonov regularization, we obtain:
By an entirely similar argument, we can define an analogous minimization problem for the fitting plane of class 2.
After z1 and z2 are solved for, the two fitting planes in the feature space are determined. In order to derive the decision rule in the feature space, suppose we are given a new data point x ∈ ℝn to be classified. The distance between its image in the feature space, i.e., ϕ(x), and the fitting plane of class 1, i.e., ϕ(x)^T w1 + b1 = 0, is given by:
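A sketch of this distance in the same notation, together with its kernelized form obtained from the expansion w1 = ϕ(C)^T u1, is:

```latex
d_1(x) \;=\; \frac{\left| \phi(x)^{\top} w_1 + b_1 \right|}{\| w_1 \|}
\;=\;
\frac{\left| K(x, C)\, u_1 + b_1 \right|}
     {\sqrt{\, u_1^{\top} K(C, C)\, u_1 \,}}
```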
By using (32), (38) can be expressed as:
Similarly, the distance between ϕ(x) and the fitting plane of class 2, i.e., ϕ(x)^T w2 + b2 = 0, is given by:
The decision rule for nonlinear MRGEPSVM is the same as for its linear version, except that the distances from x to the fitting planes need to be calculated by (35) and (36).
We now give an explicit statement of the nonlinear MRGEPSVM algorithm.
Given m1 samples in ℝn represented by A ∈ ℝm1×n and m2 samples represented by B ∈ ℝm2×n.
Nonlinear Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue
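A Python sketch of a kernelized variant, analogous to the linear sketch above, is given below; the RBF kernel, the manifold matrices M1 and M2 (now of size (m+1)×(m+1)), and all parameter defaults are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.linalg import eig
from scipy.spatial.distance import cdist

def rbf_kernel(X, Y, sigma=1.0):
    """RBF kernel matrix between the rows of X and the rows of Y."""
    return np.exp(-cdist(X, Y) ** 2 / (2 * sigma ** 2))

def train_nonlinear_mrgepsvm(A, B, M1, M2, delta=1e-3, delta_M=1.0, sigma=1.0):
    """Nonlinear MRGEPSVM sketch. Returns (z1, z2, C) with z = [u; b]."""
    C = np.vstack([A, B])                                   # all training samples
    KA = np.hstack([rbf_kernel(A, C, sigma), np.ones((A.shape[0], 1))])
    KB = np.hstack([rbf_kernel(B, C, sigma), np.ones((B.shape[0], 1))])
    G, H = KA.T @ KA, KB.T @ KB
    I = np.eye(G.shape[0])

    def smallest_eigvec(P, Q):
        vals, vecs = eig(P, Q)                              # P z = lambda Q z
        vals = np.where(np.isfinite(vals.real), vals.real, np.inf)
        return vecs[:, np.argmin(vals)].real

    z1 = smallest_eigvec(G + delta * I + delta_M * M1, H)   # surface of class 1
    z2 = smallest_eigvec(H + delta * I + delta_M * M2, G)   # surface of class 2
    return z1, z2, C

def predict_nonlinear_mrgepsvm(X, z1, z2, C, sigma=1.0):
    """Classify rows of X by their distance to the two kernel-generated surfaces."""
    K = rbf_kernel(X, C, sigma)
    KCC = rbf_kernel(C, C, sigma)
    u1, b1 = z1[:-1], z1[-1]
    u2, b2 = z2[:-1], z2[-1]
    d1 = np.abs(K @ u1 + b1) / np.sqrt(u1 @ KCC @ u1)
    d2 = np.abs(K @ u2 + b2) / np.sqrt(u2 @ KCC @ u2)
    return np.where(d1 <= d2, 1, 2)
```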
6. Experimental Results
To demonstrate the performance of the proposed method, we conducted experiments on several benchmark datasets from the UCI Repository27. Optimal values of all parameters involved in each method were obtained by using a tuning set comprising 10 percent of the data set. Table 1 shows the comparison of MRGEPSVM, TSVM, GEPSVM and SVM.
Dataset | MRGEPSVM | TSVM | GEPSVM | SVM
---|---|---|---|---
Hepatitis | 85.16 ± 2.16 | 80.79 ± 3.74 | 58.29 ± 2.13 | 80.00 ± 2.03
Ionosphere | 87.47 ± 3.38 | 88.03 ± 4.69 | 75.19 ± 3.43 | 86.04 ± 5.91
Heart-statlog | 84.82 ± 1.98 | 84.44 ± 2.62 | 84.81 ± 3.11 | 84.07 ± 4.56
Votes | 95.40 ± 1.62 | 96.08 ± 2.73 | 91.93 ± 3.08 | 94.50 ± 2.79
Sonar | 79.81 ± 1.08 | 77.25 ± 3.75 | 66.76 ± 5.44 | 79.79 ± 2.13
Pima | 74.88 ± 3.09 | 73.70 ± 5.14 | 74.60 ± 4.16 | 76.68 ± 3.82
Table 1. Classification accuracy (%) using the linear kernel
From the experimental results listed in Table 1, we can see that MRGEPSVM achieves better performance than GEPSVM on nearly all datasets. We also performed experiments on several UCI datasets using the RBF kernel. The performance of each method is shown in Table 2.
Dataset | MRGEPSVM | TSVM | GEPSVM | SVM
---|---|---|---|---
Hepatitis | 81.94 ± 3.14 | 82.67 ± 5.03 | 78.25 ± 4.86 | 83.13 ± 3.56
BUPA | 66.09 ± 2.71 | 67.83 ± 7.01 | 63.80 ± 3.25 | 58.32 ± 6.83
WPBC | 79.72 ± 2.14 | 81.92 ± 4.17 | 62.70 ± 3.36 | 79.92 ± 3.16
Table 2. Classification accuracy (%) using the nonlinear (RBF) kernel
The aforementioned methods were tested on publicly available benchmark data sets, and the results concern their performance in terms of classification accuracy. The results for the linear kernel were obtained using the first two repositories, while the third one was used for the nonlinear kernel implementation. For each data set, the latter repository offers 100 predefined random splits into training and test sets, and for several algorithms, including SVMs, the results obtained from each trial are recorded. Execution times and the remaining accuracy results were calculated on an Intel Core i7 CPU at 3.20 GHz with 8 GB RAM running Windows 8 and Matlab 2013b, during normal daylight operation. In the case of the nonlinear kernel, we observe that there are still perceptible improvements compared with GEPSVM, while the accuracy of MRGEPSVM is close to that of TSVM and SVM. It seems that the manifold structure did not greatly improve performance in the feature space. A possible explanation is that the higher dimensionality offsets the effect of the manifold structure within the data samples after they are mapped.
Elapsed times for the different methods with the Gaussian and linear kernels are listed in Table 3 and Table 4, respectively. With the linear kernel, MRGEPSVM and GEPSVM outperform TSVM and SVM in most cases, and MRGEPSVM is at least twice as fast as GEPSVM. When the Gaussian kernel is used, the SVM implementations achieve better performance than the eigenvalue-based methods. In all cases, MRGEPSVM is faster than GEPSVM.
Dataset | MRGEPSVM | GEPSVM | TSVM | SVM
---|---|---|---|---
Votes | 1.1473 | 5.8744 | 0.4523 | 0.2022
Sonar | 3.8717 | 5.8947 | 0.1543 | 0.4080
Pima | 0.0296 | 0.1143 | 0.0302 | 7.1968
Flare-solar | 1.9839 | 16.1654 | 2.1429 | 4.4562
Waveform | 0.5998 | 4.480 | 0.9016 | 0.2284
Thyroid | 0.0246 | 0.1280 | 0.0503 | 0.0718
Heart | 0.0361 | 0.2187 | 0.0278 | 0.1732
Banana | 0.4989 | 3.1102 | 0.0346 | 1.3505
Breast-cancer | 0.0688 | 0.3544 | 0.0429 | 0.1188
Table 3. Elapsed time in seconds using the Gaussian kernel
Dataset | MRGEPSVM | GEPSVM | TSVM | SVM
---|---|---|---|---
Votes | 0.0119 | 0.0277 | 0.0024 | 0.0019
Sonar | 0.0364 | 0.0854 | 0.0589 | 0.0395
Pima | 0.0015 | 0.2858 | 0.6809 | 0.0013
Flare-solar | 0.0158 | 0.1673 | 0.1092 | 0.0893
Waveform | 0.0013 | 0.0934 | 0.0438 | 0.0472
Thyroid | 0.0011 | 0.0183 | 0.00524 | 0.0018
Heart | 0.0012 | 0.1091 | 0.0019 | 0.0011
Banana | 0.0024 | 0.1578 | 0.0063 | 0.0038
Breast-cancer | 0.0002 | 0.0158 | 0.0016 | 0.0009
Table 4. Elapsed time in seconds using the linear kernel
The USPS database is a handwritten digit recognition database provided by the United States Postal Service. It contains gray-scale images of the ten digits 0 to 9, with normalized gray values, and each digit class contains 1100 images of 16 × 16 pixels. Table 5 reports the average accuracy and standard deviation of the different algorithms on the USPS dataset.
Num. | SVM Test±Std (%) | LPP Test±Std (%) | TSVM Test±Std (%) | GEPSVM Test±Std (%) | MRGEPSVM Test±Std (%)
---|---|---|---|---|---
l = 10 | 77.14 ± 1.97 | 82.26 ± 1.73 | 81.12 ± 1.67 | 81.41 ± 1.85 | 82.26 ± 1.37
l = 20 | 84.67 ± 0.93 | 89.19 ± 1.00 | 89.40 ± 1.09 | 87.78 ± 1.34 | 89.97 ± 1.06
l = 30 | 86.07 ± 1.62 | 91.60 ± 1.07 | 91.93 ± 0.87 | 90.51 ± 1.13 | 92.99 ± 0.98
l = 40 | 88.13 ± 1.13 | 93.22 ± 1.02 | 93.49 ± 0.95 | 91.88 ± 0.99 | 94.33 ± 0.93
l = 50 | 90.48 ± 1.34 | 94.50 ± 0.88 | 94.46 ± 1.05 | 93.40 ± 1.26 | 95.50 ± 1.08
Table 5. Average accuracy and standard deviation (%) of the different algorithms on the USPS dataset
In this experiment, we randomly selected 110 images from each class, and then randomly selected 10% and 20% of these data as the training set to evaluate the MRGEPSVM algorithm. The penalty parameter C of SVM and TSVM, as well as the regularization parameters of MRGEPSVM and GEPSVM, were selected from the set {2^i | i = −9, …, 9}. The neighborhood size k of the k-nearest-neighbor graph was selected from the set {3, 4, …, 20}.
The experimental set-up is meant to simulate a real-world situation: we considered binary classification problems defined by splits of the training data, in which all cases from one driver were labeled and all the rest were left unlabeled. The test set is composed entirely of new drivers, forming a separate group.
In Fig. 6, we compare the error rates of the GEPSVM and MRGEPSVM algorithms on 30 binary classification problems. We train on the same driver using one training set of examples and test on the remaining five sets of samples. We treat the task of classifying whether the driver encounters an obstacle as a binary classification problem.
7. Conclusions
In this paper, a novel proximal support vector machine, called manifold regularized GEPSVM (MRGEPSVM), is presented. Our analysis allows us to see clearly the role of manifold regularization in the proximal support vector machine via generalized eigenvalues. MRGEPSVM is derived by designing a modified locality preserving projections criterion and incorporating it into GEPSVM. Unlike conventional LPP, we project the data points onto planes and require the local geometric structure to be preserved. As in GEPSVM, we solve a pair of generalized eigenvalue problems to determine the two fitting planes. One advantage of MRGEPSVM is that it demonstrates that manifold regularization is suitable for GEPSVM and its variants. As to future work, we believe the proposed regularization technique may be applied to more powerful classifiers, e.g., Twin SVM, to further boost their performance.
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (Grant U1564201, No. 51108209, No. 50875112, No. 61573171 and No. 70972048), the Natural Science Foundation of Jiangsu Province (Grant No. BK2010339 and No. BK20140570), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 10KJD580001), the College Graduate Research and Innovation Program of Jiangsu Province (Dr. Innovation) (Grant No. CXLX11 0593), the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), the Project Funded by the Department of Transport Information (Grant 2013-364-836-900), the “Six Talent Peak” project in Jiangsu Province (DZXX-048), and the National Statistical Science Research Project (2014596).
References
Cite this article
Jun Liang, Fei-yun Zhang, Xiao-xia Xiong, Xiao-bo Chen, Long Chen, Guo-hui Lan. Manifold Regularized Proximal Support Vector Machine via Generalized Eigenvalue. International Journal of Computational Intelligence Systems, 2016, 9(6): 1041–1054. https://doi.org/10.1080/18756891.2016.1256570