Chapter 8 Support Vector Machines and Radial Basis Function Networks

Neural Networks: A Classroom Approach Satish Kumar Department of Physics & Computer Science Dayalbagh Educational Institute (Deemed University)
Copyright 2004 Tata McGraw Hill Publishing Co.

Statistical Learning Theory


Learning from Examples

Learning from examples is natural in human beings
Central to the study and design of artificial neural systems
Objective in understanding learning mechanisms: develop software and hardware that can learn from examples and exploit information in impinging data
Examples:
bit streams from radio telescopes around the globe
100 TB on the Internet

Generalization
Supervised systems learn from a training set $T = \{X_k, D_k\}$, $X_k \in \mathbb{R}^n$, $D_k \in \mathbb{R}$
Basic idea: use the system (network) in predictive mode, $y_{predicted} = f(X_{unseen})$
Another way of stating this is that we require the machine to be able to generalize successfully.
Regression: $y_{predicted}$ is a real random variable
Classification: $y_{predicted}$ is either +1 or -1

Approximation
Approximation results discussed in Chapter 6 give us a guarantee that with a sufficient number of hidden neurons it should be possible to approximate a given function (as dictated by the input-output training pairs) to any arbitrary level of accuracy.
Usefulness of the network depends primarily on the accuracy of its predictions of the output for unseen test patterns.

Important Note
Reduction of the mean squared error on the training set to a low level does not guarantee good generalization!
A neural network might predict values for unseen inputs rather inaccurately, even when it has been trained to considerably low error tolerances.
Generalization should be measured using test patterns similar to the training patterns, that is, patterns drawn from the same probability distribution as the training patterns.

Broad Objective
To model the generator function as closely as possible so that the network becomes capable of good generalization. Not to fit the model to the training data so accurately that it fails to generalize on unseen data.

Example of Overfitting
Networks with too many weights (free parameters) fit the training data too accurately and fail to generalize
Example:
7 hidden node feedforward neural network
15 noisy patterns that describe the deterministic univariate function (dashed line)
Error tolerance 0.0001
Network learns each data point extremely accurately
Network function develops high curvature and fails to generalize

Occam's Razor Principle

William Occam, c. 1280-1349
No more things should be presumed to exist than are absolutely necessary.

Generalization ability of a machine is closely related to
the capacity of the machine (functions it can represent)
the data set that is used for training

Statistical Learning Theory

Proposed by Vapnik
Essential idea: Regularization
Given a finite set of training examples, the search for the best approximating function must be restricted to a small space of possible architectures. When the space of representative functions and their capacity is large and the data set small, the models tend to over-fit and generalize poorly.

Given a finite training data set, achieve the correct balance between accuracy in training on that data set and the capacity of the machine to learn any data set without error.

Optimal Neural Network

Recall from Chapter 7 the sum of squares error function

The optimal network function we are in search of minimizes the error by trying to make the first integral zero

Optimal neural network satisfies

Residual error: average training data variance conditioned on the input
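
The equations of this slide did not survive extraction. As a hedged reconstruction, the standard infinite-data form of the sum-of-squares error is shown below; the symbols $f(X,W)$, $p(X)$ and $E[d \mid X]$ are assumed to match the slide's notation.

```latex
E = \frac{1}{2}\int \big(f(X,W) - E[d \mid X]\big)^2 \, p(X)\, dX
  + \frac{1}{2}\int \big(E[d^2 \mid X] - E[d \mid X]^2\big)\, p(X)\, dX
```

Under this form the first integral vanishes when $f(X, W^*) = E[d \mid X]$, while the second integral does not depend on $W$ and is the residual error: the variance of the target data conditioned on the input, averaged over the inputs.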

Training Dependence on Data

Network deviation from the desired average is measured by
Deviation depends on a particular instance of a training data set
Dependence is easily eliminated by averaging over the ensemble of data sets of size Q

Causes of Error: Bias and Variance

Bias: network function itself differs from the regression function $E[d|X]$
Variance: network function is sensitive to the selection of the data set; it generates large errors on some data sets and small errors on others.

Quantification of Bias & Variance

Consequently,

Bias-Variance Dilemma
Separation of the ensemble average into the bias and variance terms:

Strike a balance between the ratio of training set size to network complexity such that both bias and variance are minimized: good generalization
Two important factors for valid generalization:
the number of training patterns used in learning
the number of weights in the network
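
The separation into bias and variance is not legible in the extracted slides. A sketch of the standard decomposition, writing $E_T[\cdot]$ (an assumed symbol) for the average over the ensemble of training sets of size $Q$:

```latex
E_T\big[(f(X,W) - E[d \mid X])^2\big]
  = \underbrace{\big(E_T[f(X,W)] - E[d \mid X]\big)^2}_{\text{bias}^2}
  + \underbrace{E_T\big[(f(X,W) - E_T[f(X,W)])^2\big]}_{\text{variance}}
```

The cross term vanishes because $E_T\big[f(X,W) - E_T[f(X,W)]\big] = 0$, which is why the two contributions separate cleanly.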

Stochastic Nature of T
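T is sampled stochastically: $X_k \in X \subseteq \mathbb{R}^n$, $d_k \in D \subseteq \mathbb{R}$
$X_k$ does not map uniquely to an element, rather to a distribution
Unknown probability distribution $p(X,d)$ defined on $X \times D$ determines the probability of observing $(X_k, d_k)$
$X$ is drawn according to $P(X)$, and $d$ according to $P(d|X)$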

Risk Functional
To successfully solve the regression or classification task, a neural network learns an approximating function $f(X, W)$
Define the expected risk as

The risk is a function of functions f drawn from a function space F

Loss Functions
Square error function
Absolute error function
0-1 loss function
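
The formulas for the expected risk and the three loss functions are missing from the extracted text. Their usual definitions are sketched below as an assumption, with $L(d, f(X,W))$ denoting the loss as in the Key Theorem slide:

```latex
R[f] = \int L\big(d, f(X,W)\big)\, p(X,d)\, dX\, dd

L_{\text{square}} = \big(d - f(X,W)\big)^2, \qquad
L_{\text{abs}} = \big|d - f(X,W)\big|, \qquad
L_{0\text{-}1} = \begin{cases} 0 & \text{if } d = f(X,W) \\ 1 & \text{otherwise} \end{cases}
```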

Optimal Function
The optimal function $f_o$ minimizes the expected risk $R[f]$

$f_o$ is defined by the optimal parameters; $W_o$ is the ideal estimator
Remember: $p(X,d)$ is unknown, and $f_o$ has to be estimated from finite samples
$f_o$ cannot be found in practice!

Empirical Risk Minimization (ERM)

ERM principle is an induction principle that we can use to train the machine using the limited number of data samples at hand
ERM generates a stochastic approximation of $R$ using $T$, called the empirical risk $R_e$

Empirical Risk Minimization (ERM)

The best minimizer of the empirical risk replaces the optimal function $f_o$
ERM replaces $R$ by $R_e$ and $f_o$ by $\hat{f}$
Question: is the minimizer $\hat{f}$ close to $f_o$?
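
As an illustration of the empirical risk, the sketch below averages a loss over a small training set. The data values, the linear form of `f` and the weights `W` are hypothetical and chosen only to make the arithmetic easy to follow.

```matlab
% Toy training set (hypothetical values): Q scalar inputs with targets d
X = [0 1 2 3 4]';
d = [0.1 0.9 2.2 2.8 4.1]';
Q = numel(d);

% Hypothetical approximating function f(X, W) = W(1)*X + W(2)
f = @(X, W) W(1)*X + W(2);
W = [1; 0];

% Two of the loss functions named in the slides
sqLoss  = @(d, y) (d - y).^2;      % square error
absLoss = @(d, y) abs(d - y);      % absolute error

% Empirical risk: average loss over the Q training samples
y = f(X, W);
Re_sq  = sum(sqLoss(d, y))  / Q;
Re_abs = sum(absLoss(d, y)) / Q;
fprintf('Re (square) = %.4f, Re (absolute) = %.4f\n', Re_sq, Re_abs);
```

Minimizing `Re_sq` over `W` is what training does; the expected risk cannot be computed this way because $p(X,d)$ is unknown.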

Two Important Sequence Limits

To ensure that the minimizer $\hat{f}$ is close to $f_o$ we need to find the conditions for consistency of the ERM principle.
Essentially requires specifying the necessary and sufficient conditions for convergence of the following two limits of sequences in a probabilistic sense.

First Limit
Convergence of the values of expected risks $R[\hat{f}_Q]$ of functions $\hat{f}_Q$, $Q = 1, 2, \dots$, that minimize the empirical risk $R_e[\hat{f}_Q]$ over training sets of size $Q$, to the minimum of the true risk

Another way of saying that solutions found using ERM converge to the best possible solution.

Second Limit
Convergence of the values of empirical risk $R_e[\hat{f}_Q]$, $Q = 1, 2, \dots$, over training sets of size $Q$, to the minimum of the true risk

This amounts to stating that the empirical risk converges to the value of the smallest risk.
Leads to the Key Theorem by Vapnik and Chervonenkis

Key Theorem
Let L(d,f(X,W)) be a set of functions with a bounded loss for probability measure p(X,d) :

Then for the ERM principle to be consistent, it is necessary and sufficient that the empirical risk Re[f] converge uniformly to the expected risk R[f] over the set L(d,f(X,W)) such that

This is called uniform one-sided convergence in probability
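
The limit and convergence expressions on the last three slides are missing from the extracted text. A hedged reconstruction in the standard form, where $\hat{f}_Q$ (an assumed symbol) denotes the empirical risk minimizer on a training set of size $Q$, and $a$, $A$ are the assumed bounds on the risk:

```latex
% First and second limits (convergence in probability)
R[\hat{f}_Q] \xrightarrow[Q \to \infty]{P} \min_{f \in F} R[f],
\qquad
R_e[\hat{f}_Q] \xrightarrow[Q \to \infty]{P} \min_{f \in F} R[f]

% Bounded loss assumed by the Key Theorem
a \le \int L\big(d, f(X,W)\big)\, p(X,d)\, dX\, dd \le A

% Uniform one-sided convergence in probability
\lim_{Q \to \infty} P\Big\{ \sup_{W} \big(R[f] - R_e[f]\big) > \epsilon \Big\} = 0
\quad \text{for all } \epsilon > 0
```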

Points to Take Home

In the context of neural networks, each function is defined by the weights W of the network. Uniform convergence Theorem and VC Theory ensure that W which is obtained by minimizing Re also minimizes R as the number Q of data points increases towards infinity.

Points to Take Home

Remember: we have a finite data set to train our machine. When any machine is trained on a specific data set (which is finite) the function it generates is a biased approximant which may minimize the empirical risk or approximation error, but not necessarily the expected risk or the generalization error.

Indicator Functions and Labellings

Consider the set of indicator functions $F = \{f(X,W)\}$ mapping points in $\mathbb{R}^n$ into $\{0,1\}$ or $\{-1,1\}$.

Labelling: an assignment of 0 or 1 to each of $Q$ points in $\mathbb{R}^n$
$Q$ points can be labelled in $2^Q$ ways

Labellings in 3-d

[Figure: panels (a) to (h) showing the eight labellings of three points]

Three points in $\mathbb{R}^2$ can be labelled in eight different ways. A linear oriented decision boundary can shatter all eight labellings.

Vapnik-Chervonenkis Dimension
If the set of indicator functions can correctly classify each of the possible $2^Q$ labellings, we say the set of points is shattered by $F$.
The VC-dimension $h$ of a set of functions $F$ is the size of the largest set of points that can be shattered by the set in question.

VC-Dimension of Linear Decision Functions in $\mathbb{R}^2$ is 3

Labelling of four points in $\mathbb{R}^2$ that cannot be correctly separated by a linear oriented decision boundary
A quadratic decision boundary can separate this labelling!

VC-Dimension of Linear Decision Functions in $\mathbb{R}^n$

At most $n+1$ points can be shattered by oriented hyperplanes in $\mathbb{R}^n$
VC-dimension is $n+1$
Equal to the number of free parameters
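
Shattering can be checked numerically for a small case. The sketch below enumerates all $2^3$ labellings of three non-collinear points in the plane and tries to separate each with a line. The perceptron rule is used here only as a convenient separability check (an assumption of this sketch, not something the slides prescribe), and the point coordinates are arbitrary.

```matlab
P = [0 0; 1 0; 0 1];                 % three non-collinear points (arbitrary choice)
Q = size(P, 1);
shattered = true;

for code = 0:2^Q - 1
    d = 2*bitget(code, 1:Q)' - 1;    % one labelling, entries in {-1,+1}
    w = zeros(2, 1); w0 = 0;         % line parameters: w'*x + w0 = 0
    separated = false;
    for epoch = 1:1000               % generous cap; convergence is fast here
        errors = 0;
        for k = 1:Q
            if d(k)*(P(k,:)*w + w0) <= 0      % misclassified or on the boundary
                w  = w  + d(k)*P(k,:)';       % perceptron update
                w0 = w0 + d(k);
                errors = errors + 1;
            end
        end
        if errors == 0
            separated = true;
            break;
        end
    end
    shattered = shattered && separated;
end
fprintf('All %d labellings separated: %d\n', 2^Q, shattered);
```

Running the same loop on the four XOR points would hit the epoch cap for the two XOR labellings, which is consistent with the VC-dimension of lines in the plane being 3.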

Growth Function
Consider $Q$ points in $\mathbb{R}^n$
$N_{X_Q}$ labellings can be shattered by $F$, with $N_{X_Q} \le 2^Q$
Growth function: $G(Q) = \ln \sup_{X_Q} N_{X_Q} \le Q \ln 2$

Growth Function and VC Dimension

[Figure: plot of the growth function $G(Q)$. Annotations: nothing in between these is allowed; the point of deviation is the VC-dimension.]

Towards Complexity Control

In a machine trained on a given training set the approximants generated are naturally biased towards those data points.
Necessary to ensure that the model chosen for representation of the underlying function has a complexity (or capacity) that matches the data set in question.
Solution: structural risk minimization
Consequence of VC-theory: the difference between the empirical and expected risk can be bounded in terms of the VC-dimension.

VC-Confidence, Confidence Level

For binary classification loss functions, which take on the values 0 or 1, for some $0 \le \eta \le 1$ the following bound holds with probability at least $1 - \eta$:

The bound has two terms, the empirical error and the VC-confidence, and it holds with confidence level $1 - \eta$
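
The bound itself did not survive extraction. A commonly quoted form is given here as an assumption rather than as the slide's exact expression, with $h$ the VC-dimension, $Q$ the number of training points, and $\eta$ the symbol assumed for the confidence parameter:

```latex
R[f] \;\le\; R_e[f] + \sqrt{\frac{h\left(\ln\frac{2Q}{h} + 1\right) - \ln\frac{\eta}{4}}{Q}}
```

The square-root term is the VC-confidence: it grows with $h$ and shrinks as $Q$ grows, which is the trade-off that structural risk minimization exploits.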

Structural Risk Minimization

Structural Risk Minimization (SRM):
Minimize the combination of the empirical risk and the complexity of the hypothesis space.

Space of functions $F$ is very large, and so restrict the focus of learning to a smaller space called the hypothesis space.
SRM therefore defines a nested sequence of hypothesis spaces of increasing complexity: $F_1 \subset F_2 \subset \dots \subset F_n$

VC-dimensions $h_1 \le h_2 \le \dots \le h_n$

Nested Hypothesis Spaces form a Structure

VC-dimensions $h_1 \le h_2 \le \dots \le h_n$

Empirical and Expected Risk Minimizers

$f_{i,Q}$ minimizes the empirical error over the $Q$ points in space $F_i$

It is different from $f_i$, the true minimizer of the expected risk $R$ in $F_i$

A Trade-off

Successive models have greater flexibility such that the empirical error can be pushed down further.
Increasing $i$ increases the VC-dimension and thus the second term
Find $F_{n(Q)}$, the minimizer of the r.h.s.
Goal: select an appropriate hypothesis space to match the training data complexity to the model capacity.
This gives the best generalization.

Approximation Error: Bias

Essentially two costs associated with the learning of the underlying function.
Approximation error, $E_A$:
Introduced by restricting the space of possible functions to be less complex than the target space
Measured by the difference in the expected risks associated with the best function and the optimal function that measures $R$ in the target space
Does not depend on the training data set; only on the approximation power of the function space

Estimation Error: Variance

Now introduce the finite training set with which we train the machine.
Estimation error, $E_E$:
Learning from finite data minimizes the empirical risk, not the expected risk.
The system thus searches for a minimizer of the empirical risk, not the expected risk.
This introduces a second level of error.

Generalization error = $E_A + E_E$

A Warning on Bound Accuracy

As the number of training points increases, the difference between the empirical and expected risk decreases.
As the confidence level increases ($\eta$ becomes smaller), the VC-confidence term becomes increasingly large.
With a finite set of training data, one cannot increase the confidence level indefinitely: the accuracy provided by the bound decreases!

Support Vector Machines


Origins
Support Vector Machines (SVMs) have a firm grounding in the VC theory of statistical learning
Essentially implement structural risk minimization
Originated in the work of Vapnik and co-workers at the AT&T Bell Laboratories
Initial work focussed on:
optical character recognition
object recognition tasks
Later applications:
regression and time series prediction tasks

Context
Consider two sets of data points that are to be classified into one of two classes $C_1$, $C_2$
Linear indicator functions (TLN hyperplane classifiers), where the indicator is the bipolar signum function
Data set is linearly separable
$T = \{X_k, d_k\}$, $X_k \in \mathbb{R}^n$, $d_k \in \{-1, 1\}$
$C_1$: positive samples
$C_2$: negative samples

SVM Design Objective

Find the hyperplane that maximizes the margin
[Figure: Class 1 and Class 2 data with separating hyperplanes. Annotation: distance to closest points on either side of hyperplane.]

Hypothesis Space
Our hypothesis space is the space of functions

Similar to Perceptron, but now we want to maximize the margins from the separating hyperplane to the nearest positive and negative data points. Find the maximum margin hyperplane for the given training set.

Definition of Margin
The perpendicular distance to the closest positive sample (d+) or negative sample (d-) is called the margin

[Figure: separating hyperplane with the closest Class 1 point X+ at distance d+ and the closest Class 2 point X-.]

Reformulation of Classification Criteria

Originally

Reformulated as

Introducing a margin so that the hyperplane satisfies

Canonical Separating Hyperplanes

Satisfy the constraint = 1 Then we may write

or more compactly

Notation
X+ is the data point from $C_1$ closest to the hyperplane, and X is the unique point on the hyperplane that is closest to X+
Maximize d+
d+ = || X+ - X ||
From the defining equation of the hyperplane,
[Figure: Class 1 point X+ at distance d+ from its closest point X on the hyperplane.]

Expression for the Margin

Defining equations of the hyperplane yield
Eventually yields

Noting that X+ - X is also perpendicular to

Total margin
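
The intermediate expressions for the margin are missing from the extracted slides. A short derivation consistent with the total margin of $2/\|W\|$ stated on the next slide is sketched below; it assumes the canonical form in which the closest points satisfy $W \cdot X + w_0 = \pm 1$, and it writes $X$ for the point on the hyperplane closest to $X^+$.

```latex
W \cdot X^{+} + w_0 = 1, \qquad W \cdot X + w_0 = 0
\;\;\Rightarrow\;\; W \cdot (X^{+} - X) = 1

% X^+ - X is perpendicular to the hyperplane, hence parallel to W:
\|W\| \, \|X^{+} - X\| = 1
\;\;\Rightarrow\;\; d_{+} = \|X^{+} - X\| = \frac{1}{\|W\|}

% The same argument gives d_- = 1/\|W\|, so
d_{+} + d_{-} = \frac{2}{\|W\|}
```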

Support Vectors
Vectors on the margin are the support vectors, and the total margin is $2/\|W\|$

[Figure: Class 1 data with the margin and the support vectors marked.]

Total Margin

SVM and SRM

If all data points lie within an $n$-dimensional hypersphere of a given radius, then the set of indicator functions

has a VC-dimension that satisfies the following bound

Distance to closest point is $1/\|W\|$
Constrain $\|W\| \le A$; then the distance from the hyperplane to the closest data point must be at least $1/A$
Therefore: minimize $\|W\|$

SVM Implements SRM

An SVM implements SRM by constraining hyperplanes to lie outside hyperspheres of radius $1/A$
[Figure: hyperspheres of radius $1/A$.]

Objective of the Support Vector Machine

Given $T = \{X_k, d_k\}$, $X_k \in \mathbb{R}^n$, $d_k \in \{-1, 1\}$
$C_1$: positive samples
$C_2$: negative samples
Attempt to classify the data using the smallest possible weight vector norm $\|W\|$ or $\|W\|^2$
Maximize the margin $1/\|W\|$
Minimize
subject to the constraints

Method of Lagrange Multipliers

Used for two reasons:
the constraints on the Lagrange multipliers are easier to handle;
the training data appear in the form of dot products in the final equations, a fact that we extensively exploit in the non-linear support vector machine.

Construction of the Lagrangian

Formulate problem in primal space
$\Lambda = (\lambda_1, \dots, \lambda_Q)$, $\lambda_i \ge 0$, is a vector of Lagrange multipliers

Saddle point of Lp is the solution to the problem

Shift to Dual Space

Makes the optimization problem much cleaner in the sense that it requires only maximization with respect to the $\lambda_i$
Translation to the dual form is possible because the cost function is strictly convex and the constraints are linear (hence convex)
Kuhn-Tucker conditions for the optimum of a constrained optimization problem are invoked to effect the translation of $L_p$ to the dual form

Shift to Dual Space

Partial derivatives of Lp with respect to the primal variables must vanish at the solution points

$D = (d_1, \dots, d_Q)^T$ is the vector of desired values

Kuhn-Tucker Complementarity Conditions

Constraint
Must be satisfied with equality
Yields the dual formulation

Final Dual Optimization Problem

Maximize

with respect to the Lagrange multipliers, subject to the constraints:

Quadratic programming optimization problem
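
The Lagrangian and dual expressions are missing from the extracted slides. The sketch below gives the standard hard-margin forms; it is consistent with the MATLAB listing later in the deck (Hessian $H_{ij} = d_i d_j X_i \cdot X_j$, equality constraint $D^T \Lambda = 0$, lower bound 0). The factor $\tfrac{1}{2}$ in the primal objective is an assumption.

```latex
L_p = \tfrac{1}{2}\|W\|^2 - \sum_{i=1}^{Q} \lambda_i \big[ d_i (W \cdot X_i + w_0) - 1 \big]

\frac{\partial L_p}{\partial W} = 0 \;\Rightarrow\; W = \sum_{i=1}^{Q} \lambda_i d_i X_i,
\qquad
\frac{\partial L_p}{\partial w_0} = 0 \;\Rightarrow\; \sum_{i=1}^{Q} \lambda_i d_i = \Lambda^T D = 0

L_d = \sum_{i=1}^{Q} \lambda_i - \tfrac{1}{2} \sum_{i=1}^{Q}\sum_{j=1}^{Q} \lambda_i \lambda_j d_i d_j \, X_i \cdot X_j
    = \Lambda^T \mathbf{1} - \tfrac{1}{2} \Lambda^T H \Lambda,
\qquad \Lambda^T D = 0, \;\; \lambda_i \ge 0
```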

Support Vectors
Numeric optimization yields optimized Lagrange multipliers $\Lambda = (\lambda_1, \dots, \lambda_Q)^T$

Observation: some Lagrange multipliers go to zero.
Data vectors for which the Lagrange multipliers are greater than zero are called support vectors.
For all other data points, which are not support vectors, $\lambda_i = 0$.

Optimal Weights and Bias

$n_s$ is the number of support vectors
Optimal bias computed from the complementarity conditions
Usually averaged over all support vectors, and uses the Hessian

Classifying an Unknown Data Point

Use a linear indicator function:
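
To make the weight, bias and classification steps concrete, the sketch below uses a two-point toy problem (hypothetical data) whose dual can be solved by hand: with $\lambda_1 = \lambda_2 = \lambda$ the dual objective is $2\lambda - 4\lambda^2$, which is maximized at $\lambda = 0.25$.

```matlab
% Two-point toy problem (hypothetical data) with a closed-form dual solution
X = [ 1  1;                     % positive sample
     -1 -1];                    % negative sample
D = [1; -1];
lambda = [0.25; 0.25];          % analytic maximizer of the dual for this data

% Optimal weight vector: W = sum_i lambda_i * d_i * X_i
W = X' * (lambda .* D);

% Bias averaged over the support vectors: w0 = mean(d_i - W . X_i)
w0 = mean(D - X*W);

% Linear indicator function applied to an unseen point
classify = @(Xnew) sign(Xnew*W + w0);
label = classify([2 0.5]);
fprintf('W = [%g %g], w0 = %g, label = %d\n', W(1), W(2), w0, label);
```

Both points are support vectors here, and the total margin $2/\|W\|$ equals the distance between them, as expected.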

MATLAB Code: Linear SVM Linearly Separable Case

X = [0.5 0.5                 % Data points
     0.5 1.0
     1.0 1.5
     1.5 0.5
     1.5 2.0
     2.0 1.0
     2.5 2.0
     2.5 2.5];
D = [-1 -1 -1 -1 1 1 1 1];   % Corresponding classes
q = size(X,1);               % Size of data set
epsilon = 1e-5;              % Threshold for checking support vectors
H = zeros(q,q);              % Initialize Hessian matrix
for i = 1:q                  % Set up the Hessian
    for j = 1:q
        H(i,j) = D(i)*D(j)*X(i,:)*X(j,:)';
    end
end
% ...
f = -ones(q,1);              % Vector of minus ones (linear term)
% Parameters for the optimization problem
A = D;                       % Equality constraint A*lambda = b
b = 0;
vlb = zeros(q,1);            % Lower bound of lambdas = 0
vub = Inf*ones(q,1);         % No upper bound
x0 = zeros(q,1);             % Initial point is 0
% The original slide invoked MATLAB's old qp function:
%   [lambda alpha how] = qp(H, f, A, b, vlb, vub, x0, numeqconstraints);
% qp is no longer shipped; quadprog (Optimization Toolbox) solves the same problem.
lambda = quadprog(H, f, [], [], A, b, vlb, vub, x0);
svindex = find(lambda > epsilon);     % Support vector indices
ns = length(svindex);                 % Number of support vectors
w_0 = (1/ns)*sum(D(svindex)' - ...    % Optimal bias
      H(svindex,svindex)*lambda(svindex).*D(svindex)');

Simulation Result
Details of linearly separable data, support vectors and Lagrange multipliers values after optimization

Simulation Result

(a) Class 1 data (triangles) and Class 2 data (stars) plotted against a shaded background indicating the magnitude of the hyperplane: large negative values are black; large positive values are white. The class separating boundary (solid line) is shown along with the margins (dotted lines). Four support vectors are visible. (b) Intersection of the hyperplane and the indicator function gives the class separating boundaries and the margins. These are also indicated by the contour lines

Soft Margin Hyperplane Classifier

For non-linearly separable data, classes overlap
Constraint cannot be satisfied for all data points
Optimization procedure will go on increasing the dual Lagrangian to arbitrarily large values
Solution: permit the algorithm to misclassify some of the data points, albeit with an increased cost
A soft margin is generated within which all the misclassified data lie

Soft Margin Classifier

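[Figure: soft margin classifier with boundary $d(X) = 0$ and margins $d(X) = 1$ and $d(X) = -1$. A Class 1 point $X_1$ inside the margin has $d(X_1) = 1 - \xi_1$; a Class 2 point $X_2$ has $d(X_2) = -1 + \xi_2$.]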

Slack Variables
Introduce $Q$ slack variables $\xi_i$

Data point is misclassified if the corresponding slack variable exceeds unity

$\sum_i \xi_i$ represents an upper bound on the number of misclassifications

Cost Function
Optimization problem is modified as:
Minimize

subject to the constraints

Notes
$C$ is a parameter that assigns a penalty to the misclassifications
Choose $k = 1$ to make the problem quadratic
Minimizing $\|W\|^2$ minimizes the VC-dimension while maximizing the margin
$C$ provides a trade-off between the VC-dimension and the empirical risk by changing the relative weights of the two terms in the objective function
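
The modified objective and constraints did not survive extraction. For the case $k = 1$ mentioned in the notes, the usual soft-margin problem is sketched below; the factor $\tfrac{1}{2}$ is an assumption, and the equality form of the constraint matches the complementarity condition quoted later in the deck.

```latex
\min_{W,\, w_0,\, \xi} \;\; \tfrac{1}{2}\|W\|^2 + C \sum_{i=1}^{Q} \xi_i
\qquad \text{subject to} \qquad
d_i (W \cdot X_i + w_0) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, Q
```

With this objective the dual keeps the separable-case form, except that each multiplier is boxed: $0 \le \lambda_i \le C$.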

Lagrangian in Primal Variables

For the re-formulated optimization problem

Definitions
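$\Xi = (\xi_1, \dots, \xi_Q)^T$, $\xi_i \ge 0$ (slack variables)
$\Lambda = (\lambda_1, \dots, \lambda_Q)^T \ge 0$ and $\Gamma = (\gamma_1, \dots, \gamma_Q)^T \ge 0$ (Lagrange multipliers for the margin constraints and for the constraints $\xi_i \ge 0$)
Reformulate the optimization problem in the dual space
Invoke Kuhn-Tucker conditions for the optimum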

Intermediate Result
Partial derivatives with respect to the primal variables must vanish at the saddle point.

Kuhn-Tucker Complementarity Conditions

These are
Which finally yields the dual formulation:

Recast into matrix form

Hessian matrix has elements $H_{ij} = d_i d_j (X_i \cdot X_j)$

Dual Optimization Problem

Maximize

Subject to the constraints

Optimal Weight Vector

Lagrange dual for the non-separable case is identical to that of the separable case
No slack variables or their Lagrange multipliers appear
Difference: Lagrange multipliers now have an upper bound: $C$

Compute the optimal weight vector

Kuhn-Tucker Complementarity
Application yields

And we know that for support vectors $\lambda_i > 0$, and,

Implies that the following constraint is satisfied exactly:

$d_i (W \cdot X_i + w_0) - 1 + \xi_i = 0$

Bounded and Unbounded Support Vectors

Option 1
$\xi_i = 0$: the support vector is on the margin
The multiplier on the constraint $\xi_i \ge 0$ is positive, so $\lambda_i < C$
For support vectors on the margin, $0 < \lambda_i < C$
These are called unbounded support vectors

Option 2
$\xi_i > 0$: the multiplier on the constraint $\xi_i \ge 0$ is zero, so $\lambda_i = C$
Support vectors between the margins have their Lagrange multipliers equal to the bound $C$
These are called bounded support vectors

Computation of the Bias

By averaging over unbounded support vectors

Unknown data point classified using the indicator function

MATLAB Code: Non-separable Classes, Linear SVM

clear all;
X = [0.5 0.5                 % Data points
     0.5 1.0
     1.0 1.5
     1.5 0.5
     1.5 2.0
     2.0 1.0
     2.5 2.0
     2.5 2.5];
D = [-1 -1 1 -1 -1 1 1 1];   % Corresponding classes
q = size(X,1);               % Size of data set
epsilon = 1e-5;              % Threshold to check support vectors
C = 5;                       % Control parameter
H = zeros(q,q);              % Initialize Hessian matrix
for i = 1:q                  % Set up the Hessian
    for j = 1:q
        H(i,j) = D(i)*D(j)*X(i,:)*X(j,:)';
    end
end
f = -ones(q,1);              % Vector of minus ones (linear term)
% Parameters for the optimization problem
vlb = zeros(q,1);            % Lower bound of lambdas = 0
vub = C*ones(q,1);           % Upper bound C
x0 = zeros(q,1);             % Initial point is 0
A = D;                       % Equality constraint A*lambda = b
b = 0;
% The original slide invoked MATLAB's old qp function:
%   [lambda alpha how] = qp(H, f, A, b, vlb, vub, x0, numconstraints);
% qp is no longer shipped; quadprog (Optimization Toolbox) solves the same problem.
lambda = quadprog(H, f, [], [], A, b, vlb, vub, x0);
svindex = find(lambda > epsilon);                          % Find support vectors
usvindex = find(lambda > epsilon & lambda < C - epsilon);  % Find unbounded support vectors
ns = length(usvindex);                                     % Number of unbounded support vectors
w_0 = (1/ns)*sum(D(usvindex)' - ...                        % Optimal bias
      H(usvindex,svindex)*lambda(svindex).*D(usvindex)');
% ...

Simulation
Linearly non-separable data set, Lagrange multiplier values after optimization, and type of support/nonsupport vector

Simulation
Same data set as in the linearly separable case, except for the classes of data points 3 and 5, which are interchanged to make the problem non-separable

Number of support vectors is now six: two are unbounded and four are bounded

Towards the Non-linear SVM

Next:
Lay down the method of designing a support vector machine that has a non-linear decision boundary.

Ideas about the linear SVM are directly extendable to the non-linear SVM using an amazingly simple technique that is based on the notion of an inner product kernel.

Feature Space Maps

Basic idea:
Map the data points using a feature space map $\Phi$ into a very high dimensional feature space $H$

Non-separable data become linearly separable Work on a linear decision boundary in that space

Map everything back to the original pattern space

Pictorial Representation of Nonlinear SVM Design Philosophy

[Figure: Class 1 and Class 2 data in the low dimensional X space and in the high dimensional feature space. A linear separating boundary in feature space maps to a non-linear boundary in X space. The kernel function evaluation $K(X_i, X_j)$ equals the inner product of feature vectors $\Phi(X_i) \cdot \Phi(X_j)$.]

Kernel Function
Note: X values of the training data appear in the Hessian term $H_{ij}$ only in the form of dot products
Search for a kernel function that satisfies $K(X_i, X_j) = \Phi(X_i) \cdot \Phi(X_j)$

Allows us to use $K(X_i, X_j)$ directly in our equations without knowledge of the map $\Phi$!

Example: Computing Feature Space Inner Products via Kernel Functions

Assume $X = x$, $\Phi(x) = (1, x, x^2, \dots, x^m)$
Choose coefficients $a_l = 1$, $l = 1, \dots, m$, and the decision surface is a polynomial in $x$
The inner product $\Phi(x) \cdot \Phi(y) = (1, x, x^2, \dots, x^m)(1, y, y^2, \dots, y^m)^T = 1 + xy + (xy)^2 + \dots + (xy)^m$ is a polynomial of degree $m$
Computing in high dimensions can become computationally very expensive

An Amazing Trick
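Careful choice of the coefficients can change everything!
Example: choosing $a_l = \sqrt{\binom{m}{l}}$, so that $\Phi(x) = (a_0, a_1 x, \dots, a_m x^m)$, yields
$\Phi(x) \cdot \Phi(y) = \sum_{l=0}^{m} \binom{m}{l} (xy)^l = (1 + xy)^m$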

A kernel function evaluation equals the inner product, making computation simple.

Example
$X = (x_1, x_2)^T$
Input space $\mathbb{R}^2$
Feature space $\mathbb{R}^6$:
$\Phi(X) = (1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$

Admits the kernel function

$K(X, Y) = \Phi(X) \cdot \Phi(Y) = (1 + X \cdot Y)^2$
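
The identity on this slide is easy to check numerically. The sketch below compares the six-dimensional inner product with a single kernel evaluation for two arbitrary test points (the point values are chosen only for illustration).

```matlab
% Explicit feature map for the degree-2 polynomial kernel in R^2
Phi = @(X) [1, sqrt(2)*X(1), sqrt(2)*X(2), X(1)^2, X(2)^2, sqrt(2)*X(1)*X(2)];
K   = @(X, Y) (1 + X*Y')^2;       % kernel evaluated in input space

X = [0.5 -1.0];                   % arbitrary test points (row vectors)
Y = [2.0  0.3];

lhs = Phi(X) * Phi(Y)';           % inner product in the 6-d feature space
rhs = K(X, Y);                    % single kernel evaluation
fprintf('Phi(X).Phi(Y) = %.6f, K(X,Y) = %.6f\n', lhs, rhs);
```

The kernel needs one dot product in two dimensions and a square, whereas the explicit route builds two six-dimensional vectors first; the gap widens quickly with the polynomial order.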

Non-Linear SVM Discriminant with Polynomial Kernel Functions

Using kernel functions for inner products the SVM discriminant becomes

This is a non-linear decision boundary in input space generated from a linear superposition of ns kernel function terms This requires the identification of suitable kernel functions that can be employed

Mercer's Condition
There exists a mapping and an expansion of a symmetric kernel function

iff

such that
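
The three expressions of this slide are missing from the extracted text. In the form in which Mercer's condition is usually stated for SVMs (an assumption about what the slide showed), they read in order:

```latex
K(X, Y) = \sum_{i} \Phi_i(X)\, \Phi_i(Y)
\quad \text{iff} \quad
\int\!\!\int K(X, Y)\, g(X)\, g(Y)\, dX\, dY \ge 0
\quad \text{such that} \quad
\int g(X)^2\, dX < \infty
```

In words: the expansion exists exactly when the double integral is non-negative for every square-integrable function $g$.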

Inner Product Kernels (1)

Polynomial discriminant functions

admit the kernel function

Inner Product Kernels (2)

Radial basis indicator functions of the form

admit the kernel function

Inner Product Kernels (3)

Neural network indicator functions of the form

admit the kernel function
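
The kernel formulas for the three families are not legible in the extracted slides. The sketch below writes down their commonly used forms as MATLAB anonymous functions. The polynomial form matches the XOR simulation slide; the radial basis and hyperbolic tangent forms, and the parameter names `p`, `sigma`, `kappa` and `theta`, are assumptions.

```matlab
% Commonly used inner product kernels (row-vector inputs X and Y)
polyK = @(X, Y, p)            (X*Y' + 1)^p;                        % polynomial, order p
rbfK  = @(X, Y, sigma)        exp(-norm(X - Y)^2 / (2*sigma^2));   % Gaussian radial basis
nnK   = @(X, Y, kappa, theta) tanh(kappa*(X*Y') - theta);          % neural network (tanh)

X = [1 0];
Y = [0 1];
kp = polyK(X, Y, 2);
kr = rbfK(X, Y, 1);
kn = nnK(X, Y, 1, 0);
fprintf('poly = %g, rbf = %g, tanh = %g\n', kp, kr, kn);
```

The hyperbolic tangent kernel satisfies Mercer's condition only for some values of its two parameters, so it needs more care than the other two.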

Operational Summary of SVM Learning Algorithm

MATLAB Code Segment for Hessian Computation

% All code same as for linear SVM non-separable data
% Code snippet shown for polynomial kernel
ord = 2;               % Order of polynomial kernel
H = zeros(q,q);        % Initialize Hessian matrix
for i = 1:q            % Set up the Hessian
    for j = 1:q
        H(i,j) = D(i)*D(j)*(X(i,:)*X(j,:)' + 1)^ord;
    end
end

SVM Computations Portrayed as a Feedforward Neural Network!

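[Figure: the applied test vector X feeds a kernel function layer with units $K(X, X_1), K(X, X_2), \dots, K(X, X_{n_s})$, one per support vector. The output weights are products of Lagrange multipliers and desired values: $w_1 = \lambda_1 d_1$, $w_2 = \lambda_2 d_2$, ..., $w_{n_s} = \lambda_{n_s} d_{n_s}$.]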

XOR Simulation
XOR classification, $C = 3$, polynomial kernel $(X_i^T X_j + 1)^2$

Margins and class separating boundary using a second order polynomial kernel

Intersection of the signum indicator function and non-linear polynomial surface

XOR Simulation
Data specification for the XOR classification problem with Lagrange multiplier values after optimization

Non-linearly Separable Data Scatter Simulation

C = 10: stress on a large margin sacrifices classification accuracy

Non-linearly Separable Data Scatter Simulation

C = 10

Non-linearly Separable Data Scatter Simulation

C= 150: Small margin, high classification accuracy

Non-linearly Separable Data Scatter Simulation

C = 150

Support Vector Machines for Regression

The outputs can take on real values, and thus the training data now take on the form $T = \{X_k, d_k\}$, $X_k \in \mathbb{R}^n$, $d_k \in \mathbb{R}$
Find the functional that models the dependence of $d$ on $X$ in a probabilistic sense
Support vector machines for regression approximate functions of the form

High dimensional feature vector

Measure of the Approximation Error

Vapnik introduced a more general error function called the $\epsilon$-insensitive loss function

No loss if the error is within the range $\epsilon$
Loss equal to the linear error minus $\epsilon$ if the error is greater than $\epsilon$
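
The two rules above translate into a one-line function. The sketch below is a minimal illustration; the function name `epsInsLoss` and the sample values are hypothetical.

```matlab
% epsilon-insensitive loss: zero inside the tube, linear outside it
epsInsLoss = @(d, y, epsilon) max(0, abs(d - y) - epsilon);

d = [1.00 1.00 1.00];            % desired values
y = [1.03 1.20 0.70];            % predicted values
loss = epsInsLoss(d, y, 0.05);   % tube half-width epsilon = 0.05
disp(loss);
```

The first prediction lies inside the tube and costs nothing; the other two are charged only for the part of the error that exceeds the tube.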

$\epsilon$-Insensitive Loss Function

Minimization Problem
Assume the empirical risk

subject to

Introduce two sets of slack variables $\xi_i$, $\xi_i^*$ for each of the $Q$ input patterns

Cost Functional
Define

The empirical risk minimization problem is then equivalent to minimizing the functional

Primal Variable Lagrangian

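Slack variables: $\Xi = (\xi_1, \dots, \xi_Q)^T$, $\Xi^* = (\xi_1^*, \dots, \xi_Q^*)^T$
Lagrange multipliers: $\Lambda = (\lambda_1, \dots, \lambda_Q)^T$, $\Lambda^* = (\lambda_1^*, \dots, \lambda_Q^*)^T$, $\Gamma = (\gamma_1, \dots, \gamma_Q)^T$, $\Gamma^* = (\gamma_1^*, \dots, \gamma_Q^*)^T$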

Saddle Point Behaviour

Simplified Dual Form

Substitution results in the dual form

Dual Lagrangian in Vector Form

Maximize

subject to the constraints

$H_{ij} = K(X_i, X_j)$
$D = (d_1, \dots, d_Q)^T$
$\mathbf{1} = (1, \dots, 1)^T$

Optimal Weight Vector

For ns support vectors

Computing the Optimal Bias

Invoke Kuhn-Tucker complementarity

Substitution of the optimal weight vector yields

Simulation
Regression on noisy hyperbolic tangent data scatter
Third order polynomial kernel, $\epsilon = 0.05$, $C = 10$

Simulation
Regression on noisy hyperbolic tangent data scatter
Eighth order polynomial kernel, $\epsilon = 0.00005$, $C = 10$

Simulation: Zoom Plot

Eighth order polynomial kernel, $\epsilon = 0.00005$, $C = 10$
Shows the fine margin, and the support vector

Radial Basis Function Networks


Radial Basis Function Networks

Feedforward neural networks
compute activations at the hidden neurons using an exponential of a [Euclidean] distance measure between the input vector and a prototype vector that characterizes the signal function at a hidden neuron.

Originally introduced into the literature for the purpose of interpolation of data points on a finite training set

Interpolation Problem
Given $T = \{X_k, d_k\}$, $X_k \in \mathbb{R}^n$, $d_k \in \mathbb{R}$
Solving the interpolation problem means finding the map $f(X_k) = d_k$, $k = 1, \dots, Q$ (target points are scalars for simplicity of exposition)
RBFN assumes a set of exactly $Q$ non-linear basis functions $\phi(\|X - X_i\|)$
Map is generated using a superposition of these

Exact Interpolation Equation

Interpolation conditions

Matrix definitions

Yields a compact matrix equation

Micchelli Functions
Gaussian functions

Multiquadrics

Inverse multiquadrics

Solving the Interpolation Problem

Choosing $\phi$ correctly ensures invertibility of the interpolation matrix $\Phi$: $W = \Phi^{-1} D$
Solution is a set of weights such that the interpolating surface generated passes exactly through every data point
Common form of $\phi$ is the localized Gaussian basis function with a center and a spread $\sigma$
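
A minimal sketch of exact interpolation with Gaussian basis functions follows. It reuses the generator $2\sin(x) + x$ and the ten sample locations from the later simulation slides, but without noise, and the spread value is an arbitrary choice for illustration.

```matlab
x = linspace(-2*pi, 2*pi, 10)';        % Q = 10 sample locations
d = 2*sin(x) + x;                      % noise-free targets for this sketch
sigma = 1;                             % spread of the Gaussian basis functions

% Q x Q interpolation matrix: Phi(k,i) = phi(|x_k - x_i|)
Phi = exp(-(x - x').^2 / (2*sigma^2)); % uses implicit expansion (R2016b or later)
W = Phi \ d;                           % solves Phi*W = d, i.e. W = inv(Phi)*d

residual = max(abs(Phi*W - d));        % zero up to round-off: surface hits every point
fprintf('max interpolation residual = %.2e\n', residual);
```

The backslash operator is used instead of an explicit inverse because it is the usual MATLAB way to solve a square linear system; the result is the same set of weights.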

Radial Basis Function Network


Interpolation Example
Assume a noisy data scatter of $Q = 10$ data points
Generator: $2 \sin(x) + x$
In the graphs that follow:
data scatter (indicated by small triangles) is shown along with the generating function (the fine line)
interpolation is shown by the thick line

Interpolant: Smoothness-Accuracy
[Figure panels: $\sigma = 1$ and $\sigma = 0.3$]

Derivative Square Function

[Figure panels: $\sigma = 1$ and $\sigma = 0.3$]

Notes
Making the spread factor smaller makes the function increasingly non-smooth: the stress is on achieving a 100 per cent mapping accuracy on the ten data points rather than on smoothness of the interpolation
Quantify the oscillatory behaviour of the interpolants by considering their derivatives:
Take the derivative of the function
Square it (to make it positive everywhere)
Measure the areas under the curves
Provides a nice measure of the non-smoothness: the greater the area, the more non-smooth the function!

Problems
Oscillatory behaviour is highly undesirable for proper generalization
Better generalization is achievable with smoother functions which are fitted to noisy data
Number of basis functions in the expansion is equal to the number of data points!
Not possible to have for real world data sets, which can be extremely large
Computational and storage requirements can explode very quickly

The RBFN Solution

Choose the number of basis functions to be some number $q < Q$
No longer restrict the centers of the basis functions to be fixed to the data point values: they are now made trainable parameters of the model
Spreads of the basis functions are permitted to be different and trainable
Learning can be done either by supervised or unsupervised techniques
A bias is included in the final linear superposition

Interpolation with Fewer than Q Basis Functions

Assume centers and spreads of the basis functions are optimized and fixed
Proceed to determine the hidden-to-output neuron weights using the procedure adopted in the interpolation case

Solving the Problem in a Least Squares Sense

To formalize this, consider interpolating a set of Q data points with a number q < Q of basis functions
Then, introduce the notion of error, since the interpolation is no longer exact

Compute the Optimal Weights

Differentiating the error w.r.t. $w_i$ and setting it equal to zero

Then

Pseudo-Inverse
This yields

Equation solved using singular value decomposition

where

The pseudo-inverse is needed because $\Phi$ is not square ($Q \times q$ with $q < Q$)
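The equations on these slides did not survive extraction. Assuming that the approximant is a linear superposition of q basis functions and that the error is the usual sum of squares (which is consistent with the MATLAB code later in the chapter, where the weights are formed from `inv(phi' * phi) * phi'`), the derivation can be sketched as follows. The symbols $\mu_i$, $E$ and $\Phi^{+}$ are notational assumptions.

```latex
\begin{align*}
f(x) &= \sum_{i=1}^{q} w_i\,\varphi(\lVert x - \mu_i \rVert), \qquad
\Phi_{ki} = \varphi(\lVert x_k - \mu_i \rVert), \quad \Phi \in \mathbb{R}^{Q \times q} \\
E &= \tfrac{1}{2} \sum_{k=1}^{Q} \bigl(d_k - f(x_k)\bigr)^2
   = \tfrac{1}{2}\,\lVert D - \Phi W \rVert^2 \\
\frac{\partial E}{\partial w_i} &= -\sum_{k=1}^{Q} \bigl(d_k - f(x_k)\bigr)\,\Phi_{ki} = 0
   \;\Longrightarrow\; \Phi^{T}\Phi\,W = \Phi^{T} D \\
W &= (\Phi^{T}\Phi)^{-1}\Phi^{T} D = \Phi^{+} D
\end{align*}
```

When $q = Q$ and $\Phi$ is invertible, $\Phi^{+}$ reduces to $\Phi^{-1}$ and the exact interpolation solution is recovered.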

Two Observations
Straightforward to include a bias term $w_0$ into the approximation equation

Basis function is generally chosen to be the Gaussian

Generalizing Further
RBFs can be generalized to include arbitrary covariance matrices $K_i$

RBFNs are universal approximators
RBFNs have the best approximation property:

in the set of approximating functions that RBFNs are capable of generating, there is one function that has the minimum approximation error for any given function to be approximated

Simulation Example
Consider approximating the ten noisy data points with fewer than ten basis functions
Generator: f(x) = 2 sin(x) + x
Five basis functions are chosen for the approximation: half the number of data points
They are centered at data points 1, 3, 5, 7 and 9 (data points numbered from 1 through 10 from left to right on the graph on the next slide)

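Simulation Example
[Figure: approximations for $\sigma = 0.5$ and $\sigma = 1$]

Simulation Example
[Figure: approximations for $\sigma = 5$ and $\sigma = 10$]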

MATLAB Code for RBFN

Q = 10;                                  % 10 data points
noise = 0.6;                             % additive noise amplitude
x = linspace(-2*pi, 2*pi, Q);            % X samples
scatter = (2*rand(1,Q) - 1)*noise;       % uniform noise in [-noise, noise]
d = (2*sin(x) + x + scatter)';           % Y data (column vector)
testpts = 100;                           % Number of test data
testx = linspace(-2*pi, 2*pi, testpts);
testy = (2*sin(testx) + testx);
sigma = .5;
for i = 1:Q
  k = 1;                                 % centers at data points 1, 3, 5, 7, 9
  for j = 1:Q/2
    phi(i,j) = exp(-(x(i)-x(k))^2/(2*sigma^2));
    k = k+2;
  end
end
% Compute pseudo inverse (pinv(phi) is an SVD-based alternative)
pseudoinv = inv(phi' * phi) * phi';
W = pseudoinv * d;                       % Compute weights
% Generate phi matrix for test data
for i = 1:testpts
  k = 1;
  for j = 1:Q/2
    testphi(i,j) = exp(-(testx(i)-x(k))^2/(2*sigma^2));
    k = k+2;
  end
end
% Generate approximant
f = testphi * W;
...

RBFN Classifier to Solve the XOR Problem

This example serves to show how a bias term is included at the output linear neuron
The RBFN classifier is assumed to have two basis functions centered at data points 1 and 4

Visualizing the Basis Functions

RBFN Architecture

[Figure: RBFN architecture with bias input +1, input x1, basis functions 1 and 2, and output weights w1 and w2]
Basis functions centered at data points 1 and 4

Finding the Solution

We have the $D$, $W$ and $\Phi$ vectors and matrices as shown alongside

Pseudo inverse

Weight vector
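The matrices for this example were lost in extraction, so the following is only a sketch of how the XOR solution can be computed. It assumes that the four patterns are ordered (0,0), (0,1), (1,0), (1,1), that data points 1 and 4 are therefore (0,0) and (1,1), that the targets are 0/1, and that the Gaussians use $2\sigma^2 = 1$. The bias enters as a column of ones appended to $\Phi$.

```matlab
% RBFN for XOR: two Gaussian basis functions plus a bias at the linear output.
X = [0 0; 0 1; 1 0; 1 1];        % data points 1..4 (ordering is an assumption)
D = [0; 1; 1; 0];                % XOR targets (0/1 coding is an assumption)
C = X([1 4], :);                 % centers at data points 1 and 4

Phi = ones(size(X,1), size(C,1) + 1);    % last column stays 1: the bias input
for j = 1:size(C,1)
    delta = bsxfun(@minus, X, C(j,:));
    Phi(:, j) = exp(-sum(delta.^2, 2));  % Gaussian with 2*sigma^2 = 1 (assumed)
end

W = pinv(Phi) * D;               % weights [w1; w2; w0] via the pseudo-inverse
y = Phi * W;                     % network outputs on the four patterns
labels = y > 0.5;                % threshold the linear output
disp([X D y labels]);
```

In the space of the two basis function outputs, patterns 2 and 3 map to the same point, which is why a single linear output neuron can separate the classes.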

Visualization of Solution

Ill Posed, Well Posed Problems

Ill-posed problems were originally identified by Hadamard in the context of partial differential equations
Problems are well-posed if their solutions satisfy three conditions:
they exist
they are unique
they depend continuously on the data set
Problems that are not well-posed are ill-posed
Examples:
differentiation is an ill-posed problem because some solutions need not depend continuously on the data
the inverse kinematics problem, which maps external real world movements into an internal coordinate system

Approximation Problem is Ill Posed

The solution to the problem is not unique
Sufficient data is not available to reconstruct the mapping uniquely
Data points are generally noisy
The solution to the ill-posed approximation problem lies in regularization:
it essentially requires the introduction of certain constraints that impose a restriction on the solution space
it is necessarily problem dependent
Regularization techniques impose smoothness constraints on the approximating set of functions
Some degree of smoothness is necessary for the representative function since it has to be robust against noise

Regularization Risk Functional

Assume training data T generated by random sampling of the function
Regularization techniques replace the standard error minimization problem with minimization of a regularization risk functional

Tikhonov Functional
Regularization risk functional comprises two terms

error function

smoothness functional

intuitively appealing to consider using function derivatives to characterize smoothness

Regularization Parameter
The smoothness functional is expressed as
$P$ is a linear differential operator, $\lVert \cdot \rVert$ is a norm defined on the function space (Hilbert space)

The regularization risk functional to be minimized combines the two terms, weighted by the regularization parameter $\lambda$
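The displayed formulas are missing from the extracted slides. A standard form of the Tikhonov functional is written out here as an assumption, using training pairs $(X_i, d_i)$, $i = 1, \dots, Q$. The names $E_s$ and $E_c$ for the two terms are hypothetical labels, not taken from the slides.

```latex
\begin{align*}
E_s[f] &= \tfrac{1}{2} \sum_{i=1}^{Q} \bigl(d_i - f(X_i)\bigr)^2
        && \text{(error function)} \\
E_c[f] &= \tfrac{1}{2}\,\lVert P f \rVert^2
        && \text{(smoothness functional)} \\
R_r[f] &= E_s[f] + \lambda\,E_c[f]
\end{align*}
```

With $\lambda = 0$ the fit is determined by the data alone, while larger values of $\lambda$ trade accuracy on the data for smoothness.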

Euler-Lagrange Equations
We need to calculate the functional derivative of $R_r$, called the Fréchet differential, and set it equal to zero

A series of algebraic steps (see text) yields the Euler-Lagrange equations for the Tikhonov functional

Solving the Euler-Lagrange System

Requires the use of the Green's function for the linear differential operator
The Green's function $G(X, Y)$ for a linear differential operator $Q$:
satisfies prescribed boundary conditions
has continuous partial derivatives with respect to $X$ everywhere except at $X = X_i$, where there is a singularity
satisfies the differential equation $QG(X, Y) = 0$ everywhere except at the singularity, where it behaves as a delta function

Here $Q = \tilde{P}P$, where $\tilde{P}$ is the adjoint of $P$

Solving the Euler-Lagrange System

See algebra in the text
Yields the final solution

Linear weighted sum of $Q$ Green's functions centered at the data points $X_i$
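The solution itself is not present in the extracted text. In the standard regularization-theory treatment, which is assumed to be what the slide shows, the minimizer and its weights take the following form.

```latex
\begin{align*}
f(X) &= \sum_{i=1}^{Q} w_i\,G(X, X_i), \qquad
w_i = \frac{1}{\lambda}\,\bigl(d_i - f(X_i)\bigr)
\end{align*}
```

Each weight is proportional to the residual at its own data point, so the weights are defined implicitly and must be solved for as a linear system.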

Quick Summary
The regularization solution uses $Q$ Green's functions in a weighted summation
The nature of the chosen Green's function depends on the kind of differential operator $P$ chosen for the regularization term of $R_r$

Solving for Weights

Starting point: evaluate the equation at each data point
Introduce matrix notation
With these matrix definitions, the weights follow in closed form
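The intermediate matrices are missing from the extraction. A sketch of the steps, assuming the weighted-sum form of the solution and the vector names $F$, $D$, $W$ and matrix $G$ (notational assumptions), is given below. The result agrees with the MATLAB code segment for regularized interpolation, which solves with `phi + lambda * eye(Q)`.

```latex
\begin{align*}
F &= \bigl(f(X_1), \dots, f(X_Q)\bigr)^{T}, \quad
D = (d_1, \dots, d_Q)^{T}, \quad
W = (w_1, \dots, w_Q)^{T}, \quad
G_{ji} = G(X_j, X_i) \\
W &= \tfrac{1}{\lambda}\,(D - F), \qquad F = G\,W \\
(G + \lambda I)\,W &= D
\;\Longrightarrow\;
W = (G + \lambda I)^{-1} D
\end{align*}
```

Setting $\lambda = 0$ recovers the exact interpolation solution $W = G^{-1} D$.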

Euclidean Norm Dependence

If the differential operator $P$ is
rotationally invariant
translationally invariant

then the Green's function $G(X, Y)$ depends only on the Euclidean norm of the difference of the vectors, that is, $G(X, Y) = G(\lVert X - Y \rVert)$

Multivariate Gaussian is a Green's Function

The multivariate Gaussian function is a Green's function defined by a self-adjoint differential operator
The final minimizer is then a weighted sum of Gaussians centered at the data points
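The Gaussian and the resulting minimizer were displayed as equations that are no longer in the text. Assuming a scalar spread $\sigma_i$ for the Gaussian centered at $X_i$ (an assumption about the slide's notation), they can be written as follows.

```latex
\begin{align*}
G(X, X_i) &= \exp\!\left(-\frac{\lVert X - X_i \rVert^2}{2\sigma_i^2}\right) \\
f(X) &= \sum_{i=1}^{Q} w_i \exp\!\left(-\frac{\lVert X - X_i \rVert^2}{2\sigma_i^2}\right)
\end{align*}
```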

MATLAB Code Segment for RBFN Regularized Interpolation

% Code segment for Regularized Interpolation
lambda = 0.5;                                  % regularization parameter
for i = 1:Q
  for j = 1:Q
    phi(i,j) = exp(-(x(i)-x(j))^2/(2*sigma^2));
  end
end
W_reg = (phi + (lambda * eye(Q))) \ d(:);      % solve (phi + lambda*I) W = d; d(:) forces a column
for k = 1:testpts
  for i = 1:Q
    phitest(k,i) = exp(-(testx(k)-x(i))^2/(2*sigma^2));
  end
  f(k) = phitest(k,:)*W_reg;
end
...

Comparing Regularized and Nonregularized Interpolations

[Figures: interpolation with no regularizing term ($\lambda = 0$) and with a regularizing term ($\lambda = 0.5$)]

Generalized Radial Basis Function Network

We now proceed to generalize the RBFN in two steps
Reduce the number of basis functions and use non-data centers
Use a weighted norm

Reduce the Number of Basis Functions, Use Non-Data Centers

The approximating function is,

Interested in minimizing the regularized risk

Simplifying the First Term

Using the matrix substitutions

yields

Simplifying the Second Term

Use the properties of the adjoint of the differential operator and the Green's function where

Finally

Using a Weighted Norm

Replace the standard Euclidean norm by a weighted norm

$S$ is a norm-weighting matrix of dimension $n \times n$
Substituting into the Gaussian yields a Gaussian with covariance matrix $K$

The choice $K = \sigma^2 I$ is a restricted form
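The weighted norm and the resulting Gaussian are not recoverable from the extracted text. A common way of writing them, assumed here, relates the norm-weighting matrix $S$ to the covariance matrix $K$ as follows ($\mu_i$ denotes the center of the $i$-th basis function).

```latex
\begin{align*}
\lVert X \rVert_S^2 &= (SX)^{T}(SX) = X^{T} S^{T} S\,X \\
G\bigl(\lVert X - \mu_i \rVert_S\bigr)
  &= \exp\!\bigl(-(X - \mu_i)^{T} S^{T} S\,(X - \mu_i)\bigr)
   = \exp\!\bigl(-\tfrac{1}{2}\,(X - \mu_i)^{T} K^{-1} (X - \mu_i)\bigr),
   \qquad \tfrac{1}{2} K^{-1} = S^{T} S \\
K = \sigma^2 I \;&\Longrightarrow\;
G = \exp\!\left(-\frac{\lVert X - \mu_i \rVert^2}{2\sigma^2}\right)
\end{align*}
```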

Generalized Radial Basis Function Network

Some properties
Fewer than Q basis functions
A weighted norm to compute distances, which manifests itself as a general covariance matrix
A bias weight at the output neuron
Tunable weights, centers, and covariance matrices

Learning in RBFNs
Random Subset Selection
Out of Q data points, q of them are selected at random Centers of the Gaussian basis functions are set to those data points.

Semi-random selection
A basis function is placed at every rth data point

Random, Semi-random Selection

Spreads are a function of the maximum distance between chosen centers and q

Gaussians are then defined

such that
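The spread formula on this slide was lost. A widely used heuristic, assumed here and not confirmed by the extracted text, sets a common spread $\sigma = d_{max} / \sqrt{2q}$, where $d_{max}$ is the maximum distance between the chosen centers. The sketch below uses semi-random selection (every r-th data point) on hypothetical one-dimensional data so that the result is repeatable.

```matlab
% Semi-random center selection with a spread derived from d_max and q.
x = linspace(0, 9, 10)';        % Q = 10 one-dimensional data points (hypothetical data)
r = 2;                          % place a basis function at every r-th data point
centers = x(1:r:end);           % q = 5 centers: 0, 2, 4, 6, 8
q = numel(centers);
% Random subset selection would instead use centers = x(randperm(numel(x), q)).

% Maximum distance between any two chosen centers
dmax = max(max(abs(bsxfun(@minus, centers, centers'))));

sigma = dmax / sqrt(2*q);       % assumed heuristic: sigma = d_max / sqrt(2q)

% Gaussian basis functions exp(-(q/dmax^2) * (x - c)^2), which is the same as
% exp(-(x - c)^2 / (2*sigma^2)) for the sigma above
Phi = exp(-(q / dmax^2) * bsxfun(@minus, x, centers').^2);
```

Tying the spread to $d_{max}$ and $q$ keeps the Gaussians from being too peaked or too flat relative to the spacing of the centers.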

Operational Summary of Radial Basis Function Network

Design assuming random placement of centers and fixed spreads

Hybrid Learning Procedure

Determine the centers of the basis functions using a clustering algorithm such as k-means
Tune the hidden-to-output weights using the LMS procedure

k-Means Clustering
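The algorithm table for this slide is missing. The following is a minimal sketch of batch k-means for placing q centers, under stated assumptions: hypothetical one-dimensional data, initial centers picked by hand from the data, and a stop when the centers no longer change. It is not necessarily the variant given in the text.

```matlab
% Batch k-means for choosing q basis function centers.
X = [0 0.1 0.2 5 5.1 5.2]';     % Q x n data matrix (hypothetical, n = 1)
q = 2;                          % number of centers
centers = X([1 4], :);          % initial centers taken from the data (assumption)

for iter = 1:100
    % Assignment step: squared distance of every point to every center
    dist = zeros(size(X,1), q);
    for j = 1:q
        delta = bsxfun(@minus, X, centers(j,:));
        dist(:, j) = sum(delta.^2, 2);
    end
    [~, label] = min(dist, [], 2);

    % Update step: each center moves to the mean of its assigned points
    newCenters = centers;
    for j = 1:q
        if any(label == j)
            newCenters(j,:) = mean(X(label == j, :), 1);
        end
    end

    if isequal(newCenters, centers), break; end   % converged
    centers = newCenters;
end
```

The resulting `centers` can then be held fixed while the hidden-to-output weights are tuned.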

Supervised Learning of Centers

All the parameters are free and subject to a standard supervised learning procedure such as gradient descent
Define an error function

Free parameters: centers, spreads (covariance matrices), weights

Partial Derivatives

Update Equations
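The partial derivatives and update equations are missing from the extracted slides. For the special case of Gaussian basis functions with scalar spreads $\sigma_i$ (the slides allow full covariance matrices, so this is a simplifying assumption), a sum-of-squares error gives the following gradients. The learning rate $\eta$ is a hypothetical symbol.

```latex
\begin{align*}
f(X) &= w_0 + \sum_{i=1}^{q} w_i\,\varphi_i(X), \qquad
\varphi_i(X) = \exp\!\left(-\frac{\lVert X - \mu_i \rVert^2}{2\sigma_i^2}\right) \\
E &= \tfrac{1}{2} \sum_{k=1}^{Q} e_k^2, \qquad e_k = d_k - f(X_k) \\
\frac{\partial E}{\partial w_i} &= -\sum_{k=1}^{Q} e_k\,\varphi_i(X_k) \\
\frac{\partial E}{\partial \mu_i} &= -\frac{w_i}{\sigma_i^2} \sum_{k=1}^{Q} e_k\,\varphi_i(X_k)\,(X_k - \mu_i) \\
\frac{\partial E}{\partial \sigma_i} &= -\frac{w_i}{\sigma_i^3} \sum_{k=1}^{Q} e_k\,\varphi_i(X_k)\,\lVert X_k - \mu_i \rVert^2 \\
\theta &\leftarrow \theta - \eta\,\frac{\partial E}{\partial \theta}, \qquad \theta \in \{w_i, \mu_i, \sigma_i\}
\end{align*}
```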

Image Classification Application

High dimensional feature space leads to poor generalization performance of image classification algorithms
Indexing and retrieval of image collections on the World Wide Web is a major challenge
Support vector machines provide much promise in such applications
We now describe the application of support vector machines to the problem of image classification

Extending SVMs to the Multi-class Case

One against the others: $C$ hyperplanes for $C$ classes

Class $C_J$ is assigned to point $X$ if
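The assignment condition is missing from the extraction. The usual one-against-the-others rule, assumed here, picks the class whose hyperplane gives the largest decision value. The symbol $f_i$ for the decision function of the $i$-th classifier is a notational assumption.

```latex
\begin{align*}
X \in C_J \quad \text{if} \quad f_J(X) > f_i(X) \;\; \text{for all } i \neq J,
\qquad \text{that is,} \quad J = \arg\max_{1 \le i \le C} f_i(X)
\end{align*}
```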

Description of Image Data Set

Corel Stock Photo collection: 200 classes, each with 100 images
Two databases were derived from the original collection as follows:

Corel14
Classes were taken from the original Corel classification
14 classes and 1400 images (100 images per category)
air shows, bears, elephants, tigers, Arabian horses, polar bears, African specialty animals, cheetahs-leopards-jaguars, bald eagles, mountains, fields, deserts, sunrises-sunsets, night scenes

Corel7
This database has many outliers, deliberately retained
Newly designed categories
7 classes and 2670 images
airplanes, birds, boats, buildings, fish, people, vehicles

Corel14

Corel7

Colour Histogram
Colour is represented by a point in a three dimensional colour space:
Hue-saturation-luminance value (HSV)
It is in direct correspondence with the RGB space

Sixteen bins per colour component are selected, yielding a dimension of $16^3 = 4096$
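A sketch of how such a 4096-dimensional histogram can be built is given below. It assumes that each pixel is already an HSV triple with components in [0, 1] and that the histogram is normalized to sum to one. The pixel values are hypothetical and the binning convention is an assumption.

```matlab
% 16 x 16 x 16 colour histogram flattened into a 4096-dimensional feature vector.
bins = 16;
hsv = [0 0 0; 0.5 0.5 0.5; 1 1 1; 0.5 0.5 0.5];   % N x 3 pixels in [0,1] (hypothetical)

idx = min(floor(hsv * bins), bins - 1);           % per-component bin index, 0..15
lin = idx(:,1)*bins^2 + idx(:,2)*bins + idx(:,3) + 1;   % linear bin index, 1..4096

h = accumarray(lin, 1, [bins^3, 1]);              % count the pixels in each bin
h = h / sum(h);                                   % normalize to unit sum
```

Most bins stay empty for any single image, so the resulting vectors are high dimensional and sparse.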

Selection of Kernel
Polynomial
Gaussian
General kernels

Gaussian Radial Basis Function Classifiers and SVMs

A support vector machine with a Gaussian kernel is indeed a radial basis function network where
the centers correspond to the support vectors
the number of centers is the number of support vectors
the weights and bias are all chosen automatically using the SVM learning procedure

This procedure gives excellent results when compared with Gaussian radial basis function networks trained with non-SVM methods.

Experiment 1
For the preliminary experiment, the 1400 Corel14 samples were divided into 924 training and 476 test samples
For Corel7, the 2670 samples were divided into 1375 training and test samples each
Error Rates

Experiment 2
Introducing Non-Gaussian Kernels

In addition to a linear SVM, the authors employed three kernels: Gaussian, Laplacian, sub-linear
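The kernel definitions are not recoverable from the slide. The sketch below assumes a generalized RBF family $K(x, y) = \exp(-\rho \sum_i |x_i - y_i|^b)$, in which $b = 2$ gives the Gaussian, $b = 1$ the Laplacian and $b = 0.5$ the sub-linear kernel. This parametrization, the helper name `genkernel`, the value of `rho` and the example histograms are all assumptions made for illustration.

```matlab
% Generalized RBF kernel K(x, y) = exp(-rho * sum(|x_i - y_i|.^b)).
% b = 2: Gaussian, b = 1: Laplacian, b = 0.5: sub-linear (assumed parametrization).
genkernel = @(x, y, rho, b) exp(-rho * sum(abs(x - y).^b));

x = [0.25 0.50 0.25];      % hypothetical normalized histograms
y = [0.50 0.25 0.25];
rho = 1;                   % kernel width parameter (hypothetical value)

kGauss   = genkernel(x, y, rho, 2);
kLaplace = genkernel(x, y, rho, 1);
kSublin  = genkernel(x, y, rho, 0.5);
kLinear  = x * y';         % linear SVM: plain inner product
```

Lowering the exponent `b` makes the kernel less sensitive to a large difference in any single histogram bin.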

Corel14

Corel7

Weight Regularization
Regularization is a technique that builds a penalty function into the error function itself
the penalty increases the error on poorly generalizing networks

Feedforward neural networks with a large number and magnitude of weights generate over-fitted network mappings that have high curvature in pattern space
Weight regularization: reduce the curvature by penalizing networks that have large weight values

Introducing a Regularizer
Basic idea: add a sum of weight squares term over all weights in the network presently being optimized
$\lambda$ is a weight regularization parameter
A weight decay regularizer needs to treat input-hidden and hidden-output weights differently in order to work well
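The effect of the sum-of-squares penalty on a gradient step can be sketched on a small linear model, which stands in here for the network (an assumption made to keep the example short). The regularized error is taken to be $E + \frac{\lambda}{2} \sum w^2$, so each update gains an extra term $-\eta \lambda w$ that shrinks the weights. The data, `eta` and `lambda` are hypothetical.

```matlab
% Gradient descent on E(w) = 0.5*||X*w - d||^2 + 0.5*lambda*||w||^2
X = [1 0; 0 1; 1 1];       % hypothetical inputs (3 patterns, 2 weights)
d = [1; 2; 3];             % hypothetical targets
lambda = 0.1;              % weight regularization parameter
eta = 0.1;                 % learning rate

w = zeros(2, 1);
for iter = 1:500
    gradE = X' * (X*w - d);                 % gradient of the data error
    w = w - eta * (gradE + lambda * w);     % the lambda*w term decays the weights
end

wRidge = (X'*X + lambda*eye(2)) \ (X'*d);   % closed-form minimizer for comparison
wPlain = X \ d;                             % unregularized least squares
```

The regularized weights end up with a smaller norm than the unregularized ones, which is the mechanism by which weight decay reduces curvature.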

MATLAB Simulation
Two-class data for weight regularization example

MATLAB Simulation: $\lambda$ = 0, 0.01

[Figure: signal function contours and weight space trajectories]

MATLAB Simulation: $\lambda$ = 0.1, 1

[Figure: signal function contours and weight space trajectories]

Committees of Networks
A set of different neural network architectures that work together to generate an estimate of the underlying function f(X)
Each network is assumed to have been trained on the same data distribution, although not necessarily the same data set
An averaging out of noise components reduces the overall noise in prediction
Performance can actually improve at a minimal computational cost when using a committee of networks

Architecture of Committee Network

[Figure: committee network architecture with an averaging (AVG) output stage]

Averaging Reduces the Error

Analysis shows that the error can only reduce on averaging Assume
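The assumptions and the analysis are missing from the extracted slide. The standard result is that the squared error of the averaged output never exceeds the average of the individual squared errors, and that it drops by a factor of N when the individual errors are zero-mean and uncorrelated. The sketch below checks the first statement numerically. The three "networks" are hypothetical: each is the true function plus a fixed, deterministic error term.

```matlab
% Committee of N = 3 "networks": each predicts f(x) = sin(x) plus its own error.
x = linspace(0, 2*pi, 50);
f = sin(x);                                    % underlying function (hypothetical)
err = [0.3*cos(3*x); -0.2*sin(5*x); 0.25*cos(7*x + 1)];   % stand-ins for each network's error
y = bsxfun(@plus, f, err);                     % N x 50 individual predictions

yCom = mean(y, 1);                             % committee output: simple average
eAvg = mean(mean(bsxfun(@minus, y, f).^2, 2)); % average of the individual mean squared errors
eCom = mean((yCom - f).^2);                    % mean squared error of the committee
fprintf('E_AV = %.4f, E_COM = %.4f\n', eAvg, eCom);
```

How much the committee gains depends on how correlated the individual errors are.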

Mixtures of Experts
Learning a map is decomposed into the problem of learning mappings over different regions of the pattern space
Different networks are trained over those regions
Outputs of these individual networks can then be employed to generate an output for the entire pattern space by appropriately selecting the correct network's output
The latter task can be done by a separate gating network
The entire collection of individual networks together with the gating network is called the mixture of experts model
