Keywords:
Computational learning theory -- Congresses.
Machine learning -- Mathematical models -- Congresses.
Electronic books.
Description / Table of Contents:
This text details advances in learning theory that relate to problems studied in neural networks, machine learning, mathematics and statistics.
Type of Medium:
Online Resource
Pages:
1 online resource (438 pages)
Edition:
1st ed.
ISBN:
9781601294012
Series Statement:
NATO Science Series ; v.190
URL:
https://ebookcentral.proquest.com/lib/geomar/detail.action?docID=267471
DDC:
006.3/1
Language:
English
Note:
Cover -- Title page -- Preface -- Organizing committee -- List of chapter contributors -- Contents -- 1 An Overview of Statistical Learning Theory -- 1.1 Setting of the Learning Problem -- 1.1.1 Function estimation model -- 1.1.2 Problem of risk minimization -- 1.1.3 Three main learning problems -- 1.1.4 Empirical risk minimization induction principle -- 1.1.5 Empirical risk minimization principle and the classical methods -- 1.1.6 Four parts of learning theory -- 1.2 The Theory of Consistency of Learning Processes -- 1.2.1 The key theorem of the learning theory -- 1.2.2 The necessary and sufficient conditions for uniform convergence -- 1.2.3 Three milestones in learning theory -- 1.3 Bounds on the Rate of Convergence of the Learning Processes -- 1.3.1 The structure of the growth function -- 1.3.2 Equivalent definition of the VC dimension -- 1.3.3 Two important examples -- 1.3.4 Distribution independent bounds for the rate of convergence of learning processes -- 1.3.5 Problem of constructing rigorous (distribution dependent) bounds -- 1.4 Theory for Controlling the Generalization of Learning Machines -- 1.4.1 Structural risk minimization induction principle -- 1.5 Theory of Constructing Learning Algorithms -- 1.5.1 Methods of separating hyperplanes and their generalization -- 1.5.2 Sigmoid approximation of indicator functions and neural nets -- 1.5.3 The optimal separating hyperplanes -- 1.5.4 The support vector network -- 1.5.5 Why can neural networks and support vector networks generalize? -- 1.6 Conclusion -- 2 Best Choices for Regularization Parameters in Learning Theory: On the Bias-Variance Problem -- 2.1 Introduction -- 2.2 RKHS and Regularization Parameters -- 2.3 Estimating the Confidence -- 2.4 Estimating the Sample Error -- 2.5 Choosing the optimal γ -- 2.6 Final Remarks -- 3 Cucker Smale Learning Theory in Besov Spaces.
3.1 Introduction -- 3.2 Cucker Smale Functional and the Peetre K-Functional -- 3.3 Estimates for the CS-Functional in Anisotropic Besov Spaces -- 4 High-dimensional Approximation by Neural Networks -- 4.1 Introduction -- 4.2 Variable-basis Approximation and Optimization -- 4.3 Maurey-Jones-Barron's Theorem -- 4.4 Variation with respect to a Set of Functions -- 4.5 Rates of Approximate Optimization over Variable Basis Functions -- 4.6 Comparison with Linear Approximation -- 4.7 Upper Bounds on Variation -- 4.8 Lower Bounds on Variation -- 4.9 Rates of Approximation of Real-valued Boolean Functions -- 5 Functional Learning through Kernels -- 5.1 Some Questions Regarding Machine Learning -- 5.2 r.k.h.s Perspective -- 5.2.1 Positive kernels -- 5.2.2 r.k.h.s and learning in the literature -- 5.3 Three Principles on the Nature of the Hypothesis Set -- 5.3.1 The learning problem -- 5.3.2 The evaluation functional -- 5.3.3 Continuity of the evaluation functional -- 5.3.4 Important consequence -- 5.3.5 R^χ, the set of the pointwise defined functions on χ -- 5.4 Reproducing Kernel Hilbert Space (r.k.h.s) -- 5.5 Kernel and Kernel Operator -- 5.5.1 How to build r.k.h.s? -- 5.5.2 Carleman operator and the regularization operator -- 5.5.3 Generalization -- 5.6 Reproducing Kernel Spaces (r.k.k.s) -- 5.6.1 Evaluation spaces -- 5.6.2 Reproducing kernels -- 5.7 Representer Theorem -- 5.8 Examples -- 5.8.1 Examples in Hilbert space -- 5.8.2 Other examples -- 5.9 Conclusion -- 6 Leave-one-out Error and Stability of Learning Algorithms with Applications -- 6.1 Introduction -- 6.2 General Observations about the Leave-one-out Error -- 6.3 Theoretical Attempts to Justify the Use of the Leave-one-out Error -- 6.3.1 Early work in non-parametric statistics -- 6.3.2 Relation to VC-theory -- 6.3.3 Stability.
6.3.4 Stability of averaging techniques -- 6.4 Kernel Machines -- 6.4.1 Background on kernel machines -- 6.4.2 Leave-one-out error for the square loss -- 6.4.3 Bounds on the leave-one-out error and stability -- 6.5 The Use of the Leave-one-out Error in Other Learning Problems -- 6.5.1 Transduction -- 6.5.2 Feature selection and rescaling -- 6.6 Discussion -- 6.6.1 Sensitivity analysis, stability, and learning -- 6.6.2 Open problems -- 7 Regularized Least-Squares Classification -- 7.1 Introduction -- 7.2 The RLSC Algorithm -- 7.3 Previous Work -- 7.4 RLSC vs. SVM -- 7.5 Empirical Performance of RLSC -- 7.6 Approximations to the RLSC Algorithm -- 7.6.1 Low-rank approximations for RLSC -- 7.6.2 Nonlinear RLSC application: image classification -- 7.7 Leave-one-out Bounds for RLSC -- 8 Support Vector Machines: Least Squares Approaches and Extensions -- 8.1 Introduction -- 8.2 Least Squares SVMs for Classification and Function Estimation -- 8.2.1 LS-SVM classifiers and link with kernel FDA -- 8.2.2 Function estimation case and equivalence to a regularization network solution -- 8.2.3 Issues of sparseness and robustness -- 8.2.4 Bayesian inference of LS-SVMs and Gaussian processes -- 8.3 Primal-dual Formulations to Kernel PCA and CCA -- 8.3.1 Kernel PCA as a one-class modelling problem and a primal-dual derivation -- 8.3.2 A support vector machine formulation to Kernel CCA -- 8.4 Large Scale Methods and On-line Learning -- 8.4.1 Nyström method -- 8.4.2 Basis construction in the feature space using fixed size LS-SVM -- 8.5 Recurrent Networks and Control -- 8.6 Conclusions -- 9 Extension of the ν-SVM Range for Classification -- 9.1 Introduction -- 9.2 ν Support Vector Classifiers -- 9.3 Limitation in the Range of ν -- 9.4 Negative Margin Minimization -- 9.5 Extended ν-SVM.
9.5.1 Kernelization in the dual -- 9.5.2 Kernelization in the primal -- 9.6 Experiments -- 9.7 Conclusions and Further Work -- 10 Kernel Methods for Text Processing -- 10.1 Introduction -- 10.2 Overview of Kernel Methods -- 10.3 From Bag of Words to Semantic Space -- 10.4 Vector Space Representations -- 10.4.1 Basic vector space model -- 10.4.2 Generalised vector space model -- 10.4.3 Semantic smoothing for vector space models -- 10.4.4 Latent semantic kernels -- 10.4.5 Semantic diffusion kernels -- 10.5 Learning Semantics from Cross Language Correlations -- 10.6 Hypertext -- 10.7 String Matching Kernels -- 10.7.1 Efficient computation of SSK -- 10.7.2 n-grams: a language independent approach -- 10.8 Conclusions -- 11 An Optimization Perspective on Kernel Partial Least Squares Regression -- 11.1 Introduction -- 11.2 PLS Derivation -- 11.2.1 PCA regression review -- 11.2.2 PLS analysis -- 11.2.3 Linear PLS -- 11.2.4 Final regression components -- 11.3 Nonlinear PLS via Kernels -- 11.3.1 Feature space K-PLS -- 11.3.2 Direct kernel partial least squares -- 11.4 Computational Issues in K-PLS -- 11.5 Comparison of Kernel Regression Methods -- 11.5.1 Methods -- 11.5.2 Benchmark cases -- 11.5.3 Data preparation and parameter tuning -- 11.5.4 Results and discussion -- 11.6 Case Study for Classification with Uneven Classes -- 11.7 Feature Selection with K-PLS -- 11.8 Thoughts and Conclusions -- 12 Multiclass Learning with Output Codes -- 12.1 Introduction -- 12.2 Margin-based Learning Algorithms -- 12.3 Output Coding for Multiclass Problems -- 12.4 Training Error Bounds -- 12.5 Finding Good Output Codes -- 12.6 Conclusions -- 13 Bayesian Regression and Classification -- 13.1 Introduction -- 13.1.1 Least squares regression -- 13.1.2 Regularization -- 13.1.3 Probabilistic models -- 13.1.4 Bayesian regression -- 13.2 Support Vector Machines.
13.3 The Relevance Vector Machine -- 13.3.1 Model specification -- 13.3.2 The effective prior -- 13.3.3 Inference -- 13.3.4 Making predictions -- 13.3.5 Properties of the marginal likelihood -- 13.3.6 Hyperparameter optimization -- 13.3.7 Relevance vector machines for classification -- 13.4 The Relevance Vector Machine in Action -- 13.4.1 Illustrative synthetic data: regression -- 13.4.2 Illustrative synthetic data: classification -- 13.4.3 Benchmark results -- 13.5 Discussion -- 14 Bayesian Field Theory: from Likelihood Fields to Hyperfields -- 14.1 Introduction -- 14.2 The Bayesian framework -- 14.2.1 The basic probabilistic model -- 14.2.2 Bayesian decision theory and predictive density -- 14.2.3 Bayes' theorem: from prior and likelihood to the posterior -- 14.3 Likelihood models -- 14.3.1 Log-probabilities, energies, and density estimation -- 14.3.2 Regression -- 14.3.3 Inverse quantum theory -- 14.4 Prior models -- 14.4.1 Gaussian prior factors and approximate symmetries -- 14.4.2 Hyperparameters and hyperfields -- 14.4.3 Hyperpriors for hyperfields -- 14.4.4 Auxiliary fields -- 14.5 Summary -- 15 Bayesian Smoothing and Information Geometry -- 15.1 Introduction -- 15.2 Problem Statement -- 15.3 Probability-Based Inference -- 15.4 Information-Based Inference -- 15.5 Single-Case Geometry -- 15.6 Average-Case Geometry -- 15.7 Similar-Case Modeling -- 15.8 Locally Weighted Geometry -- 15.9 Concluding Remarks -- 16 Nonparametric Prediction -- 16.1 Introduction -- 16.2 Prediction for Squared Error -- 16.3 Prediction for 0-1 Loss: Pattern Recognition -- 16.4 Prediction for Log Utility: Portfolio Selection -- 17 Recent Advances in Statistical Learning Theory -- 17.1 Introduction -- 17.2 Problem Formulations -- 17.2.1 Uniform convergence of empirical means -- 17.2.2 Probably approximately correct learning -- 17.3 Summary of "Classical" Results.
17.3.1 Fixed distribution case.