Comparing hyperparameter optimization in scikit-learn (grid search vs. random search)
Published: 2019-05-25


Methods for hyperparameter optimization (adapted from Wikipedia)

Hyperparameter optimization

From Wikipedia, the free encyclopedia

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned.

The same kind of machine learning model can require different constraints, weights or learning rates to generalize different data patterns. These measures are called hyperparameters, and have to be tuned so that the model can optimally solve the machine learning problem. Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given independent data. The objective function takes a tuple of hyperparameters and returns the associated loss. Cross-validation is often used to estimate this generalization performance.
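To make the objective concrete, here is a minimal Python sketch of such a function; scikit-learn's SVC and the iris dataset are stand-ins chosen purely for illustration, and the 5-fold cross-validated error plays the role of the estimated generalization loss.

```python
# A hyperparameter objective: take a tuple (C, gamma), return a loss.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def objective(C, gamma):
    """Map a hyperparameter tuple to an estimated generalization loss."""
    scores = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5)
    return 1.0 - scores.mean()  # cross-validated error rate as the loss

print(objective(C=1.0, gamma=0.1))
```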

 


Approaches

Grid search

The traditional way of performing hyperparameter optimization has been grid search, or a parameter sweep, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a held-out validation set.

Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search.
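For instance, one might hand-pick bounds for C and γ and discretize them on a logarithmic scale; the ranges below are arbitrary examples, not recommended defaults.

```python
import numpy as np

# Manually chosen bounds, discretized on a log scale before grid search.
C_grid = np.logspace(-2, 3, num=6)      # 0.01, 0.1, 1, 10, 100, 1000
gamma_grid = np.logspace(-3, 1, num=5)  # 0.001, 0.01, 0.1, 1, 10
```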

For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of "reasonable" values for each, say

C ∈ {10, 100, 1000}

γ ∈ {0.1, 0.2, 0.5, 1.0}

Grid search then trains an SVM with each pair (C, γ) in the Cartesian product of these two sets and evaluates their performance on a held-out validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure.

Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because the hyperparameter settings it evaluates are typically independent of each other.
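In scikit-learn this procedure is wrapped by GridSearchCV; the sketch below reuses the example grid above, with the iris dataset and internal 5-fold cross-validation standing in for a real problem.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [10, 100, 1000],
    "gamma": [0.1, 0.2, 0.5, 1.0],
}
# One SVM is trained per (C, gamma) pair in the Cartesian product, each scored
# by 5-fold cross-validation; n_jobs=-1 exploits the embarrassingly parallel
# structure noted above.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```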

Random search

Random search replaces the exhaustive enumeration of all combinations by selecting them randomly. This can be simply applied to the discrete setting described above, but also generalizes to continuous and mixed spaces. It can outperform grid search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm. In this case, the optimization problem is said to have a low intrinsic dimensionality. Random search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample.
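scikit-learn's counterpart is RandomizedSearchCV, which samples a fixed budget of configurations from user-specified distributions; the log-uniform priors and the budget of 20 draws below are illustrative choices rather than recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Continuous distributions encode prior knowledge about plausible ranges;
# n_iter fixes the budget independently of the size of the search space.
param_distributions = {
    "C": loguniform(1e0, 1e3),
    "gamma": loguniform(1e-2, 1e1),
}
search = RandomizedSearchCV(SVC(kernel="rbf"), param_distributions,
                            n_iter=20, cv=5, n_jobs=-1, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```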

Bayesian optimization

Main article: Bayesian optimization

Bayesian optimization is a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating it, Bayesian optimization aims to gather observations revealing as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected to be close to the optimum). In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run.
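As a sketch of the idea, assuming the scikit-optimize package (listed under open-source software below) is installed: gp_minimize fits a Gaussian-process surrogate of the cross-validated error and picks each new (C, γ) through an acquisition function that trades exploration against exploitation.

```python
from skopt import gp_minimize
from skopt.space import Real
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def objective(params):
    C, gamma = params
    # Cross-validated error rate: the noisy black-box being modeled.
    return 1.0 - cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

space = [Real(1e0, 1e3, prior="log-uniform", name="C"),
         Real(1e-2, 1e1, prior="log-uniform", name="gamma")]
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best (C, gamma):", result.x, "CV error:", result.fun)
```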

Gradient-based optimization

For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks. Since then, these methods have been extended to other models such as support vector machines or logistic regression.

A different approach to obtaining a gradient with respect to hyperparameters is to differentiate the steps of an iterative optimization algorithm using automatic differentiation.
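The toy sketch below only conveys the flavour: it tunes ridge regression's regularization strength by gradient ascent on the validation score, approximating the "hypergradient" with central finite differences instead of differentiating through the training procedure as the methods above do. The dataset sizes and step sizes are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=30, noise=20.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def val_score(log_alpha):
    """Validation R^2 of a ridge model trained with alpha = exp(log_alpha)."""
    model = Ridge(alpha=np.exp(log_alpha)).fit(X_tr, y_tr)
    return model.score(X_val, y_val)

log_alpha, lr, eps = 0.0, 2.0, 1e-3
for _ in range(40):
    # Central finite difference of the validation score w.r.t. log(alpha).
    grad = (val_score(log_alpha + eps) - val_score(log_alpha - eps)) / (2 * eps)
    log_alpha += lr * grad  # gradient ascent on the hyperparameter
print("alpha:", np.exp(log_alpha), "validation R^2:", val_score(log_alpha))
```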

Evolutionary optimization

Main article: Evolutionary algorithm

Evolutionary optimization is a methodology for the global optimization of noisy black-box functions. In hyperparameter optimization, evolutionary optimization uses evolutionary algorithms to search the space of hyperparameters for a given algorithm. Evolutionary hyperparameter optimization follows a process inspired by the biological concept of evolution (a minimal code sketch follows the list below):

  1. Create an initial population of random solutions (i.e., randomly generate tuples of hyperparameters, typically 100+)
  2. Evaluate the hyperparameter tuples and acquire their fitness function (e.g., 10-fold cross-validation accuracy of the machine learning algorithm with those hyperparameters)
  3. Rank the hyperparameter tuples by their relative fitness
  4. Replace the worst-performing hyperparameter tuples with new hyperparameter tuples generated through crossover and mutation
  5. Repeat steps 2-4 until satisfactory algorithm performance is reached or algorithm performance is no longer improving
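A minimal sketch of the loop above, with scikit-learn's SVC as the model being tuned; the tiny population, Gaussian log-space mutation, and the omission of crossover are simplifications for brevity.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

def fitness(log_C, log_gamma):
    # Step 2: fitness = cross-validated accuracy for this hyperparameter tuple.
    clf = SVC(C=np.exp(log_C), gamma=np.exp(log_gamma))
    return cross_val_score(clf, X, y, cv=5).mean()

# Step 1: initial population of random hyperparameter tuples (in log-space).
population = [rng.uniform(-3, 3, size=2) for _ in range(10)]

for generation in range(10):
    ranked = sorted(population, key=lambda p: fitness(*p), reverse=True)  # step 3
    survivors = ranked[: len(ranked) // 2]
    # Step 4: replace the worst half with mutated copies of the survivors.
    children = [s + rng.normal(scale=0.5, size=2) for s in survivors]
    population = survivors + children                                     # step 5: loop

best = max(population, key=lambda p: fitness(*p))
print("best C, gamma:", np.exp(best))
```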

Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms, automated machine learning, deep neural network architecture search, as well as training of the weights in deep neural networks.

Population-based

Population Based Training (PBT) learns both hyperparameter values and network weights. Multiple learning processes operate independently, using different hyperparameters. Poorly performing models are iteratively replaced with models that adopt modified hyperparameter values from a better performer. The modification allows the hyperparameters to evolve and eliminates the need for manual hypertuning. The process makes no assumptions regarding model architecture, loss functions or training procedures.
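A toy sketch of the idea with a hand-rolled NumPy logistic regression, so that the weights and the learning-rate hyperparameter evolve together; the population size, perturbation factors, and exploit/explore schedule are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

def sgd_step(w, lr):
    """One full-batch gradient step of (intercept-free) logistic regression."""
    p = 1.0 / (1.0 + np.exp(-X_tr @ w))
    return w - lr * X_tr.T @ (p - y_tr) / len(y_tr)

def val_accuracy(w):
    return np.mean((X_val @ w > 0) == y_val)

# Population of (weights, learning rate) pairs trained "in parallel".
population = [{"w": np.zeros(X.shape[1]), "lr": 10 ** rng.uniform(-3, 0)}
              for _ in range(4)]

for step in range(50):
    for member in population:                 # independent training
        member["w"] = sgd_step(member["w"], member["lr"])
    if step % 10 == 9:                        # periodic exploit/explore phase
        scores = [val_accuracy(m["w"]) for m in population]
        best, worst = int(np.argmax(scores)), int(np.argmin(scores))
        population[worst]["w"] = population[best]["w"].copy()                        # exploit
        population[worst]["lr"] = population[best]["lr"] * rng.choice([0.8, 1.25])   # explore

print("best validation accuracy:", max(val_accuracy(m["w"]) for m in population))
```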

Others

RBF and spectral approaches have also been developed.

Open-source software

Grid search

  • Katib is a Kubernetes-native system which includes grid search.
  • scikit-learn is a Python package which includes grid search.
  • Tune is a Python library for distributed hyperparameter tuning and supports grid search.
  • Talos includes grid search for Keras.
  • H2O AutoML provides grid search over algorithms in the H2O open source machine learning library.

Random search

  • hyperopt, also via hyperas and hyperopt-sklearn, are Python packages which include random search.
  • Katib is a Kubernetes-native system which includes random search.
  • scikit-learn is a Python package which includes random search.
  • Tune is a Python library for distributed hyperparameter tuning and supports random search over arbitrary parameter distributions.
  • Talos includes a customizable random search for Keras.

Bayesian

  • Auto-sklearn is a Bayesian hyperparameter optimization layer on top of scikit-learn.
  • Ax is a Python-based experimentation platform that supports Bayesian optimization and bandit optimization as exploration strategies.
  • BOCS is a Matlab package which uses semidefinite programming for minimizing a black-box function over discrete inputs. A Python 3 implementation is also included.
  • HpBandSter is a Python package which combines Bayesian optimization with bandit-based methods.
  • Katib is a Kubernetes-native system which includes Bayesian optimization.
  • mlrMBO, also with mlr, is an R package for model-based/Bayesian optimization of black-box functions.
  • scikit-optimize is a Python package for sequential model-based optimization with a scipy.optimize interface.
  • SMAC is a Python/Java library implementing Bayesian optimization.
  • tuneRanger is an R package for tuning random forests using model-based optimization.
  • Optuna is a Python package for black box optimization, compatible with arbitrary functions that need to be optimized.

Gradient-based optimization

  • FAR-HO is a Python package containing TensorFlow implementations and wrappers for gradient-based hyperparameter optimization with forward and reverse mode algorithmic differentiation.
  • XGBoost is an open-source software library which provides a gradient boosting framework for C++, Java, Python, R, and Julia.

Evolutionary

  • deap is a Python framework for general evolutionary computation which is flexible and integrates with parallelization packages like scoop and pyspark, and other Python frameworks like scikit-learn via sklearn-deap.
  • devol is a Python package that performs deep neural network architecture search using genetic programming.
  • nevergrad is a Python package which includes population control methods and particle swarm optimization.
  • Tune is a Python library for distributed hyperparameter tuning and leverages nevergrad for evolutionary algorithm support.

Other

  • dlib is a C++ package with a Python API which has a parameter-free optimizer based on LIPO and trust region optimizers working in tandem.
  • Tune is a Python library for hyperparameter tuning execution and integrates with/scales many existing hyperparameter optimization libraries.
  • Harmonica is a Python package for spectral hyperparameter optimization.
  • hyperopt, also via hyperas and hyperopt-sklearn, are Python packages which include Tree of Parzen Estimators based distributed hyperparameter optimization.
  • Katib is a Kubernetes-native system which includes grid, random search, Bayesian optimization, hyperband, and NAS based on reinforcement learning.
  • nevergrad is a Python package for gradient-free optimization using techniques such as differential evolution, sequential quadratic programming, fastGA, covariance matrix adaptation, population control methods, and particle swarm optimization.
  • nni is a Python package which includes hyperparameter tuning for neural networks in local and distributed environments. Its techniques include TPE, random, anneal, evolution, SMAC, batch, grid, and hyperband.
  • A similar Python package includes several techniques: grid search, Bayesian, and genetic optimization.
  • pycma is a Python implementation of Covariance Matrix Adaptation Evolution Strategy.
  • rbfopt is a Python package that uses a radial basis function model.

Commercial services

  • Amazon SageMaker uses Gaussian processes to tune hyperparameters.
  • BigML OptiML supports mixed search domains.
  • Google HyperTune supports mixed search domains.
  • Indie Solver supports multiobjective, multifidelity and constraint optimization.
  • Mind Foundry OPTaaS supports mixed search domains, multiobjective, constraints, parallel optimization and surrogate models.
  • SigOpt supports mixed search domains, multiobjective, multisolution, multifidelity, constraint (linear and black-box), and parallel optimization.


 

Reposted from: http://asfti.baihongyu.com/
