Most Important Publications

  1. BURNHAM, K.P. and D.R. ANDERSON, 2003. Model Selection and Multimodel Inference. Technometrics. [Cited by 684] (265.62/year)
  2. BOOK (REVIEW)
  3. BURNHAM, K.P. and D.R. ANDERSON, 1998. Model selection and inference: a practical information-theoretic approach. New York: Springer. [Cited by 1138] (150.23/year)
  4. BOOK
  5. KOHAVI, R., 1995. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. IJCAI. [Cited by 477] (45.11/year)
  6. We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection) ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment, over half a million runs of C4.5 and a Naive-Bayes algorithm, to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
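     A minimal sketch of the ten-fold stratified cross-validation procedure recommended above, assuming scikit-learn; the dataset and the two candidate classifiers (a decision tree standing in for C4.5, plus Gaussian naive Bayes) are illustrative placeholders, not the paper's experimental setup:

        # Pick between two candidate classifiers by ten-fold stratified CV.
        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import StratifiedKFold, cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_breast_cancer(return_X_y=True)
        candidates = {"tree": DecisionTreeClassifier(random_state=0),
                      "naive_bayes": GaussianNB()}
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
        scores = {name: cross_val_score(clf, X, y, cv=cv).mean()
                  for name, clf in candidates.items()}
        print(scores, "->", max(scores, key=scores.get))  # keep the best-scoring model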
  7. RAFTERY, A.E. and P.V. MARSDEN, 1995. Bayesian model selection in social research. Sociological Methodology. [Cited by 353] (33.38/year)
  8. It is argued that P-values and the tests based upon them give unsatisfactory results, especially in large samples. It is shown that, in regression, when there are many candidate independent variables, standard variable selection procedures can give very misleading results. Also, by selecting a single model, they ignore model uncertainty and so underestimate the uncertainty about quantities of interest. The Bayesian approach to hypothesis testing, model selection, and accounting for model uncertainty is presented. Implementing this is straightforward through the use of the simple and accurate BIC approximation, and it can be done using the output from standard software. Specific results are presented for most of the types of model commonly used in sociology. It is shown that this approach overcomes the difficulties with P-values and standard model selection procedures based on them. It also allows easy comparison of nonnested models, and permits the quantification of the evidence for a null hypothesis of interest, such as a convergence theory or a hypothesis about societal norms.
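     For reference, the BIC approximation invoked above can be stated as follows (a standard formulation, not a quotation from the paper); here L̂_k is the maximized likelihood of model M_k, d_k its number of free parameters, and n the sample size:

        \mathrm{BIC}_k = -2\log \hat{L}_k + d_k \log n

     The model with the smaller BIC is preferred, and BIC_k - BIC_j approximates 2 log B_{jk}, twice the log Bayes factor in favour of M_j over M_k.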
  9. HANSEN, M.H. and B. YU, 2001. Model Selection and the Principle of Minimum Description Length. Journal of the American Statistical Association. [Cited by 142] (31.04/year)
  10. This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed attention within the statistics community. Here we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines we find many interesting interpretations of popular frequentist and Bayesian procedures. As we show, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis. Because model selection in linear regression is an extremely common problem that arises in many applications, we present detailed derivations of several MDL criteria in this context and discuss their properties through a number of examples. Our emphasis is on the practical application of MDL, and hence we make extensive use of real datasets. In writing this review, we tried to make the descriptive philosophy of MDL natural to a statistics audience by examining classical problems in model selection. In the engineering literature, however, MDL is being applied to ever more exotic modeling situations. As a principle for statistical modeling in general, one strength of MDL is that it can be intuitively extended to provide useful tools for new problems.
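     In its generic two-part form (a simplified statement rather than the article's refined criteria), MDL encodes a model and then the data given that model, and selects the model minimizing the total description length; for a model with k estimated parameters, maximized likelihood L̂ and sample size n, the familiar asymptotic approximation is

        L(\text{model}) + L(\text{data} \mid \text{model}) \approx -\log \hat{L} + \frac{k}{2}\log n

     so the (k/2) log n term plays the role of the penalty for model complexity.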
  11. JOHNSON, J.B. and K.S. OMLAND, 2004. Model selection in ecology and evolution. Trends in Ecology & Evolution. [Cited by 48] (30.47/year)
  12. Recently, researchers in several areas of ecology and evolution have begun to change the way in which they analyze data and make biological inferences. Rather than the traditional null hypothesis testing approach, they have adopted an approach called model selection, in which several competing hypotheses are simultaneously confronted with data. Model selection can be used to identify a single best model, thus lending support to one particular hypothesis, or it can be used to make inferences based on weighted support from a complete set of competing models. Model selection is widely accepted and well developed in certain fields, most notably in molecular systematics and mark-recapture analysis. However, it is now gaining support in several other areas, from molecular evolution to landscape ecology. Here, we outline the steps of model selection and highlight several ways that it is now being implemented. By adopting this approach, researchers in ecology and evolution will find a valuable alternative to traditional null hypothesis testing, especially when more than one hypothesis is plausible.
  13. WAX, M. and T. KAILATH, 1985. Detection of signals by information theoretic criteria. IEEE Transactions on Acoustics, Speech, and Signal …. [Cited by 532] (25.86/year)
  14. A new approach is presented to the problem of detecting the number of signals in a multichannel time-series, based on the application of the information theoretic criteria for model selection introduced by Akaike (AIC) and by Schwarz and Rissanen (MDL). Unlike the conventional hypothesis testing based approach, the new approach does not require any subjective threshold settings; the number of signals is obtained merely by minimizing the AIC or the MDL criteria. Simulation results that illustrate the performance of the new method for the detection of the number of signals received by a sensor array are presented.
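     Schematically (the paper's exact criteria are functions of the sample eigenvalues of the covariance matrix; only the generic form is shown here), the number of signals k̂ is the minimizer of

        \mathrm{AIC}(k) = -2\log \hat{L}(\hat{\theta}_k) + 2\,\nu(k), \qquad
        \mathrm{MDL}(k) = -\log \hat{L}(\hat{\theta}_k) + \tfrac{1}{2}\,\nu(k)\log N

     over the candidate orders k, where ν(k) is the number of free parameters under the hypothesis of k signals and N is the number of snapshots; no detection threshold needs to be chosen.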
  15. VUONG, Q.H., 1989. Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econometrica. [Cited by 422] (25.46/year)
  16. In this paper, we develop a classical approach to model selection. Using the Kullback-Leibler Information Criterion to measure the closeness of a model to the truth, we propose simple likelihood-ratio based statistics for testing the null hypothesis that the competing models are equally close to the true data generating process against the alternative hypothesis that one model is closer. The tests are directional and are derived successively for the cases where the competing models are non-nested, overlapping, or nested and whether both, one, or neither is misspecified. As a prerequisite, we fully characterize the asymptotic distribution of the likelihood ratio statistic under the most general conditions. We show that it is a weighted sum of chi-square distributions or a normal distribution depending on whether the distributions in the competing models closest to the truth are observationally identical. We also propose a test of this latter condition.
  17. HAUSSLER, D., 1992. Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications. Information and Computation. [Cited by 337] (24.82/year)
  18. We describe a generalization of the PAC learning model that is based on statistical decision theory. In this model the learner receives randomly drawn examples, each example consisting of an instance x ∈ X and an outcome y ∈ Y, and tries to find a decision rule h: X → A, where h ∈ H, that specifies the appropriate action a ∈ A to take for each instance x in order to minimize the expectation of a loss l(y, a). Here X, Y, and A are arbitrary sets, l is a real-valued function, and examples are generated according to an arbitrary joint distribution on X × Y. Special cases include the problem of learning a function from X into Y, the problem of learning the conditional probability distribution on Y given X (regression), and the problem of learning a distribution on X (density estimation). We give theorems on the uniform convergence of empirical loss estimates to true expected loss rates for certain decision rule spaces H, and show how this implies learnability with bounded sample size, disregarding computational complexity. As an application, we give distribution-independent upper bounds on the sample size needed for learning with feedforward neural networks. Our theorems use a generalized notion of VC dimension that applies to classes of real-valued functions, adapted from Vapnik and Pollard's work, and a notion of capacity and metric dimension for classes of functions that map into a bounded metric space.
  19. MADIGAN, D. and A.E. RAFTERY, 1994. Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window. Journal of the American Statistical Association. [Cited by 279] (24.10/year)
  20. We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism that averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximizing predictive ability. But this has not been used in practice, because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable log-linear models. For each of these, we develop efficient ways of computing exact Bayes factors and hence posterior model probabilities. For the decomposable log-linear models, this is based on properties of chordal graphs and hyper-Markov prior distributions and the resultant calculations can be carried out locally. The end product is an overall strategy for model selection and accounting for model uncertainty that searches efficiently through the very large classes of models involved. Three examples are given. The first two concern data sets that have been analyzed by several authors in the context of model selection. The third addresses a urological diagnostic problem. In each example, our model averaging approach provides better out-of-sample predictive performance than any single model that might reasonably have been selected.
  21. BURNHAM, K.P. and D.R. ANDERSON, 1998. Model selection and inference: a practical information-theoretic approach: Springer-Verlag. New York. [Cited by 164] (21.65/year)
  22. BOOK
  23. HURVICH, C.M. and C.L. TSAI, 1989. Regression and Time Series Model Selection in Small Samples. Biometrika. [Cited by 350] (21.12/year)
  24. A bias correction to the Akaike information criterion, AIC, is derived for regression and autoregressive time series models. The correction is of particular use when the sample size is small, or when the number of fitted parameters is a moderate to large fraction of the sample size. The corrected method, called AICC, is asymptotically efficient if the true model is infinite dimensional. Furthermore, when the true model is of finite dimension, AICC is found to provide better model order choices than any other asymptotically efficient method. Applications to nonstationary autoregressive and mixed autoregressive moving average time series models are also discussed.
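     The corrected criterion has the widely used closed form (as usually stated for a model with k fitted parameters and n observations):

        \mathrm{AIC}_c = -2\log \hat{L} + 2k + \frac{2k(k+1)}{n-k-1} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}

     so the extra penalty term vanishes as n/k grows and AICc reduces to AIC in large samples.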
  25. BERGER, J.O. and L.R. PERICCHI, 1996. The Intrinsic Bayes Factor for Model Selection and Prediction. Journal of the American Statistical Association. [Cited by 201] (20.99/year)
  26. In the Bayesian approach to model selection or hypothesis testing with models or hypotheses of differing dimensions, it is typically not possible to utilize standard noninformative (or default) prior distributions. This has led Bayesians to use conventional proper prior distributions or crude approximations to Bayes factors. In this article we introduce a new criterion called the intrinsic Bayes factor, which is fully automatic in the sense of requiring only standard noninformative priors for its computation and yet seems to correspond to very reasonable actual Bayes factors. The criterion can be used for nested or nonnested models and for multiple model comparison and prediction. From another perspective, the development suggests a general definition of a "reference prior" for model comparison.
  27. BARTLETT, P.L., S. BOUCHERON and G. LUGOSI, 2002. Model selection and error estimation. Machine Learning. [Cited by 66] (18.46/year)
  28. We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
  29. BARRON, A., L. BIRGÉ and P. MASSART, 1995. Risk bounds for model selection via penalization. [Cited by 189] (17.87/year)
  30. Abstract Performance bounds for criteria for model selection are developed using recent theory for sieves. The model selection criteria are based on an empirical loss or contrast function with an added penalty term motivated by empirical process theory and roughly proportional to the number of parameters needed to describe the model divided by the number of observations. Most of our examples involve density or regression estimation settings and we focus on the problem of estimating the unknown density or regression function. We show that the quadratic risk of the minimum penalized empirical contrast estimator is bounded by an index of the accuracy of the sieve. This accuracy index quantifies the trade-off among the candidate models between the approximation error and parameter dimension relative to sample size. If we choose a list of models which exhibit good approximation properties with respect to different classes of smoothness, the estimator can be simultaneously minimax rate optimal in each of those classes. This is what is usually called adaptation. The type of classes of smoothness in which one gets adaptation depends heavily on the list of models. If too many models are involved in order to get accurate approximation of many wide classes of functions simultaneously, it may happen that the estimator is only approximately adaptive (typically up to a slowly varying function of the sample size). We shall provide various illustrations of our method such as penalized maximum likelihood, projection or least squares estimation. The models will involve commonly used finite dimensional expansions such as piecewise polynomials with fixed or variable knots, trigonometric polynomials, wavelets, neural nets and related nonlinear expansions defined by superposition of ridge functions.
  31. BOZDOGAN, H. and S.L. SCLOVE, 1987. Model selection and Akaike's information criterion (AIC): The general theory and its analytical …. Psychometrika. [Cited by 322] (17.34/year)
  32. This paper studies the general theory of Akaike's Information Criterion (AIC) and provides two analytical extensions. The extensions make AIC asymptotically consistent and penalize overparameterization more stringently to pick only the simpler of the two models. The criteria are applied in two Monte Carlo experiments.
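     One of the two extensions, the consistent AIC, is usually written as (standard form, not quoted from the paper):

        \mathrm{CAIC} = -2\log \hat{L} + k(\log n + 1)

     so each additional parameter costs log n + 1 rather than 2, which is what makes the criterion consistent as the sample size n grows.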
  33. BREIMAN, L., 1996. Heuristics of Instability and Stabilization in Model Selection. The Annals of Statistics. [Cited by 163] (17.02/year)
  34. In model selection, usually a "best" predictor is chosen from a collection $\{\hat{\mu}(\cdot, s)\}$ of predictors where $\hat{\mu}(\cdot, s)$ is the minimum least-squares predictor in a collection $\mathsf{U}_s$ of predictors. Here s is a complexity parameter; that is, the smaller s, the lower dimensional/smoother the models in $\mathsf{U}_s$. If $\mathsf{L}$ is the data used to derive the sequence $\{\hat{\mu}(\cdot, s)\}$, the procedure is called unstable if a small change in $\mathsf{L}$ can cause large changes in $\{\hat{\mu}(\cdot, s)\}$. With a crystal ball, one could pick the predictor in $\{\hat{\mu}(\cdot, s)\}$ having minimum prediction error. Without prescience, one uses test sets, cross-validation and so forth. The difference in prediction error between the crystal ball selection and the statistician's choice we call predictive loss. For an unstable procedure the predictive loss is large. This is shown by some analytics in a simple case and by simulation results in a more complex comparison of four different linear regression methods. Unstable procedures can be stabilized by perturbing the data, getting a new predictor sequence $\{\hat{\mu}'(\cdot, s)\}$ and then averaging over many such predictor sequences.
  35. HALL, A., 1994. Testing for a Unit Root in Time Series with Pretest Data-Based Model Selection. Journal of Business & Economic Statistics. [Cited by 163] (14.08/year)
  36. The author examines the impact of data-based lag-length estimation on the behavior of the augmented Dickey-Fuller (ADF) test for a unit root. He derives conditions under which the augmented Dickey-Fuller test converges to the Dickey-Fuller distribution and verifies that these conditions are satisfied by many popular lag selection strategies. Simulation evidence indicates that the performance of the augmented Dickey-Fuller test is considerably improved when the lag length is selected from the data. An application to inventory series illustrates that inference about a unit root can be very sensitive to the method of lag-length selection.
  37. KROLZIG, H.M. and D.F. HENDRY, 2001. Computer automation of general-to-specific model selection procedures. Journal of Economic Dynamics and Control. [Cited by 62] (13.55/year)
  38. Disputes about econometric methodology partly reflect a lack of evidence on alternative approaches. We reconsider econometric model selection from a computer-automation perspective, focusing on general-to-specific reductions, embodied in PcGets. Starting from a general congruent model, standard testing procedures eliminate statistically insignificant variables, with diagnostic tests checking the validity of reductions, ensuring a congruent final selection. Since joint selection and diagnostic testing have eluded theoretical analysis, we study modelling strategies by simulation. The Monte Carlo experiments show that PcGets recovers the DGP specification from a general model with size and power close to commencing from the DGP itself.
  39. ANDERSON, D.R., K.P. BURNHAM and G.C. WHITE, 1994. AIC model selection in overdispersed capture-recapture data. Ecology. [Cited by 153] (13.22/year)
  40. Selection of a proper model as a basis for statistical inference from capture-recapture data is critical. This is especially so when using open models in the analysis of multiple, interrelated data sets (e.g., males and females, with 2-3 age classes, over 3-5 areas and 10-15 yr). The most general model considered for such data sets might contain 1000 survival and recapture parameters. This paper presents numerical results on three information-theoretic methods for model selection when the data are overdispersed (i.e., a lack of independence so that extra-binomial variation occurs). Akaike's information criterion (AIC), a second-order adjustment to AIC for bias (AICc), and a dimension-consistent criterion (CAIC) were modified using an empirical estimate of the average overdispersion, based on quasi-likelihood theory. Quality of model selection was evaluated based on the Euclidean distance between standardized θ̂ and θ (parameter θ is vector valued); this quantity (a type of residual sum of squares, hence denoted as RSS) is a combination of squared bias and variance. Five results seem to be of general interest for these product-multinomial models. First, when there was overdispersion the most direct estimator of the variance inflation factor was positively biased and the relative bias increased with the amount of overdispersion. Second, AIC and AICc, unadjusted for overdispersion using quasi-likelihood theory, performed poorly in selecting a model with a small RSS value when the data were overdispersed (i.e., overfitted models were selected when compared to the model with the minimum RSS value). Third, the information-theoretic criteria, adjusted for overdispersion, performed well, selected parsimonious models, and had a good balance between under- and overfitting the data. Fourth, generally, the dimension-consistent criterion selected models with fewer parameters than the other criteria, had smaller RSS values, but clearly was in error by underfitting when compared with the model with the minimum RSS value. Fifth, even if the true model structure (but not the actual parameter values in the model) is known, that true model, when fitted to the data (by parameter estimation) is a relatively poor basis for statistical inference when that true model includes several, let alone many, estimated parameters that are not significantly different from 0.
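     The quasi-likelihood adjustment described above is commonly written as follows, with ĉ the estimated variance inflation (overdispersion) factor, k the number of parameters and n the effective sample size (standard notation rather than a quotation from the paper):

        \mathrm{QAIC} = -\frac{2\log \hat{L}}{\hat{c}} + 2k, \qquad
        \mathrm{QAIC}_c = -\frac{2\log \hat{L}}{\hat{c}} + 2k + \frac{2k(k+1)}{n-k-1}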
  41. CHAPELLE, O. and V. VAPNIK, 1999. Model Selection for Support Vector Machines. NIPS. [Cited by 84] (12.78/year)
  42. New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, one can both predict the best choice of parameters of the model and the relative quality of performance for any value of parameter.
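     The span-based functionals themselves are not reproduced here; for orientation, the baseline they aim to improve on is a plain cross-validated grid search over the SVM hyperparameters, sketched below with scikit-learn on a placeholder dataset:

        # Baseline hyperparameter (model) selection for an RBF-kernel SVM:
        # exhaustive grid search scored by 5-fold cross-validation.
        from sklearn.datasets import load_iris
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        X, y = load_iris(return_X_y=True)
        grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
        search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)
        print(search.best_params_, search.best_score_)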
  43. BIRGE, L. and P. MASSART, 2001. Gaussian model selection. J. Eur. Math. Soc. [Cited by 58] (12.68/year)
  44. Our purpose in this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop our point of view about this subject. The advantage and importance of model selection come from the fact that it provides a suitable approach to many different types of problems, starting from model selection per se (among a family of parametric models, which one is more suitable for the data at hand), which includes for instance variable selection in regression models, to nonparametric estimation, for which it provides a very powerful tool that allows adaptation under quite general circumstances. Our approach to model selection also provides a natural connection between the parametric and nonparametric points of view and copes naturally with the fact that a model is not necessarily true. The method is based on the penalization of a least squares criterion which can be viewed as a generalization of Mallows' Cp. A large part of our efforts will be put on choosing properly the list of models and the penalty function for various estimation problems like classical variable selection or adaptive estimation for various types of ℓp-bodies.
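     For orientation, the criterion being generalized, Mallows' Cp for a linear submodel with p parameters fitted to n observations, is (standard definition, not quoted from the paper):

        C_p = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} - n + 2p

     where RSS_p is the residual sum of squares of the submodel and σ̂² is an estimate of the error variance; penalized least squares replaces the fixed 2p term by a more general penalty attached to each model.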
  45. MCQUARRIE, A.D.R. and C.L. TSAI, 1998. Regression and Time Series Model Selection. books.google.com. [Cited by 89] (11.75/year)
  46. BOOK
  47. HENDRY, D.F. and H.M. KROLZIG, 2001. Automatic Econometric Model Selection. Timberlake Consultants Ltd. [Cited by 53] (11.58/year)
  48. Concerns software
  49. WASSERMAN, L., 2000. Bayesian model selection and model averaging. Journal of Mathematical Psychology. [Cited by 62] (11.12/year)
  50. This paper reviews the Bayesian approach to model selection and model averaging. In this review, I emphasize objective Bayesian methods based on noninformative priors. I will also discuss implementation details, approximations, and relationships to other methods.
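     The two basic formulas being reviewed are standard (not specific to this paper): the posterior probability of model M_k and the model-averaged posterior of a quantity of interest Δ are

        P(M_k \mid D) = \frac{P(D \mid M_k)\,P(M_k)}{\sum_j P(D \mid M_j)\,P(M_j)}, \qquad
        P(\Delta \mid D) = \sum_k P(\Delta \mid M_k, D)\,P(M_k \mid D)

     where P(D | M_k) is the marginal likelihood obtained by integrating the likelihood against the prior on the parameters of M_k; model selection keeps only the highest-probability model, while model averaging mixes over all of them.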
  51. LINHART, H. and W. ZUCCHINI, 1986. Model selection. New York: Wiley. [Cited by 214] (10.93/year)
  52. BOOK
  53. RAFTERY, A.E., 1995. Bayesian model selection in social research (with discussion by Andrew Gelman & Donald B. Rubin, and …. Sociological Methodology. [Cited by 114] (10.78/year)
  54. It is argued that P-values and the tests based upon them give unsatisfactory results, especially in large samples. It is shown that, in regression, when there are many candidate independent variables, standard variable selection procedures can give very misleading results. Also, by selecting a single model, they ignore model uncertainty and so underestimate the uncertainty about quantities of interest. The Bayesian approach to hypothesis testing, model selection, and accounting for model uncertainty is presented. Implementing this is straightforward using the simple and accurate BIC approximation, and can be done using the output from standard software. Specific results are presented for most of the types of model commonly used in sociology. It is shown that this approach overcomes the difficulties with P-values and standard model selection procedures based on them. It also allows easy comparison of non-nested models, and permits the quantification of the evidence for a null hypothesis of interest, such as a convergence theory or a hypothesis about societal norms.
  55. CREMERS, K.J.M., 2002. Stock Return Predictability: A Bayesian Model Selection Perspective. Review of Financial Studies. [Cited by 36] (10.07/year)
  56. Attempts to characterize stock return predictability have generated a plethora of papers documenting the ability of various variables to explain conditional expected returns. However, there is little consensus on what the important conditioning variables are, giving rise to a great deal of model uncertainty and data snooping fears. In this paper, we introduce a new methodology that explicitly takes the model uncertainty into account by comparing all possible models simultaneously and in which the priors are calibrated to reflect economically meaningful prior information. Therefore, our approach minimizes data snooping given the information set and the priors. We compare the prior views of a skeptic and a confident investor. The data imply posterior probabilities that are in general more supportive of stock return predictability than the priors for both types of investors. Furthermore, the stalwarts such as dividends and past returns do not perform well. The out-of-sample results for the Bayesian average models show improved forecasts relative to the classical statistical model selection methods, are consistent with the in-sample results and show some, albeit small, evidence of predictability.
  57. KEARNS, M., et al., 1997. An Experimental and Theoretical Comparison of Model Selection Methods. Machine Learning. [Cited by 84] (9.80/year)
  58. We investigate the problem of model selection in the setting of supervised learning of boolean functions from independent random examples. More precisely, we compare methods for finding a balance between the complexity of the hypothesis chosen and its observed error on a random training sample of limited size, when the goal is that of minimizing the resulting generalization error. We undertake a detailed comparison of three well-known model selection methods: a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV). We introduce a general class of model selection methods (called penalty-based methods) that includes both GRM and MDL, and provide general methods for analyzing such rules. We provide both controlled experimental evidence and formal theorems to support the following conclusions: (1) Even on simple model selection problems, the behavior of the methods examined can be both complex and incomparable. Furthermore, no amount of tuning of the rules investigated (such as introducing constant multipliers on the complexity penalty terms, or a distribution-specific effective dimension) can eliminate this incomparability. (2) It is possible to give rather general bounds on the generalization error, as a function of sample size, for penalty-based methods. The quality of such bounds depends in a precise way on the extent to which the method considered automatically limits the complexity of the hypothesis selected. (3) For any model selection problem, the additional error of cross validation compared to any other method can be bounded above by the sum of two terms. The first term is large only if the learning curve of the underlying function classes experiences a phase transition between (1-γ)m and m examples (where γ is the fraction saved for testing in CV). The second and competing term can be made arbitrarily small by increasing γ. (4) The class of penalty-based methods is fundamentally handicapped in the sense that there exist two types of model selection problems for which every penalty-based method must incur large generalization error on at least one, while CV enjoys small generalization error on both.
  59. ANDERS, U. and O. KORN, 1999. Model selection in neural networks. Neural Networks. [Cited by 64] (9.73/year)
  60. In this article, we examine how model selection in neural networks can be guided by statistical procedures such as hypothesis tests, information criteria and cross validation. The application of these methods in neural network models is discussed, paying attention especially to the identification problems encountered. We then propose five specification strategies based on different statistical procedures and compare them in a simulation study. As the results of the study are promising, it is suggested that a statistical analysis should become an integral part of neural network modeling.
  61. KANATANI, K., 2001. Motion Segmentation by Subspace Separation and Model Selection. ICCV. [Cited by 44] (9.62/year)
  62. Reformulating the Costeira-Kanade algorithm as a pure mathematical theorem independent of the Tomasi-Kanade factorization, we present a robust segmentation algorithm by incorporating such techniques as dimension correction, model selection using the geometric AIC, and least-median fitting. Doing numerical simulations, we demonstrate that our algorithm dramatically outperforms existing methods. It does not involve any parameters which need to be adjusted empirically.
  63. SWANSON, N.R. and H. WHITE, 1997. A Model Selection Approach to Real-Time Macroeconomic Forecasting Using Linear Models and Artificial …. The Review of Economics and Statistics. [Cited by 80] (9.33/year)
  64. We take a model selection approach to the question of whether a class of adaptive prediction models ("artificial neural networks") are useful for predicting future values of 9 macroeconomic variables. We use a variety of out-of-sample forecast-based model selection criteria including forecast error measures and forecast direction accuracy. In order to compare our predictions to professionally available survey predictions, we implement an ex ante (or real-time) forecasting procedure. One dimension of this approach is that we construct a real-time economic data set which has the characteristic that data available at time t do not contain any information which has been allowed to "leak" in from future time periods, as often happens with fully revised macroeconomic data. We also investigate the issue of appropriate window sizes for rolling-window-based prediction methods. Results indicate that adaptive models often outperform a variety of nonadaptive models, as well as professional forecasters, when used to predict levels as well as the direction of change in various macroeconomic variables. Further, model selection based on an in-sample Schwarz Information Criterion (SIC) does not appear to be a reliable guide to out-of-sample performance, in the case of the variables considered here. Thus, the in-sample SIC apparently fails to offer a convenient shortcut to true out-of-sample performance measures.
  65. ANDRIEU, C. and A. DOUCET, 1999. Joint Bayesian model selection and estimation of noisy sinusoids via reversible jump MCMC. IEEE Transactions on Signal Processing. [Cited by 61] (9.28/year)
  66. In this paper, the problem of joint Bayesian model selection and parameter estimation for sinusoids in white Gaussian noise is addressed. An original Bayesian model is proposed that allows us to define a posterior distribution on the parameter space. All Bayesian inference is then based on this distribution. Unfortunately, a direct evaluation of this distribution and of its features, including posterior model probabilities, requires evaluation of some complicated high-dimensional integrals. We develop an efficient stochastic algorithm based on reversible jump Markov chain Monte Carlo methods to perform the Bayesian computation. A convergence result for this algorithm is established. In simulation, it appears that the performance of detection based on posterior model probabilities outperforms conventional detection schemes.
  67. TORR, P.H.S., 1998. Geometric motion segmentation and model selection. … Transactions Mathematical Physical and Engineering Sciences. [Cited by 70] (9.24/year)
  68. Motion segmentation involves clustering features together that belong to independently moving objects. The image features on each of these objects conform to one of several putative motion models, but the number and type of motion is unknown a priori. In order to cluster these features, the problems of model selection, robust estimation and clustering must all be addressed simultaneously. Within this paper we place the three problems into a common statistical framework; investigating the use of information criteria and robust mixture models as a principled way for motion segmentation of images. The final result is a general fully automatic algorithm for clustering that works in the presence of noise and outliers.
  69. BROMAN, K.W. and T.P. SPEED, 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses. Journal of the Royal Statistical Society, Series B. [Cited by 33] (9.23/year)
  70. YE, J., 1998. On Measuring and Correcting the Effects of Data Mining and Model Selection. Journal of the American Statistical Association. [Cited by 69] (9.11/year)
  71. In the theory of linear models, the concept of degrees of freedom plays an important role. This concept is often used for measurement of model complexity, for obtaining an unbiased estimate of the error variance, and for comparison of different models. I have developed a concept of generalized degrees of freedom (GDF) that is applicable to complex modeling procedures. The definition is based on the sum of the sensitivity of each fitted value to perturbation in the corresponding observed value. The concept is nonasymptotic in nature and does not require analytic knowledge of the modeling procedures. The concept of GDF offers a unified framework under which complex and highly irregular modeling procedures can be analyzed in the same way as classical linear models. By using this framework, many difficult problems can be solved easily. For example, one can now measure the number of observations used in a variable selection process. Different modeling procedures, such as a tree-based regression and a projection pursuit regression, can be compared on the basis of their residual sums of squares and the GDF that they cost. I apply the proposed framework to measure the effect of variable selection in linear models, leading to corrections of selection bias in various goodness-of-fit statistics. The theory also has interesting implications for the effect of general model searching by a human modeler.
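     In symbols, the definition described above is

        \mathrm{GDF}(\hat{\mu}) = \sum_{i=1}^{n} \frac{\partial \hat{\mu}_i}{\partial y_i}

     the sum of the sensitivities of the fitted values to their own observations; for a linear smoother ŷ = Hy this reduces to tr(H), the usual degrees of freedom, and for irregular procedures it can be estimated by Monte Carlo perturbation of the responses (the perturbation scheme is only sketched here, following the abstract).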
  72. COETZEE, J.F., et al., 1995. Pharmacokinetic model selection for target controlled infusions of propofol: assessment of three …. Anesthesiology. [Cited by 90] (8.51/year)
  73. RAFTERY, A.E., et al., 1996. Hypothesis testing and model selection. Markov Chain Monte Carlo in Practice. [Cited by 81] (8.46/year)
  74. BOOK
  75. BIRGÉ, L. and P. MASSART, 1995. From model selection to adaptive estimation. [Cited by 89] (8.42/year)
  76. GEIGER, D., et al., 2001. Stratified Exponential Families: Graphical Models and Model Selection. The Annals of Statistics. [Cited by 38] (8.30/year)
  77. We describe a hierarchy of exponential families which is useful for distinguishing types of graphical models. Undirected graphical models with no hidden variables are linear exponential families (LEFs). Directed acyclic graphical (DAG) models and chain graphs with no hidden variables, including DAG models with several families of local distributions, are curved exponential families (CEFs). Graphical models with hidden variables are what we term stratified exponential families (SEFs). A SEF is a finite union of CEFs of various dimensions satisfying some regularity conditions. We also show that this hierarchy of exponential families is noncollapsing with respect to graphical models by providing a graphical model which is a CEF but not a LEF and a graphical model that is a SEF but not a CEF. Finally, we show how to compute the dimension of a stratified exponential family. These results are discussed in the context of model selection of graphical models.
  78. FINNOFF, W., F. HERGERT and H.G. ZIMMERMANN, 1993. Improving model selection by nonconvergent methods.. Neural Networks. [Cited by 104] (8.27/year)
  79. Many techniques for model selection in the field of neural networks correspond to well established statistical methods. For example, architecture modifications based on test variables calculated after convergence of the training process can be viewed as part of a hypothesis testing search, and the use of complexity penalty terms is essentially a type of regularization or biased regression. The method of “stopped” or “cross-validation” training, on the other hand, in which an oversized network is trained until the error on a further validation set of examples deteriorates, then training is stopped, is a true innovation since model selection doesn't require convergence of the training process. Here, the training process is used to perform a directed search of the parameter space for a model which doesn't overfit the data and thus demonstrates superior generalization performance. In this paper we show that this performance can be significantly enhanced by expanding the “nonconvergent method” of stopped training to include dynamic topology modifications (dynamic weight pruning) and modified complexity penalty term methods in which the weighting of the penalty term is adjusted during the training process. On an extensive sequence of simulation examples we demonstrate the general superiority of the “extended” nonconvergent methods compared to classical penalty term methods, simple stopped training, and methods which only vary the number of hidden units.
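     A bare-bones sketch of the "stopped training" idea (validation-based early stopping only; the dynamic pruning and adaptive penalty weighting of the extended methods are not shown, and the train_step/val_error callables are hypothetical placeholders):

        # Keep the weights with the lowest validation error; stop after
        # `patience` epochs without improvement on the validation set.
        import copy

        def train_with_early_stopping(model, train_step, val_error,
                                      max_epochs=1000, patience=20):
            best_err, best_model, since_best = float("inf"), copy.deepcopy(model), 0
            for _ in range(max_epochs):
                train_step(model)            # one pass over the training data
                err = val_error(model)       # error on the held-out validation set
                if err < best_err:
                    best_err, best_model, since_best = err, copy.deepcopy(model), 0
                else:
                    since_best += 1
                    if since_best >= patience:
                        break
            return best_model, best_err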
  80. MYUNG, I.J., 2000. The importance of complexity in model selection. Journal of Mathematical Psychology. [Cited by 46] (8.25/year)
  81. Model selection should be based not solely on goodness-of-fit, but must also consider model complexity. While the goal of mathematical modeling in cognitive psychology is to select one model from a set of competing models that best captures the underlying mental process, choosing the model that best fits a particular set of data will not achieve this goal. This is because a highly complex model can provide a good fit without necessarily bearing any interpretable relationship with the underlying process. It is shown that model selection based solely on the fit to observed data will result in the choice of an unnecessarily complex model that overfits the data, and thus generalizes poorly. The effect of over-fitting must be properly offset by model selection methods. An application example of selection methods using artificial data is also presented.
  82. SMYTH, P., 2000. Model selection for probabilistic clustering using cross-validated likelihood. Statistics and Computing. [Cited by 46] (8.25/year)
  83. Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modeling, particularly in the context of model-based probabilistic clustering. The conceptual framework for the cross-validation approach to model selection is straightforward in the sense that models are judged directly on their estimated out-of-sample predictive performance. The cross-validation approach, as well as penalized likelihood and McLachlan's bootstrap method, are applied to two data sets and the results from all three methods are in close agreement. The second data set involves a well-known clustering problem from the atmospheric science literature using historical records of upper atmosphere geopotential height in the Northern hemisphere. Cross-validated likelihood provides an interpretable and objective solution to the atmospheric clustering problem. The clusters found are in agreement with prior analyses of the same data based on non-probabilistic clustering techniques.
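     A minimal sketch of the cross-validated-likelihood recipe for choosing the number of mixture components, assuming scikit-learn; the two-component synthetic data are a placeholder, not the atmospheric data analysed in the paper:

        # Choose the number of Gaussian mixture components by
        # cross-validated held-out log-likelihood.
        import numpy as np
        from sklearn.mixture import GaussianMixture
        from sklearn.model_selection import KFold

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])

        def cv_loglik(X, k, n_splits=5):
            scores = []
            for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
                gm = GaussianMixture(n_components=k, random_state=0).fit(X[train])
                scores.append(gm.score(X[test]))   # mean log-likelihood per held-out point
            return np.mean(scores)

        best_k = max(range(1, 7), key=lambda k: cv_loglik(X, k))
        print(best_k)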
  84. CHIPMAN, H., E.I. GEORGE and R.E. MCCULLOCH, 2001. The practical implementation of Bayesian model selection. Model Selection. [Cited by 37] (8.09/year)
  85. In principle, the Bayesian approach to model selection is straightforward. Prior probability distributions are used to describe the uncertainty surrounding all unknowns. After observing the data, the posterior distribution provides a coherent post data summary of the remaining uncertainty which is relevant for model selection. However, the practical implementation of this approach often requires carefully tailored priors and novel posterior calculation methods. In this article, we illustrate some of the fundamental practical issues that arise for two different model selection problems: the variable selection problem for the linear model and the CART model selection problem.
  86. SUGIYAMA, M. and H. OGAWA, 2001. Subspace Information Criterion for Model Selection. Neural Computation. [Cited by 37] (8.09/year)
  87. The problem of model selection is considerably important for acquiring higher levels of generalization capability in supervised learning. In this article, we propose a new criterion for model selection, the subspace information criterion (SIC), which is a generalization of Mallows's CL. It is assumed that the learning target function belongs to a specified functional Hilbert space and the generalization error is defined as the Hilbert space squared norm of the difference between the learning result function and target function. SIC gives an unbiased estimate of the generalization error so defined. SIC assumes the availability of an unbiased estimate of the target function and the noise covariance matrix, which are generally unknown. A practical calculation method of SIC for least-mean-squares learning is provided under the assumption that the dimension of the Hilbert space is less than the number of training examples. Finally, computer simulations in two examples show that SIC works well even when the number of training examples is small.
  88. RAJA, Y., S. MCKENNA and S. GONG, 1998. Colour model selection and adaptation in dynamic scenes. Proc. of European Conf. on Computer Vision. [Cited by 60] (7.92/year)
  89. We use colour mixture models for real-time colour-based object localisation, tracking and segmentation in dynamic scenes. Within such a framework, we address the issues of model order selection, modelling scene background and model adaptation in time. Experimental results are given to demonstrate our approach in different scale and lighting conditions.
  90. MYUNG, I.J., V. BALASUBRAMANIAN and M.A. PITT, 2000. Counting probability distributions: Differential geometry and model selection. PNAS. [Cited by 44] (7.89/year)
  91. A central problem in science is deciding among competing explanations of data containing random errors. We argue that assessing the "complexity" of explanations is essential to a theoretically well-founded model selection procedure. We formulate model complexity in terms of the geometry of the space of probability distributions. Geometric complexity provides a clear intuitive understanding of several extant notions of model complexity. This approach allows us to reconceptualize the model selection problem as one of counting explanations that lie close to the "truth." We demonstrate the usefulness of the approach by applying it to the recovery of models in psychophysics.
  92. BARAUD, Y., 2002. Model selection for regression on a random design. ESAIM P & S. [Cited by 27] (7.55/year)
  93. We consider the problem of estimating an unknown regression function when the design is random with values in R^k. Our estimation procedure is based on model selection and does not rely on any prior information on the target function. We start with a collection of linear functional spaces and build, on a data selected space among this collection, the least-squares estimator. We study the performance of an estimator which is obtained by modifying this least-squares estimator on a set of small probability. For the so-defined estimator, we establish nonasymptotic risk bounds that can be related to oracle inequalities. As a consequence of these, we show that our estimator possesses adaptive properties in the minimax sense over large families of Besov balls of radius R > 0. We also study the particular case where the regression function is additive and then obtain an additive estimator which converges at the same rate as it does when k=1.
  94. BURNHAM, K.P., G.C. WHITE and D.R. ANDERSON, 1995. Model Selection Strategy in the Analysis of Capture-Recapture Data. Biometrics. [Cited by 78] (7.38/year)
  95. FORSTER, M.R., 2000. Key concepts in model selection: Performance and generalizability. Journal of Mathematical Psychology. [Cited by 41] (7.35/year)
  96. What is model selection? What are the goals of model selection? What are the methods of model selection, and how do they work? Which methods perform better than others, and in what circumstances? These questions rest on a number of key concepts in a relatively underdeveloped field. The aim of this essay is to explain some background concepts, highlight some of the results in this special issue, and to add my own. The standard methods of model selection include classical hypothesis testing, maximum likelihood, Bayes method, minimum description length, cross-validation and Akaike's information criterion. They all provide an implementation of Occam's razor, in which parsimony or simplicity is balanced against goodness-of-fit. These methods primarily take account of the sampling errors in parameter estimation, although their relative success at this task depends on the circumstances. However, the aim of model selection should also include the ability of a model to generalize to predictions in a different domain. Errors of extrapolation, or generalization, are different from errors of parameter estimation. So, it seems that simplicity and parsimony may be an additional factor in managing these errors, in which case the standard methods of model selection are incomplete implementations of Occam's razor.
  97. BERGER, J.O. and L.R. PERICCHI, 2001. Objective Bayesian Methods for Model Selection: Introduction and Comparison. Model Selection. [Cited by 33] (7.21/year)
  98. The basics of the Bayesian approach to model selection are first presented, as well as the motivations for the Bayesian approach. We then review four methods of developing default Bayesian procedures that have undergone considerable recent development, the Conventional Prior approach, the Bayes Information Criterion, the Intrinsic Bayes Factor, and the Fractional Bayes Factor. As part of the review, these methods are illustrated on examples involving the normal linear model. The later part of the chapter focuses on comparison of the four approaches, and includes an extensive discussion of criteria for judging model selection procedures. As the chapter is lengthy, we include here an index to the sections.
  99. SWANSON, N.R. and H. WHITE, 1995. A Model-Selection Approach to Assessing the Information in the Term Structure Using Linear Models …. Journal of Business & Economic Statistics. [Cited by 76] (7.19/year)
  100. We take a model selection approach to the question of whether forward interest rates are useful in predicting future spot rates, using a variety of out-of-sample forecast-based model selection criteria: forecast mean squared error, forecast direction accuracy, and forecast-based trading system profitability. We also examine the usefulness of a class of novel prediction models called "artificial neural networks," and investigate the issue of appropriate window sizes for rolling-window-based prediction methods. Results indicate that the premium of the forward rate over the spot rate helps to predict the sign of future changes in the interest rate. Further, model selection based on an in-sample Schwarz Information Criterion (SIC) does not appear to be a reliable guide to out-of-sample performance, in the case of short-term interest rates. Thus, the in-sample SIC apparently fails to offer a convenient shortcut to true out-of-sample performance measures.
  101. GRÜNWALD, P., 2000. Model selection based on minimum description length. Journal of Mathematical Psychology. [Cited by 39] (6.99/year)
  102. We introduce the minimum description length (MDL) principle, a general principle for inductive inference based on the idea that regularities (laws) underlying data can always be used to compress data. We introduce the fundamental concept of MDL, called the stochastic complexity, and we show how it can be used for model selection. We briefly compare MDL-based model selection to other approaches and we informally explain why we may expect MDL to give good results in practical applications.
  103. BARAUD, Y., 2000. Model selection for regression on a fixed design. Probab. Theory Related Fields. [Cited by 39] (6.99/year)
  104. We deal with the problem of estimating some unknown regression function involved in a regression framework with deterministic design points. To this end, we consider some collection of finite dimensional linear spaces (models) and the least-squares estimator built on a data driven selected model among this collection. This data driven choice is performed via the minimization of some penalized model selection criterion that generalizes Mallows' Cp. We provide non asymptotic risk bounds for the so-defined estimator from which we deduce adaptivity properties. Our results hold under mild moment conditions on the errors. The statement and the use of a new moment inequality for empirical processes is at the heart of the techniques involved in our approach.
  105. ZUCCHINI, W., 2000. An Introduction to Model Selection. Journal of Mathematical Psychology. [Cited by 38] (6.82/year)
  106. This paper is an introduction to model selection intended for nonspecialists who have knowledge of the statistical concepts covered in a typical first (occasionally second) statistics course. The intention is to explain the ideas that generate frequentist methodology for model selection, for example the Akaike information criterion, bootstrap criteria, and cross-validation criteria. Bayesian methods, including the Bayesian information criterion, are also mentioned in the context of the framework outlined in the paper. The ideas are illustrated using an example in which observations are available for the entire population of interest. This enables us to examine and to measure effects that are usually invisible, because in practical applications only a sample from the population is observed. The problem of selection bias, a hazard of which one needs to be aware in the context of model selection, is also discussed.
  107. ANDRIEU, C., N. DE FREITAS and A. DOUCET, 1999. Sequential MCMC for Bayesian Model Selection. Higher-Order Statistics, 1999. Proceedings of the IEEE …. [Cited by 40] (6.08/year)
  108. CHAO, J.C. and P.C.B. PHILLIPS, 1999. Model selection in partially nonstationary vector autoregressive processes with reduced rank …. Journal of Econometrics. [Cited by 40] (6.08/year)
  109. SEEGER, M., 1999. Bayesian Model Selection for Support Vector Machines, Gaussian Processes and Other Kernel …. NIPS. [Cited by 40] (6.08/year)
  110. KONISHI, S. and G. KITAGAWA, 1996. Generalised information criteria in model selection. Biometrika. [Cited by 58] (6.06/year)
  111. The problem of evaluating the goodness of statistical models is investigated from an information-theoretic point of view. Information criteria are proposed for evaluating models constructed by various estimation procedures when the specified family of probability distributions does not contain the distribution generating the data. The proposed criteria are applied to the evaluation of models estimated by maximum likelihood, robust, penalised likelihood, Bayes procedures, etc. We also discuss the use of the bootstrap in model evaluation problems and present a variance reduction technique in the bootstrap simulation.
  112. KANATANI, K., 1998. Geometric information criterion for model selection. International Journal of Computer Vision. [Cited by 45] (5.94/year)
  113. In building a 3-D model of the environment from image and sensor data, one must fit to the data an appropriate class of models, which can be regarded as a parametrized manifold, or geometric model, defined in the data space. In this paper, we present a statistical framework for detecting degeneracies of a geometric model by evaluating its predictive capability in terms of the expected residual and derive the geometric AIC. We show that it allows us to detect singularities in a structure-from-motion analysis without introducing any empirically adjustable thresholds. We illustrate our approach by simulation examples. We also discuss the application potential of this theory for a wide range of computer vision and robotics problems.
  114. OLDEN, J.D. and D.A. JACKSON, 2000. Torturing data for the sake of generality: How valid are our regression models?. Ecoscience. [Cited by 32] (5.74/year)
  115. Multiple regression analysis continues to be a quantitative tool used extensively in the ecological literature. Consequently, methods for model selection and validation are important considerations, yet ecologists appear to pay little attention to how the choice of method can potentially influence the outcome and interpretation of their results. In this study we review commonly employed model selection and validation methods and use a Monte Carlo simulation approach to evaluate their ability to accurately estimate variable inclusion in the final regression model and model prediction error. We found that all methods of model selection erroneously excluded or included variables in the final model and the error rate depended on sample size and the number of predictor variables. In general, forward selection, backward elimination and stepwise selection showed better performance with small sample sizes, whereas a modified bootstrap approach outperformed other methods with larger sample sizes. Model selection using all-subsets or exhaustive search was highly biased, at times never selecting the correct predictor variables. Methods for model validation were also highly biased, with resubstitution and data-splitting (i.e., dividing the data into training and test samples) techniques producing biased and variable estimates of model prediction error. In contrast, jackknife validation was generally unbiased. Using an empirical example we show that the interpretation of the ecological relationships between fish species richness and lake habitat is highly dependent on the type of model selection and validation method employed. The fact that model selection is frequently unsuited to determine correct ecological relationships, and that traditional approaches for model validation over-estimate the strength and value of our empirical models, is a major concern.
  116. GRANGER, C.W.J., M.L. KING and H. WHITE, 1995. Comments on testing economic theories and the use of model selection criteria. Journal of Econometrics. [Cited by 56] (5.30/year)
  117. This paper outlines difficulties with testing economic theories, particularly that the theories may be vague, may relate to a decision interval different from the observation period, and may need a metric to convert a complicated testing situation to an easier one. We argue that it is better to use model selection procedures rather than formal hypothesis testing when deciding on model specification. This is because testing favors the null hypothesis, typically uses an arbitrary choice of significance level, and researchers using the same data can end up with different final models.
  118. TIBSHIRANI, R. and K. KNIGHT, 1999. The covariance inflation criterion for adaptive model selection. Journal of the Royal Statistical Society, B. [Cited by 33] (5.02/year)
  119. We propose a new criterion for model selection in prediction problems. The covariance inflation criterion adjusts the training error by the average covariance of the predictions and responses, when the prediction rule is applied to permuted versions of the data set. This criterion can be applied to general prediction problems (e.g. regression or classification) and to general prediction rules (e.g. stepwise regression, tree-based models and neural nets). As a by-product we obtain a measure of the effective number of parameters used by an adaptive procedure. We relate the covariance inflation criterion to other model selection procedures and illustrate its use in some regression and classification problems. We also revisit the conditional bootstrap approach to model selection.
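A rough reading of the covariance inflation idea in code: refit the rule on permuted responses, average the covariance between fitted values and the (permuted) responses, and add twice that average to the training error. The sketch below assumes numpy, uses squared-error loss, and is my paraphrase of the abstract rather than a faithful reimplementation of the published criterion.

```python
# Covariance-inflation-style adjustment of training error (rough sketch).
import numpy as np

rng = np.random.default_rng(2)

def cic_score(fit_predict, X, y, n_perm=200):
    yhat = fit_predict(X, y, X)
    train_err = np.mean((y - yhat) ** 2)
    covs = []
    for _ in range(n_perm):
        yp = rng.permutation(y)                    # permuted responses
        yhat_p = fit_predict(X, yp, X)             # rule refitted on permuted data
        covs.append(np.mean((yhat_p - yhat_p.mean()) * (yp - yp.mean())))
    return train_err + 2.0 * np.mean(covs)         # inflate by average covariance

# Example adaptive family: OLS on the first k columns.
def ols_rule(k):
    def fit_predict(Xtr, ytr, Xte):
        b = np.linalg.lstsq(Xtr[:, :k], ytr, rcond=None)[0]
        return Xte[:, :k] @ b
    return fit_predict

X = rng.standard_normal((80, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_normal(80)
scores = {k: cic_score(ols_rule(k), X, y) for k in range(1, 11)}
print("selected k:", min(scores, key=scores.get))
```

For OLS the permutation covariance recovers a Mallows-style penalty of roughly 2k*sigma^2/n; the attraction of the criterion is that the same recipe applies to rules, such as stepwise selection or trees, whose effective number of parameters is not known in advance.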
  120. YANG, Y. and A.R. BARRON, 1998. An asymptotic property of model selection criteria. IEEE Trans. Inform. Theory. [Cited by 37] (4.88/year)
  121. Probability models are estimated by use of penalized log-likelihood criteria related to Akaike's (1973) information criterion (AIC) and minimum description length (MDL). The accuracies of the density estimators are shown to be related to the tradeoff between three terms: the accuracy of approximation, the model dimension, and the descriptive complexity of the model classes. The asymptotic risk is determined under conditions on the penalty term, and is shown to be minimax optimal in some cases. As an application, we show that the optimal rate of convergence is simultaneously achieved for log-densities in Sobolev spaces W_2^s(U) without knowing the smoothness parameter s and norm parameter U in advance. Applications to neural network models and sparse density function estimation are also provided.
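The fit-versus-dimension trade-off behind these penalized log-likelihood criteria is most familiar in this form: score each candidate model by -2 log-likelihood plus a penalty that grows with the number of parameters. The sketch below assumes scikit-learn and chooses the number of Gaussian mixture components by AIC and BIC; it illustrates the general mechanism, not the estimators analysed in the paper.

```python
# Penalised log-likelihood model selection: pick the mixture size by AIC/BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)]).reshape(-1, 1)

for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(x)
    # Both criteria are -2*loglik plus a penalty increasing with model dimension.
    print(f"k={k}  AIC={gm.aic(x):8.1f}  BIC={gm.bic(x):8.1f}")
```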
  122. CHRISTOFFERSEN, P.F. and F.X. DIEBOLD, 1996. Further results on forecasting and model selection under asymmetric loss. Journal of Applied Econometrics. [Cited by 46] (4.80/year)
  123. We make three related contributions. First, we propose a new technique for solving prediction problems under asymmetric loss using piecewise-linear approximations to the loss function, and we establish existence and uniqueness of the optimal predictor. Second, we provide a detailed application to optimal prediction of a conditionally heteroskedastic process under asymmetric loss, the insights gained from which are broadly applicable. Finally, we incorporate our results into a general framework for recursive prediction-based model selection under the relevant loss function.
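For the simplest asymmetric piecewise-linear ("lin-lin") loss the optimal predictor has a closed form: it is a quantile of the predictive distribution rather than its mean. The sketch below (numpy assumed; the cost parameters a and b are arbitrary choices of mine) checks this numerically, and is only a textbook special case rather than the paper's general piecewise-linear construction.

```python
# Prediction under asymmetric piecewise-linear loss: the optimal point forecast
# is the a/(a+b) quantile of the predictive distribution, not its mean.
import numpy as np

rng = np.random.default_rng(4)
a, b = 3.0, 1.0                         # under-prediction three times as costly
draws = rng.normal(loc=0.0, scale=2.0, size=100_000)   # predictive distribution

def expected_loss(q):
    e = draws - q                       # positive e = under-prediction
    return np.mean(a * np.clip(e, 0, None) + b * np.clip(-e, 0, None))

grid = np.linspace(-3, 3, 601)
best = grid[np.argmin([expected_loss(q) for q in grid])]
print(f"grid optimum {best:.2f} vs "
      f"theoretical quantile {np.quantile(draws, a / (a + b)):.2f}")
```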
  124. TORR, P.H.S., 1997. An assessment of information criteria for motion model selection. Computer Vision and Pattern Recognition, 1997. Proceedings., …. [Cited by 40] (4.66/year)
  125. Rigid motion imposes constraints on the motion of image points between two images. The matched points must conform to one of several possible constraints, such as that given by the fundamental matrix or an image-image homography, and it is essential to know which model to fit to the data before recovery of structure, matching or segmentation can be performed successfully. This paper compares several model selection methods, with particular emphasis on providing a method that will work fully automatically on real imagery.
  126. SHAO, J., 1996. Bootstrap Model Selection.. Journal of the American Statistical Association. [Cited by 43] (4.49/year)
  127. In a regression problem, typically there are p explanatory variables possibly related to a response variable, and we wish to select a subset of the p explanatory variables to fit a model between these variables and the response. A bootstrap variable/model selection procedure is to select the subset of variables by minimizing bootstrap estimates of the prediction error, where the bootstrap estimates are constructed based on a data set of size n. Although the bootstrap estimates have good properties, this bootstrap selection procedure is inconsistent in the sense that the probability of selecting the optimal subset of variables does not converge to 1 as n → ∞. This inconsistency can be rectified by modifying the sampling method used in drawing bootstrap observations. For bootstrapping pairs (response, explanatory variable), it is found that instead of drawing n bootstrap observations (the customary bootstrap sampling plan), far fewer bootstrap observations should be sampled: the bootstrap selection procedure becomes consistent if we draw m bootstrap observations with m → ∞ and m/n → 0. For bootstrapping residuals, we modify the bootstrap sampling procedure by increasing the variability among the bootstrap observations. The consistency of the modified bootstrap selection procedures is established in various situations, including linear models, nonlinear models, generalized linear models, and autoregressive time series. The choice of the bootstrap sample size m and some computational issues are also discussed. Some empirical results are presented.
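The m-out-of-n remedy for bootstrapping pairs can be sketched directly: estimate each candidate subset's prediction error from bootstrap samples of size m, with m growing more slowly than n, and keep the subset with the smallest estimate. The code below assumes numpy, uses an all-subsets search on a toy problem and a crude error proxy, and is not Shao's exact estimator.

```python
# m-out-of-n bootstrap variable selection (sketch).
import itertools
import numpy as np

rng = np.random.default_rng(5)
n, p = 200, 4
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - X[:, 1] + rng.standard_normal(n)   # only variables 0 and 1 matter

def boot_pred_error(cols, m, B=200):
    cols = list(cols)
    err = 0.0
    for _ in range(B):
        idx = rng.integers(0, n, size=m)              # bootstrap pairs of size m
        b = np.linalg.lstsq(X[np.ix_(idx, cols)], y[idx], rcond=None)[0]
        err += np.mean((y - X[:, cols] @ b) ** 2)     # crude proxy: error on the full data
    return err / B

m = int(n ** (2 / 3))                                 # m -> infinity, m/n -> 0
subsets = [s for r in range(1, p + 1) for s in itertools.combinations(range(p), r)]
best = min(subsets, key=lambda s: boot_pred_error(s, m))
print("selected variables:", best)
```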
  128. DOM, B. and S. VAITHYANATHAN, 1999. Model selection in unsupervised learning with applications to document clustering. The Sixteenth International Conference on Machine Learning. [Cited by 29] (4.41/year)
  129. MARON, O. and A.W. MOORE, 1994. Hoeffding races: Accelerating model selection search for classification and function approximation. Advances in Neural Information Processing Systems. [Cited by 51] (4.41/year)
  130. Selecting a good model of a set of input points by cross validation is a computationally intensive process, especially if the number of possible models or the number of training points is high. Techniques such as gradient descent are helpful in searching through the space of models, but problems such as local minima, and more importantly, lack of a distance metric between various models reduce the applicability of these search methods. Hoeffding Races is a technique for finding a good model for the data by quickly discarding bad models, and concentrating the computational effort at differentiating between the better ones. This paper focuses on the special case of leave-one-out cross validation applied to memory-based learning algorithms, but we also argue that it is applicable to any class of model selection problems.
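A toy version of the race: keep a running mean loss for every surviving model and, after each new held-out point, discard any model whose Hoeffding lower confidence bound lies above the best model's upper bound. The sketch below assumes numpy and uses bounded 0/1 losses drawn from fixed rates as a stand-in for the leave-one-out errors of real learners; it shows the elimination logic only.

```python
# Minimal Hoeffding-race sketch with losses bounded in [0, 1].
import numpy as np

rng = np.random.default_rng(6)
n_points, delta = 500, 0.05
true_rates = [0.10, 0.12, 0.25, 0.40]        # toy "models" = loss streams
losses = rng.random((n_points, len(true_rates))) < np.array(true_rates)  # 0/1 losses

alive = set(range(len(true_rates)))
sums = np.zeros(len(true_rates))
for t in range(1, n_points + 1):
    for j in alive:
        sums[j] += losses[t - 1, j]
    eps = np.sqrt(np.log(2 * len(true_rates) / delta) / (2 * t))  # Hoeffding half-width
    means = sums / t
    best_upper = min(means[j] + eps for j in alive)
    alive = {j for j in alive if means[j] - eps <= best_upper}    # drop clear losers
    if len(alive) == 1:
        break
print(f"after {t} points the race keeps model(s) {sorted(alive)}")
```

The computational saving comes from the break condition: clearly bad models stop being evaluated long before all leave-one-out folds are computed, while close competitors keep racing.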
  131. MEADE, N. and T. ISLAM, 1998. Technological Forecasting--Model Selection, Model Stability, and Combining Models. Management Science. [Cited by 33] (4.36/year)
  132. The paper identifies 29 models that the literature suggests are appropriate for technological forecasting. These models are divided into three classes according to the timing of the point of inflexion in the innovation or substitution process. Faced with a given data set and such a choice, the issue of model selection needs to be addressed. Evidence used to aid model selection is drawn from measures of model fit and model stability. An analysis of the forecasting performance of these models using simulated data sets shows that it is easier to identify a class of possible models rather than the "best" model. This leads to the combining of model forecasts. The performance of the combined forecasts appears promising with a tendency to outperform the individual component models.
  133. SCHUURMANS, D., 1997. A New Metric-Based Approach to Model Selection. AAAI/IAAI. [Cited by 37] (4.31/year)
  134. BERAN, J., R.J. BHANSALI and D. OCKER, 1998. On Unified Model Selection for Stationary and Nonstationary Short- and Long-Memory Autoregressive …. Biometrika. [Cited by 31] (4.09/year)
  135. The question of model choice for the class of stationary and nonstationary, fractional and nonfractional autoregressive processes is considered. This class is defined by the property that the dth difference, for -1/2 < d < ∞, is a stationary autoregressive process of order p0 < ∞. A version of the Akaike information criterion, AIC, for determining an appropriate autoregressive order when d and the autoregressive parameters are estimated simultaneously by a maximum likelihood procedure (Beran, 1995) is derived and shown to be of the same general form as for a stationary autoregressive process, but with d treated as an additional estimated parameter. Moreover, as in the stationary case, this criterion is shown not to provide a consistent estimator of p0. The corresponding versions of the BIC of Schwarz (1978) and the HIC of Hannan & Quinn (1979) are shown to yield consistent estimators of p0. The results provide a unified treatment of fractional and nonfractional, stationary and integrated nonstationary autoregressive models.
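For the ordinary stationary case (d = 0) the three criteria compared in this paper reduce to familiar penalized log-likelihoods, which is enough to illustrate the consistency question quoted above: AIC may select too large an order, while BIC and Hannan-Quinn settle on the true one as the sample grows. The sketch below is a plain-numpy AR order selection on simulated data; the paper's unified criterion, which also estimates d, is not implemented.

```python
# AR order selection by AIC, BIC and Hannan-Quinn for a stationary AR process.
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = np.zeros(n)
for t in range(2, n):                                  # simulate an AR(2)
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()

def criteria(x, p, max_p):
    y = x[max_p:]                                      # common sample for all orders
    X = np.column_stack([x[max_p - k: n - k] for k in range(1, p + 1)]) if p else np.empty((len(y), 0))
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0] if p else y
    m = len(y)
    ll = -0.5 * m * (np.log(2 * np.pi * resid.var()) + 1)  # profiled Gaussian loglik
    k = p + 1                                          # AR coefficients + innovation variance
    return {"AIC": -2 * ll + 2 * k,
            "BIC": -2 * ll + k * np.log(m),
            "HQ": -2 * ll + 2 * k * np.log(np.log(m))}

max_p = 8
table = {p: criteria(x, p, max_p) for p in range(max_p + 1)}
for name in ("AIC", "BIC", "HQ"):
    print(name, "selects p =", min(table, key=lambda p: table[p][name]))
```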
  136. MADIGAN, D., et al., 1996. Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Comm. Statist. Theory Methods. [Cited by 39] (4.07/year)