diff doc/interpreter/stats.txi @ 11436:e151e23f73bc
Overhaul base statistics functions and documentation of same.
Add or improve input validation.
Add input validation tests.
Add functional tests.
Improve or re-write documentation strings.

| author   | Rik <octave@nomad.inbox5.com>   |
|----------|---------------------------------|
| date     | Mon, 03 Jan 2011 21:23:08 -0800 |
| parents  | 757efa1d7e2a                    |
| children | 76f15f3da207                    |

--- a/doc/interpreter/stats.txi
+++ b/doc/interpreter/stats.txi
@@ -21,12 +21,12 @@
 @chapter Statistics
 
 Octave has support for various statistical methods. This includes
-basic descriptive statistics, statistical tests, random number generation,
-and much more.
+basic descriptive statistics, probability distributions, statistical tests,
+random number generation, and much more.
 
-The functions that analyze data all assume that multi-dimensional data
+The functions that analyze data all assume that multidimensional data
 is arranged in a matrix where each row is an observation, and each
-column is a variable. So, the matrix defined by
+column is a variable. Thus, the matrix defined by
 
 @example
 @group
@@ -42,91 +42,101 @@
 different arrangements.
 
 It should be noted that the statistics functions don't test for data
-containing NaN, NA, or Inf. Such values need to be handled explicitly.
+containing NaN, NA, or Inf. These values need to be detected and dealt
+with explicitly. See @ref{doc-isnan,,isnan}, @ref{doc-isna,,isna},
+@ref{doc-isinf,,isinf}, @ref{doc-isfinite,,isfinite}.
 
 @menu
 * Descriptive Statistics::
 * Basic Statistical Functions::
 * Statistical Plots::
+* Correlation and Regression Analysis::
+* Distributions::
 * Tests::
-* Models::
-* Distributions::
 * Random Number Generation::
 @end menu
 
 @node Descriptive Statistics
 @section Descriptive Statistics
 
-Octave can compute various statistics such as the moments of a data set.
+One principal goal of descriptive statistics is to represent the essence of a
+large data set concisely. Octave provides the mean, median, and mode functions
+which all summarize a data set with just a single number corresponding to
+the central tendency of the data.
 
 @DOCSTRING(mean)
 
 @DOCSTRING(median)
 
-@DOCSTRING(quantile)
+@DOCSTRING(mode)
 
-@DOCSTRING(prctile)
+Using just one number, such as the mean, to represent an entire data set may
+not give an accurate picture of the data. One way to characterize the fit is
+to measure the dispersion of the data. Octave provides several functions for
+measuring dispersion.
+
+@DOCSTRING(range)
+
+@DOCSTRING(iqr)
 
 @DOCSTRING(meansq)
 
 @DOCSTRING(std)
 
+In addition to knowing the size of a dispersion it is useful to know the shape
+of the data set. For example, are data points massed to the left or right
+of the mean? Octave provides several common measures to describe the shape
+of the data set. Octave can also calculate moments allowing arbitrary shape
+measures to be developed.
+
 @DOCSTRING(var)
 
-@DOCSTRING(mode)
-
-@DOCSTRING(cov)
-
-@DOCSTRING(cor)
-
-@DOCSTRING(corrcoef)
+@DOCSTRING(skewness)
 
 @DOCSTRING(kurtosis)
 
-@DOCSTRING(skewness)
+@DOCSTRING(moment)
+
+A summary view of a data set can be generated quickly with the
+@code{statistics} function.
 
 @DOCSTRING(statistics)
 
-@DOCSTRING(moment)
-
 @node Basic Statistical Functions
 @section Basic Statistical Functions
 
-Octave also supports various helpful statistical functions.
-
-@DOCSTRING(mahalanobis)
+Octave supports various helpful statistical functions. Many are useful as
+initial steps to prepare a data set for further analysis. Others provide
+different measures from those of the basic descriptive statistics.
 
 @DOCSTRING(center)
 
 @DOCSTRING(studentize)
 
-@DOCSTRING(nchoosek)
+@DOCSTRING(histc)
+
+@DOCSTRING(cut)
 
-@DOCSTRING(histc)
+@c FIXME: really want to put a reference to unique here
+@c @DOCSTRING(values)
+
+@DOCSTRING(nchoosek)
 
 @DOCSTRING(perms)
 
-@DOCSTRING(table)
-
-@DOCSTRING(spearman)
+@DOCSTRING(ranks)
 
 @DOCSTRING(run_count)
 
-@DOCSTRING(ranks)
-
-@DOCSTRING(range)
-
 @DOCSTRING(probit)
 
 @DOCSTRING(logit)
 
 @DOCSTRING(cloglog)
 
-@DOCSTRING(kendall)
+@DOCSTRING(mahalanobis)
 
-@DOCSTRING(iqr)
-
-@DOCSTRING(cut)
+@DOCSTRING(table)
 
 @node Statistical Plots
 @section Statistical Plots
@@ -146,127 +156,23 @@
 @DOCSTRING(ppplot)
 
-@node Tests
-@section Tests
+@node Correlation and Regression Analysis
+@section Correlation and Regression Analysis
 
-Octave can perform several different statistical tests. The following
-table summarizes the available tests.
+@c FIXME: Need Intro Here
+
+@DOCSTRING(cov)
+
+@DOCSTRING(cor)
 
-@tex
-\vskip 6pt
-{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt
-\halign{
-\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em &
-# \hfil & \vrule # & # \hfil & # \vrule width 0.6pt \tabskip=0pt\cr
-\noalign{\hrule height 0.6pt}
-& @strong{Hypothesis} && {\bf Test Functions} &\cr
-\noalign{\hrule}
-& Equal mean values && anova, hotelling\_test2, t\_test\_2, &\cr
-& && welch\_test, wilcoxon\_test, z\_test\_2 &\cr
-& Equal medians && kruskal\_wallis\_test, sign\_test &\cr
-& Equal variances && bartlett\_test, manova, var\_test &\cr
-& Equal distributions && chisquare\_test\_homogeneity, &\cr
-& && kolmogorov\_smirnov\_test\_2, u\_test &\cr
-& Equal marginal frequencies && mcnemar\_test &\cr
-& Equal success probabilities && prop\_test\_2 &\cr
-& Independent observations && chisquare\_test\_independence, &\cr
-& && run\_test &\cr
-& Uncorrelated observations && cor\_test &\cr
-& Given mean value && hotelling\_test, t\_test, z\_test &\cr
-& Observations from distribution && kolmogorov\_smirnov\_test &\cr
-& Regression && f\_test\_regression, t\_test\_regression &\cr
-\noalign{\hrule height 0.6pt}
-}}\hfill}}
-@end tex
-@ifnottex
-@multitable @columnfractions .4 .5
-@item @strong{Hypothesis}
- @tab @strong{Test Functions}
-@item Equal mean values
- @tab @code{anova}, @code{hotelling_test2}, @code{t_test_2},
- @code{welch_test}, @code{wilcoxon_test}, @code{z_test_2}
-@item Equal medians
- @tab @code{kruskal_wallis_test}, @code{sign_test}
-@item Equal variances
- @tab @code{bartlett_test}, @code{manova}, @code{var_test}
-@item Equal distributions
- @tab @code{chisquare_test_homogeneity}, @code{kolmogorov_smirnov_test_2},
- @code{u_test}
-@item Equal marginal frequencies
- @tab @code{mcnemar_test}
-@item Equal success probabilities
- @tab @code{prop_test_2}
-@item Independent observations
- @tab @code{chisquare_test_independence}, @code{run_test}
-@item Uncorrelated observations
- @tab @code{cor_test}
-@item Given mean value
- @tab @code{hotelling_test}, @code{t_test}, @code{z_test}
-@item Observations from given distribution
- @tab @code{kolmogorov_smirnov_test}
-@item Regression
- @tab @code{f_test_regression}, @code{t_test_regression}
-@end multitable
-@end ifnottex
+@DOCSTRING(corrcoef)
+
+@DOCSTRING(spearman)
 
-The tests return a p-value that describes the outcome of the test.
-Assuming that the test hypothesis is true, the p-value is the probability
-of obtaining a worse result than the observed one. So large p-values
-corresponds to a successful test. Usually a test hypothesis is accepted
-if the p-value exceeds @math{0.05}.
-
-@DOCSTRING(anova)
-
-@DOCSTRING(bartlett_test)
-
-@DOCSTRING(chisquare_test_homogeneity)
-
-@DOCSTRING(chisquare_test_independence)
-
-@DOCSTRING(cor_test)
-
-@DOCSTRING(f_test_regression)
-
-@DOCSTRING(hotelling_test)
-
-@DOCSTRING(hotelling_test_2)
-
-@DOCSTRING(kolmogorov_smirnov_test)
-
-@DOCSTRING(kolmogorov_smirnov_test_2)
-
-@DOCSTRING(kruskal_wallis_test)
+@DOCSTRING(kendall)
 
-@DOCSTRING(manova)
-
-@DOCSTRING(mcnemar_test)
-
-@DOCSTRING(prop_test_2)
-
-@DOCSTRING(run_test)
-
-@DOCSTRING(sign_test)
-
-@DOCSTRING(t_test)
-
-@DOCSTRING(t_test_2)
+@c FIXME: Need discussion of ols & gls and references to them in optim.txi
 
-@DOCSTRING(t_test_regression)
-
-@DOCSTRING(u_test)
-
-@DOCSTRING(var_test)
-
-@DOCSTRING(welch_test)
-
-@DOCSTRING(wilcoxon_test)
-
-@DOCSTRING(z_test)
-
-@DOCSTRING(z_test_2)
-
-@node Models
-@section Models
 
 @DOCSTRING(logistic_regression)
 
@@ -275,12 +181,11 @@
 
 Octave has functions for computing the Probability Density Function
 (PDF), the Cumulative Distribution function (CDF), and the quantile
-(the inverse of the CDF) of a large number of distributions.
+(the inverse of the CDF) for a large number of distributions.
 
 The following table summarizes the supported distributions (in
 alphabetical order).
 
-@c Do the table explicitly in TeX if possible to get a better layout.
 @tex
 \vskip 6pt
 {\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt
@@ -414,133 +319,252 @@
 @end multitable
 @end ifnottex
 
+@DOCSTRING(betapdf)
+
 @DOCSTRING(betacdf)
 
 @DOCSTRING(betainv)
 
-@DOCSTRING(betapdf)
+@DOCSTRING(binopdf)
 
 @DOCSTRING(binocdf)
 
 @DOCSTRING(binoinv)
 
-@DOCSTRING(binopdf)
+@DOCSTRING(cauchy_pdf)
 
 @DOCSTRING(cauchy_cdf)
 
 @DOCSTRING(cauchy_inv)
 
-@DOCSTRING(cauchy_pdf)
+@DOCSTRING(chi2pdf)
 
 @DOCSTRING(chi2cdf)
 
 @DOCSTRING(chi2inv)
 
-@DOCSTRING(chi2pdf)
+@DOCSTRING(discrete_pdf)
 
 @DOCSTRING(discrete_cdf)
 
 @DOCSTRING(discrete_inv)
 
-@DOCSTRING(discrete_pdf)
+@DOCSTRING(empirical_pdf)
 
 @DOCSTRING(empirical_cdf)
 
 @DOCSTRING(empirical_inv)
 
-@DOCSTRING(empirical_pdf)
+@DOCSTRING(exppdf)
 
 @DOCSTRING(expcdf)
 
 @DOCSTRING(expinv)
 
-@DOCSTRING(exppdf)
+@DOCSTRING(fpdf)
 
 @DOCSTRING(fcdf)
 
 @DOCSTRING(finv)
 
-@DOCSTRING(fpdf)
+@DOCSTRING(gampdf)
 
 @DOCSTRING(gamcdf)
 
 @DOCSTRING(gaminv)
 
-@DOCSTRING(gampdf)
+@DOCSTRING(geopdf)
 
 @DOCSTRING(geocdf)
 
 @DOCSTRING(geoinv)
 
-@DOCSTRING(geopdf)
+@DOCSTRING(hygepdf)
 
 @DOCSTRING(hygecdf)
 
 @DOCSTRING(hygeinv)
 
-@DOCSTRING(hygepdf)
+@DOCSTRING(kolmogorov_smirnov_cdf)
 
-@DOCSTRING(kolmogorov_smirnov_cdf)
+@DOCSTRING(laplace_pdf)
 
 @DOCSTRING(laplace_cdf)
 
 @DOCSTRING(laplace_inv)
 
-@DOCSTRING(laplace_pdf)
+@DOCSTRING(logistic_pdf)
 
 @DOCSTRING(logistic_cdf)
 
 @DOCSTRING(logistic_inv)
 
-@DOCSTRING(logistic_pdf)
+@DOCSTRING(lognpdf)
 
 @DOCSTRING(logncdf)
 
 @DOCSTRING(logninv)
 
-@DOCSTRING(lognpdf)
+@DOCSTRING(nbinpdf)
 
 @DOCSTRING(nbincdf)
 
 @DOCSTRING(nbininv)
 
-@DOCSTRING(nbinpdf)
+@DOCSTRING(normpdf)
 
 @DOCSTRING(normcdf)
 
 @DOCSTRING(norminv)
 
-@DOCSTRING(normpdf)
+@DOCSTRING(poisspdf)
 
 @DOCSTRING(poisscdf)
 
 @DOCSTRING(poissinv)
 
-@DOCSTRING(poisspdf)
+@DOCSTRING(tpdf)
 
 @DOCSTRING(tcdf)
 
 @DOCSTRING(tinv)
 
-@DOCSTRING(tpdf)
+@DOCSTRING(unidpdf)
 
 @DOCSTRING(unidcdf)
 
 @DOCSTRING(unidinv)
 
-@DOCSTRING(unidpdf)
+@DOCSTRING(unifpdf)
 
 @DOCSTRING(unifcdf)
 
 @DOCSTRING(unifinv)
 
-@DOCSTRING(unifpdf)
+@DOCSTRING(wblpdf)
 
 @DOCSTRING(wblcdf)
 
 @DOCSTRING(wblinv)
 
-@DOCSTRING(wblpdf)
+@node Tests
+@section Tests
+
+Octave can perform many different statistical tests. The following
+table summarizes the available tests.
+
+@tex
+\vskip 6pt
+{\hbox to \hsize {\hfill\vbox{\offinterlineskip \tabskip=0pt
+\halign{
+\vrule height2.0ex depth1.ex width 0.6pt #\tabskip=0.3em &
+# \hfil & \vrule # & # \hfil & # \vrule width 0.6pt \tabskip=0pt\cr
+\noalign{\hrule height 0.6pt}
+& @strong{Hypothesis} && {\bf Test Functions} &\cr
+\noalign{\hrule}
+& Equal mean values && anova, hotelling\_test2, t\_test\_2, &\cr
+& && welch\_test, wilcoxon\_test, z\_test\_2 &\cr
+& Equal medians && kruskal\_wallis\_test, sign\_test &\cr
+& Equal variances && bartlett\_test, manova, var\_test &\cr
+& Equal distributions && chisquare\_test\_homogeneity, &\cr
+& && kolmogorov\_smirnov\_test\_2, u\_test &\cr
+& Equal marginal frequencies && mcnemar\_test &\cr
+& Equal success probabilities && prop\_test\_2 &\cr
+& Independent observations && chisquare\_test\_independence, &\cr
+& && run\_test &\cr
+& Uncorrelated observations && cor\_test &\cr
+& Given mean value && hotelling\_test, t\_test, z\_test &\cr
+& Observations from distribution && kolmogorov\_smirnov\_test &\cr
+& Regression && f\_test\_regression, t\_test\_regression &\cr
+\noalign{\hrule height 0.6pt}
+}}\hfill}}
+@end tex
+@ifnottex
+@multitable @columnfractions .4 .5
+@item @strong{Hypothesis}
+ @tab @strong{Test Functions}
+@item Equal mean values
+ @tab @code{anova}, @code{hotelling_test2}, @code{t_test_2},
+ @code{welch_test}, @code{wilcoxon_test}, @code{z_test_2}
+@item Equal medians
+ @tab @code{kruskal_wallis_test}, @code{sign_test}
+@item Equal variances
+ @tab @code{bartlett_test}, @code{manova}, @code{var_test}
+@item Equal distributions
+ @tab @code{chisquare_test_homogeneity}, @code{kolmogorov_smirnov_test_2},
+ @code{u_test}
+@item Equal marginal frequencies
+ @tab @code{mcnemar_test}
+@item Equal success probabilities
+ @tab @code{prop_test_2}
+@item Independent observations
+ @tab @code{chisquare_test_independence}, @code{run_test}
+@item Uncorrelated observations
+ @tab @code{cor_test}
+@item Given mean value
+ @tab @code{hotelling_test}, @code{t_test}, @code{z_test}
+@item Observations from given distribution
+ @tab @code{kolmogorov_smirnov_test}
+@item Regression
+ @tab @code{f_test_regression}, @code{t_test_regression}
+@end multitable
+@end ifnottex
+
+The tests return a p-value that describes the outcome of the test.
+Assuming that the test hypothesis is true, the p-value is the probability
+of obtaining a worse result than the observed one. So large p-values
+corresponds to a successful test. Usually a test hypothesis is accepted
+if the p-value exceeds 0.05.
+
+@DOCSTRING(anova)
+
+@DOCSTRING(bartlett_test)
+
+@DOCSTRING(chisquare_test_homogeneity)
+
+@DOCSTRING(chisquare_test_independence)
+
+@DOCSTRING(cor_test)
+
+@DOCSTRING(f_test_regression)
+
+@DOCSTRING(hotelling_test)
+
+@DOCSTRING(hotelling_test_2)
+
+@DOCSTRING(kolmogorov_smirnov_test)
+
+@DOCSTRING(kolmogorov_smirnov_test_2)
+
+@DOCSTRING(kruskal_wallis_test)
+
+@DOCSTRING(manova)
+
+@DOCSTRING(mcnemar_test)
+
+@DOCSTRING(prop_test_2)
+
+@DOCSTRING(run_test)
+
+@DOCSTRING(sign_test)
+
+@DOCSTRING(t_test)
+
+@DOCSTRING(t_test_2)
+
+@DOCSTRING(t_test_regression)
+
+@DOCSTRING(u_test)
+
+@DOCSTRING(var_test)
+
+@DOCSTRING(welch_test)
+
+@DOCSTRING(wilcoxon_test)
+
+@DOCSTRING(z_test)
+
+@DOCSTRING(z_test_2)
 
 @node Random Number Generation
 @section Random Number Generation
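
A minimal Octave sketch of the convention described in the chapter introduction (each row an observation, each column a variable) and of the descriptive statistics grouped by the reworked Descriptive Statistics section above; the data matrix is invented for illustration, and every function named here appears in the @DOCSTRING lists of the diff:

```octave
## Invented data matrix: each row is an observation, each column a variable.
data = [25, 70.5;
        32, 68.2;
        41, 74.0;
        29, 71.3;
        37, 69.8];

mean (data)        ## central tendency of each column
median (data)
std (data)         ## dispersion of each column
range (data)
skewness (data)    ## shape of each column
statistics (data)  ## quick per-column summary (min, quartiles, max, mean, ...)
```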
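The Distributions section relies on the pdf/cdf/inv naming pattern visible in the reordered @DOCSTRING lists above. A small sketch using the normal distribution functions from that list; the numeric values in the comments are approximate, not copied Octave output:

```octave
## The same pattern applies to every distribution in the table:
## <name>pdf, <name>cdf, and <name>inv.  Shown for the standard normal
## distribution with mean 0 and standard deviation 1.
normpdf (-1.96, 0, 1)    ## density at x = -1.96
normcdf (-1.96, 0, 1)    ## P(X <= -1.96), roughly 0.025
norminv (0.975, 0, 1)    ## quantile (inverse CDF), roughly 1.96
```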
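A sketch of how the p-value described in the new Tests section might be used with one of the listed functions; t_test is chosen arbitrarily from the table, the sample values are invented, and the 0.05 threshold is the rule of thumb quoted in the text:

```octave
## Invented sample; null hypothesis: the population mean is 10.
x = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3];
pval = t_test (x, 10);      ## first output is the p-value

if (pval < 0.05)
  disp ("mean differs significantly from 10 at the 5% level");
else
  disp ("no significant difference from 10 at the 5% level");
endif
```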