IGLib
1.7.2
The IGLib base library EXTENDED - with other libraries and applications.
Represents a multivariate sample.
Public Member Functions
MultivariateSample (int dimension)
Initializes a new multivariate sample.
MultivariateSample (params string[] names)
Initializes a new multivariate sample with the given variable names.
void | Add (params double[] value)
Adds an entry to the sample.
void | Add (IList< double > value)
Adds an entry to the sample.
bool | Remove (params double[] value)
Removes an entry from the sample.
bool | Remove (IList< double > values)
Removes an entry from the sample.
bool | Contains (params double[] value)
Determines whether the sample contains a given entry.
bool | Contains (IList< double > value)
Determines whether the sample contains a given entry.
void | Clear ()
Removes all entries from the sample.
Sample | Column (int c)
Gets the indicated column as a univariate Sample.
BivariateSample | TwoColumns (int cx, int cy)
Gets the indicated columns as a BivariateSample.
MultivariateSample | Columns (IList< int > columnIndexes)
Gets the indicated columns as a multivariate sample.
MultivariateSample | Columns (params int[] columnIndexes)
Gets the indicated columns as a multivariate sample.
MultivariateSample | Copy ()
Copies the multivariate sample.
double | Moment (params int[] powers)
Computes the given sample raw moment.
double | Moment (IList< int > powers)
Computes the given sample raw moment.
double | MomentAboutMean (IList< int > powers)
Computes the given sample central moment.
double | MomentAboutMean (params int[] powers)
Computes the given sample central moment.
FitResult | LinearRegression (int outputIndex)
Performs a linear regression analysis.
IEnumerator< double[]> | GetEnumerator ()
Gets an enumerator over the sample entries.
PrincipalComponentAnalysis | PrincipalComponentAnalysis ()
Performs a principal component analysis of the data.
void | Load (IDataReader reader, IList< int > dbIndexes)
Loads values from a data reader.
void | Load (IDataReader reader, params int[] dbIndexes)
Loads values from a data reader.
Properties
int | Dimension [get]
Gets the dimension of the sample.
int | Count [get]
Gets the number of entries in the sample.
bool | IsReadOnly [get]
Gets a value indicating whether the multivariate sample can be modified.
Private Member Functions
int | IndexOf (IList< double > value)
bool | Matches (IList< double > value, int i)
FitResult | LinearRegression_Internal (int outputIndex)
IEnumerator | IEnumerable.GetEnumerator ()
void | ICollection< double[]>.CopyTo (double[][] array, int start)
bool | ReadValues (IDataReader reader, IList< int > dbIndexes, double[] entry)
Private Attributes
List< double[]> | data = new List<double[]>()
SampleStorage[] | storage
bool | isReadOnly
Represents a multivariate sample.
A multivariate sample is simply a sample in which more than one number is associated with each data point. A study that records only the height of each subject could use the Sample class to store its data, but a study that records the income and height of each subject should use the MultivariateSample class. In addition to descriptive statistics, this class offers tests for studying the associations between the recorded variables, and routines for fitting the sample to a model.
MultivariateSample (int dimension) [inline]
Initializes a new multivariate sample.
dimension | The dimension of the sample space, that is, the number of variables recorded for each sample entry.
ArgumentOutOfRangeException | dimension is less than one.
MultivariateSample (params string[] names) [inline]
Initializes a new multivariate sample with the given variable names.
names | The names of the variables.
void Add (params double[] value) [inline]
Adds an entry to the sample.
value | The values associated with the entry.
Referenced by Test.BivariateSampleTest.BivariatePolynomialRegression(), Test.MultivariateSampleTest.CreateMultivariateNormalSample(), Test.DataSetTest.FitDataToLineUncertaintyTest(), Test.DataSetTest.FitDataToPolynomialUncertaintiesTest(), Test.MultivariateSampleTest.MultivariateLinearRegressionAgreement(), Test.MultivariateSampleTest.MultivariateLinearRegressionBadInputTest(), Test.MultivariateSampleTest.MultivariateLinearRegressionNullDistribution(), Test.MultivariateSampleTest.MultivariateLinearRegressionTest(), Test.MultivariateSampleTest.MultivariateManipulations(), Test.MultivariateSampleTest.MultivariateMoments(), Test.MultivariateSampleTest.OldMultivariateLinearRegressionTest(), Test.RectangularMatrixTest.PC(), Test.MultivariateSampleTest.PrincipalComponentAnalysis(), Test.SampleTest.SamplePopulationMomentEstimateVariances(), Meta.Numerics.Statistics.PrincipalComponentAnalysis.TransformedSample(), and Test.SampleTest.WeibullFitUncertainties().
void Add (IList< double > value) [inline]
Adds an entry to the sample.
value | The values associated with the entry.
bool Remove (params double[] value) [inline]
Removes an entry from the sample.
value | The values associated with the entry to remove.
Referenced by Test.MultivariateSampleTest.MultivariateManipulations().
bool Remove (IList< double > values) [inline]
Removes an entry from the sample.
values | The values associated with the entry to remove.
bool Contains (params double[] value) [inline]
Determines whether the sample contains a given entry.
value | The values associated with the entry to search for.
Referenced by Test.MultivariateSampleTest.MultivariateManipulations().
bool Contains (IList< double > value) [inline]
Determines whether the sample contains a given entry.
value | The values associated with the entry to search for.
void Clear () [inline]
Removes all entries from the sample.
Referenced by Test.MultivariateSampleTest.MultivariateManipulations().
Sample Column (int c) [inline]
Gets the indicated column as a univariate Sample.
c | The (zero-based) column index.
Use this method to obtain column-specific information, such as the Sample.Median or Sample.Variance of the column.
Note that this is a fast, O(1) operation, which does not create an independent copy of the column. The advantage is that you can access columns as independent samples as often as you like without worrying about performance. The disadvantage is that the returned sample cannot be altered. If you need to alter values in a column independently of the multivariate sample, use the Sample.Copy method to obtain an independent copy of the column.
Referenced by Test.BivariateSampleTest.BivariatePolynomialRegression(), Test.DataSetTest.FitDataToPolynomialUncertaintiesTest(), Test.MultivariateSampleTest.GetTotalVariance(), Test.MultivariateSampleTest.MultivariateMoments(), Test.MultivariateSampleTest.MultivariateNormalSummaryStatistics(), Test.RectangularMatrixTest.PC(), Test.MultivariateSampleTest.PrincipalComponentAnalysis(), Test.SampleTest.SamplePopulationMomentEstimateVariances(), and Test.SampleTest.WeibullFitUncertainties().
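The view-versus-copy trade-off described for Column can be sketched in a few lines of plain Python. This is only an illustration of the general idea, not the library's implementation: the ColumnView class and its copy method are hypothetical names, and whether the library's returned Sample tracks entries added later is not stated on this page.

```python
class ColumnView:
    """Read-only window onto one column of row storage owned by someone else."""

    def __init__(self, rows, c):
        self._rows = rows   # shared reference; nothing is copied, so this is O(1)
        self._c = c

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, i):
        return self._rows[i][self._c]

    # No __setitem__ is defined, so `view[i] = x` raises TypeError.

    def copy(self):
        """Independent, freely modifiable snapshot of the column: O(n)."""
        return [row[self._c] for row in self._rows]


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]   # hypothetical sample entries
view = ColumnView(rows, 1)
snapshot = view.copy()

rows.append([4.0, 40.0])   # the owner of the storage adds an entry
snapshot[0] = -1.0         # the copy can be altered without touching `rows`

print(len(view), len(snapshot))
```

In this sketch the view reflects the new entry while the snapshot does not, which is the price and the benefit of not copying.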
BivariateSample TwoColumns (int cx, int cy) [inline]
Gets the indicated columns as a BivariateSample.
cx | The (zero-based) column index of the X variable.
cy | The (zero-based) column index of the Y variable.
Use this method to obtain information specific to the two columns, such as the BivariateSample.Covariance, or to perform tests specific to the two columns, such as a BivariateSample.PearsonRTest.
Note that this is a fast, O(1) operation, which does not create independent copies of the columns. The advantage is that you can access pairs of columns as bivariate samples as often as you like without worrying about performance. The disadvantage is that the returned bivariate sample cannot be altered. If you need to alter values independently of the multivariate sample, use the BivariateSample.Copy method to obtain an independent copy of the bivariate sample.
Referenced by Test.BivariateSampleTest.BivariatePolynomialRegression(), Test.MultivariateSampleTest.MultivariateLinearRegressionAgreement(), and Test.MultivariateSampleTest.MultivariateMoments().
MultivariateSample Columns (IList< int > columnIndexes) [inline]
Gets the indicated columns as a multivariate sample.
columnIndexes | A list of column indexes.
Use this method to perform multivariate analyses, such as regression and principal component analysis, using only a subset of the variables in the original multivariate sample.
Note that this is a fast operation, which does not create independent copies of the columns.
Referenced by Test.MultivariateSampleTest.MultivariateLinearRegressionAgreement().
MultivariateSample Columns (params int[] columnIndexes) [inline]
Gets the indicated columns as a multivariate sample.
columnIndexes | A list of column indexes.
MultivariateSample Copy () [inline]
Copies the multivariate sample.
References Meta.Numerics.Statistics.MultivariateSample.Copy().
Referenced by Meta.Numerics.Statistics.MultivariateSample.Copy().
double Moment (params int[] powers) [inline]
Computes the given sample raw moment.
powers | The power to which each component should be raised.
Referenced by Test.MultivariateSampleTest.MultivariateMoments().
double Moment (IList< int > powers) [inline]
Computes the given sample raw moment.
powers | The power to which each component should be raised.
ArgumentNullException | powers is null.
DimensionMismatchException | The length of powers is not equal to the Dimension of the multivariate sample.
References Meta.Numerics.MoreMath.Pow().
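As a rough illustration of what a multivariate raw moment is, the plain-Python sketch below averages, over all entries, the product of each component raised to its power. It assumes the usual 1/n definition; the function name raw_moment and the sample data are hypothetical, and the library's own normalization and error types may differ.

```python
from math import prod


def raw_moment(rows, powers):
    """Mean over entries of prod_j x_j ** powers[j] (assumed 1/n definition)."""
    if powers is None:
        raise ValueError("powers must not be None")
    if not rows:
        raise ValueError("the sample is empty")
    if any(len(row) != len(powers) for row in rows):
        raise ValueError("len(powers) must equal the sample dimension")
    return sum(prod(x ** p for x, p in zip(row, powers)) for row in rows) / len(rows)


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # hypothetical 2-dimensional sample

mean_x = raw_moment(rows, [1, 0])    # mean of the first variable
mean_xy = raw_moment(rows, [1, 1])   # mean of the product x * y
total = raw_moment(rows, [0, 0])     # always 1 for a non-empty sample

print(mean_x, mean_xy, total)
```

With one power per variable, powers such as (1, 0) pick out a single column's mean, while (1, 1) gives the mixed moment of two columns.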
double MomentAboutMean (IList< int > powers) [inline]
Computes the given sample central moment.
powers | The power to which each component should be raised.
ArgumentNullException | powers is null.
DimensionMismatchException | The length of powers is not equal to the Dimension of the multivariate sample.
References Meta.Numerics.MoreMath.Pow().
Referenced by Test.MultivariateSampleTest.MultivariateMoments().
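A central moment differs from a raw moment only in that each component is first measured from its column mean. The plain-Python sketch below assumes the usual 1/n definition; central_moment and the data are hypothetical names for illustration, not the library's code.

```python
from math import prod


def central_moment(rows, powers):
    """Mean over entries of prod_j (x_j - mean_j) ** powers[j] (assumed 1/n definition)."""
    if not rows:
        raise ValueError("the sample is empty")
    if any(len(row) != len(powers) for row in rows):
        raise ValueError("len(powers) must equal the sample dimension")
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]   # one mean per variable
    return sum(
        prod((x - m) ** p for x, m, p in zip(row, means, powers)) for row in rows
    ) / n


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # hypothetical 2-dimensional sample

var_x = central_moment(rows, [2, 0])    # population variance of the first variable
cov_xy = central_moment(rows, [1, 1])   # population covariance of the two variables
first = central_moment(rows, [1, 0])    # first central moment, zero by construction

print(var_x, cov_xy, first)
```

Under this definition, powers (2, 0) give a column's population variance and (1, 1) give the population covariance of two columns.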
double MomentAboutMean (params int[] powers) [inline]
Computes the given sample central moment.
powers | The power to which each component should be raised.
FitResult LinearRegression (int outputIndex) [inline]
Performs a linear regression analysis.
outputIndex | The index of the variable to be predicted.
Linear regression finds the linear combination of the other variables that best predicts the output variable.
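Written out, a model of this kind is usually stated in the standard form below. The symbols are notation assumed for this sketch rather than taken from the page: $k$ stands for outputIndex, $y = x_k$ is the output variable, $\beta_0$ is the intercept, and $\sigma$ is the common noise width.

```latex
y = \beta_0 + \sum_{j \neq k} \beta_j x_j + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)
```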
The noise term epsilon is assumed to be drawn from the same normal distribution for each data point. Note that the model makes no assumptions about the distribution of the x's; it merely asserts a particular underlying relationship between the x's and the y.
In the returned fit result, the indices of the parameters correspond to the indices of the coefficients. The intercept parameter has the index of the output variable. Thus if a linear regression analysis is done on a 4-dimensional multivariate sample to predict variable number 2, the coefficients of variables 0, 1, and 3 will be parameters 0, 1, and 3 of the returned fit result, and the intercept will be parameter 2.
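The indexing convention above can be mimicked in plain Python by an ordinary least-squares fit in which the output column of each entry is replaced by the constant 1: the fitted weight in that slot is then the intercept, and every other slot holds the coefficient of its own variable. This is a sketch of the convention only, with hypothetical function names and noise-free made-up data; it is not the library's algorithm and returns no uncertainties or F-test.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def linear_regression(rows, output_index):
    """Least-squares parameters; slot output_index holds the intercept."""
    d = len(rows[0])
    # Inputs keep their own slot; the output slot carries the constant 1.
    design = [[1.0 if j == output_index else v for j, v in enumerate(row)] for row in rows]
    y = [row[output_index] for row in rows]
    # Normal equations (A^T A) p = A^T y.
    ata = [[sum(r[i] * r[j] for r in design) for j in range(d)] for i in range(d)]
    aty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(d)]
    return solve(ata, aty)


# Hypothetical data generated from x2 = 1 + 2*x0 - 3*x1 + 0.5*x3, with no noise.
rows = [
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 3.0, 0.0],
    [0.0, 1.0, -2.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
    [1.0, 1.0, 1.0, 2.0],
    [2.0, 1.0, 2.0, 0.0],
]
params = linear_regression(rows, 2)
print(params)   # slots 0, 1, 3: coefficients of x0, x1, x3; slot 2: intercept
```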
If you want to include fewer input variables in your regression, use the Columns(IList<int>) method to create a multivariate sample that includes only the variables you want to use in your regression.
The correlation matrix among fit parameters is also returned with the fit result, as is an F-test for the goodness of the fit. If the result of the F-test is not significant, no conclusions should be drawn from the regression coefficients.
If a given coefficient is significantly positive, then a change in the value of the corresponding input variable, holding all other input variables constant, will tend to increase the output variable. Note that the condition "holding all other input variables constant" means that, when there is more than one input variable, a linear regression coefficient measures something different from a linear correlation coefficient.
Suppose, for example, we take a large number of measurements of water temperature, plankton concentration, and fish density at many different locations. A simple correlation analysis might indicate that fish density is positively correlated with both water temperature and plankton concentration. But a regression analysis might reveal that increasing water temperature actually decreases the fish density. This seemingly paradoxical situation might occur because fish do much better with more plankton, and plankton do much better at higher temperatures, and this positive knock-on effect of temperature on fish is larger than the negative direct effect.
If we are in a situation where we can control the input variables independently (for example, we are running an aquarium), we would certainly want to know the specific effect of one variable (that our fish would actually prefer us to turn down the temperature while maintaining a high plankton level) rather than the observed effect as a variable changes along with all the others that tend to change with it. This does not mean that the correlation analysis is wrong: higher temperatures are indeed associated with higher fish densities in our hypothetical data set. It simply means that you need to be careful to ask the right question for your purpose.
In most cases, it is indeed the specific effect of one variable when others are held constant that we seek. In a controlled experiment, the confounding effects of other variables are removed by the experimental design, either by random assignment or specific controls. In an observational study, though, confounding effects can be, and often are, large, and correlation analysis is not sufficient. It is worthwhile keeping this in mind in politically charged debates in which easily observed correlations are likely to be bandied about as evidence, while a more difficult regression analysis that would actually be required to support an assertion is left undone.
It can occur that two theoretically independent variables are so closely correlated in the observational data that a regression analysis cannot reliably tease out the independent effect of each. In that case, a fit using only one of the variables will be as good or nearly as good as a fit using both, and the covariance between their corresponding linear regression coefficients will be large. In a situation like this, you should be wary of drawing any conclusions about their separate effects.
It can also occur that an input variable or a set of input variables is indeed a good predictor of an output variable, but via a complex and non-linear relationship that a linear regression analysis will completely miss.
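The fish, plankton, and temperature story can be reproduced with a small made-up data set. In the plain-Python sketch below, every number is invented for illustration: plankton rises with temperature, and fish density is constructed as 3 times plankton minus 1 times temperature. The two-input regression coefficients come from the standard closed form in terms of covariances.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def corr(a, b):
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5


# Invented observations: plankton tracks temperature, with some scatter.
temperature = [float(i) for i in range(10)]
scatter = [1.0, -1.0, 0.0, 2.0, -2.0, 1.0, 0.0, -1.0, 2.0, -2.0]
plankton = [2.0 * t + s for t, s in zip(temperature, scatter)]
# Direct effects by construction: +3 per unit plankton, -1 per unit temperature.
fish = [3.0 * p - t for p, t in zip(plankton, temperature)]

# Simple correlations: both come out positive.
r_fish_temperature = corr(fish, temperature)
r_fish_plankton = corr(fish, plankton)

# Two-input least squares (with intercept) in closed form.
stt, spp, stp = cov(temperature, temperature), cov(plankton, plankton), cov(temperature, plankton)
stf, spf = cov(temperature, fish), cov(plankton, fish)
det = stt * spp - stp ** 2
b_temperature = (spp * stf - stp * spf) / det
b_plankton = (stt * spf - stp * stf) / det

print(r_fish_temperature, r_fish_plankton, b_temperature, b_plankton)
```

Here the correlation between fish and temperature is positive even though the regression coefficient of temperature recovers the negative direct effect built into the data.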
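One common way to quantify this problem for two inputs is the variance inflation factor 1 / (1 - r^2), where r is the correlation between the inputs: it is the factor by which the variance of each fitted coefficient grows relative to the uncorrelated case. The plain-Python sketch below uses invented data and hypothetical names; it is a standard diagnostic, not something this page says the library computes.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def variance_inflation(x1, x2):
    """1 / (1 - r^2) for two input variables."""
    r2 = cov(x1, x2) ** 2 / (cov(x1, x1) * cov(x2, x2))
    return 1.0 / (1.0 - r2)


x1 = [float(i) for i in range(10)]
scatter = [1.0, -1.0, 0.0, 2.0, -2.0, 1.0, 0.0, -1.0, 2.0, -2.0]

# Nearly a copy of x1: the inputs are almost indistinguishable.
x2_close = [a + 0.01 * s for a, s in zip(x1, scatter)]
# Only loosely related to x1.
x2_loose = [a + 3.0 * s for a, s in zip(x1, scatter)]

vif_close = variance_inflation(x1, x2_close)
vif_loose = variance_inflation(x1, x2_loose)

print(vif_close, vif_loose)
```

A very large factor, as for the first pair, signals that the separate coefficients of the two inputs are poorly determined even if the overall fit is good.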
ArgumentOutOfRangeException | outputIndex is outside the range of allowed indexes.
InsufficientDataException | There are fewer entries than the dimension of the multivariate sample.
Referenced by Test.MultivariateSampleTest.MultivariateLinearRegressionAgreement(), Test.MultivariateSampleTest.MultivariateLinearRegressionBadInputTest(), Test.MultivariateSampleTest.MultivariateLinearRegressionNullDistribution(), Test.MultivariateSampleTest.MultivariateLinearRegressionTest(), and Test.MultivariateSampleTest.OldMultivariateLinearRegressionTest().
IEnumerator< double[]> GetEnumerator () [inline]
Gets an enumerator over the sample entries.
Referenced by Test.RectangularMatrixTest.PC().
PrincipalComponentAnalysis PrincipalComponentAnalysis () [inline]
Performs a principal component analysis of the data.
InsufficientDataException | The number of data entries (Count) is less than the number of variables (Dimension).
Referenced by Test.RectangularMatrixTest.PC(), and Test.MultivariateSampleTest.PrincipalComponentAnalysis().
void Load (IDataReader reader, IList< int > dbIndexes) [inline]
Loads values from a data reader.
reader | The data reader.
dbIndexes | The database column indexes of the sample columns.
void Load (IDataReader reader, params int[] dbIndexes) [inline]
Loads values from a data reader.
reader | The data reader.
dbIndexes | The database column indexes of the sample columns.
int Dimension [get]
Gets the dimension of the sample.
Referenced by Test.BivariateSampleTest.BivariatePolynomialRegression(), Test.MultivariateSampleTest.GetTotalVariance(), Test.MultivariateSampleTest.MultivariateManipulations(), Test.MultivariateSampleTest.PrincipalComponentAnalysis(), and Test.SampleTest.SamplePopulationMomentEstimateVariances().
int Count [get]
Gets the number of entries in the sample.
Referenced by Test.BivariateSampleTest.BivariatePolynomialRegression(), Test.MultivariateSampleTest.MultivariateManipulations(), Test.MultivariateSampleTest.MultivariateNormalSummaryStatistics(), Test.MultivariateSampleTest.OldMultivariateLinearRegressionTest(), and Test.MultivariateSampleTest.PrincipalComponentAnalysis().
bool IsReadOnly [get]
Gets a value indicating whether the multivariate sample can be modified.