IGLib  1.7.2
The IGLib base library EXTENDED - with other libraries and applications.
Meta.Numerics.Statistics.MultivariateSample Class Reference

Represents a multivariate sample. More...


Public Member Functions

 MultivariateSample (int dimension)
 Initializes a new multivariate sample. More...
 
 MultivariateSample (params string[] names)
 Initializes a new multivariate sample with the given variable names. More...
 
void Add (params double[] value)
 Adds an entry to the sample. More...
 
void Add (IList< double > value)
 Adds an entry to the sample. More...
 
bool Remove (params double[] value)
 Removes an entry from the sample. More...
 
bool Remove (IList< double > values)
 Removes an entry from the sample. More...
 
bool Contains (params double[] value)
 Determines whether the sample contains a given entry. More...
 
bool Contains (IList< double > value)
 Determines whether the sample contains a given entry. More...
 
void Clear ()
 Removes all entries from the sample. More...
 
Sample Column (int c)
 Gets the indicated column as a univariate Sample. More...
 
BivariateSample TwoColumns (int cx, int cy)
 Gets the indicated columns as a BivariateSample. More...
 
MultivariateSample Columns (IList< int > columnIndexes)
 Gets the indicated columns as a multivariate sample. More...
 
MultivariateSample Columns (params int[] columnIndexes)
 Gets the indicated columns as a multivariate sample. More...
 
MultivariateSample Copy ()
 Copies the multivariate sample. More...
 
double Moment (params int[] powers)
 Computes the given sample raw moment. More...
 
double Moment (IList< int > powers)
 Computes the given sample raw moment. More...
 
double MomentAboutMean (IList< int > powers)
 Computes the given sample central moment. More...
 
double MomentAboutMean (params int[] powers)
 Computes the given sample central moment. More...
 
FitResult LinearRegression (int outputIndex)
 Performs a linear regression analysis. More...
 
IEnumerator< double[]> GetEnumerator ()
 Gets an enumerator over the sample entries. More...
 
PrincipalComponentAnalysis PrincipalComponentAnalysis ()
 Performs a principal component analysis of the data. More...
 
void Load (IDataReader reader, IList< int > dbIndexes)
 Loads values from a data reader. More...
 
void Load (IDataReader reader, params int[] dbIndexes)
 Loads values from a data reader. More...
 

Properties

int Dimension [get]
 Gets the dimension of the sample. More...
 
int Count [get]
 Gets the number of entries in the sample. More...
 
bool IsReadOnly [get]
 Gets a value indicating whether the multivariate sample can be modified. More...
 

Private Member Functions

int IndexOf (IList< double > value)
 
bool Matches (IList< double > value, int i)
 
FitResult LinearRegression_Internal (int outputIndex)
 
IEnumerator IEnumerable. GetEnumerator ()
 
void ICollection< double[]>. CopyTo (double[][] array, int start)
 
bool ReadValues (IDataReader reader, IList< int > dbIndexes, double[] entry)
 

Private Attributes

List< double[]> data = new List<double[]>()
 
SampleStorage[] storage
 
bool isReadOnly
 

Detailed Description

Represents a multivariate sample.

A multivariate sample is a sample in which more than one number is associated with each data point. A study that records only the height of each subject could use the Sample class to store its data, but a study that records both the income and the height of each subject should use the MultivariateSample class. In addition to descriptive statistics, this class offers tests for studying the associations between the recorded variables, and routines for fitting the sample to a model.
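As a rough illustration of the members listed on this page, the sketch below builds a two-variable sample and exercises Add, Contains, Remove, Count, and enumeration. It assumes the Meta.Numerics assembly is referenced; the class name MultivariateSampleBasics, the method name, and the data values are invented for the example.

```csharp
using System;
using Meta.Numerics.Statistics;

// Hypothetical helper class, not part of the library.
public static class MultivariateSampleBasics {

    // Builds a small (income, height) sample and returns its final entry count.
    public static int BuildAndCount () {
        MultivariateSample sample = new MultivariateSample(2);

        // Each entry supplies one value per variable, in column order.
        sample.Add(42000.0, 1.75);
        sample.Add(38500.0, 1.62);
        sample.Add(51000.0, 1.80);

        // Contains and Remove take the complete entry, not a single value.
        if (sample.Contains(38500.0, 1.62)) {
            sample.Remove(38500.0, 1.62);
        }

        // Entries enumerate as double[] arrays with one element per variable.
        foreach (double[] entry in sample) {
            Console.WriteLine("income={0}, height={1}", entry[0], entry[1]);
        }

        return sample.Count;
    }
}
```

Calling MultivariateSampleBasics.BuildAndCount() should leave two entries in the sample, since one of the three added entries is removed again.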

Constructor & Destructor Documentation

Meta.Numerics.Statistics.MultivariateSample.MultivariateSample ( int  dimension)
inline

Initializes a new multivariate sample.

Parameters
dimension: The dimension of the sample space, that is, the number of variables recorded for each sample entry.
Exceptions
ArgumentOutOfRangeException: dimension is less than one.
Meta.Numerics.Statistics.MultivariateSample.MultivariateSample ( params string[]  names)
inline

Initializes a new multivariate sample with the given variable names.

Parameters
names: The names of the variables.

Member Function Documentation

int Meta.Numerics.Statistics.MultivariateSample.IndexOf ( IList< double >  value)
inline, private
bool Meta.Numerics.Statistics.MultivariateSample.Matches ( IList< double >  value,
int  i 
)
inline, private
void Meta.Numerics.Statistics.MultivariateSample.Add ( IList< double >  value)
inline

Adds an entry to the sample.

Parameters
value: The values associated with the entry.
bool Meta.Numerics.Statistics.MultivariateSample.Remove ( params double[]  value)
inline

Removes an entry from the sample.

Parameters
value: The values associated with the entry to remove.
Returns
Whether the entry was found and removed.

Referenced by Test.MultivariateSampleTest.MultivariateManipulations().

bool Meta.Numerics.Statistics.MultivariateSample.Remove ( IList< double >  values)
inline

Removes an entry from the sample.

Parameters
values: The values associated with the entry to remove.
Returns
Whether the entry was found and removed.
bool Meta.Numerics.Statistics.MultivariateSample.Contains ( params double[]  value)
inline

Determines whether the sample contains a given entry.

Parameters
value: The values associated with the entry to search for.
Returns
Whether the sample contains the given entry.

Referenced by Test.MultivariateSampleTest.MultivariateManipulations().

bool Meta.Numerics.Statistics.MultivariateSample.Contains ( IList< double >  value)
inline

Determines whether the sample contains a given entry.

Parameters
value: The values associated with the entry to search for.
Returns
Whether the sample contains the given entry.
void Meta.Numerics.Statistics.MultivariateSample.Clear ( )
inline

Removes all entries from the sample.

Referenced by Test.MultivariateSampleTest.MultivariateManipulations().

Sample Meta.Numerics.Statistics.MultivariateSample.Column ( int  c)
inline

Gets the indicated column as a univariate Sample.

Parameters
c: The (zero-based) column index.
Returns
A read-only Sample containing all values in the indicated column.

Use this method to obtain column-specific information, such as the Sample.Median or Sample.Variance of the column.

Note that this is a fast, O(1) operation that does not create an independent copy of the column. The advantage is that you can access columns as independent samples as often as you like without worrying about performance. The disadvantage is that the returned sample cannot be altered. If you need to alter values in a column independently of the multivariate sample, use the Sample.Copy method to obtain an independent copy of the column.
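The view-versus-copy distinction described above might look like this in practice. This is a sketch that assumes the Meta.Numerics assembly is referenced; the class and method names are invented, and Sample.Add and Sample.Mean are assumed from the Sample class, which is not documented on this page.

```csharp
using System;
using Meta.Numerics.Statistics;

// Hypothetical helper class, not part of the library.
public static class ColumnViewSketch {

    // Returns the mean of an edited copy of column 1; the parent sample is untouched.
    public static double MeanOfModifiedCopy () {
        MultivariateSample sample = new MultivariateSample(2);
        sample.Add(1.0, 10.0);
        sample.Add(2.0, 20.0);
        sample.Add(3.0, 30.0);

        // Read-only view of column 1: suitable for statistics, not for edits.
        Sample view = sample.Column(1);
        Console.WriteLine("median={0}, variance={1}", view.Median, view.Variance);

        // Copy() yields an independent Sample that can be changed.
        Sample editable = view.Copy();
        editable.Add(40.0);

        // The parent still holds three entries; only the copy has four values.
        Console.WriteLine("parent count={0}, copy count={1}", sample.Count, editable.Count);
        return editable.Mean;
    }
}
```

The copy here holds 10, 20, 30, and 40, so its mean is 25, while the original multivariate sample keeps its three entries.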

Referenced by Test.BivariateSampleTest.BivariatePolynomialRegression(), Test.DataSetTest.FitDataToPolynomialUncertaintiesTest(), Test.MultivariateSampleTest.GetTotalVariance(), Test.MultivariateSampleTest.MultivariateMoments(), Test.MultivariateSampleTest.MultivariateNormalSummaryStatistics(), Test.RectangularMatrixTest.PC(), Test.MultivariateSampleTest.PrincipalComponentAnalysis(), Test.SampleTest.SamplePopulationMomentEstimateVariances(), and Test.SampleTest.WeibullFitUncertainties().

BivariateSample Meta.Numerics.Statistics.MultivariateSample.TwoColumns ( int  cx,
int  cy 
)
inline

Gets the indicated columns as a BivariateSample.

Parameters
cx: The (zero-based) column index of the X variable.
cy: The (zero-based) column index of the Y variable.
Returns
A read-only BivariateSample consisting of the indicated columns.

Use this method to obtain information specific to the two columns, such as the BivariateSample.Covariance, or to perform tests specific to the two columns, such as a BivariateSample.PearsonRTest.

Note that this is a fast, O(1) operation that does not create independent copies of the columns. The advantage is that you can access pairs of columns as bivariate samples as often as you like without worrying about performance. The disadvantage is that the returned bivariate sample cannot be altered. If you need to alter values independently of the multivariate sample, use the BivariateSample.Copy method to obtain an independent copy of the bivariate sample.
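A minimal sketch of pulling a pair of columns out of a wider sample, assuming the Meta.Numerics assembly is referenced. The class name, method name, and data are invented; only TwoColumns and BivariateSample.Covariance, both named above, are used.

```csharp
using System;
using Meta.Numerics.Statistics;

// Hypothetical helper class, not part of the library.
public static class TwoColumnsSketch {

    // Returns the covariance between columns 0 and 1 of a three-variable sample.
    public static double CovarianceOfFirstPair () {
        MultivariateSample sample = new MultivariateSample(3);
        sample.Add(1.0, 2.0, 9.0);
        sample.Add(2.0, 4.0, 7.0);
        sample.Add(3.0, 6.0, 8.0);
        sample.Add(4.0, 8.0, 6.0);

        // Read-only bivariate view: column 0 plays the role of X, column 1 of Y.
        BivariateSample xy = sample.TwoColumns(0, 1);

        // Column 1 rises with column 0 in this data, so the covariance is positive.
        return xy.Covariance;
    }
}
```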

Referenced by Test.BivariateSampleTest.BivariatePolynomialRegression(), Test.MultivariateSampleTest.MultivariateLinearRegressionAgreement(), and Test.MultivariateSampleTest.MultivariateMoments().

MultivariateSample Meta.Numerics.Statistics.MultivariateSample.Columns ( IList< int >  columnIndexes)
inline

Gets the indicated columns as a multivariate sample.

Parameters
columnIndexes: A list of column indexes.
Returns
A read-only MultivariateSample consisting of the indicated columns.

Use this method to perform multivariate analyses, such as regression and principal component analysis, using only a subset of the variables in the original multivariate sample.

Note that this is a fast operation, which does not create independent copies of the columns.

Referenced by Test.MultivariateSampleTest.MultivariateLinearRegressionAgreement().

MultivariateSample Meta.Numerics.Statistics.MultivariateSample.Columns ( params int[]  columnIndexes)
inline

Gets the indicated columns as a multivariate sample.

Parameters
columnIndexes: A list of column indexes.
Returns
A read-only MultivariateSample consisting of the indicated columns.
MultivariateSample Meta.Numerics.Statistics.MultivariateSample.Copy ( )
inline

Copies the multivariate sample.

Returns
An independent copy of the multivariate sample.

References Meta.Numerics.Statistics.MultivariateSample.Copy().

Referenced by Meta.Numerics.Statistics.MultivariateSample.Copy().

double Meta.Numerics.Statistics.MultivariateSample.Moment ( params int[]  powers)
inline

Computes the given sample raw moment.

Parameters
powers: The power to which each component should be raised.
Returns
The specified moment.

Referenced by Test.MultivariateSampleTest.MultivariateMoments().

double Meta.Numerics.Statistics.MultivariateSample.Moment ( IList< int >  powers)
inline

Computes the given sample raw moment.

Parameters
powers: The power to which each component should be raised.
Returns
The specified moment.
Exceptions
ArgumentNullException: powers is null.
DimensionMismatchException: The length of powers is not equal to the Dimension of the multivariate sample.

References Meta.Numerics.MoreMath.Pow().
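For orientation, the conventional definition of a multivariate raw moment is given below. The normalization by the entry count is an assumption, since this page does not state it. Here n is Count, d is Dimension, x_{kj} is the value of variable j in entry k, and p_j is element j of powers.

```latex
M_{p_1 \dots p_d} = \frac{1}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} x_{kj}^{\,p_j}
```

Under this definition, for a two-variable sample the powers (1, 0) give the mean of the first variable and (1, 1) give the mean of the product of the two variables.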

double Meta.Numerics.Statistics.MultivariateSample.MomentAboutMean ( IList< int >  powers)
inline

Computes the given sample central moment.

Parameters
powers: The power to which each component should be raised.
Returns
The specified moment.
Exceptions
ArgumentNullException: powers is null.
DimensionMismatchException: The length of powers is not equal to the Dimension of the multivariate sample.

References Meta.Numerics.MoreMath.Pow().

Referenced by Test.MultivariateSampleTest.MultivariateMoments().
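The conventional definition of a multivariate central moment is sketched below. As with the raw moment, the 1/n normalization is an assumption rather than something this page confirms. Here n is Count, d is Dimension, x_{kj} is the value of variable j in entry k, and p_j is element j of powers.

```latex
C_{p_1 \dots p_d} = \frac{1}{n} \sum_{k=1}^{n} \prod_{j=1}^{d} \left( x_{kj} - \bar{x}_j \right)^{p_j},
\qquad
\bar{x}_j = \frac{1}{n} \sum_{k=1}^{n} x_{kj}
```

Under this definition, for a two-variable sample the powers (2, 0) give the (population) variance of the first variable and (1, 1) give the (population) covariance of the two variables.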

double Meta.Numerics.Statistics.MultivariateSample.MomentAboutMean ( params int[]  powers)
inline

Computes the given sample central moment.

Parameters
powers: The power to which each component should be raised.
Returns
The specified moment.
FitResult Meta.Numerics.Statistics.MultivariateSample.LinearRegression_Internal ( int  outputIndex)
inline, private
FitResult Meta.Numerics.Statistics.MultivariateSample.LinearRegression ( int  outputIndex)
inline

Performs a linear regression analysis.

Parameters
outputIndex: The index of the variable to be predicted.
Returns
The result of the regression.

Linear regression finds the linear combination of the other variables that best predicts the output variable.
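The model equation does not appear in this rendering of the page. Assuming the standard multiple linear regression model, with o denoting outputIndex, y the output variable, and x_j the remaining variables, it would read:

```latex
y = \beta_0 + \sum_{j \neq o} \beta_j x_j + \varepsilon,
\qquad
\varepsilon \sim N(0, \sigma^2)
```

The symbols \beta_0 (intercept), \beta_j (coefficients), and \sigma (noise scale) are notation chosen for this sketch.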

The noise term epsilon is assumed to be drawn from the same normal distribution for each data point. Note that the model makes no assumptions about the distribution of the x's; it merely asserts a particular underlying relationship between the x's and the y.

Inputs and Outputs

In the returned fit result, the indices of the parameters correspond to the indices of the coefficients. The intercept parameter has the index of the output variable. Thus, if a linear regression analysis is done on a 4-dimensional multivariate sample to predict variable number 2, the coefficients of variables 0, 1, and 3 will be parameters 0, 1, and 3 of the returned fit result, and the intercept will be parameter 2.
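The indexing convention described above can be sketched as follows, assuming the Meta.Numerics assembly is referenced. The class name, method name, and synthetic data are invented; FitResult.Dimension, FitResult.Parameter(int), and the Value and Uncertainty members of UncertainValue are assumed from classes that are not documented on this page.

```csharp
using System;
using Meta.Numerics;
using Meta.Numerics.Statistics;

// Hypothetical helper class, not part of the library.
public static class RegressionIndexingSketch {

    // Fits variable 2 of a 4-variable sample against variables 0, 1, and 3.
    public static FitResult FitVariableTwo () {
        MultivariateSample sample = new MultivariateSample(4);

        // Synthetic data: y = 5 + 1*x0 - 2*x1 + 0.5*x3 plus a small deterministic wiggle.
        for (int i = 0; i < 20; i++) {
            double x0 = i;
            double x1 = (i * i) % 7;
            double x3 = (3 * i) % 5;
            double wiggle = 0.1 * ((i % 3) - 1);
            double y = 5.0 + 1.0 * x0 - 2.0 * x1 + 0.5 * x3 + wiggle;
            sample.Add(x0, x1, y, x3);   // the output variable sits in column 2
        }

        FitResult fit = sample.LinearRegression(2);

        // Parameters 0, 1, and 3 are coefficients; parameter 2 is the intercept.
        for (int j = 0; j < fit.Dimension; j++) {
            UncertainValue p = fit.Parameter(j);
            string role = (j == 2) ? "intercept" : "coefficient of variable " + j;
            Console.WriteLine("{0}: {1} +/- {2}", role, p.Value, p.Uncertainty);
        }
        return fit;
    }
}
```

With this data the fitted values should land close to the generating ones: roughly 1, -2, and 0.5 for parameters 0, 1, and 3, and roughly 5 for parameter 2.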

If you want to include fewer input variables in your regression, use the Columns(IList{Int32}) method to create a multivariate sample that includes only the variables you want to use in your regression.

The correlation matrix among fit parameters is also returned with the fit result, as is an F-test for the goodness of the fit. If the result of the F-test is not significant, no conclusions should be drawn from the regression coefficients.

Regression vs. Correlation

If a given coefficient is significantly positive, then a change in the value of the corresponding input variable, holding all other input variables constant, will tend to increase the output variable. Note that the condition "holding all other input variables constant" means that, when there is more than one input variable, a linear regression coefficient measures something different from a linear correlation coefficient.

Suppose, for example, we take a large number of measurements of water temperature, plankton concentration, and fish density in a large number of different locations. A simple correlation analysis might indicate that fish density is positively correlated with both water temperature and plankton concentration. But a regression analysis might reveal that increasing water temperature actually decreases the fish density. This seemingly paradoxical situation might occur because fish do much better with more plankton, and plankton do much better at higher temperatures, and this positive knock-on effect of temperature on fish is larger than the negative direct effect.

If we are in a situation where we can control the input variables independently (for example, we are running an aquarium), we would certainly want to know the specific effect of one variable (that our fish would actually prefer us to turn down the temperature while maintaining a high plankton level) rather than the observed effect as a variable changes along with all the others that tend to change with it. This does not mean that the correlation analysis is wrong: higher temperatures are indeed associated with higher fish densities in our hypothetical data set. It simply means that you need to be careful to ask the right question for your purpose.

In most cases, it is indeed the specific effect of one variable when others are held constant that we seek. In a controlled experiment, the confounding effects of other variables are removed by the experimental design, either by random assignment or specific controls. In an observational experiment, though, confounding effects can be, and often are, large, and correlation analysis is not sufficient. It is worthwhile keeping this in mind in politically charged debates in which easily observed correlations are likely to be bandied about as evidence, while a more difficult regression analysis that would actually be required to support an assertion is left undone.
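The fish example can be reproduced with synthetic numbers. The sketch below assumes the Meta.Numerics assembly is referenced; the class name, the generating coefficients (plankton = 2 * temperature + scatter, fish = 3 * plankton - 1 * temperature + scatter), and the deterministic scatter terms are all invented for illustration, and FitResult.Parameter(int) is assumed from a class not documented on this page.

```csharp
using System;
using Meta.Numerics.Statistics;

// Hypothetical helper class, not part of the library.
public static class RegressionVersusCorrelation {

    // Returns { covariance(temperature, fish), temperature coefficient, plankton coefficient }.
    public static double[] Analyze () {
        // Columns: 0 = temperature, 1 = plankton, 2 = fish density.
        MultivariateSample sample = new MultivariateSample(3);
        for (int i = 0; i < 30; i++) {
            double temperature = 10.0 + 0.5 * i;
            double planktonScatter = ((7 * i) % 11) - 5.0;       // deterministic stand-in for noise
            double plankton = 2.0 * temperature + planktonScatter;
            double fishScatter = 0.2 * (((5 * i) % 7) - 3.0);    // deterministic stand-in for noise
            // Direct effect of temperature is negative; the effect via plankton is positive.
            double fish = 3.0 * plankton - 1.0 * temperature + fishScatter;
            sample.Add(temperature, plankton, fish);
        }

        // Marginal association: temperature and fish density rise together.
        double covariance = sample.TwoColumns(0, 2).Covariance;

        // Regression separates the direct effect of temperature from the plankton effect.
        FitResult fit = sample.LinearRegression(2);
        double temperatureCoefficient = fit.Parameter(0).Value;
        double planktonCoefficient = fit.Parameter(1).Value;

        return new double[] { covariance, temperatureCoefficient, planktonCoefficient };
    }
}
```

With this data the covariance between temperature and fish density is positive, while the fitted temperature coefficient is negative and the plankton coefficient is positive, which is the seemingly paradoxical pattern described above.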

Caveats

It can occur that two theoretically independent variables are so closely correlated in the observational data that a regression analysis cannot reliably tease out the independent effect of each. In that case, a fit using only one of the variables will be as good or nearly as good as a fit using both, and the covariance between their corresponding linear regression coefficients will be large. In a situation like this, you should be wary of drawing any conclusions about their separate effects.

It can also occur that an input variable or a set of input variables is indeed a good predictor of an output variable, but via a complex and non-linear relationship that a linear regression analysis will completely miss.
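One way to spot the first caveat (nearly collinear inputs) is to inspect the correlation between the fitted coefficients. The sketch below assumes the Meta.Numerics assembly is referenced and that FitResult exposes a CorrelationCoefficient(int, int) method; that method, the class name, and the synthetic data are assumptions, not something this page documents.

```csharp
using System;
using Meta.Numerics.Statistics;

// Hypothetical helper class, not part of the library.
public static class CollinearityCheckSketch {

    // Returns the correlation between the fitted coefficients of two nearly identical inputs.
    public static double CoefficientCorrelation () {
        // Columns: 0 and 1 are inputs that track each other closely; 2 is the output.
        MultivariateSample sample = new MultivariateSample(3);
        for (int i = 0; i < 20; i++) {
            double x0 = i;
            double x1 = i + 0.5 * ((i % 3) - 1);          // x1 differs from x0 only slightly
            double scatter = 0.3 * (((5 * i) % 7) - 3);   // deterministic stand-in for noise
            double y = x0 + x1 + scatter;
            sample.Add(x0, x1, y);
        }

        FitResult fit = sample.LinearRegression(2);

        // Assumed API: correlation between parameters 0 and 1 of the fit.
        // A magnitude near 1 signals that the two effects cannot be separated reliably.
        return fit.CorrelationCoefficient(0, 1);
    }
}
```

For inputs this closely related, the coefficient correlation should have a magnitude close to one, which is the warning sign the caveat describes.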

Exceptions
ArgumentOutOfRangeException: outputIndex is outside the range of allowed indexes.
InsufficientDataException: There are fewer entries than the dimension of the multivariate sample.
See also
LinearRegression(int)

Referenced by Test.MultivariateSampleTest.MultivariateLinearRegressionAgreement(), Test.MultivariateSampleTest.MultivariateLinearRegressionBadInputTest(), Test.MultivariateSampleTest.MultivariateLinearRegressionNullDistribution(), Test.MultivariateSampleTest.MultivariateLinearRegressionTest(), and Test.MultivariateSampleTest.OldMultivariateLinearRegressionTest().

IEnumerator<double[]> Meta.Numerics.Statistics.MultivariateSample.GetEnumerator ( )
inline

Gets an enumerator over the sample entries.

Returns
An iterator over the sample entries.

Referenced by Test.RectangularMatrixTest.PC().

IEnumerator IEnumerable. Meta.Numerics.Statistics.MultivariateSample.GetEnumerator ( )
inline, private
void ICollection<double[]>. Meta.Numerics.Statistics.MultivariateSample.CopyTo ( double  array[][],
int  start 
)
inline, private
PrincipalComponentAnalysis Meta.Numerics.Statistics.MultivariateSample.PrincipalComponentAnalysis ( )
inline

Performs a principal component analysis of the data.

Returns
The result of the principal component analysis.
Exceptions
InsufficientDataException: The number of data entries (Count) is less than the number of variables (Dimension).
See also
PrincipalComponentAnalysis
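Because the analysis throws when Count is less than Dimension, a caller may prefer to check first. The sketch below assumes the Meta.Numerics assembly is referenced; the class name PcaGuardSketch and the TryAnalyze helper are invented, and only Count, Dimension, and PrincipalComponentAnalysis() from this page are used.

```csharp
using System;
using Meta.Numerics.Statistics;

// Hypothetical helper class, not part of the library.
public static class PcaGuardSketch {

    // Runs the analysis only when the documented precondition holds.
    // Returns false, with result set to null, when there are too few entries.
    public static bool TryAnalyze (MultivariateSample sample, out PrincipalComponentAnalysis result) {
        if (sample.Count < sample.Dimension) {
            // Calling PrincipalComponentAnalysis() here would throw InsufficientDataException.
            result = null;
            return false;
        }
        result = sample.PrincipalComponentAnalysis();
        return true;
    }
}
```

Whether to guard or to catch the exception is a design choice; the guard keeps the expected "not enough data yet" case out of the exception path.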

Referenced by Test.RectangularMatrixTest.PC(), and Test.MultivariateSampleTest.PrincipalComponentAnalysis().

void Meta.Numerics.Statistics.MultivariateSample.Load ( IDataReader  reader,
IList< int >  dbIndexes 
)
inline

Loads values from a data reader.

Parameters
reader: The data reader.
dbIndexes: The database column indexes of the sample columns.
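A sketch of loading from a data reader without a live database, using an in-memory System.Data.DataTable as the source. It assumes the Meta.Numerics assembly is referenced and that Load adds one entry per row read; the table layout, class name, and method name are invented for the example.

```csharp
using System;
using System.Data;
using Meta.Numerics.Statistics;

// Hypothetical helper class, not part of the library.
public static class LoadSketch {

    // Loads the income and height columns of an in-memory table into a 2-variable sample.
    public static MultivariateSample LoadFromTable () {
        DataTable table = new DataTable();
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("income", typeof(double));
        table.Columns.Add("height", typeof(double));
        table.Rows.Add(1, 42000.0, 1.75);
        table.Rows.Add(2, 38500.0, 1.62);
        table.Rows.Add(3, 51000.0, 1.80);

        MultivariateSample sample = new MultivariateSample(2);

        // Database columns 1 and 2 become sample columns 0 and 1; column 0 (id) is skipped.
        using (IDataReader reader = table.CreateDataReader()) {
            sample.Load(reader, 1, 2);
        }
        return sample;
    }
}
```

The number of database column indexes passed to Load matches the Dimension of the sample here, one index per sample column.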
bool Meta.Numerics.Statistics.MultivariateSample.ReadValues ( IDataReader  reader,
IList< int >  dbIndexes,
double[]  entry 
)
inline, private
void Meta.Numerics.Statistics.MultivariateSample.Load ( IDataReader  reader,
params int[]  dbIndexes 
)
inline

Loads values from a data reader.

Parameters
reader: The data reader.
dbIndexes: The database column indexes of the sample columns.

Member Data Documentation

List<double[]> Meta.Numerics.Statistics.MultivariateSample.data = new List<double[]>()
private
SampleStorage [] Meta.Numerics.Statistics.MultivariateSample.storage
private
bool Meta.Numerics.Statistics.MultivariateSample.isReadOnly
private

Property Documentation

bool Meta.Numerics.Statistics.MultivariateSample.IsReadOnly
get

Gets a value indicating whether the multivariate sample can be modified.


The documentation for this class was generated from the following file: