Correlation Matrix Block
This block calculates a matrix of correlation coefficients for a set of input values. From that matrix, the block outputs a set of correlation-related statistics.
NOTE: This block can only be executed in playback mode in the Architect or as a real-time action object. Simulation and event-based modes are not supported.
Description
This block takes in a single input with a set of fields and correlates each field in that set with every other field in the set over a configured data window size. This produces a matrix of correlation coefficients.
A new calculation is triggered either on every execute or on a change of field value for a sample of good quality. The choice of trigger is configured as one of the block properties. Block execution does not necessarily occur at regular intervals.
The outputs of the block are:
- The timestamps, qualities, and top correlations of all field pairs in a configured data window
- The timestamps, qualities, and top average correlations of all field pairs for a configured average history size
- The timestamps, qualities, and top deviations from previous calculated average correlations of all field pairs in a given window
Block results are always delayed by at least one execution. That is, results are not produced at the output port at the same time as the execution is triggered. Results are delayed by the time taken for the calculation plus the time that elapses between completion of the calculation and the next block execute. Because of this delay, on first execute, all output fields have zero value, bad quality, and a timestamp that is before any other possible timestamp.
Correlations
The first output of the block is a set of the top correlation coefficients (R) between each field pair in the input, calculated over a user-defined data window. This data window is a window of samples, not a time window. The window moves forward with each new sample. The data window size is configured as one of the block properties.
Correlation analysis attempts to measure the strength of linear relationships between variables by means of a single number called a correlation coefficient R. The measure of linear association between two variables X and Y is estimated by the sample correlation coefficient R, where N is the number of samples in the window and R is calculated as follows:
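A common form of the sample (Pearson) correlation coefficient, shown here as an assumption rather than as the block's confirmed formula, is given below. The symbols X_i and Y_i (the i-th samples in the window) and the window averages written with a bar are notation introduced for this sketch.

```latex
R = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\;\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}},
\qquad
\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i,
\quad
\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i
```

This form evaluates to exactly +1 or -1 when Y = aX + b with a nonzero a, and it is undefined when either variable has zero variance over the window.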
Values for R:
- R will have values that range between -1 and +1.
- A value of +1 indicates that the two variables are in a perfect positive linear relationship over the data window (perfect positive correlation), i.e., Y = aX + b for a positive value of a.
- A value of -1 indicates a perfect negative linear relationship (perfect negative correlation), i.e., Y = aX + b for a negative value of a.
- A value of 0 indicates that there is no correlation between the two variables over the data window. If one of the fields has zero variance over the window, the block also produces an output correlation coefficient of 0, with good quality.
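The values above can be illustrated with a minimal Python sketch of a correlation calculated over a moving window of samples. It assumes the standard mean-centred Pearson coefficient. The names `pearson_r` and `WindowedCorrelation` are hypothetical and are not part of the block. Calculating on a partly filled window is also an assumption.

```python
import math
from collections import deque


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence has zero variance, mirroring the
    zero-variance behaviour described for the block.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


class WindowedCorrelation:
    """Keeps the last `window_size` sample pairs (a sample window, not a time window)."""

    def __init__(self, window_size):
        self.xs = deque(maxlen=window_size)
        self.ys = deque(maxlen=window_size)

    def push(self, x, y):
        # Assumption: a coefficient is calculated even before the window is full.
        self.xs.append(x)
        self.ys.append(y)
        return pearson_r(self.xs, self.ys)


r_positive = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])    # Y = 2X + 1
r_negative = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])    # Y = -2X + 11
r_constant = pearson_r([1, 2, 3, 4], [5, 5, 5, 5])    # zero variance in Y
```

Because the deques have a fixed maximum length, each new sample pair pushes the oldest pair out, so the window moves forward one sample at a time.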
It is important to note the following:
- The correlation coefficient is a measure of the linear relation between two variables. Therefore, even if two variables are very well correlated via some nonlinear relationship, e.g. Y = X², the correlation coefficient may indicate a weak correlation.
- Outliers can skew the correlation coefficient such that weakly correlated variables may seem to have strong correlation.
- The correlation coefficient does not adjust for the averages of the two variables over the data window, i.e. it does not first make them zero-average by subtracting their respective averages before calculating the correlation coefficient.
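The first two points can be checked numerically. The sketch below is an illustration only: it uses a standard mean-centred Pearson helper (an assumption about the calculation, with the hypothetical name `pearson_r`) and made-up sample data.

```python
import math


def pearson_r(xs, ys):
    """Standard sample Pearson correlation; 0.0 if either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Nonlinear relationship: Y is fully determined by X, yet R is close to zero
# because the relationship is not linear over this symmetric range.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_nonlinear = pearson_r(xs, [x ** 2 for x in xs])

# Outlier effect: five essentially uncorrelated samples, then one extreme sample.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 1, 2, 1, 2]
r_without_outlier = pearson_r(base_x, base_y)
r_with_outlier = pearson_r(base_x + [100], base_y + [100])
```

Here `r_nonlinear` and `r_without_outlier` are approximately zero, while the single extreme sample pushes `r_with_outlier` above 0.99.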
Average correlation
For each field pairing, the average correlation is the average of all correlation coefficients within a window called the "average history" window. The average history size is configured as one of the block properties.
Deviation from average
For each field pairing, the deviation from average is calculated by subtracting the current correlation from the average correlation previously calculated for that field pair over the average history window.
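A minimal sketch of how the average and the deviation could be tracked for one field pair follows. The class name `AverageTracker` is hypothetical. Two assumptions are made: the new average includes the current correlation, and the deviation is the previous average minus the current correlation.

```python
from collections import deque


class AverageTracker:
    """Tracks the average correlation of one field pair over an average history window."""

    def __init__(self, history_size):
        # Only the most recent `history_size` correlations are kept.
        self.history = deque(maxlen=history_size)

    def update(self, correlation):
        """Record a new correlation and return (average, deviation).

        The deviation is None until a previous average exists.
        """
        if self.history:
            previous_average = sum(self.history) / len(self.history)
            # Assumed sign convention: previous average minus current correlation.
            deviation = previous_average - correlation
        else:
            deviation = None
        self.history.append(correlation)
        average = sum(self.history) / len(self.history)
        return average, deviation


tracker = AverageTracker(history_size=3)
first = tracker.update(0.5)
second = tracker.update(0.7)
```

In this run, `first` has no deviation because no earlier average exists, and `second` compares 0.7 against the earlier average of 0.5.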
CORRELATION MATRIX BLOCK
Block Type
Statistical block
Input port
The input port takes any type, but string fields are ignored. In order for this block to run, the input port must be connected to a source with at least two integer or double fields.
Careful consideration should go into the number of input fields passed into the block. As the number of fields increases, the amount of memory and processing time required can quickly escalate.
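To give a feel for how quickly the work grows, the number of unique field pairs for n input fields is n(n - 1)/2. The short sketch below counts the pairs. The function name `pair_count` is hypothetical, and actual memory and processing cost per pair depends on the block implementation and the configured window sizes.

```python
from itertools import combinations


def pair_count(n_fields):
    """Number of unique unordered field pairs among n_fields fields."""
    return n_fields * (n_fields - 1) // 2


# Cross-check the formula by enumerating the pairs for 10 fields.
fields = [f"Field{i}" for i in range(1, 11)]
enumerated = len(list(combinations(fields, 2)))

counts = {n: pair_count(n) for n in (2, 10, 100, 1000)}
```

Ten fields give 45 pairs, while 100 fields give 4950 and 1000 fields give 499500, each of which needs its own windowed calculation.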
Parameter port
Use of the parameter port is optional. If you choose to trigger a block execute on value change when configuring the block properties, you must use this parameter port. In that case, an execute is triggered only if the selected field value changes across good samples. Changes in quality alone do not trigger an execute. Bad quality samples are ignored.
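The value-change rule can be sketched as follows. This is an illustration, not the block's implementation: the class name `ValueChangeTrigger` is hypothetical, and it assumes that the first good sample only sets the baseline and does not itself trigger an execute.

```python
class ValueChangeTrigger:
    """Decides whether a parameter-port sample should trigger an execute."""

    def __init__(self):
        self.has_baseline = False
        self.last_good_value = None

    def should_execute(self, value, quality_good):
        # Bad quality samples are ignored entirely.
        if not quality_good:
            return False
        # Assumption: the first good sample only records a baseline.
        if not self.has_baseline:
            self.has_baseline = True
            self.last_good_value = value
            return False
        changed = value != self.last_good_value
        self.last_good_value = value
        return changed


trigger = ValueChangeTrigger()
samples = [(1.0, True), (1.0, True), (2.0, False), (1.0, True), (2.0, True)]
decisions = [trigger.should_execute(v, q) for v, q in samples]
```

The bad-quality sample with value 2.0 is skipped, so the following good sample with value 1.0 counts as unchanged. Only the final good sample with value 2.0 triggers an execute.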
Output ports
There are 3 output ports:
- Correlations
  This port outputs timestamps, qualities, field pair names, and correlation values of the top correlations of all field pairs in a configured data window.
- Averages
  This port outputs timestamps, qualities, field pair names, and average correlation values of the top average correlations of all field pairs for a configured average history size.
- Deviations
  This port outputs timestamps, qualities, field pair names, and deviation values of the top deviations from previous calculated average correlations of all field pairs in a given window.
For each output port, the name field is of type string and the value field is of type double.
Functions performed on fields
- On the input values - The output values produced are (where A, B, and C are user-configured integers):
  - The top A (+ or -) correlations of all field pairs in a given window
    You can also choose to limit the number of times a field may occur in the results by applying an occurrence limit. This limit can be configured as one of the block properties.
  - The top B (+ or -) average correlations of all field pairs for a given average history size
    You can also choose to limit the number of times a field may occur in the results by applying an occurrence limit. This limit can be configured as one of the block properties.
  - The top C (+ or -) deviations from previous calculated average correlations of all field pairs in a given window
    You can also choose to limit the number of times a field may occur in the results by applying an occurrence limit. This limit can be configured as one of the block properties.
- On the timestamps - The output timestamp is always set to the execute time.
- On the quality - The quality is set based on the quality threshold. The quality level is calculated as the number of samples for which both signals had good quality over the data window. This quality level is expressed as a percentage of the window size. If the quality level is less than the quality threshold, the output quality is set to bad; otherwise it is good.
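The quality rule above can be sketched in a few lines. The function name `output_quality` and its parameters are hypothetical, and the threshold is assumed to be expressed as a percentage.

```python
def output_quality(good_flags_x, good_flags_y, window_size, threshold_percent):
    """Return "Good" or "Bad" for one field pair over one data window.

    good_flags_x / good_flags_y hold one boolean per sample in the window,
    True where that signal's sample had good quality.
    """
    both_good = sum(1 for gx, gy in zip(good_flags_x, good_flags_y) if gx and gy)
    quality_level = 100.0 * both_good / window_size
    # Bad only when the level is strictly less than the threshold.
    return "Bad" if quality_level < threshold_percent else "Good"


flags_x = [True, True, False, True]
flags_y = [True, False, True, True]
at_threshold = output_quality(flags_x, flags_y, window_size=4, threshold_percent=50)
below_threshold = output_quality(flags_x, flags_y, window_size=4, threshold_percent=75)
```

Two of the four samples are good on both signals, giving a quality level of 50 percent. That meets a 50 percent threshold but falls short of a 75 percent one.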
Examples of limiting field occurrences for correlations
The examples that follow are based on the matrix below, illustrating correlation of 10 fields:
         | Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6 | Field 7 | Field 8 | Field 9 | Field 10 |
Field 1  |         |         |         |         |         |         |         |         |         |          |
Field 2  | 0.90    |         |         |         |         |         |         |         |         |          |
Field 3  | 0.80    | 0.95    |         |         |         |         |         |         |         |          |
Field 4  | 0.70    | -0.93   | 0.03    |         |         |         |         |         |         |          |
Field 5  | 0.05    | 0.85    | 0.04    | 0.01    |         |         |         |         |         |          |
Field 6  | 0.06    | 0.01    | 0.05    | 0.04    | 0.03    |         |         |         |         |          |
Field 7  | 0.07    | 0.02    | 0.06    | 0.07    | 0.02    | 0.01    |         |         |         |          |
Field 8  | 0.08    | 0.03    | 0.07    | 0.04    | 0.06    | 0.09    | 0.03    |         |         |          |
Field 9  | 0.03    | 0.04    | 0.02    | 0.08    | 0.09    | 0.05    | 0.06    | 0.02    |         |          |
Field 10 | 0.02    | 0.05    | 0.01    | 0.02    | 0.01    | 0.03    | 0.07    | 0.06    | 0.04    |          |
Note that, in the following examples, the sign of the correlation (positive or negative) does not affect the ordering: results are ranked by magnitude. This is also true of average correlations and deviations.
Field names given are of the form <PREFIX>_<TOPNUMBER> where
- <PREFIX> is the corresponding prefix configured as one of the block properties.
- <TOPNUMBER> is the integer representing the position that the field pair has in the list of top results.
Correlations of all pairs in the given window (sorted by descending correlation magnitude):
Field Name Field Value Quality
Corr_N_01 Field2_Field3 Good
Corr_V_01 0.95 Good
Corr_N_02 Field2_Field4 Good
Corr_V_02 -0.93 Good
Corr_N_03 Field1_Field2 Good
Corr_V_03 0.90 Good
Corr_N_04 Field2_Field5 Good
Corr_V_04 0.85 Good
Corr_N_05 Field1_Field3 Good
Corr_V_05 0.80 Good
Corr_N_06 Field1_Field4 Good
Corr_V_06 0.70 Good
[...] [...] [...]
Example 1
User Config
- Number of top correlation pairs required as outputs = 4
- Occurrence limit = 2
Resulting outputs
Field Name Field Value Quality
Corr_N_01 Field2_Field3 Good
Corr_V_01 0.95 Good
Corr_N_02 Field2_Field4 Good
Corr_V_02 -0.93 Good
Corr_N_03 Field1_Field3 Good
Corr_V_03 0.80 Good
Corr_N_04 Field1_Field4 Good
Corr_V_04 0.70 Good
Example 2
User Config
- Number of top correlation pairs required as outputs = 4
- Occurrence limit = 3
Resulting outputs
Field Name Field Value Quality
Corr_N_01 Field2_Field3 Good
Corr_V_01 0.95 Good
Corr_N_02 Field2_Field4 Good
Corr_V_02 -0.93 Good
Corr_N_03 Field1_Field2 Good
Corr_V_03 0.90 Good
Corr_N_04 Field1_Field3 Good
Corr_V_04 0.80 Good
Example 3
User Config
- Number of top correlation pairs required as outputs = 2
- Occurrence limit = 1
Resulting outputs
Field Name Field Value Quality
Corr_N_01 Field2_Field3 Good
Corr_V_01 0.95 Good
Corr_N_02 Field1_Field4 Good
Corr_V_02 0.70 Good
Example 4
User Config
- Number of top correlation pairs required as outputs = 5
- Occurrence limit = 1
In this example, the top number specified is greater than the total number of unique pairs with the occurrence limit applied. For the excess fields requested, there is no field name, the field value is zero, and the block quality is set to bad.
Resulting outputs
Field Name Field Value Quality
Corr_N_01 Field2_Field3 Good
Corr_V_01 0.95 Good
Corr_N_02 Field1_Field4 Good
Corr_V_02 0.70 Good
Corr_N_03 Bad
Corr_V_03 0.0 Bad
Corr_N_04 Bad
Corr_V_04 0.0 Bad
Corr_N_05 Bad
Corr_V_05 0.0 Bad
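The four examples are consistent with a simple greedy selection, sketched below as an assumption about the behaviour rather than as the block's actual algorithm. Pairs are ranked by magnitude, a pair is skipped if either of its fields has already reached the occurrence limit, and any unfilled positions are padded with an empty name, a zero value, and bad quality. The function name `top_pairs` is hypothetical, and the input is restricted to the six pairs listed in the sorted table above.

```python
def top_pairs(correlations, top_n, occurrence_limit):
    """Select up to top_n pairs by magnitude, limiting how often a field may occur.

    correlations maps (field_a, field_b) to a correlation value.
    Returns a list of (pair_name, value, quality) tuples of length top_n.
    """
    ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
    occurrences = {}
    selected = []
    for (field_a, field_b), value in ranked:
        if len(selected) == top_n:
            break
        # Skip the pair if either field has already reached the occurrence limit.
        if (occurrences.get(field_a, 0) >= occurrence_limit
                or occurrences.get(field_b, 0) >= occurrence_limit):
            continue
        occurrences[field_a] = occurrences.get(field_a, 0) + 1
        occurrences[field_b] = occurrences.get(field_b, 0) + 1
        selected.append((f"{field_a}_{field_b}", value, "Good"))
    # Pad excess positions: no name, zero value, bad quality.
    while len(selected) < top_n:
        selected.append(("", 0.0, "Bad"))
    return selected


# The six pairs from the sorted table above.
listed_pairs = {
    ("Field2", "Field3"): 0.95,
    ("Field2", "Field4"): -0.93,
    ("Field1", "Field2"): 0.90,
    ("Field2", "Field5"): 0.85,
    ("Field1", "Field3"): 0.80,
    ("Field1", "Field4"): 0.70,
}

example_1 = top_pairs(listed_pairs, top_n=4, occurrence_limit=2)
example_2 = top_pairs(listed_pairs, top_n=4, occurrence_limit=3)
example_3 = top_pairs(listed_pairs, top_n=2, occurrence_limit=1)
example_4 = top_pairs(listed_pairs, top_n=5, occurrence_limit=1)
```

In Example 1, Field2 reaches its limit of 2 after the first two pairs, so Field1_Field2 and Field2_Field5 are skipped and the next eligible pairs are taken instead.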