Overview of extracting data from a plant historian

By using a Nonlinear model block and training it on plant measurements, we can build a process model that represents the real plant unit in real time. All of the historical data is typically stored in a database on the plant (the plant historian) so that it can be recalled at any time for engineering analysis and trending. This section describes the format that the historical data you extract from your plant historian should have.

File format

The file can be any type of delimited file, for example .csv (comma separated values) or .txt (text file with TAB as the delimiter).
When deciding how much data to train the model on, bear in mind that the training data set should cover as much of the operating range of the system as possible. The data should not be too old, because the system frequently moves into new operating ranges as the equipment gradually deteriorates. If the system has undergone any major modifications, it is not advisable to use data from the period before these changes.
In many cases the historical data extract utility can extract only relatively small quantities of data from the historian at a time. An example is where the extract utility writes the data from the historian into an Excel worksheet. In this case, note that there is a limit to the amount of data an Excel worksheet can accommodate (65,536 records, i.e. rows, for at most 256 tags, i.e. columns, in the legacy .xls format). Because the offline trainer requires a single file to train the model on, and a relatively large data set may be required to adequately characterize the system, it may be necessary to extract several files, save them as .csv files (or some other delimited file type) and then concatenate them. >> More on concatenating files
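
The concatenation step described above can be sketched in Python as follows. This is only an illustration, not the utility provided with the product: it assumes that every extract is comma-delimited, that all extracts share the same heading row, and that the files are passed in order of increasing time. The function name concatenate_extracts is hypothetical.

```python
import csv


def concatenate_extracts(parts, output_path):
    """Append delimited extract files, in the order given, into one file.

    Assumptions (not taken from the product): every part is comma-delimited,
    has the same heading row, and `parts` is listed in order of increasing time.
    Returns the number of data records written.
    """
    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for part in parts:
            with open(part, newline="") as in_file:
                reader = csv.reader(in_file)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip empty files
                if header is None:
                    header = part_header
                    writer.writerow(header)  # write the headings once only
                elif part_header != header:
                    raise ValueError(f"{part}: headings differ from the first file")
                for row in reader:
                    if row:  # ignore blank lines
                        writer.writerow(row)
                        rows_written += 1
    return rows_written
```

For example, concatenate_extracts(["extract_01.csv", "extract_02.csv"], "training.csv") would produce a single training file from two extracts (the file names are placeholders). Keeping only the first heading row avoids heading text appearing in the middle of the data.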

When selecting the sample frequency (i.e. the time interval at which the data is extracted from the historian), ensure that the sample period is small enough to capture the dynamics of all the signals that are to be used.
The first row should preferably contain headings (the tag name for each column of data), for reference purposes.
The first column should contain the timestamp information. This is not strictly necessary, because the text source block can generate timestamps for each record at user-defined intervals for model training and simulation purposes. For reference purposes, however (to ensure that the data is sequential following concatenation), it is best to include timestamp information.
The subsequent columns should contain the tag values and, if available, their associated quality fields.
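
As a rough check of the layout described above, the following Python sketch reads an extract, confirms that every record has as many fields as the heading row, and verifies that the timestamps in the first column are strictly increasing. It is an illustration under stated assumptions: the timestamp format in TIMESTAMP_FORMAT and the function name check_extract are hypothetical and should be adapted to what your historian actually produces.

```python
import csv
from datetime import datetime

# Assumption: adjust this to the timestamp format your historian exports.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_extract(path, delimiter=","):
    """Return (headings, record_count, distinct_sample_intervals_in_seconds).

    Raises ValueError if a record has the wrong number of fields or if the
    timestamps in the first column are not strictly increasing.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        headings = next(reader, None)
        if headings is None:
            raise ValueError("file is empty")
        previous = None
        intervals = set()
        count = 0
        for line_no, row in enumerate(reader, start=2):
            if not row:
                continue  # ignore blank lines
            if len(row) != len(headings):
                raise ValueError(
                    f"line {line_no}: expected {len(headings)} fields, found {len(row)}"
                )
            stamp = datetime.strptime(row[0], TIMESTAMP_FORMAT)
            if previous is not None:
                if stamp <= previous:
                    raise ValueError(
                        f"line {line_no}: timestamp is not later than the previous record"
                    )
                intervals.add((stamp - previous).total_seconds())
            previous = stamp
            count += 1
    return headings, count, sorted(intervals)
```

If the returned list of intervals contains more than one value, the sample period is not constant, which may indicate a gap between concatenated extracts.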

[Diagram illustrating the optimal file format]

Terminology

Delimited - This refers to the separation of data elements in a text file by a character or combination of characters. The character that separates the elements is called the delimiter.
Delimiter - A character or string used to separate, or mark the start and end of, items of data in, for example, a database, source code, or text file. The comma is the delimiter in a .csv (Comma Separated Values) file, whilst the tab character is the delimiter in a tab-delimited text (.txt) file.
Concatenate - To link together in a series or chain. Here, the files are appended, in order of increasing time, to the file containing the start date.
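
To make the role of the delimiter concrete, the short Python sketch below splits the same record once with a comma and once with a tab as the delimiter. The helper name split_record and the sample values are hypothetical and serve only as an illustration.

```python
import csv
import io


def split_record(line, delimiter):
    """Split one line of a delimited file into its data elements."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))


# The same record as it would appear in a .csv file and in a tab-delimited .txt file.
csv_fields = split_record("2020-01-01 00:00:00,12.5,Good", ",")
txt_fields = split_record("2020-01-01 00:00:00\t12.5\tGood", "\t")
```

Both calls yield the same three elements (timestamp, value, quality); only the character that separates them in the file differs.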
