49
edits
Changes
→Data Management
*Data preperation
*Algorithim computation
<br/>
Data management refers to a set of operations that work on the data and are distributed between the stages of the data analytics pipeline. The data management flow is shown in the figure below. You start with your raw data and its acquisition. The first step is to transfer the out of memory data, the source could be from files, databases, or remote storage, into an in-memory representation.
Once it’s inside memory you can then prepare the data in many ways. DAAL offers support of various in-memory data formats such as an array of structures or compressed-sparse-row format, you can also convert data into a numeric representation, filter data and perform data normalization, compute various statistical metrics for numerical data such as the mean, variance, and covariance, and also compress and decompress the data.
The third step is to stream the in-memory numerical data to the algorithm
In complex usage scenarios the data ends up going through these three stages back and forth, so for example if your data isn’t fully available at the start of the computation it can be sent in chunks which is an advantage of DAAL I mentioned earlier.
[[File:ManagemenFlowDal.jpg|900px]] <br/>
<br/>
This is a tabular view of data where the columns represent features, these are properties or qualities of a real object or event, and the table rows represent observations which are feature vectors that are used to encode information for a real object or event. This is used in our machine learning regression example.
The data set is used across all stages of the data analytics pipeline, during acquisition it’s downloaded into local memory, during preparation it’s converted to a numerical representation, and during computation it’s used with an algorithm as an input or result.<br/>
[[File:DataSet.jpg]]<br/>