== Data Management ==
Data management refers to the set of operations that work on the data and are distributed across the stages of the data analytics pipeline. The data management flow is shown in the figure below. It starts with raw data acquisition: the first step is to transfer out-of-memory data, whose source could be files, databases, or remote storage, into an in-memory representation.
Once the data is in memory, you can prepare it in several ways. DAAL supports various in-memory data formats, such as an array of structures or the compressed-sparse-row (CSR) format. You can also convert data into a numeric representation, filter and normalize it, compute statistical metrics for numerical data such as the mean, variance, and covariance, and compress and decompress the data.
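To make the compressed-sparse-row format mentioned above more concrete, here is a minimal sketch in plain Python. It does not use the DAAL API; the function name <code>dense_to_csr</code> and the sample matrix are made up for illustration. The sketch uses zero-based indices, so check the library documentation for the indexing convention it expects.

```python
# Illustrative only: plain Python, not the DAAL API.
# dense_to_csr is a hypothetical helper written for this example.

def dense_to_csr(rows):
    """Convert a dense row-major matrix (list of lists) into three CSR arrays."""
    values = []        # non-zero entries, stored row by row
    col_indices = []   # column index of each stored entry (zero-based here)
    row_offsets = [0]  # entries of row i live in values[row_offsets[i]:row_offsets[i + 1]]
    for row in rows:
        for col, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_indices.append(col)
        # After each row, record where the next row starts.
        row_offsets.append(len(values))
    return values, col_indices, row_offsets


dense = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
]
values, col_indices, row_offsets = dense_to_csr(dense)
print(values, col_indices, row_offsets)
```

Only the five non-zero entries are stored, and an empty row simply repeats the previous offset.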
The third step is to stream the in-memory numerical data to the algorithm.
In complex usage scenarios the data passes through these three stages back and forth. For example, if your data isn’t fully available at the start of the computation, it can be sent in chunks, which is the advantage of DAAL I mentioned earlier.
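The idea of sending data in chunks can be sketched as follows. This is plain Python, not DAAL code: the class <code>RunningStats</code>, the helper <code>read_in_chunks</code>, and the sample values are assumptions made for illustration. It uses Welford's update rule, one common way to keep a running mean and variance, and is not meant to describe how DAAL implements its algorithms internally.

```python
# Illustrative only: plain Python, not the DAAL API.
# RunningStats and read_in_chunks are hypothetical names for this sketch.

class RunningStats:
    """Running mean and sample variance, updated one chunk at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, chunk):
        # Welford's update: fold each new value into the running state.
        for x in chunk:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined only for two or more observations.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def read_in_chunks(values, chunk_size):
    """Stand-in for a data source that delivers data piece by piece."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = RunningStats()
for chunk in read_in_chunks(data, chunk_size=3):
    stats.update(chunk)  # the full data set is never needed at once

print(stats.count, stats.mean, stats.variance())
```

Because the state is just three numbers, each chunk can be discarded as soon as it has been processed.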
*Raw Data Acquisition
*Data preparation
*Algorithm computation
[[File:ManagemenFlowDal.jpg|900px]] <br/>