Data abstraction
IoT devices generate large volumes of data that must be captured, aggregated, and processed by analytic systems. Preprocessing of IoT-collected data often occurs at the edge, where an initial filter is applied so that only the data that passes it is forwarded to a data analytic system in the fog or in the cloud.
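As a rough illustration of edge filtering, the following Python sketch drops implausible readings and readings that have barely changed since the last one forwarded. The Reading type, the range limits, and the min_delta threshold are all assumptions made for this example; a real gateway would take its rules from the sensors and analytics it serves.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    """Hypothetical sensor reading used only for this example."""
    sensor_id: str
    timestamp: float  # seconds since epoch
    value: float      # for example, temperature in degrees Celsius


def edge_filter(readings: Iterable[Reading],
                low: float = -40.0,
                high: float = 85.0,
                min_delta: float = 0.5) -> Iterator[Reading]:
    """Yield in-range readings that differ enough from the last one forwarded."""
    last_forwarded = {}  # sensor_id -> last value passed upstream
    for r in readings:
        # Drop values outside the plausible range (assumed limits).
        if not (low <= r.value <= high):
            continue
        previous = last_forwarded.get(r.sensor_id)
        # Drop readings that barely changed since the last forwarded value.
        if previous is not None and abs(r.value - previous) < min_delta:
            continue
        last_forwarded[r.sensor_id] = r.value
        yield r


raw = [
    Reading("t1", 0.0, 21.0),
    Reading("t1", 1.0, 21.1),   # change too small, dropped
    Reading("t1", 2.0, 999.0),  # out of range, dropped
    Reading("t1", 3.0, 22.0),
]
forwarded = list(edge_filter(raw))
```

Only the first and last readings in raw survive the filter, which is the reduced stream that would be sent on to the fog or cloud.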
Preprocessing also includes the classification of data objects. Classification can be based on the type of the data, its sensitivity, or both. Metadata is then added in the form of tags that represent the security sensitivity and other attributes of the data or of the sources that collected it. For example, any sensitive data that requires confidentiality protection should be tagged as such. At this stage, both the data and its metadata should be digitally signed so that downstream systems can verify their origin and detect tampering.
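The tagging and signing step might look something like the Python sketch below. Note the substitution: it uses HMAC-SHA256 from the standard library, which is a symmetric message authentication code and not a true digital signature. A deployment would more likely use an asymmetric scheme such as ECDSA or Ed25519 from a cryptographic library. The key, the field names, and the sensitivity labels are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared key; a real device would hold key material in secure storage.
DEVICE_KEY = b"demo-key-not-for-production"


def _canonical_bytes(data: dict, metadata: dict) -> bytes:
    """Serialize data and metadata deterministically so both sides hash the same bytes."""
    unsigned = {"data": data, "metadata": metadata}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def tag_and_sign(payload: dict, source: str, sensitivity: str,
                 key: bytes = DEVICE_KEY) -> dict:
    """Attach classification metadata and an integrity tag covering data and metadata."""
    metadata = {"source": source, "sensitivity": sensitivity}
    message = _canonical_bytes(payload, metadata)
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"data": payload, "metadata": metadata, "signature": signature}


def verify(record: dict, key: bytes = DEVICE_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    message = _canonical_bytes(record["data"], record["metadata"])
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))


signed = tag_and_sign({"heart_rate": 72}, source="wearable-17",
                      sensitivity="confidential")

# Downgrading the sensitivity tag after signing should invalidate the record.
tampered = dict(signed, metadata={"source": "wearable-17", "sensitivity": "public"})
```

Because the tag covers the metadata as well as the data, changing a sensitivity label in transit is detected in the same way as changing the payload.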
Next, the data is cleaned and de-duplicated. The cleansing process corrects or removes bad data, such as missing or malformed values. The clean data is then fed into data models, from which data products and visualizations are produced.
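A minimal sketch of cleaning and de-duplication is shown below. It assumes records arrive as dictionaries with sensor_id, timestamp, and value fields, and it treats two records with the same sensor and timestamp as duplicates. Both the field names and that duplicate rule are assumptions for illustration.

```python
import math


def clean_and_deduplicate(records):
    """Drop unusable records, normalize fields, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        value = rec.get("value")
        # Discard records whose value is missing, non-numeric, or NaN.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if math.isnan(value):
            continue
        # Normalize the sensor identifier (trim whitespace, lower-case).
        sensor_id = str(rec.get("sensor_id", "")).strip().lower()
        if not sensor_id:
            continue
        # Assumed duplicate rule: same sensor and same timestamp.
        key = (sensor_id, rec.get("timestamp"))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({
            "sensor_id": sensor_id,
            "timestamp": rec.get("timestamp"),
            "value": float(value),
        })
    return cleaned


raw_records = [
    {"sensor_id": " T1 ", "timestamp": 100, "value": 21.5},
    {"sensor_id": "t1", "timestamp": 100, "value": 21.5},          # duplicate
    {"sensor_id": "t2", "timestamp": 100, "value": None},          # missing value
    {"sensor_id": "t2", "timestamp": 101, "value": float("nan")},  # not a number
    {"sensor_id": "t2", "timestamp": 102, "value": 19},
]
clean_records = clean_and_deduplicate(raw_records)
```

Here the identifier is normalized before the duplicate check, so " T1 " and "t1" are recognized as the same sensor.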
A key consideration within the data life cycle is the need for data lineage assurance. Data lineage tracks the origin of data and the transformations and actions that were applied to that data over time. Data lineage tools can visually represent data flows and movements across a system. A number of data lineage tools are available today. One open source example is Apache Falcon, a data management framework for Hadoop whose lineage features can be applied to IoT systems. You can learn more about Apache Falcon here: https://falcon.apache.org/.
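To make the idea of lineage concrete, the Python sketch below records each transformation together with a hash of its input and output, so the chain from origin to final data set can be checked afterwards. It is not Apache Falcon's API; LineageLog, fingerprint, and the step names are hypothetical and exist only for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(obj) -> str:
    """Return a SHA-256 hash of a JSON-serializable object."""
    encoded = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class LineageLog:
    """Hypothetical lineage record: where data came from and what was done to it."""
    origin: str
    steps: list = field(default_factory=list)

    def apply(self, name, func, data):
        """Run a transformation and record hashes of its input and output."""
        result = func(data)
        self.steps.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        return result

    def is_consistent(self) -> bool:
        """Check that each step consumed exactly what the previous step produced."""
        return all(a["output"] == b["input"]
                   for a, b in zip(self.steps, self.steps[1:]))


log = LineageLog(origin="gateway-3")
readings = [21.0, 21.0, 999.0, 22.5]
filtered = log.apply("range_filter", lambda xs: [x for x in xs if x < 85.0], readings)
deduped = log.apply("deduplicate", lambda xs: sorted(set(xs)), filtered)
```

A dedicated lineage tool adds storage, visualization, and access control on top of this kind of record, but the underlying information is the same: the origin, the ordered steps, and evidence linking each step to the next.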