Data accumulation
Data collected from sensors may be stored as raw data at the edge and aggregated in edge databases and in the cloud. Data can exist in a variety of formats, including text files, spreadsheets, log files, and of course relational and NoSQL databases. Interfaces such as REST and WebSockets, together with data formats such as XML and JSON, can be used for remote data acquisition. When designing the security architecture at this layer, consider how to validate the source of data, how to detect whether malicious data has been injected into data streams, and how to detect whether data has been tampered with at any point in the life cycle.
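One common way to address both source validation and tamper detection is to attach a keyed message authentication code (HMAC) to each reading. The following is a minimal sketch, assuming each device shares a secret key with the collector; the names DEVICE_KEYS, sign_reading, and verify_message are hypothetical and chosen only for illustration. It does not cover key provisioning or replay protection.

```python
import hashlib
import hmac
import json

# Hypothetical per-device shared secrets (an assumption for illustration;
# real deployments would provision and store keys securely).
DEVICE_KEYS = {"sensor-01": b"example-secret-key"}


def sign_reading(device_id: str, reading: dict, key: bytes) -> dict:
    """Device side: serialize the reading and attach an HMAC-SHA256 tag."""
    # Canonical JSON so that signer and verifier hash the same bytes.
    payload = json.dumps(reading, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"device_id": device_id, "payload": payload, "tag": tag}


def verify_message(message: dict) -> bool:
    """Collector side: accept only messages from known devices with a valid tag."""
    key = DEVICE_KEYS.get(message.get("device_id"))
    if key is None:
        return False  # unknown source
    expected = hmac.new(
        key, message["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking tag contents through timing.
    return hmac.compare_digest(expected, message["tag"])
```

A message whose payload is altered after signing, or that claims an unknown device identifier, fails verification and can be dropped before it reaches storage.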
CSPs offer data services within their IoT service offerings. For example, AWS supports configuration of IoT devices to offload data to IoT-specific gateways. Data can also be ingested into AWS through platforms such as Kinesis or Kinesis Firehose. Kinesis Firehose, for example, can be used to collect and process large streams of data and forward them to other AWS infrastructure components for storage and analysis.
Once data has been collected within a CSP, logic rules can be set up to forward that data to wherever it is most appropriate. Data can be sent for analysis, sent to storage, or combined with data from other devices and systems. Reasons for analyzing IoT data run the gamut from understanding trends in shopping patterns (for example, using beacons) to predicting whether a machine will break down (predictive maintenance).
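The idea of forwarding rules can be sketched in a few lines. This is an illustrative model only, not any CSP's actual rules engine; the Rule class, the destination names, and the 90-degree threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over an incoming message
    destination: str                 # hypothetical downstream target name


# Hypothetical rules mirroring the two use cases mentioned in the text.
RULES = [
    Rule(
        "overheating-machine",
        lambda m: m.get("type") == "machine" and m.get("temp_c", 0) > 90,
        "predictive-maintenance",
    ),
    Rule("beacon-sighting", lambda m: m.get("type") == "beacon", "shopping-analytics"),
]


def route(message: dict, rules: List[Rule] = RULES, default: str = "cold-storage") -> List[str]:
    """Return every destination whose rule matches, or the default if none do."""
    destinations = [rule.destination for rule in rules if rule.matches(message)]
    return destinations or [default]
```

Here a message matching no rule falls through to a default storage destination, so nothing is silently discarded; whether that is the right default depends on the deployment.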
Software as a Service (SaaS) providers also offer analytic services for the IoT. For example, Salesforce (https://www.salesforce.com/in/?ir=1) has designed a tailored IoT analytics solution. Salesforce makes use of the Apache stack to connect devices to the cloud and analyze their large data streams. The Salesforce IoT cloud relies on the Apache Cassandra database, the Spark data-processing engine, Storm for data analysis, and Kafka for messaging.
An example of the immense data collection from IoT devices is the proliferation of small Unmanned Aerial Systems (sUAS), or drones, which provide an aerial platform for deploying data-rich airborne sensors. Today, three-dimensional terrain mapping is performed by inexpensive drones that collect high-resolution images and associated metadata (location, camera information, and so on) and transfer them to powerful backend systems for photogrammetric processing and digital model generation. Processing these datasets is too computationally intensive to perform directly on a drone, which faces unavoidable size, weight, and power constraints, so it must be done on backend systems and servers. These uses will continue to grow, especially as countries around the world safely integrate unmanned aircraft into their national airspace systems.