2018 saw a boom in the number of IoT devices and their use cases, with sensors appearing in diverse contexts such as smart factories and smart cities. The total number of IoT devices is expected to approach 20 billion by 2020, and every second another 127 devices connect to the internet. All of these devices and sensors generate massive amounts of data that can yield many interesting insights.
The majority of the data generated is time-stamped, and we call this kind of data time-series data. If you take an abstract look at it, you can categorise it into roughly two types:

- Monitoring data: metrics collected from internal IT infrastructure and applications.
- Industrial sensor and machine data: readings from equipment and connected devices.
There are some other characteristics that separate the two use cases: projects that fall under the first category tend to be internal IT services, and the monitoring data they store is often not mission-critical. Projects that fall in the latter category are often mission-critical and transformative, enabling real-time data-driven decision systems.
Now that we know what time-series data is and what the general use cases are, let's talk a bit more about industrial sensor data.
Traditional relational databases are not well equipped to handle this type and volume of data. To give you an idea of the scale of such data and how it differs from the first category we discussed, consider the story of a company called ALPLA. ALPLA produces plastic packaging for leading brands, items most readers will hold in their hands at least once a day.
They needed 900 different tables, one for each distinct sensor data structure, and each production line generated thousands of readings per second. Querying streams of sensor data at that volume in real time and running complex machine-learning analytics on terabytes of historical data was outside SQL Server's sweet spot. Charts in the mission-control dashboards took three to five minutes each to update, which was prohibitively slow since each dashboard contained a dozen or more charts.
To monitor production with real-time operational insights, they needed a different database solution, one able to handle this kind of data at the required scale while remaining highly available.
The key characteristics we are looking for in such use cases are horizontal scaling and self-healing clusters; perhaps even more critical is linear scalability. CrateDB is a new kind of distributed SQL database that is extremely adept at handling industrial time-series data, thanks to its ease of use and its ability to work with many terabytes of data spread across thousands of sensor data structures.
CrateDB operates in a shared-nothing architecture as a cluster of identically configured servers (nodes). The nodes coordinate seamlessly with each other, and the execution of write and query operations is automatically distributed across the nodes in the cluster.
Increasing or decreasing database capacity is a simple matter of adding or removing nodes. Sharding, replication (for fault tolerance), and rebalancing of data as the cluster changes size are all automated.
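As a rough sketch of what this looks like in practice (the table and column names below are hypothetical), sharding and replication can be declared directly in the table definition, and a dynamic object column can absorb differently structured sensor payloads instead of requiring one table per structure:

```python
def sensor_table_ddl(shards: int = 6, replicas: int = 1) -> str:
    """Build a CrateDB CREATE TABLE statement for mixed sensor payloads.

    The table and column names are illustrative; OBJECT(DYNAMIC),
    CLUSTERED INTO, and number_of_replicas are CrateDB syntax.
    """
    return f"""
CREATE TABLE IF NOT EXISTS sensor_readings (
    ts TIMESTAMP NOT NULL,
    sensor_id STRING NOT NULL,
    payload OBJECT(DYNAMIC)
) CLUSTERED INTO {shards} SHARDS
WITH (number_of_replicas = {replicas});
""".strip()

print(sensor_table_ddl())
```

Changing the shard or replica count here only changes the DDL; on a running cluster, CrateDB takes care of distributing and rebalancing the resulting shards across nodes.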
Installing and running CrateDB doesn’t take a lot of time.
Don’t take my word for it, try it out!
You can follow the installation steps in our documentation. Once you have CrateDB up and running, you can access the Admin UI at http://localhost:4200. Import test data from the “Help” tab and query it to get a feel of how CrateDB works.
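Beyond the Admin UI, CrateDB also exposes its SQL interface over HTTP at the `/_sql` endpoint on the same port. Here is a minimal Python sketch (the helper name is my own) that wraps a statement in the JSON body the endpoint expects:

```python
import json
from urllib import request

# CrateDB's HTTP SQL endpoint, assuming a default local install.
CRATE_SQL_ENDPOINT = "http://localhost:4200/_sql"

def build_sql_request(stmt: str, endpoint: str = CRATE_SQL_ENDPOINT) -> request.Request:
    """Wrap a SQL statement in the JSON POST body CrateDB's /_sql endpoint expects."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With a local CrateDB running, this would execute the query:
#   with request.urlopen(build_sql_request("SELECT name FROM sys.cluster")) as resp:
#       print(json.load(resp))
```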
Wanna play around more?
How about simulating the Curiosity rover with a Raspberry Pi and a couple of sensors, and sending the sensor data to CrateDB? NASA has provided an excellent guide on how you can create your own Curiosity rover. You can add various sensors to capture details such as temperature, GPS, photos, humidity, air pressure and so on. You can then visualise the data in a dashboard using either NASA Open MCT or Grafana.
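If you don't have the hardware at hand yet, you can fake the rover's telemetry in software first. This sketch (the sensor names and value ranges are made up) generates readings in a shape that could later be inserted into a CrateDB table:

```python
import random
import time

# Illustrative stand-in for real Raspberry Pi sensor drivers: each "reading"
# mimics what a rover-style build might report (temperature, humidity,
# air pressure) before it is sent to CrateDB.
def simulate_reading(sensor_id: str) -> dict:
    """Generate one fake sensor reading with a millisecond timestamp."""
    return {
        "ts": int(time.time() * 1000),
        "sensor_id": sensor_id,
        "payload": {
            "temperature_c": round(random.uniform(-20.0, 35.0), 2),
            "humidity_pct": round(random.uniform(0.0, 100.0), 2),
            "pressure_hpa": round(random.uniform(950.0, 1050.0), 2),
        },
    }

readings = [simulate_reading(f"rover-{i}") for i in range(3)]
print(readings[0])
```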
You can also use existing time-series data sets, such as the NYC cab data, which has ~4 million records.
If you have any questions, the folks at Crate.io would be delighted to help you.
With the increase in the number of IoT devices worldwide and the data they generate, it is essential that this machine data be put to use to create actionable insights. The use cases for such time-series data are limitless.
CrateDB Cloud is a scalable SQL cloud service hosted on Azure and operated 24 x 7 by the experts at Crate.io. It is ideal for industrial time series data processing and other IoT and machine data analytic workloads.
CrateDB Cloud can be connected to Azure Machine Learning Studio for predictive analytics. You can also use the CrateDB Event Hubs Connector to route data from Azure IoT Hub or Azure Event Hubs directly into CrateDB. This makes it even easier to integrate and analyse IoT data in real time to monitor, predict, or control the behaviour of smart systems.
Give me a shout out on Twitter! I’d be excited to learn how you are currently dealing with your machine data and how you plan to use CrateDB.