
NoSQL Databases

A NoSQL database provides a mechanism for storing and retrieving data that is modeled differently from the linked tables used in relational databases. In a document-oriented NoSQL database, data is organized into collections (the equivalent of tables), which contain documents (the equivalent of records). Each document has keys (the equivalent of fields) with values (the same as values in a relational database).

Documents are identified by unique keys (the equivalent of primary keys), but there is generally no equivalent of the foreign keys and joins that link data across relational tables. As a result, data is typically stored in a de-normalized form, which means the same data may be repeated in several documents.

For example, if a sensor document had to record the server the sensor is connected to, the entire server document would be embedded in it. If multiple sensors were connected to the same server, each sensor document would carry its own copy of the server document and its values. This simplifies reading, since all the data is in one document and nothing has to be fetched from other collections. But it also introduces redundancy: whenever the server's data changes, every embedded copy must be updated.
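The following minimal sketch illustrates this trade-off in Python, using plain dictionaries as stand-ins for documents rather than a real database driver. The field names (such as `server_id` and `location`) and the helper functions `server_location_of` and `update_server_location` are hypothetical.

```python
# Hypothetical server document; each sensor embeds its own copy of it.
server = {"server_id": "srv-1", "hostname": "edge-01", "location": "Plant A"}

sensors = [
    {"sensor_id": "s-100", "type": "temperature", "server": dict(server)},
    {"sensor_id": "s-101", "type": "humidity", "server": dict(server)},
    {"sensor_id": "s-102", "type": "pressure", "server": dict(server)},
]

def server_location_of(sensor_id):
    """Reading is simple: one document already holds the server details."""
    for doc in sensors:
        if doc["sensor_id"] == sensor_id:
            return doc["server"]["location"]
    return None

def update_server_location(server_id, new_location):
    """Updating is costly: every embedded copy of the server must change."""
    updated = 0
    for doc in sensors:
        if doc["server"]["server_id"] == server_id:
            doc["server"]["location"] = new_location
            updated += 1
    return updated  # number of documents that had to be rewritten

print(server_location_of("s-101"))                 # Plant A
print(update_server_location("srv-1", "Plant B"))  # 3
print(server_location_of("s-101"))                 # Plant B
```

If any copy were missed during such an update, the sensor documents would disagree about their server, which is exactly the consistency risk of de-normalized data.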

Both NoSQL and relational databases have advantages and disadvantages relative to each other.

  • It is very easy to read data from a NoSQL database, since a document holds all the data required for an entity, whereas in a relational database a set of tables often needs to be joined with complex queries to get all the required data. However, when data is stored in this de-normalized form, maintaining data consistency can be a challenge.

  • Table structures in relational databases are defined when the tables are created, and adding new fields requires changing the schema. In NoSQL databases, it is easy to add or remove keys and values from documents. They are flexible to the extent that one collection can hold documents with different fields. This is very useful for data that is broadly similar but varies in some attributes. For example, sensors have many attributes in common, but they may return different types of values, so each sensor document may vary somewhat in its keys and values.
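The flexible schema described above can be sketched in Python, again with a plain list standing in for a collection and dictionaries standing in for documents. The keys (`celsius`, `percent`, `battery`, and so on) and the helper `sensors_with_key` are illustrative assumptions, not part of any database API.

```python
# One "collection" holding sensor documents whose keys differ.
readings = [
    {"sensor_id": "t-1", "type": "temperature", "celsius": 21.5},
    {"sensor_id": "h-1", "type": "humidity", "percent": 48},
    {"sensor_id": "g-1", "type": "gps", "lat": 51.5, "lon": -0.12},
]

# Adding a key to a single document needs no schema change.
readings[0]["battery"] = 0.87

def sensors_with_key(docs, key):
    """Return the IDs of documents that contain the given key."""
    return [d["sensor_id"] for d in docs if key in d]

# Code reading the data must tolerate missing keys, e.g. with dict.get().
battery_levels = {d["sensor_id"]: d.get("battery") for d in readings}

print(sensors_with_key(readings, "type"))     # ['t-1', 'h-1', 'g-1']
print(sensors_with_key(readings, "battery"))  # ['t-1']
print(battery_levels)                         # {'t-1': 0.87, 'h-1': None, 'g-1': None}
```

The flexibility moves some responsibility into the application: because no schema guarantees that a key exists, every reader has to handle its absence.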

The choice of database depends on the nature of the application. As a rule of thumb, NoSQL is more suitable for solutions where data is unstructured and updates are infrequent, while an RDBMS is better for solutions where data is structured and updated frequently. In many real-world business applications, RDBMS and NoSQL databases are used in tandem to leverage the advantages of each.

Three of the factors considered when evaluating the nature of a data management solution are Volume (how much data is expected in a given period), Velocity (how frequently data will be received), and Variability (how consistent the structure of each record will be).

In an IoT solution, thousands of sensors may capture and send data every second, 24x7, and each sensor may send differently structured data depending on what it measures. Once sensor data has been captured, it is rarely updated, and there is usually no relationship between the data from two sensors. The workload is a sequence of very fast "writes" followed by a series of "reads" by a data analysis solution. In such scenarios, the de-normalized structure of a NoSQL database works well.
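To put the Volume and Velocity factors in perspective, here is a rough back-of-the-envelope estimate in Python. The figures (5,000 sensors, one reading per sensor per second, about 200 bytes per stored document) are assumptions chosen for illustration, not measurements from a real deployment.

```python
SENSORS = 5_000            # assumed number of sensors
READINGS_PER_SECOND = 1    # assumed readings per sensor per second
BYTES_PER_DOCUMENT = 200   # assumed average stored size of one reading

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

writes_per_day = SENSORS * READINGS_PER_SECOND * SECONDS_PER_DAY
bytes_per_day = writes_per_day * BYTES_PER_DOCUMENT

print(f"{writes_per_day:,} writes per day")     # 432,000,000 writes per day
print(f"{bytes_per_day / 1e9:.1f} GB per day")  # 86.4 GB per day
```

Even under these modest assumptions, the workload is hundreds of millions of small, independent inserts per day, the write-heavy and rarely updated pattern that suits a de-normalized store.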

On the other hand, a banking solution handles far fewer transactions, but records are updated very frequently, there are many relationships between data sets (customers, accounts, products, and so on), and there is a critical need to maintain consistency. An RDBMS is a better fit for such requirements.
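The consistency requirement can be illustrated with a relational transaction. This is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, the `transfer` and `balances` functions, and the no-overdraft `CHECK` rule are hypothetical choices for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts allowed
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are applied, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failed debit visibly has to undo this step.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

print(transfer(conn, "alice", "bob", 30))   # True
print(balances(conn))                       # {'alice': 70, 'bob': 80}
print(transfer(conn, "bob", "alice", 500))  # False: bob cannot go negative
print(balances(conn))                       # {'alice': 70, 'bob': 80}
```

The failed transfer leaves no trace: the database either applies the whole transaction or none of it, which is the guarantee a banking system depends on.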

There are many NoSQL databases, and one of the more popular is MongoDB, a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. For local installation, it is available as a paid Enterprise edition and a free Community edition.

MongoDB's cloud offering, MongoDB Atlas, is a fully managed database-as-a-service. Atlas makes it easy to deploy, operate, and scale a MongoDB database in the cloud through a web-based administration console. It is a paid service but offers a free tier for basic, limited usage, which makes it a good option for learning to use a NoSQL database.

AWS also offers a cloud-hosted NoSQL database named DynamoDB.

The following table shows equivalent terms across these databases.

RDBMS     DynamoDB     MongoDB
Table     Table        Collection
Record    Item         Document
Field     Attribute    Field

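While DynamoDB and MongoDB are both NoSQL databases, they represent data differently: MongoDB stores documents as BSON, a binary encoding of JSON, and you work with them as JSON-like documents, whereas DynamoDB's API represents items in a JSON-based format in which every attribute value is tagged with its data type.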
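To make the difference concrete, here is a minimal Python sketch that converts a plain JSON-style document into the typed attribute-value form used by DynamoDB's low-level API, where each value is wrapped in a type descriptor such as `S` (string), `N` (number, transmitted as a string), `BOOL`, `NULL`, `L` (list), or `M` (map). The helpers `to_dynamodb` and `to_dynamodb_item` are hypothetical and cover only these types; in practice, the AWS SDK for Python (boto3) provides a `TypeSerializer` class for this job.

```python
import json

def to_dynamodb(value):
    """Wrap a Python/JSON value in a DynamoDB-style type descriptor (simplified)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_dynamodb(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")

def to_dynamodb_item(document):
    """Convert a top-level document into an item: attribute name -> typed value."""
    return {k: to_dynamodb(v) for k, v in document.items()}

# A plain JSON-style document, of the kind MongoDB accepts directly.
doc = {"sensor_id": "t-1", "celsius": 21.5, "active": True, "tags": ["lab", "floor-2"]}

print(json.dumps(to_dynamodb_item(doc), indent=2))
```

The typed form is more verbose, but it lets DynamoDB distinguish, for example, the number 21.5 from the string "21.5" without inspecting the value.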