NoSQL databases

Following the comparison between NoSQL and SQL databases, in this section we explain the first three types of NoSQL databases: key-value, document and columnar.

1. NoSQL key-value databases

General characteristics, advantages and disadvantages

This is the simplest and most flexible model, based on key-value pairs. The key can be synthetic or self-generated and can take different formats, but in all cases it has to be unique. In a partitioned model, however, data are divided into buckets, so elements in different buckets can share the same key. Unique elements are then identified by the tuple (bucket, key).

Values, on the other hand, have a simple structure and can hold strings, numbers, JSON or even more complex structures. Usage is based on three basic operations: get (obtain the data associated with a key), put (associate a value with a key) and delete (delete the entry with a specific key).
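As a rough illustration of the get, put and delete operations and of the (bucket, key) tuple, here is a minimal in-memory sketch. The class name `KeyValueStore` and the sample buckets are hypothetical and do not correspond to the API of any particular database.

```python
from typing import Any, Dict, Optional, Tuple


class KeyValueStore:
    """Toy in-memory key-value store partitioned into buckets (illustrative only)."""

    def __init__(self) -> None:
        # Each entry is identified by the tuple (bucket, key).
        self._data: Dict[Tuple[str, str], Any] = {}

    def put(self, bucket: str, key: str, value: Any) -> None:
        """Associate a value with a key inside a bucket."""
        self._data[(bucket, key)] = value

    def get(self, bucket: str, key: str) -> Optional[Any]:
        """Return the value stored under (bucket, key), or None if absent."""
        return self._data.get((bucket, key))

    def delete(self, bucket: str, key: str) -> bool:
        """Delete the entry; return True if it existed, False otherwise."""
        if (bucket, key) in self._data:
            del self._data[(bucket, key)]
            return True
        return False


store = KeyValueStore()
store.put("users", "42", {"name": "Ada"})
store.put("orders", "42", "pending")  # same key, different bucket
```

Because the bucket is part of the identifier, the two entries with key "42" do not collide.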

Its three main advantages are simplicity, efficiency and flexibility, which allow for fast lookups on reads across the whole database, as well as effective aggregation functions. On the other hand, simplicity also underlies its main disadvantages: because the data lack structure, it is not possible to query them with a query language, and these databases consist of only one collection, which complicates the implementation of complex models.

Possible applications

  1. Web page caches, where the URL is the key and the content is the value.
  2. Operations logs, with the timestamp as key and content as value.
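The web page cache case can be sketched as follows, with the URL as key and the page content as value. This is only a sketch: `PageCache` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP request.

```python
from typing import Callable, Dict


class PageCache:
    """Cache page content by URL, fetching only on a miss (illustrative only)."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._pages: Dict[str, str] = {}  # URL -> content
        self.misses = 0

    def get(self, url: str) -> str:
        if url not in self._pages:
            # Cache miss: fetch the page once and store it under its URL.
            self.misses += 1
            self._pages[url] = self._fetch(url)
        return self._pages[url]


def fake_fetch(url: str) -> str:
    # Assumption: placeholder for a real network call.
    return f"<html>content of {url}</html>"


cache = PageCache(fake_fetch)
first = cache.get("https://example.com/")
second = cache.get("https://example.com/")  # served from the cache
```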

Main key-value databases

Riak KV (AP)

Database available under an open source (Apache) license and an Enterprise license, designed for tracking session and user-related information. It allows lookup operations for specific values, secondary indexes and map-reduce operations, as well as automatic deletion of old data. It has connectors for Spark and the Apache Mesos framework, and integrates with Redis. Partitioning by sharding.

Redis (CP)

Open source database with in-memory processing. It is useful for caching, user sessions and message monitoring, and has additional modules for data processing, such as lookups, secondary indexes, transactions or machine learning modelling. It offers partitioning by sharding and both master-slave and multi-master replication. Access control is based on user and password.
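Both databases above are described as partitioning by sharding. As a generic illustration of the idea only, and not the actual algorithm used by Riak KV or Redis, the following sketch assigns each key to a shard by hashing it. The function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take the modulo.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys: List[str], num_shards: int) -> Dict[int, List[str]]:
    """Group keys by the shard they are assigned to."""
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


placement = distribute([f"session:{i}" for i in range(1000)], num_shards=4)
```

A simple modulo scheme like this one moves most keys when the number of shards changes, which is one reason production systems tend to use more elaborate schemes.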

2. Document NoSQL databases

General characteristics, advantages and disadvantages

They are derived from key-value databases, but allow for a higher level of complexity through the use of metadata. The unit of data organisation is the document, which is composed of a series of key-value pairs whose values can take different formats. Each document has a unique ID to facilitate indexing, and often follows a pre-defined but flexible schema. Because of this schema, the data is grouped into collections, whose documents usually share a similar structure.

In a SQL analogy, collections are equivalent to tables, and documents are equivalent to rows. Generally, these databases follow one of two formats, JSON or XML, with JSON being the most commonly used.
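To make the analogy concrete, the sketch below models a collection as a list of JSON-like documents, each with a unique ID and a similar but not identical set of fields. The field names, the `_id` convention and the helper functions are hypothetical and chosen only for illustration.

```python
import json
from typing import Any, Dict, List, Optional

Document = Dict[str, Any]

# A "collection" (roughly a table) holding "documents" (roughly rows).
# Documents share a similar schema, but fields may vary between them.
customers: List[Document] = [
    {"_id": "c1", "name": "Ada", "city": "London", "tags": ["vip"]},
    {"_id": "c2", "name": "Linus", "city": "Helsinki"},
    {"_id": "c3", "name": "Grace", "city": "London", "phone": {"mobile": "555-0100"}},
]


def find_by_id(collection: List[Document], doc_id: str) -> Optional[Document]:
    """Return the document with the given unique ID, or None."""
    return next((doc for doc in collection if doc["_id"] == doc_id), None)


def find(collection: List[Document], **criteria: Any) -> List[Document]:
    """Return the documents whose fields match all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


londoners = find(customers, city="London")
as_json = json.dumps(find_by_id(customers, "c3"))
```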

The main advantage of these databases is their organisation. Because documents have predefined structures, many vendors have implemented SQL-like languages for querying, in some cases even allowing joins between collections. Moreover, thanks to the structured nature of the documents and the use of indexes, these databases respond well to queries, filtering and aggregation operations.

On the other hand, and especially when comparing these databases with SQL, the use of a flexible schema makes them prone to data entry errors, establishing the need to implement sanitisation and data cleansing procedures.
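A sanitisation step of this kind could look like the following minimal sketch, which checks a document against an expected set of fields before it is stored. The schema in `EXPECTED_FIELDS` and the sample sensor documents are assumptions made for the example.

```python
from typing import Any, Dict, List

# Hypothetical expected schema: field name -> required Python type.
EXPECTED_FIELDS = {"sensor_id": str, "temperature": float, "unit": str}


def validate(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the document (empty if it is valid)."""
    problems: List[str] = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}: {type(doc[field]).__name__}")
    return problems


good = {"sensor_id": "s-1", "temperature": 21.5, "unit": "C"}
bad = {"sensor_id": "s-2", "temperature": "21,5"}  # text instead of a number, no unit

good_problems = validate(good)
bad_problems = validate(bad)
```

A flexible schema would accept both documents as they are, so a check like this has to live in the application or in a validation layer.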

Possible applications

  1. Sensor data from different manufacturers
  2. Customer files with different characteristics
  3. Inventory catalogues of products for a shop or a factory

Main document NoSQL databases

MongoDB (CP)

One of the most widely used on the market. Open source, with data stored in BSON (binary JSON) format. It allows secondary indexes, partitioning by sharding and replication via master-slave systems. Newer versions allow joins between collections, querying through a query language, and the use of two frameworks for parallel processing: map-reduce and the aggregation framework.

CouchDB (AP)

Open source database that uses JSON natively, although it also allows binary formats. It specialises in master-master replication across different devices and platforms, with variants for web browsers (PouchDB) and for iOS and Android systems (Couchbase Lite). It can be accessed over HTTP. This replication system makes it well suited to platforms that normally run offline. Partitioning by sharding.

Couchbase (AP)

Derived from CouchDB with integrated memcache, it is also a document database based on JSON files. It is defined as an engagement database, where high accessibility on different types of devices and apps is a priority. As with CouchDB, it offers master-master replication as well as partitioning via sharding.

MarkLogic (CP)

Cross-platform, fee-based database built mainly around XML and JSON files. It allows ACID transactions and role-based security at document and sub-document level. It offers partitioning via sharding and allows the application of map-reduce routines.

3. Columnar NoSQL databases

General characteristics, advantages and disadvantages

Conceptually, they are the model most similar to SQL databases (along with document databases), as the data follows a row and column structure. However, whereas SQL organises its data in rows, these NoSQL databases group cells into columns, where each column is a tuple of values (one per row). Although they were later extended to other NoSQL database types, map-reduce routines were designed on the basis of this type of database. If our queries are based on this paradigm, this option will be the best choice.
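The difference between the two layouts can be sketched in a few lines. The example below turns row-oriented records into a column-oriented structure, so that an aggregation only touches the column it needs. The sample data and the `to_columns` helper are hypothetical, and the sketch assumes every record has the same fields.

```python
from typing import Any, Dict, List

# Row-oriented layout: one record per row, as in a SQL table.
rows: List[Dict[str, Any]] = [
    {"sensor": "a", "temp": 20.0, "humidity": 40},
    {"sensor": "b", "temp": 22.0, "humidity": 45},
    {"sensor": "c", "temp": 24.0, "humidity": 50},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Regroup row records into one list of values per column."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: each column holds one value per row.
columns = to_columns(rows)

# An aggregation reads a single column instead of every full row.
mean_temp = sum(columns["temp"]) / len(columns["temp"])
```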

As advantages, columnar databases offer a conceptually simple yet flexible schema that allows the use of SQL for queries. Their columnar structure favours queries that require full table reads, such as data extraction and aggregation, and these queries are streamlined through the application of map-reduce routines. In addition, they allow the use of joins and can be more effective than SQL databases, although they are still not optimised for joins.
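For readers unfamiliar with the map-reduce paradigm mentioned here, the following single-process sketch computes an average per sensor in three steps: map, shuffle and reduce. It only illustrates the idea; real engines distribute these phases across nodes, and the sample readings and function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, List, Tuple

readings: List[Tuple[str, float]] = [
    ("a", 20.0), ("b", 22.0), ("a", 24.0), ("b", 26.0), ("a", 25.0),
]


def map_phase(records: List[Tuple[str, float]]) -> List[Tuple[str, Tuple[float, int]]]:
    """Emit one (key, (value, count)) pair per input record."""
    return [(sensor, (temp, 1)) for sensor, temp in records]


def shuffle(pairs: List[Tuple[str, Tuple[float, int]]]) -> Dict[str, List[Tuple[float, int]]]:
    """Group the emitted pairs by key."""
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[Tuple[float, int]]]) -> Dict[str, float]:
    """Combine the partial (sum, count) pairs of each key into an average."""
    averages: Dict[str, float] = {}
    for key, values in groups.items():
        total, count = reduce(lambda x, y: (x[0] + y[0], x[1] + y[1]), values)
        averages[key] = total / count
    return averages


averages = reduce_phase(shuffle(map_phase(readings)))
```

Emitting (sum, count) pairs rather than raw values keeps the reduce step associative, so partial results from different nodes could be merged in any order.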

The main disadvantage is that they allow unstructured data. The resulting inconsistencies can be problematic when performing operations and queries. In addition, they are generally designed as persistent databases that perform reads across the entire database, so they are not optimised for real-time queries (although they can be very effective at handling transactions, given their similarity to SQL).

Possible applications

  1. Product catalogues with predefined characteristics
  2. Homogeneous sensor data with high sampling rates
  3. Messaging applications

Main columnar databases

Cassandra (AP)

Open source under the Apache license. Its main asset is robust and flexible scalability, with continuous availability and strong object-level security. It allows the application of map-reduce routines and is easy to use thanks to SQL-like query languages.

HBase (CP)

Open source under the Apache license, it runs on top of HDFS, the Hadoop file system. Like Cassandra, its strength is scalability, with partitioning by sharding and replication across region servers. It also allows the use of map-reduce routines.


