N.World

NoSQL databases 3: graphs, time series and content repositories.

After explaining the general characteristics of NoSQL databases and the key-value, document and columnar types, in this third installment on NoSQL databases we look at graph databases, time series databases and content repositories.

4. Graphs

General characteristics, advantages and disadvantages

These databases arose from the need to apply a different point of view to data analysis, one in which the connectivity between objects matters more than the objects themselves. Under this premise, it is logical that their main promoters were companies built on this type of analysis, such as social networks. These databases are based on the definition of nodes and links, where nodes are the objects and links are the connections between them, with the important characteristic that the predominant relationships are many-to-many. This is where graph databases have a great advantage over SQL databases: the latter would require a large number of joins that multiply query times, whereas a graph database answers these queries in real time by following the links directly.
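As a minimal sketch of the node-and-link idea, the following Python snippet stores many-to-many links as an adjacency list and answers a "friends of friends" question by following links instead of joining tables. The names and the `friends_of_friends` helper are hypothetical and only illustrate the concept; a real graph database would use its own storage engine and query language.

```python
from collections import defaultdict

# Hypothetical friendship links (many-to-many); the names are made up.
links = [
    ("ana", "luis"),
    ("luis", "marta"),
    ("marta", "pablo"),
    ("ana", "sara"),
    ("sara", "marta"),
]

# Adjacency list: each node maps directly to the set of nodes it is linked to.
graph = defaultdict(set)
for a, b in links:
    graph[a].add(b)
    graph[b].add(a)  # friendship is treated as undirected in this sketch


def friends_of_friends(graph, person):
    """Return the nodes exactly two links away from `person`."""
    direct = graph[person]
    second = set()
    for friend in direct:
        second |= graph[friend]  # follow each link one step further
    return second - direct - {person}


print(sorted(friends_of_friends(graph, "ana")))
```

In a relational model the same question would typically need the link table joined with itself once per hop, which is the cost the paragraph above refers to.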

Two types of structure are generally distinguished: hierarchical and network. It is important to emphasise that, although they are a separate type of NoSQL, these databases are often implemented at a low level on top of document, object or mixed NoSQL stores.

Their main advantages stem from this connectivity: although their area of use is very narrow, in those cases where relationships are what matters these databases will outperform any other type of NoSQL. Moreover, given this specific area of use, they often have their own user interfaces and query languages focused on connectivity analysis.

On the other hand, their advantages become in some cases their disadvantages. The main problem is their specific scope of use, which makes them ineffective for most real-life use cases (although they appeared to cover a specific need and nobody expects them to have other uses). In addition, the development of non-unified proprietary languages, often very different from SQL, poses a problem in terms of usability and versatility, as they involve a rather steep learning curve.

Possible applications

  1. Social media
  2. Calculation of fast and optimal routes
  3. Transport route planning
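Route calculation is a classic graph problem. As an illustrative sketch (not how any particular graph database implements it), Dijkstra's algorithm finds the cheapest route over weighted links. The `roads` network and its travel times are invented for the example.

```python
import heapq

# Hypothetical road network: node -> {neighbour: travel time in minutes}.
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (None, []) if unreachable."""
    queue = [(0, start, [start])]  # priority queue ordered by accumulated cost
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue  # a cheaper route to this node was already expanded
        visited.add(node)
        for neighbour, weight in graph[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None, []


cost, path = shortest_route(roads, "A", "D")
print(cost, path)
```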

Main graph databases

Neo4j (CA)

Open source under the GPL licence, with paid Enterprise versions also available. Designed for data analysis, it can connect to multiple analytics and BI platforms and integrates with different platforms and languages through APIs. It has its own query language (Cypher), optimised for connectivity queries. It supports ACID transactions and the definition of user roles through standardised external solutions (e.g. Active Directory or Kerberos). On the other hand, it is not partition tolerant.

InfiniteGraph (AP)

Commercially licensed DB. It combines an object-based DB (Objectivity DB) with a Spark environment based on HDFS to offer graph functionalities in a distributed environment. Combinable with Python and Spark tools (SparkML, SparkSQL, GraphX...) as well as with proprietary applications (REST API).

Time series

General characteristics, advantages and disadvantages

As in the case of graphs, these databases emerged to meet a specific need: storing time series and querying them in real time. As a result, they are optimised for high-speed reading and writing in real time, as well as for filtering and aggregation operations based on time stamps. Each record is generally composed of a time stamp and a set of key-value pairs representing each of the dependent variables or factors, where the time stamp functions as the key and is treated as an ordinal discrete element. As with graphs, these databases are often implemented at a low level on top of SQL or document NoSQL databases.
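The record structure described above can be sketched in a few lines of Python: each point is a time stamp plus key-value pairs, and a query filters by time stamp and then aggregates per time bucket. The field names (`temp`, `humidity`) and the `mean_per_minute` helper are assumptions made for illustration; real time series databases do this inside an optimised storage engine.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Each point: a time stamp (the key) plus key-value pairs for the measured variables.
# The sensor fields "temp" and "humidity" are hypothetical.
start = datetime(2024, 1, 1, 12, 0, 0)
points = [
    (start + timedelta(seconds=20 * i), {"temp": 20.0 + i, "humidity": 40.0})
    for i in range(6)
]


def mean_per_minute(points, field, since):
    """Filter by time stamp, then average `field` within each one-minute bucket."""
    buckets = defaultdict(list)
    for ts, values in points:
        if ts >= since:
            bucket = ts.replace(second=0, microsecond=0)  # truncate to the minute
            buckets[bucket].append(values[field])
    return {bucket: sum(v) / len(v) for bucket, v in sorted(buckets.items())}


result = mean_per_minute(points, "temp", since=start)
for bucket, mean in result.items():
    print(bucket.isoformat(), mean)
```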

Their main advantage is that they allow real-time queries and that, to a greater or lesser degree, they all use a language similar to SQL. Because they are optimised for a specific type of data, they often take up less space than other NoSQL databases when storing the same data. In addition, given their specific use, almost all of them come with visualisation software and with processing and aggregation applications integrated in the system itself. On the other hand, they have a major disadvantage: they do not support unstructured data. It must also be stressed that, for simple temporal structures, key-value systems are possibly more efficient.

Possible applications

  1. Real-time system performance monitoring
  2. Streaming analysis
  3. Real-time sensor monitoring

Main time series DBs

InfluxDB

Integrated in the InfluxData platform, it supports different data ingestion and visualisation tools (e.g. Chronograf or Grafana). It is designed primarily for monitoring performance data through Telegraf, but has external connectors for more general uses. There is a free, non-scalable version and an Enterprise version offering scalability and support.

RRDtool

Open source database. It allows the integration of scripts in different programming languages and has its own integrated tooling for graphical visualisation. Data is stored in a circular (round-robin) fashion, which means that when the store reaches its maximum capacity it overwrites the oldest data. Not scalable.
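The circular storage idea can be illustrated with a fixed-size buffer. This is only a sketch of the concept using Python's `collections.deque`; the `RoundRobinArchive` class and its methods are hypothetical and are not RRDtool's actual API or file format.

```python
from collections import deque


class RoundRobinArchive:
    """Toy fixed-capacity store: once full, each new sample evicts the oldest one."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops items from the opposite end when full.
        self.samples = deque(maxlen=capacity)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def fetch(self):
        return list(self.samples)


archive = RoundRobinArchive(capacity=3)
for t in range(5):
    archive.update(t, t * 10)

print(archive.fetch())
```

The storage size stays constant regardless of how long data is collected, at the price of losing the oldest samples.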

Kdb+

Owned by KX, it can be integrated with the company's own visualisation software. It offers master-slave replication and access control via user accounts.

Content repositories

General characteristics, advantages and disadvantages

A very specific type of NoSQL, specialised in the storage of heterogeneous content and non-textual formats, such as digital image and video files. Their possibilities of use are very limited, and they often work alongside another database that they complement. On the other hand, they are optimised for storing and using files, such as digital assets or product versions, and allow multiple simultaneous reads. Given their limited scope, they often include user-friendly interfaces for non-programmers.
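As a rough sketch of the idea of storing binary content together with its versions, the following in-memory toy keeps every version written to a path. The `ContentStore` class, its methods and the example path are hypothetical and are not the API of any real content repository.

```python
import hashlib


class ContentStore:
    """Toy in-memory repository: each path keeps every version of its binary content."""

    def __init__(self):
        self._versions = {}  # path -> list of (sha256 hex digest, bytes)

    def put(self, path, data: bytes) -> int:
        """Store a new version of `path` and return its 1-based version number."""
        digest = hashlib.sha256(data).hexdigest()
        history = self._versions.setdefault(path, [])
        history.append((digest, data))
        return len(history)

    def get(self, path, version=None) -> bytes:
        """Return the given version of `path`, or the latest one if none is given."""
        history = self._versions[path]
        if version is None:
            return history[-1][1]
        if version < 1:
            raise ValueError("version numbers start at 1")
        return history[version - 1][1]


store = ContentStore()
store.put("/images/logo.png", b"first draft bytes")
store.put("/images/logo.png", b"final bytes")

print(store.get("/images/logo.png"))             # latest version
print(store.get("/images/logo.png", version=1))  # an earlier version
```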

Main content repositories

Apache Jackrabbit

Open source content repository under the Apache licence. Created to implement the Content Repository for Java Technology API (JCR) and store content in a scalable way.

ModeShape

Like Jackrabbit, it is an open source repository that implements JCR.

In the final installment we will discuss databases in the cloud.

 

Bosonit
