NoSQL, a means to an end
NoSQL left its purely technological niche long ago and has, without a doubt, become the new black of the computing world. It is rare to read an article or hear about a success story that did not take advantage of its miraculous architecture, which leaves developers with the unmistakable feeling of being out of fashion for not wearing it every day.
I have been hearing about NoSQL for a long time, and an unexplained marketing explosion has led it to conquer huge spaces on several channels such as web sites, conferences, online courses, nerds’ happy hours, large companies and open source communities. All this noise is quite normal; however, when I first heard “common” people talking about it as if it were a dress for the next season, an alert went off in my mind. As in the business or financial markets, NoSQL has caught on in such an emotional way with so many customers and managers that it has become a synonym for data mining, data science and all the fancy vocabulary of the big data world.
It is not uncommon to hear conference speakers remark that NoSQL does not mean “Not SQL” and hence should not compete against relational databases. However, who remembers those opening notes after hours of constant talk showing off all the cool NoSQL features not supported by most of the old-fashioned database management systems?
All of that was said just to make explicit that NoSQL means Not Only SQL, and by “Not Only” we mean that you probably will not be able to provide a complete and appropriate solution to your customer using only NoSQL databases. In black and white: do not throw relational databases away; they were, and still are, the main piece of your data architecture. Another common and probably true saying is that several features provided by NoSQL databases can be achieved with relational ones, even though that means avoiding some usual practices like normalization, triggers, foreign keys and constraints.
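To illustrate that last point, here is a minimal sketch of a schema-free key/document store built on a relational engine, using Python's standard `sqlite3` and `json` modules. The table name `documents`, the helpers `put` and `get`, and the sample data are all hypothetical names chosen for this example; it is an assumption-laden illustration, not how any particular NoSQL product works internally.

```python
import json
import sqlite3

# Hypothetical table and field names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT NOT NULL)")

def put(doc_id, doc):
    """Store a free-form document as JSON text under its key."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, body) VALUES (?, ?)",
        (doc_id, json.dumps(doc)),
    )

def get(doc_id):
    """Return the document stored under a key, or None when the key is absent."""
    row = conn.execute(
        "SELECT body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

# Two documents with different shapes live in the same table:
# no normalization, no foreign keys, no per-field constraints.
put("user:1", {"name": "Ana", "tags": ["admin", "dev"]})
put("user:2", {"name": "Bob", "address": {"city": "Recife"}})
print(get("user:1"))
```

The trade-off is visible right away: the database no longer knows anything about the fields inside `body`, so every rule about them has to live in the application.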
Well, you might be asking: if relational databases are so great and NoSQL not so much, why are so many smart people and most top companies using it so much? I believe the answer is quite simple: NoSQL databases provide better productivity for some problems that are common nowadays, and by productivity I mean that NoSQL products lead the developer to model data in a way that would be harder to achieve with set theory in mind.
Abraham Maslow’s quote “if all you have is a hammer, everything looks like a nail” fits very well here; maybe the statement “you can drive a nail with a rock, but it is easier with a hammer” would fit even better. In other words, there are many common problems that are more easily solved with NoSQL, even though most of them can be solved with a relational database too.
I do not believe there is a silver-bullet rule to help developers decide whether a NoSQL database should be used or not, and I am probably not the greatest expert to provide one. Anyway, I will share some data architecture requirements that might point to NoSQL:
- Data being stored will not be queried later using join operations,
- data being stored has a free or non-strict type format,
- vertical scaling will not be an alternative due to cost restrictions,
- the volume of data is large, i.e., at least dozens of gigabytes are on the table,
- partitions or shards will be used to distribute data evenly,
- achieving huge throughput on write operations is a strong requirement,
- non-geographical redundancy is most likely to happen,
- parallel computing or the MapReduce pattern is the only way to handle all the data.
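The item about partitions or shards can be sketched in a few lines. The example below assumes a simple hash-modulo scheme; the function names `shard_for` and `distribution`, the key format and the shard count are hypothetical, and real products typically use more elaborate strategies than this.

```python
import hashlib
from collections import Counter

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def distribution(keys, num_shards):
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)

# Hypothetical workload: ten thousand user keys spread over four shards.
keys = [f"user:{i}" for i in range(10_000)]
counts = distribution(keys, 4)
for shard in sorted(counts):
    print(f"shard {shard}: {counts[shard]} keys")
```

A stable hash such as MD5 is used here instead of Python's built-in `hash()`, because the latter is randomized per process for strings and would send the same key to different shards on different runs.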
On the other hand, some requirements that would probably push you away from the NoSQL world are as follows:
- Pieces of data being stored have valuable and direct relationships with each other,
- the data storage format dictates the rules for system integration,
- data has a strict format modeled by business logic,
- data integrity is a must,
- data constraints cannot rely only on application validation.
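The last two items of this list are where a relational engine earns its keep. As a minimal sketch, the example below uses Python's standard `sqlite3` module; the `customer` and `invoice` tables and the `try_insert` helper are hypothetical names invented for the illustration. The point is that the database itself rejects bad rows, whatever the application forgot to validate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount REAL NOT NULL CHECK (amount > 0)
    )
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ana')")

def try_insert(customer_id, amount):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(1, 10.0))   # valid customer, positive amount
print(try_insert(99, 10.0))  # customer 99 does not exist
print(try_insert(1, -5.0))   # negative amount violates the CHECK
```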
Remember that using a NoSQL or a relational database is not an exclusive choice, and both can be applied in the same solution; the key is to identify which set of data fits one or the other better. Since the existence of several NoSQL products may confuse the developer when picking one of them, the two tables below summarize the pros and cons of several features for a list of the most used products.
|Table 1 – NoSQL Ecosystem|
|Table 2 – NoSQL Main Features|
When reading the tables it is important to keep in mind that all figures represent a comparison among the products and not an isolated analysis of each one. In that sense, installing Apache Cassandra is simpler than installing Apache HBase; in the same way, monitoring a Hazelcast cluster is easier, richer and more intuitive than monitoring a MongoDB cluster. Since I was looking for a free NoSQL product, the Hazelcast and Neo4J commercial versions were not tried at all; if you have some extra cash in your pocket, give them a try.
As previously said, there may be other important features that were not explicitly shown in the tables above. I believe the ecosystem comparison is important because relational databases have a more mature and richer ecosystem than their NoSQL counterparts; remember that managing billions of data items without an appropriate tool can turn the customer’s (and the support team’s) life into a big pain.
Performance metrics are always a very hard topic; like the economy, they require too many indicators that can be seen from different perspectives. So my tests considered the time spent on read and write operations, and I did not forget to monitor CPU usage with iostat. Since Hazelcast and Redis are in-memory databases, memory usage was not considered; network configuration would be a good metric to talk about, but I do not have that many servers at home, and measuring network I/O on virtual machines would not bring great results after all.
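For readers who want to repeat this kind of measurement, a timing harness can be as small as the sketch below. It is not the code used in my tests: the function `time_operations` and the key format are hypothetical, and a plain Python dict stands in for the database client, so the numbers it prints say nothing about any real product.

```python
import time

def time_operations(write, read, n):
    """Return (write_seconds, read_seconds) for n sequential writes followed by n reads."""
    start = time.perf_counter()
    for i in range(n):
        write(f"key:{i}", i)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for i in range(n):
        read(f"key:{i}")
    read_seconds = time.perf_counter() - start
    return write_seconds, read_seconds

# A plain dict stands in for the store under test (assumption);
# a real run would pass the put/get calls of the product's client instead.
store = {}
write_s, read_s = time_operations(store.__setitem__, store.__getitem__, 100_000)
print(f"writes: {write_s:.4f}s, reads: {read_s:.4f}s")
```

CPU usage is not captured here; it would be watched separately, for example with iostat running in another terminal during the test.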
This article is the last in a series of six; I hope they all shed a bit more light on the NoSQL world. For more detailed information, take an extra look at each product’s web site; they all have very good documentation. In addition, there is plenty of free course material, like that available at https://university.mongodb.com/, that deserves a bit of your attention. Thanks.