Non-relational databases have gained momentum over the last few years. These database engines are often called “NoSQL” databases, and it is increasingly common to hear debates about whether a relational database or a “NoSQL” database makes more sense for a project.
A common claim is that “NoSQL” databases are schema-less, and that using them to serialize objects removes much of the pain of using and maintaining an RDBMS (relational database management system).
I’m aware of four high-level categories of “NoSQL” databases: ColumnFamily, Document, Graph and Key-Value. Within each of the four groups there are many options to choose from.
For example, Cassandra and HBase (Hadoop) are both modeled on Google Bigtable and both fall into the ColumnFamily category, yet they are implemented very differently. They share some strengths and weaknesses, but in other respects their strengths and weaknesses are opposed.
The lack of schema you hear mentioned so often in the “NoSQL” camps isn’t exactly accurate either. Wikipedia states that a database schema is:
A database schema (pronounced skee-ma, /ˈski.mə/) of a database system is its structure described in a formal language supported by the database management system (DBMS) and refers to the organization of data to create a blueprint of how a database will be constructed (divided into database tables).
If you look at the HBase DataModel documentation you will find quotes like:
A column family regroups data of a same nature in HBase and has no constraint on the type. The families are part of the table schema and stay the same for each row; what differs from rows to rows is that the column keys can be very sparse
The Cassandra DataModel documentation describes Keyspaces, Column Families, Columns and Super Columns. The sorting behaviour of Columns and Super Columns can be defined as Bytes, Long, Ascii, UTF8, Lexical UUID or Time.
In my opinion this is a database schema. Perhaps much less schema than in a typical relational database, but schema nonetheless.
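To make that concrete, here is a minimal Python sketch of the column-family idea described above. It is a toy in-memory model, not the HBase or Cassandra API: the names Table, ColumnFamily, put and columns are hypothetical and exist only for illustration. The families and their sort order are declared up front (the schema part), while the column keys inside each row remain free-form and sparse.

```python
class ColumnFamily:
    """Toy column family: row key -> {column name -> value}."""

    def __init__(self, name, sort_key):
        self.name = name
        # Declared when the family is created, much like a comparator:
        # it decides how column names are ordered within a row.
        self.sort_key = sort_key
        self.rows = {}

    def put(self, row_key, column, value):
        # Any column name is accepted; rows can be very sparse.
        self.rows.setdefault(row_key, {})[column] = value

    def columns(self, row_key):
        row = self.rows.get(row_key, {})
        return sorted(row.items(), key=lambda item: self.sort_key(item[0]))


class Table:
    """Toy table whose set of column families is fixed at creation."""

    def __init__(self, families):
        # families: {family name -> function used to sort column names}
        self.families = {
            name: ColumnFamily(name, sort_key)
            for name, sort_key in families.items()
        }

    def put(self, family, row_key, column, value):
        if family not in self.families:
            # Writing to an undeclared family is rejected: that is schema.
            raise KeyError(f"unknown column family: {family}")
        self.families[family].put(row_key, column, value)

    def columns(self, family, row_key):
        return self.families[family].columns(row_key)


# Hypothetical data: "profile" sorts column names as text, "visits" as integers.
users = Table({"profile": str, "visits": int})
users.put("profile", "alice", "name", "Alice")
users.put("visits", "alice", "10", "/home")
users.put("visits", "alice", "9", "/login")

print(users.columns("visits", "alice"))
```

Because the "visits" family was declared with an integer ordering, column "9" comes back before column "10". Declared as text, the same two columns would come back in the opposite order. That up-front choice is small, but it is a structural decision the database enforces.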
If we back up to the approach many people adopt, which is serializing objects into a “NoSQL” database, then it is true that schema is almost non-existent, even in Cassandra, because the database schema does not change as your data model changes. It is still schema, but one that doesn’t change.
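The serialization approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dict stands in for the database, JSON stands in for whatever serialization format you use, and UserV1, UserV2, save and load are hypothetical names, not part of any real client library.

```python
import json
from dataclasses import asdict, dataclass

# A plain dict stands in for a key-value store (or a single-column row).
store = {}


@dataclass
class UserV1:
    name: str


@dataclass
class UserV2:
    # The application's data model grew a field; the store is not told.
    name: str
    email: str = ""


def save(key, obj):
    # The database only ever sees a key and an opaque string.
    store[key] = json.dumps(asdict(obj))


def load(key, cls):
    return cls(**json.loads(store[key]))


save("user:1", UserV1("Alice"))
save("user:2", UserV2("Bob", "bob@example.com"))

# Old and new records sit side by side; the store's layout never changed.
print(load("user:1", UserV2))
print(load("user:2", UserV2))
```

The data model changed between UserV1 and UserV2, yet nothing about the store changed. The flip side is that the store knows nothing about what is inside each value, so it cannot sort, index or filter on those fields for you.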
Using ColumnFamily, Document, Graph and Key-Value databases, in all their diverse implementations, purely as an object serialization store neither plays to their strengths nor accounts for their weaknesses. They are very different choices and often solve different sets of problems, although you can misuse them all in the same manner. A perfect example is Facebook. Facebook originally developed Cassandra, but today it uses both Cassandra and HBase (along with other database solutions). It uses each for its strengths and aligns it with a specific problem.
In some cases serializing objects to these “NoSQL” databases may make sense, but it’s important to treat that as one option and not the only way to use these alternatives.
Based on this, I think the term “NoSQL” does all of the non-relational database options a disservice. The term does help when arguing with management that a relational database may not be the best option, but that’s about where its usefulness ends.
Don’t treat them all the same. Learn what the advantages and disadvantages of each are and choose wisely.