Over the last few years, momentum behind non-relational databases has grown. These database engines are often called “NoSQL” databases, and it is now common to hear conversations about whether a relational database or a “NoSQL” database makes sense for a project.
A common claim is that “NoSQL” databases are schema-less and that using them to serialize objects removes much of the pain of using and maintaining an RDBMS (relational database management system).
I’m aware of four high-level categories of “NoSQL” databases: ColumnFamily, Document, Graph and Key-Value. Within each of the four groups there are many options to choose from.
For example, Cassandra and HBase (Hadoop) are both in the ColumnFamily category and both modeled on Google BigTable, yet they are implemented very differently. They share some strengths and weaknesses but also have some that oppose each other.
The lack of schema you hear mentioned so often in the “NoSQL” camps isn’t exactly accurate either. Wikipedia states that a database schema is:
database management system (DBMS) and refers to the organization of data to create a blueprint of how a database will be constructed (divided into database tables).
If you look at the HBase DataModel documentation you will find quotes like:
A column family regroups data of a same nature in HBase and has no constraint on the type. The families are part of the table schema and stay the same for each row; what differs from rows to rows is that the column keys can be very sparse
In the Cassandra DataModel documentation you’ll find descriptions of Keyspaces, Column Families, Columns and Super Columns. Column and Super Column sorting behaviour can be defined as Bytes, Long, Ascii, UTF8, Lexical UUID or Time.
In my opinion this is a database schema. Perhaps much less schema than in a typical relational database, but schema nonetheless.
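To illustrate the point, here is a minimal sketch of the idea described in the quotes above: column families are declared up front as part of the table, while the column keys inside them can differ from row to row. The `Table` class and the `users` example are hypothetical and written in plain Python; this is not the HBase or Cassandra API.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy stand-in for a column-family table (not a real database API)."""
    name: str
    families: frozenset                       # declared up front: this is the schema
    rows: dict = field(default_factory=dict)  # row key -> family -> column -> value

    def put(self, row_key, family, column, value):
        # The family must have been declared when the table was defined.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        # Column keys inside a family are free-form and may be sparse per row.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def columns(self, row_key, family):
        # Return the column keys in sorted order, loosely mimicking a
        # configured sort order for columns.
        return sorted(self.rows.get(row_key, {}).get(family, {}))


users = Table("users", frozenset({"profile", "logins"}))
users.put("alice", "profile", "email", "alice@example.com")
users.put("alice", "logins", "2010-05-01", "10.0.0.1")
users.put("bob", "profile", "nickname", "bobby")  # different column key, same family
```

Writing to an undeclared family, for example `users.put("bob", "orders", "1", "x")`, raises a `KeyError`. That fixed set of families is the small amount of schema that remains.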
If we back up to the approach many people adopt, which is serializing objects into a “NoSQL” database, then it is true that schema is almost non-existent, even in Cassandra, because your schema wouldn’t change as your data model changes. It’s schema, but one that doesn’t change.
Using ColumnFamily, Document, Graph and Key-Value databases, in all their diverse implementations, as an object serialization store neither plays to their strengths nor accounts for their weaknesses. They are very different choices and often solve different sets of problems, although you can misuse them all in the same manner. Facebook is a perfect example. Facebook originally developed Cassandra, but today it uses both Cassandra and HBase (along with other database solutions). It uses each for its strengths and aligns it with a specific problem.
In some cases serializing objects to these “NoSQL” databases may make sense, but it’s important to treat that as one option and not the only way to use these alternatives.
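As a rough illustration of the trade-off, the sketch below contrasts the two approaches using plain Python dictionaries as stand-ins for a store. The key names, sample data, and helper functions are all hypothetical. The first option serializes a whole object under one key; the second keeps one sorted column per login so a range can be sliced directly.

```python
import json
from bisect import bisect_left, bisect_right

# Option 1: serialize the whole object under a single key.
blob_store = {
    "user:alice": json.dumps(
        {"name": "Alice", "logins": ["2010-05-01", "2010-05-09", "2010-06-02"]}
    )
}


def logins_in_range_blob(key, start, end):
    # The store only sees an opaque value, so the entire object has to be
    # read and decoded before any filtering can happen.
    user = json.loads(blob_store[key])
    return [d for d in user["logins"] if start <= d <= end]


# Option 2: one column per login, kept sorted by column name.
column_store = {"alice": ["2010-05-01", "2010-05-09", "2010-06-02"]}


def logins_in_range_columns(row_key, start, end):
    # Because the columns are sorted, a range can be located with a binary
    # search and sliced without touching the rest of the row.
    cols = column_store[row_key]
    return cols[bisect_left(cols, start):bisect_right(cols, end)]
```

Both functions return the same answer for a given range. The difference is in how much data has to be read and decoded to get it, which is the kind of strength that is lost when the store is treated only as a place to put serialized objects.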
Based on this, I think the term “NoSQL” does all of the non-relational database options a disservice. The term does help when arguing to management that a relational database may not be the best option, but that’s about where its usefulness ends.
Don’t treat them all the same. Learn what the advantages and disadvantages of each are and choose wisely.