Feb 12

The generalization of “NoSQL”

By kellabyte  //  Databases  //  7 Comments

The use of non-relational databases has gained momentum over the last few years. These database engines are often called “NoSQL” databases. It’s more common these days to hear conversations about whether a relational database or a “NoSQL” database makes sense for a project.

A common claim you will hear is that “NoSQL” databases are schema-less, and that using them to serialize objects removes a lot of the pain of using and maintaining an RDBMS (relational database management system).

There are four high-level categories of “NoSQL” databases that I’m aware of: ColumnFamily, Document, Graph and Key-Value. Within each of the four groups there are many options to choose from.

For example, Cassandra and HBase (Hadoop) are both modeled on Google BigTable and both fall in the ColumnFamily category, yet they are implemented very differently. They share some strengths and weaknesses but also have some that oppose each other.

The lack of schema you hear mentioned a lot in the “NoSQL” camps isn’t exactly accurate either. Wikipedia states that a Database schema is:

A database schema (pronounced skee-ma, /ˈski.mə/) of a database system is its structure described in a formal language supported by the database management system (DBMS) and refers to the organization of data to create a blueprint of how a database will be constructed (divided into database tables).

If you look at the HBase DataModel documentation you will find quotes like:

A column family regroups data of a same nature in HBase and has no constraint on the type. The families are part of the table schema and stay the same for each row; what differs from rows to rows is that the column keys can be very sparse

In the Cassandra DataModel documentation you’ll find descriptions of Keyspaces, Column Families, Columns and Super Columns. Column and Super Column sorting behaviour can be defined as Bytes, Long, Ascii, UTF8, Lexical UUID and Time UUID.

In my opinion this is a database schema. Perhaps much less schema than in a typical relational database, but schema nonetheless.
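To make the point concrete, here is a minimal Python sketch of a column-family table as a toy in-memory structure. It is not the HBase or Cassandra client API; the class `ColumnFamilyTable`, its methods and the sample data are all hypothetical names chosen for illustration. The families are declared up front and stay the same for every row, while the column keys inside a family can be sparse. Sorting columns by name is an assumption standing in for a configured comparator.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnFamilyTable:
    """Toy in-memory model of a column-family table (hypothetical, not a real client API)."""
    name: str
    families: frozenset  # declared up front: this part is the schema
    rows: dict = field(default_factory=dict)  # row key -> family -> {column: value}

    def put(self, row_key, family, column, value):
        # The table rejects families that were never declared.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def columns(self, row_key, family):
        """Return one row's columns in a family, sorted by column name."""
        return sorted(self.rows.get(row_key, {}).get(family, {}).items())


users = ColumnFamilyTable("users", frozenset({"profile", "activity"}))
users.put("alice", "profile", "email", "alice@example.com")
users.put("alice", "profile", "city", "Toronto")
users.put("bob", "profile", "email", "bob@example.com")  # sparse: bob has no "city" column
```

The rows differ in which columns they carry, but the set of families is fixed when the table is created, which is the small piece of schema the quotes above describe.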

If we back up to the concept many people adopt, which is serializing objects into a “NoSQL” database, then it is true that the use of schema is almost non-existent, even in Cassandra, because as your data model changes your schema doesn’t change. It’s schema, but one that doesn’t change.
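A rough Python sketch of that object-serialization pattern, assuming a plain dict as a stand-in for the database and JSON as the serialization format (both are my assumptions for illustration, and `save_object` and `load_object` are hypothetical helpers):

```python
import json

store = {}  # stand-in for the database: one key maps to one opaque blob


def save_object(key, obj):
    # The store only ever sees a string; it knows nothing about the fields inside.
    store[key] = json.dumps(obj, sort_keys=True)


def load_object(key):
    return json.loads(store[key])


save_object("user:1", {"name": "Alice"})
# The object model gained a field, but nothing about the store had to change.
save_object("user:2", {"name": "Bob", "email": "bob@example.com"})
```

The store’s structure stays “key to blob” no matter how the objects evolve, which is why the schema never changes. The cost is that the database cannot see inside the blob, so any structure it offers beyond key lookup goes unused.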

Using ColumnFamily, Document, Graph and Key-Value databases and all their diverse implementations as an object serialization store does not play to their strengths, and it ignores their weaknesses. They are very different choices and they often solve different sets of problems, although you can misuse them in the same manner. A perfect example is Facebook. Facebook originally developed Cassandra, but today Facebook uses both Cassandra and HBase (along with other database solutions). They use each one for its strengths and align it with a specific problem.

In some cases serializing objects to these “NoSQL” databases may make sense, but it’s important to consider that as one option and not the only way to use these alternatives.

Based on this information I think the term “NoSQL” is doing all of the non-relational database options a disservice. The term “NoSQL” does help when arguing with management that maybe a relational database is not the best option, but that’s about where its usefulness ends.

Don’t treat them all the same. Learn what the advantages and disadvantages of each are and choose wisely.

  • http://unapologetic.wordpress.com/ John Armstrong

    Just as a nitpick, I’d put column-family databases like HBase, BigTable, and — the new hotness — Accumulo as special cases of key-value databases. The difference being that the keys have some pre-defined structure that helps with indexing.

    Anyway, k-v databases do come with a little bit of structure, as you point out, but there’s usually no way to enforce it at the database level. I can set up an Accumulo table to save and serve web-mapping raster tiles, but there’s nothing to stop me from throwing in another kind of data entirely. I think that’s what people really mean by saying that these kinds of databases have “no schema”.

    As for object serialization, there are ways to make it work, but they’re all sorta primitive, and referential coherence is incredibly hairy. Relational databases have well-honed strategies for dealing with this (it’s half of what O/RM is about), but it’s just not there yet for non-relational databases.

  • http://typecastexception.com John Atten

    Nice post.

    I agree that any system by which you formally organize your data represents a schema of some sort. It just goes to show how entrenched we have become in the terminology of RDBMS land that when we consider a schema structure based on anything other than entities, relationships, and ACID principles we consider it to be “schemaless”.

    I am surprised, actually, at your focus on this aspect of “NoSQL” db’s. I am one of the faceless many who lurk on your twitter account, seeking to absorb the tidbits you toss out there, and I would have expected this post to focus as much on differences in CAP (Consistency, Availability, Partition tolerance) priorities between standard relational Db’s and NoSQL implementations, and the need for scale vs. immediate consistency.

    I am just getting started on this weird “NoSQL” road (and really looking for a new name for it), looking at RavenDb, Redis, and Mongo (nothing I might do will require the scope or complexity of Cassandra or Hadoop – yet!). It flies in the face of everything I have learned about data. Like most, I think the relational model is so built into my thinking that the biggest hurdle is getting around that.

    Keep writing. Love your posts . . .

  • James Tryand

    Hangonamoment.
    Have you seen the work that Erik Meijer did on this to produce the CoSQL language?

    http://channel9.msdn.com/Forums/Coffeehouse/LINK-Slides-from-Erik-Meijers-NoSQL-is-CoSQL-talk

    http://queue.acm.org/detail.cfm?id=1961297 : “CONTRARY TO POPULAR BELIEF, SQL AND NOSQL ARE REALLY JUST TWO SIDES OF THE SAME COIN.” by Erik Meijer and Gavin Bierman, Microsoft

  • Chris

    I do hate the term “NoSQL”, it is completely non-descriptive. I could claim that using Extensible Storage Engine (ESENT) is “NoSQL” but I doubt any of the proponents for “NoSQL” would count that as a “NoSQL” database.

  • http://murrayon.net/ Mike Murray

    Many people view the expansion of the acronym NoSQL to be “Not Only SQL” instead of the commonly assumed “No SQL”, meaning no relational DBs at all. http://en.wikipedia.org/wiki/NoSQL

  • Andre Calil

    As John Atten pointed out, I’m also surprised that you only referenced the schema-less aspect. There are many other key points for calling a “storage server” a NoSQL DB, like being cluster-based.

    I’d recommend you take a look at Fowler’s blog, he’s talking about NoSQL these days. Especially here: http://martinfowler.com/bliki/NosqlDefinition.html.

    Regards

  • Curtis Maloney

    It annoys me that the NoSQL project (who produced a relational DBMS with a better query language, not SQL) had their name hijacked by people who almost exclusively mean “Non-Relational”.
    But I guess that boat has sailed, and there’s little to no chance of getting people to say “NonRel” instead…