What is a NoSQL Database?
NoSQL generally stands for "Not Only SQL" or "non-relational". A NoSQL database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases.
Basic Concepts and Techniques Used in NoSQL
1. Distribution Model: How data is distributed when scaling out. There are two basic techniques: sharding and replication.
2. Consistency: The CAP theorem describes the trade-off NoSQL databases make between consistency, availability, and partition tolerance.
3. Data Model: NoSQL databases are categorized by how the data is actually stored: wide column, document, graph, or key-value store.
1. Distribution Model :
Because of their architectural differences, NoSQL databases differ in how they support the reading, writing, and distribution of data. Some NoSQL platforms, such as Cassandra, support writes and reads on every node in a cluster and can replicate and synchronize data between many data centers and cloud providers.
1.1. Sharding: Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data.
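As a minimal sketch of sharding, the Python snippet below assigns each key to exactly one server by hashing the key. The server names and the `shard_for` and `build_shards` helpers are hypothetical and used only for illustration. Real databases typically use more elaborate schemes, such as consistent hashing or range partitioning.

```python
import hashlib

# Hypothetical server names, used only for illustration.
SERVERS = ["server-a", "server-b", "server-c"]

def shard_for(key, servers=SERVERS):
    """Return the single server responsible for this key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

def build_shards(keys, servers=SERVERS):
    """Group keys by the server that owns them."""
    shards = {server: [] for server in servers}
    for key in keys:
        shards[shard_for(key, servers)].append(key)
    return shards

shards = build_shards(f"user:{i}" for i in range(12))
for server, keys in shards.items():
    print(server, keys)
```

Every key lands on exactly one server, so that server is the single source for its subset of the data.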
1.2. Replication: Replication copies data across multiple servers, so each bit of data can be found in multiple places.
Replication comes in two forms:
- Master-slave replication makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads.
- Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data.
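The two forms of replication can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular database implements replication. The class names are hypothetical, and the peer-to-peer conflict rule (the newest write wins, ordered by one shared counter) is an assumption made for this example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class MasterSlaveCluster:
    """One master accepts writes; slaves copy from it and serve reads."""
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        self.master.data[key] = value          # only the master handles writes

    def sync(self):
        for slave in self.slaves:              # slaves synchronize with the master
            slave.data = dict(self.master.data)

    def read(self, key, replica=0):
        return self.slaves[replica].data.get(key)

class PeerToPeerCluster:
    """Any node accepts writes; nodes merge their copies on sync."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.clock = 0                         # simplification: one shared counter

    def write(self, node_index, key, value):
        self.clock += 1
        self.nodes[node_index].data[key] = (self.clock, value)

    def sync(self):
        merged = {}
        for node in self.nodes:
            for key, versioned in node.data.items():
                # Assumed conflict rule: keep the version with the highest counter.
                if key not in merged or versioned[0] > merged[key][0]:
                    merged[key] = versioned
        for node in self.nodes:
            node.data = dict(merged)

ms = MasterSlaveCluster(Node("master"), [Node("slave-1"), Node("slave-2")])
ms.write("k", "v1")
stale = ms.read("k")      # slaves have not synchronized yet
ms.sync()
fresh = ms.read("k")

p2p = PeerToPeerCluster([Node("a"), Node("b")])
p2p.write(0, "k", "from-a")
p2p.write(1, "k", "from-b")
p2p.sync()
```

Before `sync()` a slave can return stale data, which is the price of spreading reads across replicas.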
2. CAP Theorem :
In a distributed system, consistency, availability, and partition tolerance exist in a mutually dependent relationship.
The CAP theorem states that a distributed system can guarantee at most two of these three properties at the same time.
Relational databases traditionally favour consistency and availability.
- Consistency: All nodes see the same data at the same time.
- Availability: Every request receives a response indicating success or failure. Achieving availability in a distributed system requires that the system remain operational 100% of the time.
- Partition Tolerance: The system continues to work despite message loss or partial failure. A partition-tolerant system can sustain any amount of network failure that doesn't result in a failure of the entire network.
The three combinations can be defined as:
CA – data is consistent between all nodes. As long as all nodes are online, users can read from or write to any node and be sure that the data is the same on all nodes.
CP – data is consistent between all nodes, and the system maintains partition tolerance by becoming unavailable when a node goes down.
AP – nodes remain online even if they can’t communicate with each other and will re-sync data once the partition is resolved, but you aren’t guaranteed that all nodes will have the same data (either during or after the partition).
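The difference between CP and AP behaviour during a partition can be illustrated with a toy two-node store in Python. This is a sketch under stated assumptions, not a model of any real database. The `TwoNodeStore` class is hypothetical, and the rule used when the partition heals (node 0 wins conflicts) is an arbitrary simplification.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""

class TwoNodeStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.replicas:      # normal case: update both copies
                replica[key] = value
        elif self.mode == "CP":
            raise Unavailable("other node unreachable, write refused")
        else:
            self.replicas[node][key] = value   # AP: accept the write locally

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("other node unreachable, read refused")
        return self.replicas[node].get(key)

    def heal(self):
        """Re-sync after the partition. Assumption: node 0 wins conflicts."""
        self.partitioned = False
        merged = {**self.replicas[1], **self.replicas[0]}
        self.replicas = [dict(merged), dict(merged)]

cp = TwoNodeStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_rejected = False
except Unavailable:
    cp_rejected = True     # CP gives up availability

ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)        # each side accepts its own write
ap.write(1, "x", 3)
diverged = ap.read(0, "x") != ap.read(1, "x")   # AP gives up consistency
ap.heal()
```

The CP store stays consistent by refusing requests, while the AP store keeps answering and has to reconcile conflicting values afterwards.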
3. Data Model :
Types of NoSQL Databases
There have been various approaches to classifying NoSQL databases, each with different categories and subcategories.
Here is a basic classification by data model, with examples:
3.1. Wide Column Stores:
A column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key-value pair) consisting of three elements:
- Unique name: Used to reference the column
- Value: The content of the column. It can have different types, like AsciiType, LongType, TimeUUIDType, UTF8Type among others.
- Timestamp: The system timestamp used to determine the valid content.
Accumulo, Cassandra, Druid, HBase, and Vertica are examples of column databases.
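The three-element column described above can be sketched in Python as a named tuple. The `Column` type and `write_column` helper are hypothetical, and the rule that the highest timestamp wins is a simplified reading of "the timestamp determines the valid content".

```python
from typing import Any, NamedTuple

class Column(NamedTuple):
    name: str        # unique name used to reference the column
    value: Any       # content of the column
    timestamp: int   # used to decide which version is valid

def write_column(row, column):
    """Store the column unless the row already holds a newer version."""
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column

row = {}
write_column(row, Column("email", "old@example.com", timestamp=100))
write_column(row, Column("email", "new@example.com", timestamp=200))
# A write that arrives late with an older timestamp is ignored.
write_column(row, Column("email", "late@example.com", timestamp=150))
print(row["email"].value)
```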
3.2. Document Stores:
The central concept of a document store is the notion of a "document". While each document-oriented database implementation differs on the details of this definition, in general, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and JSON as well as binary forms like BSON.
Apache CouchDB, Clusterpoint, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx, and RethinkDB are examples of document databases.
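To make the idea of a document concrete, here is a minimal Python sketch that stores JSON-encoded documents in a dictionary and queries them by field. The `insert` and `find` helpers and the sample data are hypothetical. Real document databases add indexing, richer query languages, and persistent storage.

```python
import json

collection = {}   # document id -> JSON-encoded document

def insert(doc_id, document):
    """Encode the document as JSON and store it under its id."""
    collection[doc_id] = json.dumps(document)

def find(field, value):
    """Return every document whose top-level field equals value."""
    results = []
    for encoded in collection.values():
        document = json.loads(encoded)
        if document.get(field) == value:
            results.append(document)
    return results

# Documents in the same collection do not need to share a schema.
insert("u1", {"name": "Ada", "city": "London", "languages": ["en", "fr"]})
insert("u2", {"name": "Lin", "city": "Taipei"})
insert("u3", {"name": "Sam", "city": "London", "address": {"zip": "N1"}})

londoners = find("city", "London")
print([doc["name"] for doc in londoners])
```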
3.3. Key-value Stores:
Key-value (KV) stores use the associative array (also known as a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection.
Aerospike, CouchDB, Dynamo, FairCom c-treeACE, FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, and Berkeley DB are examples of key-value databases.
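The key-value model maps directly onto Python's built-in dictionary. The `KeyValueStore` class below is a hypothetical, in-memory sketch of the put/get/delete interface such stores commonly expose. It is not the API of any product listed above.

```python
class KeyValueStore:
    """A collection of key-value pairs in which each key appears at most once."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Writing an existing key replaces its value; it never duplicates the key.
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        self._items.pop(key, None)

    def __len__(self):
        return len(self._items)

store = KeyValueStore()
store.put("session:42", "alice")
store.put("session:42", "bob")     # overwrites the previous value
store.put("session:43", "carol")
store.delete("session:43")
print(store.get("session:42"), len(store))
```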
3.4. Graph Databases:
A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.
AllegroGraph, InfiniteGraph, MarkLogic, Neo4J, OrientDB, Virtuoso, and Stardog are examples of graph databases.
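Nodes, edges, and properties can be sketched with plain Python data structures. The `Graph` class and the sample data are hypothetical. Real graph databases provide dedicated query languages and index-backed traversals rather than the linear scan used here.

```python
class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = []   # (source, label, target, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target, **properties):
        self.edges.append((source, label, target, properties))

    def neighbors(self, source, label):
        """Follow every edge with the given label out of source."""
        return [t for s, l, t, _ in self.edges if s == source and l == label]

g = Graph()
g.add_node("alice", age=30)
g.add_node("bob", age=25)
g.add_node("carol", age=41)
g.add_edge("alice", "KNOWS", "bob", since=2019)
g.add_edge("bob", "KNOWS", "carol", since=2021)

# A two-hop query: whom do Alice's acquaintances know?
friends_of_friends = [
    person
    for friend in g.neighbors("alice", "KNOWS")
    for person in g.neighbors(friend, "KNOWS")
]
print(friends_of_friends)
```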