Data and AI Knowledge: October 2016

Friday, 28 October 2016

NoSQL Vs RDBMS

Relational Database Management System (RDBMS)

RDBMS Database is a relational database. It is the standard language for relational database management systems.Data is stored in the form of rows and columns in RDBMS. Some common relational database management systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, etc. All these RDBMS follows ACID property.

NoSQL

NoSQL commonly referred to as “Not Only SQL”. With NoSQL, unstructured ,schema less data can be stored in multiple collections and nodes and it does not require fixed table sachems, it supports limited join queries , and we scale it horizontally. Some common NoSQL databases are: MongoDB, Cassandra DB, Neo4j etc and all these NoSQL databases follows CAP theorem

How NoSQL Differs from Relational Databases

NoSQL and RDBMS’s are designed to support different application requirements and typically they co-exist in most enterprises.

Where to use RDBMS and NoSQL:

Now main decision points on when to use RDBMS/NoSQL which include the following:

NoSQL Database Overview

What is NoSQL Database?
NOSQL Database generally refers as "Not Only SQL" or "non-Relational". NoSQL database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases.

Basic Concept and Technique Used in NoSQL

1. Distribution Model: How to distribute data when scaling out, basically two type:- Sharding and Replication.
2. Consistency: NoSQL databases use CAP theorem to define consistency.
3. Data Model: In NoSQL databases are basically categories on the basis how data is actually stored, like- Column wise, Document, Graph, Key/Value Store.

1. Distribution Model :
Because of their architecture differences, NoSQL databases differ on how they support the reading, writing, and distribution of data. Some NoSQL platforms like Cassandra support writes and reads on every node in a cluster and can replicate / synchronize data between many data centers and cloud providers.
1.1. Sharding: Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data.
1.2. Replication: Replication copies data across multiple servers, so each bit of data can be found in multiple places.
Replication comes in two forms:
- Master-slave replication makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads.
- Peer-to-peer replication allows writes to any node; the nodes coordinate to synchronize their copies of the data.

2. CAP Theorem :
In distribution system, managing Consistency, Availability, Partition Tolerance exist in a mutually dependent relationship. The CAP theorem which states that in any distributed system we can choose only two of consistency, availability or partition tolerance. The relational database is in favour of consistency and availability.

- C onsistency : All nodes have the same data at the same time.
- A vailability : Every request gets a response on success/failure. Every Achieving availability in a distributed system requires that the system remains operational 100% of the time.
- P artition Tolerance : System continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn't result in a failure of the entire network.

All the three combinations can be defined as:
CA – data should be consistent between all nodes. As long as all nodes are online, users can read/write from any node and be sure that the data is the same on all nodes.
CP – data is consistent between all nodes and maintains partition tolerance by becoming unavailable when a node goes down.
AP - nodes remain online even if they can’t communicate with each other and will re-sync data once the partition is resolved, but you aren’t guaranteed that all nodes will have the same data (either during or after the partition)

3. Data Model :
Types of NoSQL Databases
, There have been various approaches to classify NoSQL databases, each with different categories and subcategories. Here is a basic classification by data model, with examples:

3.1. Wide Column Stores:
A column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key-value pair) consisting of three elements:
- Unique name: Used to reference the column
- Value: The content of the column. It can have different types, like AsciiType, LongType, TimeUUIDType, UTF8Type among others.
- Timestamp: The system timestamp used to determine the valid content.
Accumulo, Cassandra, Druid, HBase, Vertica are the example of Column database.

3.2. Document Stores:
The central concept of a document store is the notion of a "document". While each document-oriented database implementation differs on the details of this definition, in general, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and JSON as well as binary forms like BSON.
Apache CouchDB, Clusterpoint, Couchbase, DocumentDB, HyperDex, Lotus Notes, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB are the example of Document database.

3.3. Key-value Stores:
Key-value (KV) stores use the associative array (also known as a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection.
Aerospike, CouchDB, Dynamo, FairCom c-treeACE, FoundationDB, HyperDex, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Redis, Riak, Berkeley DB are the example of Key-Value database.

3.4. Graph Databases:
A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.
AllegroGraph, InfiniteGraph, MarkLogic, Neo4J, OrientDB, Virtuoso, Stardog are the example of Graph database.

Tuesday, 11 October 2016

MSBI Introduction

Microsoft Business Intelligence Introduction

BI stands for Business Intelligence, it's a set of techniques which helps in gathering, storing, analysing and come up with best information from huge/big data which helps to business decision making.
In other words, BI is Integration of data from different data sources, then Analyse the Integrated data for BI engagement and Representation of the analysed result in the form of reports, dashboards, Charts etc. which helps to understand the data easily.

MSBI stands for Microsoft Business Intelligence, The MSBI suite is composed of tools which helps in providing best solutions for Business Intelligence and Data Mining Queries. This tool uses Visual studio along with MS SQL server. It empower users to gain access to accurate and up-to-date information for better decision making in an organization. It offers different tools for different processes which are required in Business Intelligence (BI) solutions.

MSBI is divided into 3 categories:-

1. SSIS – SQL Server Integration Services – Integration tool
2. SSAS – SQL Server Analytical Services – Analysis tool
3. SSRS – SQL Server Reporting Services – Reporting tool

Microsoft Business Intelligence (MSBI) Architecture

Example: Lets take an example to understand how BI beneficial for an organization: Let’s understand above picture by taking an example of an organization. Let’s take the example :- Adidas. We have outlets of Adidas in most parts of India. Every outlet stores their customer data in their respective database and it’s not mandatory that every outlet is using the same database. Some outlets may have Access database, some might be using Excel sheet or some stores prefer storing their data in simple text files, oracle etc.

Before proceeding ahead with our explanation, we should know “what OLTP is”? It stands for Online Transaction Processing. These are basically the online transactions which is heavy DML operation (Insert, Update, Delete) is performed on database at every Adidas outlet.

Phase 1- “SSIS” After storing daily data of the customers who visited Adidas outlet at different stores, the data is then integrated and saved it in a centralized database. This is done with the help of OLTP component of MS SQL Server. Integration means merging of data from different data stores (i.e. it may be a flat file, Excel sheets, Access, Oracle, etc.), refreshing data in data warehouses and to cleanse data (e.g. -Date format may be different for different outlet’s database, so same format is made to make it even) before loading to remove errors. Now, you must be clear with the Integration concept.

Phase 2- “SSAS” Next step is to analyze the stored centralized data. This huge data is then divided into Data Marts on which analytic process is carried on. Analysis services use OLAP (Online Analytical Processing) component and data mining capabilities. It allows to build multi-dimensional structures called CUBES to pre calculate and store complex aggregations, and also to build mining models to perform data analysis which helps to identify valuable information like what are the recent trends, patterns, dislikes of customers. Business analyst then perform data mining function on multi dimensional cube structure to look data from different perspectives. Multi Dimensional analysis of huge data completes.

Phase 3- “SSRS” Now, represent analysis graphically so that an organization (Adidas) can make effective decision to enhance their revenue, gain maximum profit and to reduce time wastage. So, this is done in forms of Reports, Scorecards, Dashboards, Excel workbooks, etc. This data reports will tell the organization what is the revenue of Adidas in specific time at specific place, where they captured the market, in which part they are lacking and needs to boost up and many other things which end users wish to look into. This reporting is done with a tool SQL Server Reporting Services and completes.

SQL Server : How to Change Service Account of SSRS

Change Service Account for SSRS 2012 (SQL Server Reporting Services)

To view and reconfigure service account information, always use the Reporting Services Configuration Manager. Service identity information is stored internally in multiple locations. Using the tool ensures that all references are updated accordingly whenever you change the account or password. The Reporting Services Configuration Manager performs the following additional steps to ensure the report server remains available:

- Automatically adds the new account to the report server group created on the local computer. This group is specified in the access control lists (ACLs) that secure Reporting Services files.
- Automatically updates the login permissions on the SQL Server Database Engine instance used to host the report server database. The new account will be added to the RSExecRole.
- Automatically updates the encryption key to include the profile information of the new account.

Note: If the report server is part of the scale-out deployment, only the report server that you are updating is affected. The encryption keys for other report servers in the deployment are unaffected by the service account change.

Steps to change the account

1. Open the Reporting Services Configuration Manager form Start à Programs
2. Navigate to the Service Account tab and choose one of the available system accounts, or provide a domain account as the new service account and click Apply

3. It will prompt you to backup the Encryption Key. Specify a location to store the Encryption Key file and provide the password and click OK

4. You will get a prompt to choose the administrator account for granting access rights to the new service account. Enter the account and click OK

5. The service account will now be changed to the new account and the list of activities done during the account change will be displayed on the Results window

For more info refer this Article