Friday, 25 November 2016

MSBI : Data Warehouse

What is a Data Warehouse?


A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources. Data warehouses separate analysis workload from transaction workload and enable an organization to consolidate data from several sources.

There are two persons as “Gods of BIDW”: Ralph Kimball and William Inmon.
These two persons given some important Data warehouse concepts which need to be followed as standards for BIDW.
- Ralph Kimball provided “Bottom – Up approach”
- Bill Inmon provided “Top – Down approach”

In the world of BIDW, William Inmon provided most popular definition: “A Data warehouse is a Subject-oriented, Integrated, Time-variant and Non-volatile collection of Data in support of Management’s decision making process”.

What makes a Data warehouse?


1. Subject-Oriented : Data warehouse can be used to analyse a specific subject area. e.g – In an Organization, Subject area can be different Depts like HR, IT, Sales and many more. So, it can be used to analyse particular Dept like HR or IT.
2. Integrated : With Integrated we mean data from heterogeneous data sources is integrated in Data warehouse. e.g – Excel Sheet Source stores date in different format and MySQL DB source stores date in another format, but in a Data warehouse, dates will be stored in a single format only.
3. Non-volatile : Nonvolatile means that, once entered into the data warehouse, data should not change. This is logical because the purpose of a data warehouse is to enable you to analyse what has occurred.
4. Time-Variant : Data warehouse deals with historical data only. It acts as a pit where Transactional data is dumped every day. We can retrieve data which is 6 months, 12 months old or even more than this from a data warehouse. For example – You hold a SBI bank account and updated your contact details. So, Transaction system can only provide you the current contact details whereas Data warehouse holds all the contact details associated with your account (Current and Previous both).
This is very much in contrast to On-Line Transaction Processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive.

In addition to a relational database, a data warehouse environment can include an Extraction, Transformation, and Loading (ETL) solution, statistical analysis, reporting, data mining capabilities, client analysis tools, and other applications that manage the process of gathering data, transforming it into useful, actionable information, and delivering it to business users.

1 comment: