School of Information Systems

Data Warehousing : Collecting and storing large amount of data historical data

Introduction

What is data warehousing ?

A data warehouse is also a system in which it stores highly structured information from many kind of sources. Data Warehouse usually store current data and also historically from one data to another. The main goal of data warehousing is to combine different types of data sources in order to analyze the data, look for insights and create business intelligence in the form of report.

Characteristics of Data Warehousing

  • Contains large amount data from current data and historic data. Which contain a range of data from raw data to highly curated data,cleansed,filtered and aggregated data.
  • Data Warehousing has ETL ( extract, transform,load ). ETL is the process of moving data from original source to the data warehouse. The ETL processes moves data on a regular schedule.
  • Data Warehouse typically have predefined and fixed schema. Therefore it works well with the strutured data. Some Data Warehouse also support semi – sructured data.

Why use Data Warehouse ?

When you need to store a lot of historical data or analyze your data in-depth to produce business insight, data warehouses are a fantastic choice. Business analysts and data scientists can easily analyze the data found in data warehouses because of its highly structured structure.

Keep in mind that data warehouses are not meant to meet an application’s needs for concurrency and transactions. An organization will require a separate database or databases to power their day-to-day operations if they decide they will benefit from a data warehouse.

Data Warehouse Examples

  • Amazon Redshift
  • Google Big Query
  • IBM Db2 Warehouse
  • Microsoft Azure Synapse
  • Oracle Autonomous Data Warehouse

What is a Database ?

A database is a grouping of information or data. Databases are utilized to enable Online Transaction Processing (OLTP) and are often accessible electronically. Database Management Systems (DBMS) allow users and programs to interact with data while storing it in the database. The word “database” is frequently used to refer to both the DBMS and the database itself.

Characteristics of DataBase

  • Being able to store a lot data
  • Separates the logical structure of the data from its physical storage, providing flexibility in making changes
  • Minimizes duplication of data to maintain efficiency and consistency.
  Database Data Warehouse
Workload Operational & Transaction Analytic
Data Type Structured or semi-structured Structured and/or semi-structured
Schema Flexibility Rigid or flexible schema depending on database type Predefined and fixed schema definition for ingest ( schema on write and read
Data Freshness RealTime May not be up to date based on freqeuncy of ETL processes.
Users Application developers Business Analyst & Data Scientists
Pros Fast queries for storing and updating data The fixed schema makes working with the data easy for business analysts
Cons May have limited analytics capabilities Difficult to design and evolve schema scaling compute may require unnecessary scaling of storage, becayse they’re tightly coupled.

Conclusion

Database and Data Warehouse each have their own purpose, because nowadays nearly all the business out there need a database in order to store the current appliction data. Organization that wants to analyze there application current and historical data may choose to complement their databases with a data warehouse or a database.

References

MongoDB. (n.d.). Databases vs. data warehouses vs. data lakes. https://www.mongodb.com/databases/data-lake-vs-data-warehouse-vs-database

What is a Data Warehouse? (n.d.). https://www.oracle.com/id/database/what-is-a-data-warehouse/

What is a database? (n.d.). https://www.oracle.com/id/database/what-is-database/

Devyano Luhukay & William Putra Wijaya