Understanding the Different Types of Schemas (Snowflake and Star)
In data modeling, understanding how tables are structured is very important because the structure affects how data is filtered, how calculations work, and how fast reports run. In most analytical systems such as Excel Data Model, Power BI, or data warehouses, we work mainly with two types of tables: fact tables and dimension tables. Fact tables store the key numerical data we want to analyse, such as sales amount, quantity sold, profit, or number of transactions. Dimension tables store descriptive information that gives meaning to those numbers, such as product name, customer, store, date, or location. Dimension tables are used to filter and group the data in the fact table.
The way these tables are arranged creates what we call a schema. The two most common schemas are:
- the star schema,
- the snowflake schema.
A star schema has a simple structure. One central fact table is connected directly to several dimension tables. The fact table is usually placed in the centre or at the bottom of the model, and the dimension tables surround it. Each dimension table has a one-to-many relationship with the fact table. This means one row in the dimension table (for example, one product) can relate to many rows in the fact table (many sales of that product). The dimension tables do not connect to each other they only connect to the fact table. Because of this layout, the diagram looks like a star.
In a star schema, dimension tables are denormalized. This means all related descriptive attributes are stored in the same table instead of being split into multiple smaller tables. For example, a Product dimension might contain Product ID, Product Name, Category, Brand, and Colour all in one table. This design makes the structure easier to understand and reduces the number of joins needed when querying data.
A snowflake schema is an extension of the star schema but with more complexity. In this design, some dimension tables are split into additional tables called sub-dimensions. These sub-dimensions create multiple levels of relationships. For example, a Store dimension might contain a Location ID. Instead of storing city, region, and country in the Store table, that information is placed in a separate Location table. The Store table connects to Location, and Location may connect to another table such as Country details. Because the dimension tables branch out into more tables, the structure looks like a snowflake.
This happens because the data is normalized, meaning attributes are separated into different tables to reduce duplication. While this can save storage space and improve data consistency, it increases the number of relationships and joins needed when analysing data. As a result, the model becomes harder to understand and queries can become slower.
There are several important advantages of using a star schema.
- Simplicity. The structure is straightforward and easy to visualize. Users can clearly see which tables are dimensions and which table contains the measures. This is especially helpful for beginners or business users who are not technical.
- Performance. Because dimension tables connect directly to the fact table and there are fewer tables overall, fewer joins are required during queries. This usually results in faster report performance
- Easier Maintenance. Adding a new attribute to a dimension table is simple and does not require changing many relationships. Finally, calculations are easier to write because the filtering path from dimension to fact is direct and clear.
The snowflake schema, on the other hand, can be useful in very large or complex systems where storage efficiency and strict normalization are important. However, in tools like Excel or Power BI, star schema is usually recommended because it is more user-friendly and optimized for analytical performance rather than transactional storage efficiency. To keep models simple, modelers often avoid snowflake structures by combining sub-dimension data back into the main dimension table. Instead of creating many connected tables, they store related attributes together and then organize them using hierarchies.
A hierarchy is a logical way to group related columns so users can drill down through levels of detail. For example, in a Location dimension, we might have Country, Region, and City columns. Instead of treating them separately, we create a Location hierarchy where users can analyse data from Country level, then drill down to Region, and finally to City. Hierarchies make reports more structured and easier to navigate, without needing extra tables like in a snowflake schema.
Reference:
Bernard Obeng Boateng. (2023). Data Modeling with Microsoft Excel. 1st. Packt. Birmingham. ISBN: 971803240282.