When choosing a database schema for a data warehouse, snowflake and star schemas tend to be popular choices. This comparison discusses suitability of star vs. snowflake schemas in different scenarios and their characteristics.
|Snowflake Schema||Star Schema|
|Ease of maintenance / change||No redundancy, so snowflake schemas are easier to maintain and change.||Has redundant data and hence less easy to maintain/change|
|Ease of Use||More complex queries and hence less easy to understand||Lower query complexity and easy to understand|
|Query Performance||More foreign keys and hence longer query execution time (slower)||Less number of foreign keys and hence shorter query execution time (faster)|
|Type of Datawarehouse||Good to use for datawarehouse core to simplify complex relationships (many:many)||Good for datamarts with simple relationships (1:1 or 1:many)|
|Joins||Higher number of Joins||Fewer Joins|
|Dimension table||A snowflake schema may have more than one dimension table for each dimension.||A star schema contains only single dimension table for each dimension.|
|When to use||When dimension table is relatively big in size, snowflaking is better as it reduces space.||When dimension table contains less number of rows, we can choose Star schema.|
|Normalization/ De-Normalization||Dimension Tables are in Normalized form but Fact Table is in De-Normalized form||Both Dimension and Fact Tables are in De-Normalized form|
|Data model||Bottom up approach||Top down approach|
Consider a database for a retailer that has many stores, with each store selling many products in many product categories and of various brands. A data warehouse or data mart for such a retailer would need to provide analysts the ability to run sales reports grouped by store, date (or month, quarter or year), or product category or brand.
Star Schema Example
If this data mart was using a star schema, it would look as follows:
The fact table would be a record of sales transactions, while there are dimension tables for date, store and product. Dimension tables are each connected to the fact table via their primary key, which is a foreign key for the fact table. For example, instead of storing the actual transaction date in a row of the fact table, the date_id is stored. This date_id corresponds to a unique row in the Dim_Date table, and that row also stores other attributes of the date that are required for grouping in reports. e.g., day of the week, month, quarter of the year and so on. The data is denormalized for easier reporting.
Here is how one would get a report of number of televisions sold by brand and by country with the help of inner joins.
Snowflake Schema Example
The same scenario can also use a snowflake schema, in which case it would be structured as follows:
The main difference, when compared with the star schema, is that data in dimension tables is more normalized. For example, instead of storing month, quarter and day of the week in each row of the Dim_Date table, these are further broken out into their own dimension tables. Similarly for the Dim_Store table, the state and country are geographical attributes that are one step removed — instead of being stored in the Dim_Store table, they are now stored in a separate Dim_Geography table.
The same report — the number of televisions sold by country and by brand — is now a little more complicated than in a star schema: