Setting up a data warehouse connection and selecting tables
At the core of a data warehouse, there are facts and dimensions. They can be organized in star schemas, snowflake schemas; they can be more or less complex, and to some extent undocumented. This is not a book about dimensional modeling and discussing how a database layout works better than other is out of scope. Some common sense rules always apply: consistent column naming can be useful (user_id
, iduser
, user_code
, key_user
, yetanotheruserid
, you name it…), constraints on tables help to pull the strings and find a way into unexplored databases. NOT NULL
fields also come in handy.
I personally had my share of good and bad data. I always remember a development team manager who once asked me, "Why do you need primary keys, anyway?" I looked around and thought, "This must be a candid camera…"
If you are drowning in a very complex DB and want some relief, you can go to http://at5.us/Ch2U2 and read about the Directive 595, it may be a real story after all.
In the course of this book, I will use several editors that MicroStrategy Desktop offers. It is also worth mentioning that there is another way to add tables and create objects, with a tool named Architect; it has the same capabilities and a more visual interface. You can find instructions about Architect in the product documentation.
Getting ready
You need to have completed the previous recipe to continue.
How to do it...
First, we create a connection to the data warehouse and then select the source tables:
- From the Schema menu, select Warehouse Catalog....
- The Warehouse Database Instance dialog box pops up. This is because we have not specified a connection yet. Click on the New... button.
- Say Hello to the Database Instance Wizard welcome page and click on Next.
- In the following page, type a name in the first textbox, since this is our only data warehouse, we simply put
datawarehouse
. - Select the type of RDBMS that you are using from the Database type drop-down list, in our case Microsoft SQL Server 2012. Then click on Next.
- Now it's the ODBC turn, find the datawarehouse DSN we created in the previous chapter and select it.
- Now since we are using the current Windows user to validate onto SQL Server, check the Use network login id (Windows authentication) checkbox and leave the Database login and Password fields empty. Click on OK when you see the warning message, click on Next and then on Finish.
- Back to the Warehouse Database Instance dialog. Now we have a database instance for the project and can continue. Click on OK.
- This is the Warehouse Catalog window: a list of all the tables available in our source system.
- Right-click on DimCustomer, for example, and select Show Sample Data....
- This is a quick way to have a glimpse at the contents of the tables. Only the first 100 rows will be returned, enough to have an idea of what the table is about.
- Holding down the Ctrl key, select DimCustomer and FactInternetSales and with the little arrow > button, move them to the right.
- Click on the Save and Close button. By default, the comments box will pop up, check Do not show this screen in the future and click on OK.
- You should be back in the Tables folder, from the View menu select Refresh (F5) to see the two tables we just selected. Then from the Schema menu, click on Update Schema (Ctrl + U).
How it works...
When you select the tables from the left pane list, MicroStrategy reads the definition of all the columns and stores the information inside the metadata, creating the first schema objects (tables). The objects that we see in the Tables folder are the logical representations of the underlying database and hold no data, just information about the columns and the datatype; if the physical table in the data warehouse changes, this information must be updated (more on this later). It is important to select all the tables that are useful for the project, not less, not more. It doesn't make sense to have 500 tables in the project if we only use 10. It just complicates the design and slows down the Desktop application.
There's more...
While you have the Warehouse Catalog window open:
- Click on Options... and select Read Settings from the Categories list.
- Enable Read the table Primary and Foreign Keys.
- Enable Count the number of rows for all tables when reading the database catalog.
- Unfold the View category, and select Table Row Counts then enable Display the number of rows per table.
- Under the Schema category, select Automatic Mapping and click on Do not map schema objects to the new tables.
- Click on OK and you're back in the Warehouse Catalog window. Click on the top-left button with a lightning icon: this will refresh the list of the tables and display the row count for each one.
Notice that, as expected, the fact tables have far more rows than the dimension ones. This is important when you don't have any database documentation and the table names do not make sense. Reading primary and foreign keys from the database helps identifying the parent-child relationship between tables in case you want to use the automatic discovery features of Architect.
Exercise 2
Now that you know how to add tables to the metadata, go on and add:
DimDate
DimGeography
DimProduct
,DimProductSubcategory
, andDimProductCategory
DimSalesTerritory
Remember to update the schema.
Exercise 3
Looking at the full database schema diagram, can you tell the type of relationship that links one table to another?
Which table is parent and which is child?
Would you say that this data warehouse is normalized or denormalized?