Cutter Consortium

Data Analysis & Logical Data Modeling for the Data Warehouse

Length of workshop:
General Overview:

This seminar focuses on the key user expectation from any data warehouse: clean, consistent and integrated data that supports business users' decision making needs. This is not a database design issue, rather it is a data analysis, data cleansing and data transformation issue. This seminar will teach you how to address these three issues by using two logical data modeling techniques. The top-down data modeling technique will teach you how to create an integrated Logical Data Model, which is fully normalized and populated with key business attributes. The bottom-up data modeling technique will teach you how to validate and map source data into your Logical Data Model, as well as how to find dirty data in the source files and transform it using formal Normalization rules.

Every data warehouse project addresses two critical business information objectives: data management and data delivery. While the goal of data delivery is to provide easier access and manipulation of business information, the goal of data management is to turn raw data into business information in the first place. This is accomplished by integrating source data from disparate heterogeneous systems. Much of that source data is dirty in terms of data content, and much of it is misplaced or misused in old file structures. This seminar will teach you how to accomplish the data management aspect of your data warehouse project through the discipline of Logical Data Modeling.

Leader Larissa Moss
Workshop Goals:

As a result of attending this workshop, participants will learn the technique of Entity-Relationship modeling and the principles of Normalization; how to integrate disparate source data; how to recognize "dirty data" and how to model it; and will understand the difference between a logical data model and a logical database design (physical data model). This workshop will answer such questions as What steps must be taken to ensure data quality and data integration in a data warehouse? How can multiple data warehouse projects be integrated? How can the unavoidable data redundancy be controlled? In addition, all participants will learn about business meta data and technical meta data components; the basic concepts of Physical Data Modeling (logical database design); and the difference between a two-dimensional and multi-dimensional database design.

Intended Audience:

This workshop is intended for data warehouse customers/users, developers, and project managers.

Data warehouse customers/users will learn why your full-time participation in Logical Data Modeling is necessary; how to communicate and capture your business rules to ensure that the data warehouse will contain accurate and integrated data; how to validate a data warehouse data architecture

Developers will learn how to create and interpret a Logical Data Model; how to tell the difference between a Logical Data Model and a Physical Data Model (logical database design); how to apply Normal Forms to model the source data

Project managers will learn about the objectives and benefits of Logical Data Modeling for a data warehouse; how to estimate time and resources needed to prepare an adequate Logical Data Model for their data warehouse project; how map the database design into the Logical Data Model.

Outline/At a Glance:

This seminar is 80% hands-on and only 20% lecture. Participants will work in teams as they apply the logical data modeling rules learned to a case study. Students will simulate a real project by playing the roles of end user, data modeler and facilitator. They will experience the difficult choices they will have to make when modeling ambiguous requirements, resolving data disputes between users, staying within the scope of the project, and dealing with dirty data.

Workshop Outline
1. Introduction to Data Warehousing
  • Problems that lead to Data Warehousing
  • Traditional DSS architecture
  • Swim Lane development
  • Cross-organizational development
  • Data Management objectives
  • Data Delivery objectives
2. Data Warehouse Architecture
  • History of Data Warehousing
  • Characteristics of an Enterprise Data Warehouse
  • Characteristics of a Data Mart
  • Characteristics of an Operational Data Store
  • Corporate Information Factory (CIF)
  • CIF architecture & ETL process
  • OLTP vs. CIF
  • ODS vs. EDW vs. Data Mart
3. Data Modeling
  • Logical Data Model
  • Physical Data Model
  • Distributed Data Model
  • Top-down modeling
  • Bottom-up modeling
  • Middle-out modeling
  • Meta data components
4. Basic Logical Data Model
  • Entities
  • Relationships
  • Entity-Relationship matrix
  • Entity-Relationship diagram
  • Entity definitions
5. Refinement of Logical Data Model
  • Cardinality
  • Associative Entities (intersection data)
  • Unique identifiers (primary keys)
  • Foreign keys
  • Optionality (existence criteria)
  • Attributes
  • Naming standards
6. Source Data Analysis
  • Information Quality
  • Dirty data symptoms
  • Dirty data categories
  • Modeling dirty data
  • Data cleansing issues
7. Normalization
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce-Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)
  • Final modeling considerations
8. Data Warehouse Database Design
  • Data Warehouse considerations
  • Platform considerations
  • Performance considerations
  • Data Warehouse de-normalization options
  • One-dimensional design (relational)
  • Multi-dimensional design (star schema)
  • Star schemas, snow-flakes, constellations
For more information on bringing this workshop to your organization, contact Dennis Crowley by phone at +1 781 641 5125, by fax at +1 781 648 1950, or by e-mail at sales@cutter.com.
Data Analysis and Logical Data Modeling for the Data Warehouse