Flow Template

 

 

 

 

 

 

 

 

 

System Innovations

--  Integration modeling, data warehousing, data strategy --


Steps of the Process

The model below defines a methodology for building a data warehouse.  As the model shows, it is a very iterative process, where the idea is to get all the way through the process (or a subprocess) with a component of data, then add another component and get it all the way through the process again.  It is a model that most of the data warehousing industry now follows, developed because the warehousing process is essentially a process of discovery.  The iterative approach allows us to build in new knowledge gained during data research, data design, build and testing, and application deployment, as we move ahead in the overall project. 

Data Warehouse Methodology

Project Initiation
Data Business Area Definition
Data Research
Data Sourcing
Data Model Completion
Perform Multidimensional Modeling
Physical Model Implementation
Performance Tuning
Business Process Definition
Technical Architecture Definition
Application Definition
Application Engineering
Distributed Deployment

Project Initiation
Define Business Case, Scope and Feasibility. 
Seek ongoing sponsorship and executive buy-in.
Identify allies early, establish their goals, and include in the scope.


Back to steps

Data Business Area Definition
Keep the scope focused and manageable. 
Determine areas of the business most important to the organization at this time. 
Priority depends on:
How much information is available through automation already?
What are the industry trends and competition?
What are the events driving the business at this time?
How much is known and understood about this area of the business?
As the above factors are clarified, what will emerge is an understanding that

  • Certain areas can be researched and "mined" for data immediately, the "low-hanging fruit"; 

  • Some areas are well understood and the amount of research can be estimated and started immediately, the "near-term" deliverables; 

  • And thirdly, other areas are little-understood and need further research before they can even be estimated, the more "long-term" areas of interest. 

Conducting this type of analysis early and often will keep a data warehouse project focused on the real goals of the business and will assure that the research and sourcing that is undertaken is feasible, meaningful and realistic. 

Back to steps

Data Research
Determine how the data will be used (Business Process).
Inventory, question, define business rules for sending (source) applications.
Expect receiving business process will undergo a transformation as the result of new awareness and new understanding due to having more information than ever before. 
Expect the requirements to change quickly because the access to information represented by an effective Data Warehouse will cause rapid growth and change in the functions using the information. 
Research data availability and value.
Continually evaluate priority and benefit of desired data vs. cost and time involved in acquiring it. 


Back to steps

Data Sourcing
Identify systems, suppliers and vendors that will provide data.
Prioritize based on revenue implications, strategic and competitive concerns as well as usage and actual availability.
Gather valuable information from sourcing to be returned to the next iteration of Business Area Definition and Data Research as data discovery results. 
Equip and train teams performing sourcing to collect information to be returned to Researchers. 


Back to steps

Data Model Completion
Select a portion of the Enterprise Model to implement through the analysis of the Business Area.
Define data relationships and resolve anomalies like many-to-many relationships.


Back to steps

Perform Multidimensional Modeling
Define Star schemas for data constructs, providing greater flexibility and understandable data structures for the business user.
Define facts, dimensions, hierarchies and sparsity. 
Define and model common business dimensions such as time, scenarios, geography, product and customer using Modeling techniques like the hypercube, star diagrams and snoflakes.


Back to steps

Physical Model Implementation
Depending on the performance requirements and other architecture considerations of the system, choose:
Relational DBMSs such as Sybase, SQL Server, Oracle, Informix/RS6000, DB2,

or
Multidimensional Databases like PaBLO, PowerPlay/Cognos, Mercury, ESSBase/Arbor, Lightship Server, Acumate ES/Kenan, Express/Oracle, Gentium or Holos

or
Relational OLAP (or OLAP-on-Relational) tools like Metaphor, Information Advantage's AXSYS, Prodea's Beacon, Redbrick Data Warehouse, Alpha Warehouse (ISI & Digital) or Decision Warehouse from Sun Systems.

Back to steps

Performance Tuning
Determine workload
Bitmap indexing significantly improves response time over traditional indexing methods by greatly reducing the number of read operations to the data.  Bitmap indexing also allows more users to access the warehouse simultaneously makes it easier for users to pose a series of queries to analyze the data achieves an acceptable level of responsiveness with a lower hardware expenditure than with traditional indexes.


Back to steps

Business Process Definition
Define/validate the process to be automated through accessing the data warehouse. 
Utilize Business Modeling to clarify and redesign if necessary.
Highlight decision-support requirements for analysis. 


Back to steps

Technical Architecture Definition
Define the hardware and technical constructs to be applied in providing the business solution. 
Select core technologies (platform, networking, RDBMS, etc.).
Address structural issues.
Data Warehousing architecture includes:
- Sourcing of data from legacy systems and other applications (data collection, editing and preparation).
- Transformation and data integration, with storage issues resolved and atomic vs. Composite levels of granularity defined.
- Distribution of data to actual warehouse tables (further distribution to subject area databases or Data Marts is sometimes in order).

Consideration should be given to the nature of the data, answering the primary question of whether it serves the informational community or the operational.

Back to steps

Application Definition
Define the processes that will access and manipulate the data. 
Consider incorporating an Application Server, removing the business logic to a Three-tiered architecture which addresses performance, reliability and resource management by moving complex application logic to an application server independent of server databases and Client PCs:


Application server provides efficient data access, accelerated response, scheduled background processing and serving of pre-processed reports. 

Client is dedicated to presentation logic and services, and has an API for invoking the applications in the middle layer. 

Database server is dedicated to data services and file services. 

Back to steps

Application Engineering
Develop physical design, test plans and scripts with expected results and build plan. 
Application Definition, Engineering and deployment are iterative, with successive prototypes developed, providing early requirements gathering to third-level systems that will graduate to production. 


Always put test data in test tables; never load it to production.  (Early prototyping efforts in data warehousing often deliver data that is below performance thresholds for accuracy.  Performance threshold should be defined early in the process, and signed off as part of the data research phase.  Warehoused data is sometimes not required to be 100% accurate - certain marketing applications for instance can tolerate a margin of error up to 5%.  Expectations must be documented and adhered to, but test data that falls below standards should not be loaded to production databases.)

Back to steps

Distributed Deployment
Deploy architecture: server, client and application.
Consolidate meta-data (data about the data) for publication.
Data credibility hinges on managed expectations and consensus in the definition stage

Back to steps
Return to Data Warehousing

Top of Page

[Company Profile] [Services] [Home] [Books & Articles] [Laura's Bio]

Copyright 1998, 1999, 2000. Laura Brown, LBPI, Inc. (DBA: System Innovations)
Last Updated: August 18, 2000