Queries From Hell
A blog about data warehousing, correlation databases, associative and incremental queries, value-based storage, metadata, on-the-fly indexing, automatic data-driven schemas, BI tools, data mining, visual mapping, pattern recognition, and the limitations of standard SQL in answering "queries from Hell."
Or, how to discover what you don't know you don't know.

An inexpensive data warehouse?

Posted at 4/09/2008 08:24:00 AM
What makes a data warehouse system expensive or affordable? Let’s take a quick look at two examples; one based on an RDBMS that is free and another based on a CDBMS that sells for $100K. To keep it simple, I’ll use 100 gigabytes of raw data as the input to these systems and assume a small number of users directly on the warehouse and a larger number of users with OLAP screens, dashboards using star schemas, etc. The costs of the front end systems will be essentially the same for both systems so they are not included.

Design – RDBMS: Six calendar months, about $150,000 in labor cost.
Get user requirements defined, design level one schema, design level two schema, design all required OLAP cubes, star schemas and other information access structures, prepare load, indexing and aggregation processes, prepare physical design.

Design – CDBMS: Three calendar days, about $750 in labor cost
Define aggregations, prepare load and aggregation processes.


Startup – RDBMS: 20 calendar days, about $15,000 in labor cost
Load data, index level one. Build and index level two. Extract data for access structures, build and index structures. Validate all levels for accuracy and completeness.

Startup – CDBMS: Four calendar days, about $1000 in labor cost
Load data and validate for accuracy and completeness


Operation – RDBMS: One full time person, about $75,000 per year
Weekly load and update and rebuild of all access structures. Design and build of new access structures as needed. Periodic reorganization of level one.

Operation – CDBMS: Less than 1/10 of one full time person
Weekly loads.

It looks like even "free" gets expensive if you can’t reduce the time required to design, build and maintain the data warehouse. Of course, this comparison ignores the opportunity cost of not having the information available for six months. And this sample is a small data warehouse. The cost of implementing a data warehouse is roughly proportional to the amount of data and the number of users. A system with a terabyte of data and a thousand users would be much higher. The cost of creating the warehouse with both the RDBMS and the CDBMS would increase by at least a factor of ten.

Hmmm, ten times $165,000 or ten times $1,750; that seems like an easy decision.

Add to Technorati Favorites      | Blogger Comments (0) |