Should users know what is behind a data warehouse?
Originally I was averse to teaching users about data warehouses and their structure. Database work is rather complex, and DBAs hold a lot in their heads. The whole point of having a database administrator is to manage the data and its integrity, then make it available to the users. There’s a touch of job security involved, but more importantly, users can inadvertently damage or misconstrue the information.
My view on this matter has shifted after a recent experience. I was at a company that had a huge data warehouse drawing information from at least three other databases (all of which were proprietary).
At marketing's request, the data analyst created four Cognos OLAP cubes to run their most frequent analyses and reports. The data analyst did not give the marketing people much training, but did teach them how the cubes functioned, what data they contained, and in what form. Marketing could then fly on their own, and did so for months. Every morning at 8 AM the cubes were rebuilt with the newest data, and they were ready by 9 AM.
For those of you who are not familiar with OLAP cubes: they are a way of combining data from whatever sources are available into three-dimensional “grids” which can be rotated, much like a three-dimensional pivot table in a spreadsheet.
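To make the pivot-table comparison concrete, here is a minimal Python sketch of the idea. It is not how Cognos builds cubes; the rows, the dimension names (region, product, month), and the roll_up helper are all invented for illustration. It only shows what "rotating" means: the same facts summed along different dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, sales).
# All values are made up for illustration.
rows = [
    ("East", "Widget", "Jan", 10),
    ("East", "Gadget", "Jan", 5),
    ("West", "Widget", "Jan", 7),
    ("East", "Widget", "Feb", 3),
    ("West", "Gadget", "Feb", 8),
]

DIMENSIONS = ("region", "product", "month")


def roll_up(rows, keep):
    """Sum sales over every dimension that is not named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


# Two "rotations" of the same data.
by_region = roll_up(rows, ["region"])
by_product_month = roll_up(rows, ["product", "month"])

print(by_region)
print(by_product_month)
```

Each call to roll_up is one view of the same underlying rows; a real cube precomputes many such views so that users can switch between them quickly.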
Marketing got curious when the information seemed off. The cubes presented some data as per-instance when it was actually per-person, and other figures that looked like per-day values were actually lifetime totals.
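A small Python sketch shows how this kind of mismatch can produce numbers that look plausible but are wrong. The data and field names (person, day, lifetime_purchases) are hypothetical, and I am assuming the simplest version of the problem: a per-person, lifetime value repeated on every per-visit row.

```python
# Hypothetical data: one row per visit (per-instance), but
# lifetime_purchases is a per-person lifetime value repeated on each row.
visits = [
    {"person": "ann", "day": "Mon", "lifetime_purchases": 12},
    {"person": "ann", "day": "Tue", "lifetime_purchases": 12},
    {"person": "ann", "day": "Tue", "lifetime_purchases": 12},
    {"person": "bob", "day": "Mon", "lifetime_purchases": 4},
]

# Naive: summing over visit rows counts ann's lifetime total three times.
naive_total = sum(v["lifetime_purchases"] for v in visits)

# Naive per-day slice: reads like "purchases on Tuesday" but is not.
naive_tuesday = sum(v["lifetime_purchases"] for v in visits if v["day"] == "Tue")

# Grain-aware: keep one value per person before summing.
per_person = {v["person"]: v["lifetime_purchases"] for v in visits}
correct_total = sum(per_person.values())

print(naive_total, naive_tuesday, correct_total)
```

Nothing in the naive figures signals an error, which is why a user who does not know the grain of each measure has little chance of spotting it.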
Unfortunately, there were no data dictionaries, so I had to build them myself. We had to define how each piece of information was being calculated, because a measure that made sense when the data was broken down in one direction did not work in another direction. This whole effort (the analysis, the building of data dictionaries, and the repair of the cubes) took several weeks, especially since nothing had been documented. While the result was a success, the problem should not have occurred. Marketing should have been more thorough in explaining what they intended to do with the cubes, and the data analyst should have been more thorough in making marketing aware of what the cubes were actually doing.
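As a rough idea of what a data dictionary entry might record, here is a hedged Python sketch. It is not the format I actually used; the MeasureEntry fields, the two sample measures, and the can_sum helper are assumptions made for illustration. The point is that each measure states its grain, its time period, and the directions in which it is safe to add up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasureEntry:
    """One hypothetical data dictionary entry for a cube measure."""
    name: str
    grain: str                # e.g. "per visit" or "per person"
    period: str               # e.g. "daily" or "lifetime"
    additive_over: frozenset  # dimensions it is safe to sum across
    definition: str


# Sample entries; names and definitions are invented.
DATA_DICTIONARY = {
    "visit_count": MeasureEntry(
        name="visit_count",
        grain="per visit",
        period="daily",
        additive_over=frozenset({"person", "day"}),
        definition="Number of recorded visits.",
    ),
    "lifetime_purchases": MeasureEntry(
        name="lifetime_purchases",
        grain="per person",
        period="lifetime",
        additive_over=frozenset({"person"}),
        definition="All purchases a person has ever made.",
    ),
}


def can_sum(measure, dimension):
    """Return True if the dictionary says the measure may be summed over the dimension."""
    return dimension in DATA_DICTIONARY[measure].additive_over


print(can_sum("visit_count", "day"))         # True
print(can_sum("lifetime_purchases", "day"))  # False
```

Even a plain document with these same fields would have answered marketing's questions; the code form just makes the "works in one direction but not another" rule explicit.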
So I think that, apart from giving the data analyst more time to be sure of what was being assessed and how, teaching the users was paramount, and teaching them properly. They could not change any data, since the OLAP cubes sat between them and the databases; they could only query. The problem was that they did not know enough to make those queries properly.