Last week I found myself getting defensive with a colleague of mine in a conversation about the value of system prototyping. My colleague suggested that there’s a difference between IT people and everyone else and that it’s that two IT guys can look at a diagram and see how a system will work while everyone else needs to see the system in action. There is perhaps an element of truth to this but like most things in life I think it depends on the people involved (which is why I leapt to the defence of IT people everywhere as I don’t like generalisations being made that somehow mark us out as different). Sometimes I can look at a diagram and make the imaginative leap to how a system might work. There are other times when I like to play with a piece of software or whatever to get to know it, and at the end of the day there’s nothing like using a thing to understand how it operates.
Also last week I was discussing Oracle Warehouse Builder with someone. OWB (as it’s known) is one of those systems that I’ve experienced but would like to get more experience of and it offers some functionality that might solve a nasty little problem many businesses suffer from which has bothered me for a while.
If you maintain any sort of business system, like ERP or CRM, then the chances are that over time all sorts of data quality problems occur in the database. Data duplication, dirty data, and difficulties with uploads from legacy systems are all variations on a theme of a data quality that plague businesses that maintain any kind of database system over an extended period of time. Most smaller organisations tend not to deal with these problems the way they should for a variety of reasons, mostly relating to cost.
When you look into dealing with data quality the costs can quickly mount up. Most firms would rather not have to deal with this rather messy business themselves so they look to outsourcing the problem which doesn’t come cheap. If you decide to take a swing at the issue yourself then the cost of getting some heavy hitting software will put you off quite quickly. Dealing with data quality yourself is also going to take some time and needs quite a bit of expertise. That said, if you have the time and the inclination, there are some really good software options available to you.
Oracle Warehouse Builder is, as the name suggests, software for the purpose of creating and maintaining a data warehouse. OWB comes with all the tools needed to deal with the Extract Transform and Load (ETL) routines and can therefore deal with the data quality and mapping issues that come with warehouse building. These tools are useful not only for building warehouses but also for sorting out the poor data quality problems that haunt businesses everywhere and have been causing me grief of late.
I’ve been reading up on Oracle Warehouse Builder but reading about it only goes so far and I wanted to try it out, so it was time for a Quick & Dirty installation!
The Quick & Dirty approach to problems in IT is well established and considered such a valid process that it evolved into the Rapid Application Development methodology. In the case of trying out OWB I went for the quickest and dirtiest way of doing things.
Step 1: Operating System
I installed a Windows XP guest onto a VirtualBox virtual machine running on Windows XP. Doesn’t get much more straightforward then that, and I wanted to see OWB in action, not an operating system so I decided to leave off the Solaris 11 or Linux installs for another time.
Step 2: Oracle 11g
Oracle Warehouse Builder comes bundled with the Oracle database, though it is licensed separately, so to install OWB you install Oracle. I downloaded a copy of Oracle Database 11g from the Oracle Technology Network and installed the Enterprise Edition just to be sure that I’d installed everything I needed. The OTN license allows for non-production development use of Oracle products so it’s always the first port of call when you want to try out an Oracle product. However, a full (pricey) license is needed if that system is to be used commercially.
Step 3: OWB Schema Preparation
Oracle installs a couple of users for OWB but they’re locked by default as a security precaution. In order to get OWB up and running the user accounts OWBSYS and OWBSYS_AUDIT, need to be unlocked and have their passwords set, a task best achieved via SQL Plus. There are also a handful of SQL scripts available that prepare the objects needed in the schema to run the OWB workspaces which can be run in SQL Plus while you’re there.
Step 4: Repository Assistant
With the users operational the Repository Assistant can be run. The Repository Assistant is a GUI application that populates the repository and creates a workspace for you to conduct your Warehouse Building in as well as offering the ability to create users who own workspaces. The wizard is powerful and allows for all sorts of settings to be tweaked. I accepted the default of installing everything and found the process to be quite simple.
Step 5: Design Center
With OWB installed and configured its primary user interface is the Design Center application. In Design Center you can connect to sources of data and targets for that data. It’s also used to configure Data Profiles and mappings that can transform data into the format you desire for your target database, usually the Data Warehouse. I set about a simple connection and loading exercise where I took data from a basic data source, an Excel file, and loaded into an Oracle table. Now, this is far from the best way of doing such a simple task in Oracle but I wanted to see OWB in action and I was interested in seeing how it would handle connecting to data in a format common to the smaller enterprise.
OWB comes with the ability to connect to a range of different databases via its built-in JDBC drivers. Connecting to an Oracle database is, as expected, a simple task and there are options for SQL Server, Sybase, and Informix systems, as well as an ODBC option which piqued my interest. The OWB Sources & Targets guide offered details on how to connect to ODBC options like Excel and while the procedure follows the standard Oracle approach to doing things (lots of file editing and connections into connections into the database you’re trying to get to) it does make sense and isn’t too hard to follow. It didn’t take long to get to a point where data could be extracted from Excel, transformed into a format of my choosing, and loaded into an Oracle database.
My Quick & Dirty installation performed admirably and has provided not only an insight into how to setup OWB and begin construction of a Data Warehouse (albeit without the massive amounts of designing that would be needed a real world situation) but has also given me a better understanding of what can be done to reign in common data quality problems as the OWB Design Center provides tools for dealing with duplications, common Address formats, and for writing transformation rules that enable you to fix and map practically any data problem.
Data Warehouses in their truest form only really come into their own when there are a large number of systems in use all collecting data, preferably from all across a large organisation and the warehouse and its associated tools can consolidate that data into a meaningful store that can then be analysed as a whole. In the case of a smaller organisation the issues relate more to how to migrate data from legacy systems into new ones and how to deal with data quality problems, as ideally the number of core systems in use in the business should be relatively limited. For all the power that OWB can bring to solve these issues it’s likely that the cost of licensing the software will be prohibitive for the types of companies that could get the most value from it.
While I had gotten a little defensive during that conversation last week, I was in favour of the idea to do some prototyping and at the end of a Quick & Dirty installation you realise just how much can be gained from the exercise, and how much can be lost if anyone forgets that the Quick & Dirty way is, in the end, not the right way to install a system.
Quick & Dirty in IT: http://en.wikipedia.org/wiki/Quick-and-dirty
Oracle Warehouse Builder Installation and Admin Guide: http://download.oracle.com/docs/cd/B28359_01/owb.111/b31280/reqs01.htm#i1139371
Financial and Management Reporting: Data Warehouse vs ERP System: http://www.glassoc.com/resource_center_white_papers.php?id=4&msg=1