Question. How do you load 80 million records into the Data warehouse?
Answer. Slowly!
As we are now on version 7.8 of the CRM application, there is a lot of work around upgrading from previous versions. Normally an upgrade of the data warehouse gives companies an opportunity to reassess what is captured and how. The biggest dilemma is the choice between a simple upgrade and a fresh installation.
Both methods have their benefits, but for the developers the most interesting choice is a fresh installation: no need to learn how the previous guys worked their magic, and no constraints from the old data structures, ETL and so on.
Whichever method you adopt, there will always be data migration. If you're lucky there is not a big gap between versions and the data structures are similar; the system has only been used for a year and only holds a few million records (and pigs might fly too!).
One project I was involved in had approximately 150 million records in the legacy CRM system. After a year of hard labour this data was converted into 80 million fresh rows and loaded into the new eBusiness application. This is where we come in. The new data warehouse was designed and built around the 7.8.4 database with no vertical licences (more on that subject another day).
The standard Informatica routines and the DAC repository form the basis of the new ETL. The 'opportunity'* in the design phase was the fact that approximately 150 custom fields had been added to various S_ tables (more on this another day!).
So, here we are with a blank database, a customised ETL system and 80 million records ready to load. If you were using the standard DAC you would now run the execution plan 'Complete ETL', sit back and wait a few days. But what about the database? Is it big enough? Have you created the correct tablespaces, and are they reflected in the DAC? (Where, you may ask? Hint: look at the Indices.)
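Before you hit 'Run', a quick pre-flight check on tablespace headroom can save a day of firefighting. Below is a minimal sketch of the kind of check I mean, assuming an Oracle warehouse and the cx_Oracle driver; the connection string, the privilege to read the DBA_* views and the 10 GB threshold are placeholders to adapt, not recommendations.

```python
# Minimal pre-flight check: report free space per tablespace before the load.
# Assumes an Oracle warehouse and the cx_Oracle driver; the DSN, the access to
# the DBA_* views and the threshold below are all placeholders.
import cx_Oracle

FREE_SPACE_SQL = """
    SELECT df.tablespace_name,
           ROUND(df.total_mb)        AS total_mb,
           ROUND(NVL(fs.free_mb, 0)) AS free_mb
    FROM   (SELECT tablespace_name, SUM(bytes) / 1048576 AS total_mb
            FROM   dba_data_files GROUP BY tablespace_name) df
    LEFT JOIN
           (SELECT tablespace_name, SUM(bytes) / 1048576 AS free_mb
            FROM   dba_free_space GROUP BY tablespace_name) fs
    ON     df.tablespace_name = fs.tablespace_name
    ORDER BY free_mb
"""

MIN_FREE_MB = 10 * 1024  # example threshold only; size it for your own load


def preflight_check(dsn="dwh_user/secret@dwh-host/DWHPDB"):
    """Print headroom per tablespace and flag anything below the threshold."""
    with cx_Oracle.connect(dsn) as conn:
        cur = conn.cursor()
        cur.execute(FREE_SPACE_SQL)
        for name, total_mb, free_mb in cur:
            flag = "  <-- top up before the load" if free_mb < MIN_FREE_MB else ""
            print(f"{name:<25} total={total_mb:>9} MB  free={free_mb:>9} MB{flag}")


if __name__ == "__main__":
    preflight_check()
```

Ten minutes spent on a report like this, and on confirming the DAC points its indices at the tablespaces you actually created, is cheap insurance against what follows.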
So, halfway through day one, the database has crashed four times due to lack of tablespace. After various calls to the DBA you're back up and running again. At midnight on day one a standard virus-protection job kicks in and the server dies under the weight of the workload. By the morning of day three the team are bleary eyed and living on extra-strong espresso, having spent the night coaxing the DAC back into life time and time again, this time thanks to centrally organised server updates (a Windows patch here, a Java update there). On day five the testers arrive and start picking holes in the results; you know the sort of thing: table W_xxx has no data, field xx_Type has the wrong LOVs in it.
Fixes were made and the following weekend was set aside for another full load: reset the warehouse, purge the run details, analyse the tables, stop the virus protection, disable the PC updates, increase the tablespaces and cross your fingers. From Friday night to Sunday morning all is well.
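The 'analyse the tables' step is the one most often skipped in the rush, so here is a rough sketch of how it might be scripted, again assuming an Oracle warehouse and the cx_Oracle driver; DBMS_STATS is the standard Oracle package for gathering optimizer statistics, but the schema name (DWH_OWNER), table prefix and parallel degree shown are illustrative placeholders only.

```python
# Rough sketch of the "analyse the tables" step before the weekend re-run:
# gather fresh optimizer statistics on the warehouse tables with DBMS_STATS.
# The DSN, schema name, table prefix and parallel degree are placeholders.
import cx_Oracle

SCHEMA = "DWH_OWNER"   # hypothetical warehouse schema
TABLE_PREFIX = "W_"    # warehouse tables in this example


def gather_warehouse_stats(dsn="dwh_user/secret@dwh-host/DWHPDB"):
    with cx_Oracle.connect(dsn) as conn:
        cur = conn.cursor()
        cur.execute(
            "SELECT table_name FROM all_tables "
            "WHERE owner = :owner AND table_name LIKE :pattern "
            "ORDER BY table_name",
            owner=SCHEMA,
            pattern=TABLE_PREFIX + "%",
        )
        tables = [row[0] for row in cur.fetchall()]

        for table in tables:
            # Anonymous PL/SQL block calling the standard DBMS_STATS package;
            # cascade => TRUE also refreshes the index statistics.
            cur.execute(
                """
                BEGIN
                    DBMS_STATS.GATHER_TABLE_STATS(
                        ownname => :owner,
                        tabname => :tab,
                        cascade => TRUE,
                        degree  => 4);
                END;
                """,
                owner=SCHEMA,
                tab=table,
            )
            print(f"stats gathered on {SCHEMA}.{table}")


if __name__ == "__main__":
    gather_warehouse_stats()
```

Whether you script it or click through it, the point is the same: make the reset, purge, stats gathering and server housekeeping a repeatable checklist rather than a 2 a.m. memory test.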
Luckily for us, that nightmare is one we usually avoid thanks to good planning, lots of luck and plenty of time to prepare.
You have been warned.
*For 'opportunity', normally substitute 'problem'!
www.addidici.com