Data and Software Project Planning
Crystal (Bliss) Sloan, Microsoft MVP
Legacy Data and Software Project Planning
"As soon as phrases such as "mapping the data," "data
transfer," "data migration," or "convert the legacy data" appear in the
context of a project, a wise systems engineer or project manager should
see red flags waving..."
Students in Professor McCumber's "Systems Development and Project
Control" graduate classes are asked each term to share their tales of
project disasters. Each term, a significant number of the disasters
recounted revolve around the use or conversion of legacy data. For example:
- "My company recently migrated to our current financial system....
Basically, the conversion process involved the purchase of new hardware, client-side
configurations, application server installation, and data migration. The new package... had no
software conversion utilities for the [legacy] version... we were running.
Therefore, the data migration was to be a "brute force" effort. To
accomplish this, we set up a test bed of the new system, mapped the data, spied the table
updates of the new system, and then migrated the old data to the current structure. Long
story short, hardware procurement was coordinated with software
procurement/installation and the system was put in place. Everything
progressed along the proposed timeline and remained in budget with the exception
of the data migration.... The data migration involved the mapping of data by
the finance/purchasing personnel, the partial population of data into the new
system, the verification of data by the finance/purchasing office, and the
movement of all the data by functional area (i.e. Accounts Payable, Accounts
Receivable). The two bottleneck areas turned out to be the data mapping and
verification. We had a number of highly paid technical resources waiting on
the finance office. The finance personnel were terribly overworked, performing
two functions: their original jobs and the data migration phase. The problem was
definitely a mismanagement of personnel. This problem directly affected the cost
of the project. To offset the cost overruns, we made drastic cuts in the
guaranteed bandwidth of the network connections. To compound the problem, some of the data was too old to be migrated.
The data could not be mapped or traced in the old system. So the old system
remains in place in case we have to produce financial data older than 5 years. This
process would involve rebooting the HP 9000, faking the date to pre-2000 and
producing data reports. Lessons Learned: 1) Personnel - steps that seem insignificant are not.
The data mapping and data verification were considered to be a side note of the
project. Obviously, this was one of the most important aspects of the migration."
- "This six-month small project was an interface between [a legacy
system at our company and a new system from] an outside vendor.... The
project itself was basically successful. We hit our completion date and the
interface works. What wasn't successful was the way we got there and the
negative impact this had on my "career." According to my manager,
development on this project took 40 percent longer than it should have and
he had to pay 40 percent more for the $$$ contractor programmer to complete
the project than he should have. We sacrificed testing time to compensate
for the development time and I spent six months in a state of frenzied
anxiety....We did not have certain data in our company's system or did not
have data in the correct format to meet the vendor's requirements. At the
beginning of this project, all people involved were describing it as
"just a simple interface." I believed that and thought that
requirements would just be a matter of mapping the data, gathering the data,
and sending it. It turned out that the vendor needed more information than
was present in my company's system of record. We had to add many new codes
to give the vendor specific data and in some cases had to clarify and/or
create new business rules and data entry rules."
- "Vendor could not deliver database in type desired therefore
historical data is already 6 months late in conversion."
- "The [government agency] invested in a new computerized personnel
security database in 1996. The system... was intended to replace the older
tracking system and enable immediate access to personnel security files
during all stages of the investigations conducted to grant a government
security clearance to an individual. The older system kept track of who had
clearances and would indicate if an investigation was open, but not any
other details. The newer system would provide up-to-date information as to
when an investigation was opened, what the current phase of the
Although the plan sounded good, it didn't go quite the way it was expected.
When the new computer system and software were completely installed, the plug
was pulled on the old system. After just a few days, the [agency] knew it
had made a grave error. The data from the old system did not totally correlate
to the new database. As changes or new investigations required accessing
a record file, it had to be completely re-entered and corrected. New records
that had to be created for required investigations were delayed while
operators tried to correct the existing database. The investigations began
to back up. The [agency] was processing an average of approximately 300,000
investigative cases a year for industry and government agencies. In 1998,
the investigations backlog had reached an astronomical 800,000 cases! The
[agency's] Director... was replaced by [a new manager] who immediately set
out to find a solution to the problem. After an assessment of the situation,
[the new manager] conceded that the new computer database system was too far
along to replace, yet it would take years to catch up to meet the needs of
industry and government. His solution included hiring more investigators and
tasking other government investigative agencies to help. Meanwhile, [agency]
database operators would continue to change and input data for the millions
of records in the system. In 2001, the backlog hasn't been reduced and the
length of time to process an investigation has more than tripled.
The latest figures for the first quarter of FY01 indicate that the
investigative cases that in 1995 took an average of 120 days were now taking
477 days. It was now very possible that a contractor could hire a person for
an [important] contract, yet not be able to put them to work for almost a
year and a half!
The system design... was basically good, but its implementation was
disastrous! Perhaps if the changeover from the old data base to the new was
not so abrupt, bugs could have been worked out. If the database records were
tested before shutting down the older system, errors could have been
corrected. Sample databases, test cases, and trials using real data could
have prevented this whole situation." It is estimated that extra
costs and delays from this mistake will continue for 5 to 10 more years.
- "...The organization was in the process of converting their [ancient
legacy] system to [a new system]. The [legacy] database was plagued with
many duplicate files... [and] was very large. There was no consideration
taken to see if data from [the legacy system] could be imported into the
[new] database.... After taking days to find and delete the right
duplicate files, the manager decided to convert the databases. Everything
went wrong. Fields were not the same so data was not converted. The
organization's entire... database was down for two days. The decision was to
create the appropriate fields in [the new system] and have the entry
coordinators input the previous two fiscal years of [legacy] data [into the
war story included working on a government project as a contractor to develop
a data warehouse. The project truly lacked requirements and totally lost
its focus. We had no idea the amount of data this particular government agency had in
its legacy systems. The data warehouse continued to grow, which is normal and the
purpose of a data warehouse, but the quality of the data was poor. We tried to
capture the metadata [data about the data], but no one really knew why certain fields were
created. I feel we should have created a good requirements and design document from the
beginning. At that point, we would have realized the best way to migrate the
legacy data together is to create a smaller repository, such as a data store. The data store
would have given us the opportunity to view the data in smaller quantities and to cleanse
the data from anomalies. Then the data store could have provided us with a baseline
to build our enterprise data warehouse, which eventually lost its funding."
In a recent discussion, one student said something that made me think back to
some of the project disasters students posted earlier in this and previous terms:
"... programmers cannot properly script what they don't
Many computer-related projects involve transferring legacy data to a new system,
or interfacing to data on legacy systems, or both. I lump all the sub-projects
relating to these topics under the term "data transfer projects," or
simply "data transfers" or "data migrations."
Quite a few of the project "disasters" reported in this and past terms
involved such data transfers. This is completely understandable when one
examines exactly what is involved in such "data transfers."
To transfer data from one model/format to another, or even just to map the old
data to the new model, in the case of connecting a legacy system to a new
system, one must know the exact model and format actually used by both sides.
One must know not only the current "official" formats and meanings and
relationships of the legacy data, but also the real-world formats and
meanings and relationships the data has had over time, not only generally but
literally right down to the bit level. This means that the poor programmer stuck
with the data transfer is really being given the belated task of finally and precisely
documenting the entire history of the existing legacy data, a task that almost
certainly no one else has done before.
I'm no data transfer "expert," but have probably participated in some hundreds of
these data transfer projects (I stopped counting at 100 fifteen or twenty years
ago, but kept right on doing them) and have seen many times the types of things
that can happen.
Here is a typical example: a legacy medical database had a field to indicate how
much a person smoked tobacco: According to the customer who owned the data, 0
meant none, 1 meant some, 2 meant 1/2 pack a day, 3 meant 1 or more packs a day.
Simple, right? But it turned out that the field *used* to be used to indicate
not how much a person smoked, but whether they had ever smoked: 0 was never, 1
was the person was a current smoker, and 2 meant the person used to smoke but
Years before they had changed the meaning of the field, but left in the old
data, since the people using the data at the time knew which patients were
which--patients with (otherwise meaningless) IDs before a certain number were
in the old format; those with IDs after that number were in the new. However, this
information relating the meaning of the smoking field to the value of the ID was
never formally documented.
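
Once someone finally discovers a rule like this, it takes only a few lines to write down. Here is a minimal sketch in Python. The cutoff value of 5000, the label strings, and the function name are all hypothetical; the real boundary lived only in the heads of the people using the data, which is the point of the story.

```python
# Hypothetical cutoff: IDs below this value use the OLD meaning of the field.
# The real boundary was never documented; 5000 is an assumption for illustration.
OLD_FORMAT_CUTOFF = 5000

# Old meaning: whether the person had ever smoked.
OLD_CODES = {0: "never smoked", 1: "current smoker", 2: "used to smoke"}

# New meaning: how much the person smokes.
NEW_CODES = {
    0: "none",
    1: "some",
    2: "1/2 pack a day",
    3: "1 or more packs a day",
}


def decode_smoking(patient_id: int, code: int) -> str:
    """Interpret the smoking field according to the era the record belongs to."""
    table = OLD_CODES if patient_id < OLD_FORMAT_CUTOFF else NEW_CODES
    if code not in table:
        # Surface surprises instead of guessing; each one may be another
        # undocumented rule.
        raise ValueError(f"patient {patient_id}: unexpected smoking code {code}")
    return table[code]
```

With these assumed values, `decode_smoking(120, 2)` yields "used to smoke" while `decode_smoking(9001, 2)` yields "1/2 pack a day": the same stored code, two different patients, two different meanings.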
The proposed new system had an entirely different, meaningful ID system,
requested by the customers to enhance their operations.
Enter the hapless programmer, who is told how to generate the new IDs from each
legacy patient's Social Security Number and birth date, and who accurately
translates all the legacy data according to the stated specs. Next thing, the
programmer is faced with angry doctors who say their patient data is wrong! For
example: "This patient is not a smoker! But the system now says he
smokes!" Multiply this problem with the smoking field by similar stories
for many of the other thousands of fields in that medical research database...
Needless to say, the data transfer part of the project took much longer than the
customers had wanted it to. Incidentally, although the programmer knew in
advance this would happen, neither management nor the customer would listen-- a
Often the mismatch between the "official" legacy format and the
reality of it is so great that even perfectly programmed transfer programs
written to the "official" specification will not successfully complete
a run, even after thousands of "programming fixes" and re-starts, or
will dump so much each run to a log of errors that the programmer must add fixes
for yet another round of newly discovered differences between the
specifications and the reality, and start testing yet again.
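
One way to shorten that cycle is to profile the legacy data against the "official" specification before writing any transfer code. The sketch below is only an illustration in Python: the function name `profile_field`, the sample rows, and the set of official codes are all assumptions, not part of any project described here.

```python
from collections import Counter


def profile_field(records, field, official_codes):
    """Count every value actually present in `field` and flag undocumented ones."""
    counts = Counter(record.get(field) for record in records)
    undocumented = {
        value: n for value, n in counts.items() if value not in official_codes
    }
    return counts, undocumented


# Hypothetical sample rows standing in for a legacy extract.
rows = [
    {"id": 1, "smoking": 0},
    {"id": 2, "smoking": 3},
    {"id": 3, "smoking": 9},     # not in the official specification
    {"id": 4, "smoking": None},  # missing value
    {"id": 5},                   # field absent altogether
]

counts, undocumented = profile_field(rows, "smoking", official_codes={0, 1, 2, 3})
for value, n in sorted(undocumented.items(), key=lambda item: str(item[0])):
    print(f"undocumented value {value!r}: {n} record(s)")
```

A profile like this only catches values that fall outside the specification. A field whose codes are all "legal" but changed meaning over time will pass untouched, so it still takes someone who knows the history of the data to find that kind of problem.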
It is an extremely rare project indeed in which even the current owner of the
legacy data knows exactly what all the fields and codes and records and tables
have meant over time and their relationships to each other. Unfortunately, it is
also an extremely rare project in which the powers that be do not insist that
they really do know the exact formats and consequently insist that the data
transfer is a small part of the larger project, more of a detail or an
afterthought than the main project itself.
This means that nearly always, the data transfer portion of a larger project is
allocated far too little calendar time and resources (labor) by the project
A good rule of thumb for data transfer projects is to figure the time it
"ought" to take to do the transfer--roughly proportionate to the
number of fields in the model--then multiply by McCumber's 2.2, or Sloan's 2.2
or a more conservative 2.5, to get what would be a "safe" estimate for any other
type of project, and then multiply by ... ten. I wish this were a joke, but
it's not. Ten is actually a pretty good number for this.
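
As a worked example of the rule of thumb above: a transfer that "ought" to take 4 weeks becomes 4 x 2.2 x 10 = 88 weeks, or 4 x 2.5 x 10 = 100 weeks with the more conservative factor. The same arithmetic as a small Python sketch, in which the function and parameter names are made up for illustration:

```python
def data_transfer_estimate(nominal_weeks: float,
                           safety_factor: float = 2.2,
                           data_transfer_factor: float = 10.0) -> float:
    """Rule of thumb: nominal time x the usual safety factor x ten."""
    if nominal_weeks < 0:
        raise ValueError("nominal_weeks must be non-negative")
    return nominal_weeks * safety_factor * data_transfer_factor


# 4 weeks nominal, with the 2.2 factor and with the more conservative 2.5.
print(round(data_transfer_estimate(4)))
print(round(data_transfer_estimate(4, safety_factor=2.5)))
```

Keeping the two multipliers as separate parameters makes it plain which part is the usual project padding and which part is specific to data transfers.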
The student who posted earlier in the term about being 40% over budget for a
data transfer project may have been doing very well, if her manager would
only realize it.
Managers and programmers who have been through some of these projects can save
themselves a lot of trouble in future endeavors by treating data transfers for
what they truly are--long-overdue precise documentation of the legacy data model
As soon as phrases such as "mapping the data," "data
transfer," "data migration," or "convert the legacy data," appear in the context of a
project, a wise systems engineer or project manager should have red flags waving
in his or her mind.
It is not that doing the programming for these
conversions is so difficult or requires super-expert capability on the part of
the programmers; it is that the following two major drawbacks accompany nearly
every data transfer project: 1) the project will have specifications for the
legacy data that do not mirror reality--that is, the project will be mis-specified;
and 2) the project will be seen by the people who are paying for it as the
exception to the rule--they will nearly always say that they do know the
data formats exactly, so their project is different. One
cannot do much about the almost-certain starting condition of flawed
documentation of legacy data; however, one can try to communicate the potential
for trouble to the customer in hopes of heading off disaster.
It may be that the IT professional can get the customer to see that a data
transfer of legacy data is mostly a long-overdue precise documentation of the
legacy data, with only a relatively small part of the project being routine
programming. If the customer can be brought to understand this, then
he or she may decide that the legacy data is not so important after all, or that
it might cost less to keep the old legacy system around just to check on the
legacy data in those few cases they might need to, rather than pay for a
probably-expensive data transfer project.
It is indeed true that "... programmers cannot properly script what they
Copyright © 2001-2008
EagleRidge Technologies, Inc.