Why should I consider if data needs converting into different formats?

 

We have already experienced the difficulty of dealing with legacy systems. The structure of somebody else's database may not be quite up to scratch, or there maybe data anomalies. How do you cope?

 

Similarly, you may be faced with a range of existing systems that are not conventional relational databases. How do you deal with a process that uses a spreadsheet to record product data? How can the information be included within a new system?

 

There are times when we must convert data from one format to another. We can do this in 'real-time' by using applications to do the conversions (such as ASP or Java to generate XML files), or we can do a one-off conversion when we are scrapping an existing system. If you can't remember how we come to the conclusion to write a system off, refresh your memory here.

 

So, you have an old spreadsheet with some data in it. Download a sample spreadsheet and take a look at it. In it we have some customer records, nothing special. But we need to include this with our new information system.

 

How can we use this information in the new system?

 

We have a few options:

  1. Re-type the data into a database. This is OK for a few records, but there are potentially thousands of records. This might lead to some mistakes.
  2. Create a bespoke link to the spreadsheet and dynamically query the data. This would get the data out of the spreadsheet and allow us to use it. However, it won't prevent users from putting the wrong data in (validating input), nor will it take account of changes in the spreadsheet if somebody inserts a column or a row for example. Also, once the work has been done, it cannot be re-used in the future unless the same spreadsheet is utilised again.
  3. Get the data out of the spreadsheet and convert it to a 'neutral' file format. This sounds better, but what do we need to do?

 

Go back to the spreadsheet you opened in MS Excel.

  1. Go to File, Save As.
  2. Select Text (Tab delimited)(*.txt) in the Save as type drop down box.
  3. The filename will change to 'customers.txt'.
  4. Click on Save.
  5. You will see a dialogue box appear. Select 'Yes'.
  6. Close Excel.
  7. Go to Start, Run and type into the box marked Open, notepad.
  8. Open the customers.txt file. What do you see?

 

Is this file of any use to your system?

 

Repeat the same exercise, but this time save the file as a Comma Separated Value (CSV) format.

 

After you have done this, close Excel and Notepad down and open up Access. Create a new blank database and then:

  1. Go to File, Get External Data, Import... now the rest is up to you!

 

See if you can successfully import the text file, then the csv version. If you can export/import files in the above formats, then you can just about access data from any data repository, whether it is an old database or an obscure spreadsheet.

 

The last but one exercise is to test your proficiency with XML. Have a go at re-writing the customer.xls file in XML. Look at last session's tutorial and think about the three components; Structure, content and display.

Do you need a DTD or schema? How would you write one?

 

Once you have done this, take a look at the available export formats in Excel and Access. Can you find an easier way of creating the XML?

 

 

 

 

If you want to read a little further about data conversion issues try the following text:

 

Chapter 4, A Methodology for Client/Server and Web Application Development, First Edition, Roger Fournier, Yourdon Press, p274-282, ISBN 0-13-598426-2.

 

However this is only a starting point and you are advised to look further for other resources, such as alternative texts or on the internet.

 

Once you have completed your research and made some notes, try the review questions below.

 

As a guide, this piece of learning should take about two hours to complete.

 

Review Questions

 

Test your understanding of this topic with the following questions.

 

What activities should we consider if we are designing a data conversion program?

 


 

What advantages does the use of XML offer over traditional 'middleware' products?

 

 

What is meant by 'manual' and 'automated' data conversion processes? Give some examples of each.

 

 

 

Using some modelling notation, describe the data conversion process.

 

 

 

What are the common characteristics of legacy data?

 

 

Describe a clean-up strategy for legacy files.