Tue, 11 Oct 2005
Exporting data from IS MU
Every year we have to export data about entrance exams from IS MU to the state-wide database. As I wrote before, I am working on migrating the export/import mechanism to an XML-based transport. However, it is too late for this year's data, so we had to export the data the traditional way: via a Perl script connected to both databases.
The problem here is that instead of saying "this is the data we have, insert it into the remote system", it has to be done the hard way: "this is the data we have, that is the data they have, let's compare them, and insert, update or delete as needed". However, the structure of the remote database is not exactly clean :-), for example, the study subjects are looked up by their full name instead of by code (even though there indeed exists a state-wide code assigned by the Ministry of Education). The study subject code-list even contains the code of the study programme, so we have to do "look up the row where the study subject name matches, and the code of the study programme matches".
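The compare-and-sync step can be sketched like this (in Python rather than the Perl the real script uses; the record shapes and the natural key are made up for illustration):

```python
def diff_records(ours, theirs):
    """Given our rows and the remote rows, both as dicts mapping a
    natural key to a row, decide what to insert, update and delete."""
    # rows we have but the remote side does not -> insert
    to_insert = {k: v for k, v in ours.items() if k not in theirs}
    # rows the remote side has but we do not -> delete
    to_delete = {k: v for k, v in theirs.items() if k not in ours}
    # rows present on both sides but differing -> update
    to_update = {k: ours[k] for k in ours.keys() & theirs.keys()
                 if ours[k] != theirs[k]}
    return to_insert, to_update, to_delete
```

The real script of course also has to resolve the messy subject/programme lookups described below before it can compare anything.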
Still not difficult enough? Okay, let's go to the next level: the code list is different from ours - for example, they have the study subject named "Gender study", while ours is named "Gender studies". And so on. So matching the study subject name is not exactly easy. And the study programme codes do not match either. The suggestion from the maintainer of the remote database is: "Try the exact match first, and then try to match at least the first letter of the study programme code".
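The maintainer's suggested matching rule might look roughly like this sketch (hypothetical row tuples; only the programme-code fallback is shown, the name variants like "study"/"studies" would still need handling on top):

```python
def match_subject(subject_name, our_prog_code, remote_rows):
    """remote_rows: list of (subject_name, programme_code, row_id) tuples.
    Try the exact match first; then relax the programme code to its
    first letter only, as the remote maintainer suggested."""
    exact = [r for r in remote_rows
             if r[0] == subject_name and r[1] == our_prog_code]
    if exact:
        return exact
    # fallback: same subject name, programme codes agree in the first letter
    return [r for r in remote_rows
            if r[0] == subject_name and r[1][:1] == our_prog_code[:1]]
```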
Still with me? To another level then: when matching the study programme the above way, multiple rows can match. And I have to choose the proper one - for updating, the row which already has some references from the existing data, and for inserting, the row which does not have any. Except that when no such row exists, take whatever is available as a fallback.
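That tie-breaking rule is simple to state but easy to get wrong, so here is a sketch of it (again Python, with made-up names; `referenced_ids` stands for the row ids already referenced by the existing remote data):

```python
def pick_row(candidates, referenced_ids, for_update):
    """Choose one row id from the ambiguous matches: for an update,
    prefer a row that is already referenced; for an insert, prefer one
    that is not; fall back to whatever is available."""
    if not candidates:
        return None
    if for_update:
        preferred = [c for c in candidates if c in referenced_ids]
    else:
        preferred = [c for c in candidates if c not in referenced_ids]
    # fallback: when no candidate fits the preference, take what exists
    return (preferred or candidates)[0]
```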
I am looking forward to the time when I rewrite this beast to use the XML-based transport mentioned above with the "delete everything at the beginning, and then insert all the new data" approach.
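For comparison, the whole "delete everything, insert everything" approach fits in a few lines when done as one transaction (illustrated here with Python's sqlite3 and an invented table, not the actual target database):

```python
import sqlite3

def full_replace(conn, rows):
    """Atomically replace the exported table: delete all old rows,
    then insert the fresh data. Table and column names are made up."""
    with conn:  # one transaction: either the whole swap happens, or nothing
        conn.execute("DELETE FROM entrance_exams")
        conn.executemany(
            "INSERT INTO entrance_exams (student_id, subject) VALUES (?, ?)",
            rows)
```

No matching, no tie-breaking, no fallbacks - which is exactly why the XML-based transport is so appealing.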