Question
I am re-writing a section of my site and am trying to decide how much of a rewrite this will be. At the moment I have a web service feed that generates an XML file once per day. I then use this XML file on my website to generate the general structure. I am trying to decide if this information should be located in the database or stay in the XML file.
The file can range from 4 MB to 12 MB.
The file's depth can go on and on, so I have to recurse to find the data I want.
I use the .NET serializer classes and store the deserialized file in a global variable to avoid deserializing it again each time the page is loaded.
My reasons for thinking a database would be better are:
I would know exactly where I am in the file by using an internal ID so I wouldn't have to recurse the file to get information.
I wouldn't have to load / deserialize the XML and could just use my already open database connections.
Searching for the data would be quicker(?) as I would just perform an SQL query rather than recursing through the file.
Has anyone got any ideas about which is better, and which option would use more server resources or be quicker?
Explanation / Answer
If I understand correctly, your data changes exactly once per day. To be honest, it doesn't matter whether you keep it as XML or store it row-by-row in a DB; what would make a big difference is caching it in your web application.
Basically, you should read in the file at most once per day (this can happen on-demand, just check if your cached version has expired), and store its contents in a structure optimised for quickly fetching information. This will prevent your pages wasting CPU time parsing the file over and over again, as well as saving the garbage collector a lot of work.
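The on-demand expiry check could look roughly like the sketch below. It is written in Python for brevity rather than .NET, and the `ExpiringCache` class, its `loader` callback and the 24-hour default are all illustrative assumptions, not something from the original setup. The loader is whatever reads the file and builds your lookup structure.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ExpiringCache(Generic[T]):
    """Holds one value and reloads it on demand once it is older than ttl_seconds.

    Hypothetical helper for illustration only.
    """

    def __init__(
        self,
        loader: Callable[[], T],
        ttl_seconds: float = 24 * 60 * 60,  # assumed: the feed changes once per day
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()  # so concurrent requests do not reload twice
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        with self._lock:
            now = self._clock()
            expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
            if expired:
                # Only this branch pays the cost of reading and parsing the file.
                self._value = self._loader()
                self._loaded_at = now
            return self._value
```

Every page request calls `get()`; only the first request after expiry pays for the reload, and all others return the structure already in memory.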
Technically, a 10 MB XML file should take up a roughly similar amount of memory once loaded as objects (you lose the bloat of the text-based encoding, but gain the overhead of the Dictionary objects used for indexing), which is nothing these days. Replacing per-request parsing and recursion with cached lookups like this typically results in performance gains of a couple of orders of magnitude.
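To make the indexing idea concrete, here is a minimal sketch in Python (in .NET the same role would be played by a Dictionary). It walks the tree once and maps each node's ID to the node, so later lookups need no recursion. The `id` attribute, the element names and the sample document are assumptions for illustration; the real feed's schema is not shown in the question.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Made-up sample; the real feed will look different.
SAMPLE = (
    '<site>'
    '<section id="1" name="Home">'
    '<section id="2" name="News">'
    '<page id="3" name="Today"/>'
    '</section>'
    '</section>'
    '</site>'
)


def build_index(xml_text: str, id_attr: str = "id") -> Dict[str, ET.Element]:
    """Parse the XML once and return a flat {id: element} lookup table."""
    root = ET.fromstring(xml_text)
    index: Dict[str, ET.Element] = {}
    # Element.iter() visits the root and every descendant, at any depth.
    for element in root.iter():
        key = element.get(id_attr)
        if key is not None:  # skip nodes without an ID, such as <site> here
            index[key] = element
    return index


if __name__ == "__main__":
    index = build_index(SAMPLE)
    # A single dictionary lookup replaces a recursive search of the tree.
    print(index["3"].get("name"))
```

The one-off walk costs time proportional to the size of the file, but it only happens when the cache is refreshed; each page request afterwards is a constant-time lookup.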