Excel Python XML

Python is one of the new weapons of choice at one of my current contracts. I'm a big believer in removing as many chances for error in a project as possible. That means for example where as some teams will have their designers edit an Excel spreadsheet .xls file and then require them to also export to a .csv file for the game, I'd prefer if they just save to .xml at all times.

They lose no features because Excel will save all the data, formatting, formulas etc to XML and Windows will even directly open the .xml files to Excel when double clicked so as far as the user is concerned nothing changes. In fact once they create the file Excel will load and save to .XML so there is no need to remember to save as .XML, it just happens. Automated things like that mean no errors.

When using CSV files there is always the issue that the designer may or may not have remembered to export so they change a value, save the file and forgetting to export they spend a bunch of time trying to figure out why their changes are not showing up in the game. Or, worse, when you save AS CSV in Excel, excel then defaults to saving that file in CSV mode. So, if the designer makes any changes after that and forgets to resave the file specifically in .xls mode then all their changes will be lost. Sure they can reload the .csv file but any formulas, formatting, colors, comments etc will all be lost.

If instead they use XML files then they don't have to remember the step of saving to CVS. They don't have to worry about saving back to XLS as well and they never have to worry about losing data since that issue will never come up.

I know some teams that write Excel macros to do this. That's a huge waste of time. Those macros generally have to be propogated to all their Excel files. Those macros often have to be tweaked for different versions of Excel. It becomes even more of a pain at localization time because the developers don't usually control the machines of the people editing the files in other countries and macros etc are often disabled. I've seen no end of problems dealing with Excel files that had macros in them so no, macros are not the solution.

It's much simpler just to parse the XML. As you can see below it's very little code. You can easily add it to your tool pipeline or, looking at the example for reference, just make your game load and parse the XML file directly, at least during development.

Anyway, here is code to read an excel XML file (right click and pick Save Target As). I posted some perl to do the similar things before if perl is more your thing.

I tried Pythonifying the code so it's pretty simple to use. The following are a few simple examples
import excelreader

xl = excelreader.ParseExcelXMLFile("someExcelFile.xml")

print "Cell B4 in first sheet = ", xl[0][4][2]
print "Cell G5 in first sheet = ", xl[0]['G5']
print "names of sheets        = ", xl.keys()
print "Cell A7 in Sheet1      = ", xl['Sheet1'][7][1] # assumes there is a sheet called "Sheet1"
print "all cells in all sheets"
for worksheet in xl:
    for row in worksheet.keys():
       for column in row.keys():
          print row, column, worksheet[row][column]
The library also pulls out style info which is useful if you want to know the colors (say for generating corresponding materials in Maya) or the font names and sizes (say for auto-generated localized textures)
print xl[0].GetCellStyle(7,2)  # style info for cell B7 in first sheet
print xl[0].GetCellStyle('D3') # style info for cell D4 in first sheet

Pass it on

Comments:
Thanks!
This is extremely helpful - thanks! couple things:
- got a license for this? (or public domain?)
- i had to make a minor change, between lines 418 and 419:
self.mergeAcross = int(self.mergeAcross)
For whatever reason, this var was a unicode string instead of an int. Probably related to different versions of excel's xml output, or platform (I'm on OS X 10.5, python 2.5, XML came from windows excel 2003)... whatever, minor.

I'd email you directly regarding a license, but couldn't find contact info on your many sites. Feel free to email me.

Again, THANK YOU! You saved me hours of boilerplate code!
posted by KevinCoxMarch 18, 2008 at 19:56 [ e ]
BSDed

thanks for the fix

the license is New BSD. It's in the comments at the top of the file.

posted by greggmanMarch 18, 2008 at 23:11 [ e ]
Thaks but...
Thanks for code but there some troubles with this script:
characters function of XML Handler returns only last part of cell if it has called for 2 or more times for one particular cell (e.g. you have "<" combination in XML cell (or '<' symbol in Excel row before import to XML)).
I'm not so good at Python but i suppose than following would help:
1.
Instead lines 416-417 (In class ExcelToSparseArrayHandler, func characters):
"self.currentRow[self.curr
entColumnNum]["content"]=con
tent
self.inCell = False"
Insert : "
try:
self.currentRow[se
lf.currentColumnNum]["conten
t"]+=content
except:
self
.currentRow[self.currentColu
mnNum]["content"]=content
"

2.
Add: "
def end_Cell(self):
self.inCel
l = False
"
posted by AlexeyJuly 31, 2008 at 8:06 [ e ]