Message from discussion Script to convert from XLSX to XLS

Received: by 10.115.49.11 with SMTP id b11mr205589wak.17.1245399907748;
        Fri, 19 Jun 2009 01:25:07 -0700 (PDT)
Return-Path: <sjmac...@lexicon.net>
Received: from poplet1.per.eftel.com (poplet1.per.eftel.com [203.24.100.46])
        by gmr-mx.google.com with ESMTP id k19si884798waf.1.2009.06.19.01.25.06;
        Fri, 19 Jun 2009 01:25:07 -0700 (PDT)
Received-SPF: neutral (google.com: 203.24.100.46 is neither permitted nor denied by best guess record for domain of sjmac...@lexicon.net) client-ip=203.24.100.46;
Authentication-Results: gmr-mx.google.com; spf=neutral (google.com: 203.24.100.46 is neither permitted nor denied by best guess record for domain of sjmac...@lexicon.net) smtp.mail=sjmac...@lexicon.net
Received: from [192.168.1.2] (202.76.163.18.dynamic.rev.eftel.com [202.76.163.18])
	by poplet1.per.eftel.com (Postfix) with ESMTP id 0DD154433D
	for <python-excel@googlegroups.com>; Fri, 19 Jun 2009 16:25:03 +0800 (WST)
Message-ID: <4A3B4B5C.9080705@lexicon.net>
Date: Fri, 19 Jun 2009 18:25:00 +1000
From: John Machin <sjmac...@lexicon.net>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: python-excel@googlegroups.com
Subject: Re: [pyxl] Re: Script to convert from XLSX to XLS
References: <0c868271-d6f4-4ae6-9546-16c7a8314183@d25g2000prn.googlegroups.com> <07c4ffbf-f085-4886-abf5-bf7088590147@n7g2000prc.googlegroups.com>
In-Reply-To: <07c4ffbf-f085-4886-abf5-bf7088590147@n7g2000prc.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 19/06/2009 10:45 AM, vasudevram wrote:
> 
> 
> On Jun 15, 9:29 pm, Michael <selmo2...@gmail.com> wrote:
>> Recently I needed to quickly convert XLSX workbooks to XLS workbooks
>> so I could then interact with them via xlrd.  Here it is, hopefully it
>> will be useful to someone.  :)  Note that pywin32 is required to
>> interact with Excel 2007, so unfortunately this script will work only
>> on Windows with Excel 2007 installed.

[snip]

> 
> Interesting approach.
> 
> For a possibly limited but cross-platform way (i.e. you don't need to be
> on Windows or use pywin32) to do the same conversion from .XLSX
> to .XLS files, it is also possible to use an XML parser, such as a
> SAX-capable parser, to read the content (*) of the .XLSX files, and then
> write the same content to .XLS files using, I guess, the Python xlwt
> library, which is mentioned in other messages in this group. (I have
> not used xlwt (yet), which is why I said "I guess", though I have used
> its counterpart for reading, xlrd, in my xtopdf toolkit.)
> 
>  (*) Conditions apply - see below.
> 
> This alternative method is possible because .XLSX format files are a
> kind of XML. There is a recipe for how to extract the text-only
> content (i.e. numbers and strings, no formatting or images or charts -
> this is the condition mentioned above) of .XLSX files, using SAX, in
> the Python Cookbook 2nd Edition.

XLSX files were introduced by Excel 2007 (i.e. v12). An XLSX file is a 
ZIP file containing a bundle of XML documents. The Python Cookbook 2nd 
Edition was published in 2005. The recipe (12.7) to which you refer 
relates to the XML files that can be produced by Excel 2003 (v11) and 
Excel XP (v10), using the "XML Spreadsheet" option of "Save as". The two 
formats are XMLly and Microsofty but otherwise rather dissimilar.
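
To illustrate the container point, here is a minimal sketch using only the standard library. It is not taken from xlrd or from the Cookbook recipe. The archive is a hand-built stand-in rather than a workbook Excel would open, and the helper names (build_stand_in, list_parts) are made up for this example. The part names mirror those normally found in an Excel 2007 file.

```python
import io
import zipfile


def build_stand_in():
    """Build a tiny ZIP in memory that mimics the layout of an XLSX file.

    Hypothetical helper: the XML bodies are empty placeholders, so this is
    NOT a valid workbook, only an illustration of the container structure.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("xl/workbook.xml", "<workbook/>")
        archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
    return buffer.getvalue()


def list_parts(xlsx_bytes):
    """Return the names of the XML parts stored inside the container."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        return archive.namelist()


data = build_stand_in()
# An XLSX file passes the ordinary ZIP check; an Excel 2003
# "XML Spreadsheet" file is a single XML document and would not.
print(zipfile.is_zipfile(io.BytesIO(data)))
print(list_parts(data))
```

With a real .xlsx file, reading its bytes and passing them to list_parts should show a similar set of part names, typically with more entries (styles, shared strings, relationships).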

> I had tried out that recipe some time
> ago (it worked fine, though I had to tweak it a bit), and used it to
> convert the (text-only) content of .XLSX files to PDF, as part of my
> xtopdf toolkit. That code is not in the xtopdf release yet, but will
> be after some time. If I can dig up the (standalone) code I wrote for
> that conversion, I'll post a link to it here in a few days. But
> basically, it's really easy to read .XLSX content with Python using
> SAX, since there are clearly defined XML elements for tables, rows and
> cells. In fact, that means you can also read the .XLSX content using
> any language that has a SAX XML parser, not just Python.

Any parser within reason can be used, not just SAX. We have an XLSX 
parser (using ElementTree) in the queue to be plugged into xlrd. It 
handles the basics, i.e. open_workbook(..., formatting_info=0): cell
values only, no formatting information.
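
As a rough illustration of reading XLSX content with ElementTree rather than SAX, here is a sketch under stated assumptions. It is not the parser queued for xlrd. The function names (build_stand_in, read_cells) are hypothetical, the sample archive is hand-built, and the sheet path is hard-coded, whereas a real reader would resolve it through the workbook's relationship parts. Only shared strings and numbers are handled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the worksheet and shared-string parts of an XLSX file.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}

SHARED = ('<sst xmlns="%s"><si><t>name</t></si>'
          '<si><t>widget</t></si></sst>' % MAIN)
SHEET = ('<worksheet xmlns="%s"><sheetData>'
         '<row r="1"><c r="A1" t="s"><v>0</v></c>'
         '<c r="B1" t="s"><v>1</v></c></row>'
         '<row r="2"><c r="A2"><v>42</v></c><c r="B2"><v>1.5</v></c></row>'
         '</sheetData></worksheet>' % MAIN)


def build_stand_in():
    """Hypothetical helper: a minimal two-part archive, not a full workbook."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/sharedStrings.xml", SHARED)
        archive.writestr("xl/worksheets/sheet1.xml", SHEET)
    return buffer.getvalue()


def read_cells(xlsx_bytes, sheet_part="xl/worksheets/sheet1.xml"):
    """Map cell references such as 'A1' to values (str or float)."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        strings = []
        if "xl/sharedStrings.xml" in archive.namelist():
            sst = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for item in sst.findall("m:si", NS):
                # A string item may be split into several <t> runs.
                runs = item.iter("{%s}t" % MAIN)
                strings.append("".join(t.text or "" for t in runs))
        sheet = ET.fromstring(archive.read(sheet_part))

    cells = {}
    for cell in sheet.iterfind("m:sheetData/m:row/m:c", NS):
        value = cell.find("m:v", NS)
        if value is None:
            continue  # empty cell, or a type this sketch does not handle
        kind = cell.get("t", "n")  # a missing type attribute means number
        if kind == "s":
            cells[cell.get("r")] = strings[int(value.text)]
        elif kind == "n":
            cells[cell.get("r")] = float(value.text)
        else:
            cells[cell.get("r")] = value.text  # left as raw text
    return cells


print(read_cells(build_stand_in()))
# {'A1': 'name', 'B1': 'widget', 'A2': 42.0, 'B2': 1.5}
```

Dates, booleans, inline strings and formatting are deliberately left out, which roughly corresponds to the "basics" level described above.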

