I know this has been asked in the past, but is support for
reading .xlsx (Excel 2007) format closer to being complete?
I ask because the included README.html mentions that
support is scheduled for v0.7.1, which is the current version. I tried
to read a simple Excel 2007 file (under Ubuntu Linux, Python 2.5.4)
and was greeted with the following error:
---
>>> book = xlrd.open_workbook("myexcel2007book.xlsx")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xlrd/__init__.py", line 429, in open_workbook
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "xlrd/__init__.py", line 1545, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "xlrd/__init__.py", line 1539, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
---
So my guess is that it's not ready and that's fine. I was just
interested in the status.
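
The 'PK\x03\x04' at the start of the bytes reported in the error is the ZIP local-file-header signature, which is why a BIFF reader does not find a BOF record there. A minimal sketch of checking the container type before handing a file to xlrd follows. The function names are hypothetical and not part of xlrd, and the code is written for a current Python 3 rather than the 2.5.4 mentioned above. Note that very old BIFF files are not wrapped in an OLE2 container, so "unknown" does not necessarily mean "not Excel".

```python
# Hypothetical helpers, not part of xlrd: guess the container format
# of a spreadsheet file from its first few bytes.

ZIP_MAGIC = b"PK\x03\x04"                         # ZIP local file header (.xlsx is a ZIP archive)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound document (typical .xls)


def sniff_excel_container(header):
    """Return 'zip', 'ole2', or 'unknown' for the leading bytes of a file."""
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def sniff_excel_file(path):
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_excel_container(f.read(8))
```

Calling sniff_excel_container() on the eight bytes shown in the traceback classifies them as "zip", so a caller could report "this looks like an .xlsx file" instead of "corrupt file".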
On Thu, Jun 25, 2009 at 6:21 AM, Darryl Wallace<walla...@gmail.com> wrote:
> I know this has been asked in the past, but is support for
> reading .xlsx (Excel 2007) format closer to being complete?
The current intention is this: Basic support will be in the next release, whenever that is, unless something happens that causes it not to be. It is intended to support on_demand=True but not formatting_info=True. Support for *any* version of Excel is unlikely ever to be "complete".
> The reason I ask is because the included README.html mentions that
> support is scheduled for v0.7.1 which is the current version.
s/is/was/
I apologise for the slackness of the documentation team :-)
> I tried
> to read a simple excel 2007 (under ubuntu linux, python 2.5.4) file
> and was greeted with the following error:
> ---
> >>> book = xlrd.open_workbook("myexcel2007book.xlsx")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "xlrd/__init__.py", line 429, in open_workbook
>     biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
>   File "xlrd/__init__.py", line 1545, in getbof
>     bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
>   File "xlrd/__init__.py", line 1539, in bof_error
>     raise XLRDError('Unsupported format, or corrupt file: ' + msg)
> xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
> ---
> So my guess is that it's not ready and that's fine. I was just
> interested in the status.
If you have some non-simple XLSX files that you think may test the capabilities of the development team, please send them. Of particular interest would be files created by software other than Excel itself. As with previous Excel versions, Microsoft documentation will say "you must do X" but Excel will support reading non-X. This has already occurred with the docs saying you must use the shared string table; C# code supplied by an MS write-your-own-XLSX workshop doesn't comply but Excel accepts the resultant file silently.
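
To illustrate the shared-string point, here is a rough sketch of a cell reader that tolerates both styles. It assumes the non-compliant files store their text as inline strings (t="inlineStr"), which is one way a writer can bypass the shared string table; the message above does not say which mechanism the workshop code used. The function name and sample XML are made up for illustration, and this is not how xlrd itself does it.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, in ElementTree's "{uri}" prefix form.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def cell_text(cell, shared_strings):
    """Return the text of a <c> element, whichever string style it uses.

    `shared_strings` is a list already loaded from the shared string
    table (hypothetical input; loading it is not shown here).
    """
    cell_type = cell.get("t")
    if cell_type == "inlineStr":
        # Text is stored in the cell itself: <is><t>...</t></is>
        inline = cell.find(NS + "is")
        if inline is None:
            return None
        return "".join(t.text or "" for t in inline.iter(NS + "t"))
    value = cell.find(NS + "v")
    if value is None or value.text is None:
        return None
    if cell_type == "s":
        # <v> holds an index into the shared string table.
        return shared_strings[int(value.text)]
    # Numbers and other types: return the raw text unconverted.
    return value.text


# Made-up sample row mixing the two string styles and a plain number.
SAMPLE_ROW = (
    '<row xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" r="1">'
    '<c r="A1" t="s"><v>0</v></c>'
    '<c r="B1" t="inlineStr"><is><t>inline text</t></is></c>'
    '<c r="C1"><v>42</v></c>'
    '</row>'
)


def demo():
    row = ET.fromstring(SAMPLE_ROW)
    return [cell_text(c, ["shared text"]) for c in row.findall(NS + "c")]
```

A reader that only looked up shared-string indexes would lose cell B1 here, which is the kind of file that is useful as a test case.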
On Jun 24, 9:16 pm, Daniel Burke <dan.p.bu...@gmail.com> wrote:
> xlsx files are zip archives with xml files in them, you can read them
> with your favorite DOM parser if you're impatient.
Well.. thanks for the tip and your interesting contribution to this
thread? I understand they are zip archives with xml files in them. I
am, however, not impatient. I was simply asking for clarification.
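
For anyone who does want to try the zip-plus-XML route in the meantime, a minimal sketch using only the standard library is below. It assumes the workbook part lives at the conventional path xl/workbook.xml (strictly speaking the location is given by the package relationships) and it only lists sheet names. The function names and the sample content are made up for illustration, and the code targets a current Python 3 rather than 2.5.

```python
import io
import zipfile
from xml.dom import minidom

# SpreadsheetML main namespace URI.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def sheet_names(xlsx_file):
    """List sheet names from an .xlsx given a path or file-like object.

    Assumes the workbook part is stored at 'xl/workbook.xml'.
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        dom = minidom.parseString(archive.read("xl/workbook.xml"))
    sheets = dom.getElementsByTagNameNS(MAIN_NS, "sheet")
    return [sheet.getAttribute("name") for sheet in sheets]


# Made-up workbook part, just enough to exercise sheet_names().
WORKBOOK_XML = (
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheets>'
    '<sheet name="Data" sheetId="1"/>'
    '<sheet name="Notes" sheetId="2"/>'
    '</sheets>'
    '</workbook>'
)


def make_sample():
    """Build a tiny in-memory ZIP that contains only xl/workbook.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("xl/workbook.xml", WORKBOOK_XML)
    buf.seek(0)
    return buf
```

With a real file the call would be sheet_names("myexcel2007book.xlsx"); reading cell data means parsing the worksheet parts as well, which this sketch does not attempt.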
> The current intention is this:
> Basic support will be in the next release, whenever that is, unless
> something happens that causes it not to be. It is intended to support
> on_demand=True but not formatting_info=True. Support for *any* version
> of Excel is unlikely ever to be "complete".
Thanks for the update!
> > The reason I ask is because the included README.html mentions that
> > support is scheduled for v0.7.1 which is the current version.
> s/is/was/
> I apologise for the slackness of the documentation team :-)
No problem.
> If you have some non-simple XLSX files that you think may test the
> capabilities of the development team, please send them. Of particular
> interest would be files created by software other than Excel itself. As
> with previous Excel versions, Microsoft documentation will say "you must
> do X" but Excel will support reading non-X. This has already occurred
> with the docs saying you must use the shared string table; C# code
> supplied by an MS write-your-own-XLSX workshop doesn't comply but Excel
> accepts the resultant file silently.
Most Excel files I need to read are simple, basic data tables.
The main thing I'm interested in is the ability to have larger
data tables (more than 256 columns, etc.).
Thanks for your work on this library. It's proven to be extremely
useful.