Sorry for the question. I have just started using xlrd and it seems to
be the perfect solution for what I need (read an excel file into
python!! :-). I noticed that version 0.71 is out. Now, what is the
best way for updating the package without messing things up? I am on
Windows XP. I assume the new version is backward compatible.
Thank you John for writing this very useful package.
> Sorry for the question. I have just started using xlrd and it seems to > be the perfect solution for what I need (read an excel file into > python!! :-). I noticed that version 0.71 is out.
Version 0.7.1 was released on 2009-06-01, well over 4 months ago. When exactly did you "just" start using xlrd, and what version are you using at the moment?
> Now, what is the > best way for updating the package without messing things up? I am on > Windows XP.
Presuming you used a distutils method of installing (the Windows installer (xlrd-VERSION.win32.exe), or running setup.py after extracting from an archive) it should suffice to install over the top of your existing version.
> I assume the new version is backward compatible.
Your usage should be backward compatible provided you have stuck with the documented APIs.
> Thank you John for writing this very useful package.
John, i don't want to sound cheesy, but i have to thank you for your
quick reply.
What you say is very helpful. However, I remembered that I had that
utility easy_install installed and that made the trick.
As far as the version is concerned, I downloaded version 0.6.1 for
windows from www.lexicon.net last week (http://www.lexicon.net/ sjmachin/xlrd.htm). I found out that the new version was out because I
googled "python xlrd xlsx".
I have just one question: I would like it to be a bit quicker (I have
large files): is there a way of speeding it up, possibly by skipping
formatting and just get to the data or is it just the way it is? (not
at all a complaint).
Well, I hope that the answer to my dumb question will help some other
(dumb :-) person as well.
Ciao, marco
On Oct 19, 1:02 pm, John Machin <sjmac...@lexicon.net> wrote:
> > Sorry for the question. I have just started using xlrd and it seems to
> > be the perfect solution for what I need (read an excel file into
> > python!! :-). I noticed that version 0.71 is out.
> Version 0.7.1 was released on 2009-06-01, well over 4 months ago. When
> exactly did you "just" start using xlrd, and what version are you using
> at the moment?
> > Now, what is the
> > best way for updating the package without messing things up? I am on
> > Windows XP.
> Presuming you used a distutils method of installing (the Windows
> installer (xlrd-VERSION.win32.exe), or running setup.py after extracting
> from an archive) it should suffice to install over the top of your
> existing version.
> > I assume the new version is backward compatible.
> Your usage should be backward compatible provided you have stuck with
> the documented APIs.
> > Thank you John for writing this very useful package.
> I have just one question: I would like it to be a bit quicker (I have > large files):
What is "large", what are you running it on (including how much free memory), how long does it take to do what, how long might it take to do the same job using pywin32/COM, what is "a bit quicker", and how much time/effort/money do you want to spend?
> is there a way of speeding it up, possibly by skipping > formatting and just get to the data or is it just the way it is? (not > at all a complaint).
Here are a couple of minimal-investment ideas:
(1) Try using psyco ... e.g. on my box [see below] with Python 2.6.2, the time for open_workbook() with default args drops from 7.9 secs to 2.6 secs with a 15 Mb file; from 31 secs to 10.2 secs with a 120 Mb file -- this is using psyco.full() immediately before the first open_workbook() call.
(2) "possibly by skipping formatting and just get to the data": Read the docs about the args of open_workbook(), especially the formatting_info arg. Eyeball the source code, especially in the neighbourhood of mentions of "formatting_info".
My box has a 2.0 GHz AMD Turion 64 Mobile (running in 32-bit mode) with 896Mb usable memory. OS is Windows XP SP3.
> On Oct 20, 12:07 am, John Machin <sjmac...@lexicon.net> wrote: >> On 20/10/2009 3:12 AM, Marco wrote:
>>> I have just one question: I would like it to be a bit quicker (I have >>> large files): >> What is "large", what are you running it on (including how much free >> memory), how long does it take to do what, how long might it take to do >> the same job using pywin32/COM, what is "a bit quicker", and how much >> time/effort/money do you want to spend?
>>> is there a way of speeding it up, possibly by skipping >>> formatting and just get to the data or is it just the way it is? (not >>> at all a complaint). >> Here are a couple of minimal-investment ideas:
>> (1) Try using psyco ... e.g. on my box [see below] with Python 2.6.2, >> the time for open_workbook() with default args drops from 7.9 secs to >> 2.6 secs with a 15 Mb file; from 31 secs to 10.2 secs with a 120 Mb file >> -- this is using psyco.full() immediately before the first >> open_workbook() call.
>> (2) "possibly by skipping formatting and just get to the data": Read the >> docs about the args of open_workbook(), especially the formatting_info >> arg. Eyeball the source code, especially in the neighbourhood of >> mentions of "formatting_info".
>> My box has a 2.0 GHz AMD Turion 64 Mobile (running in 32-bit mode) with >> 896Mb usable memory. OS is Windows XP SP3.
> Hi John, > I know that big and slow are relative concepts. I haven't measured the > time exactly but, for a single worksheet with 1 date and 44 numerical > columns and about 100,000 rows, it takes well over a minute to read > the values, in my very basic script.
The maximum number of rows in a single worksheet of an XLS file is 65,536 (Excel 97 to Excel 2003). Is 66 about 100?
> I must be doing something silly > because it's much slower than your script with a file of about the > same size. I have a dualcore centrino 1.66Gbx2 and 1GB ram (not > shared), on XP SP3, 32 bit, of course. > > This week I am working on something else but i will try to speed up my > xlrd script soon, maybe trying to give it a bit more CPU as well.
Your script may well need speeding up, but you need to read carefully to ensure that you are comparing like with like. I wrote "the time for open_workbook() with default args drops from 7.9 secs to 2.6 secs" ... note that this is like it says *ONLY* the time for the open_workbook() call. It does *NOT* include any further processing, silly or otherwise.
Dig out the runxlrd.py script (a supported installation method should have left it in your_python_install_directory\scripts) and do this
\whatever\it\takes\runxlrd.py --help ...\runxlrd.py ov yourfile.xls ...\runxlrd.py show yourfile.xls >stdout.txt
The last line above dumps out a line for each non-empty cell. You might like to compare the code in runxlrd.py and the time it takes with yours.
> Psycho? Oh, no, another module!
"Psycho"??? Oh, no, another inattentive reader :-) Reread my previous message.
As I said, all you need to do is import the module and make one function call. The result is way up there on the bangs per buck scale. It's not "another module".
> I am still using python 2.5, I plan to > install python(x,y) 2.6 this week. Then, I will follow your > suggestions (Psycho), and look at the documentation!! > > In any case, i will let you kow if I make progress, maybe we could > start a new thread for this topic.
More importantly, let us know if you /don't/ make progress :-) If all else fails, publish your code for review.
John,
this week I am working on something else. Just wanted to tell you that
since version 2007 the Excel grid is 1,048,576 rows by 16,384 columns,
even MS products improve sometimes!
Happy to hear that i can report failures in improving the speed. Will
do when i have time.
Thank you very much for your help, John.
Marco
On Oct 20, 12:13 pm, John Machin <sjmac...@lexicon.net> wrote:
> === Also I'm presuming you didn't intend to go off-list so I'm pushing
> this back there ===
> > On Oct 20, 12:07 am, John Machin <sjmac...@lexicon.net> wrote:
> >> On 20/10/2009 3:12 AM, Marco wrote:
> >>> I have just one question: I would like it to be a bit quicker (I have
> >>> large files):
> >> What is "large", what are you running it on (including how much free
> >> memory), how long does it take to do what, how long might it take to do
> >> the same job using pywin32/COM, what is "a bit quicker", and how much
> >> time/effort/money do you want to spend?
> >>> is there a way of speeding it up, possibly by skipping
> >>> formatting and just get to the data or is it just the way it is? (not
> >>> at all a complaint).
> >> Here are a couple of minimal-investment ideas:
> >> (1) Try using psyco ... e.g. on my box [see below] with Python 2.6.2,
> >> the time for open_workbook() with default args drops from 7.9 secs to
> >> 2.6 secs with a 15 Mb file; from 31 secs to 10.2 secs with a 120 Mb file
> >> -- this is using psyco.full() immediately before the first
> >> open_workbook() call.
> >> (2) "possibly by skipping formatting and just get to the data": Read the
> >> docs about the args of open_workbook(), especially the formatting_info
> >> arg. Eyeball the source code, especially in the neighbourhood of
> >> mentions of "formatting_info".
> >> My box has a 2.0 GHz AMD Turion 64 Mobile (running in 32-bit mode) with
> >> 896Mb usable memory. OS is Windows XP SP3.
> > Hi John,
> > I know that big and slow are relative concepts. I haven't measured the
> > time exactly but, for a single worksheet with 1 date and 44 numerical
> > columns and about 100,000 rows, it takes well over a minute to read
> > the values, in my very basic script.
> The maximum number of rows in a single worksheet of an XLS file is
> 65,536 (Excel 97 to Excel 2003). Is 66 about 100?
> > I must be doing something silly
> > because it's much slower than your script with a file of about the
> > same size. I have a dualcore centrino 1.66Gbx2 and 1GB ram (not
> > shared), on XP SP3, 32 bit, of course.
> >
> > This week I am working on something else but i will try to speed up my
> > xlrd script soon, maybe trying to give it a bit more CPU as well.
> Your script may well need speeding up, but you need to read carefully to
> ensure that you are comparing like with like. I wrote "the time for
> open_workbook() with default args drops from 7.9 secs to 2.6 secs" ...
> note that this is like it says *ONLY* the time for the open_workbook()
> call. It does *NOT* include any further processing, silly or otherwise.
> Dig out the runxlrd.py script (a supported installation method should
> have left it in your_python_install_directory\scripts) and do this
> \whatever\it\takes\runxlrd.py --help
> ...\runxlrd.py ov yourfile.xls
> ...\runxlrd.py show yourfile.xls >stdout.txt
> The last line above dumps out a line for each non-empty cell. You might
> like to compare the code in runxlrd.py and the time it takes with yours.
> > Psycho? Oh, no, another module!
> "Psycho"??? Oh, no, another inattentive reader :-) Reread my previous
> message.
> As I said, all you need to do is import the module and make one function
> call. The result is way up there on the bangs per buck scale. It's not
> "another module".
> > I am still using python 2.5, I plan to
> > install python(x,y) 2.6 this week. Then, I will follow your
> > suggestions (Psycho), and look at the documentation!!
> >
> > In any case, i will let you kow if I make progress, maybe we could
> > start a new thread for this topic.
> More importantly, let us know if you /don't/ make progress :-) If all
> else fails, publish your code for review.
> John, > this week I am working on something else. Just wanted to tell you that > since version 2007 the Excel grid is 1,048,576 rows by 16,384 columns, > even MS products improve sometimes!
That large grid is available only in XLS[XMB] files. If you use Excel 2007 to save a workbook containing cells outside the older limits as an XLS file, it will warn you and ignore offending cells if you persist.
No released version of xlrd supports reading other than XLS files.
Your statements that your script was reading files with about 100,000 rows in a single worksheet and that you were using xlrd are not compatible except under an extremely charitable interpretation of "about", or some other "deus ex machina".