I was just noticing that any field in Excel represented as number will be interpreted by xlrd as a python float then has one decimal point. For example 4 would be represented as 4.0.
I realize this is easy to fix in any python code by converting to an int and then converting to a str, but I'm wondering if there's a way to change how this works in xlrd. Perhaps a number with 0 decimal places should return an int, or at least it should do the conversion to an int so the user doesn't have to.
> I was just noticing that any field in Excel represented as number will > be interpreted by xlrd as a python float then has one decimal point. For > example 4 would be represented as 4.0.
As far as Excel's calculations are concerned, there is no distinction between a float with zero fractional part and an integer. Excel does save space in the XLS file where a number can be represented exactly as a twos-complement 30-bit integer, but this is not mandatory and on re-reading by Excel so-represented numbers are converted to float and their history is forgotten.
"one decimal point"? Did you mean "one decimal place"? Floats don't have decimal places. Excel's 4 is represented as an IEEE 794 64-bit float, not "4.0".
> I realize this is easy to fix in any python code by converting to an int > and then converting to a str,
What is the problem that you say needs fixing?
> but I'm wondering if there's a way to > change how this works in xlrd. Perhaps a number with 0 decimal places > should return an int, or at least it should do the conversion to an int > so the user doesn't have to.
You write as though "return an int" and "do the conversion to an int" are different -- please explain.
Consider a worksheet that is valuing your stock exchange investments. It will have columns for quantity, price, and market value. Yes, all of the quantity numbers are expected to be integers. However the price and MV columns contain floats, some of which may have a zero fractional part. Do you really want a display treatment for a number that's determined programatically without inspecting its neighbours?
I'll start by saying sorry about some of the mistakes I had in my last email, it was written hastily.
On Sat, Oct 17, 2009 at 7:37 PM, John Machin <sjmac...@lexicon.net> wrote: > As far as Excel's calculations are concerned, there is no distinction > between a float with zero fractional part and an integer. Excel does > save space in the XLS file where a number can be represented exactly as > a twos-complement 30-bit integer, but this is not mandatory and on > re-reading by Excel so-represented numbers are converted to float and > their history is forgotten.
Now that you've explained that Excel actually stores all numbers as floats, that gives me a better understanding of why this is happening.
> I realize this is easy to fix in any python code by converting to an int > > and then converting to a str,
> What is the problem that you say needs fixing?
The problem I'm referring to is the discrepancy between how Excel represents numbers to the user and how I was seeing Python represent them after parsing them using xlrd. However, I now understand this is just how Excel stores numbers.
But even if that's how Excel stores numbers, there must be some place that it's storing that metadata about how many decimal places a field should represent to the user. Is that information easily accessible?
> Consider a worksheet that is valuing your stock exchange investments. It > will have columns for quantity, price, and market value. Yes, all of the > quantity numbers are expected to be integers. However the price and MV > columns contain floats, some of which may have a zero fractional part. > Do you really want a display treatment for a number that's determined > programatically without inspecting its neighbours?
I understand that a quantity should be an integer, while a price could be a dollar amount represented as a float, but I'm not sure about what market value is, and I'm not sure how the three would be related, so I'm not sure I understand the point of your example.
> I'll start by saying sorry about some of the mistakes I had in my last > email, it was written hastily.
> On Sat, Oct 17, 2009 at 7:37 PM, John Machin <sjmac...@lexicon.net > <mailto:sjmac...@lexicon.net>> wrote:
> As far as Excel's calculations are concerned, there is no distinction > between a float with zero fractional part and an integer. Excel does > save space in the XLS file where a number can be represented exactly as > a twos-complement 30-bit integer, but this is not mandatory and on > re-reading by Excel so-represented numbers are converted to float and > their history is forgotten.
> Now that you've explained that Excel actually stores all numbers as > floats, that gives me a better understanding of why this is happening.
> > I realize this is easy to fix in any python code by converting to > an int > > and then converting to a str,
> What is the problem that you say needs fixing?
> The problem I'm referring to is the discrepancy between how Excel > represents numbers to the user and how I was seeing Python represent > them after parsing them using xlrd. However, I now understand this is > just how Excel stores numbers.
> But even if that's how Excel stores numbers, there must be some place > that it's storing that metadata about how many decimal places a field > should represent to the user. Is that information easily accessible?
It is storing a "number format" string which corresponds to what you see when you do Format/Cells/Number in the Excel UI. That can be retrieved by opening the workbook with formatting_info=True and following this path from a cell's xf_index:
xlrd analyses these format strings only as far as attempting to guess if the number in a cell is intended to represent a date. Nobody has written a general-purpose format string parser.
> Consider a worksheet that is valuing your stock exchange investments. It > will have columns for quantity, price, and market value. Yes, all of the > quantity numbers are expected to be integers. However the price and MV > columns contain floats, some of which may have a zero fractional part. > Do you really want a display treatment for a number that's determined > programatically without inspecting its neighbours?
> I understand that a quantity should be an integer, while a price could > be a dollar amount represented as a float, but I'm not sure about what > market value is, and I'm not sure how the three would be related, so I'm > not sure I understand the point of your example.
quantity * price -> market_value
123 * 4.56 -> 560.88
123 * 1.00 -> 123.00 123 * 1 -> 123 # you don't want to display it like this
124 * 1.25 -> 155.00 124 * 1.25 -> 155 # you don't want to display it like this
The point is that displaying a whole-number float as an integer unconditionally is not a good idea.
John Machin wrote: > when you do Format/Cells/Number in the Excel UI. That can be retrieved > by opening the workbook with formatting_info=True and following this > path from a cell's xf_index:
> John Machin wrote: >> when you do Format/Cells/Number in the Excel UI. That can be retrieved >> by opening the workbook with formatting_info=True and following this >> path from a cell's xf_index:
John Machin wrote: > On 21/10/2009 1:42 AM, Chris Withers wrote: >> John Machin wrote: >>> when you do Format/Cells/Number in the Excel UI. That can be retrieved >>> by opening the workbook with formatting_info=True and following this >>> path from a cell's xf_index:
>>> xf = Book.xf_list[xf_index] >>> fmt_key = xf.format_key >>> fmt_obj = Book.format_map[fmt_key] >>> fmt_str = fmt_obj.format_str >> Given the shim-ish nature of Cell objects already, any chance they could >> grow a: