> I need to process the Report_2_5 and extract the following output
> Stat1: 40220 > Vol2 : 83000 > Amount3 : 40
> I am new to programming and I have been extracting this data using MS – Access and I badly need a better solution.
Have you considered using a proper SQL database? (See http://en.wikipedia.org/wiki/SQL ; MySQL is one example: http://en.wikipedia.org/wiki/MySQL) Mucking around with CSV files like this is basically doing the work of some simple SQL queries, only in an ad-hoc, inefficient manner. (MS Access is essentially a non-industrial-strength SQL for non-programmers.)
>> I need to process the Report_2_5 and extract the following output
>> Stat1: 40220 >> Vol2 : 83000 >> Amount3 : 40
>> I am new to programming and I have been extracting this data using MS – >> Access and I badly need a better solution.
> Have you considered using a proper SQL database? (See > http://en.wikipedia.org/wiki/SQL ; MySQL is one example: > http://en.wikipedia.org/wiki/MySQL) > Mucking around with CSV files like this is basically doing the work of > some simple SQL queries, only in an ad-hoc, inefficient manner. (MS > Access is essentially a non-industrial-strength SQL for > non-programmers.)
Industrial strength or not, Access should be capable of solving the OP's problem. So it would be interesting what's so bad about it in this case.
Anyway, here's a database-free python solution:
import csv
REPORT = "report.csv" CUSTOMERS = "customers.csv"
with open(CUSTOMERS) as instream: next(instream) # skip header
# put customer ids into a set for fast lookup customer_ids = set(line.strip() for line in instream)
with open(REPORT) as instream: rows = csv.reader(instream)
# find columns headers = [column.strip() for column in rows.next()] customer_column = headers.index("Customer ID") sum_over_columns = [headers.index(s) for s in "stat1 vol2 amount3".split()]
# initialize totals sigma = [0] * len(headers)
# calculate totals for row in rows: if row[customer_column] in customer_ids: for index in sum_over_columns: sigma[index] += int(row[index]) # print totals for index in sum_over_columns: print "%-10s %10d" % (headers[index] + ":", sigma[index])
The limiting factor for this approach is the customer_ids set which at some point may no longer fit into memory.