Is there a better way to read a CSV file and store for processing?

So I was wondering whether there is a "better" way to read a CSV file and store it for later post-processing. What I have written does the job, but I suspect there is a better way, as I seem to be duplicating some steps to work around things I don't know how to do. Ideally, I would like to read the CSV file into a NumPy array that can be accessed by variable name, but I couldn't work out how. Any thoughts welcome.


The CSV file looks a bit like this:

Year,Day of the year,NPP, etc...
--,--,some units, etc...
YEAR,DOY,NPP, etc...
1996.0,1.0,10.09, etc...
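For the "NumPy array accessed by variable name" part, one option is `np.genfromtxt` with `names=True`, which returns a structured array whose columns are indexed by name. This is a minimal sketch, assuming the variable names are always on the third line, every column is numeric and the delimiter is a comma; the in-memory `sample` reuses the header layout above, and its second data row is made up for illustration.

```python
import io

import numpy as np

# Hypothetical stand-in for the real file: the header layout shown above,
# with a made-up second data row.
sample = io.StringIO(
    "Year,Day of the year,NPP\n"
    "--,--,some units\n"
    "YEAR,DOY,NPP\n"
    "1996.0,1.0,10.09\n"
    "1996.0,2.0,10.52\n"
)

# skip_header=2 drops the long names and the units line; names=True then
# takes the field names from the next line (YEAR,DOY,NPP).
data = np.genfromtxt(sample, delimiter=",", skip_header=2, names=True)

print(data.dtype.names)  # ('YEAR', 'DOY', 'NPP')
print(data["NPP"])       # the whole NPP time series as a float array
```

With a real file, pass the file name in place of `sample`. Every column comes back as a float here, including YEAR and DOY.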


#!/usr/bin/env python

"""
Example of reading a CSV file and doing some simple processing:

    1. Read the CSV file into a list of dictionaries (one per row)
    2. Save the data to a pickle file, to speed up reading it back in
    3. Read the pickle back in to check that everything is fine
    4. Get the time series of one of the variables, print it and plot it
"""
__author__ = "Martin De Kauwe"
__version__ = "1.0 (09.03.2012)"
__email__ = ""

import csv
import glob
import pickle

import numpy as np

def main():
    # local import: matplotlib is only needed for the plot at the end
    import matplotlib.pyplot as plt

    for fname in glob.glob("*.csv"):
        data = read_csv_file(fname, head_length=3, delim=",")
        # save the data to the hard disk for quick access later
        pkl_fname = "test_model_data.pkl"
        save_dictionary(data, pkl_fname)
        # read the data back in to check it worked...
        with open(pkl_fname, "rb") as f:
            data = pickle.load(f)
        npp = get_var(data, "NPP")
        for value in npp:
            print(value)
        plt.plot(npp, "ro-")
    # without show() the script exits before the figure is displayed
    plt.show()

def read_csv_file(fname, head_length=3, delim=","):
    """ read the csv file into a list of dictionaries, one per data row """
    # newline="" is how the csv module expects files to be opened
    with open(fname, newline="") as f:
        # skip to the last header line, which holds the variable names
        f = find_header_keys(f, line_with_keys=head_length - 1)
        # each remaining row becomes a dict: variable name -> value (a string)
        reader = csv.DictReader(f, delimiter=delim)
        data = list(reader)
    return data

def find_header_keys(fp, line_with_keys=0):
    """ In case the csv file doesn't have the header keys on the first line,
    advance the file pointer to the line we want """
    # read and discard every line before the one holding the keys
    for _ in range(line_with_keys):
        fp.readline()
    return fp

def save_dictionary(data, outfname):
    """ save the data to disk, i.e. pickle it """
    # the with block closes (and flushes) the file before it is read back
    with open(outfname, "wb") as out_file:
        pickle.dump(data, out_file, pickle.HIGHEST_PROTOCOL)

def get_var(data, var):
    """ return the entire time series for a given variable """
    # csv.DictReader stores every value as a string, so convert to float
    return np.array([float(row[var]) for row in data])

if __name__ == "__main__":
    main()
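csv.DictReader returns every value as a string, so the pickled list of row dictionaries still holds strings and each variable has to be converted whenever it is pulled out. A minimal sketch of an alternative, assuming every column is numeric and there is at least one data row: convert each column once into a float array and cache them all in a single `.npz` file with `np.savez`, which `np.load` reads back as an object indexed by variable name. The helper `columns_from_rows` and the in-memory `sample` (whose second row is made up) are hypothetical; `csv.DictReader`, `np.savez` and `np.load` are the real APIs.

```python
import csv
import io
import os
import tempfile

import numpy as np


def columns_from_rows(rows):
    """Hypothetical helper: turn DictReader rows into {name: float array}."""
    names = rows[0].keys()
    return {name: np.array([float(row[name]) for row in rows]) for name in names}


# Hypothetical stand-in for one model output file (second data row made up).
sample = io.StringIO(
    "Year,Day of the year,NPP\n"
    "--,--,some units\n"
    "YEAR,DOY,NPP\n"
    "1996.0,1.0,10.09\n"
    "1996.0,2.0,10.52\n"
)
for _ in range(2):  # skip the two lines above the YEAR,DOY,NPP keys
    sample.readline()
rows = list(csv.DictReader(sample))
columns = columns_from_rows(rows)

with tempfile.TemporaryDirectory() as tmpdir:
    cache = os.path.join(tmpdir, "test_model_data.npz")
    # np.savez stores each keyword argument as a separately named array
    np.savez(cache, **columns)
    # the loaded NpzFile is indexed by variable name
    with np.load(cache) as loaded:
        npp = loaded["NPP"]

print(npp)
```

The same `columns_from_rows` call would work on the list returned by `read_csv_file`, so the pickle step could store ready-to-use arrays instead of strings.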