Is there a better way to read a CSV file and store it for processing?
I was wondering if there might be a "better" way to read a CSV file and store it for later post-processing. What I have written does the job fine, but I seem to be duplicating some steps to work around things I don't know how to do. For example, ideally I would like to read the CSV file into a numpy array that can be accessed by variable name, but I couldn't work that out. Any thoughts welcome.
The CSV file looks a bit like this:
Year,Day of the year,NPP, etc...
--,--,some units, etc...
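As a rough illustration of the "access by variable name" idea, a structured numpy array might look something like the sketch below. It assumes the names sit on the first line, the units on the second, and that every data value is numeric. The function name read_named_array and the sample values are made up for illustration and are not part of the script that follows.

```python
import numpy as np

# Hypothetical sample in the layout shown above: a line of names, a line
# of units, then the data (the numbers are invented for illustration).
sample_lines = [
    "Year,Day of the year,NPP",
    "--,--,some units",
    "2012,1,0.5",
    "2012,2,0.7",
    "2012,3,0.6",
]


def read_named_array(lines, delim=","):
    """Return a structured array whose columns can be accessed by name."""
    # Take the column names from the first line ourselves...
    names = [name.strip() for name in lines[0].split(delim)]
    # ...then skip the names line and the units line and parse the rest
    # as floats. numpy may adjust names containing spaces.
    return np.genfromtxt(lines, delimiter=delim, skip_header=2, names=names)


arr = read_named_array(sample_lines)
npp = arr["NPP"]          # one column, selected by its name
print(npp.mean())
```

For a real file, the list of lines could come from open(fname).readlines(). Whether genfromtxt copes with the full file depends on details not shown here, such as missing values or extra header lines.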
Example of reading CSV file and some simple processing...
1. Read CSV file into a python dictionary/list
2. Save the data to a pickle object, to speed up reading back in
3. Read the object back in to test everything is fine
4. Get the timeseries of one of the variables, print it and plot it...
__author__ = "Martin De Kauwe"
__version__ = "1.0 (09.03.2012)"
__email__ = "firstname.lastname@example.org"
import csv
import glob
import numpy as np
import cPickle as pickle

def main():
    # note: only the data from the last file found is kept
    for fname in glob.glob("*.csv"):
        data = read_csv_file(fname, head_length=3, delim=",")

    # save the data to the hard disk for quick access later
    pkl_fname = "test_model_data.pkl"
    save_dictionary(data, pkl_fname)

    # read the data back in to check it worked...
    f = open(pkl_fname, 'rb')
    data = pickle.load(f)
    f.close()

    npp = get_var(data, "NPP")
    for i in xrange(len(npp)):
        print(npp[i])

    import matplotlib.pyplot as plt
    plt.plot(npp)
    plt.show()

def read_csv_file(fname, head_length=None, delim=","):
    """ read the csv file into a list of dictionaries, one per row """
    f = open(fname, "rb")

    # read the correct header keys...
    f = find_header_keys(f, line_with_keys=2)

    # read the data into a nice big dictionary...and return as a list
    reader = csv.DictReader(f, delimiter=delim)
    data = [row for row in reader]
    f.close()
    return data

def find_header_keys(fp, line_with_keys=None):
    """ In case the csv file doesn't have the header keys on the first line,
    advance the pointer until the line we desire """
    dialect = csv.Sniffer().sniff(fp.read(1024))
    # sniffing moved the pointer, so rewind before skipping lines
    fp.seek(0)
    for i in xrange(line_with_keys):
        next(fp)
    return fp

def save_dictionary(data, outfname):
    """ save dictionary to disk, i.e. pickle it """
    out_dict = open(outfname, 'wb')
    pickle.dump(data, out_dict, pickle.HIGHEST_PROTOCOL)
    out_dict.close()

def get_var(data, var):
    """ return the entire time series for a given variable """
    return np.asarray([data[i][var] for i in xrange(len(data))])

if __name__ == "__main__":
    main()
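One thing worth noting about get_var: csv.DictReader hands back every value as a string, so the array it builds holds strings until they are converted. A minimal sketch of converting once, up front, and pickling the result is given below. It assumes every column is numeric (a units row would have to be skipped first). The helper rows_to_columns and the sample rows are hypothetical, and it uses the standard pickle module rather than cPickle so that it also runs on current Python.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical rows, shaped like the output of csv.DictReader: every value
# is a string (the numbers are invented for illustration).
rows = [
    {"Year": "2012", "NPP": "0.5"},
    {"Year": "2012", "NPP": "0.7"},
    {"Year": "2012", "NPP": "0.6"},
]


def rows_to_columns(rows):
    """Convert a list of row dicts into a dict of float arrays, one per key."""
    return {key: np.asarray([float(row[key]) for row in rows])
            for key in rows[0]}


columns = rows_to_columns(rows)

# Pickle the columns and read them back; "with" closes the files for us.
pkl_fname = os.path.join(tempfile.mkdtemp(), "columns.pkl")
with open(pkl_fname, "wb") as f:
    pickle.dump(columns, f, pickle.HIGHEST_PROTOCOL)
with open(pkl_fname, "rb") as f:
    restored = pickle.load(f)

print(restored["NPP"].mean())
```

Storing one array per variable means a time series is a plain dictionary lookup after loading, with no need to loop over the rows again.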