Tuesday, August 4, 2015

Memory issue in JSON conversion

In my pandas program I am reading a CSV file and converting some of its columns to JSON.

For example, my CSV looks like this:

id_4 col1  col2 .....................................col100
1     43    56  .....................................67
2     46    67   ....................................78

What I want to achieve is:

id_4 json

1    {"col1":43,"col2":56,.....................,"col100":67}
2    {"col1":46,"col2":67,.....................,"col100":78}

The code I have tried is as follows:

    import csv
    import json
    import pandas as pd

    df = pd.read_csv('file.csv')

    headers = ['id_4', 'json']
    result = []

    def func(df):
        # every column except the grouping key gets serialized
        cols = [c for c in df.columns if c != 'id_4']
        # one dict per row, e.g. {"col1": "43", ..., "col100": "67"}
        d = [dict(zip(cols, row)) for row in df[cols].astype(str).values]
        format_data = json.dumps(d)
        format_data = format_data[1:-1]  # strip the outer [ and ]
        json_data = '{"key":' + format_data + '}'
        result.append(pd.Series([df['id_4'].unique()[0], json_data], index=headers))
        return df

    df.groupby('id_4').apply(func)

    with open('output.csv', 'w') as b:
        writer = csv.writer(b)
        writer.writerow(headers)
        # result[1:] drops the first entry: in some pandas versions
        # groupby.apply calls the function on the first group twice,
        # which duplicates it in result
        writer.writerows(result[1:])

The CSV contains about 1 lakh (100,000) rows and the file is about 15 MB. When I execute this, the process is killed automatically after a long time. I think it is a memory issue.

As I am a newbie to Python and pandas: is there any way to optimize the above code so it works properly, or is increasing the memory the only way?

I am using a Linux system with 5 GB of RAM.
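
For reference, one direction I have been looking at is processing the file in chunks, so the whole dataset never sits in memory at once. A minimal sketch of what I mean, assuming each id_4 appears on a single row (as in the example above) and using pandas' chunksize option and Series.to_json:

    import pandas as pd

    # Sketch only: assumes each id_4 occupies exactly one row, so no group
    # is ever split across chunk boundaries.
    with open('output.csv', 'w') as out:
        out.write('id_4,json\n')
        for chunk in pd.read_csv('file.csv', chunksize=10000):
            value_cols = [c for c in chunk.columns if c != 'id_4']
            # Series.to_json() turns one row into '{"col1":43,...,"col100":67}'
            json_col = chunk[value_cols].apply(lambda row: row.to_json(), axis=1)
            out_df = pd.DataFrame({'id_4': chunk['id_4'], 'json': json_col})
            # append this chunk; CSV quoting keeps the JSON commas intact
            out_df[['id_4', 'json']].to_csv(out, header=False, index=False)

But I am not sure whether this is the right way to go about it.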
