• 12
name

A PHP Error was encountered

Severity: Notice

Message: Undefined index: userid

Filename: views/question.php

Line Number: 191

Backtrace:

File: /home/prodcxja/public_html/questions/application/views/question.php
Line: 191
Function: _error_handler

File: /home/prodcxja/public_html/questions/application/controllers/Questions.php
Line: 433
Function: view

File: /home/prodcxja/public_html/questions/index.php
Line: 315
Function: require_once

name Punditsdkoslkdosdkoskdo

How to read a column of csv as dtype list using pandas?

I have a csv file with 3 columns, wherein each row of Column 3 has list of values in it. As you can see from the following table structure

Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
3,a3,"['Proj4', 'Proj1']"
4,a4,"['Proj3', 'Proj4']"
5,a5,"['Proj5', 'Proj2']"

Whenever I try to read this csv, Col3 is getting read as str object and not as list. I tried to alter the dtype of that column to list but got "Attribute Error" as below

df = pd.read_csv("inputfile.csv")
df.Col3.dtype = list

AttributeError                            Traceback (most recent call last)
<ipython-input-19-6f9ec76b1b30> in <module>()
----> 1 df.Col3.dtype = list

C:\Python27\lib\site-packages\pandas\core\generic.pyc in __setattr__(self,         name, value)
   1953                     object.__setattr__(self, name, value)
   1954             except (AttributeError, TypeError):
-> 1955                 object.__setattr__(self, name, value)
   1956 
   1957     #----------------------------------------------------------------------

AttributeError: can't set attribute

It would be really great if you can guide me how to go about it.

      • 1
    • Can you show us an example of your csv (not as an image) , copy paste first few rows of your csv.
    • Standard warning: nonscalar values aren't really supported by pandas. You can use them, as they're sometimes handy in intermediate steps, but working with them is inconvenient, and that's not going to change in the near future.
    • @PadraicCunningham No the final value will be a String. PFB Sample Illustration of my requirement Input Row:1,a1,"['Proj1', 'Proj2']" Output Rows: 1,a1,"Proj1" 1,a1,"Proj2"

You could use the ast lib:

from ast import literal_eval


df.Col3 = df.Col3.apply(literal_eval)
print(df.Col3[0][0])
Proj1

You can also do it when you create the dataframe from the csv, using converters:

df = pd.read_csv("in.csv",converters={"Col3": literal_eval})

If you are sure the format is he same for all strings, stripping and splitting will be a lot faster:

 df = pd.read_csv("in.csv",converters={"Col3": lambda x: x.strip("[]").split(", ")})

But you will end up with the strings wrapped in quotes

  • 38
Reply Report

I have a different approach for this, which can be used for string representations of other data types, besides just lists.

You can use the json library and apply json.loads() to the desired column. e.g

import json
df.my_column = df.my_column.apply(json.loads)

For this to work, however, your input strings must be enclosed in double quotations.

  • 2
Reply Report

@Padraic Cunningham's answer will not work if you have to parse lists of strings that do not have quotes. For example, literal_eval will successfully parse "['a', 'b', 'c']", but not "[a, b, c]". To load strings like this, use the PyYAML library.

import io 
import pandas as pd

data = '''
A,B,C
"[1, 2, 3]",True,"[a, b, c]"
"[4, 5, 6]",False,"[d, e, f]"
'''

df = pd.read_csv(io.StringIO(data), sep=',')                                    
df
           A      B          C
0  [1, 2, 3]   True  [a, b, c]
1  [4, 5, 6]  False  [d, e, f]

df['C'].tolist()                                                           
# ['[a, b, c]', '[d, e, f]']

import yaml
df[['A', 'C']] = df[['A', 'C']].applymap(yaml.safe_load) 

df['C'].tolist()                                                           
# [['a', 'b', 'c'], ['d', 'e', 'f']]

yaml can be installed using pip install pyyaml.

  • 1
Reply Report

Trending Tags