pandas details

2018-02-24

Classical usage of Pandas

Series and DataFrame are the most important objects in pandas. df['col'] returns a Series whose name is col. You can set the row and column labels by assigning to df.index and df.columns. You can select by integer position with df.iloc[a, b], by label with df.loc['index1', 'col1'], or with a filter condition such as df.loc[df['col'] == val, :].
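A minimal sketch of the selection styles above. The frame, its labels (index1, col1, ...) and its values are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data; labels and values are assumptions for this sketch.
df = pd.DataFrame(
    {"col1": [10, 20, 30], "col2": ["x", "y", "z"]},
    index=["index1", "index2", "index3"],
)

s = df["col1"]                           # a Series whose name is "col1"
by_position = df.iloc[0, 1]              # row 0, column 1 by integer position
by_label = df.loc["index1", "col1"]      # same row, selected by labels
filtered = df.loc[df["col1"] == 20, :]   # rows where the condition holds

# Relabel rows and columns by plain assignment.
df.index = ["r1", "r2", "r3"]
df.columns = ["a", "b"]
```

Note that s and filtered were taken before the relabeling, so they still carry the old labels.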

Use a single cell to store a list

If you want to use a single cell to store a whole list, don't assign it with .iloc, which may raise a ValueError because pandas treats the list as several values to assign. Consider using df.at[m, n], which takes the row label and the column name. You can also use df.iat[m, n], which takes integer positions. In fact, a DataFrame cell can definitely hold a Series or another DataFrame as well.
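A small sketch of the idea, assuming the target column has object dtype (a numeric column generally cannot hold a list). The column names and values here are hypothetical.

```python
import pandas as pd

# Hypothetical frame; "tags" is created with object dtype so it can hold lists.
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "tags": pd.Series([None, None], dtype=object),
    }
)

# .at addresses one cell by row label and column name.
df.at[0, "tags"] = ["x", "y", "z"]

# .iat addresses one cell by integer row and column position.
df.iat[1, 1] = [1, 2]
```

Each cell now holds the list itself as a single Python object, so df.at[0, "tags"] gives the whole list back.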

Merge & Join

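You can combine two or more DataFrames by passing them to pd.concat as a list or dict.
frame = [df1, df2, df3], where all three have the same index and columns, i.e. the same structure.
result = pd.concat(frame)
df1.append(df2) also concatenated along the index dimension (axis 0), stacking two tables on top of each other. DataFrame.append never took an axis argument, and it was removed in pandas 2.0, so use pd.concat([df1, df2], axis=0) instead.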
pandas has full-featured, high performance in-memory join operations idiomatically very similar to relational databases like SQL. These methods perform significantly better (in some cases well over an order of magnitude better) than other open source implementations (like base::merge.data.frame in R). The reason for this is careful algorithmic design and internal layout of the data in DataFrame.
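As a rough illustration of the SQL-like joins mentioned above, here is a sketch using pd.merge. The two frames and the key column are invented for the example.

```python
import pandas as pd

# Hypothetical tables sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Inner join: keep only keys present in both tables.
inner = pd.merge(left, right, on="key", how="inner")

# Left join: keep every row of `left`; unmatched rows get NaN in "y".
left_join = left.merge(right, on="key", how="left")
```

The how argument plays the role of the join type in SQL (inner, left, right, outer).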
A useful tip when using concat: the original index and columns are all kept. If you want to regenerate a fresh integer range index for the concatenated DataFrame, pass ignore_index=True when calling concat: newdf = pd.concat(frame, ignore_index=True).
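A quick sketch of the difference, using two made-up frames with the same columns and the default integer index.

```python
import pandas as pd

# Hypothetical frames with identical structure.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})
frame = [df1, df2]

# Default: each frame keeps its own index, so labels repeat.
kept = pd.concat(frame)

# ignore_index=True discards the old labels and renumbers from 0.
newdf = pd.concat(frame, ignore_index=True)
```

With the default, kept.loc[0] returns two rows because the label 0 appears twice, which is often the reason to renumber.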

Select based on index/position

iloc[] always selects by integer position (counting from 0), and has nothing to do with your index or column names. loc[] selects by the index and column labels you assigned.
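The distinction is easiest to see when the index labels are integers that do not match the row positions. A minimal sketch with invented data:

```python
import pandas as pd

# Hypothetical frame whose integer labels run opposite to the row positions.
df = pd.DataFrame({"val": ["a", "b", "c"]}, index=[2, 1, 0])

by_position = df.iloc[0, 0]    # first row as stored, whatever its label
by_label = df.loc[0, "val"]    # the row whose index label is 0

# Slices differ too: iloc excludes the end position, loc includes the end label.
pos_slice = df.iloc[0:2]       # positions 0 and 1
label_slice = df.loc[2:1]      # labels 2 through 1, both included
```

Here by_position is the value in the first stored row, while by_label comes from the last row, because that is where the label 0 sits.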