How to Get Started with Pandas
In this notebook, I followed the examples in Python for Data Analysis to learn how to use Pandas.
from pandas import Series, DataFrame
import pandas as pd
obj = Series([4, 7, -5, 3])
By default, a Series is indexed by integers starting from 0.
obj
print(obj.values)
print(obj.index)
The index can also be set by the user.
obj2 = Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
obj2
print(obj2['a'])
print("---")
obj2['d'] = 6
print(obj2[['c', 'a', 'd']])
NumPy array operations such as boolean filtering, scalar multiplication, and math functions also work on a Series, and each value keeps its index label.
print(obj2)
print("---")
print(obj2[obj2 > 0])
print("---")
print(obj2 * 2)
print("---")
import numpy as np
print(np.exp(obj2))
A Series behaves much like a dict that maps index labels to values. In fact, you can create a Series from a Python dict.
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = Series(sdata)
obj3
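To make the dict comparison concrete, here is a minimal sketch. The Series `s` is a stand-in rebuilt from the same data, so the cell runs on its own.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
s = pd.Series(sdata)

# Membership tests look at the index labels, just as they look at keys on a dict.
print('Ohio' in s)            # True
print('California' in s)      # False

# A Series can be converted back to a plain dict.
print(s.to_dict() == sdata)   # True
```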
If you also pass an index, the values are looked up in the dict by matching its keys against that index. In the example below, Utah no longer appears in obj4 because it is not in states. Since the dict has no value for California, that entry is NaN.
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj4 = Series(sdata, index=states)
obj4
pandas provides isnull and notnull to detect missing data, both as top-level functions and as Series methods.
print(obj4.isnull())
print('---')
print(pd.isnull(obj4))
print('---')
print(obj4.notnull())
print('---')
print(pd.notnull(obj4))
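The boolean Series returned by isnull and notnull are typically used as masks. A small sketch of that idea follows; the names `s`, `mask`, `n_missing`, and `present` are mine, not from the book.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
s = pd.Series(sdata, index=states)

mask = s.isnull()              # boolean Series, True where the value is NaN
n_missing = int(mask.sum())    # each True counts as 1
present = s[s.notnull()]       # keep only the labels that have a value

print(n_missing)               # 1
print(present.index.tolist())  # ['Ohio', 'Oregon', 'Texas']
```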
When you add two Series, the values are aligned by index label. A label that appears in only one of the two Series produces NaN in the result.
print(obj3)
print('---')
print(obj4)
print('---')
print(obj3 + obj4)
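Here is a smaller sketch of the same alignment rule, using two made-up Series `a` and `b`. It also shows the `add` method with `fill_value`, which treats a label missing on one side as 0 instead of producing NaN.

```python
import pandas as pd

# Made-up numbers, chosen only so the sums are easy to check.
a = pd.Series({'Ohio': 1.0, 'Texas': 2.0, 'Utah': 3.0})
b = pd.Series({'California': 10.0, 'Ohio': 20.0, 'Texas': 30.0})

total = a + b                     # result is indexed by the union of labels
filled = a.add(b, fill_value=0)   # a label missing on one side is treated as 0

print(sorted(total.index.tolist()))   # ['California', 'Ohio', 'Texas', 'Utah']
print(total['Ohio'])                  # 21.0
print(total.isnull()['Utah'])         # True
print(filled['Utah'])                 # 3.0
```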
Both a Series and its index have a name attribute that you can set.
obj4.name = 'population'
obj4.index.name = 'state'
obj4
The index of a Series can be replaced in place by assignment.
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
obj
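One detail worth checking for yourself: the new index must have the same length as the data. The sketch below uses a fresh Series `s` and a hypothetical flag `mismatch_rejected` to record what happens when the lengths differ.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3])
s.index = ['Bob', 'Steve', 'Jeff', 'Ryan']   # four labels for four values
print(s['Jeff'])                             # -5

try:
    s.index = ['only', 'three', 'labels']    # wrong length
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)                     # True
```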
DataFrame
data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
print(frame)
In an IPython notebook, a DataFrame that is the last expression in a cell is rendered as a table.
frame
If you pass a columns list, the DataFrame's columns are arranged in exactly that order.
DataFrame(data, columns=['year', 'state', 'pop'])
Just as with a Series, if you pass a column that is not in the data, it is filled with NaN.
frame2 = DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                   index=['one', 'two', 'three', 'four', 'five'])
frame2
frame2.columns
You can retrieve a column as a Series in two ways: by dict-like notation or by attribute.
frame2['state']
frame2.year
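The two forms are not fully interchangeable. Attribute access only works when the column name does not clash with an existing DataFrame attribute, and 'pop' happens to be the name of a DataFrame method. A minimal sketch on a small stand-in frame:

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Ohio', 'Nevada'],
                      'year': [2000, 2001],
                      'pop': [1.5, 2.4]})

# For an ordinary column name, both forms return the same Series.
print(frame['year'].equals(frame.year))   # True

# frame.pop is the DataFrame.pop method, not the 'pop' column.
print(callable(frame.pop))                # True

# Bracket notation always reaches the column.
print(frame['pop'].tolist())              # [1.5, 2.4]
```

Bracket notation is the safer default when column names come from data you do not control.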
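You can retrieve a row by its index label with loc. (The ix indexer used in older pandas versions was deprecated in 0.20 and removed in 1.0.)
frame2.loc['three']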
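Selecting a row by label versus by integer position can be sketched as follows, assuming a pandas version that provides the loc and iloc indexers. The frame here is a small stand-in, not frame2.

```python
import pandas as pd

frame = pd.DataFrame({'year': [2000, 2001, 2002],
                      'state': ['Ohio', 'Ohio', 'Nevada']},
                     columns=['year', 'state'],
                     index=['one', 'two', 'three'])

by_label = frame.loc['three']    # select the row whose index label is 'three'
by_position = frame.iloc[2]      # select the third row by integer position

print(by_label['state'])              # Nevada
print(by_label.equals(by_position))   # True
print(by_label.index.tolist())        # ['year', 'state']
```

The row comes back as a Series whose index is the DataFrame's column names.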
A column can be modified by assignment, using either a single scalar or an array with one value per row.
frame2['debt'] = 16.5
frame2
frame2['debt'] = np.arange(5.)
frame2
val = Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
frame2['debt'] = val
frame2
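Assigning a Series behaves differently from assigning an array: the Series is aligned to the DataFrame's index, so labels it lacks become NaN and labels that are not in the DataFrame are dropped. A small sketch with made-up labels:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'year': [2000, 2001, 2002]},
                     index=['one', 'two', 'three'])

# 'four' is deliberately absent from frame.index.
val = pd.Series([-1.2, -1.5], index=['two', 'four'])
frame['debt'] = val

print(frame.loc['two', 'debt'])             # -1.2
print(np.isnan(frame.loc['one', 'debt']))   # True
print('four' in frame.index)                # False
```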
Assigning to a column that does not exist creates a new column.
frame2['eastern'] = frame2.state == 'Ohio'
frame2
Delete a column with the del keyword.
del frame2['eastern']
frame2
Another way to create a DataFrame is from a nested dict of dicts. The outer keys become the columns and the inner keys become the row index.
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = DataFrame(pop)
frame3
frame3.T
frame3.index.name = 'year'
frame3.columns.name = 'state'
frame3
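As a quick check of how the nested dict is interpreted, the sketch below rebuilds the same frame under the name `frame` and inspects it. The row order can differ between pandas versions, so the index is sorted before printing.

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame = pd.DataFrame(pop)

print(sorted(frame.columns.tolist()))        # ['Nevada', 'Ohio']
print(sorted(frame.index.tolist()))          # [2000, 2001, 2002]

# Nevada has no entry for 2000, so that cell is NaN.
print(int(frame['Nevada'].isnull().sum()))   # 1

# Transposing swaps rows and columns.
print(frame.T.shape)                         # (2, 3)
```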
The values attribute returns the data in the DataFrame as a two-dimensional ndarray.
frame3.values
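The dtype of that array depends on the columns. A minimal sketch with two small stand-in frames: when all columns are floats the array is float64, and when numeric and string columns are mixed the array falls back to object.

```python
import pandas as pd

numeric = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
mixed = pd.DataFrame({'year': [2000, 2001], 'state': ['Ohio', 'Nevada']})

print(numeric.values.shape)   # (2, 2)
print(numeric.values.dtype)   # float64
print(mixed.values.dtype)     # object
```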