I picked up the book Python for Data Analysis as I've been seeing it
mentioned in quite a few places. And so far, it's great. A very good
high-level overview of using Pandas. No, not the cute kind of pandas. I'm
talking about the Python library for data analysis. Derp.
Anyhow, I decided to dive in and see what I could find out about my neighbors.
Chapter 9 of the book goes into analyzing the
2012 Federal Election Commission Database so I loaded it up:
>>> import numpy as np
>>> import pandas as pd
>>> fec = pd.read_csv('P00000001-ALL.csv')
Looking into the data, there are some garbage rows. I grabbed all the Culver
City zip codes (well, at least the zip codes I care about):
>>> zips = fec.contbr_zip.unique()
>>> mask = np.array([str(x).startswith('90232') for x in zips])
CULVER CITY 241
CUILVER CITY 2
SANTA MONICA 1
I don't know whether these errors come from the contributors or from the FEC,
so I'm just going to include everything based on zip code.
>>> culver = fec[fec.contbr_zip.isin(zips[mask])]
Fifty-eight grand! Nice going, Culver City!
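For anyone following along, here is a rough sketch of the same zip-code filter written with pandas string methods instead of a list comprehension, followed by the city count and the total. The rows in `sample` are made up for illustration (they are not from the FEC file), and the names `sample`, `in_90232` and `culver_sample` are mine, so treat this as one possible way to do it rather than what I actually ran.

```python
import pandas as pd

# Made-up rows for illustration only; the real data comes from P00000001-ALL.csv.
sample = pd.DataFrame({
    "contbr_zip": ["902321234", "90232", "900661111", 90232],
    "contbr_city": ["CULVER CITY", "CUILVER CITY", "LOS ANGELES", "CULVER CITY"],
    "contb_receipt_amt": [250.0, 100.0, 500.0, 50.0],
})

# Cast to str first, since the zip column can mix strings and numbers.
# na=False treats any missing zip as "not a match".
in_90232 = sample["contbr_zip"].astype(str).str.startswith("90232", na=False)
culver_sample = sample[in_90232]

# How the city name was spelled on the matching rows
city_counts = culver_sample["contbr_city"].value_counts()

# Total dollars contributed from the matching rows
total = culver_sample["contb_receipt_amt"].sum()

print(city_counts)
print(total)
```

Filtering on the zip column directly skips the intermediate `unique()` and `isin()` steps, which is handy when you only need the matching rows.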
Now let's see who got the money:
>>> culver.pivot_table('contb_receipt_amt', rows='cand_nm', aggfunc=sum)
Huntsman, Jon 4500
Obama, Barack 50381
Paul, Ron 500
Roemer, Charles E. 'Buddy' III 110
Romney, Mitt 2850
That's kind of interesting... Huntsman got more money from the 90232 than Romney did.
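A note if you try this on a recent pandas release: as far as I know, `pivot_table` now takes `index=` where older versions took `rows=`, and passing the aggregation as the string `"sum"` avoids warnings about the builtin. Here is a small sketch of the per-candidate total on made-up numbers (the `sample` frame is hypothetical, not FEC data), along with a `groupby` that should give the same totals.

```python
import pandas as pd

# Made-up contributions for illustration; not taken from the FEC file.
sample = pd.DataFrame({
    "cand_nm": ["Obama, Barack", "Romney, Mitt", "Obama, Barack", "Huntsman, Jon"],
    "contb_receipt_amt": [100.0, 250.0, 50.0, 500.0],
})

# Total per candidate with pivot_table, using index= for the row grouping
by_cand_pivot = sample.pivot_table(
    values="contb_receipt_amt", index="cand_nm", aggfunc="sum"
)

# The same totals via groupby, returned as a Series instead of a DataFrame
by_cand_group = sample.groupby("cand_nm")["contb_receipt_amt"].sum()

print(by_cand_pivot)
print(by_cand_group)
```

For a single value column and a single grouping key the two are interchangeable; `pivot_table` starts to earn its keep once you add a `columns=` argument.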
Now, let's check out the occupations that contributed the most:
>>> culver.pivot_table('contb_receipt_amt', rows='contbr_occupation',
ACCOUNT MANAGER 5000.0
VICE PRESIDENT, INTERNET MARKETING 4000.0
PRESIDENT & C.E.O. 2500.0
GALLERY OWNER 2500.0
INTERIOR DESIGNER 1500.0
Retirees going large. That's kind of interesting. Let's look at that.
>>> culver[culver.contbr_occupation == 'RETIRED'].pivot_table(
... 'contb_receipt_amt', rows='cand_nm', aggfunc=sum)
Obama, Barack 7162
Roemer, Charles E. 'Buddy' III 10
Romney, Mitt 100
Maybe I misunderstand our local retirees (at least the ones I've met), but this was
surprising to me. I really expected Romney to come out on top.
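Both the occupation ranking and the retiree breakdown boil down to "group, sum, sort". Here is a minimal sketch of that pattern using `groupby` and `sort_values`. The data is invented for the example, and the variable names are my own, so this is an illustration of the idea rather than the exact commands above.

```python
import pandas as pd

# Invented rows for illustration; not real contributions.
sample = pd.DataFrame({
    "cand_nm": ["Obama, Barack", "Romney, Mitt", "Obama, Barack", "Romney, Mitt"],
    "contbr_occupation": ["RETIRED", "RETIRED", "ATTORNEY", "RETIRED"],
    "contb_receipt_amt": [200.0, 100.0, 500.0, 25.0],
})

# Rank occupations by total dollars, largest first
by_occupation = (
    sample.groupby("contbr_occupation")["contb_receipt_amt"]
    .sum()
    .sort_values(ascending=False)
)

# Keep only the retirees, then total their contributions per candidate
retired = sample[sample["contbr_occupation"] == "RETIRED"]
retired_by_cand = (
    retired.groupby("cand_nm")["contb_receipt_amt"]
    .sum()
    .sort_values(ascending=False)
)

print(by_occupation)
print(retired_by_cand)
```

Tacking `.head(10)` onto either result would trim it to the top ten, which is useful once the real occupation column shows its hundreds of free-text values.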
I think that's enough peeking into my neighbors' contribution habits for one
night. I have to say Pandas makes this sort of thing really easy.
I've only scratched the surface here. There's lots more that one can do
(mathematically speaking) with Pandas. Python for Data Analysis gives
you a really good introduction to Pandas, and then the website fills
in the gaps.
Python for Data Analysis and Pandas get two thumbs up from me. Thanks
to O'Reilly and Wes McKinney.