Social and organizational life are increasingly conducted or tracked online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” It is a good bet that within a few years, graduate training in the social sciences will include a hefty dose of “how to make use of big data,” just as it includes statistical analysis today. The ICOS Big Data Camp aims to make big data accessible to people with no prior background. We want people to leave with enough confidence and basic knowledge to know what is possible in their research and where they might go next, drawing on resources at the University of Michigan.
Organizing committee: Jerry Davis, Cliff Lampe, Brian Noble, and Jason Owen-Smith
Instructors: Brian Noble, Matt Burton, Michael Cafarella, Colleen Van Lent, Felix Kabo, Russ Funk, Todd Schifeling
Helpers: Khevna Shah, Nick Repole, Guarav Singhal, Tyler Markvluwer, Jonathan Pevarnek, Matt Baumgartner, Matthew Sullivan
Who: The course is aimed at graduate students and other researchers.
Requirements: Participants must bring a laptop with a few specific software packages installed (listed below).
Contact: Please email mcburton@umich.edu for more information.
Monday | 09:00 | Introduction and Overview (Intro ppt, pdf. Assignment ppt, pdf) |
10:15 | Break | |
10:30 | Hero's Journey #1 - Sarita Yardi Schoenebeck | |
11:15 | Hero's Journey #2 - Cliff Lampe | |
12:00 | Lunch break | |
1:00 | Group formation & How to learn in groups: lessons from design teams, Brian Noble | |
2:00 | The Setup & Command line with Matt Burton (Tutorials: install, command line) | |
Tuesday | 09:00-11:00 | Introduction to SQL with Mike Cafarella (Slides: ppt pdf) |
10:00-10:15 | Coffee Break | |
11:00-12:00 | Using SQL with Felix Kabo (Slides: ppt pdf & data) | |
12:00-1:00 | Lunch break | |
1:00-4:00 | Group Work (play data) | |
4:00-5:00 | Check-in and end of day discussion | |
Wednesday | 09:00-11:00 | Introduction to Python with Colleen Van Lent (HTML, Notebook) |
10:00-10:15 | Coffee Break | |
11:00-12:00 | Using Python with Russ Funk (slides, code, GitHub) | |
12:00-1:00 | Lunch break | |
1:00-4:00 | Group Work (scraping links) | |
4:00-5:00 | Check-in and end of day discussion | |
Thursday | 09:00-09:20 | Now What? with Sharon Broude Geva |
09:20-11:00 | Introduction to APIs with Brian Noble (Install, Code, Lecture) | |
10:00-10:15 | Coffee Break | |
11:00-12:00 | Using APIs with Todd Schifeling (Slides: ppt, pdf, Code) | |
12:00-1:00 | Lunch break | |
1:00-4:00 | Group Work & Python + SQL (slides, code) | |
4:00-5:00 | Check-in and end of day discussion | |
Thursday May 29th | 1:00-4:00 | Final Session with Group Presentations. Ross R0230 |
4:00-5:00 | Dominick's! |
To participate in the ICOS Big Data Summercamp, you will need working copies of the software described below. Please make sure to install everything (or at least download the installers) before the camp begins.
When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of keywords.
Bash is a commonly used shell. Using a shell lets you accomplish more tasks, more quickly, with your computer.
Python is becoming very popular in scientific computing, and it's a great language for teaching general programming concepts due to its easy-to-read syntax. We teach with Python version 2.7, since it is still the most widely used. Installing all the scientific packages for Python individually can be a bit difficult, so we recommend an all-in-one installer.
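After running an all-in-one installer, it can help to confirm that the scientific packages actually import. The sketch below is a minimal check along those lines; the package list (`numpy`, `scipy`, `pandas`, `matplotlib`, `IPython`) and the helper name `missing_packages` are our own assumptions for illustration, not an official requirement list for the camp. It uses only the standard library plus the `print_function` future import, so it should run under Python 2.7 as well as Python 3.

```python
from __future__ import print_function

import importlib
import sys

# Hypothetical list of packages to check; adjust it to match the camp's actual requirements.
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "IPython"]


def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The package is not installed (or not on this interpreter's path).
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    absent = missing_packages(PACKAGES)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All packages found.")
```

Run it with the same `python` you plan to use in the camp; if it lists missing packages, the installer probably did not finish or a different interpreter is first on your path.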
The IPython Notebook is a web-based interface for interactive computing with Python. Individual notebooks are composable, executable, and sharable documents that mix text, code, data, and visualizations. The IPython Notebook comes pre-loaded with many all-in-one Python installers, such as Anaconda CE.
SQL is a specialized programming language used with databases. It is a declarative language: you describe (declare) the data you want, and the database works out how to retrieve it. For the lessons, we use a Firefox plugin called SQLite Manager.
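To make "declarative" concrete, here is a minimal sketch using Python's built-in `sqlite3` module, which talks to the same kind of SQLite database that SQLite Manager works with. The `surveys` table, its rows, and the function name `sites_with_repeat_visits` are made-up examples, not lesson material. The query states *what* we want (sites surveyed more than once, with their counts) and leaves *how* to find them to the database; we write no loops over the rows ourselves.

```python
import sqlite3

# Hypothetical example data: each row is one survey visit (site, year).
ROWS = [
    ("north", 2012),
    ("north", 2013),
    ("south", 2013),
    ("east", 2012),
    ("east", 2014),
    ("east", 2014),
]


def sites_with_repeat_visits(rows):
    """Return (site, visit_count) pairs for sites visited more than once."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE surveys (site TEXT, year INTEGER)")
        conn.executemany("INSERT INTO surveys VALUES (?, ?)", rows)
        # Declarative: describe the result, not the steps to compute it.
        cursor = conn.execute(
            """
            SELECT site, COUNT(*) AS visits
            FROM surveys
            GROUP BY site
            HAVING COUNT(*) > 1
            ORDER BY visits DESC, site
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Under Python 3 this prints [('east', 3), ('north', 2)]
print(sites_with_repeat_visits(ROWS))
```

The same `SELECT` statement can be pasted into SQLite Manager's query tab against a table with the same columns; only the surrounding Python setup is specific to this sketch.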
Notepad++ is a popular free code editor for Windows. Be aware that you must add its installation directory to your system path in order to launch it from the command line (or have other tools like Git launch it for you). Please ask your instructor to help you do this.
Windows doesn't have sqlite3 available on the command line, so we will use this plugin for Firefox instead. To install it:
We recommend TextWrangler or Sublime Text. In a pinch, you can use nano or vi, which should be pre-installed.
Instead of using sqlite3 from the command line, we will use this plugin for Firefox. To install it: