Manipulation and Visualization Political Statistics


Why they’re important
‘Concept Misinformation’
How to tackle the class


What we’ll cover today


Visualizing data


Before statistics & visualization…


Data must be ‘cleaned’
What to do with missing values?
How to address typos
Levels of measurement


Statistics involves decisions
Tidy Data




Our systematically collected and organized observations should be easy to work with




Intuitive column names
No typos
Easy to analyze
Idea-to-output gap is small




Our systematically collected and organized observations should have certain technical characteristics




Each column is one variable
Each row is one observation
Each cell is one value


Tidy Data


Variable: A characteristic across which individual observations vary in expression


Row: An observation, one row per student


Cell: A value for the observation for a variable




What are my students’ favorite colors?




Student unique ID (student_id)
Favorite Color (favorite_color)
Tidy Data
student_id favorite_color
1 glue
2 NA
3 blurple
4 purple
5 ted
6 NA
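A minimal sketch of the table above as tidy records, assuming Python with plain dictionaries (the slides do not prescribe a tool). The helper `is_tidy` is a hypothetical name used only for illustration. It checks that every row has the same columns and that each cell holds a single value.

```python
# The six rows shown on the slide, one dict per observation (student).
# None stands in for NA, a missing answer.
students = [
    {"student_id": 1, "favorite_color": "glue"},
    {"student_id": 2, "favorite_color": None},
    {"student_id": 3, "favorite_color": "blurple"},
    {"student_id": 4, "favorite_color": "purple"},
    {"student_id": 5, "favorite_color": "ted"},
    {"student_id": 6, "favorite_color": None},
]


def is_tidy(rows, columns):
    """Hypothetical check: same columns in every row, one value per cell."""
    for row in rows:
        # Each column is one variable: every row carries exactly these keys.
        if set(row) != set(columns):
            return False
        # Each cell is one value: no lists or other containers inside a cell.
        for value in row.values():
            if isinstance(value, (list, tuple, set, dict)):
                return False
    return True


# Each row is one observation: no student appears twice.
ids = [row["student_id"] for row in students]
one_row_per_student = len(ids) == len(set(ids))

print(is_tidy(students, ["student_id", "favorite_color"]), one_row_per_student)
```

A row such as `{"student_id": 7, "favorite_color": ["blue", "red"]}` would fail the check, because one cell would hold two values.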


The Garden of Forking Paths


Statistics is about making decisions


Data never speaks for itself


How do you deal with missing values?


How do you deal with strange values?


How does your analysis differ when you make different choices?
The Garden of Forking Paths


Easy choices to make


glue -> blue
ted -> red
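The two easy fixes above can be sketched as a lookup table, assuming Python and the six answers from the slide. The names `raw_colors` and `typo_fixes` are illustrative, not from the lecture. Writing the fixes down as data keeps the decision visible and easy to revisit.

```python
# Raw answers in student_id order; None stands in for NA.
raw_colors = ["glue", None, "blurple", "purple", "ted", None]

# Every correction is an explicit, reviewable decision.
typo_fixes = {"glue": "blue", "ted": "red"}

# Replace known typos; leave everything else (including NA) untouched.
cleaned = [typo_fixes.get(color, color) for color in raw_colors]

print(cleaned)
```

Note that "blurple" passes through unchanged: it is not in the lookup table because that choice has not been made yet.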


Tougher choices to make


blurple -> blue
blurple -> purple
blurple -> student messing with me


I need to make a choice
The Garden of Forking Paths
[Decision tree] Blurple: change the blurples, or make no change to the blurples. End points: blurple -> other, blurple -> blue, blurple -> purple, discard blurples, blurple means blurple.


Change the value


Overcounting blues/purples
False sense of consolidation
Maybe blurple is a real color


Don’t change the value


What do ‘others’ have in common?
Discarding throws out information
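One way to see how the analysis differs across choices is to run every path and compare the counts. This is a rough sketch, assuming Python and the six cleaned answers shown on the slides. The path names and the `apply_path` helper are illustrative. Dropping NA before counting is itself one more choice made here.

```python
from collections import Counter

# Answers after the easy typo fixes; None stands in for NA.
colors = ["blue", None, "blurple", "purple", "red", None]

# Sentinel meaning "throw the blurple rows away".
DISCARD = object()

# Each path is one possible decision about what "blurple" becomes.
paths = {
    "blurple -> blue": "blue",
    "blurple -> purple": "purple",
    "blurple -> other": "other",
    "discard blurples": DISCARD,
    "blurple means blurple": "blurple",
}


def apply_path(values, replacement):
    """Return a new list with every 'blurple' recoded or dropped."""
    result = []
    for value in values:
        if value == "blurple":
            if replacement is DISCARD:
                continue
            value = replacement
        result.append(value)
    return result


results = {}
for name, replacement in paths.items():
    recoded = apply_path(colors, replacement)
    # Assumption: missing answers are left out of the counts.
    results[name] = Counter(v for v in recoded if v is not None)
    print(name, dict(results[name]))
```

With only six rows, one recoded answer is enough to change which color looks most popular, which is the point of the forking paths.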


Alternative Measurements


Was my measurement consonant with my concept?
Multiple choice?
Free-text question?


Our choices began before we started manipulating data
Statistics sans Numbers


Why not start with numbers?


Data isn’t given
It is produced
Choices were made in the process


The order matters




Statistics sans Numbers


Statistics is a tool


Political scientists aren’t second-rate statisticians. They’re social theorists who use statistics
No Changes
student_id favorite_color
1 blue
2 NA
3 blurple
4 purple
5 red
6 NA
Ways to summarize: count each color
favorite_color count
blue 2
blurple 2
purple 1
red 3
NA 2
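Counting each color can be sketched with Python's `collections.Counter`. This example uses only the six rows displayed in the "No Changes" table, so its totals need not match the count table above. Keeping NA as its own category, rather than dropping it, is an assumption made for this sketch.

```python
from collections import Counter

# The six rows from the "No Changes" table; None stands in for NA.
colors = ["blue", None, "blurple", "purple", "red", None]

# Treat missing answers as their own category so they stay visible.
counts = Counter("NA" if color is None else color for color in colors)

# Print one line per category, in alphabetical order of the label.
for color, n in sorted(counts.items()):
    print(color, n)
```

Dropping the NA rows instead would shrink the total from six to four, which changes every share computed from these counts.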
Tables vs Graphics


Reasons to use visualizations


I have the attention span of a goldfish
Tables are boring
Differences become apparent easily
Fewer cognitive resources necessary
People are familiar with the foundations of visualization


Elements of Graphics


Data: the underlying information. Graphics represent data


Aesthetics: The dimensions that represent variables and encode information e.g. x-axis, y-axis, color, etc.


Geoms: The shapes we choose to represent our information
The Garden v2
[Decision tree] Data: map x = color, y = count, or x = count, y = color; then draw each mapping with geom_point or geom_bar.
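The same data, aesthetics and geom choices can be imitated in plain text. This is a loose stand-in written in Python, not ggplot2 itself. It takes the counts from the summary table, puts color on the vertical axis and count on the horizontal axis, and draws them once as bars and once as points. The function names `bar_chart` and `point_chart` are illustrative.

```python
# Counts as given in the summary table on the slide.
counts = {"blue": 2, "blurple": 2, "purple": 1, "red": 3, "NA": 2}


def bar_chart(data):
    """Bar-like geom: the length of the bar encodes the count."""
    width = max(len(label) for label in data)
    return [f"{label.ljust(width)} | {'#' * n} {n}" for label, n in data.items()]


def point_chart(data):
    """Point-like geom: the position of a single mark encodes the count."""
    width = max(len(label) for label in data)
    return [f"{label.ljust(width)} | {' ' * (n - 1)}o" for label, n in data.items()]


for line in bar_chart(counts):
    print(line)
print()
for line in point_chart(counts):
    print(line)
```

Both charts carry identical information. Which one reads more easily is a design choice, just like the choices made while cleaning the data.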




Discussed Data
Its organization
Its manipulation
Its visualization
Choices are important!

