Concepts
Why they’re important
‘Concept Misinformation’
How to tackle the class
Agenda

 

What we’ll cover today

 

Organizing
Manipulating
Visualizing data

 

Before statistics & visualization…

 

Data must be ‘cleaned’
What to do with missing values?
How to address typos
Levels of measurement

 

Statistics involves decisions
Tidy Data

 

Normative

 

Our systematically collected and organized observations should be easy to work with

 

Characteristics

 

Intuitive column names
No typos
Easy to analyze
Idea-to-output gap is small

 

Technical

 

Our systematically collected and organized observations should have certain technical characteristics

 

Characteristics

 

Each column is one variable
Each row is one observation
Each cell is one value

 

Tidy Data

 

Variable: A characteristic across which individual observations vary in expression

 

Row: An observation, one row per student

 

Cell: A value for the observation for a variable

 

Example

 

What are my students’ favorite colors?

 

Variables

 

Student unique ID (student_id)
Favorite Color (favorite_color)
Tidy Data
student_id favorite_color
1 glue
2 NA
3 blurple
4 purple
5 ted
6 NA
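
As a minimal sketch, the table above can be held as a list of records, one per student, with None standing in for NA. The variable names come from the slide. The `is_tidy` helper is hypothetical and checks only two things: every row carries exactly the expected variables, and every cell holds a single value.

```python
# Each dict is one observation (row); each key is one variable (column).
# None stands in for NA. Values are copied from the table above, typos included.
students = [
    {"student_id": 1, "favorite_color": "glue"},
    {"student_id": 2, "favorite_color": None},
    {"student_id": 3, "favorite_color": "blurple"},
    {"student_id": 4, "favorite_color": "purple"},
    {"student_id": 5, "favorite_color": "ted"},
    {"student_id": 6, "favorite_color": None},
]

VARIABLES = ("student_id", "favorite_color")


def is_tidy(rows, variables):
    """Hypothetical check: every row has exactly the expected variables,
    and every cell holds one value (not a list, tuple, set or dict)."""
    for row in rows:
        if set(row) != set(variables):
            return False
        if any(isinstance(v, (list, tuple, set, dict)) for v in row.values()):
            return False
    return True


# Students whose favorite color is missing.
missing = [r["student_id"] for r in students if r["favorite_color"] is None]

print(is_tidy(students, VARIABLES))
print(missing)
```

A row such as `{"student_id": 7, "favorite_color": ["blue", "red"]}` would fail the check, because one cell would hold two values.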

 

The Garden of Forking Paths

 

Statistics is about making decisions

 

Data never speaks for itself

 

How do you deal with missing values?

 

How do you deal with strange values?

 

How does your analysis differ when you make different choices?
The Garden of Forking Paths

 

Easy choices to make

 

glue -> blue
ted -> red
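
The two easy corrections above can be written down as an explicit lookup, so the choice is recorded rather than made by hand. This is a sketch only: `EASY_FIXES` and `fix_typos` are hypothetical names, and None stands in for NA.

```python
# Hypothetical lookup of the corrections the analyst is willing to make.
EASY_FIXES = {"glue": "blue", "ted": "red"}


def fix_typos(colors, fixes):
    """Return a new list with known typos replaced. Everything else,
    including None (NA) and 'blurple', passes through unchanged."""
    return [fixes.get(c, c) for c in colors]


raw = ["glue", None, "blurple", "purple", "ted", None]
cleaned = fix_typos(raw, EASY_FIXES)
print(cleaned)
```

Keeping the mapping in one place makes it easy to see, and later to revisit, which values were changed.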

 

Tougher choices to make

 

blurple -> blue
blurple -> purple
blurple -> student messing with me

 

I need to make a choice
The Garden of Forking Paths
Blurple
Change the Blurples | No change to Blurples
Blurple -> Other | Blurple -> Blue | Blurple -> Purple | Discard blurples | Blurple means blurple
Implications

 

Change the value

 

Overcounting blues/purples
False sense of consolidation
Maybe blurple is a real color

 

Don’t change the value

 

What do ‘others’ have in common?
Discarding throws out information
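
To see how the analysis differs under each fork, the same six values can be counted once per choice. The sketch below assumes the easy typo fixes have already been applied and that None stands in for NA. The names `FORKS`, `DROP` and `count_under` are hypothetical.

```python
from collections import Counter

# Colors after the easy typo fixes, as in the six-row example; None is NA.
colors = ["blue", None, "blurple", "purple", "red", None]

# Each fork is a rule applied to a single value. Returning DROP discards the row.
DROP = object()
FORKS = {
    "blurple -> blue": lambda c: "blue" if c == "blurple" else c,
    "blurple -> purple": lambda c: "purple" if c == "blurple" else c,
    "blurple -> other": lambda c: "other" if c == "blurple" else c,
    "discard blurples": lambda c: DROP if c == "blurple" else c,
    "blurple means blurple": lambda c: c,
}


def count_under(rule, values):
    """Count non-missing colors after applying one fork's rule."""
    recoded = (rule(v) for v in values)
    return Counter(v for v in recoded if v is not None and v is not DROP)


results = {name: count_under(rule, colors) for name, rule in FORKS.items()}
for name, counts in results.items():
    print(f"{name:22} {dict(counts)}")
```

With only six rows the differences are small, but the most common color already changes depending on which path is taken.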

 

Alternative Measurements

 

Was my measurement consonant with my concept?
Multiple choice?
Free-text question?

 

Our choices began before we started manipulating data
Statistics sans Numbers

 

Why not start with numbers?

 

Data isn’t given
It is produced
Choices were made in the process

 

The order matters

 

Concepts
Measurement
Manipulation
Visualization

 

Statistics sans Numbers

 

Statistics is a tool

 

Political scientists aren’t second-rate statisticians. They’re social theorists who use statistics.
No Changes
student_id favorite_color
1 blue
2 NA
3 blurple
4 purple
5 red
6 NA
Ways to summarize: count each color
favorite_color count
blue 2
blurple 2
purple 1
red 3
NA 2
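
Counting each color can be sketched with the standard library. The example uses only the six rows shown in the table above, so its counts need not match the summary table on the slide. Labelling missing values as "NA" is an assumption made here so that they are counted instead of dropped.

```python
from collections import Counter

# The six values from the table above; None stands in for NA.
favorite_color = ["blue", None, "blurple", "purple", "red", None]

# Label missing values "NA" so they are counted rather than silently dropped.
counts = Counter("NA" if c is None else c for c in favorite_color)

print("favorite_color count")
for color, n in sorted(counts.items()):
    print(color, n)
```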
Tables vs Graphics

 

Reasons to use visualizations

 

I have the attention span of a goldfish
Tables are boring
Differences become apparent easily
Fewer cognitive resources necessary
People are familiar with the foundations of visualization

 

Elements of Graphics

 

Data: the underlying information. Graphics represent data

 

Aesthetics: The dimensions that represent variables and encode information e.g. x-axis, y-axis, color, etc.

 

Geoms: The shape we decide to use to represent our information
The Garden v2
Data
  x = color, y = count
    geom_point
    geom_bar
  x = count, y = color
    geom_point
    geom_bar
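
One of these paths (x = count, y = color, drawn as bars) can be illustrated without any plotting library. The sketch below is a plain-text stand-in for a bar geom, not the plotting tool the slides have in mind. `text_bars` is a hypothetical helper, and the counts are taken from the six-row example with NA kept as its own category.

```python
def text_bars(counts, symbol="#"):
    """Render one line per category: the label sits on the vertical axis
    and a run of symbols, one per count, extends along the horizontal axis."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} {symbol * n}" for label, n in counts.items()]


# Hypothetical counts from the six-row example, with NA as a category.
counts = {"blue": 1, "blurple": 1, "purple": 1, "red": 1, "NA": 2}

for line in text_bars(counts):
    print(line)
```

Swapping the aesthetics (x = color, y = count) or the geom (points instead of bars) would present the same data differently, which is the choice the diagram above lays out.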
Summarize

 

Summary

 

Discussed Data
Its organization
Its manipulation
Its visualization
Choices are important!

 
