6 ANOVA

6.1 Purpose of ANOVA

ANOVA tests whether the mean of an outcome differs across more than two groups, by comparing the variation between groups with the variation within groups.
Here we examine household income and happiness.
RQ: Does the level of happiness differ by level of household income?
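To make the idea of comparing between-group and within-group variation concrete, here is a minimal sketch that computes the F ratio by hand and checks it against `aov()`. The scores and the three group labels are made up for illustration and are not the chapter's data.

```r
# Made-up scores for three hypothetical groups (not the chapter's data)
scores <- c(6, 7, 8, 7, 8, 9, 8, 9, 10)
group  <- factor(rep(c("low", "mid", "high"), each = 3))

grand_mean  <- mean(scores)
group_means <- tapply(scores, group, mean)
group_sizes <- tapply(scores, group, length)

# Variation of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Variation of each score around its own group mean
ss_within <- sum((scores - group_means[as.character(group)])^2)

df_between <- nlevels(group) - 1
df_within  <- length(scores) - nlevels(group)

# F is the ratio of the two mean squares
f_by_hand <- (ss_between / df_between) / (ss_within / df_within)

# The same statistic as reported by aov()
f_from_aov <- summary(aov(scores ~ group))[[1]][["F value"]][1]
all.equal(f_by_hand, f_from_aov)
```

A large F means the group means are spread out more than the scatter inside each group would lead us to expect.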

6.2 Libraries

library(psych)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.1     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ ggplot2::%+%()   masks psych::%+%()
✖ ggplot2::alpha() masks psych::alpha()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(rstatix)


Attaching package: 'rstatix'

The following object is masked from 'package:stats':

    filter

library(report)

6.3 Reading the data

Happiness measurement data
for the province of DI Yogyakarta (DIY)

income <- read_csv("income_happiness_diy.csv")

Rows: 913 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Provinsi
dbl (3): Income_ind, Income_hh, Happiness

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(income)

spc_tbl_ [913 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Provinsi  : chr [1:913] "Di Yogyakarta" "Di Yogyakarta" "Di Yogyakarta" "Di Yogyakarta" ...
 $ Income_ind: num [1:913] 5 5 5 NA 5 4 1 3 4 5 ...
 $ Income_hh : num [1:913] 5 5 5 5 3 4 2 1 1 1 ...
 $ Happiness : num [1:913] 6 8 7 6 6 6 9 8 5 8 ...
 - attr(*, "spec")=
  .. cols(
  ..   Provinsi = col_character(),
  ..   Income_ind = col_double(),
  ..   Income_hh = col_double(),
  ..   Happiness = col_double()
  .. )
 - attr(*, "problems")=<externalptr>
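The `str()` output shows that `Income_ind` contains `NA` values and that `Income_hh` is stored as a number even though it holds category codes. A minimal sketch of two checks that may be useful before modelling follows. It uses base R only, and `income_demo` is a made-up stand-in that merely reuses the column names, not the real file.

```r
# Tiny stand-in with the same column names as `income` (values are made up)
income_demo <- data.frame(
  Income_ind = c(5, NA, 3, 1),
  Income_hh  = c(5, 5, 2, 1),
  Happiness  = c(6, 8, 9, 8)
)

# Count missing values per column
na_counts <- colSums(is.na(income_demo))
na_counts

# Store the income category as a factor so it is treated as groups
income_demo$income_fct <- factor(income_demo$Income_hh)
levels(income_demo$income_fct)
```

The same two calls should work on the real `income` tibble, assuming its columns are as printed above.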

6.4 Initial visualization

income %>% ggplot(aes(Income_hh, Happiness, color = Income_hh)) +
  geom_jitter()

6.5 Creating a boxplot

income %>%
  mutate(income_fct = as.factor(Income_hh)) %>%
  ggplot(aes(income_fct, Happiness, color = Income_hh)) +
  geom_boxplot()

6.6 Descriptive statistics

income %>%
  group_by(Income_hh) %>%
  get_summary_stats(Happiness, type = "mean_sd")

# A tibble: 5 × 5
  Income_hh variable      n  mean    sd
      <dbl> <fct>     <dbl> <dbl> <dbl>
1         1 Happiness    58  8.22  1.03
2         2 Happiness    73  8.10  1.03
3         3 Happiness   123  7.50  1.26
4         4 Happiness   288  7.58  1.46
5         5 Happiness   371  7.17  1.46

6.7 ANOVA test

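# Income_hh is numeric here, so aov() fits one linear term (Df = 1 below);
# use factor(Income_hh) to compare the five groups as categories.
beda <- aov(Happiness ~ Income_hh, data = income)
summary(beda)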

             Df Sum Sq Mean Sq F value   Pr(>F)    
Income_hh     1   87.2   87.24   45.71 2.45e-11 ***
Residuals   911 1738.8    1.91                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
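The table above reports Df = 1 for `Income_hh` although the descriptive statistics list five income groups. A one-way ANOVA across five groups would have 5 - 1 = 4 degrees of freedom for the grouping term. The sketch below shows how the coding of the predictor changes this. It uses simulated data (`demo`, random values generated here), so it illustrates only the degrees of freedom, not the chapter's results.

```r
# Simulated stand-in data: five income codes, 20 made-up scores each
set.seed(1)
demo <- data.frame(
  Income_hh = rep(1:5, each = 20),
  Happiness = round(rnorm(100, mean = 7.5, sd = 1.3))
)

# Numeric predictor: a single slope is estimated (linear trend)
fit_numeric <- aov(Happiness ~ Income_hh, data = demo)

# Factor predictor: one mean per group is estimated (one-way ANOVA)
fit_factor <- aov(Happiness ~ factor(Income_hh), data = demo)

df_numeric <- summary(fit_numeric)[[1]][["Df"]][1]
df_factor  <- summary(fit_factor)[[1]][["Df"]][1]
c(numeric = df_numeric, factor = df_factor)
# numeric: 1, factor: 4
```

On the real data, the equivalent call would presumably be `aov(Happiness ~ factor(Income_hh), data = income)`. Its F value and p-value will differ from the table above and are not shown here.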

6.8 Effect Size

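The benchmarks below refer to eta squared, the share of total variance explained by the grouping variable.
A small effect size is about .01.
A medium effect size is about .06.
A large effect size is about .14.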

report(beda)

The ANOVA (formula: Happiness ~ Income_hh) suggests that:

  - The main effect of Income_hh is statistically significant and small (F(1,
911) = 45.71, p < .001; Eta2 = 0.05, 95% CI [0.03, 1.00])

Effect sizes were labelled following Field's (2013) recommendations.
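The Eta2 value in the report can be recovered from the sums of squares printed by `summary(beda)`. This is a small arithmetic sketch using the rounded values from that table, so the result matches only to the precision shown there. The variable names are chosen here for illustration.

```r
# Sums of squares copied from the summary(beda) table (rounded values)
ss_effect   <- 87.2    # Income_hh
ss_residual <- 1738.8  # Residuals

# Eta squared: effect sum of squares divided by total sum of squares
eta2 <- ss_effect / (ss_effect + ss_residual)
round(eta2, 2)
# 0.05
```

With a single term in the model, eta squared and partial eta squared coincide, and a value of about 0.05 falls between the small (.01) and medium (.06) benchmarks.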