We continue working with the multivariate “cars” data from last week. The goal is to build a visualization on a time axis that shows several variables at once, so that trends over time become visible. I have prepared the initial steps for you; your task is to develop them further, mainly using the ggplot2 and tidyverse packages.
Submit the results of tasks 1-3 to the Odevzdavarna as ukol1; they will be graded with points.
If you are missing any R packages, you should be able to install them with install.packages(). They will be installed in your home directory. If needed, request access to the server hedron.fi.muni.cz.
If you did not keep the history from last week:
cardata <- read.table("data/cars.data")
names(cardata) <- c("mpg","cylinders","displacement","horsepower","weight","acceleration","model.year","origin")
carnames <- read.table("data/cars.names")
names(carnames) <- c("name")
cars <- data.frame(c(carnames,cardata))
We will only briefly get acquainted with the options for reshaping data before visualization. For this we use functions and packages from the “tidyverse” collection, mainly tidyr and dplyr.
We will bring the data in the file le_mess.csv into third normal form (tidy data). read_csv() from readr loads the data, the %>% operator (defined in magrittr and re-exported by tidyr and dplyr) chains commands, and gather() reorganizes variables spread across columns into a new table (data frame).
library(readr)
life_exp_df <- read_csv("data/le_mess.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## country = col_character()
## )
## See spec(...) for full column specifications.
#View(life_exp_df)
life_exp_df
## # A tibble: 202 x 67
## country `1951` `1952` `1953` `1954` `1955` `1956` `1957` `1958` `1959` `1960` `1961` `1962` `1963` `1964` `1965` `1966`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghan… 27.1 27.7 28.2 28.7 29.3 29.8 30.3 30.9 31.4 31.9 32.5 33.0 33.5 34.1 34.6 35.1
## 2 Albania 54.7 55.2 55.8 56.6 57.4 58.4 59.5 60.6 61.8 62.9 63.9 64.8 65.6 66.2 66.6 66.9
## 3 Algeria 43.0 43.5 44.0 44.4 44.9 45.4 45.9 46.4 47.0 47.5 48.0 48.6 49.1 49.6 50.1 50.6
## 4 Angola 31.0 31.6 32.1 32.7 33.2 33.8 34.3 34.9 35.4 36.0 36.5 37.1 37.6 38.2 38.7 39.3
## 5 Antigu… 58.3 58.8 59.3 59.9 60.4 60.9 61.4 62.0 62.5 63.0 63.5 63.9 64.4 64.8 65.2 65.6
## 6 Argent… 61.9 62.5 63.1 63.6 64.0 64.4 64.7 65 65.2 65.4 65.5 65.6 65.7 65.8 66.0 66.1
## 7 Armenia 62.7 63.1 63.6 64.1 64.5 65 65.4 65.9 66.4 66.9 67.3 67.8 68.3 68.8 69.3 69.7
## 8 Aruba 59.0 60.0 61.0 61.9 62.7 63.4 64.1 64.7 65.2 65.7 66.1 66.4 66.8 67.1 67.4 67.8
## 9 Austra… 68.7 69.1 69.7 69.8 70.2 70.0 70.3 70.9 70.4 70.9 71.1 70.9 71.0 70.6 71.0 70.8
## 10 Austria 65.2 66.8 67.3 67.3 67.6 67.7 67.5 68.5 68.4 68.8 69.7 69.5 69.6 70.1 69.9 70.2
## # … with 192 more rows, and 50 more variables: `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>, `1971` <dbl>,
## # `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
## # `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>,
## # `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>,
## # `1996` <dbl>, `1997` <dbl>, `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>,
## # `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>, `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>,
## # `2012` <dbl>, `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
life_exp_tidy <- life_exp_df %>% gather(key = "year", value = "life_exp", -country)
#View(life_exp_tidy)
life_exp_tidy
## # A tibble: 13,332 x 3
## country year life_exp
## <chr> <chr> <dbl>
## 1 Afghanistan 1951 27.1
## 2 Albania 1951 54.7
## 3 Algeria 1951 43.0
## 4 Angola 1951 31.0
## 5 Antigua and Barbuda 1951 58.3
## 6 Argentina 1951 61.9
## 7 Armenia 1951 62.7
## 8 Aruba 1951 59.0
## 9 Australia 1951 68.7
## 10 Austria 1951 65.2
## # … with 13,322 more rows
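In current tidyr, gather() is superseded by pivot_longer(). The sketch below shows the equivalent reshaping on a tiny hand-made table, because le_mess.csv may not be at hand; the object names `wide` and `long` are made up for illustration, and the four values are copied from the first two year columns printed above. Note that pivot_longer() orders the result country by country, whereas gather() orders it year by year.

```r
library(tidyr)

# Toy stand-in for le_mess.csv: one row per country, one column per year.
wide <- tibble::tibble(
  country = c("Afghanistan", "Albania"),
  `1951`  = c(27.1, 54.7),
  `1952`  = c(27.7, 55.2)
)

# Every column except `country` becomes a (year, life_exp) pair.
# names_transform converts the year from character to integer on the way.
long <- wide %>%
  pivot_longer(
    cols            = -country,
    names_to        = "year",
    values_to       = "life_exp",
    names_transform = list(year = as.integer)
  )

print(long)
```

Converting `year` to a number at this stage is a design choice: it lets ggplot2 treat the year as a continuous time axis later on.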
Study the material R for Data Science from the syllabus, chapter Tidy data, and solve exercises 12.3.3-4 and 12.6.1-4.
You will learn to use the plotting commands of the ggplot2 package. Data visualization in ggplot2 is based on the principles of the Grammar of Graphics, which splits the decisions about the final look of a plot into work with relatively independent components such as the space (coordinate system), the graphical marks, and the mapping of their properties onto the data.
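To make the components concrete, here is a minimal sketch in which each line contributes one part of the grammar. It uses the built-in mtcars data set only as a stand-in for the course data, and the particular choice of scale, facets and theme is an arbitrary assumption for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # data + aesthetic mapping
  geom_point() +                                                   # graphical mark
  scale_colour_brewer(palette = "Dark2", name = "cylinders") +     # how values map to colours
  facet_wrap(~ gear) +                                             # one panel per gear count
  coord_cartesian() +                                              # the space (coordinate system)
  theme_minimal()                                                  # non-data appearance

print(p)
```

Because the parts are independent, any one of them can be swapped (for example geom_point() for geom_line()) without touching the others.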
Compare the classic output produced with base R graphics
plot(cars$model.year, cars$mpg)
with the analogous command in ggplot2
library(ggplot2)
quickplot(data=cars, model.year,mpg)
## Warning: Removed 8 rows containing missing values (geom_point).
The same plot written with the full expressive power of ggplot2 is
g <- ggplot(cars, aes(x=model.year,y=mpg))
g + geom_point()
## Warning: Removed 8 rows containing missing values (geom_point).
Notice how easy it is to change the type of visualization
g + stat_summary(geom="area")
## Warning: Removed 8 rows containing non-finite values (stat_summary).
## No summary function supplied, defaulting to `mean_se()`
At this point, experiment with the other options that the ggplot2 package offers.
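As a starting point for experimenting, the following sketch shows two variations. The data frame `demo` is a hypothetical stand-in with made-up values that only borrows the column names of `cars`. Passing `fun = mean` explicitly to stat_summary() also avoids the "No summary function supplied" message.

```r
library(ggplot2)

# Made-up values, for illustration only.
demo <- data.frame(
  model.year = rep(70:72, each = 3),
  mpg        = c(18, 15, 16, 25, 19, 28, 22, 13, 21)
)

g <- ggplot(demo, aes(x = model.year, y = mpg))

# Raw points plus the per-year mean drawn as a line.
p_line <- g + geom_point(alpha = 0.4) + stat_summary(fun = mean, geom = "line")

# One box per year; `group` is needed because model.year is numeric.
p_box <- g + geom_boxplot(aes(group = model.year))

print(p_line)
print(p_box)
```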
Using the grid package, we create a plotting region (viewport) whose x scale covers the years 1969 to 1983
library(grid)
grid.newpage()
pushViewport(plotViewport())
pushViewport(viewport(xscale=c(1969,1983)))
years <- c(1969:1983)
We draw a point for each year, with every tenth year (the start of a decade) drawn more prominently
grid.circle(years,0,0.15,default.units="native",gp=gpar(fill="red"))
syears <- years[years %% 10 == 0]
grid.circle(syears,0,0.25,default.units="native",gp=gpar(fill="blue"))
We label the blue points with their years
grid.text(syears,syears,-0.1,default.units="native",rot=90,gp=gpar(fontsize=14,fontface="bold"))
So that we can repeat different visualizations on the same time axis, we save the preceding commands as a function initTimeline()
initTimeline <- function(years){
...
}
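One possible way to fill in the body is sketched below, assuming the function should simply replay the timeline commands above for whatever vector of years it receives. Deriving the x scale from range(years) and returning the decade years invisibly are assumptions of this sketch, not part of the original commands.

```r
library(grid)

# Sketch: wraps the timeline commands shown earlier into one function.
initTimeline <- function(years) {
  grid.newpage()
  pushViewport(plotViewport())
  pushViewport(viewport(xscale = range(years)))  # assumption: scale taken from the data
  # A red point for every year.
  grid.circle(years, 0, 0.15, default.units = "native", gp = gpar(fill = "red"))
  # Larger blue points and labels for the decade years.
  decades <- years[years %% 10 == 0]
  grid.circle(decades, 0, 0.25, default.units = "native", gp = gpar(fill = "blue"))
  grid.text(decades, decades, -0.1, default.units = "native", rot = 90,
            gp = gpar(fontsize = 14, fontface = "bold"))
  invisible(decades)  # assumption: handy for callers, not required
}

initTimeline(1969:1983)
```

The two viewports stay pushed after the call, so later drawing commands can keep using the same native year scale.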
We will create a coloured area that represents the fuel consumption of the cars in a given year. First, however, we need a function that counts the cars for a given year (model.year is stored as a two-digit year, hence year-1900)
countCars <- function(year){
length(which(cars$model.year == year-1900))
}
Check that it works:
countCars(1971)
## [1] 29
The result should be 29. Excellent!
A similar function computes the average number of cylinders for a given year
countCylinders <- function(year){
mean(cars$cylinders[which(cars$model.year == year-1900)])
}
countCylinders(1971)
## [1] 5.517241
Is the result 5.517241?
and one more computes the average fuel consumption (mpg), returning 0 for years without any cars
countMpg <- function (year)
{
if(countCars(year)>0){
mean(cars$mpg[which(cars$model.year == year-1900)],na.rm=TRUE)
} else {
0
}
}
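The three helper functions each scan the whole data frame once per year. As an alternative, the same per-year numbers can be computed in a single pass with dplyr; the sketch below assumes a data frame with the same column names as `cars` and uses a made-up table `toy` so that it runs on its own.

```r
library(dplyr)

# Made-up rows, for illustration only; column names follow `cars`.
toy <- data.frame(
  model.year = c(70, 70, 71, 71, 71),
  cylinders  = c(8, 8, 4, 6, 4),
  mpg        = c(18, NA, 27, 19, 25)
)

per_year <- toy %>%
  group_by(model.year) %>%
  summarise(
    n_cars         = n(),                      # counterpart of countCars()
    mean_cylinders = mean(cylinders),          # counterpart of countCylinders()
    mean_mpg       = mean(mpg, na.rm = TRUE),  # counterpart of countMpg()
    .groups = "drop"
  ) %>%
  mutate(year = model.year + 1900)             # four-digit year for the time axis

print(per_year)
```

Unlike countMpg(), this summary has no row for years without cars, so such years would have to be added explicitly if the plot needs them.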
Now we can draw the area given by the average mpg in each year (divided by 100 and shifted up by 0.05)
grid.polygon(years,mapply(countMpg,years)/100+0.05,default.units="native",gp=gpar(fill="grey"))
Above it we display the corresponding number of cylinders, to emphasise that one quantity falls while the other rises. For that we need a function drawCyl() that draws a single cylinder symbol
drawCyl <- function(){
grid.rect(x=0.5,y=0.7,width=0.8,height=0.6,gp=gpar(fill="brown"))
grid.rect(x=0.5,y=0.9,width=0.8,height=0.08,gp=gpar(fill="brown"))
grid.rect(x=0.5,y=0.3,width=0.1,height=0.2,gp=gpar(fill="brown"))
grid.circle(0.5,0.1,0.2,gp=gpar(fill="grey"))
}
… and we can start drawing; each time we place the viewport at the right position using x= and y=
drawSymbols <- function(year){
n <- round(countCylinders(year))
for(i in 1:n){
pushViewport(viewport(x=year,y=1.0-i*0.075,width=0.5,height=0.06,default.units="native"))
drawCyl()
popViewport()
}
}
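The function mapply applies drawSymbols to every element of the vector of years. drawSymbols is called only for its drawing side effect, so the printed return value is just a list of NULLs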
mapply(drawSymbols,c(1970:1982))
## [[1]]
## NULL
##
## [[2]]
## NULL
##
## [[3]]
## NULL
##
## [[4]]
## NULL
##
## [[5]]
## NULL
##
## [[6]]
## NULL
##
## [[7]]
## NULL
##
## [[8]]
## NULL
##
## [[9]]
## NULL
##
## [[10]]
## NULL
##
## [[11]]
## NULL
##
## [[12]]
## NULL
##
## [[13]]
## NULL
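If the list of NULLs is unwanted, the call can be wrapped in invisible() or replaced by a plain loop. The sketch below demonstrates both with a hypothetical stand-in function `recordYear()` that, like drawSymbols(), is called only for its side effect.

```r
# Stand-in for drawSymbols(): records the year instead of drawing, returns NULL.
drawn <- integer(0)
recordYear <- function(year) {
  drawn <<- c(drawn, year)
  NULL
}

# invisible() suppresses printing of the returned list of NULLs.
invisible(lapply(1970:1982, recordYear))

# A plain for loop has the same effect and returns nothing to print.
for (year in 1970:1982) recordYear(year)

length(drawn)
```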
Solve exercises 3.6.1-6 from the “R for Data Science” link.
As the first graded exercise, solve the following tasks:
1. 7.5.3.1-3 from the "R for Data Science" link
2. using the tidyverse tools and ggplot2, create a visualization
+ similar to the result of the command from exercise session 1: xyplot(displacement ~ mpg | cylinders, data = cars)
+ similar to the grid visualization shown earlier in this session