Converting Universal Transverse Mercator (UTM) to latitude/longitude data
Visualizing Spatial Data
Packages and Libraries
library(maps)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.1 ✓ purrr 0.3.3
## ✓ tibble 3.0.1 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x purrr::map() masks maps::map()
library(sp)
library(rmarkdown)
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)
World and Thai Maps
First, we’ll use the map_data function from ggplot2 to turn a map from the maps package into a data frame of longitude and latitude points. Then we’ll filter for Thailand to get the Thai longitude and latitude data.
world.map <- map_data("world")
head(world.map)
## long lat group order region subregion
## 1 -69.89912 12.45200 1 1 Aruba <NA>
## 2 -69.89571 12.42300 1 2 Aruba <NA>
## 3 -69.94219 12.43853 1 3 Aruba <NA>
## 4 -70.00415 12.50049 1 4 Aruba <NA>
## 5 -70.06612 12.54697 1 5 Aruba <NA>
## 6 -70.05088 12.59707 1 6 Aruba <NA>
THAI.map <- world.map %>% filter(region == "Thailand")
head(THAI.map)
## long lat group order region subregion
## 1 99.66309 6.521924 1404 87912 Thailand Ko Tarutao
## 2 99.64404 6.516113 1404 87913 Thailand Ko Tarutao
## 3 99.60664 6.596827 1404 87914 Thailand Ko Tarutao
## 4 99.65401 6.714111 1404 87915 Thailand Ko Tarutao
## 5 99.70136 6.570557 1404 87916 Thailand Ko Tarutao
## 6 99.66309 6.521924 1404 87917 Thailand Ko Tarutao
Longitude and Latitude Value Ranges
Before converting UTM to longitude/latitude data, we should know the range of longitudes and latitudes for Thailand, so that we can sanity-check the converted coordinates.
summary(THAI.map$long)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 97.37 99.08 100.26 100.71 102.27 105.64
summary(THAI.map$lat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.637 9.084 13.213 13.249 17.820 20.424
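These ranges can be turned into a quick sanity check for converted coordinates. The sketch below is one way to do that in base R; the names thai_bbox and in_thai_bbox are hypothetical helpers, not part of any package, and the box limits are simply the Min. and Max. values from the two summaries above.

```r
# Approximate bounding box for Thailand, copied from the summary() output above.
thai_bbox <- list(long = c(97.37, 105.64), lat = c(5.637, 20.424))

# Hypothetical helper: TRUE for every point that lies inside the box.
in_thai_bbox <- function(long, lat, bbox = thai_bbox) {
  long >= bbox$long[1] & long <= bbox$long[2] &
    lat >= bbox$lat[1] & lat <= bbox$lat[2]
}

# One point near Bangkok and one made-up point in southern Africa.
in_thai_bbox(long = c(100.6132, 28.3), lat = c(13.66385, -15.4))
```

A bounding box is only a rough filter: points in neighbouring countries can still fall inside it, so it catches gross conversion errors rather than confirming that a point is in Thailand.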
Jobpost Data Frame
Our objective is to visualize utm_x and utm_y in the jobpost data frame by first converting them to latitude and longitude. The jobpost data frame was retrieved from PostgreSQL and written to CSV before being loaded into R Markdown.
jobpost <- read.csv("jobpost.csv")
glimpse(jobpost)
## Rows: 50
## Columns: 25
## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ jobpost_id <int> 54, 66, 33, 34, 35, 36, 28, 32, 30, 55, 67, 68, 37,…
## $ job_name <fct> "Facebook Marketing", "แอดมิน", "Accountant", "แคชเ…
## $ job_qty <int> 3, 1, 1, 2, 2, 5, 3, 1, 5, 1, 22, 10, 1, 1, 2, 2, 1…
## $ age_min <int> 22, 25, 29, 20, 20, 19, 28, 28, 20, 25, 30, 21, 18,…
## $ age_max <int> 26, 32, 35, 35, 35, 40, 120, 40, 40, 45, 45, 30, 50…
## $ study_field <fct> "-", "แฟชั่น", "-", "-", "-", "-", "-", "จัดการผักผ…
## $ job_qualification <fct> "อ่าน เขียน ภาษาอังกฤษ ได้ดี", "ตอบคำถาม ภาษาอังกฤษ…
## $ min_salary <int> 30000, 12000, 20000, 13000, 10000, 15000, 15000, 12…
## $ job_description <fct> "ทำการตลาดทางช่องทาง facebook", "แอดมินดูแล เพจ เสื…
## $ manychat_id <dbl> 3.961592e+15, 2.984969e+15, 2.941175e+15, 3.416291e…
## $ job_sex <int> 3, 3, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 1, 3, …
## $ study_level <int> 5, 5, 5, 0, 2, 2, 3, 4, 4, 5, 5, 4, 0, 2, 2, 5, 5, …
## $ work_exp <int> 1, 0, 3, 1, 0, 0, 0, 3, 0, 3, 3, 0, 0, 1, 1, 3, 6, …
## $ created <fct> 2020-06-07 09:00:36, 2020-06-14 23:12:35, 2020-05-2…
## $ updated <fct> 2020-06-08 09:05:23, 2020-06-14 23:12:35, 2020-05-2…
## $ confirmed <fct> 2020-06-07 09:00:36, 2020-06-14 23:12:35, 2020-05-2…
## $ batch <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ location <fct> บางนา, รามอินทรา 65, พระรามเก้า ซอย 60, ห้าง ริเวอร…
## $ utm_x <dbl> 674486.5, 678167.2, 676504.5, 661251.7, 714943.7, 6…
## $ utm_y <dbl> 1511131, 1532008, 1519745, 1515611, 1477934, 152128…
## $ utm_zone_number <int> 47, 47, 47, 47, 47, 47, 48, 47, 47, 47, 35, 48, 47,…
## $ utm_zone_letter <fct> P, P, P, P, P, P, Q, P, P, P, L, P, P, P, P, P, P, …
## $ job_type <int> NA, NA, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, 0, 0, 0, 0…
## $ online <lgl> NA, NA, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
Subset Data Frame called UTM
We’ll select only the utm_x and utm_y columns from jobpost, because these are the two columns we need for the conversion.
utm <- data.frame(jobpost$utm_x, jobpost$utm_y)
str(utm)
## 'data.frame': 50 obs. of 2 variables:
## $ jobpost.utm_x: num 674486 678167 676504 661252 714944 ...
## $ jobpost.utm_y: num 1511131 1532008 1519745 1515611 1477934 ...
Handle Missing Values and Outliers
Row 50 in jobpost (and therefore in utm) has missing coordinates, so we’ll delete it. We’ll also delete row 11: its location is in Zambia, Africa (UTM zone 35L), so its longitude and latitude are far from Thailand’s and would distort the map.
utm <- utm[-50, ]
utm <- utm[-11, ]
jobpost <- jobpost[-50, ]
jobpost <- jobpost[-11, ]
str(utm)
## 'data.frame': 48 obs. of 2 variables:
## $ jobpost.utm_x: num 674486 678167 676504 661252 714944 ...
## $ jobpost.utm_y: num 1511131 1532008 1519745 1515611 1477934 ...
str(jobpost)
## 'data.frame': 48 obs. of 25 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ jobpost_id : int 54 66 33 34 35 36 28 32 30 55 ...
## $ job_name : Factor w/ 48 levels ".Net Developer",..: 6 48 2 21 19 39 11 35 4 42 ...
## $ job_qty : int 3 1 1 2 2 5 3 1 5 1 ...
## $ age_min : int 22 25 29 20 20 19 28 28 20 25 ...
## $ age_max : int 26 32 35 35 35 40 120 40 40 45 ...
## $ study_field : Factor w/ 19 levels "-","Food science",..: 1 12 1 1 1 1 1 6 1 5 ...
## $ job_qualification: Factor w/ 41 levels "-","- มีใบขับขี่รถยนต์\n- ผ่านการเกณฑ์ทหาร",..: 41 16 9 38 37 32 33 30 15 23 ...
## $ min_salary : int 30000 12000 20000 13000 10000 15000 15000 12000 11500 25000 ...
## $ job_description : Factor w/ 50 levels "- Develops, modifies application software according to specifications and requirements.\n- Develops application"| __truncated__,..: 30 50 27 4 16 14 15 23 7 47 ...
## $ manychat_id : num 3.96e+15 2.98e+15 2.94e+15 3.42e+15 3.00e+15 ...
## $ job_sex : int 3 3 2 2 3 3 3 3 3 3 ...
## $ study_level : int 5 5 5 0 2 2 3 4 4 5 ...
## $ work_exp : int 1 0 3 1 0 0 0 3 0 3 ...
## $ created : Factor w/ 26 levels "2020-05-29 14:21:22",..: 12 24 1 1 1 1 1 1 1 13 ...
## $ updated : Factor w/ 33 levels "2020-05-29 14:21:22",..: 19 30 1 7 1 8 1 1 6 22 ...
## $ confirmed : Factor w/ 26 levels "2020-05-29 14:21:22",..: 12 24 1 1 1 1 1 1 1 13 ...
## $ batch : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ location : Factor w/ 50 levels "112/3 หมู่ 7 ต.บางโฉลง อ.บางพลี จ.สมุทรปราการ 10540",..: 25 35 29 47 30 9 38 11 49 39 ...
## $ utm_x : num 674486 678167 676504 661252 714944 ...
## $ utm_y : num 1511131 1532008 1519745 1515611 1477934 ...
## $ utm_zone_number : int 47 47 47 47 47 47 48 47 47 47 ...
## $ utm_zone_letter : Factor w/ 4 levels "L","N","P","Q": 3 3 3 3 3 3 4 3 3 3 ...
## $ job_type : int NA NA 0 0 0 0 0 0 0 NA ...
## $ online : logi NA NA FALSE FALSE FALSE FALSE ...
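Deleting rows by position works for this particular extract, but it breaks as soon as the CSV changes. A minimal sketch of the same cleanup expressed as conditions is shown below. It assumes that valid rows have non-missing coordinates and a zone number of 47 or 48; jobpost_demo is a toy stand-in, and every value in it except the first row is made up for illustration.

```r
# Toy stand-in for jobpost with only the columns needed for filtering.
# Row 1 mirrors the first jobpost row; the other rows are invented.
jobpost_demo <- data.frame(
  jobpost_id      = c(54, 28, 67, 99),
  utm_x           = c(674486.5, 250000, 500000, NA),
  utm_y           = c(1511131, 1900000, 8300000, NA),
  utm_zone_number = c(47L, 48L, 35L, NA)
)

# Keep rows with both coordinates present and a zone that covers Thailand.
keep <- complete.cases(jobpost_demo[, c("utm_x", "utm_y")]) &
  jobpost_demo$utm_zone_number %in% c(47L, 48L)

jobpost_clean <- jobpost_demo[keep, ]
utm_clean     <- jobpost_clean[, c("utm_x", "utm_y")]

nrow(jobpost_clean)  # rows with a missing value or a non-Thai zone are dropped
```

Filtering jobpost first and deriving the coordinate subset from it also keeps the two data frames aligned without having to repeat each deletion twice.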
Conversion of UTM into Lat/Long
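After some research, we find that most of Thailand, including Bangkok, lies in UTM zone 47N (the far east of the country falls in 48N). The conversion code is adapted from a Stack Overflow answer.
We’ll create a SpatialPoints object from the UTM coordinates, reproject it to longitude/latitude with spTransform, and then build a data frame containing the lat and long columns.
Note that a few rows in jobpost have utm_zone_number 48; if their eastings are relative to zone 48, treating them as zone 47 will place them about six degrees too far west.
Remember to load the sp library for this operation.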
# PROJ expects an integer zone; the northern hemisphere is the default
# (add +south for southern-hemisphere zones)
sputm <- SpatialPoints(utm, proj4string = CRS("+proj=utm +zone=47 +datum=WGS84"))
spgeo <- spTransform(sputm, CRS("+proj=longlat +datum=WGS84"))
thai.map2 <- data.frame(Location = jobpost$location, lat = spgeo$jobpost.utm_y,
    long = spgeo$jobpost.utm_x)
head(thai.map2)
## Location lat long
## 1 บางนา 13.66385 100.6132
## 2 รามอินทรา 65 13.85233 100.6486
## 3 พระรามเก้า ซอย 60 13.74159 100.6324
## 4 ห้าง ริเวอร์ไซด์ พลาซ่า เจริญนคร ชั้น 1 ใน้ บันไดเลื่อน 13.70512 100.4912
## 5 เมืองชลบุรี 13.36114 100.9847
## 6 กรุงเทพ 13.75633 100.5018
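As a cross-check on the sp result, the inverse UTM conversion can also be written out in base R. The sketch below uses the Krüger series truncated at third order, with WGS84 constants, and assumes northern-hemisphere coordinates (false northing of 0). The function name utm_to_lonlat is hypothetical and this is not what spTransform runs internally. Because zone is an argument, each row can be converted with its own zone number.

```r
# Hypothetical helper: inverse UTM via the Krueger series (third order).
# Assumes WGS84 and northern-hemisphere coordinates.
utm_to_lonlat <- function(easting, northing, zone) {
  a  <- 6378137              # WGS84 semi-major axis (m)
  f  <- 1 / 298.257223563    # WGS84 flattening
  k0 <- 0.9996               # UTM scale factor on the central meridian
  n  <- f / (2 - f)          # third flattening
  A  <- a / (1 + n) * (1 + n^2 / 4 + n^4 / 64)

  beta  <- c(n / 2 - 2 * n^2 / 3 + 37 * n^3 / 96,
             n^2 / 48 + n^3 / 15,
             17 * n^3 / 480)
  delta <- c(2 * n - 2 * n^2 / 3 - 2 * n^3,
             7 * n^2 / 3 - 8 * n^3 / 5,
             56 * n^3 / 15)

  # Normalised northing and easting (false easting is 500 km).
  xi  <- northing / (k0 * A)
  eta <- (easting - 500000) / (k0 * A)

  xi_p  <- xi
  eta_p <- eta
  for (j in 1:3) {
    xi_p  <- xi_p  - beta[j] * sin(2 * j * xi) * cosh(2 * j * eta)
    eta_p <- eta_p - beta[j] * cos(2 * j * xi) * sinh(2 * j * eta)
  }

  # Conformal latitude, then geodetic latitude.
  chi <- asin(sin(xi_p) / cosh(eta_p))
  phi <- chi
  for (j in 1:3) {
    phi <- phi + delta[j] * sin(2 * j * chi)
  }

  lambda0 <- zone * 6 - 183  # central meridian of the zone, in degrees
  data.frame(
    long = lambda0 + atan(sinh(eta_p) / cos(xi_p)) * 180 / pi,
    lat  = phi * 180 / pi
  )
}

# First jobpost row; the result should land close to row 1 of thai.map2.
utm_to_lonlat(674486.5, 1511131, zone = 47)
```

With the real data, passing zone = jobpost$utm_zone_number instead of a constant would convert the zone 48 rows relative to their own central meridian.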
Visualize with ggplot2
Here we’ll draw the THAI.map outline we created previously and overlay the new lat/long points converted from UTM.
The points from jobpost are concentrated in Bangkok and the greater Bangkok area, with some jobs also posted outside Bangkok.
THAI.map %>% ggplot() + geom_map(map = THAI.map, aes(x = long,
    y = lat, map_id = region), fill = "white", color = "black") +
    # color and alpha are fixed values, so they are set outside aes()
    geom_point(data = thai.map2, aes(x = long, y = lat), color = "red",
        alpha = 0.9)
## Warning: Ignoring unknown aesthetics: x, y
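The warning comes from passing x and y to geom_map. One alternative, sketched below, is to draw the outline with geom_polygon, which does use x and y, grouped by the group column that map_data returns. The function name plot_points_on_map and the demo data frames are hypothetical; coord_quickmap is added as an assumption that an approximately correct aspect ratio is wanted.

```r
library(ggplot2)

# Hypothetical helper: outline from a map_data()-style data frame,
# with points from a data frame that has long and lat columns.
plot_points_on_map <- function(map_df, points_df) {
  ggplot() +
    geom_polygon(data = map_df, aes(x = long, y = lat, group = group),
                 fill = "white", color = "black") +
    geom_point(data = points_df, aes(x = long, y = lat),
               color = "red", alpha = 0.9) +
    coord_quickmap()
}

# Tiny made-up inputs, only to show the call.
demo_map <- data.frame(long = c(97, 106, 106, 97),
                       lat = c(5, 5, 21, 21),
                       group = 1)
demo_pts <- data.frame(long = 100.6132, lat = 13.66385)
p <- plot_points_on_map(demo_map, demo_pts)
```

With the objects from this post, the call would be plot_points_on_map(THAI.map, thai.map2).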