Converting Universal Transverse Mercator (UTM) to latitude/longitude data

Visualizing Spatial Data

Packages and Libraries

library(maps)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.1     ✓ purrr   0.3.3
## ✓ tibble  3.0.1     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x purrr::map()    masks maps::map()
library(sp)
library(rmarkdown)
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)

World and Thai Maps

First, we’ll use the map_data function from ggplot2 to turn a map from the maps package into a data frame. This gives us longitude and latitude data for the whole world. Then we’ll filter for Thailand to get the Thai longitude and latitude data.

world.map <- map_data("world")
head(world.map)
##        long      lat group order region subregion
## 1 -69.89912 12.45200     1     1  Aruba      <NA>
## 2 -69.89571 12.42300     1     2  Aruba      <NA>
## 3 -69.94219 12.43853     1     3  Aruba      <NA>
## 4 -70.00415 12.50049     1     4  Aruba      <NA>
## 5 -70.06612 12.54697     1     5  Aruba      <NA>
## 6 -70.05088 12.59707     1     6  Aruba      <NA>
THAI.map <- world.map %>% filter(region == "Thailand")
head(THAI.map)
##       long      lat group order   region   subregion
## 1 99.66309 6.521924  1404 87912 Thailand Ko Tarutao 
## 2 99.64404 6.516113  1404 87913 Thailand Ko Tarutao 
## 3 99.60664 6.596827  1404 87914 Thailand Ko Tarutao 
## 4 99.65401 6.714111  1404 87915 Thailand Ko Tarutao 
## 5 99.70136 6.570557  1404 87916 Thailand Ko Tarutao 
## 6 99.66309 6.521924  1404 87917 Thailand Ko Tarutao

Longitude and Latitude Value Ranges

Before converting UTM to longitude/latitude data, we should know the range of both longitudes and latitudes for Thailand, so we can check later that the converted points fall inside it.

summary(THAI.map$long)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   97.37   99.08  100.26  100.71  102.27  105.64
summary(THAI.map$lat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.637   9.084  13.213  13.249  17.820  20.424
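
These ranges can double as a quick sanity check on converted points. The following is only a rough sketch: the helper name in_thai_bbox is hypothetical (not part of the original analysis), the box uses the rounded values printed by summary(), and the second test point is an illustrative coordinate near Lusaka, Zambia. A bounding box is also coarser than the real border, so it will accept some points in neighbouring countries.

```r
# Bounding box taken from the summary() output above (degrees).
thai_bbox <- list(long = c(97.37, 105.64), lat = c(5.637, 20.424))

# Hypothetical helper: TRUE when a point lies inside the bounding box.
in_thai_bbox <- function(long, lat, bbox = thai_bbox) {
  long >= bbox$long[1] & long <= bbox$long[2] &
    lat >= bbox$lat[1] & lat <= bbox$lat[2]
}

# First point: Bang Na (from the converted data later in this post).
# Second point: an illustrative coordinate in Zambia.
in_thai_bbox(c(100.6132, 28.3), c(13.66385, -15.4))
# [1]  TRUE FALSE
```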

Jobpost Data Frame

Our objective is to visualize utm_x and utm_y in the jobpost data frame by first converting them into latitude and longitude data. The jobpost data frame was retrieved from PostgreSQL.

To prepare it, we wrote the table to a CSV file and then loaded that file into R Markdown.

jobpost <- read.csv("jobpost.csv")
glimpse(jobpost)
## Rows: 50
## Columns: 25
## $ X                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ jobpost_id        <int> 54, 66, 33, 34, 35, 36, 28, 32, 30, 55, 67, 68, 37,…
## $ job_name          <fct> "Facebook Marketing", "แอดมิน", "Accountant", "แคชเ…
## $ job_qty           <int> 3, 1, 1, 2, 2, 5, 3, 1, 5, 1, 22, 10, 1, 1, 2, 2, 1…
## $ age_min           <int> 22, 25, 29, 20, 20, 19, 28, 28, 20, 25, 30, 21, 18,…
## $ age_max           <int> 26, 32, 35, 35, 35, 40, 120, 40, 40, 45, 45, 30, 50…
## $ study_field       <fct> "-", "แฟชั่น", "-", "-", "-", "-", "-", "จัดการผักผ…
## $ job_qualification <fct> "อ่าน เขียน ภาษาอังกฤษ ได้ดี", "ตอบคำถาม ภาษาอังกฤษ…
## $ min_salary        <int> 30000, 12000, 20000, 13000, 10000, 15000, 15000, 12…
## $ job_description   <fct> "ทำการตลาดทางช่องทาง facebook", "แอดมินดูแล เพจ เสื…
## $ manychat_id       <dbl> 3.961592e+15, 2.984969e+15, 2.941175e+15, 3.416291e…
## $ job_sex           <int> 3, 3, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 1, 3, …
## $ study_level       <int> 5, 5, 5, 0, 2, 2, 3, 4, 4, 5, 5, 4, 0, 2, 2, 5, 5, …
## $ work_exp          <int> 1, 0, 3, 1, 0, 0, 0, 3, 0, 3, 3, 0, 0, 1, 1, 3, 6, …
## $ created           <fct> 2020-06-07 09:00:36, 2020-06-14 23:12:35, 2020-05-2…
## $ updated           <fct> 2020-06-08 09:05:23, 2020-06-14 23:12:35, 2020-05-2…
## $ confirmed         <fct> 2020-06-07 09:00:36, 2020-06-14 23:12:35, 2020-05-2…
## $ batch             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ location          <fct> บางนา, รามอินทรา 65, พระรามเก้า ซอย 60, ห้าง ริเวอร…
## $ utm_x             <dbl> 674486.5, 678167.2, 676504.5, 661251.7, 714943.7, 6…
## $ utm_y             <dbl> 1511131, 1532008, 1519745, 1515611, 1477934, 152128…
## $ utm_zone_number   <int> 47, 47, 47, 47, 47, 47, 48, 47, 47, 47, 35, 48, 47,…
## $ utm_zone_letter   <fct> P, P, P, P, P, P, Q, P, P, P, L, P, P, P, P, P, P, …
## $ job_type          <int> NA, NA, 0, 0, 0, 0, 0, 0, 0, NA, NA, NA, 0, 0, 0, 0…
## $ online            <lgl> NA, NA, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…

Subset Data Frame called UTM

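We’ll select only the utm_x and utm_y columns from jobpost, since these are the two columns we need. Note that building the data frame this way names the columns jobpost.utm_x and jobpost.utm_y, which is how we refer to them later.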

utm <- data.frame(jobpost$utm_x, jobpost$utm_y)
str(utm)
## 'data.frame':    50 obs. of  2 variables:
##  $ jobpost.utm_x: num  674486 678167 676504 661252 714944 ...
##  $ jobpost.utm_y: num  1511131 1532008 1519745 1515611 1477934 ...

Handle Missing Values and Outliers

Row 50 has missing values in both jobpost and utm, so we’ll delete it. We’ll also delete row 11 because its location is in Zambia, Africa (UTM zone 35L), and its longitude and latitude are so far from Thailand’s that they would distort the map. We remove row 50 first so that the index of row 11 does not shift.

utm <- utm[-50, ]
utm <- utm[-11, ]
jobpost <- jobpost[-50, ]
jobpost <- jobpost[-11, ]
str(utm)
## 'data.frame':    48 obs. of  2 variables:
##  $ jobpost.utm_x: num  674486 678167 676504 661252 714944 ...
##  $ jobpost.utm_y: num  1511131 1532008 1519745 1515611 1477934 ...
str(jobpost)
## 'data.frame':    48 obs. of  25 variables:
##  $ X                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ jobpost_id       : int  54 66 33 34 35 36 28 32 30 55 ...
##  $ job_name         : Factor w/ 48 levels ".Net Developer",..: 6 48 2 21 19 39 11 35 4 42 ...
##  $ job_qty          : int  3 1 1 2 2 5 3 1 5 1 ...
##  $ age_min          : int  22 25 29 20 20 19 28 28 20 25 ...
##  $ age_max          : int  26 32 35 35 35 40 120 40 40 45 ...
##  $ study_field      : Factor w/ 19 levels "-","Food science",..: 1 12 1 1 1 1 1 6 1 5 ...
##  $ job_qualification: Factor w/ 41 levels "-","- มีใบขับขี่รถยนต์\n- ผ่านการเกณฑ์ทหาร",..: 41 16 9 38 37 32 33 30 15 23 ...
##  $ min_salary       : int  30000 12000 20000 13000 10000 15000 15000 12000 11500 25000 ...
##  $ job_description  : Factor w/ 50 levels "- Develops, modifies application software according to specifications and requirements.\n- Develops application"| __truncated__,..: 30 50 27 4 16 14 15 23 7 47 ...
##  $ manychat_id      : num  3.96e+15 2.98e+15 2.94e+15 3.42e+15 3.00e+15 ...
##  $ job_sex          : int  3 3 2 2 3 3 3 3 3 3 ...
##  $ study_level      : int  5 5 5 0 2 2 3 4 4 5 ...
##  $ work_exp         : int  1 0 3 1 0 0 0 3 0 3 ...
##  $ created          : Factor w/ 26 levels "2020-05-29 14:21:22",..: 12 24 1 1 1 1 1 1 1 13 ...
##  $ updated          : Factor w/ 33 levels "2020-05-29 14:21:22",..: 19 30 1 7 1 8 1 1 6 22 ...
##  $ confirmed        : Factor w/ 26 levels "2020-05-29 14:21:22",..: 12 24 1 1 1 1 1 1 1 13 ...
##  $ batch            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ location         : Factor w/ 50 levels "112/3 หมู่ 7 ต.บางโฉลง อ.บางพลี จ.สมุทรปราการ 10540",..: 25 35 29 47 30 9 38 11 49 39 ...
##  $ utm_x            : num  674486 678167 676504 661252 714944 ...
##  $ utm_y            : num  1511131 1532008 1519745 1515611 1477934 ...
##  $ utm_zone_number  : int  47 47 47 47 47 47 48 47 47 47 ...
##  $ utm_zone_letter  : Factor w/ 4 levels "L","N","P","Q": 3 3 3 3 3 3 4 3 3 3 ...
##  $ job_type         : int  NA NA 0 0 0 0 0 0 0 NA ...
##  $ online           : logi  NA NA FALSE FALSE FALSE FALSE ...
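
Deleting rows by hard-coded index works for a one-off file, but the same cleanup can be expressed as a rule. The sketch below uses a small toy data frame with made-up values (it is not the real jobpost data) and assumes that only UTM zones 47 and 48 are of interest, since those are the zones that cover Thailand. Those zones also cover other countries, so this is a coarse filter rather than a true "inside Thailand" test.

```r
# Toy stand-in for jobpost; values are illustrative, not the real data.
toy <- data.frame(
  utm_x = c(674486.5, 678167.2, 500000, NA),
  utm_y = c(1511131, 1532008, 8300000, NA),
  utm_zone_number = c(47L, 48L, 35L, NA)
)

# Keep rows with complete coordinates recorded in zone 47 or 48.
keep <- complete.cases(toy[, c("utm_x", "utm_y")]) &
  toy$utm_zone_number %in% c(47L, 48L)

toy_clean <- toy[keep, ]
which(!keep)  # indices of the rows that would be dropped
# [1] 3 4
```

Applying one logical vector such as keep to both jobpost and utm would also keep the two data frames aligned without repeating the index arithmetic.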

Conversion of UTM into Lat/Long

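After some research, we find that most of Thailand, including Bangkok, lies in UTM zone 47N, so we’ll use that zone for the conversion. (The utm_zone_number column shows that a few rows were recorded in zone 48, which covers eastern Thailand; treating those rows as zone 47 will misplace them.) The conversion code comes from a Stack Overflow answer.

We’ll create two SpatialPoints objects: sputm holds the points in UTM coordinates, and spgeo holds the same points transformed to longitude/latitude. Then we’ll combine the transformed coordinates with the location names in a data frame.

Remember to load the sp library for this operation.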

# PROJ expects a numeric zone; the northern hemisphere is the default
sputm <- SpatialPoints(utm, proj4string = CRS("+proj=utm +zone=47 +datum=WGS84"))
spgeo <- spTransform(sputm, CRS("+proj=longlat +datum=WGS84"))

thai.map2 <- data.frame(Location = jobpost$location, lat = spgeo$jobpost.utm_y, 
    long = spgeo$jobpost.utm_x)

head(thai.map2)
##                                      Location      lat     long
## 1                                       บางนา 13.66385 100.6132
## 2                                 รามอินทรา 65 13.85233 100.6486
## 3                            พระรามเก้า ซอย 60 13.74159 100.6324
## 4 ห้าง ริเวอร์ไซด์ พลาซ่า เจริญนคร ชั้น 1 ใน้ บันไดเลื่อน 13.70512 100.4912
## 5                                    เมืองชลบุรี 13.36114 100.9847
## 6                                      กรุงเทพ 13.75633 100.5018
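
Because jobpost also stores a utm_zone_number for each row, the conversion could be done one zone at a time instead of assuming zone 47 throughout. This is a minimal sketch, not the method used in this post: the helper utm_to_lonlat is hypothetical, it assumes every point is in the northern hemisphere (true for Thailand), and it assumes missing coordinates have already been removed. It relies on the same sp functions used above.

```r
library(sp)

# Hypothetical helper: convert UTM points to long/lat, one zone at a time.
# Assumes northern-hemisphere zones and no missing coordinates.
utm_to_lonlat <- function(x, y, zone) {
  out <- data.frame(long = rep(NA_real_, length(x)),
                    lat = rep(NA_real_, length(x)))
  for (z in unique(zone[!is.na(zone)])) {
    idx <- which(zone == z)
    # Build the UTM CRS string for this zone, e.g. "+proj=utm +zone=47 ..."
    utm_crs <- CRS(paste0("+proj=utm +zone=", z, " +datum=WGS84"))
    pts <- SpatialPoints(cbind(x[idx], y[idx]), proj4string = utm_crs)
    geo <- spTransform(pts, CRS("+proj=longlat +datum=WGS84"))
    # coordinates() returns a matrix with x (long) first, then y (lat)
    out[idx, ] <- coordinates(geo)
  }
  out
}

# First row of the data above (Bang Na, zone 47)
utm_to_lonlat(674486.5, 1511131, 47)
```

With the real data, the call would look like utm_to_lonlat(jobpost$utm_x, jobpost$utm_y, jobpost$utm_zone_number), and rows recorded in zone 48 would then be placed using their own zone.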

Visualize with ggplot2

Here we’ll draw the THAI.map outline we created previously and overlay the new latitude/longitude points (converted from UTM).

We can see that the job postings are concentrated in Bangkok and the greater Bangkok area, with some jobs also posted elsewhere in the country.

THAI.map %>% ggplot() + geom_map(map = THAI.map, aes(x = long, 
    y = lat, map_id = region), fill = "white", color = "black") + 
    # color and alpha are fixed values, so they belong outside aes()
    geom_point(data = thai.map2, aes(x = long, y = lat), color = "red", 
        alpha = 0.9)
## Warning: Ignoring unknown aesthetics: x, y
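
The warning appears because geom_map does not use x and y as aesthetics. One possible alternative, sketched here under assumptions rather than taken from the original analysis, is to draw the outline with geom_polygon, which does accept x and y, and to add coord_quickmap so the map keeps a sensible aspect ratio. The two points in pts are illustrative values copied from the converted table above; in practice thai.map2 would take their place.

```r
library(ggplot2)  # map_data() also needs the maps package to be installed

# Outline of Thailand as a data frame of polygon vertices
thai_outline <- map_data("world", region = "Thailand")

# Illustrative points; in this post they would come from thai.map2.
pts <- data.frame(long = c(100.6132, 100.9847),
                  lat = c(13.66385, 13.36114))

p <- ggplot(thai_outline, aes(x = long, y = lat)) +
  # group keeps the mainland and each island as separate polygons
  geom_polygon(aes(group = group), fill = "white", color = "black") +
  geom_point(data = pts, color = "red", alpha = 0.9) +
  coord_quickmap()
p
```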

Paul Apivat