PrivlocR - analyzing location data easily and securely

data collection
Author

Haran Sened

Published

December 3, 2024

TL;DR - privlocR allows you to turn the GPS coordinates collected in your study into lists of tags describing the vicinity of each location (e.g., shop, university, ocean, residential …) in a manner that is free, easy to implement, completely offline and secure, and reproducible, building on open map data from OpenStreetMap.

As an experience sampling researcher in social psychology I’m always interested in the situations my participants find themselves in when answering my beeps. As experience sampling platforms have begun to provide location data, the potential to understand participants better, without burdening them with additional questions, seemed endless. Location data can also be used between beeps, giving us insight as to what our participants were doing even when they weren’t being polled by us.

However, unlocking this potential poses challenges. The first step is what I’d call “blind” analysis, looking at features of locations without asking where participants actually are. These can include mobility features (e.g., total distance covered) and location clustering (e.g., noticing that participants spend most of their time at N specific “places” and identifying behavioral patterns that depend on the “place”), and I can’t recommend enough a primer on these issues by Müller and colleagues (2022). While these features are certainly useful, they still leave a lot on the table. If I know the coordinates my participants were at, I should be able to know whether they were in a busy shopping district, walking along the shore, or visiting their local house of prayer.
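
To make the idea of a “blind” mobility feature concrete, here is a minimal base R sketch that sums the great-circle (haversine) distance between consecutive GPS fixes. The function names `haversine_m` and `total_distance_m` and the coordinates are made up for illustration; they are not part of privlocR or of the tutorial by Müller and colleagues.

```r
# Great-circle (haversine) distance in meters between pairs of points.
# Inputs are decimal degrees; radius is the mean Earth radius in meters.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against tiny floating point overshoots above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Total distance covered along a track of fixes ordered by time
total_distance_m <- function(lat, lon) {
  n <- length(lat)
  if (n < 2) return(0)
  sum(haversine_m(lat[-n], lon[-n], lat[-1], lon[-1]))
}

# Hypothetical track: three fixes for one participant (made-up coordinates)
lat <- c(52.0000, 52.0010, 52.0010)
lon <- c(4.0000, 4.0000, 4.0020)
total_distance_m(lat, lon)
```

A feature like this says how much a participant moved, but nothing about what kind of place they were in, which is the gap the rest of this post is about.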

Online services such as Google Maps can give us that kind of information, and that’s the route that Müller and colleagues suggest, but there are several drawbacks. First, these services require setting up accounts, and usually cost money. Setting up an account and writing scripts (or learning a new package) to access the API can take time, and while each single query might cost a fraction of a cent, with large datasets the costs can add up, especially if you’re like me and you don’t get the query format right on the first (or second, or third) try. Second, while costs in time and money might be manageable, there are two larger, inherent issues stemming from the fact that the full map data and the search algorithm are proprietary. One is privacy - querying an online service with the locations our participants were in means sharing those locations - which are extremely personal data - with a commercial entity. While we can ask for participants’ consent for this, avoiding the risk entirely is always preferable. The other is reproducibility. We don’t have the map data, and we don’t know the exact algorithm the service provider uses to give us location tags. As such, we cannot evaluate the quality of the map data, and certainly can’t allow reviewers to check us on that. This is exacerbated by the fact that map data is constantly changing - both because companies update their services and change the way they classify locations, and simply because the real world changes over time. If we come back to the data a year later and want to check locations a bit differently, the map has already changed and we can’t turn back the clock.

So, to solve all of these issues at once, I’ve developed a new R package - privlocR. It uses maps downloaded from OpenStreetMap, a free, open online map updated by volunteers around the world, to provide location tags. Downloading maps is as simple as going to one of the OpenStreetMap mirrors (such as Geofabrik) and clicking on a link. Once you have a map file, privlocR cross-references a list of coordinates with the file, collecting the tags present within a fixed distance of each location. For example, you could ask for tags within 5 meters of each location, and you would get, for each location, a list of the tags of mapped features found within that radius. These range from broad tags such as building_yes, which indicates that the location was near a building, to more specific ones such as amenity_church, which indicates the presence of a church. All of this is done completely offline, based on the previously downloaded map file. Need to tweak the analysis a year later, or provide a reviewer or colleague with a way to reproduce your results? No problem: send them the original map file you downloaded and they should get the same results every time. Finally, it only requires a few lines of code:

library(privlocR)

# Directory that contains pbf files (the R temporary dir in this case)
mydir <- tempdir()

# Populate the directory with map file(s) downloaded from OpenStreetMap
file.copy(system.file("extdata", "tokelau.osm.pbf", package = "privlocR"), mydir)
#> [1] TRUE

# The distance around each location in which we want to search for tags
mydst <- units::set_units(100, m)

# Example longitude and latitude values
long <- c(-9.1979860, -9.192079)
lat <- c(-171.8501176, -171.856883)

get_close_tags(mydir, long, lat, dst = mydst)
#> Re-reading with feature count reset from 52 to 29
#> Re-reading with feature count reset from 4 to 2
#> [[1]]
#> [1] "landuse_residential" "amenity_restaurant"  "natural_reef"       
#> [4] "natural_coastline"   "natural_scrub"       "tourism_hotel"      
#> [7] "leisure_park"        "building_yes"       
#> 
#> [[2]]
#> [1] "natural_reef"      "natural_coastline" "natural_water"    
#> [4] "natural_wood"      "building_yes"
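
For analysis, a list of tag vectors like the one printed above usually needs to become one row per location. The following is a minimal base R sketch of one way to do that, assuming the result is a plain list of character vectors as the printed output suggests. The tags are copied by hand from that output so the snippet runs without privlocR, and the names `tags`, `all_tags` and `tag_df` are my own.

```r
# Tag lists copied from the printed output above; in practice this would be
# the value returned by get_close_tags() (assumed: a list of character vectors)
tags <- list(
  c("landuse_residential", "amenity_restaurant", "natural_reef",
    "natural_coastline", "natural_scrub", "tourism_hotel",
    "leisure_park", "building_yes"),
  c("natural_reef", "natural_coastline", "natural_water",
    "natural_wood", "building_yes")
)

# All distinct tags seen across locations, in a stable (sorted) order
all_tags <- sort(unique(unlist(tags)))

# One row per location, one logical column per tag
tag_matrix <- t(vapply(tags, function(x) all_tags %in% x,
                       logical(length(all_tags))))
colnames(tag_matrix) <- all_tags
tag_df <- as.data.frame(tag_matrix)

# Was each location near a restaurant?
tag_df$amenity_restaurant
```

The resulting data frame can be bound column-wise to the experience sampling data, as long as the row order matches the order of the coordinates that were passed in.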

To read more, you can go to the package website, which also has installation instructions. For more information about downloading map files, improving runtime and other advanced topics, you can go to the full usage vignette.
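
Because reproducibility here depends on everyone using exactly the same map file, it can help to record a checksum of each file alongside your analysis. This is a sketch of one possible approach using `tools::md5sum()`, which ships with R; it is not a privlocR feature. The helper `map_checksums` and the stand-in file are hypothetical.

```r
# Return a data frame with the MD5 checksum of every .pbf file in a directory
map_checksums <- function(map_dir) {
  pbf_files <- list.files(map_dir, pattern = "\\.pbf$", full.names = TRUE)
  data.frame(
    file = basename(pbf_files),
    md5 = unname(tools::md5sum(pbf_files)),
    stringsAsFactors = FALSE
  )
}

# Demo with a tiny stand-in file (not a real map) so the snippet is runnable
demo_dir <- file.path(tempdir(), "maps_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeBin(as.raw(1:10), file.path(demo_dir, "example.osm.pbf"))
map_checksums(demo_dir)
```

A colleague or reviewer who receives the map file can run the same function and compare the checksum before rerunning the analysis.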

privlocR is under active development and I’m going to be adding features as I use the package in my own work. If you have any questions, or if you’ve used the package and have thoughts about additional features, see the about page for contact details.

Müller, S. R., Bayer, J. B., Ross, M. Q., Mount, J., Stachl, C., Harari, G. M., … & Le, H. T. (2022). Analyzing GPS data for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 5(2).