Soofa’s Pedestrian Data Collection Process

Pedestrian data is one of the most valuable insights Soofa provides its partners, which is why we take great care in keeping our data secure and accurate. This data allows cities to better understand the movements of their constituents and advertisers to better understand their customers. This is the first in an upcoming series regarding the technical and operational nature of Soofa’s digital signage.  

Let’s look at how our pedestrian data pipeline works to give cities, advertisers and the general public a better idea of how we process foot-traffic data. Before we can do that, it’s important to understand how wi-fi works in mobile devices. 

 
 

How Phones Search for Wi-Fi

Have you ever noticed when you come home after being out that your phone automatically connects to the wi-fi? Phones store wi-fi names (also known as SSIDs) and passwords in their memory. When your phone connects to a wi-fi network, it creates a connection that allows you to use the internet through that access point. But when your phone doesn't recognize any wi-fi names around it, another interesting process occurs. 

When a mobile device has wi-fi enabled but is not connected to a network, it regularly broadcasts short messages called probes to search for wi-fi names it recognizes. If the device does not receive a response from a recognized network, it tries again at the next interval. This may sound like a battery-draining task, but phone manufacturers have made the process quite efficient!

The probes a mobile device sends contain information about the phone as well. For example, the MAC (Media Access Control) address is a string of 12 hexadecimal characters that can be used to identify a device. Mobile devices randomize parts of their MAC addresses to increase security and reduce tracking, which keeps their owners’ identifying information private. How often the address changes depends on the manufacturer. For example, since iOS 14, iPhones change theirs every 24 hours.
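To make the idea of a randomized address more concrete, here is a minimal Python sketch. It relies on one general convention of MAC addresses: the first byte carries a "locally administered" bit, which is set on randomized addresses and left clear on manufacturer-assigned ones. The function names and the sample addresses are made up for illustration and are not taken from Soofa's code.

```python
def normalize_mac(mac: str) -> str:
    """Return the MAC as 12 lowercase hex characters with separators removed."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return digits


def is_locally_administered(mac: str) -> bool:
    """True when the 'locally administered' bit of the first byte is set.

    Randomized addresses set this bit; manufacturer-assigned ones leave it clear.
    """
    first_byte = int(normalize_mac(mac)[:2], 16)
    return bool(first_byte & 0b00000010)


# Made-up sample addresses, for illustration only.
print(is_locally_administered("da:a1:19:00:00:01"))  # True
print(is_locally_administered("00:1a:2b:3c:4d:5e"))  # False
```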

Intake and Data Processing

Soofa Signs come equipped with devices that listen for wi-fi probes. It’s important to note that these devices are not wi-fi access points: they only listen, and pedestrians cannot connect to them. When a pedestrian walks past a Soofa Sign, their phone’s wi-fi is likely enabled but unlikely to be connected to a network, so the phone scans for wi-fi through the process described above.

As probes are sent out, Soofa Signs receive them and record information including the MAC address, a start time, an end time, and the number of probes received from that device. The Soofa Sign then runs the MAC address through a one-way encryption process. Once a value is encrypted this way, it’s very difficult to recover the original. This means that Soofa doesn’t collect any identifying data from a person or device, which adds an extra layer of security on top of the MAC address randomization described above.
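As an illustration of what a one-way step can look like, here is a minimal Python sketch using a keyed hash from the standard library. The post does not say which algorithm Soofa Signs use, so HMAC-SHA-256, the `anonymize_mac` name, and the secret key are all assumptions made for this example.

```python
import hashlib
import hmac


def anonymize_mac(mac: str, secret_key: bytes) -> str:
    """Return a one-way, keyed digest of a MAC address (assumed scheme)."""
    # Normalize first so "AA:BB:..." and "aa-bb-..." map to the same digest.
    normalized = mac.replace(":", "").replace("-", "").lower().encode("ascii")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key and made-up address, for illustration only.
key = b"example-secret-key"
digest = anonymize_mac("da:a1:19:00:00:01", key)
print(len(digest))  # 64
```

The same address always produces the same digest, so sessions from one device can still be grouped later, but the digest cannot simply be reversed to reveal the address.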

This information is held in temporary memory on the Soofa Sign. After some time has passed or when memory reaches capacity, the sign sends that device data to our backend system, removes all stored data, and continues listening for more probes. Our signs also send additional information that is static to the sign, including the sign’s unique identifier. Soofa Signs across the country send thousands of rows every day, and a single row of data can contain anywhere between 1 and 50 individual probe sessions!
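The buffer-and-send behavior described above can be sketched roughly as follows. This is a simplified illustration, not the sign's actual firmware: the class and field names, the `send` callback, and the five-minute age limit are assumptions. Only the 50-session cap per row comes from the figures in this post.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProbeSession:
    hashed_mac: str    # MAC address after the one-way step
    start: float       # seconds since epoch
    end: float
    probe_count: int


@dataclass
class SignBuffer:
    sign_id: str
    send: Callable[[dict], None]     # hypothetical uploader to the backend
    max_sessions: int = 50           # a row holds between 1 and 50 sessions
    max_age_seconds: float = 300.0   # assumed value, not from the post
    _sessions: List[ProbeSession] = field(default_factory=list)
    _opened_at: float = 0.0

    def add(self, session: ProbeSession, now: float) -> None:
        """Store a session, then flush if the buffer is full or too old."""
        if not self._sessions:
            self._opened_at = now
        self._sessions.append(session)
        too_full = len(self._sessions) >= self.max_sessions
        too_old = now - self._opened_at >= self.max_age_seconds
        if too_full or too_old:
            self.flush()

    def flush(self) -> None:
        """Send one row to the backend and clear local storage."""
        if not self._sessions:
            return
        self.send({"sign_id": self.sign_id, "sessions": list(self._sessions)})
        self._sessions.clear()


sent_rows: List[dict] = []
buffer = SignBuffer(sign_id="sign-001", send=sent_rows.append, max_sessions=2)
buffer.add(ProbeSession("abc", 0.0, 12.0, 3), now=12.0)
buffer.add(ProbeSession("def", 5.0, 40.0, 6), now=40.0)
print(len(sent_rows))  # 1
```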

Every night, Soofa’s system picks up the day’s data and transforms it into insights. It starts by taking each row of data and pulling out the individual probe sessions, then combining any that share a MAC address and occur within 10 minutes of the previous one.
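A minimal Python sketch of this combining step is shown below. It assumes each session is a tuple of (encrypted MAC, start, end) and that the 10-minute gap is measured from the end of one session to the start of the next. The post does not spell out either detail, so treat both as assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Session = Tuple[str, datetime, datetime]  # (encrypted MAC, start, end)
MERGE_GAP = timedelta(minutes=10)


def merge_sessions(sessions: List[Session]) -> List[Session]:
    """Combine sessions that share a MAC and fall within MERGE_GAP of each other."""
    by_mac: Dict[str, List[Session]] = defaultdict(list)
    for session in sessions:
        by_mac[session[0]].append(session)

    merged: List[Session] = []
    for mac, group in by_mac.items():
        group.sort(key=lambda s: s[1])  # order by start time
        current_start, current_end = group[0][1], group[0][2]
        for _, start, end in group[1:]:
            if start - current_end <= MERGE_GAP:
                # Close enough to the previous session: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((mac, current_start, current_end))
                current_start, current_end = start, end
        merged.append((mac, current_start, current_end))
    return merged


t0 = datetime(2024, 1, 1, 9, 0, 0)
example = [
    ("abc", t0, t0 + timedelta(seconds=20)),
    ("abc", t0 + timedelta(minutes=5), t0 + timedelta(minutes=6)),
    ("abc", t0 + timedelta(minutes=30), t0 + timedelta(minutes=31)),
]
print(len(merge_sessions(example)))  # 2
```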

We then filter the probe sessions by removing any that are shorter than 10 seconds or longer than 30 minutes. The lower limit removes entries that likely came from passing cars. Wi-fi access points can also transmit probes, so we typically assume that any session longer than 30 minutes comes from a router and remove it as well. Once this raw data is filtered, we store it for processing and future reference.
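The duration filter is simple enough to sketch in a few lines of Python. The two thresholds come from the paragraph above; the function name and the choice to keep sessions that sit exactly on a threshold are assumptions.

```python
from datetime import datetime, timedelta

MIN_DURATION = timedelta(seconds=10)   # shorter: likely a passing car
MAX_DURATION = timedelta(minutes=30)   # longer: likely a router


def keep_session(start: datetime, end: datetime) -> bool:
    """True when the session length is between 10 seconds and 30 minutes."""
    return MIN_DURATION <= end - start <= MAX_DURATION


t0 = datetime(2024, 1, 1, 9, 0, 0)
print(keep_session(t0, t0 + timedelta(seconds=4)))   # False
print(keep_session(t0, t0 + timedelta(minutes=2)))   # True
print(keep_session(t0, t0 + timedelta(hours=3)))     # False
```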

Next, we take the filtered data and perform an hourly breakdown on it. This breakdown helps us estimate roughly how many people walked past our signs per hour and how long they were near a Soofa Sign. Finally, this data is provided to landowners and advertisers to give them a better idea of pedestrian traffic in their cities and of the impact of digital and static ads with Soofa. Since the data we collect on Soofa Signs is anonymized at the source, we cannot trace it back to individual pedestrians, which keeps information secure.
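One way to picture the hourly breakdown is the short Python sketch below. It counts sessions per hour and averages how long each lasted. Attributing a session to the hour in which it starts, and the output field names, are assumptions made for the example rather than details from Soofa's pipeline.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def hourly_breakdown(
    sessions: List[Tuple[datetime, datetime]]
) -> Dict[datetime, Dict[str, float]]:
    """Group (start, end) sessions by starting hour; report count and average dwell."""
    totals = defaultdict(lambda: [0, 0.0])  # hour -> [session count, total seconds]
    for start, end in sessions:
        hour = start.replace(minute=0, second=0, microsecond=0)
        totals[hour][0] += 1
        totals[hour][1] += (end - start).total_seconds()
    return {
        hour: {"sessions": count, "avg_dwell_seconds": total / count}
        for hour, (count, total) in sorted(totals.items())
    }


example = [
    (datetime(2024, 1, 1, 9, 5), datetime(2024, 1, 1, 9, 6)),
    (datetime(2024, 1, 1, 9, 50), datetime(2024, 1, 1, 9, 53)),
    (datetime(2024, 1, 1, 10, 10), datetime(2024, 1, 1, 10, 10, 30)),
]
for hour, stats in hourly_breakdown(example).items():
    print(hour, stats)
```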

 
 

Future Improvements

We are constantly striving to improve our processes here at Soofa, which is why we work closely with landowners, advertisers, and vendors to make pedestrian data as accurate and secure as possible. One way we're looking to improve accuracy is to programmatically determine which data comes from mobile devices. As mentioned previously, MAC addresses are twelve characters long, but six of these characters identify the manufacturer and do not get randomized, so we are looking into ways of only accepting probes from known mobile devices based on these characters.
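As a rough idea of what such a check could look like, here is a small Python sketch. The prefix set contains placeholder values, not real manufacturer prefixes, and the function names are hypothetical. Because the MAC address is encrypted one-way on the sign, a check like this would presumably have to run before that step.

```python
# Placeholder prefixes for illustration; a real list would come from a
# manufacturer registry and is not shown here.
KNOWN_MOBILE_PREFIXES = {"aabbcc", "112233"}


def manufacturer_prefix(mac: str) -> str:
    """Return the first six hex characters of a MAC address, lowercased."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return digits[:6]


def from_known_mobile_vendor(mac: str, prefixes=KNOWN_MOBILE_PREFIXES) -> bool:
    """True when the address starts with one of the accepted prefixes."""
    return manufacturer_prefix(mac) in prefixes


print(from_known_mobile_vendor("AA:BB:CC:12:34:56"))  # True
print(from_known_mobile_vendor("00:00:00:12:34:56"))  # False
```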

We are also looking into adding signal strength to the data accepted by the signs and provided to our backend systems. This can give us a rough idea of a device's distance from the sign and help us better evaluate which probes are likely coming from pedestrians.
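To show how signal strength can translate into a rough distance, here is a Python sketch based on the widely used log-distance path loss model. This is a generic textbook approach and not necessarily what Soofa will adopt. The reference power at one meter and the path loss exponent are assumed values that would need calibrating for each location.

```python
def estimate_distance_m(
    rssi_dbm: float,
    reference_power_dbm: float = -40.0,  # assumed signal strength at 1 meter
    path_loss_exponent: float = 3.0,     # assumed; depends on the environment
) -> float:
    """Estimate distance in meters from received signal strength (RSSI)."""
    exponent = (reference_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent)
    return 10.0 ** exponent


print(estimate_distance_m(-40.0))  # 1.0
print(estimate_distance_m(-70.0))  # 10.0
```

Real-world readings fluctuate with obstacles, phone model, and how the phone is carried, so an estimate like this is best treated as a coarse "near or far" signal rather than a measurement.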

Overall, pedestrian privacy is a priority while we collect any data, and we aim to use pedestrian data to help improve Soofa Sign technology and functionality. 

Looking to learn more? Send us an email: hi@soofadigital.com, or fill out our landowner interest form to get started today.

Written by Michael Pitts

Michael is a full-stack engineer at Soofa, and works with the engineering team to develop software updates and new features on Soofa Signs.