How sample size affects the value of retail analytics

Björn Sjölund

One discussion that has been surfacing a lot recently is the difference between Bluetooth and Wi-Fi when it comes to analysing visitor behaviour in real-world locations. The subject can be simplified to a discussion about sample size, i.e. the total number of device observations that can be used to make good predictions and decisions. In my previous post I discussed the issues with using Bluetooth beacons as a source of analytics data. In this post I will go deeper into the subject, explain in detail what these issues are and describe how we see the field developing.

Reliable interior analytics for retail & airports

Let’s start from the beginning. At Walkbase we provide the tools necessary for analysing visitor behaviour in real-world retail stores and large venues, in much the same way as you would analyse user behaviour on a website. We use state-of-the-art sensing technologies to analyse the way visitors move in and around venues. Our sensors use a proprietary privacy scheme, approved by privacy and data protection authorities, to ensure that no personal identifiers are ever stored in our system. The data we gather is aggregated and analysed in order to find patterns that our customers can use to gain insights and develop their business practices.

The important thing here is that the aggregated data, i.e. the patterns we can identify, is where the statistical value lies. Displaying an individual shopper’s path through a venue would not give a business owner any real insight into how they should change their business. This is why having a larger sample size matters in analytics. The more visitors there are per pattern, the easier it is to separate, identify and act on those patterns.

By including data from sources such as traditional people counters and point-of-sale (POS) systems, we’re able to extract more value from the patterns we detect. POS data, for example, allows us to map the behaviour that leads to a purchase. This again takes its cue from how online analytics tools have been used to improve conversions for years.

What is the required sample size for reliable in-store analytics?

Ask any election pollster and they will tell you how important it is to have a broad range of poll data before drawing any conclusions about how people are going to vote. The exact same principles apply to any field that uses statistics to make assumptions about the behaviour of a target group.

The factors generally affecting the required (minimum) sample size are:

    •    the size of the population
    •    the number of groups and subgroups to be analysed
    •    variability within the population
    •    the desired general accuracy
    •    the cost of obtaining the data

For indoor analytics, we typically want to analyse the behavioural patterns of the visitors to a specific store or site within one day or maybe a week. The population size may therefore vary significantly: an airport may have several tens of thousands of visitors per day, a supermarket typically a few thousand, and a high-street store only a few hundred.
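
To make these factors concrete, here is a minimal sketch of one standard textbook approach: Cochran's sample size formula with a finite population correction. It is not a description of Walkbase's own method. The function name, the 95% confidence level (z = 1.96), the 5% margin of error and the daily visitor counts are all illustrative assumptions.

```python
import math


def required_sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Minimum sample size for estimating a proportion (hypothetical helper).

    proportion=0.5 is the most conservative choice (maximum variability).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Sample size for an effectively infinite population (Cochran's formula).
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction for small populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Illustrative daily visitor counts, not measured data.
venues = {"high-street store": 300, "supermarket": 3000, "airport": 30000}
for name, visitors in venues.items():
    print(f"{name}: {required_sample_size(visitors)} of {visitors} visitors")
```

Under these assumptions the required sample grows slowly with the population (169, 341 and 380 observations respectively), so the small store has to observe more than half of its visitors while the airport needs just over 1%. This covers a single question about the whole population only.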

But you would virtually always want to go further and slice this into subgroups based on, for example, zones within a store (e.g. the footwear section), time intervals (e.g. between 2 and 3pm), repeat vs. new visitors, specific paths taken, and so on. Because consumers’ behavioural patterns vary significantly, these factors already place high demands on the sample size.
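
The effect of slicing can be sketched with the usual normal-approximation margin of error for a proportion. The figures below are illustrative assumptions (2,000 daily observations split evenly over 10 zones and 10 hourly slots), not Walkbase data, and `margin_of_error` is a hypothetical helper.

```python
import math


def margin_of_error(sample_size, z=1.96, proportion=0.5):
    """Approximate margin of error for a proportion (normal approximation)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


daily_observations = 2000      # illustrative figure, not measured data
zones, hourly_slots = 10, 10   # hypothetical segmentation, assumed evenly filled

per_zone = daily_observations / zones
per_cell = daily_observations / (zones * hourly_slots)

print(f"whole day:        +/-{margin_of_error(daily_observations):.1%}")
print(f"one zone:         +/-{margin_of_error(per_zone):.1%}")
print(f"one zone, 1 hour: +/-{margin_of_error(per_cell):.1%}")
```

With an even split, the margin of error goes from about 2% for the whole day to about 22% for a single zone in a single hour. Real traffic is not evenly spread, so quiet zones and quiet hours fare even worse.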

In-store analytics often begins with rather simple questions such as “what’s our daily footfall?”. But in our experience, this quickly leads to rather specific behavioural questions such as:

    •    how did the fitting-room-to-till conversion improve after the service change?
    •    how did a change in window marketing affect passer-by conversion?
    •    how many people saw this ad on digital signage and where did they go afterwards?
    •    which routes did passengers generally take before departing on this flight?

Some retailers are even going as far as setting KPIs and compensation schemes based on such data. 

Wi-Fi sample size

We’ve observed very high Wi-Fi penetration rates in most urban areas, sometimes as high as 70%. We calculate the percentage of visitors carrying a detectable Wi-Fi device by comparing people counting data with our Wi-Fi detection data.

Having a signal source as prevalent as Wi-Fi means that we can get the necessary sample size to make our insights matter. Coupled with a true people count we can take this even further, by extrapolating the Wi-Fi movement data based on the known people counts for given periods of time.
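
The two calculations described above can be sketched as follows. This is a simplified illustration rather than Walkbase's actual pipeline: the function names and all numbers are hypothetical, and the extrapolation assumes that visitors without a detectable Wi-Fi device move through the venue in the same way as those with one.

```python
def penetration_rate(wifi_devices, people_count):
    """Share of counted visitors that were also seen as Wi-Fi devices."""
    if people_count <= 0:
        raise ValueError("people_count must be positive")
    return wifi_devices / people_count


def extrapolate_zone_visits(wifi_zone_counts, wifi_devices, people_count):
    """Scale per-zone Wi-Fi counts up to the full people count for a period."""
    if wifi_devices <= 0:
        raise ValueError("wifi_devices must be positive")
    return {
        zone: round(count * people_count / wifi_devices)
        for zone, count in wifi_zone_counts.items()
    }


# Illustrative figures for a single day, not measured data.
people_count = 1000   # from a door counter
wifi_devices = 700    # unique devices detected over the same period
wifi_zone_counts = {"footwear": 140, "tills": 350}

print(f"penetration: {penetration_rate(wifi_devices, people_count):.0%}")
print(extrapolate_zone_visits(wifi_zone_counts, wifi_devices, people_count))
```

Here 140 devices seen in the footwear section scale up to an estimated 200 visitors, and 350 at the tills to 500. The scaling should be done per period, since the penetration rate may differ between, say, weekday mornings and weekend afternoons.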

There have been many discussions about the impact that the Apple iOS 9 update would have on these numbers. We’ve been working on this matter since Apple first announced MAC address scrambling in iOS 8, and have made sure that our methods for generating Wi-Fi based analytics are unaffected.

Bluetooth sample size

One of the issues with using Bluetooth for indoor analytics is the declining percentage of Bluetooth radios that can be detected. What has changed is the way Bluetooth is used, not the number of devices using it. There has been a move to more power-efficient solutions, which means that Bluetooth no longer actively exposes itself for pairing by default.

Bluetooth has, however, brought a slew of new technologies that are useful for other purposes, one of them being beacons. Beacons continuously advertise a unique identifier, which is ideal for proximity marketing and service discovery. The issue with using Bluetooth beacons as a source of analytics information is that they require an application to be installed on visitors’ smartphones before any data can be gathered. Low adoption rates for brand-specific applications mean that it's very difficult to get a large enough user base for analytics gathering. Currently you would be pleased to see a 5-10% adoption rate for a retail service app, which is clearly nowhere near the numbers that can be reached with Wi-Fi.
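
As a rough worked comparison, the sketch below applies the rates quoted in this post (up to 70% for Wi-Fi, 5-10% for a retail app) to some illustrative daily visitor counts. The venue figures and the `expected_sample` helper are assumptions, and the app figures are optimistic because they treat every app user as detectable.

```python
def expected_sample(visitors, detection_rate):
    """Expected number of visitors observed at a given detection rate."""
    return round(visitors * detection_rate)


WIFI_RATE = 0.70           # upper end of the Wi-Fi penetration quoted above
APP_RATES = (0.05, 0.10)   # app adoption range quoted above

# Illustrative daily visitor counts, not measured data.
venues = {"high-street store": 300, "supermarket": 3000, "airport": 30000}

for name, visitors in venues.items():
    wifi = expected_sample(visitors, WIFI_RATE)
    app_low, app_high = (expected_sample(visitors, rate) for rate in APP_RATES)
    print(f"{name}: Wi-Fi ~{wifi}, beacon app ~{app_low}-{app_high}")
```

For the hypothetical high-street store this gives roughly 210 Wi-Fi observations per day against 15 to 30 app users, before any slicing by zone or time of day.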

The fact is that beacons were never designed to be used for analytics, but for triggering content and providing context, i.e. location, for apps. Beacons will play a part in the future of indoor analytics, but mostly as a way of measuring the effectiveness of proximity-based offers and marketing campaigns. There is still some way to go before we can get any significant sample sizes from beacons.

The Future

Based on the observations presented above, Wi-Fi is currently the best source of anonymous visitor movement data. Having said this, we’re also using more traditional counting methods, i.e. footfall counters, to make our Wi-Fi data even more insightful and reliable. We do see Bluetooth beacon data becoming much more relevant as the technology matures and adoption rises, especially with the new approach taken by Google. At Walkbase our main focus continues to be on using the technologies that give us the largest possible sample size. In the end it’s all about providing extremely valuable insights and data for our customers.

Walkbase provides a retail analytics solution for improving the impact of marketing on physical stores and personalising in-store shopping experience.