
Old-Style Surveys and Observations: The Verification of Big Data

When technology was first introduced that used personal Bluetooth devices in vehicles to measure travel speeds, origins and destinations, it was seen as a significant step towards collecting data with large sample sizes.

In the past, a typical travel time survey was conducted with two vehicles, each with a driver and an observer, and each making three runs in a peak period. The time was recorded at each major node. This was later improved using GPS, which gave a location every second, so the detailed characteristics of the trip between major nodes were measured, albeit still with small sample sizes.

Although this information was only indicative, it gave traffic engineers some idea of where the flow of traffic was being disrupted, rather than just an average speed.

With Bluetooth, there were now thousands of records, and this was an important, but not perfect, step. The data was recorded only at major nodes, so the operational nuances between the nodes were typically buried in averages. You could plot variation in speed, but not speeds at locations between nodes, differences between lanes, or the reasons for speed variations.

In the early stages of Bluetooth utilisation, Austraffic did a study and found that some significant errors could creep in, such as false readings from nearby roads. Furthermore, the average speed of the recordings is not the average speed of the vehicles, because buses and other high-occupancy vehicles generate multiple records (which may or may not be a good thing; it is just critical to understand what you are getting).
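The gap between the two averages can be shown with a small sketch. The records below are invented for illustration, and the `vehicle_id` field is an assumption: a Bluetooth scanner sees device addresses, not vehicles, so in practice grouping records by vehicle would need some other source or inference.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical detections: (vehicle_id, speed_kmh). One bus carrying four
# detectable devices yields four records; each car yields one.
records = [
    ("bus_1", 30.0), ("bus_1", 30.0), ("bus_1", 30.0), ("bus_1", 30.0),
    ("car_1", 50.0),
    ("car_2", 50.0),
]

# Average over records: every detected device counts once.
record_mean = mean(speed for _, speed in records)

# Average over vehicles: collapse each vehicle's records to one value first.
by_vehicle = defaultdict(list)
for vehicle_id, speed in records:
    by_vehicle[vehicle_id].append(speed)
vehicle_mean = mean(mean(speeds) for speeds in by_vehicle.values())

print(f"mean over records:  {record_mean:.1f} km/h")   # 36.7
print(f"mean over vehicles: {vehicle_mean:.1f} km/h")  # 43.3
```

With these made-up numbers the record-based average is pulled towards the bus speed. Neither figure is wrong; they simply answer different questions.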

The systems are getting better, but the nature of the surveys and what they are really measuring should still be foremost in our minds. 

As with all data collection, the context and the details need to be understood, the data needs to be cleaned, and what it is measuring must be clearly defined. Furthermore, the value of the data can be greatly enhanced by surveying a sample using another method to add clarification.

An excellent example of this arose in a recent session of the AITPM 2021 National Conference. The session was a panel discussion titled “Data Insights, Analytics, Visualisations Q&A Recording”.

Faria Shanjana Imam PhD, from the Ason Group, presented on the subject of “Bus travel time estimation and prediction on arterial corridors for low-frequency buses”.

In part she summarised her research as follows: 

“The accuracy and reliability of the real-time and predictive traveller information system strongly influences users’ satisfaction and attitudes towards public transport (bus) systems. Many studies have suggested that the fusion of multi-source data can achieve higher precision in the estimation and prediction of travel time than that of single-source data. Data-oriented travel speed (or travel time) prediction models require accurate estimation of the historical time series with equally spaced data points. The availability of bus speed time-series data points depends on bus frequency and other operational factors such as on time performance. Low frequency bus routes coupled with poor on time performance can result in time series with a number of missing values (or irregular interval of data points). 

With a case study on a Brisbane corridor, car speed is estimated using Bluetooth MAC Scanner (BMS) and bus speed is estimated using Automatic Fare Collection data (Go card). Findings are encouraging, and results of the integration of the two data sources indicate around 3% improvement in the bus speed estimation where the time series gaps are filled with car speed compared to the case where the time series gaps are filled with linear interpolation. Furthermore, this estimated bus speed time series is utilised to predict future bus speed with the application of Artificial Neural network (ANN). The prediction results are also improved for different prediction horizons.”

It is encouraging to see such a data-driven approach. 
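The two gap-filling strategies compared in the research summary can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the study: the speeds are invented, the function names are hypothetical, and the car speed is substituted directly, whereas the research may well have adjusted it first.

```python
def fill_linear(series):
    """Fill None gaps by linear interpolation between the nearest known values.

    Gaps before the first or after the last known value are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def fill_from_reference(series, reference):
    """Fill None gaps with the value from a second source at the same interval."""
    return [ref if v is None else v for v, ref in zip(series, reference)]


# Hypothetical speeds (km/h) in equally spaced intervals; None marks an
# interval in which no bus was observed.
bus_speed = [30.0, None, None, 24.0, None, 28.0]
car_speed = [42.0, 35.0, 20.0, 33.0, 38.0, 40.0]

print(fill_linear(bus_speed))
# [30.0, 28.0, 26.0, 24.0, 26.0, 28.0]
print(fill_from_reference(bus_speed, car_speed))
# [30.0, 35.0, 20.0, 24.0, 38.0, 28.0]
```

Linear interpolation smooths straight across a gap, so it cannot show a slowdown that happened while no bus was observed. A second data source can, provided it reflects the same traffic conditions.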

One of the questions arising from the discussion session was: “Is there a difference in travel time between lanes on the stretch of road in your example, left and right? Buses are more likely to use the left lane. Has lane travel time been considered in the Bluetooth data? That is, if one lane has consistently slower travel times, you might be filtering more of these out.”

Faria noted that they did not differentiate between lanes in this particular case. Still, it raises an interesting point: old-style surveys may well be able to add some detail to the average figures coming from Bluetooth.

The average speed in a section of the road may be hiding some significant variations. For example: 

  • The capacity and speed achieved may be disrupted by vehicles queuing back into the through lane from a turning bay with insufficient capacity.
  • Lane changes by vehicles entering from one side of the road and hoping to exit soon after on the other side can be very disruptive.
  • Buses stopping can slow both their own lane and the adjacent lane as cars try to get around the stationary vehicle.
  • Left or right turning vehicles may block the lane because they have to wait for pedestrians to clear the side street crossing.

This is a good example of data having a range of layers, like an onion. You can look at the whole organic thing (there is extensive discussion as to whether an onion is a fruit or a vegetable, or both, hence the reference to “organic thing”!). But an onion consists of layers, and if you dig down into the layers, you see more specific material that is part of the whole.
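Peeling back one layer can be as simple as splitting a section average by lane. The sketch below uses invented observer records; the lane labels and speeds are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observer records for one road section: (lane, speed_kmh).
observations = [
    ("left", 22.0), ("left", 26.0), ("left", 24.0),
    ("right", 46.0), ("right", 50.0), ("right", 48.0),
]

# Outer layer: one average for the whole section.
section_mean = mean(speed for _, speed in observations)

# Next layer down: the same records averaged lane by lane.
by_lane = defaultdict(list)
for lane, speed in observations:
    by_lane[lane].append(speed)
lane_means = {lane: mean(speeds) for lane, speeds in by_lane.items()}

print(f"section average: {section_mean:.1f} km/h")  # 36.0
for lane, value in lane_means.items():
    print(f"{lane} lane average: {value:.1f} km/h")  # left 24.0, right 48.0
```

In this made-up case no vehicle actually travels at the section average of 36 km/h, which is the kind of detail an on-the-ground survey can reveal.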

A detailed understanding of what is happening on the ground is essential for finding local improvements. More detailed surveys and good old-fashioned observations are essential complements to automatically collected data.



2021 – AITPM National Conference panel discussion “Data Insights, Analytics, Visualisations Q&A Recording”. Available to registered participants.


John Reid

Managing Director, Austraffic

From the beginning of his career in local government, and later when he established Austraffic in 1983, John has recognised that data collection is not just about numbers but about understanding people and the activities that serve the community's needs. Poor data is counter-productive. Even if results fit our preconceived ideas, that doesn’t mean they are accurate. John has seen how good data expands our perceptions and thinking and can be surprising in its results. Connect with John on LinkedIn.
