Swimming in Data: Will You Sink, Tread Water or Swim?

Data-driven decision making is fast becoming the status quo for the development of infrastructure in our cities.

How we scan, categorise and assess data will determine whether we sink, tread water frantically or swim in the waves of information overload.

 

“While technology has opened the floodgates to data, so to speak, there will need to be a filter through which we process the information pouring in,” explains Rafid Morshedi, Data Analytics and Automation Engineer. “For example, when planning new rail lines, one of the most difficult tasks we face is staying informed about Development Applications (DAs) in the surrounding environment, as new residential or commercial developments can impact a proposed alignment.

 

“In New South Wales, this information is in the public domain but scattered across various local and state government websites, and the only way to access it is through manual searches. This is time-consuming, costly and open to human error.

 

“As a result, there are three options to consider:

  1. Ignore the DAs and risk reworking the alignment later (sink)
  2. Undertake regular searches, which relies on people being proactive (causing us to tread water to keep up)
  3. Develop technology that removes human error by automating the process of data analysis and highlighting information that is relevant and important to our project (swim).

 

“This third option is one of the unique approaches we used on a large NSW rail project recently,” explains Mr Morshedi. “It began with having the right people involved who were familiar with both the planning process and the data that drove it.

 

“As with any research, you need to understand the research domain and what you are looking for. This is where the human element is important: asking the right questions and thus defining the right data elements to extract. In this case, the question was simple: is our alignment going to be impacted by a third-party development?

 

“The next step involved gathering the data. To locate and extract the relevant information, several publicly available spatial datasets were integrated, such as the Digital Cadastral Database and the Geocoded National Address File (G-NAF). These datasets were critical to the whole process, and open data released by government agencies was invaluable in developing the system. Security and ethics were considered throughout the development cycle.
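
To illustrate the kind of spatial question these datasets make answerable, here is a minimal sketch in Python that flags DAs located within a buffer distance of a proposed alignment. It is not the project's actual method: the function names, the DA identifiers, the coordinates and the 100 m buffer are all hypothetical, and it assumes each DA has already been geocoded to a single point in a planar coordinate system measured in metres.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_alignment(p, alignment):
    """Shortest distance from point p to a polyline given as a list of vertices."""
    return min(
        point_segment_distance(p, a, b)
        for a, b in zip(alignment, alignment[1:])
    )


def das_near_alignment(das, alignment, buffer_m):
    """Return the IDs of DAs whose location lies within buffer_m of the alignment."""
    return [
        da_id
        for da_id, location in das.items()
        if distance_to_alignment(location, alignment) <= buffer_m
    ]


# Hypothetical data: a two-segment alignment and three geocoded DAs (metres).
alignment = [(0, 0), (1000, 0), (1000, 1000)]
das = {"DA-001": (500, 50), "DA-002": (500, 400), "DA-003": (1080, 900)}

print(das_near_alignment(das, alignment, buffer_m=100))
# ['DA-001', 'DA-003']
```

In practice a DA covers a land parcel rather than a point, so a real system would more likely test parcel polygons from the cadastre against the alignment corridor using a spatial library.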

 

“The process of collecting DA data from public databases was automated and the resulting flood of data was automatically risk-rated using a machine learning algorithm trained on past high-risk DAs. It is important to note that the human element is still a vital part of the process – we need to be the ones to do the final check to ensure that the right information is being picked up.

 

“We tested several machine learning techniques to find an appropriate algorithm and set of hyperparameters that met our recall requirements. We needed to ensure a high true positive rate. A boosted tree algorithm was used to assign a preliminary risk rating to DAs.
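
Recall is the share of genuinely high-risk DAs that the model flags, which is the same quantity as the true positive rate. The article does not say how the recall requirement was applied, so the following Python sketch shows only one common approach: given risk scores from any classifier (a boosted tree, for instance) and known labels for past DAs, choose the highest score threshold that still meets a target recall. The function names, scores, labels and targets are hypothetical.

```python
def recall_at_threshold(scores, labels, threshold):
    """Fraction of high-risk items (label 1) with a score at or above threshold."""
    total_high_risk = sum(labels)
    if total_high_risk == 0:
        raise ValueError("recall is undefined without any high-risk examples")
    flagged_high_risk = sum(
        1 for score, label in zip(scores, labels)
        if label == 1 and score >= threshold
    )
    return flagged_high_risk / total_high_risk


def highest_threshold_meeting_recall(scores, labels, target_recall):
    """Highest observed score usable as a threshold while meeting target_recall."""
    # Try candidate thresholds from strictest to most lenient.
    for threshold in sorted(set(scores), reverse=True):
        if recall_at_threshold(scores, labels, threshold) >= target_recall:
            return threshold
    raise ValueError("target_recall must not exceed 1.0")


# Hypothetical validation data: model scores and whether each past DA
# turned out to be high risk (1) or not (0).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

print(highest_threshold_meeting_recall(scores, labels, target_recall=0.9))
# 0.4
```

Lowering the threshold raises recall at the cost of more false alarms, which is one reason a final human check remains useful.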

 

“This new approach to infrastructure planning allows for rapid appraisals of new design options. It saves time, money and resources while at the same time optimising design outcomes.”
As the availability of data increases, it is critical to convert that data into information that can inform decision making. The flood of data in transport infrastructure is here. Are you sinking, treading water or swimming?

 

For more information, contact Rafid Morshedi. If you have a similar story you want to share, get in touch with us. We would love to hear from you.

 
