Optimizing RCMCost Modeling for the Department of Public Works.

The power of NLP in infrastructure.

Through a unique collaboration between Rijkswaterstaat (the executive agency of the Dutch Ministry of Infrastructure and Water Management) and Cmotions, and using NLP techniques ranging from keyword selection to generative AI, we were able to optimize the input to their RCMCost model, enabling better cost estimation and cost management.

Rijkswaterstaat's responsibilities include the construction and maintenance of primary infrastructure, such as roads and waterways. To make a good trade-off between costs, performance and risks when maintaining assets (think bridges, roads and waterways), Rijkswaterstaat uses a tool called RCMCost. Based on the input data entered in this tool, they can view the expected maintenance per location.

The trigger.

The Department of Public Works approached Cmotions with a request to streamline the data that is the input to their RCMCost model. Aggregated analysis across many similar locations allows Rijkswaterstaat to make better estimates and control costs. However, the records of cost and description for each object are not standardized; individuals within the organization complete the maintenance interface in different ways.

This is where the complication arises: with hundreds of locations, multiple non-standardized registration fields and dozens of different experts filling out the registrations, analyzing costs across locations is time-consuming and limited in scope. The RCMCost modeling experts at Rijkswaterstaat and data scientists at Cmotions worked together to develop and implement a data-driven solution to automate the standardization of the input data to the RCMCost model.

The solution.

Cmotions implemented and evaluated several Natural Language Processing (NLP) techniques to standardize the inputs to the RCMCost model.

First, an exploratory data analysis (EDA) was conducted to find patterns and frequently occurring terms. Based on these common terms, we were able to reduce each free-text entry to a single representative term. Moreover, we were able to link these terms to the properties of the model, making the cost analysis much more efficient.
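As a rough illustration of keyword selection, the sketch below counts in how many entries each term occurs and then reduces every entry to its most common term. The example entries, the stopword list and the function names are assumptions made for this illustration; the source does not show the real registration data or the exact selection rule used in the project.

```python
import re
from collections import Counter

# Hypothetical free-text entries; the real registration fields are not shown in the source.
entries = [
    "Replace bearing of bridge deck",
    "bridge bearing replacement",
    "Inspect lock gate",
    "lock gate inspection and cleaning",
    "Clean bridge drainage",
]

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"of", "and", "the", "a"}


def tokenize(text):
    """Lowercase the text and return its word tokens without stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def select_keywords(texts):
    """Map every text to the term it contains that occurs in the most entries."""
    # Document frequency: in how many entries does each term appear?
    doc_freq = Counter(term for text in texts for term in set(tokenize(text)))
    selected = {}
    for text in texts:
        tokens = tokenize(text)
        # max() returns the first maximal element, so ties keep the order of the text.
        selected[text] = max(tokens, key=lambda t: doc_freq[t]) if tokens else None
    return selected


keywords = select_keywords(entries)
for text, term in keywords.items():
    print(f"{text!r} -> {term}")
```

In practice, the selected terms would still be reviewed by domain experts before being linked to model properties, since the most frequent term is not always the most meaningful one.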

We then used a semantic similarity technique to identify and link text fields that are similar in content. The purpose of this step is to group different terms together and label them with a standardized version. Semantic similarity techniques look at the meaning of terms rather than the literal way they are written, allowing words to be grouped that may not look the same but have the same meaning. To learn more about semantic similarity and how it works, the following blog gives a good overview.
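The grouping step can be sketched as follows, assuming every term and every standardized label has already been turned into a numeric vector (an embedding). The tiny three-dimensional vectors, the labels and the 0.8 threshold are invented for illustration; the source does not state which embedding model or threshold was used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def standardize(term_vectors, label_vectors, threshold=0.8):
    """Assign each term its most similar standard label, or None below the threshold."""
    result = {}
    for term, vec in term_vectors.items():
        best_label, best_score = max(
            ((label, cosine(vec, lvec)) for label, lvec in label_vectors.items()),
            key=lambda pair: pair[1],
        )
        result[term] = best_label if best_score >= threshold else None
    return result


# Toy vectors, invented for illustration only. Real embeddings would come
# from a language model and typically have hundreds of dimensions.
label_vectors = {
    "coating": [1.0, 0.0, 0.0],
    "drainage": [0.0, 1.0, 0.0],
}
term_vectors = {
    "paint layer": [0.9, 0.1, 0.1],
    "protective finish": [0.8, 0.0, 0.3],
    "gutter": [0.1, 0.95, 0.0],
    "signage": [0.1, 0.1, 0.9],
}

mapping = standardize(term_vectors, label_vectors)
for term, label in mapping.items():
    print(f"{term} -> {label}")
```

Terms that stay below the threshold are left unlabeled (`None`) rather than forced into a group, which keeps them visible for manual review or for a more advanced technique.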

Finally, a more advanced technique was used for fields where the textual variety was very large and the results based on semantic similarity were still not accurate enough. Using generative AI, we were able to map this variety of values to a small number of standardized values. Large language models capture a great deal of linguistic context and can be used to identify meaningful categories of words or phrases. In our case, this helped identify a set of valuable group labels and descriptions for the functions that parts of infrastructure sites fulfil.
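One way to picture this mapping step is a prompt that offers the model a closed list of labels, followed by a check that the answer really is one of them. Everything in the sketch below is hypothetical: the labels, the prompt wording and the `fake_llm` stand-in, which replaces a real model call so the example runs offline. The source does not describe which model or prompts were used.

```python
from typing import Callable

# Hypothetical standardized function labels; the project's real labels are not in the source.
ALLOWED_LABELS = ["water retention", "traffic passage", "navigation", "other"]


def build_prompt(value: str, labels: list[str]) -> str:
    """Ask the model to pick exactly one label from a closed list."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the function described below into exactly one of these labels.\n"
        f"{options}\n"
        f"Description: {value}\n"
        "Answer with the label only."
    )


def classify(value: str, llm: Callable[[str], str],
             labels: list[str] = ALLOWED_LABELS) -> str:
    """Return the model's answer if it is an allowed label, otherwise 'other'."""
    answer = llm(build_prompt(value, labels)).strip().lower()
    return answer if answer in labels else "other"


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (an assumption, so the sketch runs offline)."""
    description = prompt.split("Description: ")[1].splitlines()[0].lower()
    if "ship" in description:
        return " Navigation\n"
    if "flood" in description:
        return "Water retention"
    return "I am not sure."


for text in ["Lets ships pass the weir", "Holds back flood water", "Lighting mast"]:
    print(f"{text} -> {classify(text, fake_llm)}")
```

Validating the answer against the closed list matters because a generative model can return free text; anything outside the list falls back to a catch-all label instead of silently creating a new category.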

The outcome.

Overall, using NLP techniques ranging from keyword selection to generative AI, we added standardized labels to the text fields used as input to the RCMCost model. The Department of Public Works is using these results to standardize values for efficient RCMCost modeling analyses and to define the standardized values for future use in their interface.

Key to the project's success was the close collaboration between the Department of Public Works' business experts and Cmotions' NLP experts, the iterative approach and the broad set of NLP techniques that were considered and ultimately applied.

Find out what we can do for your organization as well.