
Cost-Weighted TF-IDF: A Novel Approach for Measuring Highway Project Similarity Based on Pay Items’ Cost Composition and Term Frequency

Material type: Article
Description: 1-15 p
ISSN: 0733-9364
In: ASCE: Journal of Construction Engineering and Management
Summary: State highway agencies (SHAs) often need to cluster or bundle projects according to scope similarity for various construction management tasks, including historical data-driven time and cost estimation and project bundling. Conventionally, SHAs categorize similar projects into work types based on subjective judgment about the similarity of major pay items. The literature offers a few quantitative methods for determining project similarity, but most rely on a single source of information, either the cost contribution of pay items or the keywords in pay item descriptions. This paper presents the first attempt to integrate multiple information sources for project similarity measurement. It proposes a novel cost-weighted term frequency-inverse document frequency (CW-TF-IDF) method that incorporates the cost information of pay items into the traditional TF-IDF word embedding method to measure project similarity. The effectiveness of the proposed method in supporting project clustering and bundling was tested using historical bid data collected from an SHA. The findings showed that the CW-TF-IDF method significantly improves project clustering performance compared to the most recent state-of-the-art method. The CW-TF-IDF method also showed superior performance in project bundling, yielding a cosine similarity above 0.9 for most of the bundled projects in the testing data. The proposed method is expected to help SHAs accurately identify similar projects and ultimately improve their project management effectiveness.
Holdings
Item type: Articles
Current library: Periodical Section
Vol info: Vol. 149, No. 8 (August 2023)
Status: Available

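The summary describes CW-TF-IDF only at a high level, and this record does not give the paper's actual weighting formula. The sketch below is therefore a hypothetical illustration, not the authors' method. It assumes each project is a list of (pay item description, bid cost) pairs, scales each pay item's term counts by that item's share of total project cost, multiplies the result by a smoothed IDF computed over all projects, and compares projects with cosine similarity. The function names (`tokenize`, `cost_weighted_tf`, `cw_tfidf_vectors`, `cosine`), the smoothing choice, and the toy data are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical input format: a project is a list of (pay item description, bid cost) pairs.
Project = list[tuple[str, float]]


def tokenize(text: str) -> list[str]:
    """Lowercase a pay item description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cost_weighted_tf(project: Project) -> dict[str, float]:
    """Assumed CW-TF: term counts of each pay item, scaled by that item's share of project cost."""
    total = sum(cost for _, cost in project)
    tf: dict[str, float] = defaultdict(float)
    for description, cost in project:
        share = cost / total if total else 0.0
        for term, count in Counter(tokenize(description)).items():
            tf[term] += share * count
    return dict(tf)


def cw_tfidf_vectors(projects: list[Project]) -> list[dict[str, float]]:
    """Multiply each project's cost-weighted TF by a smoothed IDF over all projects."""
    tfs = [cost_weighted_tf(p) for p in projects]
    n = len(projects)
    # Each tf dict has unique keys, so this counts the projects containing each term.
    df = Counter(term for tf in tfs for term in tf)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: w * idf[t] for t, w in tf.items()} for tf in tfs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data (invented for illustration): two paving projects and one bridge project.
projects = [
    [("asphalt concrete surface course", 900_000.0), ("traffic control", 100_000.0)],
    [("asphalt concrete surface course", 800_000.0), ("pavement marking", 50_000.0)],
    [("bridge deck concrete", 700_000.0), ("reinforcing steel", 300_000.0)],
]
vecs = cw_tfidf_vectors(projects)
print(f"paving vs paving: {cosine(vecs[0], vecs[1]):.3f}")
print(f"paving vs bridge: {cosine(vecs[0], vecs[2]):.3f}")
```

Under this assumed weighting, a low-cost item such as traffic control contributes little to a project's vector even though its description shares the same vocabulary as other projects, which is one way cost information can complement plain term frequency.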