Operational Research in Engineering Sciences

CODE CLONE DETECTION WITH SELF-SUPERVISION ON DUAL GRAPHS

Chunguang Li
Faculty of Engineering, Rajamangala University of Technology Krungthep, Bangkok, Thailand, 10120
Adisorn Sirikham
Faculty of Engineering, Rajamangala University of Technology Krungthep, Bangkok, Thailand, 10120
Jessada Konpang
Faculty of Engineering, Rajamangala University of Technology Krungthep, Bangkok, Thailand, 10120
Yan Wang
Jiangsu College of Finance and Accounting, Lianyungang, China, 222061

Abstract

Code clone detection underpins a wide range of maintenance tasks, from automated refactoring to real-time plagiarism policing, yet single-view methods that rely on raw tokens, Abstract Syntax Trees (ASTs), or Control Flow Graphs still struggle to detect Type-4 (semantic) clones. We present DG Clone, a dual-graph self-supervised framework that couples a textual call-dependency graph with an AST-derived structural graph and fuses them through a lightweight cross-graph attention module implemented in PyTorch 2.2 and PyTorch Geometric 2.5. The textual graph excels at capturing lexical context, while the AST graph models hierarchical syntax. Their fusion recovers semantic equivalence that each view alone misses, outperforming token-sequence models (e.g., GraphCodeBERT) and single-graph GNNs. Training employs a graph-aware triplet loss that obviates manual labels by dynamically constructing positive/negative triplets from unlabelled repositories. DG Clone boosts F1 on BigCloneBench from 90.3% to 98.4% and on Google Code Jam from 81.7% to 89.8%. It lifts MAP by 6.8 pp over Tree-Based CNNs and by 5.4 pp over GraphCodeBERT, while cutting inference latency in an online judge by 31% in a Python implementation. These findings demonstrate that integrating complementary graph views affords a label-efficient and practically viable route to uncovering subtle semantic similarities in source code.

Keywords
Code Clone Detection, Self-Supervised Learning, Graph Neural Network.
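
The triplet objective mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not state the distance function, the margin, or how triplets are mined, so the cosine distance, the margin of 0.5, and the helper names (cosine_distance, triplet_loss, hardest_negative) below are assumptions chosen for illustration and are not taken from DG Clone. The embeddings stand in for whatever vectors the fused graph encoder would produce.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Treat a zero vector as maximally dissimilar (assumption).
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: pull the positive closer than the negative.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is. The margin value is an assumption.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def hardest_negative(anchor, candidates):
    """Pick the candidate closest to the anchor (a common mining heuristic,
    assumed here; the abstract does not describe the mining strategy)."""
    return min(candidates, key=lambda c: cosine_distance(anchor, c))


if __name__ == "__main__":
    anchor = [1.0, 0.0]      # embedding of a code fragment
    positive = [1.0, 0.0]    # embedding of an assumed clone of the anchor
    pool = [[0.0, 1.0], [1.0, 1.0]]  # embeddings of unrelated fragments
    negative = hardest_negative(anchor, pool)
    print(triplet_loss(anchor, positive, negative))
```

In a self-supervised setting the positive would typically come from a semantics-preserving transformation of the anchor rather than from a manual label, which is one plausible reading of "dynamically constructing positive/negative triplets".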
