Phishing Address Detection

Paper Link: https://dl.acm.org/doi/abs/10.1145/3650400.3650499

Background & Motivation

The rapid growth and adoption of blockchain technology, particularly Ethereum, have paved the way for decentralized finance (DeFi) applications. These applications enable peer-to-peer transactions and financial services without the need for traditional intermediaries, offering users increased financial sovereignty and efficiency. However, this technological advancement has also attracted malicious actors who exploit the system for phishing scams, causing substantial financial losses for users.

Phishing scams on the Ethereum platform are sophisticated and adaptive, often employing tactics such as giveaway scams and fraudulent investment schemes to deceive users into interacting with malicious accounts. These scams can involve complex smart contracts and transaction patterns that are difficult to detect using traditional anti-phishing methods. The transparent and immutable nature of Ethereum's transaction records, while a boon for security and trust, also presents challenges for scam detection, as attackers continually evolve their strategies to evade existing security measures.

Given the evolving threat landscape, there is an urgent need for advanced detection methods that can identify and mitigate phishing attacks on the Ethereum network. Such methods must be capable of analyzing the intricate patterns of transactions and user interactions within the blockchain ecosystem, taking into account both the spatial relationships between transactions and the temporal sequences in which they occur.

Methodology

The Spatio-Temporal Fusion Network (STFN) detects phishing scams on the Ethereum network by analyzing both the spatial and temporal aspects of transactions. The methodology consists of seven steps:

  1. Data Collection: The initial step involves gathering comprehensive Ethereum transaction data from Etherscan and GoPlus Security. This data includes transaction details such as sender and recipient addresses, transaction amounts, timestamps, and other relevant attributes. This dataset forms the foundation for subsequent analysis.

  2. Transaction Subgraph Construction: Each Ethereum address is treated as a node in a graph, and transactions between addresses are represented as edges. This constructs a transaction subgraph for each address, capturing the spatial relationships between transactions and the addresses involved. These subgraphs are dynamic and reflect the actual flow of funds within the Ethereum network.

  3. Temporal Sequence Formation: Concurrently, transaction sequences are formed for each Externally Owned Account (EOA). Transactions are ordered chronologically based on their timestamps, creating a timeline that reflects the temporal progression of an account's activity.

  4. Feature Extraction: The spatial features are extracted using a Graph Convolutional Network (GCN) encoder. The GCN processes the transaction subgraphs to identify patterns such as transaction direction, amount, and frequency. These features provide insights into the structure and behavior of the transactions around each address.

    Similarly, the temporal features are captured using a BERT encoder. The BERT model, pre-trained on Ethereum transaction sequences, is fine-tuned to generate representations that are sensitive to the order and timing of transactions. This allows the model to identify temporal patterns indicative of phishing activities.

  5. Feature Fusion: The spatial and temporal features extracted by the GCN and BERT encoders are fused to create a comprehensive representation of each transaction and address. This fusion process is crucial as it allows the model to consider both the 'who' and 'how' of transactions (spatial) as well as the 'when' (temporal).

  6. Machine Learning Classification: The fused features are then used as input for a machine learning algorithm, specifically a Multilayer Perceptron (MLP), to classify Ethereum addresses into phishing and non-phishing categories. The MLP learns to distinguish between benign and malicious transaction patterns based on the integrated spatial-temporal features.

  7. Evaluation and Optimization: The performance of STFN is evaluated using standard metrics such as Area Under the Curve (AUC), Precision, Recall, and F1-Score. The model is optimized through techniques such as cross-validation to ensure its robustness and generalizability.
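
Steps 2 and 3 above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Tx` record, the function names `build_subgraph` and `transaction_sequence`, the one-hop neighborhood, and the toy addresses are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple

class Tx(NamedTuple):
    sender: str
    recipient: str
    amount: float     # value transferred (hypothetical unit: ETH)
    timestamp: int    # Unix time of the transaction

def build_subgraph(txs, center, hops=1):
    """Return (nodes, edges) within `hops` of `center`.

    Reachability ignores edge direction; the returned edges keep it.
    The hop count is an assumption, not a value taken from the paper.
    """
    neighbors = defaultdict(set)
    for tx in txs:
        neighbors[tx.sender].add(tx.recipient)
        neighbors[tx.recipient].add(tx.sender)
    nodes = {center}
    frontier = {center}
    for _ in range(hops):
        frontier = {n for f in frontier for n in neighbors[f]} - nodes
        nodes |= frontier
    edges = [tx for tx in txs if tx.sender in nodes and tx.recipient in nodes]
    return nodes, edges

def transaction_sequence(txs, address):
    """Chronological list of transactions that `address` sent or received."""
    own = [tx for tx in txs if address in (tx.sender, tx.recipient)]
    return sorted(own, key=lambda tx: tx.timestamp)

# Toy data: addresses are placeholders, not real accounts.
txs = [
    Tx("0xA", "0xB", 1.0, 300),
    Tx("0xC", "0xA", 0.5, 100),
    Tx("0xB", "0xD", 2.0, 200),
    Tx("0xE", "0xF", 9.0, 150),
]
nodes, edges = build_subgraph(txs, "0xA", hops=1)
seq = transaction_sequence(txs, "0xA")
```

In this toy example the subgraph of `0xA` contains `0xA`, `0xB`, and `0xC` with the two transactions among them, and the sequence orders the account's own transactions by timestamp. In the pipeline described above, the subgraph would feed the GCN encoder and the sequence would feed the BERT encoder.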

This methodology combines the strengths of graph analysis and sequence modeling to identify phishing addresses on the Ethereum network. The integration of spatial and temporal features within STFN is a novel contribution to the field of blockchain security.
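
To make the fusion and classification steps concrete, the sketch below concatenates a spatial and a temporal embedding and passes the result through a small MLP. The summary above does not state which fusion operator STFN uses, so concatenation is an assumption here, as are the embedding sizes, the single hidden layer, the random untrained weights, and the 0.5 decision threshold. All function names are hypothetical.

```python
import math
import random

def fuse(spatial, temporal):
    """Fuse two embeddings by concatenation (an assumed fusion operator)."""
    return list(spatial) + list(temporal)

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mlp_phishing_score(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output in (0, 1)."""
    hidden = [max(0.0, h) for h in dense(x, w1, b1)]
    logit = dense(hidden, [w2], [b2])[0]
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
spatial = [0.2, -0.1, 0.7]   # stand-in for a GCN subgraph embedding
temporal = [0.5, 0.3]        # stand-in for a BERT sequence embedding
fused = fuse(spatial, temporal)

# Random, untrained weights: the score below is illustrative only.
hidden_size = 4
w1 = [[rng.uniform(-1, 1) for _ in fused] for _ in range(hidden_size)]
b1 = [0.0] * hidden_size
w2 = [rng.uniform(-1, 1) for _ in range(hidden_size)]
b2 = 0.0

score = mlp_phishing_score(fused, w1, b1, w2, b2)
is_phishing = score >= 0.5   # threshold chosen for illustration
```

In a trained model the weights would be learned from labeled phishing and non-phishing addresses, and the encoders would supply the two embeddings instead of the hard-coded lists.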

Results

Results of STFN for Ethereum Phishing Detection

The Spatio-Temporal Fusion Network (STFN) was evaluated in a series of experiments measuring how well it detects phishing scams on the Ethereum network. Performance was assessed with four metrics: Area Under the Curve (AUC), Precision, Recall, and F1-Score. AUC gives an overall measure of the model's ability to distinguish phishing from legitimate addresses. Precision reflects how often a flagged address is truly phishing (few false positives), Recall reflects how many phishing addresses are caught (few false negatives), and F1-Score balances the two.
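
As an illustration of what AUC measures, it can be computed as the fraction of (phishing, legitimate) pairs in which the phishing example receives the higher score, with ties counted as half. The sketch below uses this pairwise definition on made-up labels and scores; the function name `auc` and the data are assumptions for the example, not values from the paper.

```python
def auc(labels, scores):
    """Pairwise AUC: probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up data: 1 = phishing, 0 = legitimate.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
toy_auc = auc(labels, scores)   # 7 of the 9 pairs are ranked correctly
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; library implementations typically use a sort-based computation instead.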

STFN was compared against several state-of-the-art baseline methods. These baselines included traditional machine learning approaches using handcrafted features, as well as graph-based embedding methods such as DeepWalk, Node2Vec, and Trans2Vec. STFN was also compared with more recent methods: Graph Attention Networks (GAT), GraphSAGE, and Temporal Transaction Aggregation Graph Network (TTAGN).

STFN achieved an AUC of 93.26%, indicating strong discrimination between phishing and non-phishing addresses. This score is reported as significantly higher than the AUC scores of the baseline methods. Precision was 91.08%, meaning the model rarely misclassifies legitimate addresses as phishing. Recall was 94.53%, meaning it misses few actual phishing cases. The F1-Score, the harmonic mean of Precision and Recall, was 92.77%, confirming that the model balances the two error types.
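
The reported F1-Score can be checked directly from the reported Precision and Recall, since F1 is their harmonic mean. The short sketch below does this and also shows how the three metrics follow from confusion-matrix counts. The helper names are illustrative, and only the Precision and Recall values are taken from the text above.

```python
def precision(tp, fp):
    """Share of flagged addresses that are truly phishing."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of phishing addresses that were flagged."""
    return tp / (tp + fn)

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Values reported for STFN, expressed as fractions.
reported_f1 = f1(0.9108, 0.9453)
print(f"{reported_f1:.4f}")   # prints 0.9277
```

The result matches the reported F1-Score of 92.77%, so the three figures are mutually consistent.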

Conclusion

To conclude, the results of the experiments conducted on STFN indicate that it is a highly effective tool for detecting phishing scams on the Ethereum network. The model's integration of spatial and temporal features, combined with its ability to outperform several state-of-the-art baselines, positions it as a leading solution in the field of blockchain security and phishing scam detection.
