Testing Topological NLP Transformers on Text Message Data to Determine the Best Online Phishing Detection Technique
Keywords:
Dependency parsing, Phishing, Topological transformer processing, Transfer learning.

Abstract
This study uses topological sentence-transformer techniques to create an optimal classification model for online
SMS spam detection. The work is motivated by the increasingly complex and disruptive actions of malicious actors. We
offer a workable, lightweight way to combine sklearn functionality with pre-trained NLP repository models. The study
design presents a user-extensible spam-SMS solution and replicates the spaCy pipeline-component architecture in a
downstream sklearn pipeline implementation. We apply linguistic NLP transformer approaches to short-sentence NLP
datasets and access HuggingFace large-text models (RoBERTa-base) via spaCy. Using a standard sklearn pipeline
architecture, we iteratively retest models and compare their F1-scores. An optimal F1-score of 0.938 is obtained by
applying spaCy transformer modeling; this result is comparable to the output of modern BERT/SBERT 'black box'
prediction models. Using semantically similar paraphrase/sentence-transformer techniques, this study presents a
lightweight, user-interpretable, standardized, predictive SMS spam detection model that produces the best F1-scores for an
SMS dataset. It also produces significant F1-scores on a Twitter assessment set, suggesting possible real-world
applicability.
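The overall pattern described above (a sentence-embedding step feeding a downstream sklearn pipeline, scored with F1) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the spaCy/RoBERTa embedding component is stood in for by a hypothetical `ToyEmbedder` (a simple token-hash bag), since a real pipeline would instead obtain dense vectors from a loaded spaCy transformer model. The texts and labels are invented examples.

```python
# Sketch of: embeddings -> sklearn Pipeline -> F1 evaluation.
# ToyEmbedder is a hypothetical stand-in for a spaCy transformer
# component; it maps each text to a fixed-size dense vector.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline

class ToyEmbedder(BaseEstimator, TransformerMixin):
    """Stand-in embedder: a token-hash bag-of-words vector (NOT RoBERTa)."""
    def __init__(self, dim=64):
        self.dim = dim

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        out = np.zeros((len(X), self.dim))
        for i, text in enumerate(X):
            for tok in text.lower().split():
                out[i, hash(tok) % self.dim] += 1.0
        return out

# Invented toy data: 1 = spam, 0 = ham.
train_texts = ["win a free prize now", "call now to claim cash",
               "see you at lunch", "meeting moved to 3pm"]
train_labels = [1, 1, 0, 0]

# Downstream sklearn pipeline mirroring the paper's architecture:
# an embedding component followed by a standard classifier.
pipe = Pipeline([("embed", ToyEmbedder()),
                 ("clf", LogisticRegression())])
pipe.fit(train_texts, train_labels)

# Score held-out texts with F1, as the study does when comparing models.
test_texts = ["free cash prize", "lunch at 3pm"]
preds = pipe.predict(test_texts)
score = f1_score([1, 0], preds)
print(preds, score)
```

In the actual study design, the `embed` step would wrap spaCy's transformer pipeline component, so the same downstream sklearn pipeline can be retested iteratively against different upstream models.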