EmailSpamDetector: Spam Detection with Bidirectional LSTMs

EmailSpamDetector uses a Bidirectional LSTM network to classify emails as spam or legitimate (ham). By processing email text in both forward and backward directions simultaneously, the model captures richer contextual signals than a unidirectional LSTM — improving detection of obfuscated spam patterns.

Why Bidirectional LSTMs?

Spam emails often insert legitimate-looking words at the start or end to fool unidirectional models. A Bi-LSTM processes the entire sequence from both ends, so a suspicious phrase in the middle of a message is still informed by both preceding and following context.

Pipeline

Preprocessing: tokenise email text, remove stop words, apply padding/truncation to a fixed sequence length.
Embedding: word embeddings (trainable or pre-trained GloVe) map tokens to dense vectors.
Bi-LSTM: two LSTM layers (forward + backward) whose outputs are concatenated.
Classification: dense layer with sigmoid activation for binary spam/ham prediction.

Results

Evaluated on public email spam datasets (SpamAssassin, Enron). The Bi-LSTM achieves high precision and recall, with strong resistance to adversarial word-order manipulation common in spam.

Technology

Python, Keras (TensorFlow), Jupyter Notebooks. Pre-processing with NLTK.

Alessio Borgi

Why Bidirectional LSTMs?

Pipeline

Results

Technology