Phishing Emails Detection

This project aims to detect phishing emails on Android using federated learning. The application processes emails for feature extraction and feeds those features into a machine learning pipeline as dynamically created datasets for phishing email classification. It also supports training and retraining the model on new data and evaluating models. It includes a federated server that manages the model's weights.

Installation

Android App

To install and set up the Android application, follow these steps:

Clone the repository:

  git clone https://github.com/your-username/phishing-emails-detection.git

Install the app through Android Studio:
Open the cloned project in Android Studio.
Set up the debug keystore:

Open File -> Project Structure.
Navigate to SDK Location -> Debug keystore.
Set the path to the debug.keystore file in the repository's root directory.

Build and run the app:

Click Run -> Run 'app'.
Choose your device or an emulator.

Note: This app is currently in development mode and limited to test users.

For test access, contact [email protected].

Federated Server

To set up the federated server, follow these steps:

Prerequisites

Python 3.8
pip3

Create and activate a Python virtual environment:

  cd server
  python3.8 -m venv env_server
  source ./env_server/bin/activate

Install dependencies and run the server:

  pip install -r requirements.txt
  python server.py

Usage

Email Processing

The app can import emails from various sources and process them for feature extraction.

Import Emails

Gmail Import: Users can use their Google account to import emails directly from Gmail.

EML Import: Users can import individual .eml files.

MBOX Import: Users can import .mbox files containing multiple emails.

When importing, users are asked to label the emails as phishing or safe.
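The .eml and .mbox imports can be illustrated with Python's standard email and mailbox modules. This is a minimal sketch, not the app's importer. The function names load_eml and load_mbox are hypothetical. The sketch also assumes that one label, chosen at import time, applies to every message in an imported file.

```python
import email
import mailbox
from email import policy


def load_eml(path, is_phishing):
    """Parse one .eml file and pair it with the label chosen at import time."""
    with open(path, "rb") as fh:
        msg = email.message_from_binary_file(fh, policy=policy.default)
    return msg, int(is_phishing)  # 1 = phishing, 0 = safe


def load_mbox(path, is_phishing):
    """Parse every message in an .mbox file; one label covers the whole file."""
    box = mailbox.mbox(path, create=False)  # raises NoSuchMailboxError if missing
    try:
        return [(msg, int(is_phishing)) for msg in box]
    finally:
        box.close()
```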

Email Packaging

Email Packaging: Users can combine multiple emails into packages for processing.
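One way to picture a package is as a single .mbox file plus a small metadata record. The sketch below assumes that design. PackageMetadata and write_package are hypothetical names, and the fields are illustrative only. They are not the app's EmailPackageMetadata entity.

```python
import mailbox
from dataclasses import dataclass


@dataclass
class PackageMetadata:
    # Illustrative fields only; the app's EmailPackageMetadata entity may differ.
    file_name: str
    is_phishing: bool
    email_count: int


def write_package(messages, mbox_path, is_phishing):
    """Append the given messages to one .mbox file and describe the result."""
    box = mailbox.mbox(mbox_path)  # creates the file if it does not exist
    count = 0
    try:
        for msg in messages:
            box.add(msg)  # accepts Message objects, str or bytes
            count += 1
    finally:
        box.close()  # close() also flushes pending writes to disk
    return PackageMetadata(str(mbox_path), is_phishing, count)
```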

Machine Learning

The app provides several machine learning features: feature extraction, training, retraining, model evaluation, and phishing detection.

Feature Extraction

Feature Extraction: Users can extract features from emails using the app's integrated Python scripts.
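Extracted features become a dataset when each email is turned into one numeric row with its label appended. The sketch below assumes each finder can be treated as a function from email text to a number. build_dataset, to_csv and the finder mapping are hypothetical.

```python
import csv
import io


def build_dataset(emails, finders):
    """Turn (text, label) pairs into numeric feature rows plus a label column.

    `finders` maps a column name to a function str -> number (hypothetical interface).
    """
    columns = list(finders)  # fixed column order for every row
    rows = []
    for text, label in emails:
        row = [float(finders[name](text)) for name in columns]
        row.append(int(label))
        rows.append(row)
    return columns + ["label"], rows


def to_csv(header, rows):
    """Serialize the dataset as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Fixing the column order once keeps every row aligned with the same model weight.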

Training

Training: Users can train the model on the extracted features.
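The README states that classification uses logistic regression. The sketch below fits one from scratch with full-batch gradient descent on the log loss, in pure Python. It shows the technique only. The project's training scripts may use a library and different hyperparameters, and lr and epochs here are illustrative defaults.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with full-batch gradient descent on the log loss.

    X is a list of feature vectors (lists of floats), y a list of 0/1 labels
    (1 = phishing). Returns (weights, bias).
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: sigmoid(w.x + b) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the mean gradient.
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b
```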

Retraining

Retraining: Users can retrain the model with new data.
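Retraining can be pictured as starting from saved weights and updating them with the new samples, rather than starting from zero. This sketch assumes weights stored as JSON in the form {"weights": [...], "bias": b}, which is a hypothetical format. It applies per-sample (stochastic) gradient steps for logistic regression.

```python
import json
import math


def load_model(text):
    """Parse weights stored as {"weights": [...], "bias": b} (hypothetical format)."""
    data = json.loads(text)
    return [float(v) for v in data["weights"]], float(data["bias"])


def dump_model(weights, bias):
    return json.dumps({"weights": weights, "bias": bias})


def retrain(weights, bias, new_X, new_y, lr=0.1, epochs=50):
    """Continue from existing parameters with stochastic gradient steps on new data."""
    w, b = list(weights), bias
    for _ in range(epochs):
        for xi, yi in zip(new_X, new_y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # Numerically stable sigmoid.
            p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b
```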

Model Evaluation

Model Evaluation: Users can evaluate the performance of the trained model.
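The README does not say which metrics the evaluation reports. As an assumption, the sketch below computes the usual binary-classification metrics from a confusion matrix, with phishing (1) as the positive class and labels expected to be 0 or 1.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1, treating phishing (1) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
    }
```

Recall matters here because a missed phishing email is usually costlier than a false alarm.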

Phishing Detection

Phishing Detection: Users can use the selected model to classify a single email as phishing or safe using logistic regression.
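With logistic regression, classifying one email reduces to a weighted sum of its features passed through the sigmoid and compared against a threshold. This is a minimal sketch. The 0.5 threshold and the function names are assumptions, not values taken from the app.

```python
import math


def phishing_probability(features, weights, bias):
    """Logistic-regression score sigmoid(w.x + b) for one email's feature vector."""
    if len(features) != len(weights):
        raise ValueError("feature vector and weights have different lengths")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(features, weights, bias, threshold=0.5):
    """Label one email as 'phishing' or 'safe'; also return the probability."""
    p = phishing_probability(features, weights, bias)
    return ("phishing" if p >= threshold else "safe"), p
```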

Federated Server Usage

The federated server handles weight management for federated learning.

Endpoints

Upload Weights: Users can upload the local model weights.

Download Global Weights: Users can download the globally averaged weights.
Check Server Status: Users can ping the server to check its status.
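"Globally averaged weights" can be computed in more than one way. The sketch below assumes FedAvg-style averaging, where each client's vector counts in proportion to its number of training samples. The server may instead use a plain mean, which is the special case where every client reports the same count. federated_average is a hypothetical name.

```python
def federated_average(updates):
    """Sample-weighted average of client weight vectors (FedAvg-style).

    updates: list of (weights, num_samples) pairs, one per client.
    """
    if not updates:
        raise ValueError("no client updates to average")
    dim = len(updates[0][0])
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for j, wj in enumerate(weights):
            averaged[j] += wj * n / total  # each client's share is n / total
    return averaged
```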

Features

Google Login: Users can log in using their Google account.
Logout: Users can log out from their account.
Integration with Gmail API: Emails are imported directly through the Gmail API.
Email Import: Users can import emails from Gmail, .eml, and .mbox files.
Email Labeling: Users can label imported emails as phishing or safe.
Email Packaging: Combine multiple emails into packages for processing.
Feature Extraction: Extract features from emails using integrated Python scripts.
Machine Learning:
- Training: Train the model on extracted features.
- Retraining: Retrain the model with new data.
- Model Evaluation: Evaluate the performance of trained models.
Phishing Detection: Classify individual emails as phishing or safe using logistic regression.
Federated Learning:
- Upload Weights: Upload local model weights to the federated server.
- Download Weights: Download globally averaged weights from the server.
- Server Status: Check the operational status of the federated server.
- Set Federated Server IP: Dynamically set the IP address of the federated server.
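Setting the server IP at runtime implies validating user input before building request URLs. The app itself runs on Android, so this Python sketch only illustrates the logic. The "/status" route is a placeholder, not the server's documented endpoint, and the port must be supplied because the README does not state one.

```python
import ipaddress
import urllib.request


def server_base_url(ip, port):
    """Validate a user-entered IPv4/IPv6 address and port; return an http:// base URL."""
    addr = ipaddress.ip_address(ip.strip())  # raises ValueError on malformed input
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    host = f"[{addr}]" if addr.version == 6 else str(addr)  # IPv6 hosts need brackets
    return f"http://{host}:{port}"


def is_server_up(base_url, path="/status", timeout=3.0):
    """Return True if GET base_url + path answers with a 2xx status.

    The "/status" path is a placeholder, not the server's documented route.
    """
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False
```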

Architecture

The project is structured to separate concerns and ensure modularity. Below is an overview of the main directories and their purposes:

Key Components:

Data: Contains data-related classes, repositories, and entities for handling email data.
- Local: Local data sources and caches.
- Remote: Manages remote data sources, such as API calls.
- Repositories: Interfaces for data access and management.
- Auth: Handles user authentication.
- DB: Database configurations and access.
  - Entity: Entity classes representing different data models such as EmailFull, EmailMinimal, EmailPackageMetadata, etc.
Python: Contains Python scripts and modules for machine learning and data processing.
- DataProcessing: Scripts for processing email data.
- EvaluateModel: Scripts for evaluating models.
- Prediction: Scripts for making predictions.
- Retraining: Scripts for retraining models.
- Training: Scripts for training models.
- WeightManager: Manages model weights.
- PythonSingleton: Singleton class that starts and holds the Python interpreter.
DI: Dependency injection modules.
- AppModule: Provides application-wide dependencies.
- DatabaseModule: Provides database-related dependencies.
- NetworkModule: Provides network-related dependencies.
UI: User interface components.
- Base: Base classes for UI components.
- component: Specific UI components for authentication, email detection, machine learning, and settings.
- App: Main application class.
- MainActivity: Main activity of the application.
- Utils: Utility classes and functions.

Python Component

Feature Finders and Detection Strategy

Our phishing detection uses several feature finders, each responsible for extracting specific email elements that are commonly used in phishing attempts:

HTMLFormFinder: Identifies HTML forms within emails, a common phishing vector to solicit user information.
IFrameFinder: Detects the use of IFrames, potentially embedding malicious content invisibly.
FlashFinder: Searches for Flash content links, which could execute harmful scripts.
AttachmentFinder: Counts email attachments, which may contain malicious payloads.
HTMLContentFinder: Looks for specific HTML content indicative of phishing.
URLsFinder: Extracts and evaluates URLs found within emails for malicious links.
ExternalResourcesFinder: Identifies external resources linked within emails that could be harmful.
JavascriptFinder: Detects JavaScript, which can be used in phishing for malicious activities.
CssFinder: Searches for custom CSS that might be used to disguise phishing attempts.
IPsInURLs: Checks for IP addresses in URLs, a technique used to bypass domain name suspicion.
AtInURLs: Identifies '@' symbols in URLs, which can be a sign of deceptive links.
EncodingFinder: Analyzes the content encoding for signs of obfuscation or unusual patterns.

Acknowledgments and References

This project builds upon and extends the work found at MachineLearningPhishing by Diego Ocampo.

Data Sources

The data used for training the phishing detection model were sourced from two main repositories, which provided a rich dataset of phishing emails:

Phishing Pot Dataset by rf-peixoto (.eml files converted to mbox using scripts in this repository)
Phishing Dataset by jose at monkey.org (downloaded mbox files)

Contributing

If you want to contribute to this project, please follow these guidelines:

Fork the repository.
Create a new branch.
Make your changes and commit them.
Push your changes to your fork.
Create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

martinszuc/phishing-emails-detection

Folders and files

Latest commit

History

Repository files navigation

Phishing Emails Detection

Table of Contents

Installation

Android App

Federated Server

Prerequisites

Usage

Email Processing

Import Emails

Email Packaging

Machine Learning

Feature Extraction

Training

Retraining

Model Evaluation

Phishing Detection

Federated Server Usage

Endpoints

Features

Architecture

Key Components:

Python Component

Feature Finders and Detection Strategy

Acknowledgments and References

Data Sources

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages