Onkar-2803/Text_Classification

Text_Classification

Text classification is one of the most widely used natural language processing (NLP) applications in business. Also known as text tagging or text categorization, it is the process of sorting text into organized groups. Using NLP, a text classifier can automatically analyze a piece of text and assign it one or more pre-defined tags or categories based on its content. The code in this repository classifies news articles into 5 categories: Business, Entertainment, Politics, Sport, and Tech.

Text_Classification(1): Feature Engineering
Text_Classification(2): Model Training using Support Vector Machine
Text_Classification(3): Dimensionality Reduction Plots
Text_Classification(4): Model Interpretation
Text_Classification(5): Sample Articles
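
The first two steps above (feature engineering and SVM training) can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: it assumes scikit-learn, uses TF-IDF as the feature representation and a linear SVM as the classifier, and trains on a tiny invented corpus rather than the BBC dataset. The vectorizer settings are illustrative defaults, not tuned values.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus (invented for illustration, not taken from the BBC dataset).
texts = [
    "Shares rose as the company reported record quarterly profits",
    "The film won best picture at the awards ceremony",
    "The prime minister defended the new election policy in parliament",
    "The striker scored twice to win the cup final",
    "The new smartphone ships with a faster processor and software update",
]
labels = ["business", "entertainment", "politics", "sport", "tech"]

# Step 1 (feature engineering): turn raw text into TF-IDF vectors.
# Step 2 (model training): fit a linear support vector machine on those vectors.
model = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    LinearSVC(),
)
model.fit(texts, labels)

# Classify an unseen headline; the result is one of the five labels.
prediction = model.predict(["The goalkeeper saved a penalty in the final"])[0]
print(prediction)
```

Wrapping both steps in one pipeline keeps the vectorizer's vocabulary tied to the training data, so the same transformation is applied automatically at prediction time.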

Dataset: http://mlg.ucd.ie/datasets/bbc.html
It consists of 2,225 documents from the BBC news website, corresponding to stories in five topical areas from 2004 to 2005.
The download contains five folders (one for each category). Each folder holds one .txt file per news article, containing the article body as raw text. A ready-made raw dataset that combines the articles from all five folders is stored in Text_Classification/Pickles/News_dataset.pickle
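
The folder layout described above can be read into a single list of records along these lines. This is a sketch under assumptions: the function name `load_articles` and the record keys are hypothetical, and the schema of the stored pickle may differ from what is built here.

```python
from pathlib import Path


def load_articles(root):
    """Return one record per .txt file found in root/<category>/."""
    records = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # skip stray files next to the category folders
        for path in sorted(category_dir.glob("*.txt")):
            # errors="replace" is a defensive assumption in case a file
            # is not valid UTF-8; adjust the encoding if needed.
            text = path.read_text(encoding="utf-8", errors="replace")
            records.append(
                {
                    "category": category_dir.name,  # folder name is the label
                    "file_name": path.name,
                    "content": text,
                }
            )
    return records
```

For example, `load_articles("bbc")` would return one dictionary per article, where `"bbc"` is a placeholder for wherever the download was extracted.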
