It extracts every article along with its heading, stores it in a new `.txt` file, and then runs a sentiment analysis on each article, producing the fields below (a sketch of some of these metrics follows the list):
- "url"
- "Positive Sentences"
- "Negative Sentences"
- "Polarity"
- "Subjectivity",
- "Average Sentence Length"
- "Complex Word Percentage",
- "Fog Index"
- "Average WordLength"
- "Complex Word Count"
- "Word Count""Syllable Count"
- "Personal Pronouns"
Dependencies (a sample requirements.txt follows the list):
- BeautifulSoup
- Requests
- Pandas
- os
- nltk
- re
- string
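Since `os`, `re`, and `string` ship with the Python standard library, only the third-party packages belong in `requirements.txt`. A plausible sample (the `openpyxl` entry is an assumption; pandas needs it to read `.xlsx` files):

```
beautifulsoup4
requests
pandas
nltk
openpyxl
```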
Installation:
- Clone this repository to your local machine.
- Install the required dependencies by running `pip install -r requirements.txt` (a note on nltk data follows).
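If the script uses nltk's tokenizers (an assumption; the snippets below only show file handling), the punkt tokenizer data needs a one-time download:

```python
import nltk

nltk.download("punkt")  # one-time download of the sentence/word tokenizer models
```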
Required files and folders (an illustrative layout follows the list):
- stopwords folder
- dict_negative
- dict_positive
- inputfile.xlsx
- an output folder for storing the text file created for every scraped article
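Assuming the file names used in the snippets below, the project layout might look like this (names are illustrative):

```
project/
├── StopWords/           # folder of stop-word lists, one or more files
├── positivewords.txt    # positive dictionary
├── negativewords.txt    # negative dictionary
├── input.xlsx           # URLs to scrape
└── output/              # one .txt file per scraped article
```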
FOLLOW THESE STEPS
- Change the paths of the stopwords folder and the dictionary files according to where your data lives:
```python
import os

stopw = set()  # module-level set that collects every stop word

def initialization():
    # Paths: initialize them according to the location of your data.
    stopword_folder = r"StopWords"            # a folder, not a file
    dictionary_postive = r"positivewords.txt"
    dictionary_negative = r"negativewords.txt"
    for filename in os.listdir(stopword_folder):
        with open(os.path.join(stopword_folder, filename), 'r') as file:
            stopw.update([word.lower() for word in file.read().splitlines()])
```
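The snippet above only loads the stop words. A minimal sketch (the function name and the filtering rule are assumptions) of loading the positive and negative dictionaries into sets the same way, excluding stop words from scoring:

```python
def load_dictionary(path):
    # Keep only words that are not already in the stop-word set.
    with open(path, 'r') as f:
        return {w.lower() for w in f.read().splitlines() if w.lower() not in stopw}

positive_words = load_dictionary(r"positivewords.txt")
negative_words = load_dictionary(r"negativewords.txt")
```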
- Change the path of the input file and provide an Excel file (`input.xlsx`) containing the URLs:
```python
import pandas as pd

def file_open():
    filepath = r"input.xlsx"
    df = pd.read_excel(filepath)  # reading .xlsx requires openpyxl
    dataset = list()              # populated from df in the rest of the script (not shown)
```
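The scraping step itself is not shown in the README; a minimal, hypothetical sketch of saving each article's heading and body to a text file with Requests and BeautifulSoup (the tag selectors and the function name are assumptions, not the script's actual logic):

```python
import requests
from bs4 import BeautifulSoup

def scrape_article(url, out_path):
    # Fetch the page, then pull out the title and the paragraph text.
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    body = "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(title + "\n" + body)
```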