INDA Project: Web crawler

This project aims to build a web crawler that maps a website by finding, on each page it visits, the hyperlinks that point to other pages of the same domain. The crawler itself will be written in Go; support tools will most likely be written in Python.

Core features

The crawler should be able to:

request a webpage using a hyperlink,
parse webpage content,
locate hyperlinks on the page,
follow the found hyperlinks to reach further pages,
catalog pages visited.
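
The features above could fit together roughly as in the following minimal sketch. It is an illustration under assumptions, not the project's implementation: the names extractLinks and crawl, the page limit, and the start address http://localhost:8080/ are all hypothetical, and links are found with a naive regular expression where a real crawler would use a proper HTML parser.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"regexp"
)

// hrefPattern is a deliberately naive matcher for href="..." attributes.
// Assumption for this sketch only; an HTML parser would be more robust.
var hrefPattern = regexp.MustCompile(`href\s*=\s*["']([^"']+)["']`)

// extractLinks returns the absolute URLs of all links in body that point
// to the same host as pageURL. Fragments (#...) are dropped.
func extractLinks(pageURL, body string) []string {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil
	}
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue
		}
		abs := base.ResolveReference(ref)
		abs.Fragment = ""
		if (abs.Scheme == "http" || abs.Scheme == "https") && abs.Host == base.Host {
			links = append(links, abs.String())
		}
	}
	return links
}

// crawl visits pages breadth-first, starting at start, and returns the
// catalog of successfully fetched pages. It stops after maxPages pages.
func crawl(start string, maxPages int) []string {
	visited := map[string]bool{start: true}
	queue := []string{start}
	var catalog []string
	for len(queue) > 0 && len(catalog) < maxPages {
		page := queue[0]
		queue = queue[1:]

		resp, err := http.Get(page) // request the page
		if err != nil {
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			continue
		}
		catalog = append(catalog, page) // catalog the visit

		// Parse the content, locate links, and queue the unseen ones.
		for _, link := range extractLinks(page, string(body)) {
			if !visited[link] {
				visited[link] = true
				queue = append(queue, link)
			}
		}
	}
	return catalog
}

func main() {
	// Hypothetical start address; replace with the site to map.
	for _, page := range crawl("http://localhost:8080/", 50) {
		fmt.Println(page)
	}
}
```

The visited set keeps the crawler from requesting a page twice, and the host comparison keeps it inside the domain it started on.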

Installation instructions

Will be added later.

Testing process

For the most part, the crawler will be tested in small, controlled environments, most likely a set of interlinked, text-based pages served from localhost addresses.
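
One possible way to set up such an environment is Go's standard net/http/httptest package, which starts a throwaway server on a localhost address. The sketch below is only an assumption about how the test site might look: the three pages, their contents, and the helper names newTestSite and fetch are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newTestSite starts a local server with three small, interlinked pages.
// The caller is responsible for calling Close on the returned server.
func newTestSite() *httptest.Server {
	pages := map[string]string{
		"/":       `<a href="/a.html">a</a> <a href="/b.html">b</a>`,
		"/a.html": `<a href="/b.html">b</a>`,
		"/b.html": `<a href="/">home</a>`,
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, ok := pages[r.URL.Path]
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		fmt.Fprint(w, body)
	})
	return httptest.NewServer(mux)
}

// fetch returns the status code and body of url, or 0 and "" on error.
func fetch(url string) (int, string) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, ""
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, ""
	}
	return resp.StatusCode, string(body)
}

func main() {
	site := newTestSite()
	defer site.Close()

	// site.URL is a localhost address chosen by the test server.
	status, body := fetch(site.URL + "/a.html")
	fmt.Println(site.URL, status, body)
}
```

Because the set of pages and links is known in advance, the catalog produced by the crawler can be compared against it exactly.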

Additional analytics tools will be covered by unit tests.
