INDA Project: Web crawler

This project aims to build a web crawler that maps a website by finding the hyperlinks on each page that point to other pages on the same domain. The crawler itself will be written in Go; supporting tools will most likely be written in Python.

Core features

The crawler should be able to:

  • request a webpage using a hyperlink,
  • parse webpage content,
  • locate hyperlinks on the page,
  • follow the hyperlinks it finds to reach further pages,
  • catalog pages visited.

Installation instructions

Will be added later.

Testing process

For the most part, the crawler will be tested in small, controlled environments, most likely a set of interlinked, text-based pages served from localhost addresses.

Additional analytics tools will be covered by unit tests.