Crawler

A high-performance web crawler in Elixir, with worker pooling and rate limiting via OPQ.

Usage

Crawler.crawl("http://elixir-lang.org", max_depths: 2)

Configurations

| Option | Type | Default Value | Description |
|---|---|---|---|
| :max_depths | integer | 3 | Maximum nested depth of pages to crawl. |
| :workers | integer | 10 | Maximum number of concurrent workers for crawling. |
| :interval | integer | 0 | Rate limit control: number of milliseconds to wait before crawling more pages. Defaults to 0, which is effectively no rate limit. |
| :timeout | integer | 5000 | Timeout value for fetching a page, in ms. |
| :user_agent | string | Crawler/x.x.x (...) | User-Agent value sent by the fetch requests. |
| :save_to | string | nil | When provided, the path for saving crawled pages. |
| :parser | module | Crawler.Parser | The default parser. Override it when you need to handle parsing differently or add extra functionality. |
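
As a rough illustration of how several of these options might be combined, the sketch below builds a keyword list and passes it as the second argument to `Crawler.crawl`, in the same way as the usage example. The specific values and the `/tmp/crawled` path are assumptions chosen for illustration, not recommendations from the project. The call is guarded so that the snippet also runs as a plain script when the `Crawler` dependency is not loaded.

```elixir
# Illustrative values only; every key comes from the options table above.
opts = [
  max_depths: 2,            # follow links at most two levels deep
  workers: 5,               # at most five concurrent workers
  interval: 500,            # 500 ms rate-limit interval
  timeout: 10_000,          # page fetch timeout, in ms
  save_to: "/tmp/crawled"   # hypothetical output path for crawled pages
]

# Only start crawling if the Crawler module is actually available.
# apply/3 here is equivalent to Crawler.crawl("http://elixir-lang.org", opts).
if Code.ensure_loaded?(Crawler) do
  apply(Crawler, :crawl, ["http://elixir-lang.org", opts])
end
```

Options left out of the list, such as `:user_agent` and `:parser` here, should fall back to the defaults in the table.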

Features Backlog

Crawler is under active development. Below is a non-exhaustive list of features yet to be implemented.

Changelog

Please see CHANGELOG.md.

License

Licensed under MIT.