UrlFetcher

UrlFetcher fetches the URLs found in the image and anchor tags of the page at a given URL.

Usage

UrlFetcher

UrlFetcher.fetch("https://myawesome.url/page.html") retrieves all link and image URLs present in https://myawesome.url/page.html and returns them as the links and assets lists of a UrlFetcher.SiteData struct.
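As a rough sketch of how the result might be consumed: the helper below is hypothetical (FetchSummary is not part of the package) and only relies on the links and assets fields described above. Because the exact return shape of UrlFetcher.fetch/1 is an assumption here, it accepts both a bare struct and an {:ok, struct} tuple.

```elixir
defmodule FetchSummary do
  @moduledoc false
  # Hypothetical helper, not part of UrlFetcher. It only assumes that the
  # result carries `links` and `assets` lists, as the README describes.

  # Assumption: the result may be wrapped in an {:ok, ...} tuple.
  def summarize({:ok, site_data}), do: summarize(site_data)

  # A map pattern also matches structs, so this works for UrlFetcher.SiteData.
  def summarize(%{links: links, assets: assets}) when is_list(links) and is_list(assets) do
    %{link_count: length(links), asset_count: length(assets)}
  end

  # Anything else (for example an error tuple) is reported back untouched.
  def summarize(other), do: {:error, {:unexpected_result, other}}
end

# In a project that depends on :url_fetcher, usage could look like:
#   "https://myawesome.url/page.html"
#   |> UrlFetcher.fetch()
#   |> FetchSummary.summarize()
```

Check the package documentation for the actual return value before relying on either shape.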

Some options you can provide to the fetcher:

HTTP Client behaviour

The HTTP client behaviour is defined in UrlFetcher.Http.Client. You can use whichever HTTP client you prefer, as long as it complies with that behaviour or you wrap it in a module that does. Note that the HTTP client is expected to follow redirects by default.
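To illustrate the wrapper idea, here is a minimal, self-contained sketch. Everything in the Sketch namespace is hypothetical: Sketch.HttpClient stands in for UrlFetcher.Http.Client, and the get/1 callback with its return type is an assumption, so consult the real behaviour for the actual callback names and signatures.

```elixir
defmodule Sketch.HttpClient do
  @moduledoc false
  # Hypothetical stand-in for UrlFetcher.Http.Client.
  # Assumption: a single callback that takes a URL and returns the body.
  @callback get(url :: String.t()) :: {:ok, String.t()} | {:error, term()}
end

defmodule Sketch.StubClient do
  @moduledoc false
  # A wrapper module that complies with the behaviour. A real wrapper would
  # delegate to your preferred HTTP library and make sure redirects are followed.
  @behaviour Sketch.HttpClient

  @impl true
  def get("https://myawesome.url/page.html"), do: {:ok, "<a href=\"/about\">About</a>"}
  def get(_url), do: {:error, :not_found}
end

defmodule Sketch.Fetcher do
  @moduledoc false
  # The caller depends only on the behaviour, so any compliant module can be passed in.
  def body(url, client \\ Sketch.StubClient), do: client.get(url)
end
```

Depending on a behaviour rather than a concrete library keeps the client swappable, which is also convenient for stubbing HTTP calls in tests.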

Installation

The package is available on Hex and can be installed by adding url_fetcher to your list of dependencies in mix.exs:

def deps do
  [
    {:url_fetcher, "~> 0.2.1"}
  ]
end

Documentation can be found at https://hexdocs.pm/url_fetcher/.

Contributing

Please have a look at the contributing guidelines.

UrlFetcher has automated CI GitHub Actions that take care of reviewing every pull request.

Once everything looks good, your PR will be merged. Every push to the main branch triggers automated publishing of the package and its documentation to Hex.

Benchmarking

To improve performance, it is important to benchmark the code rather than guess. UrlFetcher uses benchee for that. Have a look at benchmark.exs and compare your implementation against the current code before submitting a pull request. Run the benchmark with mix run bin/benchmark.exs <url>.
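For orientation, a comparison script could be shaped roughly like the sketch below. This is not the contents of the project's benchmark.exs: BenchSketch is a hypothetical module, the job names and the warmup and time values are arbitrary, and the candidate is assumed to expose a fetch/1 function like UrlFetcher does.

```elixir
defmodule BenchSketch do
  @moduledoc false
  # Hypothetical helper: builds the job map that Benchee.run/2 expects,
  # pairing the current implementation with a candidate module.
  # Assumption: the candidate module exposes fetch/1.
  def jobs(url, candidate) when is_binary(url) and is_atom(candidate) do
    %{
      "current" => fn -> UrlFetcher.fetch(url) end,
      "candidate" => fn -> candidate.fetch(url) end
    }
  end
end

# Only runs when a URL is passed on the command line and Benchee is available.
# Replace the second argument with the module holding your implementation;
# UrlFetcher is used here purely as a placeholder.
with [url | _] <- System.argv(),
     true <- Code.ensure_loaded?(Benchee) do
  Benchee.run(BenchSketch.jobs(url, UrlFetcher), warmup: 1, time: 5)
end
```

Since each job performs a real HTTP request, results vary with network conditions; benchmarking against a local or otherwise stable URL gives more comparable numbers.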