Funkspector
Web page inspector for Elixir.
Funkspector is a web scraper that lets you extract data from web pages and XML sitemaps.
Usage
Page Scraping
Simply pass Funkspector the URL of a web page to inspect and it will return its scraped data:
iex> {:ok, data} = Funkspector.page_scrape("http://github.com")
Sitemap Scraping
Funkspector can extract the locations from XML sitemaps, like this:
iex> {:ok, data} = Funkspector.sitemap_scrape("http://validationhell.com/sitemap.xml")
Custom options
Both Funkspector.page_scrape and Funkspector.sitemap_scrape accept options to customize the timeout and User Agent string.
For example, you could use:
Funkspector.page_scrape("http://github.com", %{recv_timeout: 5_000, user_agent: "My Bot"})
Funkspector.sitemap_scrape("http://validationhell.com/sitemap.xml", %{recv_timeout: 5_000, user_agent: "My Bot"})
Scraped data
Funkspector currently returns the following data for both pages and sitemaps:
headers. Response headers, including content-type, etc.
body. Raw body.
original_url and final_url. Funkspector follows redirections; these are the original URL given and the final one reached after following them.
scheme. For example, "http" or "https".
host. For example, "github.com".
root_url. Root URL for the given URL. For http://example.com/about it will be http://example.com.
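To illustrate how scheme, host and root_url relate to a given URL, here is a minimal sketch built on Elixir's standard URI module. UrlPartsSketch is a hypothetical helper written for this example, not part of Funkspector, and it may not match how Funkspector computes these values (for instance, it ignores non-default ports).

```elixir
# Hypothetical helper, not part of Funkspector: derives scheme, host and
# root_url from a URL using the standard library's URI module.
defmodule UrlPartsSketch do
  def parts(url) do
    uri = URI.parse(url)

    %{
      scheme: uri.scheme,
      host: uri.host,
      # Assumption: the root URL is just scheme plus host, with no port or path.
      root_url: "#{uri.scheme}://#{uri.host}"
    }
  end
end

# Prints "http://example.com"
IO.puts(UrlPartsSketch.parts("http://example.com/about").root_url)
```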
The PageScraper also returns:
links. Organized in raw, http.internal, http.external and non_http.
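The grouping of links can be pictured roughly as follows. This is only a sketch under stated assumptions: LinkGroupsSketch and its group names are hypothetical, the links are assumed to be absolute already, and "internal" is assumed to mean "same host as the scraped page". Funkspector's actual rules may differ.

```elixir
# Hypothetical helper, not part of Funkspector: groups absolute links by
# comparing their scheme and host against the host of the scraped page.
defmodule LinkGroupsSketch do
  def group(links, page_host) do
    Enum.group_by(links, fn link ->
      uri = URI.parse(link)

      cond do
        # Anything that is not http or https (mailto:, ftp:, etc.)
        uri.scheme not in ["http", "https"] -> :non_http
        # Same host as the page being scraped
        uri.host == page_host -> :http_internal
        # Any other http or https link
        true -> :http_external
      end
    end)
  end
end

links = [
  "http://github.com/about",
  "https://elixir-lang.org",
  "mailto:hello@example.com"
]

IO.inspect(LinkGroupsSketch.group(links, "github.com"))
```

Note that Enum.group_by only creates keys for groups that have at least one link, so an empty group is simply absent from the result.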
The SitemapScraper also returns:
locs. Collection of URLs.
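As a rough picture of what locs contains, the sketch below pulls the text of every loc element out of a sitemap string with a regular expression. SitemapLocsSketch is a hypothetical module written for this example; how Funkspector actually parses sitemaps is not described here, and a real XML parser would be more robust than a regex.

```elixir
# Hypothetical helper, not part of Funkspector: collects the contents of
# every <loc> element in a sitemap, trimming surrounding whitespace.
defmodule SitemapLocsSketch do
  def locs(xml) do
    ~r{<loc>\s*(.*?)\s*</loc>}s
    |> Regex.scan(xml, capture: :all_but_first)
    |> List.flatten()
  end
end

xml = """
<urlset>
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>
"""

IO.inspect(SitemapLocsSketch.locs(xml))
```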
Error response
In case of error, Funkspector will return the original_url and the reason from the server:
case Funkspector.page_scrape("http://example.com") do
  {:ok, data} ->
    IO.inspect(data)
  {:error, url, reason} ->
    # inspect/1 keeps this safe when reason is not a plain string or atom
    IO.puts("Could not scrape #{url} because of #{inspect(reason)}")
end
Installation
If available in Hex, the package can be installed as:
Add funkspector to your list of dependencies in mix.exs:
def deps do
  [{:funkspector, "~> 0.1"}]
end
Ensure funkspector is started before your application:
def application do
  [applications: [:funkspector]]
end