How many times have you struggled to get Puppeteer or PhantomJs to render your page properly? PhantomJsCloud manages all that complexity, giving you the right results the first time. No crashes from memory leaks, and no blank pages from missing fonts or AJAX requests.
Most website scraping tools do not load resources or execute JavaScript the way browsers do. Because of this, the results of data extraction are usually sub-par. PhantomJsCloud solves this by acting just like a browser (because it is a browser!), loading and executing resources in exactly the same way as a normal web user.
Because PhantomJsCloud is a Browser as an API, you have total control of the input and the resources loaded.
Each request is billed at $0.15/Hour and $0.25/GB Output. This is around $0.000095/Page, or about $1.00 for every 10,500 pages. Subscribers get a volume discount, making it even cheaper. See Pricing for details.
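The arithmetic behind these figures can be sketched as follows. The two rates come from the text above; the page profile in the example (2 seconds of render time, 50 KB of output) and the decimal definition of a gigabyte are illustrative assumptions, not published numbers.

```python
# Rates quoted in the text above.
HOURLY_RATE_USD = 0.15          # $ per hour of render time
OUTPUT_RATE_USD_PER_GB = 0.25   # $ per GB of output

# Assumption: 1 GB = 10**9 bytes. The service may define it differently.
BYTES_PER_GB = 10**9


def estimate_cost(render_seconds, output_bytes):
    """Estimate the cost in USD of one request from its render time and output size."""
    time_cost = HOURLY_RATE_USD * render_seconds / 3600
    output_cost = OUTPUT_RATE_USD_PER_GB * output_bytes / BYTES_PER_GB
    return time_cost + output_cost


def pages_per_dollar(cost_per_page):
    """Number of pages that one dollar buys at a given per-page cost."""
    return 1 / cost_per_page


# Hypothetical page: 2 s of render time and 50 KB of output.
example_cost = estimate_cost(render_seconds=2, output_bytes=50_000)
print(f"estimated cost per page: ${example_cost:.6f}")
print(f"pages per dollar at $0.000095/page: {pages_per_dollar(0.000095):.0f}")
```

At the quoted $0.000095 per page, one dollar covers a little over 10,500 pages, which matches the figure above. Heavier pages or larger outputs (for example PDFs) would cost proportionally more.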
Using a proxy allows you to control the IP address that your browser requests use. While you can use a custom proxy, we also have a built-in proxy to handle common needs. Docs for the built-in proxy solution can be found here.
Built-in Proxy Choices:
(35.188.112.61) from the USA: Allows whitelisting the PhantomJS Cloud service in your firewall. Here's an example.
Our service adds about 200ms to the overhead of a normal browser request. As PhantomJsCloud is geographically distributed (East Asia, USA, and Western Europe), this means that you'll always get fast results no matter where you or your target page are located. If you have a high number of pages to render (millions), PhantomJsCloud will automatically spin-up backend workers when demand increases. Please see the API Docs "Testing and Performance Optimization" Section for more details.
PhantomJsCloud has proven reliability: greater than 99.999% uptime (View Uptime Report). If you need additional Enterprise-grade features like a Private Cloud + SLA or a Premium Support Plan, please see our Enterprise Features Page.
Scraping the raw HTML source of a page is fast, but if you want to execute the page's JavaScript, you need to use a browser to ensure it's executed the same way a user would see it. PhantomJsCloud uses PhantomJS WebKit instances to fully load resources and execute scripts before scraping the page's contents.
We support many different output formats to meet your needs:
PLAINTEXT
For Web Content Scraping.
If you need a page's fully rendered DOM, simply saving the HTML source won't cut it. Use our REST API with this output format (the default) and scrape the resulting HTML as usual. The page's JavaScript will be fully executed, and all DOM transformations completed.
JPEG / PNG
For visual inspection.
If you need to generate page previews, archive screenshots, or create thumbnails, this renders the page and sends the result as JPEG or PNG.
PDF
For Archiving and Reports.
Create a PDF of the page or uploaded HTML, including all images, SVG graphics, headers, and footers.
HTML / RAW
Returns the target page in its "native" form, with all response headers intact.
Useful for generating static versions of your Single-Page-App / AJAX data, or for proxied requests. Very useful for SEO with Facebook / Twitter / Yahoo / Bing web bots.
JSON
For access to page metadata and the greatest flexibility.
When outputting in JSON, you not only get your HTML, PDF, IMAGE, or RAW result, but are also sent full details about sub-resource load times, page response codes, and even the exact settings you used to make the request.
AUTOMATION
A special JSON output for advanced users, allowing unparalleled control over the browser.
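As a rough illustration of choosing an output format per request, here is a minimal sketch that builds a request payload and a GET-style URL without sending anything. The endpoint shape, the field names `url`, `renderType`, and `outputAsJson`, and the exact format strings are all assumptions made for illustration; the API Docs are the authority on what the service actually accepts.

```python
import json
from urllib.parse import quote

# Assumed endpoint shape; confirm against the API Docs before use.
API_BASE = "https://phantomjscloud.com/api/browser/v2"

# Output formats named in this section. The exact strings are assumptions.
RENDER_TYPES = {"plainText", "jpeg", "png", "pdf", "html", "raw", "automation"}


def build_page_request(url, render_type, output_as_json=False):
    """Build a request payload for one page (field names are assumed)."""
    if render_type not in RENDER_TYPES:
        raise ValueError(f"unknown render type: {render_type!r}")
    return {"url": url, "renderType": render_type, "outputAsJson": output_as_json}


def build_request_url(api_key, page_request):
    """Encode the payload into a GET-style URL (URL layout is assumed)."""
    payload = json.dumps(page_request, separators=(",", ":"))
    return f"{API_BASE}/{quote(api_key, safe='')}/?request={quote(payload, safe='')}"


# Hypothetical key and target page, for illustration only.
request = build_page_request("https://example.com", "jpeg")
print(build_request_url("my-api-key", request))
```

Keeping payload construction separate from the HTTP call makes it easy to validate or log requests before they count against a quota.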
If you have ever tried to use open-source headless browsers like Slimer.js or Phantom.js, you'll know that simple things like "Did the page finish loading?" are difficult problems to solve. PhantomJsCloud solves these problems for you so you can focus on other things.
With PhantomJsCloud, here are just a few of the things we take care of for you:
Most PhantomJS settings and rendering options are exposed through our REST API, and are configurable per page-request.
Export IFrame Content: Choose to capture your page as JSON to get the contents of every IFrame, regardless of cross-site security restrictions. Read the PageRequest.renderSettings.renderIFrame docs for more details.
Blacklist or replace sub-resources of the page like .css files or ad network scripts.
CORS and JSONP are supported, allowing you to use the PhantomJsCloud service directly in your web application.
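Because settings are configurable per page-request, a client will often keep shared defaults and override only what differs for each page. The sketch below shows one way to do that merge on the client side. The `renderSettings` key is named in the text above; the inner keys (`quality`, `zoomFactor`) and the `renderType` values are hypothetical placeholders, not confirmed API fields.

```python
import copy


def merge_settings(defaults, overrides):
    """Recursively overlay per-page overrides onto shared defaults.

    Neither input is mutated; nested dicts are merged key by key.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults: the inner setting names are illustrative only.
DEFAULTS = {
    "renderType": "html",
    "renderSettings": {"quality": 70, "zoomFactor": 1.0},
}

# Each page states only what differs from the defaults.
pages = [
    {"url": "https://example.com/a"},
    {"url": "https://example.com/b", "renderType": "png",
     "renderSettings": {"quality": 90}},
]

page_requests = [merge_settings(DEFAULTS, page) for page in pages]
for page_request in page_requests:
    print(page_request)
```

The second page keeps the default `zoomFactor` while overriding `quality`, which is the point of merging nested settings rather than replacing the whole `renderSettings` object.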