The PhantomJsCloud API is organized around a REST-like, "JSON API" WebService.
Requests are made by submitting a request.json payload describing your PageRequest. We send back your results (as JPEG, or in another renderType format you specify in your UserRequest), along with an HTTP response code indicating any errors and HTTP response headers that carry important metadata (page cost, etc).
This is the most direct way of interacting with PhantomJsCloud. If you are comfortable composing GET or POST HTTP requests directly in your language of choice, you can use our HTTP Endpoint.
This new API allows full flexibility. Most importantly, it provides a simple and straightforward way to type on the keyboard, tap the screen, and click with the mouse. See the New Automation API Docs.
JSON API via a strongly typed Node Library. Includes autoscaling helpers and the new Automation API.
If you use JavaScript or TypeScript, you can use our Official NPM Module. This can also be used in browsers via the Browserify and Webpack projects.
The examples use the demo ApiKey a-demo-key-with-low-quota-per-ip-address to make requests (located in the "url line" section of the example).
This demo key is limited to 100 requests per day.
You can create a Free account to get 500 Pages/Day at Dashboard.PhantomJsCloud.com.
Then, when following these examples, replace the demo key with the ApiKey found on your account dashboard page.
For example:
https://PhantomJsCloud.com/api/browser/v2/ak-012345-abcde-012345-abcde-012345/
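As a rough sketch of composing such a request from JavaScript, the helper below builds the endpoint URL from an ApiKey and wraps a page request as a JSON POST body. The function name buildApiCall is hypothetical, and the Content-Type header and the use of fetch (built into recent Node.js versions and browsers) are assumptions. Only the URL shape and the demo key come from this page.

```javascript
// Hypothetical helper: builds the URL and fetch options for one API call.
// The endpoint shape follows the example URL above; the rest is a sketch.
function buildApiCall(apiKey, pageRequest) {
  const url = `https://PhantomJsCloud.com/api/browser/v2/${encodeURIComponent(apiKey)}/`;
  const options = {
    method: "POST",
    headers: { "Content-Type": "application/json" }, // assumed, not confirmed here
    body: JSON.stringify(pageRequest),
  };
  return { url, options };
}

const call = buildApiCall("a-demo-key-with-low-quota-per-ip-address", {
  url: "https://example.com",
  renderType: "jpeg",
});
console.log(call.url);

// To actually send it (requires network access and an async context):
// const response = await fetch(call.url, call.options);
// console.log(response.status);
```

Swapping the demo key for your own ApiKey only changes the first argument.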
Most languages provide access to the response statusCode and headers. Please refer to the "Basic Troubleshooting" section (below) for descriptions of these.
The renderType output formats are fully described in the HTTP Endpoint docs, and are summarized here:
plainText
For Web Content Scraping.
If you need a page's fully rendered DOM, simply saving the HTML source won't cut
it. Use our REST API with this output format (the default) and scrape the resulting
HTML as usual. The page's JavaScript will be fully executed, and all DOM
transformations completed.
jpg, jpeg, png
For visual inspection.
If you need to generate page previews, archive screenshots, or create thumbnails, this renders the page and sends the result as JPEG or PNG.
pdf
For Archiving and Reports.
Create a PDF of the page or uploaded HTML, including all images, svg graphics,
headers and footers.
html
Returns the target page in its "native" form, with all response headers intact.
Useful for generating static versions of your Single-Page-App / AJAX data, or for proxied requests. Very useful for SEO with Facebook / Twitter / Yahoo / Bing web bots.
automation (NEW)
For advanced users. To use this properly you must read the Automation API Docs. Allows unparalleled control over the browser.
script
You have three choices for using proxies with PhantomJsCloud:
Here are some examples using the 3 Proxy types:
//POST request JSON payload to use a worldwide anonymous proxy
{ url:"https://phantomjscloud.com/examples/helpers/requestdata", proxy:"anon-any"}
//anonymous proxy from Netherlands
{ url:"https://phantomjscloud.com/examples/helpers/requestdata", proxy:"anon-nl"}
//static IP from USA (35.188.112.61)
{ url:"https://phantomjscloud.com/examples/helpers/requestdata", proxy:"geo-us"}
//use your custom 3rd party proxy
{ url:"https://phantomjscloud.com/examples/helpers/requestdata", proxy:"custom-http://myProxy.com:8838:myname:secret"}
Please refer to the docs for additional proxy configuration details.
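The proxy values above can also be assembled in code. This is only a sketch: the helper names are hypothetical, and the layout of the custom proxy string (custom-http://HOST:PORT:USERNAME:PASSWORD) is inferred from the single example above, so check the proxy docs before relying on it.

```javascript
// Hypothetical helpers; the string formats are inferred from the examples above.
function anonProxy(countryCode = "any") {
  return `anon-${countryCode}`;
}

function customProxy({ host, port, username, password }) {
  // Assumed layout: custom-http://HOST:PORT:USERNAME:PASSWORD
  return `custom-http://${host}:${port}:${username}:${password}`;
}

function withProxy(targetUrl, proxy) {
  return { url: targetUrl, proxy };
}

const target = "https://phantomjscloud.com/examples/helpers/requestdata";
const payloads = [
  withProxy(target, anonProxy()),      // worldwide anonymous proxy
  withProxy(target, anonProxy("nl")),  // anonymous proxy from Netherlands
  withProxy(target, "geo-us"),         // static IP from USA
  withProxy(
    target,
    customProxy({ host: "myProxy.com", port: 8838, username: "myname", password: "secret" })
  ),
];
console.log(JSON.stringify(payloads, null, 2));
```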
Once a resource is loaded, it is normally cached. This means that another page request that loads the same resource will not make a network request, and thus there will not be any load for us to record in the pageResponses.events section.
To force all resources to load, pass the pageRequest.requestSettings.clearCache:true parameter. This is also helpful if you are making changes to the resource and want to make sure the newest version is the one used by your call to PhantomJsCloud.
Below is an example of what you would see in the pageResponses.events section of your JSON response, if the resource being loaded is https://example.com/resource.css:
{
"key": "resourceRequested",
"time": "2016-05-24T15:36:50.376Z",
"value": {
"resourceRequest": {
"headers": "OUTPUT SUPPRESSED (Disabled to reduce verbosity of your JSON. You can enable by removing the related entry in your pageRequest.suppressJson settings)",
"id": 172,
"method": "GET",
"time": "2016-05-24T15:36:50.376Z",
"url": "https://example.com/resource.css"
}
}
},
{
"key": "resourceReceived",
"time": "2016-05-24T15:36:50.482Z",
"value": {
"resourceResponse": {
"body": "",
"bodySize": 3654,
"contentType": "application/javascript",
"headers": "OUTPUT SUPPRESSED (Disabled to reduce verbosity of your JSON. You can enable by removing the related entry in your pageRequest.suppressJson settings)",
"id": 172,
"redirectURL": null,
"stage": "start",
"status": 200,
"statusText": "OK",
"time": "2016-05-24T15:36:50.482Z",
"url": "https://example.com/resource.css"
}
}
},
{
"key": "resourceReceived",
"time": "2016-05-24T15:36:50.492Z",
"value": {
"resourceResponse": {
"contentType": "application/javascript",
"headers": "OUTPUT SUPPRESSED (Disabled to reduce verbosity of your JSON. You can enable by removing the related entry in your pageRequest.suppressJson settings)",
"id": 172,
"redirectURL": null,
"stage": "end",
"status": 200,
"statusText": "OK",
"time": "2016-05-24T15:36:50.491Z",
"url": "https://example.com/resource.css"
}
}
},
{
"key": "resourceReceived",
"time": "2016-05-24T15:36:50.492Z",
"value": {
"url": "https://example.com/resource.css",
"status": 200
}
},
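To check whether a given resource was actually loaded, the events can be filtered by URL. The sketch below assumes the events have already been parsed from your JSON response into an array. The function name eventsForResource is hypothetical, and the field paths are taken only from the sample events above.

```javascript
// Sketch: collect the events recorded for one resource URL.
// Field names follow the sample events above; the function name is hypothetical.
function eventsForResource(events, resourceUrl) {
  return events.filter((event) => {
    const value = event.value || {};
    const url =
      (value.resourceRequest && value.resourceRequest.url) ||
      (value.resourceResponse && value.resourceResponse.url) ||
      value.url;
    return url === resourceUrl;
  });
}

// Trimmed-down copies of the sample events, plus one unrelated entry.
const sampleEvents = [
  { key: "resourceRequested", value: { resourceRequest: { id: 172, url: "https://example.com/resource.css" } } },
  { key: "resourceReceived", value: { resourceResponse: { id: 172, stage: "end", status: 200, url: "https://example.com/resource.css" } } },
  { key: "resourceReceived", value: { url: "https://example.com/other.js", status: 200 } },
];

const matches = eventsForResource(sampleEvents, "https://example.com/resource.css");
console.log(matches.map((e) => e.key));
```

An empty result suggests the resource was served from cache. Retrying with clearCache set to true should make the events appear.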
If there is a problem with your script that you need to debug, use the outputAsJson:true parameter, then search the output for the term browserError, which will be under pageResponse.events.
This should give you an idea of any syntax errors your script may have caused.
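The search can be automated once the JSON response is parsed. The exact shape of a browserError event is not shown on this page, so the sketch below makes no assumption about field names and simply looks for the term in each serialized event. The function name and the example events are made up for illustration.

```javascript
// Sketch: return every event whose serialized form mentions "browserError".
// This searches the text of each event rather than assuming a field name.
function findBrowserErrors(events) {
  return events.filter((event) => JSON.stringify(event).includes("browserError"));
}

// Made-up events, purely for illustration.
const exampleEvents = [
  { key: "resourceRequested", value: { url: "https://example.com/" } },
  { key: "browserError", value: { message: "SyntaxError: Unexpected token" } },
];
console.log(findBrowserErrors(exampleEvents));
```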
These samples show and explain how to use the overseerScript to perform advanced automation techniques. To use these properly you should understand:
overseerScript executes in a secure ES2018 JavaScript sandbox. At a minimum, be familiar with the await keyword (MDN docs here).
If you have a request for another scenario sample, please let us know!
Building on what you learned in the above "How can I load a page, navigate to another..." sample, here is an example request.json that will log in to LinkedIn and capture a screenshot of your home page:
{
"url": "https://www.linkedin.com/uas/login",
"renderType": "jpeg",
"overseerScript":'let _user="USER@EXAMPLE.COM"; let _pass="PASSWORD"; await page.waitForSelector("input#username"); await page.type("input#username",_user,{delay:50}); await page.type("input#password",_pass,{delay:50}); page.click("button[type=submit]"); await page.waitForNavigation();',
}
In the above request, we inject an overseerScript that:
1. let _user="USER@EXAMPLE.COM"; let _pass="PASSWORD";
Populates the username/password you will use to log in.
2. await page.waitForSelector("input#username");
Waits until the "input#username" element is present in the HTML.
3. await page.type("input#username",_user,{delay:50}); await page.type("input#password",_pass,{delay:50});
Types the username/password into their respective input elements slowly, like a human would.
4. page.click("button[type=submit]");
Clicks the Submit button.
5. await page.waitForNavigation();
Waits for a page navigation to occur (a side effect of clicking the button in step #4).
Please see the New Automation API docs for more details on this powerful automation workflow, and for more examples. Also, let us know if you have any questions or need help.
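Writing a long overseerScript inside a hand-edited JSON string makes quoting mistakes easy. One possible approach, sketched below, is to assemble the script from separate statements in JavaScript and let JSON.stringify produce the request body. The function name buildLoginRequest is hypothetical. The selectors, URL, and statements are the ones from the request above.

```javascript
// Sketch: keep the overseerScript readable as separate statements, then let
// JSON.stringify handle the quoting. buildLoginRequest is a hypothetical name.
function buildLoginRequest(user, pass) {
  const overseerScript = [
    // JSON.stringify turns the credentials into safely quoted string literals.
    `let _user=${JSON.stringify(user)}; let _pass=${JSON.stringify(pass)};`,
    `await page.waitForSelector("input#username");`,
    `await page.type("input#username",_user,{delay:50});`,
    `await page.type("input#password",_pass,{delay:50});`,
    `page.click("button[type=submit]");`,
    `await page.waitForNavigation();`,
  ].join(" ");
  return {
    url: "https://www.linkedin.com/uas/login",
    renderType: "jpeg",
    overseerScript,
  };
}

const requestJson = JSON.stringify(buildLoginRequest("USER@EXAMPLE.COM", "PASSWORD"), null, 2);
console.log(requestJson);
```

The printed text is strict JSON, so it can be saved as request.json or sent directly as a POST body.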
502 Bad Gateway Errors
If you are getting 502 Bad Gateway errors frequently, be sure to set the ExpectContinue header to false. Some platforms (C# and Curl) set this to true by default, so be sure to change it!
If you get 502 errors frequently and this does not solve your problem, please let us know.
By default PhantomJsCloud waits for your target page to finish loading. If a page has a lot of AJAX (ads, lazy content, etc) it could take a long time.
To make the page finish faster (and thus your API call complete faster) you can try finishing at the page's DomContentLoaded event, in one of these two ways:
1. Add overseerScript:'await page.waitForNavigation("domcontentloaded"); page.done()' to your request.json. Read more about this technique here.
2. Add requestSettings:{doneWhen:[{event:"domReady"}]} to your request.json. Read more about this technique here.
Of the above two methods, we suggest the first (the Automation technique), as it allows more flexibility (access to the entire Automation API), such as adding page.waitForSelector("input#someId") to ensure a certain DOM element exists.
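The two techniques can be compared side by side with a small helper that returns a modified copy of a page request. This is a sketch only: the function name and the technique labels are hypothetical, while the overseerScript and requestSettings values are copied from the text above.

```javascript
// Hypothetical helper: returns a copy of pageRequest that finishes early,
// using one of the two techniques described above.
function finishAtDomContentLoaded(pageRequest, technique = "automation") {
  if (technique === "automation") {
    // Note: this replaces any overseerScript already present on the request.
    return {
      ...pageRequest,
      overseerScript: 'await page.waitForNavigation("domcontentloaded"); page.done()',
    };
  }
  return {
    ...pageRequest,
    requestSettings: {
      ...(pageRequest.requestSettings || {}),
      doneWhen: [{ event: "domReady" }],
    },
  };
}

const base = { url: "https://example.com", renderType: "plainText" };
const viaAutomation = finishAtDomContentLoaded(base, "automation");
const viaSettings = finishAtDomContentLoaded(base, "requestSettings");
console.log(JSON.stringify(viaAutomation));
console.log(JSON.stringify(viaSettings));
```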
When processing the results you receive from PhantomJsCloud, pay attention to the two types of statusCode results, which have separate meanings.
Response StatusCode: The HTTP StatusCode returned from PhantomJsCloud will normally be 200 unless there was a problem processing the request, for example if the target server is offline or if the request is invalid. If there is a timeout requesting the target URL, a 424 Failed Dependency error will be returned.
When a Response Failure is sent to you, we try to provide useful data in the statusCode_Help parameter.
Here is a general description of the Response Status Codes we send and what they mean.
200: OK. The target page was captured properly.
400: Bad Request. Your request had an error in it. Fix it before resubmitting.
401: Unauthorized. You are using an invalid ApiKey. Please check for typos, or create an account.
402: Payment Required. Your account is out of credits. Log in and either upgrade your Subscription or add Prepaid Credits.
403: Forbidden. Your request was flagged due to abuse. Read the response for steps you should take to resolve the situation.
424: Failed Dependency. The target page was not reachable (the request timed out). Check that your target URL is valid before retrying, or make sure your requestSettings.maxWait parameter is set to be long enough. We return 424 only to inform you that *something* didn't finish loading. If you need more details on what that something was, use the outputAsJson=true parameter and look at the pageResponse.events node, which will show a timeline of sub-resources (request and response).
429: Too Many Simultaneous Requests. You sent a sudden spike of simultaneous requests. PhantomJsCloud can handle hundreds of simultaneous requests, but we require you to gracefully increase the number of concurrent requests over time, not send a sudden spike. Please increase the number of your simultaneous requests according to the schedule shown in the 'Testing and Performance Optimization' section of the docs page (add +1 simultaneous request every 3 seconds, or +10 simultaneous requests every 30 seconds). You may retry this request immediately, with no modifications.
500: Internal Server Error. The PhantomJsCloud instance suffered an internal error. You can retry your request immediately, without modifications. If errors still occur, these are the known causes:
pageRequest.requestSettings.resourceModifier:[{regex:'.*ttf.*|.*otf.*|.*woff.*',isBlacklisted:true}].
502: Bad Gateway. Your request did not reach PhantomJsCloud due to a network failure. You can retry your request immediately, without modifications. If errors still occur, see the "502 Bad Gateway" Troubleshooting item above.
503: Server Too Busy. The server is temporarily overwhelmed with other requests, and its request backlog is very large. We are returning this to you to prevent the risk of an HTTP timeout occurring instead. You may immediately retry your request with no modifications. Support@PhantomJsCloud.com has been notified and will investigate.
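The status code descriptions translate fairly directly into a client-side policy. The sketch below maps each Response StatusCode to a suggested action. The action labels and function name are hypothetical and not part of the API. The grouping follows the descriptions above, where 429, 500, 502, and 503 are described as retryable without modifications.

```javascript
// Sketch of a retry policy based on the status code descriptions above.
// The action names are hypothetical labels, not part of the API.
function actionForStatus(statusCode) {
  switch (statusCode) {
    case 200:
      return "ok";
    case 429:
    case 500:
    case 502:
    case 503:
      return "retry"; // may be retried without modifications
    case 424:
      return "check-target"; // verify the target URL or raise requestSettings.maxWait
    case 400:
    case 401:
    case 402:
    case 403:
      return "fix-request"; // request, ApiKey, credits, or abuse flag needs attention
    default:
      return "unknown";
  }
}

for (const code of [200, 400, 424, 429, 503]) {
  console.log(code, actionForStatus(code));
}
```

Even for the retryable codes, a real client would likely cap the number of attempts, and for 429 it would also slow down how quickly it adds parallel requests.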
200
(valid).
If you obtain results in JSON format, a great deal of useful metadata is returned.
If you return your captured results in a different format (PDF, JPEG, HTML, etc), we provide the most important of this metadata in the form of HTTP Response Headers.
pjsc-billing-credit-cost: The total cost of this capture.
pjsc-billing-daily-subscription-credits-remaining: The number of Daily Subscription Credits your account has remaining.
pjsc-billing-prepaid-credits-remaining: The number of Prepaid Credits your account has remaining.
pjsc-billing-total-credits-remaining: The total number of Credits your account has remaining (Subscription + Prepaid).
pjsc-content-name: If you were to save the response payload as a file, this is a suggested name for the file. Example: content.jpeg
pjsc-content-status-code: See the "Debugging Page Errors: Status Codes" section above for a description of "Content StatusCode".
pjsc-content-url: The final URL (after redirects) that was captured and returned to you.
pjsc-backend-id: The PhantomJsCloud instance that handled your request. Provided for support (debug) purposes.
If you feel additional metadata would be useful if returned as part of the response headers, please let us know.
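These headers can be gathered into one object after each call. The sketch below accepts anything with a get(name) method, such as the Headers object on a fetch response. The function name and the property names are hypothetical, and treating the billing and status values as numbers is an assumption. Only the header names come from the list above.

```javascript
// Sketch: read the pjsc-* headers into a plain object.
// `headers` is anything with a get(name) method, such as a fetch Headers object.
function readPjscMetadata(headers) {
  const text = (name) => {
    const raw = headers.get(name);
    return raw == null ? null : String(raw);
  };
  // Assumption: billing and status-code headers hold numeric values.
  const number = (name) => {
    const raw = text(name);
    return raw === null ? null : Number(raw);
  };
  return {
    creditCost: number("pjsc-billing-credit-cost"),
    dailySubscriptionCreditsRemaining: number("pjsc-billing-daily-subscription-credits-remaining"),
    prepaidCreditsRemaining: number("pjsc-billing-prepaid-credits-remaining"),
    totalCreditsRemaining: number("pjsc-billing-total-credits-remaining"),
    contentName: text("pjsc-content-name"),
    contentStatusCode: number("pjsc-content-status-code"),
    contentUrl: text("pjsc-content-url"),
    backendId: text("pjsc-backend-id"),
  };
}

// Made-up header values, purely for illustration.
const exampleHeaders = new Map([
  ["pjsc-billing-credit-cost", "0.25"],
  ["pjsc-content-name", "content.jpeg"],
  ["pjsc-content-status-code", "200"],
]);
const metadata = readPjscMetadata(exampleHeaders);
console.log(metadata.creditCost, metadata.contentName, metadata.backendId);
```

Headers that are absent come back as null, so callers can tell a missing value apart from a zero.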
Geolocation lets your requests come from a specific geographic location. We currently support two forms of geolocation:
Here are example request JSON payloads showing how to use the two forms:
//anonymous proxy from Netherlands
{ url:"https://phantomjscloud.com/examples/helpers/requestdata", proxy:"anon-nl"}
//static IP from USA
{ url:"https://phantomjscloud.com/examples/helpers/requestdata", proxy:"geo-us"}
Both forms of Geolocation are performed via our proxy solution. Please read the proxy docs for more details on how to perform Geolocation.
For an updated list of countries we support for Random IP locations, click here. Here is the list as of June 2019:
any "Worldwide (Global)"
au "Australia"
br "Brazil"
cn "China"
de "Germany"
es "Spain"
fr "France"
gb "Great Britain"
in "India"
jp "Japan"
nl "Netherlands"
sg "Singapore"
th "Thailand"
us "United States"
Using Node.js? If you use the official PhantomJsCloud Node.js API Client Library you do not need to rate-limit requests. Autoscaling is handled automatically.
PhantomJsCloud automatically scales its capacity based on demand, but it still takes a few seconds for the additional capacity to come online when there are spikes in demand. To ensure a graceful capacity ramp-up, please follow this guideline:
If you receive a 429 or 503 response, delay adding additional parallel requests for 45 seconds.
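For clients that manage their own concurrency, the ramp-up guideline could be tracked with a small limiter like the one sketched below. The class and method names are hypothetical. The +1 every 3 seconds step is taken from the 429 description earlier on this page, and the 45 second pause from the guideline above. Times are passed in explicitly (in milliseconds) so the behaviour is easy to follow.

```javascript
// Sketch of a concurrency limiter for gradual ramp-up.
// Assumptions: start at 1 parallel request, add +1 every 3 seconds,
// and pause increases for 45 seconds after a 429 or 503 response.
class RampUp {
  constructor(nowMs) {
    this.limit = 1;
    this.nextIncreaseAt = nowMs + 3000;
  }

  // Call when a 429 or 503 is received: no increases for the next 45 seconds.
  onThrottled(nowMs) {
    this.nextIncreaseAt = nowMs + 45000;
  }

  // Returns how many parallel requests are allowed at time nowMs.
  currentLimit(nowMs) {
    while (nowMs >= this.nextIncreaseAt) {
      this.limit += 1;
      this.nextIncreaseAt += 3000;
    }
    return this.limit;
  }
}

const ramp = new RampUp(0);
const atStart = ramp.currentLimit(0);
const atNineSeconds = ramp.currentLimit(9000);
ramp.onThrottled(9000); // pretend a 429 arrived at t = 9 s
const duringPause = ramp.currentLimit(30000);
const afterPause = ramp.currentLimit(54000);
console.log(atStart, atNineSeconds, duringPause, afterPause);
```

In a real client, Date.now() would supply the timestamps, and the limiter would gate how many requests are in flight at once.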
Wait Interval: By default we set pageRequest.requestSettings.waitInterval=1000 (1 second). This padding allows waiting for AJAX or CSS animations before rendering. However, if you know your page does not require this wait interval, setting waitInterval=0 will reduce render time (and price) by 1 second.