A headless browser is a piece of software that accesses webpages but doesn’t show them to real users. In other words, it’s a web browser without a graphical user interface. Headless browsers are often used to provide the content of webpages to other programs. For instance, a program could use one to access a webpage and extract information such as how wide the page is, which font is used, or what the coordinates of a specific element are.
Headless browsers are commonly used to:
- Run automated tests
- Scrape webpage data
- Take screenshots of webpages
- Automate interaction with webpages
- Perform DDoS attacks and inflate banner impressions (malicious uses)
Below, we list browsers that provide a complete or near-complete headless implementation and simulate a browser environment.
Zombie.js is a simulated browser environment for Node.js. It doesn’t render the DOM and has limited support for DOM events. It provides a full-featured API for interacting with page content, including pressing a submit button. The tool runs faster than full browsers, but it is unable to correctly interpret most popular websites.
As far as resources are concerned, retrieving scripts, XHR requests and HTML pages over HTTP and HTTPS is done behind the scenes. Zombie keeps a history of retrieved resources that you can inspect, which is useful for troubleshooting errors related to resource loading.
X-Ray comes with an entirely composable API that gives you flexibility in how you scrape each webpage. The tool supports strings, arrays of objects, nested object structures, pagination, crawler concurrency, delays, timeouts, throttles and pluggable drivers.
X-Ray becomes more powerful when users compose multiple instances together. In addition, it supports “collections of collections”, allowing users to select all items across all lists.
AngleSharp is a .NET library that allows you to parse angle-bracket-based hypertext such as SVG, HTML and MathML. The parser is built according to the W3C specification, and CSS can be parsed as well. Moreover, the library focuses on standards compliance, interactivity and extensibility.
AngleSharp features useful abstractions (type helpers), a fully functional DOM, form submission, navigation, enhanced LINQ support, and standards conformance. It has been developed as a Portable Class Library (PCL) that supports a wide range of platforms, including .NET Framework 4.5, Windows 8.1, Xamarin.Android/iOS, and Silverlight 5.
SlimerJS runs on Gecko, Mozilla’s browser engine. It is not natively headless yet, but you can make it headless yourself by running it under xvfb on Linux.
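As a rough illustration of the xvfb approach, the Python sketch below builds a command line that wraps SlimerJS in `xvfb-run`, which starts a temporary virtual X display for the process. The function names and the script name `scrape.js` are hypothetical, and the sketch assumes both `xvfb-run` and `slimerjs` are installed and on the PATH.

```python
import shutil
import subprocess


def slimerjs_headless_command(script_path, *script_args):
    """Build an argv list that runs a SlimerJS script under a virtual X display."""
    # xvfb-run starts a temporary Xvfb server; -a picks a free display number.
    return ["xvfb-run", "-a", "slimerjs", script_path, *script_args]


def run_slimerjs_headless(script_path, *script_args, timeout=60):
    """Run the script and return its stdout (assumes both tools are on PATH)."""
    for tool in ("xvfb-run", "slimerjs"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    result = subprocess.run(
        slimerjs_headless_command(script_path, *script_args),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Calling `run_slimerjs_headless("scrape.js", "https://example.com")` would then run the (hypothetical) script without opening a visible window.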
Using Splash, you can process multiple webpages in parallel, write Lua browsing scripts, get detailed rendering data in HAR format, turn off images or use Adblock rules to make rendering faster, and develop Lua scripts in Splash-Jupyter notebooks.
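Splash is driven over HTTP, so two of the features above (HAR output and disabling images) can be sketched as a request URL. The helper below is a minimal sketch, not an official client: the function name is hypothetical, and it assumes a Splash instance listening on `localhost:8050`.

```python
from urllib.parse import urlencode


def splash_render_url(page_url, endpoint="render.html",
                      splash_base="http://localhost:8050",
                      load_images=True, wait=0.5):
    """Build a GET URL for a Splash render endpoint.

    endpoint: e.g. "render.html" for the rendered HTML or
              "render.har" for rendering data in HAR format.
    load_images: pass False to skip images and speed up rendering.
    wait: seconds to wait after the page loads before returning.
    """
    params = {
        "url": page_url,
        "wait": wait,
        "images": 1 if load_images else 0,
    }
    return f"{splash_base}/{endpoint}?{urlencode(params)}"


# Example: ask for HAR data with images turned off.
har_url = splash_render_url("https://example.com", endpoint="render.har",
                            load_images=False)
```

The resulting URL can be fetched with any HTTP client. Because each request is independent, several pages can be requested in parallel.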
Most of the TrifleJS API is a port of PhantomJS, which makes it well suited to automated testing. If you are familiar with PhantomJS, then you already know how to use TrifleJS.
However, TrifleJS depends on Internet Explorer. Since IE is continuously losing market share, the whole project will become useless if Microsoft decides to drop IE.
HtmlUnit can handle basic HTTP authentication, automatic page redirection and HTTPS security. It lets Java test code analyze returned pages as XML DOM, as text, or as a collection of links, forms and tables. Moreover, HtmlUnitDriver runs faster than other WebDriver implementations.
PhantomJS is an optimal solution for headless webpage testing, screen capturing, page automation and network monitoring. It is also often used to automate attacks against websites: because it mimics legitimate user traffic, it can complicate attack mitigation.
The world’s most popular browser, Google Chrome, supports a headless mode as of version 59. It brings all the modern web platform features provided by Chromium and the Blink rendering engine to the command line.
Thanks to a number of useful command-line flags, you don’t always need to script Headless Chrome programmatically. It can also be built as a library for embedding into a C++ application. This is similar to controlling the browser over a DevTools connection, but it offers more customization points, such as Mojo services and networking.
The Embedder API allows you to add the headless library to your application. It provides default implementations for low-level adaptation points such as networking and the run loop. Moreover, with the headless client API, you can drive the browser and interact with loaded web pages.
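To illustrate the command-line flags mentioned earlier, here is a minimal Python sketch that assembles a Headless Chrome invocation for dumping the DOM, taking a screenshot or printing to PDF. The function name is hypothetical, and the binary name `google-chrome` is an assumption that varies by platform and installation.

```python
def headless_chrome_command(url, chrome_binary="google-chrome", dump_dom=False,
                            screenshot_path=None, pdf_path=None,
                            window_size=None):
    """Build an argv list for a one-shot Headless Chrome run."""
    # --disable-gpu was recommended for early headless releases.
    cmd = [chrome_binary, "--headless", "--disable-gpu"]
    if window_size is not None:
        width, height = window_size
        cmd.append(f"--window-size={width},{height}")
    if dump_dom:
        # Print the serialized DOM of the loaded page to stdout.
        cmd.append("--dump-dom")
    if screenshot_path is not None:
        cmd.append(f"--screenshot={screenshot_path}")
    if pdf_path is not None:
        cmd.append(f"--print-to-pdf={pdf_path}")
    cmd.append(url)
    return cmd


# Example: capture a 1280x800 screenshot of a page.
screenshot_cmd = headless_chrome_command(
    "https://example.com", screenshot_path="shot.png", window_size=(1280, 800)
)
```

The list can be passed to `subprocess.run` on a machine where Chrome 59 or later is installed. Building it separately keeps the flag handling easy to inspect.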