A Puppeteer bridge for PHP

 5 years ago
source link: https://www.tuicool.com/articles/hit/nEVFnie

PuPHPeteer

A Puppeteer bridge for PHP, supporting the entire API. Based on Rialto, a package to manage Node resources from PHP.

Here are some examples borrowed from Puppeteer's documentation and adapted to PHP's syntax:

Example: navigating to https://example.com and saving a screenshot as example.png:

use Nesk\Puphpeteer\Puppeteer;

$puppeteer = new Puppeteer;
$browser = $puppeteer->launch();

$page = $browser->newPage();
$page->goto('https://example.com');
$page->screenshot(['path' => 'example.png']);

$browser->close();

Example: evaluating a script in the context of the page:

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$puppeteer = new Puppeteer;

$browser = $puppeteer->launch();
$page = $browser->newPage();
$page->goto('https://example.com');

// Get the "viewport" of the page, as reported by the page.
$dimensions = $page->evaluate(JsFunction::create("
    return {
        width: document.documentElement.clientWidth,
        height: document.documentElement.clientHeight,
        deviceScaleFactor: window.devicePixelRatio
    };
"));

printf('Dimensions: %s', print_r($dimensions, true));

$browser->close();

Requirements and installation

This package requires PHP >= 7.1 and Node >= 8.

Install it with these two commands:

composer require nesk/puphpeteer
npm install @nesk/puphpeteer

Notable differences between PuPHPeteer and Puppeteer

Puppeteer's class must be instantiated

Instead of requiring Puppeteer:

const puppeteer = require('puppeteer');

You have to instantiate the Puppeteer class:

$puppeteer = new Puppeteer;

This will create a new Node process controlled by PHP.

You can also pass options to the constructor; see Rialto's documentation.

Note: If you use timeouts longer than 30 seconds in Puppeteer's API, you must also set a higher value for the read_timeout option (default: 35 seconds):

$puppeteer = new Puppeteer([
    'read_timeout' => 65, // In seconds
]);

$puppeteer->launch()->newPage()->goto($url, [
    'timeout' => 60000, // In milliseconds
]);
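
The relationship between the two values can be sketched in plain PHP. This is only an illustration: readTimeoutFor() is a hypothetical helper, not part of PuPHPeteer or Rialto, and the 5-second margin is an assumption inferred from the defaults above (35 seconds of read timeout for 30 seconds of Puppeteer timeout).

```php
<?php

// Hypothetical helper (not provided by PuPHPeteer or Rialto): derives a
// read_timeout in seconds from a Puppeteer timeout in milliseconds.
// The default margin of 5 seconds is an assumption.
function readTimeoutFor(int $timeoutMs, int $marginSeconds = 5): int
{
    // Round up to whole seconds, then add the safety margin.
    return (int) ceil($timeoutMs / 1000) + $marginSeconds;
}

$navigationTimeoutMs = 60000; // In milliseconds, as expected by Puppeteer

$options = [
    'read_timeout' => readTimeoutFor($navigationTimeoutMs), // In seconds
];

printf("read_timeout: %d\n", $options['read_timeout']);
```

The resulting array could then be passed to the Puppeteer constructor, with the same millisecond value passed as the timeout option of goto().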

No need to use the await keyword

With PuPHPeteer, every method call or property getting/setting is synchronous.

Some methods have been aliased

The following methods have been aliased because PHP doesn't support the $ character in method names:

  • $ => querySelector
  • $$ => querySelectorAll
  • $x => querySelectorXPath
  • $eval => querySelectorEval
  • $$eval => querySelectorAllEval

Use these aliases just like you would have used the original methods:

$divs = $page->querySelectorAll('div');

Evaluated functions must be created with JsFunction

Functions evaluated in the context of the page must be created with the JsFunction class. The body of these functions must be written in JavaScript instead of PHP.

use Nesk\Rialto\Data\JsFunction;

$pageFunction = JsFunction::create(['element'], "
    return element.textContent;
");
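
As a sketch of how such a function might then be used, the example below passes it to querySelectorEval (the alias of Puppeteer's $eval). The h1 selector, the target URL and the vendor/autoload.php path are assumptions chosen for illustration, and the script needs the package installed as described above.

```php
<?php

require 'vendor/autoload.php'; // Assumes a standard Composer layout

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$puppeteer = new Puppeteer;
$browser = $puppeteer->launch();
$page = $browser->newPage();
$page->goto('https://example.com');

// The function body is JavaScript; 'element' is its only parameter.
$pageFunction = JsFunction::create(['element'], "
    return element.textContent;
");

// Runs the function in the page against the first element matching 'h1'
// (the selector is an assumption about the page's markup).
$heading = $page->querySelectorEval('h1', $pageFunction);

printf("Heading: %s\n", $heading);

$browser->close();
```

No await is involved: the call returns the evaluated value directly.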

Exceptions must be caught with ->tryCatch

If an error occurs in Node, a Node\FatalException is thrown and the process is closed; you will then have to create a new instance of Puppeteer.

To avoid that, you can ask Node to catch these errors by prepending your instruction with ->tryCatch:

use Nesk\Rialto\Exceptions\Node;

try {
    $page->tryCatch->goto('invalid_url');
} catch (Node\Exception $exception) {
    // Handle the exception...
}

With ->tryCatch, a Node\Exception is thrown instead, and the Node process stays alive and usable.

License

The MIT License (MIT). Please see License File for more information.

Logo attribution

PuPHPeteer's logo is composed of:

Thanks to Laravel News for picking the icons and colors of the logo.

