    I recently tried pthreads and was pleasantly surprised: it is an extension that adds the ability to work with multiple real threads in PHP. No emulation, no magic, no fakes, everything is real.



    Consider the following task: there is a pool of jobs that need to be completed as quickly as possible. PHP has other tools for solving this problem; they are not covered here, since this article is about pthreads.



    What are pthreads

    That's all! Well, almost everything. In fact, there is something that may upset an inquisitive reader. None of this works on standard PHP compiled with default options. To enjoy multithreading, you must have ZTS (Zend Thread Safety) enabled in your PHP.
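Before going further, it helps to check whether the PHP build at hand meets these requirements. The sketch below is only an illustration: `threading_status()` is a hypothetical helper name, while `PHP_ZTS`, `PHP_SAPI` and `extension_loaded()` are standard PHP.

```php
<?php
// Hypothetical helper: report whether this PHP build could run pthreads code.
function threading_status(): array
{
    return [
        // PHP_ZTS is a predefined constant: 1 on thread-safe builds, 0 otherwise
        'zts'      => (bool) PHP_ZTS,
        // true only if the pthreads extension is installed and enabled
        'pthreads' => extension_loaded('pthreads'),
        // pthreads is meant for command-line scripts, so the SAPI matters too
        'sapi'     => PHP_SAPI,
    ];
}

$status = threading_status();
printf(
    "ZTS: %s, pthreads: %s, SAPI: %s\n",
    $status['zts'] ? 'yes' : 'no',
    $status['pthreads'] ? 'yes' : 'no',
    $status['sapi']
);
```

If either of the first two values is "no", the pthreads examples will not run on that build.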

    PHP setup

    Next, PHP with ZTS. Ignore the large difference in execution time compared to PHP without ZTS (37.65 vs 265.05 seconds): I did not try to make the two PHP setups comparable. For example, the build without ZTS has XDebug enabled.


    As you can see, when using 2 threads, the speed of program execution is approximately 1.5 times higher than in the case of linear code. When using 4 threads - 3 times.


    Note that even though the processor has 8 cores, the execution time barely changed once more than 4 threads were used. This is probably because my processor has only 4 physical cores. For clarity, I have shown the table as a chart.


    Summary

    In PHP, it is possible to work quite elegantly with multithreading using the pthreads extension. This gives a noticeable increase in productivity.


    The article describes the organization of multi-requests using PHP using the cURL library. This mechanism is supposed to be used to create scripts that make automated requests to multiple web servers.

    Webmasters often have to use software robots that make regular or mass requests for web pages, fill out registration forms, or perform other similar actions. Traditionally, and quite justifiably, PHP and the cURL library are used for this purpose, since they are installed on almost all web servers. The cURL library is essentially a wrapper around sockets: a convenient service for building an HTTP request from the parameters the programmer specifies.

    In cases where it is necessary to make a request to one web server, the usual cURL tools are quite sufficient, but if you need to generate a large number of web requests, then the use of the multithreading mechanism can provide a significant increase in performance and speed up the script.

    Before describing how to develop such scripts, let me explain what I mean by multithreading. There is actually no multithreading in PHP, and when the term "multithreading" is used with regard to the cURL library, we are really talking about multi-requests.

    The mechanism of multi-requests is that when sending requests to web servers, PHP does not wait for a response from each request sent in turn, but sends (again alternately) several requests at once, and only after that processes the responses coming from them. Therefore, it makes sense to use multithreading only when requests are made to different servers - if it is necessary to make a large number of requests to one server, then multithreading will not bring a noticeable increase in script performance.

    I would like to note right away that the tools for working with multithreading in cURL are very scarce, but even with those that are available, you can organize full-fledged work with multi-requests.

    So, now about practice... Let's consider an example when you need to download a large number of web pages in order, for example, to check the presence of a backlink code on them. To do this you will need the following:

    1. Place the list of all URIs in an array
    2. Create an array of “regular” cURLs in the required quantity (number of threads) and one cURL_multi
    3. Initialize each created cURL (URL from the previously prepared array, post variables, if required, proxies, etc.)
    4. Add each cURL to cURL_multi
    5. Launch all threads using the cURL_multi call
    6. In the loop, we poll the state of cURL_multi and if there is a completed thread, we process the resulting page and launch a new cURL in its place. If the list of URIs is over, then we only process the result. The loop continues as long as there is at least one unfinished thread.
    7. Close all cURL.

    Now, in fact, the script that performs this operation:

      function Parse(&$urls, $flowcount) {
          // $urls - array with URLs
          // $flowcount - number of threads
          $results = array(); // downloaded pages, in order of completion

          // Start threads
          $ch = array();
          $lcount0 = count($urls);
          if ($flowcount > $lcount0) $flowcount = $lcount0;
          for ($flow = 0; $flow < $flowcount; $flow++) $ch[] = curl_init(array_pop($urls)); // creating a cURL array
          $mh = curl_multi_init(); // creating cURL_multi
          for ($flow = 0; $flow < $flowcount; $flow++) { // In this loop, cURL is initialized
              curl_setopt($ch[$flow], CURLOPT_REFERER, 'TESTREFERER');
              curl_setopt($ch[$flow], CURLOPT_USERAGENT, '');
              curl_setopt($ch[$flow], CURLOPT_RETURNTRANSFER, 1);
              curl_setopt($ch[$flow], CURLOPT_POST, 1);
              curl_setopt($ch[$flow], CURLOPT_POSTFIELDS, 'TEST=TESTVAR');
              curl_setopt($ch[$flow], CURLOPT_COOKIE, 'TEST=TESTCOOKIE');
              curl_multi_add_handle($mh, $ch[$flow]);
          }

          $flows = null;
          while ($flowcount > 0) { // The main loop continues as long as there is at least one working thread
              do { // cyclically check the number of running threads
                  curl_multi_exec($mh, $flows);
                  if ($flows == $flowcount) curl_multi_select($mh, 0.1); // wait for activity instead of spinning
              } while ($flows == $flowcount);

              $info = curl_multi_info_read($mh);
              if ($info === false) continue; // nothing has finished yet

              $res = curl_multi_getcontent($info['handle']); // page returned by the finished thread
              $results[] = $res; // process $res here
              curl_multi_remove_handle($mh, $info['handle']);

              if (!count($urls)) { // No more URLs to process
                  curl_close($info['handle']);
                  $flowcount--;
              } else { // There is still a URL to process
                  curl_setopt($info['handle'], CURLOPT_URL, array_pop($urls));
                  curl_multi_add_handle($mh, $info['handle']);
                  curl_multi_exec($mh, $flows);
              }
          }

          curl_multi_close($mh);
          return $results;
      }


      There are enough comments in the code text to understand what is happening. Let me clarify a few points...

      1. In my experience, the curl_multi_init call had to be made after all "regular" cURLs had been created, i.e. I could not swap the curl_init loop and the curl_multi_init call. This is why creating $ch and setting the necessary parameters are separate code sections.

      2. Each time curl_multi_exec is called in the inner do...while loop, the $flows variable is set to the number of active threads, which is then compared to the number of running threads (the $flowcount variable is decremented once there are no more entries in the list of URLs to process, i.e. the $urls array).

      3. curl_multi_info_read returns information about the next processed thread, or false if there have been no changes since the previous call to this function.

      4. The curl_multi_info_read function updates the data placed in the $info variable only after curl_multi_exec is executed, so both functions must be used to process each thread.

      5. To add a new thread, you must sequentially call three functions: curl_multi_remove_handle, curl_multi_add_handle and curl_multi_exec.

      And one last thing: sometimes it is important to keep some additional information about the thread being processed. In this case, you can create an associative array whose keys are the thread identifiers, i.e. the values in $info['handle'].
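A minimal sketch of such an associative array follows. It rests on a few assumptions: `handle_key()` is a hypothetical helper (cURL handles are objects in PHP 8 and resources in older versions, so neither can be used as an array key directly), the URL is a placeholder that is never requested, and `$info` here is a stand-in for the array returned by `curl_multi_info_read()`.

```php
<?php
// Hypothetical helper: derive an integer array key from a cURL handle.
function handle_key($handle): int
{
    return is_object($handle)
        ? spl_object_id($handle)   // PHP 8+: CurlHandle object
        : (int) $handle;           // PHP 7 and earlier: resource ID
}

$meta = [];                                  // handle key => extra data for that thread
$ch = curl_init('http://example.com/');      // placeholder URL, no request is sent here
$meta[handle_key($ch)] = [
    'url'     => 'http://example.com/',
    'started' => microtime(true),
];

// Inside the polling loop the finished thread is identified by $info['handle'].
$info = ['handle' => $ch];                   // stand-in for curl_multi_info_read($mh)
$extra = $meta[handle_key($info['handle'])];
echo $extra['url'], "\n";

// Forget the entry once the thread has been processed.
unset($meta[handle_key($info['handle'])]);
curl_close($ch);
```

The entry should be removed as soon as the thread is processed, because the ID of a destroyed handle can be reused by a later one.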

    In programming you constantly have to work with various resources: files, sockets, HTTP connections. Each of them has its own access interface, and these interfaces are often incompatible with each other. To eliminate these inconsistencies and unify work with various data sources, PHP Streams were introduced in PHP 4.3.

    Although PHP 4.3 came out a long time ago, many PHP programmers have only a vague idea of streams in PHP and continue to use cURL everywhere, even though PHP offers a more convenient alternative in the form of stream contexts.

    The following types of streams exist in PHP:

    • a file on the hard drive;
    • an HTTP connection to a website;
    • a UDP connection to a server;
    • a ZIP file;
    • an *.mp3 file.

    What do all these resources have in common? All of them can be read and written, i.e. read and write operations can be applied to all of them. The strength of PHP streams is that you can access all of these resources using the same set of functions, which is very convenient. If the need arises, you can also write your own stream handler, a "stream wrapper". In addition to reading and writing, streams in PHP also allow other operations such as renaming and deleting.

    If you program in PHP, you have already encountered streams, although you may not have realized it. Functions such as fopen(), file_get_contents() and file() all work with streams. So, in fact, you have been using file streams all this time, completely transparently.

    To work with another type of stream, specify its protocol (wrapper) as follows: wrapper://some_stream_resource, where wrapper:// is, for example, http://, file://, ftp://, zip://, etc., and some_stream_resource is a URI that identifies what you want to open. The format of the URI itself is not restricted. Examples:

    • http://site/php-stream-introduction.html
    • file://C:/Projects/rostov-on-don.jpg
    • ftp://user: [email protected]/pub/file.txt
    • mpeg://file:///music/song.mp3
    • data://text/plain;base64,SSBsb3ZlIFBIUAo=
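To illustrate the "same set of functions" idea, here is a small sketch that reads a local file and the data:// URI from the list above through one function. `read_all()` is a hypothetical helper name; the data:// wrapper normally requires `allow_url_fopen` to be enabled, which is the default.

```php
<?php
// Hypothetical helper: read an entire stream, whatever wrapper its URI names.
function read_all(string $uri): string
{
    $fp = fopen($uri, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $uri");
    }
    $data = stream_get_contents($fp);
    fclose($fp);
    return $data;
}

// A plain path goes through the file:// wrapper, which is the default.
$path = tempnam(sys_get_temp_dir(), 'stream');
file_put_contents($path, "I love PHP\n");
$fromFile = read_all($path);
unlink($path);

// The same call reads base64-encoded inline data through the data:// wrapper.
$fromData = read_all('data://text/plain;base64,SSBsb3ZlIFBIUAo=');

var_dump($fromFile === $fromData);
```

Only the URI changes between the two calls; the code that opens, reads and closes the stream stays the same.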

    Note, however, that not all protocols and handlers may work for you, since support for some wrappers depends on your settings. To find out which protocols are supported, run the following script:

    // list of registered socket transports
    print_r(stream_get_transports());

    // list of registered stream wrappers (handlers)
    print_r(stream_get_wrappers());

    // list of registered filters
    print_r(stream_get_filters());

    PHP Stream Contexts

    Often there is a need to specify additional parameters when making an HTTP request. Stream contexts solve this problem by allowing you to pass such parameters. Many stream-aware functions have an optional stream context parameter. Let's look at the function file_get_contents():

    string file_get_contents(string $filename [, bool $use_include_path = false [, resource $context [, int $offset = -1 [, int $maxlen = -1]]]])

    As you can see, the stream context is passed as the third parameter. Contexts are created using the function stream_context_create(), which takes an array and returns a context resource.

    $options = array(
        "http" => array(
            "method" => "GET",
            "header" => "Accept-language: en\r\n" .
                        "Cookie: foo=bar\r\n"
        )
    );

    $context = stream_context_create($options);

    // Using this with file_get_contents ...
    echo file_get_contents("http://www.example.com/", false, $context);

    So today we learned what streams and stream contexts are in PHP and looked at examples of their use. In the following articles we will talk about stream metadata and create our own handler.

    Sometimes it becomes necessary to perform several actions simultaneously, for example, checking changes in one database table and making modifications to another. Moreover, if one of the operations (for example, checking changes) takes a lot of time, it is obvious that sequential execution will not ensure resource balancing.

    To solve this kind of problem, programming uses multithreading - each operation is placed in a separate thread with an allocated amount of resources and works inside it. With this approach, all tasks will be completed separately and independently.

    Although PHP does not support multithreading, there are several methods for emulating it, which will be discussed below.

    1. Running several copies of the script - one copy per operation

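    //woman.php
    if (!isset($_GET["thread"])) {
        // the trailing & puts each wget in the background, so the two copies run at the same time
        system("wget -q -O /dev/null 'http://localhost/woman.php?thread=make_me_happy' > /dev/null 2>&1 &");
        system("wget -q -O /dev/null 'http://localhost/woman.php?thread=make_me_rich' > /dev/null 2>&1 &");
    } elseif ($_GET["thread"] == "make_me_happy") {
        make_her_happy();
    } elseif ($_GET["thread"] == "make_me_rich") {
        find_another_one();
    }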

    When we execute this script without parameters, it automatically runs two copies of itself, with operation IDs ("thread=make_me_happy" and "thread=make_me_rich"), which initiate the execution of the necessary functions.

    This way we achieve the desired result - two operations are performed simultaneously - but this, of course, is not multithreading, but simply a crutch for performing tasks simultaneously.

    2. Path of the Jedi - using the PCNTL extension

    PCNTL is an extension that allows you to fully work with processes. In addition to management, it supports sending messages, checking status and setting priorities. This is what the previous script using PCNTL looks like:

    $pid = pcntl_fork();
    if ($pid == 0) {
        make_her_happy();
    } elseif ($pid > 0) {
        $pid2 = pcntl_fork();
        if ($pid2 == 0) {
            find_another_one();
        }
    }

    It looks quite confusing, let's go through it line by line.

    In the first line, we "fork" the current process (a fork copies the process while preserving the values of all variables), dividing it into two processes (current and child) running in parallel.

    To let us tell whether we are in the child or the parent process, the pcntl_fork function returns 0 in the child and the child's process ID in the parent. Therefore, in the second line, we look at $pid: if it is zero, we are in the child process and execute the function. Otherwise we are in the parent (line 4), where we create another process and run the second task in the same way.

    Script execution process:

    Thus, the script creates 2 more child processes, which are copies of it and contain the same variables with the same values. Using the identifier returned by the pcntl_fork function, we find out which process we are currently in and perform the necessary actions.
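The script above never waits for its children. A possible extension, sketched under the assumption of a Unix CLI build with the pcntl extension, forks one child per task and collects their exit codes with `pcntl_waitpid()`. `run_in_children()` and the two placeholder tasks are hypothetical names introduced for this illustration.

```php
<?php
// Hypothetical helper: run each task in its own child process and wait for all of them.
function run_in_children(array $tasks): array
{
    $pids = [];
    foreach ($tasks as $name => $task) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child: run the task and use its return value as the exit code.
            exit((int) $task());
        }
        // Parent: remember which child runs which task.
        $pids[$pid] = $name;
    }

    $codes = [];
    foreach ($pids as $pid => $name) {
        pcntl_waitpid($pid, $status);                 // blocks until this child exits
        $codes[$name] = pcntl_wexitstatus($status);   // exit code of that child
    }
    return $codes;
}

// Placeholder tasks that sleep for a while and return an exit code.
$codes = run_in_children([
    'happy' => function () { usleep(200000); return 0; },
    'rich'  => function () { usleep(100000); return 3; },
]);
print_r($codes);
```

Waiting for every child also prevents finished processes from lingering as zombies.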

    Is there a realistic way to implement a multithreading model in PHP, whether for real or just by faking it? Some time ago it was suggested that you could force the operating system to load another instance of the PHP executable and handle other processes concurrently.

    The problem is that when the PHP code finishes executing, the PHP instance remains in memory because there is no way to kill it from within PHP. So if you are simulating multiple threads, you can imagine what will happen. I am therefore still looking for a way to do multithreading, or to simulate it effectively, from PHP. Any ideas?

    Multithreading is possible in php

    Yes, you can do multithreading in PHP using pthreads

    From the PHP documentation:

    pthreads is an object-oriented API that provides all the tools needed for multithreading in PHP. PHP applications can create, read, write, execute, and synchronize with Threads, Workers, and Threaded.

    Warning. The pthreads extension cannot be used in a web server environment. Therefore, threads in PHP should remain in CLI-based applications.

    Simple test

    #!/usr/bin/php
    <?php
    class AsyncOperation extends Thread {

        public function __construct($arg) {
            $this->arg = $arg;
        }

        public function run() {
            if ($this->arg) {
                $sleep = mt_rand(1, 10);
                printf("%s: %s -start -sleeps %d" . "\n", date("g:i:sa"), $this->arg, $sleep);
                sleep($sleep);
                printf("%s: %s -finish" . "\n", date("g:i:sa"), $this->arg);
            }
        }
    }

    // Create an array
    $stack = array();

    // Initiate multiple threads
    foreach (range("A", "D") as $i) {
        $stack[] = new AsyncOperation($i);
    }

    // Start the threads
    foreach ($stack as $t) {
        $t->start();
    }

    ?>

    First run

    12:00:06pm: A -start -sleeps 5
    12:00:06pm: B -start -sleeps 3
    12:00:06pm: C -start -sleeps 10
    12:00:06pm: D -start -sleeps 2
    12:00:08pm: D -finish
    12:00:09pm: B -finish
    12:00:11pm: A -finish
    12:00:16pm: C -finish

    Second run

    12:01:36pm: A -start -sleeps 6
    12:01:36pm: B -start -sleeps 1
    12:01:36pm: C -start -sleeps 2
    12:01:36pm: D -start -sleeps 1
    12:01:37pm: B -finish
    12:01:37pm: D -finish
    12:01:38pm: C -finish
    12:01:42pm: A -finish

    Real world example

    error_reporting(E_ALL);

    class AsyncWebRequest extends Thread {
        public $url;
        public $data;

        public function __construct($url) {
            $this->url = $url;
        }

        public function run() {
            if (($url = $this->url)) {
                /*
                 * If a large amount of data is being requested, you might want to
                 * fsockopen and read using usleep in between reads
                 */
                $this->data = file_get_contents($url);
            } else {
                printf("Thread #%lu was not provided a URL\n", $this->getThreadId());
            }
        }
    }

    $t = microtime(true);
    $g = new AsyncWebRequest(sprintf("http://www.google.com/?q=%s", rand() * 10));
    /* starting synchronization */
    if ($g->start()) {
        printf("Request took %f seconds to start ", microtime(true) - $t);
        while ($g->isRunning()) {
            echo ".";
            usleep(100);
        }
        if ($g->join()) {
            printf(" and %f seconds to finish receiving %d bytes\n", microtime(true) - $t, strlen($g->data));
        } else {
            printf(" and %f seconds to finish, request failed\n", microtime(true) - $t);
        }
    }

    Why don't you use popen?

    for ($i = 0; $i < 10; $i++) {
        // open ten processes
        for ($j = 0; $j < 10; $j++) {
            $pipe[$j] = popen("php script2.php", "w"); // run the script through the PHP CLI
        }
        // wait for them to finish
        for ($j = 0; $j < 10; ++$j) {
            pclose($pipe[$j]);
        }
    }

    Threading is not available in stock PHP, but concurrent programming is possible using HTTP requests in the form of asynchronous calls.

    If you set the cURL timeout to 1 second and use the same session ID for the processes you want to link, they can share data through session variables, as in my example below. With this method you can even close your browser and the concurrent process keeps running on the server.

    Don't forget to check with the correct session ID, for example:

    http://localhost/test/verifysession.php?sessionid=[valid id]

    startprocess.php

    $request = "http://localhost/test/process1.php?sessionid=" . $_REQUEST["PHPSESSID"];
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $request);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);
    curl_exec($ch);
    curl_close($ch);
    echo $_REQUEST["PHPSESSID"];

    process1.php

    set_time_limit(0);

    if ($_REQUEST["sessionid"]) session_id($_REQUEST["sessionid"]);

    // Stop this process once closeprocess.php has set the flag in the session.
    function checkclose() {
        if (!empty($_SESSION["closesession"])) {
            unset($_SESSION["closesession"]);
            die();
        }
    }

    while (true) {
        session_start();
        $_SESSION["test"] = rand();
        checkclose();
        session_write_close();
        sleep(1); // pause between iterations (the interval is an assumption)
    }

    verifysession.php

    if ($_REQUEST["sessionid"]) session_id($_REQUEST["sessionid"]); session_start(); var_dump($_SESSION);

    closeprocess.php

    if ($_REQUEST["sessionid"]) session_id($_REQUEST["sessionid"]); session_start(); $_SESSION["closesession"] = true; var_dump($_SESSION);

    While you can't thread, you do have a certain degree of control over the process in php. Here are two useful sets:

    Process control functions http://www.php.net/manual/en/ref.pcntl.php

    You can fork your process using pcntl_fork, which returns the PID of the child. You can then use posix_kill to send a signal to that PID.

    However, if you kill the parent process, a signal must be sent to the child process telling it to die. If php itself doesn't recognize this, you can register a function to control it and do a clean exit using pcntl_signal.
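As an illustration of the clean exit mentioned here, the sketch below has a parent stop its child with SIGTERM. It assumes a Unix CLI build with the pcntl and posix extensions and PHP 7.1 or later for `pcntl_async_signals()`; the exit code 42 is arbitrary.

```php
<?php
// Deliver signals as they arrive, without declare(ticks).
pcntl_async_signals(true);

// Installed before the fork so that the child inherits the handler
// and there is no window in which SIGTERM would kill it outright.
pcntl_signal(SIGTERM, function () {
    // Clean-up would go here; 42 is an arbitrary exit code for this demo.
    exit(42);
});

$pid = pcntl_fork();
if ($pid === -1) {
    throw new RuntimeException('fork failed');
}
if ($pid === 0) {
    // Child: pretend to work until told to stop.
    while (true) {
        sleep(1);
    }
}

// Parent: ask the child to exit, then reap it so no zombie is left behind.
posix_kill($pid, SIGTERM);
pcntl_waitpid($pid, $status);
$exitedCleanly = pcntl_wifexited($status);
$exitCode = pcntl_wexitstatus($status);
pcntl_signal(SIGTERM, SIG_DFL); // restore the default handler in the parent

printf("child %d exited cleanly: %s, code %d\n", $pid, $exitedCleanly ? 'yes' : 'no', $exitCode);
```

A parent that wants its children to die with it could call `posix_kill()` for each child from its own signal handler in the same way.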

    You can simulate threads. PHP can launch background processes via popen (or proc_open). These processes can be communicated using stdin and stdout. Of course, these processes could be a php program. This is probably as close as you will get.
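A minimal sketch of this approach with `proc_open`, assuming it is run from the PHP CLI so that `PHP_BINARY` points at the interpreter. The child program here is a one-liner, invented for the example, that upper-cases whatever arrives on its stdin.

```php
<?php
// Pipes are named from the child's point of view.
$descriptors = [
    0 => ['pipe', 'r'],   // child's stdin: the parent writes to it
    1 => ['pipe', 'w'],   // child's stdout: the parent reads from it
];

// Child program (illustrative): upper-case everything received on stdin.
$childCode = 'echo strtoupper(stream_get_contents(STDIN));';
$command = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($childCode);

$process = proc_open($command, $descriptors, $pipes);
if (!is_resource($process)) {
    throw new RuntimeException('Could not start the child process');
}

fwrite($pipes[0], "hello from the parent\n");
fclose($pipes[0]);                        // EOF tells the child that the input is complete

$reply = stream_get_contents($pipes[1]);  // blocks until the child closes its stdout
fclose($pipes[1]);
$exitCode = proc_close($process);         // waits for the child and returns its exit code

echo $reply;
```

Several such children can be started before any reply is read, which is what makes this a usable substitute for threads.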

    You can use exec() to run a command line script (such as the PHP CLI), and if you redirect the output to a file and start the command in the background, your script won't wait for the command to complete.

    I can't remember the exact PHP CLI syntax, but you need something like:

    exec("/path/to/php -f /path/to/file.php > /path/to/output.txt 2>&1 &");

    I think a few shared hosting servers have exec() disabled by default for security reasons, but it might be worth a try.

    Depending on what you're trying to do, you could also use curl_multi to achieve it.

    It supports bidirectional inter-thread communication and also has built-in protection for killing child threads (orphan prevention).

    You may have the option:

    1. multi_curl
    2. You can use the system command for the same
    3. The ideal scenario is to write a threading function in C and compile it into PHP as an extension. It will then be available as a regular PHP function.

    What about pcntl_fork?

    Check the manual page for examples: PHP pcntl_fork

    pcntl_fork will not work in a web server environment if safe mode is enabled. In that case it will only work in the CLI version of PHP.

    The Thread class is available since PECL pthreads ≥ 2.0.0.