Best way to handle large result set (+400 000 rows)
-
03 Agustus 2012 13:33Hi,
I started to migrate our old components from the old php_mssql.dll driver to SQLSRV 3.
I've a lot of large result set (~400 000 rows output to CSV file). With the old driver I was using the batch_size parameter of mssql_query function to improve the query speed. This procedure takes 49s with the old driver.
With the new driver, this parameter is not available anymore. I tried both PDO and non-PDO versions and here are my conclusion. The faster way to get my 400 000 rows is to prepare, execute and fetch my rows as object (using the default stdClass). It takes approximately 52 second.
Am I right ? Is there any better solution to improve speed ? is there any hidden parameter to improve speed ? (buffer size or something else)- Diedit oleh Kevin Labecot 03 Agustus 2012 13:37
Semua Balasan
-
06 Agustus 2012 8:09
Hi,
I made a big mistake during my tests.
- Test with the old driver : Rows 484856 - 28 seconds
- Test with the new driver using PDO : Rows 484856 - 52 seconds
I don't understand why the new driver is soooo slow.
With the old driver I'm using mssql_query("my query",20000) and loop 484 856 times to get rows as array (mssql_fetch_array & mssql_fetch_batch).
With the new driver I'm using prepare & execute (using cursor PDO::CURSOR_FWDONLY) and loop 484 856 times using pdoStatement->fetch(\PDO::FETCH_ASSOC)
Summary of my tests with only 40 000 rows :
testSQLServer2::testOldDriver_mssql = Rows 40000 : 1 seconds
testSQLServer2::testNewDriver_nonPDO = Rows 40000 : 7 seconds
testSQLServer2::testNewDriver_PDO = Rows 40000 : 4 seconds
I decided to try with PDO/ODBC : testSQLServer2::testPDO_ODBC = Rows 40000 : <1 second
I'm surprised that the ODBC connection is a lot faster !! (<25s for my 484 856 rows). Is the SQLSRV driver definitely slower ?
Kevin
- Diedit oleh Kevin Labecot 06 Agustus 2012 11:42 Added PDO/ODBC test
-
06 Agustus 2012 9:32Moderator
Hi Kevin,
Thank you for your question.
I am trying to involve someone more familiar with this topic for a further look at this issue. Sometime delay might be expected from the job transferring. Your patience is greatly appreciated.
Thank you for your understanding and supportBest Regards,
Iric
Please remember to mark the replies as answers if they help and unmark them if they provide no help. -
06 Agustus 2012 18:20
Also, please note that the mssql driver is not a Microsoft driver, but rather a community-built driver. I will wait for Iric to help you investigate and see why you may be seeing slow performance.
Thanks,
Jonathan
This posting is provided 'AS IS' with no warranties, and confers no rights.
-
07 Agustus 2012 6:17Ok, thanks, I though it was an official driver... Maybe it may explain why it's faster though ODBC with a "fully microsoft driver".
Anyway, I'm waiting Iric's reply, thanks.
Kevin
- Diedit oleh Kevin Labecot 07 Agustus 2012 6:17
-
07 Agustus 2012 10:09
Please could you post your PHP code, including the SQL query?
Using prepare/execute is faster in *some* circumstances, but not if you are using prepare/execute for every single iteration of your test.
mssql_query is a straight through execution, and you should use sqlsrv_query for a comparison, not sqlsrv_prepare/sqlsrv_execute.
Rob
-
07 Agustus 2012 10:14
Hi,
SQLSRV code :
public static function testNewDriver_nonPDO() { $timeStart = time(); $i=0; $conn = sqlsrv_connect("192.168.0.36", array("UID"=>"TPESURPC2","PWD"=>"hidden","Database"=>"hidden","QuotedId"=>false)); $rs = sqlsrv_query($conn, "SELECT * FROM consommateurs WHERE [point de vente saisie] IN (SELECT numéro FROM myTable WHERE NOM LIKE 'something%')", array(), array( "Scrollable" => SQLSRV_CURSOR_FORWARD )); while($row = sqlsrv_fetch_array($rs,SQLSRV_FETCH_ASSOC)) { $i++; } $timeStop = time(); echo __METHOD__." = Rows $i : <b>" . ($timeStop - $timeStart) . " seconds</b><br>"; }result : testSQLServer2::testNewDriver_nonPDO = Rows 484856 : 119 seconds
Same code using mssql :
public static function testOldDriver_mssql() { $timeStart = time(); $conn = mssql_connect("192.168.0.36", "TPESURPC2", "hidden"); $rs = mssql_query("SELECT * FROM consommateurs WHERE [point de vente saisie] IN (SELECT numéro FROM myTable WHERE NOM LIKE 'something%')", $conn, 2000); $i = 0; if(mssql_num_rows($rs) == 0) { die("Aucun résultat"); } else { do { while($data = mssql_fetch_array($rs)) { $i++; } } while(mssql_fetch_batch($rs)); } $timeStop = time(); echo __METHOD__." = Rows $i : <b>" . ($timeStop - $timeStart) . " seconds</b><br>"; }Result : testSQLServer2::testOldDriver_mssql = Rows 484856 : 28 seconds
Kevin
-
07 Agustus 2012 11:28
Thanks for the data, it's an interesting result.
Could you try changing the sqlsrv test lines?
It is possible this change to sqlsrv_fetch_array will make it even slower, but it produces the same result as mssql_fetch_array().
Also, remove the final 2 parameters to sqlsrv_query - they are not needed, and it *should not* make a difference; but I would be interested to know:
$rs = sqlsrv_query($conn, "SELECT * FROM consommateurs WHERE [point de vente saisie] IN (SELECT numéro FROM myTable WHERE NOM LIKE 'something%')"); while($row = sqlsrv_fetch_array($rs)) // This syntax compatible with mssql_fetch_array() // ...EDIT: Sorry - I forgot to add, please could you modify the call to mssql_query, to make it compatible with the sqlsrv call, remove the batched rows:
$rs = mssql_query("SELECT * FROM consommateurs WHERE [point de vente saisie] IN (SELECT numéro FROM myTable WHERE NOM LIKE 'something%')", $conn);Your test highlights at least one of these, neither is good:
- The SQL Server Native Client (ODBC) driver is slower than the old named-pipe 'C' interface.
- The PHP driver is not as efficient as it should be.
Possible work around:
- There is a new scrollable cursor type which might speed things up when you need to iterate a large result set: SQLSRV_CURSOR_CLIENT_BUFFERED
- The equivalent in PDO would be: PDOStatement::fetchAll().
Possible other tests:
- Create an ODBC driver connection in Excel (or other ODBC client) and see how quickly the result is displayed/iterated.
- If it's fast, then the problem is almost certainly in the source code of the PHP driver.
- If it's slow then it's probably the SQL Server Native Client. - See how quickly the full result is displayed (or saved to a file) in SQL Server Management Studio.
- Look at netstat on your remote SQL Server and see if the ODBC driver is connecting with TCP/IP or named pipe. Try to make the ODBC driver and mssql connect using the same protocol, otherwise the tests cannot be veritable.
Rob
- Diedit oleh Robert Johnson 07 Agustus 2012 11:40 change to mssql_query
-
07 Agustus 2012 12:44
Thanks for your help.
New test of SQLSRV gives me same result (121 seconds).
I cannot remove the batched rows with MSSQL because it gives me a fatal error about the memory (Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 382464 bytes) in). Same things appears with fetchAll & SQLSRV_CURSOR_CLIENT_BUFFERED (Memory limit of 10240 KB exceeded for buffered query). I've too many data, do not forget I've +484 000 rows and a lot of columns to exports.
With Excel 2007 + ODBC, my data are loaded into the sheet in less than 30 seconds.
Both drivers are using a TCP/IP connection. With SSMS it takes 25 seconds to display to grid.
Kevin
- Diedit oleh Kevin Labecot 07 Agustus 2012 12:45 edit : test with SSMS
-
07 Agustus 2012 14:11
Thanks for the quick testing. Yours is a very good question that has not been covered in the forum.
There are 3 different ways to test sqlsrv against mssql based on your result.
Option 1 - make mssql fetch 1 row at a time:
To perform the test on a like-for-like basis this is the code you need for the mssql test, obviously I should have spotted that buffering the entire result set would overflow the memory:
$rs = mssql_query("SELECT * FROM consommateurs WHERE [point de vente saisie] IN (SELECT numéro FROM myTable WHERE NOM LIKE 'something%')", $conn, 1);Option 2 - increase the memory limit in PHP
You can increase the memory limit of PHP and the buffer used by the sqlsrv driver, and run the tests with the adjustments in my post above where all results for mssql and sqlsrv are batched - you might need to experiment with the memory limit you need:
- To increase PHP's buffer, set something like "memory_limit = 1024M" in php.ini
- To increase the buffer for SQLSRV_CURSOR_CLIENT_BUFFERED, set it like this: "sqlsrv_configure('ClientBufferMaxKBSize', 0);", which uses all available memory up to the PHP limit.
Option 3 - do some fake batching in your SQL Server query
Obviously it is not good to allocate a huge buffer for PHP. This is a work around for the sqlsrv driver to fit each 2000 records in the buffer. You do this by requesting 2000 rows at a time - something like this should work:
$tsql = <<<SQL with numbers as ( SELECT numéro, row_number() over (order by numéro) ROWNUM FROM myTable WHERE NOM LIKE 'something%' ) SELECT * FROM consommateurs WHERE [point de vente saisie] IN (SELECT numéro FROM numbers WHERE ROWNUM between ? and ?) SQL; $min = 0; $max = 2000; $rs = sqlsrv_prepare($conn, $tsql, array(&$min, &$max), array("Scrollable" => SQLSRV_CURSOR_CLIENT_BUFFERED)); while ( sqlsrv_execute($rs) ) { $row = sqlsrv_fetch_array($rs); if ( !$row ) break; do { ++$i; } while( $row = sqlsrv_fetch_array($rs) ); $min = $max+1; $max += 2000; }Finally:
Although the sqlsrv driver is not performing very well for your scenario, overall I think that the sqlsrv driver's approach is better for the following reasons:
- It is faster for the majority of web-based queries, where it is necessary for the first rows to be returned extremely quickly.
- Special batching PHP calls are not necessary, keeping your code more straightforward and consistent.
Rob
-
07 Agustus 2012 14:44
Well, you're right, our PHP proccesses are not representing classical usage of PHP. But I cannot choose anything else... The engineer that I am must find a solution to replace mssql library with an efficient and durable solution.
So I think, the best answer is to use ODBC instead of MSSQL and SQLSRV...
Kevin
-
07 Agustus 2012 15:40
If you find that using PHP_ODBC with the SQL Server Native Access Client is faster than SQLSRV, then we have a problem with the PHP driver. If it's the same speed, then it's a problem with the SNAC driver. Would be interesting to see what your results are when using the ODBC adapter...?
Thanks,
Jonathan
This posting is provided 'AS IS' with no warranties, and confers no rights.
-
07 Agustus 2012 17:44
I just ran a test of my own. Retrieving 733218 rows:
MSFT SQLSRV: 3.7 seconds (SLOWEST time out of 3 executions)
PHP ODBC: 53.3 seconds (FASTEST time out of 3 executions)
SSMS: 18 seconds (AVERAGE time taken to populate a grid)
MSSQL: N/A
Looking at the speed I achieved, either I have a much better computer than you, or your result is questionable.
Something else could be going on - possibly your query is generating PHP warnings/errors with each iteration - could you check your PHP error log to see if the delay is caused by excessive logging of error messages?
Rob
- Diedit oleh Robert Johnson 07 Agustus 2012 17:56 SSMS result added
-
08 Agustus 2012 6:12
test with PDO ODBC : testSQLServer2::testPdoODBC = Rows 484856 : 24 seconds
public static function testPdoODBC() { $timeStart = time(); $i=0; $conn = new \PDO( "odbc:Driver={SQL Server Native Client 11.0};Server=192.168.0.36;Database = avomark-data", "TPESURPC2", "hidden"); $conn->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION); $stmt = $conn->prepare("SELECT * FROM consommateurs WHERE [point de vente saisie] IN (SELECT numéro FROM myTable WHERE NOM LIKE 'something%')"); $stmt->execute(); while($row = $stmt->fetch(\PDO::FETCH_ASSOC)) { $i++; } $timeStop = time(); echo __METHOD__." = Rows $i : <b>" . ($timeStop - $timeStart) . " seconds</b><br>"; }Robert, it's very strange !! Just looked my logs, no warning, no error.
New try with error_reporting = E_ALL and still no warning, no error.
Kevin
-
08 Agustus 2012 6:32
New tests :
- With odbc_* functions : Very slow (driver SQL Server Native Client 11.0 + odbc_exec + odbc_fetch_array), +400 seconds !?! Large result set seems to be handled better with PDO ODBC. Anyway, it's not my problem.
- I moved my script to 192.168.0.36 (our developpement database server used in scripts above) and no more luck (+120s)
Can be related to database configuration ?
We're running SQL Server 2005 but our database is running into compatibility level 80. Robert, can you try again with this compatibility level ?
edit : I tested myself, I created a test base running compatibility level 90. It slow too :( (<90s)
Kevin
- Diedit oleh Kevin Labecot 08 Agustus 2012 7:51 a few seconds ago typos fixes + test on 2005
-
08 Agustus 2012 12:15
Thanks Kevin
I tried again on a busy SQL Server 2005 across our network on a slower connection, and the time increased from 3.7 seconds to 16 seconds. The time for the same remote 2005 server in SSMS stayed at 18 seconds to populate the whole grid, which was interesting.
My PHP timings are consistently quicker than SSMS (as expected, since PHP is incrementing a counter and SSMS is drawing the rows)
Could you run a test using a local SQL server, e.g. SQLEXPRESS? It will highlight whether your speed issues are caused by network speed. I'm not sure why your PDO version is performing so much better than the procedural API.
Do you run several tests before recording your results? e.g. do 5 tests, discard the slowest and fastest, then average, or take the best/worst from the 3 remaining times?
I don't think SQL Server configuration is the issue, because it would probably affect each client in the same way, and you are getting a good result with the PDO driver.
Rob
-
08 Agustus 2012 12:23
Hi Robert,
I can't install SQLEXPRESS on my computer but this morning I moved my script to our developement server to test with a local connection (connection string set to 127.0.0.1) : it's slow .
Yes, I ran several tests (up to 3). No big improvement.
edit : I did a test with our production server (more powerful). Same result... still slow : 109s (fatest time out of 3 executions).
Kevin
- Diedit oleh Kevin Labecot 08 Agustus 2012 12:33 Added test to production server
-
08 Agustus 2012 15:11
Hi Kevin
Sorry, I have to apologise. I managed to reproduce your problem - I made a mistake in my tests.
The delay is caused by the driver, when it creates the PHP row array.
In my tests above I called sqlsrv_fetch() instead of sqlsrv_fetch_array(). When I changed it to call sqlsrv_fetch_array, my best time increases from 3.7 seconds to 106 seconds, which is similar to your result.
Rob
- Ditandai sebagai Jawaban oleh Jonathan GuerinMicrosoft Employee 09 Agustus 2012 18:57
-
08 Agustus 2012 17:51
Thanks guys! Kevin, does using fetch increase your throughput? I'll report this to our dev team to have a look at.
Cheers,
Jonathan
This posting is provided 'AS IS' with no warranties, and confers no rights.
-
09 Agustus 2012 6:04
Ahhh, good ! I'm not alone anymore ;)
Note that fetch_object is slow as fetch_array. Only fetch is fast but it doesn't help me. I need a fetch_array or fetch_object to have backwards compatibilities with our softwares.
So, what is the next step ? Is the dev team fast to release a fixed version ?
Kevin
-
09 Agustus 2012 9:54
I don't want to speak out of turn, but I doubt anything will be done soon. There's a huge churn of code behind both sqlsrv_fetch_array() and sqlsrv_get_field(), which can only be improved by redesign. The same code is used behind sqlsrv_fetch_object, plus some extra, so that won't help - as you have discovered.
I believe that, with a different approach, it is possible to achieve the same speed as the old driver, Excel and SSMS.
Rob
-
09 Agustus 2012 17:58
Hi guys,
Unfortunately, the dev and test team are swamped right now. I have let them know what is going on, and they will investigate as soon as they can. If you are able, it would greatly help us if you could actually debug the driver and show us the code where you believe we are doing the wrong thing. With a repro, and a pointer to the code that is causing the problems, it would greatly help us in fixing this bug. :)
Thanks,
Jonathan
This posting is provided 'AS IS' with no warranties, and confers no rights.
-
09 Agustus 2012 18:34
Thanks for the reply Jonathan, I will do that over the weekend. It isn't a bug, but you're doing work in the driver that SNAC can do for you, and repeating it for every column in every row. It matters in Kevin's scenario - normally it only costs fractions of milliseconds, but for many rows with many columns, it adds up.
Rob
-
09 Agustus 2012 18:56
Thanks Rob, I greatly appreciate all the help and support you bring to this forum. :)
Cheers,
Jonathan
This posting is provided 'AS IS' with no warranties, and confers no rights.
-
10 Agustus 2012 6:09
Thanks guys for your help, I appreciate.
Is there a way to track SQLSRV updates ? A rss feed ? A mailing list ?
Kevin
-
10 Agustus 2012 17:18
We have the repository: http://sqlsrvphp.codeplex.com/
We also have a blog: http://blogs.msdn.com/b/sqlphp/
You can also follow me on Twitter: @kop48
Cheers,
Jonathan
This posting is provided 'AS IS' with no warranties, and confers no rights.
-
08 Desember 2012 9:02
Hi,
After reading the responses from this forum, i did my own testing using sqlsrv_fetch_array() and it is slow. I'm pretty new to MS SQL.
Currently I'm designing a website that hold 20,000 tests in the background and have been asked to use MS SQL. Whenever I fetch data using the above function it is always slow.
Is there a solution for this issue? Also I have been told using JSON it will be easier. I do not understand what it does and how that would help.
Can you please provide me with some solutions or if not some pointers?
Any help would be really appreciated.
Thanks!
-
10 Desember 2012 7:09
Hi,
You can use PDO+ODBC for a faster way.
Kevin