- 🇳🇱Netherlands theemstra
There are some limitations to box/sprout that would remove some functionality from the module that some people might be using, including "no automatic column widths, no column number/string formatting)" as per the first link you provided.
We could provide the option to choose one of the engines, with the tradeoffs of each library. - 🇨🇦Canada kiwad
Not sure box/spout is still maintained, project is archived on github
Maybe fastexcelwriter ?
https://packagist.org/packages/avadim/fast-excel-writer - heddn Nicaragua
phpoffice/phpspreadsheet
now has a 2.x release. Is it any faster? - Status changed to Closed: outdated
5 months ago 12:23pm 4 September 2024 - 🇳🇿New Zealand ericgsmith
Re opening for clarification - is there a reason why this is closed as outdated vs closed as won't fix?
I've done a quick test locally and it looks like v2 phpspreadsheet is still considerably slower than the fast-excel-writer mentioned in #5
I did a quick test with about 6000 rows of data and phpspreadsheet is taking 17 seconds to write the file, where as fast-excel-writer takes less than a second. In my test, the file produced from fast-excel-writer is double the size of the one produced by phpspreadsheet - so I assume there is some additional optimizing by phpspreadsheet that contributes to the additional time. Still - I think that result warrants further clarification.
My quick test results for anybody interested - I used our existing csv export which is around 6000 lines.
test-speed.php:
<?php include 'vendor/autoload.php'; $csvFile = fopen('examplefile.csv', 'r'); $spreadsheet = new PhpOffice\PhpSpreadsheet\Spreadsheet(); $spreadsheet->setActiveSheetIndex(0); $sheet = $spreadsheet->getActiveSheet(); $rowIndex = 1; while (($row = fgetcsv($csvFile)) !== FALSE) { $columnIndex = 1; foreach ($row as $value) { $sheet->setCellValue([$columnIndex, $rowIndex], $value); $columnIndex++; } $rowIndex++; } fclose($csvFile); $writer = PhpOffice\PhpSpreadsheet\IOFactory::createWriter($spreadsheet, 'Xlsx'); $writer->save('test-speed.xlsx');
Run twice:
time php scripts/test-speed.php real 0m17.630s user 0m17.584s sys 0m0.044s time php scripts/test-speed.php real 0m17.392s user 0m17.332s sys 0m0.056s
Same data with the alternative library:
<?php include 'vendor/autoload.php'; $csvFile = fopen('examplefile.csv', 'r'); $spreadsheet = avadim\FastExcelWriter\Excel::create(['Sheet1'], ['auto_convert_number' => true]); $sheet = $spreadsheet->sheet(); while (($row = fgetcsv($csvFile)) !== FALSE) { $sheet->writeRow($row); } fclose($csvFile); $spreadsheet->save('test-speed2.xlsx');
Huge improvement:
time php scripts/test-speed2.php real 0m0.893s user 0m0.863s sys 0m0.028s time php scripts/test-speed2.php real 0m0.920s user 0m0.891s sys 0m0.028s
- 🇳🇿New Zealand ericgsmith
One thing I missed is there is a popular and maintained fork of the originally mentioned Box/spout - https://github.com/openspout/openspout
While both are maintained openspout is more widely used.
I've tested this as well and its also looking good:
<?php include 'vendor/autoload.php'; $csvFile = fopen('examplefile.csv', 'r'); $writer = new \OpenSpout\Writer\XLSX\Writer(); $writer->openToFile('test-speed3.xlsx'); while (($row = fgetcsv($csvFile)) !== FALSE) { $cells = \OpenSpout\Common\Entity\Row::fromValues($row); $writer->addRow($cells); } fclose($csvFile); $writer->close(); print memory_get_peak_usage();
And the output:
time php scripts/test-speed3.php 1269864 real 0m1.283s user 0m1.233s sys 0m0.036s
What I have not done is a feature comparison of some of the views style plugin options.
Both openspout and fast-excel-writer treat reading and writing as separate things - where as with php spreadsheet you can open and write to the same spreadsheet. So if things like batch processing are concerned either option would need to do something similar to what views_data_export is doing here - https://git.drupalcode.org/project/views_data_export/-/blob/8.x-1.x/src/...
Open spout has docs for this - https://github.com/openspout/openspout/blob/4.x/docs/guides/1-add-data-e... where as for the fast read / write libraries it wasn't so clear from a first look how to ensure all the right info gets copied over.
I'm not sure how relevant that is - for us we are using this module to provide the serializer and views_data_export for the views plugin as batch processing was needed - I haven't looked into how / if batches are handed with this modules views plugins.
- 🇳🇿New Zealand ericgsmith
Ok, final bit of research.
I updated my test scripts to load the existing file and then loop the CSV again to add the rows again.
PHP spreadsheet:
time php scripts/test-speed.php 147031000 real 1m5.979s user 1m5.735s sys 0m0.188s
Openspout
time php scripts/test-speed3.php 1271096 real 0m7.810s user 0m7.716s sys 0m0.081s
This definitely enough for me to take this back to the client to try get more hours to work on this. My current thinking would be as a sub module that can override the
xls_serialization.encoder.xlsx
service - with the caveat that the full functionality may not be available with the open spout library. My use case does not currently require these features as we use the serializer with views data export. - Merge request !20Draft: #3276294 Work in progress - support openspout/openspout library → (Closed) created by ericgsmith
- 🇳🇿New Zealand ericgsmith
I have added a draft MR with a proof of concept for this libary.
If using with views_data_export I have added a related issue and MR to 📌 [PP1] Considering switching to another library to batch XLSX files because of performance on large files Postponed
The approaches are a bit rough - the main aim here is for me to verify that this library is going to be performant and applicable for use on an existing project where phpspreadsheet is problematic.
Early feedback always welcome for any interested parties - not setting for review yet as I need to do some more testing, and there are todo's related to the investigating functionality gaps between the 2 libraries - its possible some of these options are supported I just haven't investigated yet.
- 🇨🇦Canada kiwad
Early feedback :
Tested an XLSX export that wasn't working in phpspreadsheet due to lack of memory
1. Applied MR https://git.drupalcode.org/project/xls_serialization/-/merge_requests/20...
2. composer require openspout/openspout
3. Switched configThe export now works
- 🇳🇿New Zealand ericgsmith
Thanks for taking the time to test @kiwad - I appreciate it.
I've also been doing some additional testing and am keen to tidy up this MR into something that could make it into this module.
I'm not so keen on the current implementation of extending the encoder and replacing it with the factory - while that worked there only some commonality between the 2 and I feel it doesn't make it easy to track feature parity. Long term maintenance wide it also means the extending class is fragile and subject to breaking if the xls class changes / introduces new features.
One thought I had was refactoring the implementation to use an adapter pattern - e.g if we had:
interface SpreadsheetAdapterInterface { public function addRow(array $row); public function setColumnsAutoSize(); public function setRowsAutoHeight(); public function setMetaData(array $metadata); }
We could do something like:
try { $adapter = \Drupal::service('xls_serialization.adapter_factory')->get($this->xlsFormat); // Set headers. $this->setHeaders($adapter, $data, $context); // Set the data. $this->setData($adapter, $data); // Set the width of every column with data in it to AutoSize. if (empty($config->get('xls_serialization_autosize'))) { $adapter->setColumnsAutoSize(); } // Set rows to auto-height. $adapter->setRowsAutoHeight();
etc in the Xls class and split out the phpspreadsheet logic and openspreadsheet logic to their own files with using a shared adapter interface.
One of the challenging this for this view config options for applying styling, while possible to add styles in openspout it needs to be done at the time the row is added to the spreadsheet - not after as is currently done for the phpspreadsheet implementation.This would leave things like formatValue, validateWorksheetTitle, extractHeaders in the XLS class and library specific calls in the adapters.
This would be a bigger refactoring of the module - so keen to get feedback on this.
- 🇳🇿New Zealand ericgsmith
I have pushed a work in progress on the idea I had in #15 to try split the library specific code out - its a bit rough and not tested yet, end of day here so will pick back up next week. Assigning to me since I did not finish.
- 🇳🇿New Zealand ericgsmith
The encode unit test is passing with open spout - but I'm having more doubts around how best to structure this - I'm reversing from my previous thought that the adapter pattern is cleanest, as there is so little commonality in terms of supported features and the way the libraries implement those features (e.g we could support styling in openspout but we would have to set the styles at the time we write the row rather than at the end).
I also don't know if its worth trying to chase all the features - if people need the full features they probably doing fine with the old library, maybe reverting back to a completely separate service would be easier since the openspout code will be super minimal.
I think that makes sense. Then the old service and library could be deprecated, right?
- Merge request !21Issue #3276294: Add xls_serialization_open_spout sub module → (Closed) created by ericgsmith
- 🇳🇿New Zealand ericgsmith
Closed initial MR attempt - opened a new MR with a much more simpler approach to just override the service in a sub module.
I think doing less initially is better to get this committed - this will give users a clear option to go for performance instead of additional features.
Some features the phpspreadsheet option provides are technically possible with openspout, but I think it would make sense to do these as follow ups if people need them. In my case, we don't use any of them so have no issue going for the simpler library. Not all of them are supported and so I don't think chasing feature parity with the slower library is worth the effort.
I also added a hook requirements check - I think its fair that this module only recommends openspout and does not add it as a full dependency.
Ready for a proper review now.
- 🇳🇿New Zealand ericgsmith
MR is now testing on Drupal 11 by default - which this module does not have a release for - so ran manual pipeline against Drupal 10 - https://git.drupalcode.org/issue/xls_serialization-3276294/-/pipelines/3...
- 🇳🇿New Zealand RoSk0 Wellington
Looks really good! Thanks Eric!
There are couple code style issues that could be fixed to make it perfect.
- Status changed to RTBC
3 months ago 9:04am 5 November 2024 - 🇳🇿New Zealand ericgsmith
Doesn't rebase cleanly against latest changes in 8.x-1.x branch - will take a look soon
- Merge request !33Issue #3276294 by ericgsmith: Add submodule for using OpenSpout → (Merged) created by ericgsmith
- 🇳🇿New Zealand ericgsmith
Rebased against 2.0.x - minor changes made to constructor to get the commit to apply cleanly.
Hiding old MR and branch against 8.x-1.x
I note there is not 2.x or 2.1.x branch - as this is a new feature it would need to go into a 2.1.x release at this point no branch existis for this.
- 🇫🇷France mably
Hi @ericgsmith, I just created the 2.1.x branch, you can update your MR.
- Status changed to Needs review
10 days ago 9:52pm 12 January 2025 - 🇳🇿New Zealand ericgsmith
Thanks @mably - I've rebased the MR and updated target to 2.1.x
-
mably →
committed 0ab1a234 on 2.1.x authored by
ericgsmith →
Issue #3276294 by ericgsmith: Add submodule for using OpenSpout
-
mably →
committed 0ab1a234 on 2.1.x authored by
ericgsmith →