Problem/Motivation
In this issue I describe the second problem mentioned in https://drupal.org/node/2103199:
2. When attempting to use this module on very large sites, the "steps" variables make it virtually impossible to sync the results from the google_analytics_counter table to the node_counter (statistics module) table.
The reason is that the "steps" variables are increased sequentially by a "chunk" amount each time cron runs, and those same "steps" are later used as an offset into the node table (in effect, a window of node IDs) to limit the range of the query that selects the nodes to be updated in node_counter:
google_analytics_counter_data.inc, line 388:
$pointer = $step * $chunk;
//dpm('START chunk '.$chunk);
//dpm('START step '.$step);
//dpm('START pointer '.$pointer);
$dbresults = db_select('node', 'n')
  //->fields('n', array('nid','language'))
  ->fields('n', array('nid'))
  //->condition('pagepath', 'node/%', 'LIKE')
  ->range($pointer, $chunk)
  ->execute();
If the first 1000 records retrieved from GA correspond to nodes 300000-301000 on our site, it would take 300 cron runs before their information is updated in node_counter, and by then the cached GA data would probably have been erased anyway.
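To make the arithmetic concrete, here is a minimal sketch of how many cron runs pass before the window reaches a given row. It assumes the step counter starts at 0, that the chunk is 1000, and that node 300000 sits at roughly row offset 300000 in the node table (i.e. node IDs are dense). The function name is hypothetical and not part of the module.

```php
<?php
/**
 * Hypothetical helper: number of cron runs that elapse before the window
 * starting at $step * $chunk first covers the row at $target_offset,
 * with $step starting at 0 and increasing by 1 per cron run.
 */
function example_runs_before_offset($target_offset, $chunk) {
  // Each run advances the window by one chunk, so the target row is
  // reached at step floor($target_offset / $chunk).
  return (int) floor($target_offset / $chunk);
}

echo example_runs_before_offset(300000, 1000), "\n"; // prints 300
```

With an hourly cron, 300 runs is about twelve and a half days.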
Proposed resolution
The entire google_analytics_counter_update_node_counter() function should be rewritten so that it no longer depends on the node table and instead detects new entries in the google_analytics_counter table. For example, we could add a "last_updated" timestamp for each path and process only the entries in google_analytics_counter updated since the last cron execution, without querying the node table at all.
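A rough sketch of what that could look like, not a tested patch. It assumes a new "last_updated" column on google_analytics_counter (which does not exist yet), assumes that table exposes "pagepath" and "pageviews" columns, and uses a hypothetical variable name and function names. The second function needs a bootstrapped Drupal 7 site; path handling for multilingual aliases is left out.

```php
<?php
/**
 * Hypothetical helper: extract a node ID from an internal path such as
 * "node/123". Returns NULL for anything else (e.g. "node/123/edit").
 */
function example_gac_nid_from_path($normal_path) {
  if (preg_match('#^node/(\d+)$#', $normal_path, $matches)) {
    return (int) $matches[1];
  }
  return NULL;
}

/**
 * Sketch: sync only the paths whose GA data changed since the last sync.
 * Requires Drupal 7 (db_select, db_merge, variable_get, variable_set).
 */
function example_gac_sync_updated_paths() {
  // Assumed variable name. 0 means "never synced", so the first run
  // processes every row.
  $last_run = variable_get('example_gac_last_sync', 0);
  $started = REQUEST_TIME;

  // "last_updated" is the proposed new column (an assumption). Using >=
  // may reprocess a few rows, which is harmless because the write below
  // simply overwrites the stored count.
  $rows = db_select('google_analytics_counter', 'gac')
    ->fields('gac', array('pagepath', 'pageviews'))
    ->condition('last_updated', $last_run, '>=')
    ->execute();

  foreach ($rows as $row) {
    // Resolve a possible alias to its internal path, then keep node pages only.
    $normal_path = drupal_get_normal_path(ltrim($row->pagepath, '/'));
    $nid = example_gac_nid_from_path($normal_path);
    if ($nid === NULL) {
      continue;
    }
    // Insert or update the node_counter row for this node.
    db_merge('node_counter')
      ->key(array('nid' => $nid))
      ->fields(array(
        'totalcount' => (int) $row->pageviews,
        'timestamp' => REQUEST_TIME,
      ))
      ->execute();
  }

  variable_set('example_gac_last_sync', $started);
}
```

The amount of work per cron run then depends on how many paths GA actually reported on, not on the size of the node table.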