Log failed migration rows and provide an option to skip failed rows from the UI.

Created on 20 August 2024
Updated 21 August 2024

Some rows in a migration can be problematic.

Here is an example of a failed row in a migration.

    Newspaper3k Failed to get (1) URL https://news.google.com/rss/articles/CBMipAFBVV95cUxNTENNa3EtZ182WEJURjRpZmJXelc3YklsLXFIWEludWlqNVBYZFBUbF81UW0xRDNwVW1WYVJGNDFmV3NTSWlBeXgwcGJCcExRc1EwUHJYZTlTV0x6M2lOQjROa21PYjNKTlhQbnZlNkZKd3FJRG5LYnFaSEZYRFhsc3l2bHZwbk9UZFpqb1BkY3kxeDZnSXd1b0lkMkJoM2NWd1UzWQ?oc=5 "The command "'python3' '../python/google_cloudscraper_new/ArticleScraping.py' 'https://news.google.com/rss/articles/CBMipAFBVV95cUxNTENNa3EtZ182WEJURjRpZmJXelc3YklsLXFIWEludWlqNVBYZFBUbF81UW0xRDNwVW1WYVJGNDFmV3NTSWlBeXgwcGJCcExRc1EwUHJYZTlTV0x6M2lOQjROa21PYjNKTlhQbnZlNkZKd3FJRG5LYnFaSEZYRFhsc3l2bHZwbk9UZFpqb1BkY3kxeDZnSXd1b0lkMkJoM2NWd1UzWQ?oc=5'" failed.

    Exit Code: 1(General error)

    Working directory: /var/www/vhosts/mydomain/httpdocs/web

    Output:
    ================

    Error Output:
    ================
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
        self._validate_conn(conn)
      File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
        conn.connect()
      File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 653, in connect
        sock_and_verified = _ssl_wrap_socket_and_match_hostname(
      File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 806, in _ssl_wrap_socket_and_match_hostname
        ssl_sock = ssl_wrap_socket(
      File "/usr/local/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 465, in ssl_wrap_socket
        ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
      File "/usr/local/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 509, in _ssl_wrap_socket_impl
        return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
      File "/usr/local/lib/python3.9/site-packages/cloudscraper/__init__.py", line 100, in wrap_socket
        return self.ssl_context.orig_wrap_socket(*args, **kwargs)
      File "/usr/local/lib/python3.9/ssl.py", line 500, in wrap_socket
        return self.sslsocket_class._create(
      File "/usr/local/lib/python3.9/ssl.py", line 1040, in _create
        self.do_handshake()
      File "/usr/local/lib/python3.9/ssl.py", line 1309, in do_handshake
        self._sslobj.do_handshake()
    ssl.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1123)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 793, in urlopen
        response = self._make_request(
      File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
        raise new_e
    urllib3.exceptions.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1123)

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
        resp = conn.urlopen(
      File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 847, in urlopen
        retries = retries.increment(
      File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
        raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.londontheatre1.com', port=443): Max retries exceeded with url: /reviews/roots-overseas-and-is-any-body-home-at-streatham-space-project/ (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1123)')))

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/var/www/vhosts/mydomain/httpdocs/web/../python/google_cloudscraper_new/ArticleScraping.py", line 64, in <module>
        scraped = scraper.get(newurl).text
      File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
        return self.request("GET", url, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/cloudscraper/__init__.py", line 259, in request
        self.perform_request(method, url, *args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/cloudscraper/__init__.py", line 192, in perform_request
        return super(CloudScraper, self).request(method, url, *args, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
        resp = self.send(prep, **send_kwargs)
      File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
        r = adapter.send(request, **kwargs)
      File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 517, in send
        raise SSLError(e, request=request)
    requests.exceptions.SSLError: HTTPSConnectionPool(host='www.londontheatre1.com', port=443): Max retries exceeded with url: /reviews/roots-overseas-and-is-any-body-home-at-streatham-space-project/ (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1123)')))
    ". 0

In this example, the row fails to import because of an SSL error (SSL: SSLV3_ALERT_HANDSHAKE_FAILURE) raised when the scraper requests the article from www.londontheatre1.com.

In this case, the item can remain in the feed for a long time (several months). I suspect that a single failing row could cause subsequent rows to fail. Even if subsequent rows are not affected, this row will keep failing on every run.

I would like the ability to skip any failing rows as part of the migration.

In this case, the row does exist in the migration map table. It has the following characteristics:

1. The destination ID is null.
2. The source row status is set to 3 (const STATUS_FAILED = 3;, which indicates that the import of the row failed).
3. There is no record of this row in the message table (migrate_message_rss_google_node).
4. When the row fails, a message is sent to the global notices and can be viewed when dblog is enabled.
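A minimal, self-contained sketch of how rows in this state (failed, no destination ID) could be picked out of the map table. It works on plain arrays shaped like migrate_map_* rows (sourceid1, destid1, source_row_status) instead of querying the database, and the function name is hypothetical. The table name migrate_map_rss_google_node is assumed from core's usual migrate_map_ plus migration ID naming.

```php
<?php

// Mirrors MigrateIdMapInterface::STATUS_FAILED in Drupal core.
const STATUS_FAILED = 3;

/**
 * Hypothetical helper: returns the source IDs of map rows that failed and
 * never received a destination ID.
 *
 * @param array<int, array{sourceid1: string, destid1: ?string, source_row_status: int|string}> $mapRows
 * @return string[]
 */
function findFailedRowsWithoutDestination(array $mapRows): array
{
    $failed = [];
    foreach ($mapRows as $row) {
        // Values read from the database may arrive as strings, so cast first.
        $isFailed = (int) $row['source_row_status'] === STATUS_FAILED;
        if ($isFailed && $row['destid1'] === null) {
            $failed[] = $row['sourceid1'];
        }
    }
    return $failed;
}

// Sample rows shaped like a migrate_map_* table (values are made up).
$mapRows = [
    ['sourceid1' => 'https://example.com/a', 'destid1' => '101', 'source_row_status' => 0],
    ['sourceid1' => 'https://example.com/b', 'destid1' => null, 'source_row_status' => 3],
];

print_r(findFailedRowsWithoutDestination($mapRows));
```

On a real site the same filter corresponds to source_row_status = 3 AND destid1 IS NULL on the map table, which could be expressed as a select query with condition() and isNull() through Drupal's database API (untested here).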

I am curious as to why this row does not exist in the migrate message table with some form of status, especially as it is in the migrate_map table. As mentioned, I am aware that the failure is output to the Drupal log if dblog is enabled. Perhaps it would make sense to log failed rows in the message table as well?
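A minimal in-memory sketch of the behaviour proposed here: when a row fails, it is recorded in the map with STATUS_FAILED and, in the same step, an entry with level MESSAGE_ERROR is written to the messages, keyed by the same source-ID hash. The class and method names are hypothetical stand-ins that only loosely echo the saveIdMapping() and saveMessage() methods on core's MigrateIdMapInterface; they do not reproduce those signatures. The hashing scheme is an assumption for illustration.

```php
<?php

const STATUS_FAILED = 3; // MigrateIdMapInterface::STATUS_FAILED
const MESSAGE_ERROR = 1; // MigrationInterface::MESSAGE_ERROR

/**
 * Hypothetical in-memory stand-in for a migration's map and message tables.
 */
final class InMemoryIdMap
{
    /** @var array<string, array{source_ids: array, destid1: ?string, source_row_status: int}> */
    public array $map = [];

    /** @var array<int, array{source_ids_hash: string, level: int, message: string}> */
    public array $messages = [];

    /** Builds a stable key from the source IDs (assumed scheme). */
    public function hash(array $sourceIds): string
    {
        return hash('sha256', serialize(array_map('strval', $sourceIds)));
    }

    /** Records a failed row in the map AND writes an error message for it. */
    public function recordFailure(array $sourceIds, string $message): void
    {
        $hash = $this->hash($sourceIds);
        $this->map[$hash] = [
            'source_ids' => $sourceIds,
            'destid1' => null,
            'source_row_status' => STATUS_FAILED,
        ];
        $this->messages[] = [
            'source_ids_hash' => $hash,
            'level' => MESSAGE_ERROR,
            'message' => $message,
        ];
    }
}

$idMap = new InMemoryIdMap();
$idMap->recordFailure(['https://example.com/b'], 'SSLV3_ALERT_HANDSHAKE_FAILURE');
```

Keeping both writes in one method means a row can no longer be marked as failed without a matching message, which is exactly the gap described above.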

Feature request
Status: Active
Version: 6.0
Component: Code
Created by: 🇬🇧 United Kingdom 2dareis2do


Comments & Activities

  • Issue created by @2dareis2do
  • 🇬🇧United Kingdom 2dareis2do

    OK, I can see that the row status in the migrate map table comes from the core Migrate module, e.g.:

    interface MigrateIdMapInterface extends \Iterator, PluginInspectionInterface {
    
      /**
       * Indicates that the import of the row was successful.
       */
      const STATUS_IMPORTED = 0;
    
      /**
       * Indicates that the row needs to be updated.
       */
      const STATUS_NEEDS_UPDATE = 1;
    
      /**
       * Indicates that the import of the row was ignored.
       */
      const STATUS_IGNORED = 2;
    
      /**
       * Indicates that the import of the row failed.
       */
      const STATUS_FAILED = 3;

      // ... (remaining constants and methods omitted)
    }

    I am thinking it might be good to add another status, e.g. STATUS_IGNORE.
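A small sketch of how the proposed status could sit alongside the existing ones, for example to show human-readable labels in a view of the map table. STATUS_IGNORE and its value 4 are hypothetical; nothing in core defines them. Note that core already has STATUS_IGNORED (2). As far as I know, core writes it when a row is skipped with MigrateSkipRowException and the exception asks for the row to be saved to the map, so the new status would need a clearly different meaning, such as "skipped on purpose by an administrator".

```php
<?php

// Existing values from core's MigrateIdMapInterface.
const STATUS_IMPORTED = 0;
const STATUS_NEEDS_UPDATE = 1;
const STATUS_IGNORED = 2;
const STATUS_FAILED = 3;
// Hypothetical status proposed in this issue; the value 4 is an assumption.
const STATUS_IGNORE = 4;

/** Maps a source_row_status value to a label for display in a UI. */
function statusLabel(int $status): string
{
    $labels = [
        STATUS_IMPORTED => 'Imported',
        STATUS_NEEDS_UPDATE => 'Needs update',
        STATUS_IGNORED => 'Ignored',
        STATUS_FAILED => 'Failed',
        STATUS_IGNORE => 'Skipped by administrator',
    ];
    return $labels[$status] ?? 'Unknown (' . $status . ')';
}

echo statusLabel(STATUS_FAILED), PHP_EOL;
```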

    Also, the level column is in the migrate_message* table rather than the migrate_map* table (the map table records source_row_status instead). Its possible values are the message constants defined in MigrationInterface:

      /**
       * Migration error.
       */
      const MESSAGE_ERROR = 1;
    
      /**
       * Migration warning.
       */
      const MESSAGE_WARNING = 2;
    
      /**
       * Migration notice.
       */
      const MESSAGE_NOTICE = 3;
    
      /**
       * Migration info.
       */
      const MESSAGE_INFORMATIONAL = 4;

    I would expect a failed row to be recorded in the message table with the MESSAGE_ERROR (1) level.

  • 🇬🇧United Kingdom 2dareis2do

    Thinking about this further, it seems that a failed row with no destination ID is not written to the message table.

    My guess is that a migrate update will not attempt to re-import failed rows. Perhaps it makes sense to keep it that way.

    However, a failed import is recorded in the migrate_map table.

    My guess is that nothing prevents this row from failing again on subsequent imports (I need to check how this works).
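To help check the guess above, here is a simplified sketch of the decision that, as I understand it, core's SourcePluginBase::next() makes for each source row. A row is processed if it has no map entry, if its status is STATUS_NEEDS_UPDATE, or, when track_changes is enabled, if its hash has changed. High-water handling is left out. If this reading is right, a row already recorded as STATUS_FAILED is not retried on a plain import. However, as far as I can tell, an import run with the update option first marks existing map rows as needing an update, so a failed row would then be retried (and fail again) on every such run. The function and parameter names are hypothetical.

```php
<?php

const STATUS_NEEDS_UPDATE = 1;
const STATUS_FAILED = 3;

/**
 * Simplified, hypothetical model of whether a source row gets processed.
 *
 * @param array|null $mapRow       Existing map row, or null if none exists.
 * @param bool       $trackChanges Whether the source uses track_changes.
 * @param string     $currentHash  Hash of the row's current source data.
 */
function needsProcessing(?array $mapRow, bool $trackChanges, string $currentHash): bool
{
    if ($mapRow === null) {
        return true; // Never seen before.
    }
    if ((int) $mapRow['source_row_status'] === STATUS_NEEDS_UPDATE) {
        return true; // Explicitly flagged, e.g. by an update run.
    }
    // With track_changes, a changed hash also triggers reprocessing.
    return $trackChanges && $mapRow['hash'] !== $currentHash;
}

$failedRow = ['source_row_status' => STATUS_FAILED, 'hash' => 'abc'];
var_dump(needsProcessing($failedRow, false, 'abc'));
```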

    As the migrate message table has no record of rows without a destination ID, it is not possible to manage them from the existing view unless we also record failed rows there (possible, but possibly with side effects when updating; I need to check how this actually works).

    So one option could be to add another view that lists the migrate map status of every row in the migration. From that view, a row's status could be changed in the UI to a new status that flags it to be ignored on subsequent imports.
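A sketch of the action such a view could perform when an administrator flags a row: look the row up by source ID and change its source_row_status to the new status, accepting only rows currently marked STATUS_FAILED. STATUS_IGNORE = 4, the function name, and the array-backed map are all hypothetical; a real implementation would presumably update the migrate_map_* table through Drupal's database API.

```php
<?php

const STATUS_IMPORTED = 0;
const STATUS_FAILED = 3;
const STATUS_IGNORE = 4; // Hypothetical status proposed in this issue.

/**
 * Hypothetical UI action: flags a failed row so later imports leave it alone.
 *
 * @param array<string, array{destid1: ?string, source_row_status: int}> $map
 *   Map rows keyed by source ID, passed by reference.
 * @return bool TRUE if the row was flagged.
 */
function flagRowAsIgnored(array &$map, string $sourceId): bool
{
    if (!isset($map[$sourceId])) {
        return false; // No map entry for this source ID.
    }
    if ((int) $map[$sourceId]['source_row_status'] !== STATUS_FAILED) {
        return false; // Only failed rows are offered for skipping here.
    }
    $map[$sourceId]['source_row_status'] = STATUS_IGNORE;
    return true;
}

$map = [
    'https://example.com/a' => ['destid1' => '101', 'source_row_status' => STATUS_IMPORTED],
    'https://example.com/b' => ['destid1' => null, 'source_row_status' => STATUS_FAILED],
];
flagRowAsIgnored($map, 'https://example.com/b');
```

Restricting the action to failed rows keeps successfully imported content from being hidden from updates by accident; whether that restriction is desirable is a design choice for the real feature.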

    We may also need to update the code to skip any rows with this status.
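A sketch of where the extra check might be needed, assuming a hypothetical STATUS_IGNORE = 4. Two places would probably have to respect it: the per-row processing check, and the update preparation step. As far as I can tell, core's Sql::prepareUpdate() resets every map row to STATUS_NEEDS_UPDATE, which would silently undo the flag. Both functions below are hypothetical simplifications over arrays, not core code.

```php
<?php

const STATUS_NEEDS_UPDATE = 1;
const STATUS_FAILED = 3;
const STATUS_IGNORE = 4; // Hypothetical status proposed in this issue.

/**
 * Hypothetical "prepare update": marks rows for re-import but leaves rows
 * an administrator chose to skip untouched.
 *
 * @param array<string, array{source_row_status: int}> $map
 */
function prepareUpdateKeepingSkipped(array &$map): void
{
    foreach ($map as &$row) {
        if ((int) $row['source_row_status'] !== STATUS_IGNORE) {
            $row['source_row_status'] = STATUS_NEEDS_UPDATE;
        }
    }
    unset($row); // Break the reference left by the foreach.
}

/**
 * Returns the source IDs that an import run would process.
 *
 * @param string[] $sourceIds IDs currently present in the feed.
 */
function rowsToProcess(array $sourceIds, array $map): array
{
    $result = [];
    foreach ($sourceIds as $id) {
        $status = isset($map[$id]) ? (int) $map[$id]['source_row_status'] : null;
        if ($status === STATUS_IGNORE) {
            continue; // Flagged from the UI: never retried.
        }
        if ($status === null || $status === STATUS_NEEDS_UPDATE) {
            $result[] = $id; // New row, or marked for update.
        }
    }
    return $result;
}

$map = [
    'a' => ['source_row_status' => STATUS_FAILED],
    'b' => ['source_row_status' => STATUS_IGNORE],
];
prepareUpdateKeepingSkipped($map);
$toProcess = rowsToProcess(['a', 'b', 'c'], $map);
```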

    Some of this code is in core, while the rest is in contrib (Migrate Tools). This slightly complicates how the change could be made. My feeling is that it would make sense to add the additional status in core, as that is where the other status constants are defined.

  • 🇬🇧United Kingdom 2dareis2do

    Part of the issue I am experiencing may be related to my use of $this->logger->notice() to handle exceptions instead of throwing a MigrateException. I will try switching from a regular notice to a MigrateException to see how this affects the migration.

    I think this makes more sense, especially where an entry exists in the migrate map table.

    This may also improve the reliability of the migration as well.
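A minimal sketch of the switch described above, written without Drupal so it runs on its own. The fetch is wrapped, and any failure is rethrown as an exception that carries an error level and a failed status, instead of being logged as a notice. MigrateExceptionStandIn and fetchArticleOrFail() are hypothetical. In a real process plugin one would throw Drupal\migrate\MigrateException, whose constructor, as far as I know, also accepts a level (defaulting to MESSAGE_ERROR) and a status (defaulting to STATUS_FAILED) that MigrateExecutable then uses for the map row and the saved message.

```php
<?php

const MESSAGE_ERROR = 1; // MigrationInterface::MESSAGE_ERROR
const STATUS_FAILED = 3; // MigrateIdMapInterface::STATUS_FAILED

/** Hypothetical stand-in for Drupal\migrate\MigrateException. */
final class MigrateExceptionStandIn extends RuntimeException
{
    private int $level;
    private int $status;

    public function __construct(string $message, int $level = MESSAGE_ERROR, int $status = STATUS_FAILED, ?Throwable $previous = null)
    {
        parent::__construct($message, 0, $previous);
        $this->level = $level;
        $this->status = $status;
    }

    public function getLevel(): int { return $this->level; }

    public function getStatus(): int { return $this->status; }
}

/**
 * Hypothetical wrapper: runs the fetcher and turns any failure into an
 * exception, rather than logging a notice and carrying on.
 *
 * @param callable(string): string $fetch
 */
function fetchArticleOrFail(callable $fetch, string $url): string
{
    try {
        return $fetch($url);
    } catch (Throwable $e) {
        throw new MigrateExceptionStandIn(
            sprintf('Failed to fetch %s: %s', $url, $e->getMessage()),
            MESSAGE_ERROR,
            STATUS_FAILED,
            $e
        );
    }
}

// Simulates the scraper failing with the SSL error from the issue summary.
$failingFetch = function (string $url): string {
    throw new RuntimeException('SSLV3_ALERT_HANDSHAKE_FAILURE');
};

try {
    fetchArticleOrFail($failingFetch, 'https://example.com/b');
} catch (MigrateExceptionStandIn $e) {
    echo $e->getMessage(), ' (level ', $e->getLevel(), ')', PHP_EOL;
}
```

Because the exception carries the level and status with it, the code that catches it can write both the map row and the message, which is the combination missing in the case described in the issue summary.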
