- πΊπΈUnited States mkindred
I was hoping to find a way to override the host for docker containers, so I tried the patches here. The latest #22 no longer applies cleanly, so I reworked it so that it does.
But I'm a little concerned about a few things with the patch.
- Seems like a patch should solve one problem, but this issue is a combination of several unrelated problems:
- environment host override
- check button
- exclusions (not in the current patch)
- and content published status
- I thought the logic used for the overridehosturl was unclear: use overridehosturl only if (host is set and host== basepath)... why not just override if overrideurl is set? (I might be missing something here that's not applicable in my environment.)
But I've always been a bit confused by linkchecker's config: seems like base 'path' includes hostname, and that has always perplexed me. I think the config descriptions could use some updating.
So #23 works for me locally, but I don't think I'll apply it for production. Perhaps the maintainers can chime in... the hostname override is a very good thing.
- Seems like a patch should solve one problem, but this issue is a combination of several unrelated problems:
- π¨π¦Canada jamesyao
Since the client requested "searching publishing contents only" functionality, the linkchecker_index and linkcker_link tables need to be refreshed when the Cron job is executed.
I updated the logic to clean all records in the two tables if the searching publishing contents only is requested. - π©πͺGermany gngn
I think the handling of 'scanning published contents' in LinkExtractorBatch::processEntities() starting in #17 assumes that the scanned entity is a node - but it could be any entity type (e.g. paragraph).
I tried to illustrate the problem with "COMMENT":
$entityTypes = $this->getEntityTypesToProcess(); [...] // COMMENT: naming this $publishedNodesOnly - setting name use "content" $publishedNodesOnly = $this->linkcheckerSetting->get('search_published_contents_only'); // COMMENT:looping through different entity types foreach ($entityTypes as $entityTypeData) { [...] // COMMENT: storage for current entity type $storage = $this->entityTypeManager->getStorage($entityType->id()); foreach ($ids as $id) { // COMMENT: $id is ID of current entity type (and we're loading it) $entity = $storage->load($id); if ($entity instanceof FieldableEntityInterface) { // COMMENT: $entity is a FieldableEntity if ($publishedNodesOnly) { // COMMENT: $entity->id() is the same as $id (since we loaded $entity with $id) ... $nid = $entity->id(); // COMMENT: ... but now we call it node-ID - and try to load this node! // That is only correct if current entity type really is node - but it could be anything. $node = $this->entityTypeManager->getStorage('node')->load($nid); [...] } else { // Handle $entity like before this patch [..] } } [..] } [..] }
Or am I getting something wrong?
- Assigned to joseph.olstad
- π¨π¦Canada joseph.olstad
I'm going to be doing a fairly serious cycle for a refresher/review on this in the coming days/weeks.
- last update
9 months ago 86 pass - Status changed to Active
9 months ago 4:21pm 8 March 2024 - π¨π¦Canada joseph.olstad
patch 27 didn't work out, working on a new one.
- π¨π¦Canada joseph.olstad
New patch, reduced the scope to docker and reverse proxy support, removed the exclusion of published content as there's another patch for this that is available in another issue:
β¨ Improve performance of LinkExtractorBatch & allow to skip unpublished content Needs review - Status changed to Needs review
9 months ago 4:49pm 8 March 2024 - last update
9 months ago 86 pass - Issue was unassigned.
- π¨π¦Canada joelpittet Vancouver
Can the scope just be "Provide 'Check links' button for running link checker"?
You could solve the other part with pathologic β or something like that?
In the latest patch @joseph.olstad there is a couple lines doing
isset()
followed by!empty()
but!empty()
does an impliedisset()
. Further the variable exists in all cases so you can just doif (!$var)
So
// for non-standard environment (docker & ubuntu) on the security issue if ((isset($uri['host']) && $uri['host'] == $basepath) && (isset($overridehosturl) && !empty($overridehosturl))) { if (isset($overridehostport) && !empty($overridehostport)) { $replacement = $overridehosturl . ':' . $overridehostport; } else { $replacement = $overridehosturl; } $host_path = $uri['scheme'] . '://' . $uri['host']; $url = str_replace($host_path, $replacement, $url); }
becomes
// For non-standard environment (docker & ubuntu) on the security issue. if ((isset($uri['host']) && $uri['host'] == $basepath) && $overridehosturl) { $replacement = $overridehostport ? $overridehosturl . ':' . $overridehostport : $overridehosturl; $host_path = $uri['scheme'] . '://' . $uri['host']; $url = str_replace($host_path, $replacement, $url); }
Those variable names should be snake case too...
$override_host_port
My preference would be to split the task up, easier for one to get accepted before the other.
- Status changed to Needs work
7 months ago 11:39pm 18 April 2024 - π¨π¦Canada joseph.olstad
I would suggest to remove the "provide check button" as it's already provided in another issue/patch.
This issue should focus on "Optional host and port substitution to check links behind a reverse proxy and/or docker environment"
I'd prefer to keep this optional functionality inside of linkchecker not requiring some third party complex module like pathologic which I doubt will even work in this case.