Google-like summaries with OTS at the backend

Created on 19 January 2016
Updated 29 July 2024

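Been experimenting with the idea of using OTS (Open Text Summarizer, https://packages.debian.org/jessie/libots0) to help build Google-like summaries for body text: the kind of snippet a search result shows, made of fragments of the original text kept in their original order and separated by ellipses. Thought I'd share the results. In the very rough code below, $string1 is the text from the Drupal.org "About" page, and $string2 is what OTS returns as a summary at a 40% ratio. The code repeatedly finds the longest run of text the two strings share, removes it from the summary, and finally puts the collected fragments back in the order they appear in the original. The function get_longest_common_subsequence() is from https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_c...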

The output of the code below is:

Build something amazing, for anyone

Drupal is content management software .... Drupal has great standard features, like easy content authoring, reliable performance, and excellent security .... Its tools help you build the versatile, structured content that dynamic web experiences need .... Modules expand Drupal's functionality .... Distributions are packaged Drupal bundles you can use as starter-kits. Mix and match these components to enhance Drupal's core abilities. Or, integrate Drupal with external services and other applications in your infrastructure. No other content management software is this powerful and scalable .... Drupal will always be free ...

I thought that was actually quite interesting.

$string1 = "The digital experiences you love. The organizations you trust most. The software they depend on.
Build something amazing, for anyone

Drupal is content management software. It's used to make many of the websites and applications you use every day. Drupal has great standard features, like easy content authoring, reliable performance, and excellent security. But what sets it apart is its flexibility; modularity is one of its core principles. Its tools help you build the versatile, structured content that dynamic web experiences need.

It's also a great choice for creating integrated digital frameworks. You can extend it with any one, or many, of thousands of add-ons. Modules expand Drupal's functionality. Themes let you customize your content's presentation. Distributions are packaged Drupal bundles you can use as starter-kits. Mix and match these components to enhance Drupal's core abilities. Or, integrate Drupal with external services and other applications in your infrastructure. No other content management software is this powerful and scalable.

The Drupal project is open source software. Anyone can download, use, work on, and share it with others. It's built on principles like collaboration, globalism, and innovation. It's distributed under the terms of the GNU General Public License (GPL). There are no licensing fees, ever. Drupal will always be free.";

$string2 = "Build something amazing, for anyone

Drupal is content management software. Drupal has great standard features, like easy content authoring, reliable performance, and excellent security. Its tools help you build the versatile, structured content that dynamic web experiences need. Modules expand Drupal's functionality. Distributions are packaged Drupal bundles you can use as starter-kits. Mix and match these components to enhance Drupal's core abilities. Or, integrate Drupal with external services and other applications in your infrastructure. No other content management software is this powerful and scalable. Drupal will always be free.";

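$matches = array();
$ordered = array();

// Each pass takes the longest run of text shared by the full body and the
// OTS summary, strips it out of the summary, and keeps a trimmed copy.
// The strlen() test only acts as a safety cap on the number of passes.
for ($i = 0; $i < strlen($string2); $i++) {
  $match = get_longest_common_subsequence($string1, $string2);
  if (!$match) {
    break;
  }
  $string2 = str_replace($match, '', $string2);
  // Drop leading non-letters and trailing non-alphanumerics (e.g. ". ").
  $match = preg_replace('/^[^A-Za-z]+/', '', $match);
  $match = preg_replace('/[^a-z0-9]+\Z/i', '', $match);
  if ($match) {
    $matches[] = $match;
  }
}

// Put the fragments back in the order they appear in the full text.
foreach ($matches as $item) {
  $ordered[strpos($string1, $item)] = $item;
}
ksort($ordered);

// dpm() is provided by the Devel module; use print outside Drupal.
dpm(implode(' .... ', $ordered) . ' ...');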
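
The snippet above depends on get_longest_common_subsequence(), which isn't reproduced here. Below is a self-contained stand-in, a sketch only and not the Wikibooks code: it assumes the helper has to return the longest common substring (a contiguous run of characters present in both strings), since the loop deletes each match from $string2 with str_replace(). It uses the usual dynamic-programming table, keeping only two rows at a time. build_snippet() is a hypothetical wrapper around the same loop that returns the text instead of passing it to dpm(), and stops once no shared text is left rather than relying on a strlen() cap.

```php
<?php
// Stand-in for the Wikibooks helper (an assumption, not the original code).
// Despite its name it returns the longest common SUBSTRING: the longest
// contiguous run of bytes found in both $a and $b ('' if there is none).
function get_longest_common_subsequence(string $a, string $b): string {
  $m = strlen($a);
  $n = strlen($b);
  if ($m === 0 || $n === 0) {
    return '';
  }
  // $prev[$j] / $curr[$j]: length of the shared run ending at $b[$j - 1]
  // and at the previous / current byte of $a.
  $prev = array_fill(0, $n + 1, 0);
  $best_len = 0;
  $best_end = 0; // Offset just past the best run inside $a.
  for ($i = 1; $i <= $m; $i++) {
    $curr = array_fill(0, $n + 1, 0);
    for ($j = 1; $j <= $n; $j++) {
      if ($a[$i - 1] === $b[$j - 1]) {
        $curr[$j] = $prev[$j - 1] + 1;
        if ($curr[$j] > $best_len) {
          $best_len = $curr[$j];
          $best_end = $i;
        }
      }
    }
    $prev = $curr;
  }
  return substr($a, $best_end - $best_len, $best_len);
}

// Hypothetical wrapper around the loop above: repeatedly pulls the longest
// shared run out of $summary, then orders the pieces by where they occur
// in $full and joins them with ellipses, search-snippet style.
function build_snippet(string $full, string $summary, string $sep = ' .... '): string {
  $ordered = array();
  // Each match is non-empty and occurs in $summary, so $summary shrinks
  // on every pass and the loop is guaranteed to end.
  while (($match = get_longest_common_subsequence($full, $summary)) !== '') {
    $summary = str_replace($match, '', $summary);
    // Trim leading non-letters and trailing non-alphanumerics.
    $match = preg_replace('/^[^A-Za-z]+/', '', $match);
    $match = preg_replace('/[^a-z0-9]+\Z/i', '', $match);
    if ($match !== '') {
      $pos = strpos($full, $match);
      if ($pos !== false) {
        $ordered[$pos] = $match;
      }
    }
  }
  ksort($ordered);
  return $ordered ? implode($sep, $ordered) . ' ...' : '';
}
```

For example, build_snippet('One two three. Four five six. Seven eight nine.', 'One two three. Seven eight nine.') returns 'One two three .... Seven eight nine ...'. Run on $string1 and $string2 it should give much the same result as the output shown earlier, though fragment boundaries could differ slightly if equally long matches are broken differently than in the Wikibooks version. Note that strlen() and substr() count bytes, so multibyte UTF-8 text could in principle be split mid-character.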

✨ Feature request
Status: Closed: outdated
Version: 1.0
Component: Code
Created by: lightsurge (🇬🇧 United Kingdom)
