[Bug] docs: Fix search engine ranking of manual pages (SEO) #4579

@neteler

Describe the bug

The GRASS GIS manual pages for the different versions have long been published under a hard-to-understand scheme in which some versions are invisible, some are redirected and some are shown. This also strongly affects their search engine ranking.

Note: issue posted here since the core manual pages are affected (while the cronjobs are maintained in the addon repository).

How Python publishes its man pages

The scope of a pull request in preparation is to partially adopt the concept used for the Python manual pages, which looks like this (checked Oct 23, 2024):

A Google search for "Python documentation" returns as the first hit the "3.13.0 Documentation" with the URL https://docs.python.org/. Clicking on this takes the user to https://docs.python.org/3/, which is identical to https://docs.python.org/3.13/.

This means that the same documentation is served at two URLs:

How can GRASS GIS publish its manual pages?

While the situation in the GRASS GIS project is a bit different, we can mimic the Python approach to some extent.

Current GRASS GIS version overview

Label            Version
legacy           7.8
old              8.3
current stable   8.4
preview          8.5

I have started to implement modifications to the cronjobs locally in order to improve the terrible SEO situation and to make more versions properly visible (long overdue).

For the past few days, the following approach has been deployed on the server for testing purposes (cronjob PR coming soon):

Sitemaps:

  • These have also been updated; this is done by the cronjobs.
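
As an illustration of what a cronjob could emit here, the following is a minimal sketch of a sitemap generator that follows the sitemaps.org 0.9 format. It is not taken from the actual cronjobs: the function name `build_sitemap`, the `example.org` URLs and the choice of which pages to list are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls, lastmod=None):
    """Return a sitemap XML string listing the given page URLs.

    `page_urls` is any iterable of absolute URLs; `lastmod` is an
    optional date string (YYYY-MM-DD) applied to every entry.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in sorted(page_urls):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    pages = [
        "https://example.org/grass-stable/manuals/index.html",
        "https://example.org/grass-stable/manuals/r.info.html",
    ]
    print(build_sitemap(pages, lastmod="2024-10-23"))
```

Listing only the pages that should be indexed (rather than every published version) keeps the sitemap in line with the canonical pointers, but which set of pages that is remains a project decision.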

Observations:

  • SEO: Without an indication of "canonical" URLs, the different versions wipe each other out in search engines. Canonical tags help consolidate duplicate or similar content by specifying the preferred version of a page, ensuring that search engines index and rank the desired URL while avoiding duplicate content issues. All older and "devel" manual pages now point to "stable" as the canonical to avoid duplicate content.
  • Very old versions: The static 6.4 and 6.5 versions have recently been reactivated and the associated Apache redirects on grass.osgeo.org removed to reduce the current SEO problems. So they are accessible again, but will not be indexed by search engines due to the "canonical" pointers (see above). Note that e.g. older scientific publications point to these (now) old GRASS GIS manual pages.
  • Rolling stable manual pages: I don't see any point in publishing the static manual pages of a released stable version (e.g. 8.4.0); instead, the daily build of the stable release branch is used so that manual improvements become available immediately.
  • "Devel" vs "Preview": Historically, we have called the development version "grass-devel", which is called "preview" on the website. This should be streamlined.
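
The canonical pointers described in the SEO observation can be sketched roughly as follows. This is a minimal illustration, not the code used by the cronjobs: the helper name `add_canonical_link` and the `STABLE_BASE` URL are hypothetical, and a simple regular expression stands in for whatever HTML processing the real scripts use.

```python
import re

# Hypothetical base URL of the "stable" manual pages (assumption).
STABLE_BASE = "https://example.org/grass-stable/manuals/"

# Matches an existing canonical link so that pages are not tagged twice.
_CANONICAL_RE = re.compile(
    r"<link[^>]*rel=[\"']canonical[\"']", flags=re.IGNORECASE
)


def add_canonical_link(html, page_name, stable_base=STABLE_BASE):
    """Insert a canonical link pointing at the stable copy of a page.

    The tag is placed right before the first closing </head>. The HTML
    is returned unchanged if it already has a canonical link or has no
    </head> at all.
    """
    if _CANONICAL_RE.search(html):
        return html
    tag = f'<link rel="canonical" href="{stable_base}{page_name}">\n'
    return re.sub(
        r"</head>",
        lambda match: tag + match.group(0),
        html,
        count=1,
        flags=re.IGNORECASE,
    )


if __name__ == "__main__":
    old_page = "<html><head><title>r.info</title></head><body></body></html>"
    print(add_canonical_link(old_page, "r.info.html"))
```

Applied to the pages of an older or "devel" version, every copy of a given page would name the same stable URL as canonical, which is what lets search engines consolidate the duplicates.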

Note that it may take weeks for Google etc. to "learn" the improved structure. At present, I am feeding the Google search tools and Bing Webmaster Tools with the appropriate updates every few days.

Additional context

TODO:

Labels: bug, manual