Describe the bug
The GRASS GIS manual pages of the different versions have long been published under a hard-to-understand scheme in which some versions are invisible, some are redirected, and some are shown. This also strongly hurts the search engine ranking.
Note: issue posted here since the core manual pages are affected (while the cronjobs are maintained in the addon repository).
How Python publishes its man pages
The scope of a pull request in preparation is to partially adopt the Python manual pages concept, which looks like this (checked Oct 23, 2024):
- Python 3.14 (in development): https://docs.python.org/3.14/ -> `<link rel="canonical" href="https://docs.python.org/3/index.html" />`
- Python 3.13 (stable): https://docs.python.org/3.13/ -> `<link rel="canonical" href="https://docs.python.org/3/index.html" />`
- Python 3.12 (stable): https://docs.python.org/3.12/ -> `<link rel="canonical" href="https://docs.python.org/3/index.html" />`
- Python 3.11 (security-fixes): https://docs.python.org/3.11/ -> `<link rel="canonical" href="https://docs.python.org/3/index.html" />`
- Python 3.10 (security-fixes): https://docs.python.org/3.10/ -> `<link rel="canonical" href="https://docs.python.org/3/index.html" />`
- Python 3.9 (security-fixes): https://docs.python.org/3.9/ -> `<link rel="canonical" href="https://docs.python.org/3/index.html" />`
- Python 3.8 (EOL): https://docs.python.org/3.8/ -> `<link rel="canonical" href="https://docs.python.org/3/index.html" />`
- ...
A Google search for "Python documentation" returns as the first hit the "3.13.0 Documentation" with the URL https://docs.python.org/. Clicking on this takes the user to https://docs.python.org/3/, which is identical to https://docs.python.org/3.13/.
This means that the same documentation is served at two URLs:
- https://docs.python.org/3/ - the canonical URL
- https://docs.python.org/3.13/ - the current stable one
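The canonical pointers listed above can be checked mechanically. The following is a minimal sketch using only the Python standard library; the names `CanonicalFinder` and `find_canonical` and the sample HTML string are illustrative assumptions, not part of any existing tooling.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical sample page; a real check would read the served HTML instead.
sample = (
    "<html><head><title>sample page</title>"
    '<link rel="canonical" href="https://docs.python.org/3/index.html" />'
    "</head><body></body></html>"
)
print(find_canonical(sample))
```

Running the same function over each versioned index page would show at a glance whether all versions agree on one canonical URL.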
How can GRASS GIS publish its manual pages?
While the situation in the GRASS GIS project is a bit different, we can mimic the Python approach to some extent.
Current GRASS GIS version overview
| Label | Version |
|---|---|
| legacy | 7.8 |
| old | 8.3 |
| current stable | 8.4 |
| preview | 8.5 |
I have started to locally implement modifications in the cronjobs to improve the terrible SEO situation and to make more versions properly visible (long overdue).
For a few days now, the following approach has been deployed on the server for testing purposes (cronjob PR coming soon):
- https://grass.osgeo.org/grass-devel/manuals/ - now copied in modified cronjob from 8.5.x (i.e., `main` branch) with `<link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">`
- https://grass.osgeo.org/grass-stable/manuals/ - now copied in modified cronjob from 8.4.x (i.e., `releasebranch_8_4` branch) - this is the overall main manual
- https://grass.osgeo.org/grass85/manuals/ - current unreleased development (generated by cronjob, then copied to grass-devel) - with `<link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">`
- https://grass.osgeo.org/grass84/manuals/ - current stable release branch (generated by cronjob, then copied to grass-stable) - with `<link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">`
- https://grass.osgeo.org/grass78/manuals/ - with `<link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">` and red box pointing to grass-stable (generated by cronjob with box URL and canonical version defined)
- https://grass.osgeo.org/grass65/manuals/ - with `<link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">` and red box pointing to grass-stable (old static pages)
- https://grass.osgeo.org/grass64/manuals/ - with `<link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">` and red box pointing to grass-stable (old static pages)
- ... likewise other old manual versions.
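How a canonical pointer might be added to a generated or static manual page can be sketched as follows. This is not the actual cronjob code: the helper names `canonical_for` and `add_canonical` are hypothetical, and mapping non-index pages to the same file name under grass-stable is an assumption that the list above does not state (it only shows the index page).

```python
import re

# Base URL of the rolling stable manual, as used in the list above.
STABLE_MANUALS = "https://grass.osgeo.org/grass-stable/manuals/"

HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)
HAS_CANONICAL = re.compile(
    r"<link\b[^>]*\brel\s*=\s*[\"']canonical[\"'][^>]*>", re.IGNORECASE
)


def canonical_for(page_name="index.html"):
    """Hypothetical helper: URL of the same page in grass-stable (assumption)."""
    return STABLE_MANUALS + page_name


def add_canonical(html_text, canonical_url):
    """Insert a canonical link before </head> unless one is already present."""
    if HAS_CANONICAL.search(html_text):
        return html_text  # idempotent: do not add a second pointer
    tag = f'<link rel="canonical" href="{canonical_url}">\n'
    # A function replacement avoids backslash/group interpretation of the URL.
    return HEAD_CLOSE.sub(lambda m: tag + m.group(0), html_text, count=1)


# Hypothetical minimal page standing in for an old static manual page.
page = "<html><head><title>GRASS GIS manual</title></head><body></body></html>"
print(add_canonical(page, canonical_for()))
```

Making the function idempotent matters for a cronjob that runs daily over the same files: repeated runs leave an already-tagged page unchanged.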
Sitemaps:
- These have also been updated; the update is done by the cronjobs.
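For illustration, a sitemap restricted to the unversioned manual URLs could be generated roughly like this. This is a sketch under assumptions, not the cronjob implementation: the function name `build_sitemap` and the choice of the two URLs are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as str) with one <url><loc> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the declaration by hand so it does not depend on the locale.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Assumed URL selection: only the unversioned stable and devel manuals.
manual_urls = [
    "https://grass.osgeo.org/grass-stable/manuals/",
    "https://grass.osgeo.org/grass-devel/manuals/",
]
print(build_sitemap(manual_urls))
```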
Observations:
- SEO: Without "canonical" URLs, the different versions wipe each other out in search engines. Canonical tags help consolidate duplicate or similar content by specifying the preferred version of a page, ensuring that search engines index and rank the desired URL. All older and "devel" manual pages now point to "stable" as the canonical to avoid duplicate content issues.
- Very old versions: The static 6.4 and 6.5 versions have recently been reactivated and the associated Apache redirects on grass.osgeo.org removed to reduce the current SEO problems. So they are accessible again, but will not be indexed by search engines due to the "canonical" pointers (see above). Note that e.g. older scientific publications point to these (now) old GRASS GIS manual pages.
- Rolling stable manual pages: I don't see any point in publishing the static manual of a released stable version (e.g. 8.4.0). Instead, the daily state of the stable release branch is used so that manual improvements become available immediately.
- "Devel" vs "Preview": Historically, we have called the development version "grass-devel", which is called "preview" on the website. This should be streamlined.
Note that it may take weeks for Google etc. to "learn" the improved structure. At present, I am feeding Google search tools and Bing webmasters tools with the appropriate updates every few days.
Additional context
TODO:
- "cronjobs: expand script to run stand-alone" (grass-addons#1215) needs to be merged first to disentangle the tasks
- create new pull request with aforementioned changes (TODO MN)
- monitor Google search tools and Bing webmasters tools about the changes with special focus on "Page indexing > Alternate page with proper canonical tag" (TODO MN)
- Sitemap update: switch https://grass.osgeo.org/sitemap.xml to grass-stable/grass-devel ("sitemap.xml: switch from versioned manual URLs to stable/devel", grass-website#482)