The time has come for the annual update of the AnarchistFederation.net platform! Here is what the team has been working on for the 2023 update. This update focuses mainly on promoting the original sources of the articles. We want this platform to help other websites as much as possible by encouraging readers to visit and subscribe to the original sources.
The following updates were implemented on each of the 10 language-specific sites of the platform.
1) SOURCE DESCRIPTIONS AND AVATARS
We added a new block under each post that provides information about the source, with an avatar and a short description.
To do this, we created new scraper bots that regularly check the source’s website to import the details. The source description is scraped from the “description” meta-tag in the header of your website. If you manage a site being republished on our platform, you can change the description of your site by modifying the description meta-tag. If you use WordPress, you can change your site’s description by editing the tagline on the General Settings page.
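For site admins curious about what this lookup involves, here is a minimal Python sketch of reading the description meta-tag from a page. It is an illustration only, not the platform's actual bot code; the names MetaDescriptionParser and extract_description are hypothetical, and the sketch assumes the page HTML has already been downloaded.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first description tag is kept; later ones are ignored.
        if tag != "meta" or self.description is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "description":
            self.description = attr_map.get("content", "").strip()


def extract_description(html: str) -> Optional[str]:
    """Return the description meta-tag content, or None if the tag is absent."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


page = '<html><head><meta name="description" content="News from the collective"></head></html>'
print(extract_description(page))  # News from the collective
```

If the function returns None, the page has no description meta-tag, which is the case a site admin would fix by adding one.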
The avatar is scraped from the favicon of the source’s website. With WordPress, this image can be modified by changing your favicon in your theme’s appearance settings (Site Identity). If no favicon is found, we try to find a replacement image, which works mostly for known organizations and similar sources.
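Favicon discovery can be sketched in a similar way. The example below assumes the common convention of a link tag whose rel attribute contains "icon", with /favicon.ico as a fallback; this is a guess at a reasonable approach, not a description of what our bots do internally. IconLinkParser and find_favicon_url are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Collects href values of <link> tags whose rel contains the token "icon"."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel can hold several tokens, e.g. "shortcut icon".
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "icon" in rel_tokens and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def find_favicon_url(html: str, page_url: str) -> str:
    """Return an absolute favicon URL, falling back to /favicon.ico."""
    parser = IconLinkParser()
    parser.feed(html)
    if parser.hrefs:
        return urljoin(page_url, parser.hrefs[0])
    # Assumption: many sites still serve a favicon at the site root.
    return urljoin(page_url, "/favicon.ico")
```

The fallback URL is only a candidate; a real bot would still need to request it and check that an image comes back.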
2) SOCIAL MEDIA INTEGRATIONS FOR SOURCE WEBSITES
Most of the sources now have links to their social media platforms. Under each article, readers can now easily find the links to follow the original sources. We currently support Twitter, Facebook, Instagram, and YouTube.
The social media links are scraped from the source’s homepage. To make sure our bots found the correct URLs, we scrape 3 other pages from the site and check whether the same URLs appear there. For example, if we find the same Twitter profile link on all 4 pages (the homepage plus the 3 others), we assume it is the official URL. This is not perfect and there could be some errors, but our tests have shown a very high success rate.
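The cross-page check can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the real bot: the URL patterns, the IGNORED_FIRST_SEGMENTS list used to skip share buttons, and the function name confirmed_social_links are all hypothetical. The input is the HTML of the homepage plus the 3 other pages.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately loose patterns for profile-style URLs.
SOCIAL_PATTERNS = {
    "twitter": re.compile(r"https?://(?:www\.)?twitter\.com/[A-Za-z0-9_]+", re.I),
    "facebook": re.compile(r"https?://(?:www\.)?facebook\.com/[A-Za-z0-9_.\-]+", re.I),
    "instagram": re.compile(r"https?://(?:www\.)?instagram\.com/[A-Za-z0-9_.]+", re.I),
    "youtube": re.compile(r"https?://(?:www\.)?youtube\.com/[@A-Za-z0-9_\-/]+", re.I),
}

# Assumption: these first path segments belong to share buttons, not profiles.
IGNORED_FIRST_SEGMENTS = {"share", "intent", "sharer", "sharer.php"}


def confirmed_social_links(pages: List[str], min_pages: int = 4) -> Dict[str, str]:
    """Keep, per network, the URL seen on at least min_pages of the given pages."""
    confirmed = {}
    for network, pattern in SOCIAL_PATTERNS.items():
        counts = Counter()
        for html in pages:
            # A set ensures each URL is counted once per page.
            urls = set()
            for match in pattern.finditer(html):
                url = match.group(0).rstrip("/")
                first_segment = url.split("/")[3].lower()
                if first_segment not in IGNORED_FIRST_SEGMENTS:
                    urls.add(url)
            counts.update(urls)
        if counts:
            url, pages_seen = counts.most_common(1)[0]
            if pages_seen >= min_pages:
                confirmed[network] = url
    return confirmed
```

Counting pages rather than raw occurrences means a link repeated ten times on a single page still counts only once, which is what makes the check resistant to one-off mentions of someone else's profile.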
3) REDESIGNED SOURCES LIST
The sources list now displays the avatar of the sources along with the links to the site and social media platforms. The responsive layout has been reworked for better display on mobile devices.
4) AUTO-REPAIR BROKEN RSS FEED
If the RSS feed returns a 404 error, our scraper bots will now try to find the new URL of the RSS feed in the code of your homepage. If a new URL is found it will automatically replace the old one.
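Finding a feed URL in the homepage code usually relies on feed autodiscovery: a link tag with rel="alternate" and an RSS or Atom type in the page head. The sketch below shows that lookup on already-downloaded HTML. It assumes autodiscovery tags are present and is not the platform's actual implementation; FeedLinkParser and discover_feed_url are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of feed autodiscovery <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.feed_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        is_alternate = "alternate" in attr_map.get("rel", "").lower().split()
        is_feed = attr_map.get("type", "").strip().lower() in FEED_TYPES
        if is_alternate and is_feed and attr_map.get("href"):
            self.feed_hrefs.append(attr_map["href"])


def discover_feed_url(homepage_html: str, homepage_url: str) -> Optional[str]:
    """Return the first advertised feed URL as an absolute URL, or None."""
    parser = FeedLinkParser()
    parser.feed(homepage_html)
    if not parser.feed_hrefs:
        return None
    return urljoin(homepage_url, parser.feed_hrefs[0])
```

If you manage a source site, keeping such a link tag in your homepage head is what lets this kind of automatic recovery find your feed after a migration.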
With around 500 sources on some sites, it is unavoidable that some feeds will break from time to time. Sometimes a source website migrates to a different CMS and the RSS feed suddenly stops working, often without anybody noticing. Until now, the team was re-activating the feeds manually, which took a lot of work. This new feature recovered around 200 sources across all sites that had a broken RSS feed URL. These sources have now been fixed and re-activated.
Now the feeds are being validated every day by checking for errors and validating the XML structure. If the feed is found but the XML structure is invalid, an automatic repair is attempted by our system.
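A daily structure check of this kind could look like the sketch below. It uses Python's standard XML parser to test well-formedness and the root element name. The repair step shown handles only one failure mode, stray characters before the XML declaration, and is purely an assumed example; the post does not say which repairs our system actually attempts. The names is_valid_feed and try_repair are hypothetical.

```python
import xml.etree.ElementTree as ET

# Root element names for RSS 2.0, Atom, and RSS 1.0 (RDF) feeds.
FEED_ROOTS = {"rss", "feed", "rdf"}


def is_valid_feed(xml_text: str) -> bool:
    """True if the text is well-formed XML with a feed-like root element."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    # Strip a namespace prefix such as "{http://www.w3.org/2005/Atom}feed".
    local_name = root.tag.rsplit("}", 1)[-1].lower()
    return local_name in FEED_ROOTS


def try_repair(xml_text: str) -> str:
    """Assumed repair: drop anything (blank lines, a BOM) before the first "<"."""
    start = xml_text.find("<")
    return xml_text[start:] if start > 0 else xml_text


good = '<?xml version="1.0"?><rss version="2.0"><channel><title>t</title></channel></rss>'
broken = "\n  " + good  # whitespace before the XML declaration is not allowed
print(is_valid_feed(broken), is_valid_feed(try_repair(broken)))  # False True
```

Re-validating after the repair, as in the last line, keeps the process safe: a feed is only re-activated if the repaired text actually parses.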
5) IMPROVED SCRAPING & CONTENT EXTRACTION
Various improvements have been made to our scraper bots to increase accuracy and reduce errors. One of the problems faced by bots is getting blocked by a firewall, even when they take care not to make too many requests. This is usually a false positive and unintentional on the part of the site admin. So our bots are now more discreet: they rotate IPs and send a User-Agent that resembles a regular browser request.
Improvements have also been made to content extraction. Previously, many articles were missing because the content extraction failed. The accuracy is now much improved.
We also bought the upgrade to the latest version of full-text-rss for PHP 8.2 support.
6) HEADING IMAGES
Another minor issue we have been facing is the heading image showing multiple times. This happens when the original source has added the same image both as a heading image and also in the article’s content.
To prevent this problem, our scraper bots have been updated with an algorithm to fingerprint the heading image and then try to find it and remove it from the post content. Only the articles posted after July 2023 are fixed.
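To give an idea of how such a de-duplication can work, here is a minimal sketch that fingerprints an image by its normalized URL: the query string and a WordPress-style size suffix such as -300x200 are dropped before comparing. This is an assumed, simplified stand-in and not the algorithm our bots use, which could just as well compare image data. The names image_fingerprint and remove_duplicate_heading_image are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches a size suffix like "-300x200" right before the file extension.
SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")
IMG_TAG = re.compile(r"<img\b[^>]*>", re.I)
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*["']([^"']+)["']""", re.I)


def image_fingerprint(url: str) -> str:
    """Assumed fingerprint: host plus path, without query string or size suffix."""
    parts = urlsplit(url)
    path = SIZE_SUFFIX.sub("", parts.path)
    return (parts.netloc + path).lower()


def remove_duplicate_heading_image(content_html: str, heading_image_url: str) -> str:
    """Remove <img> tags in the content that match the heading image."""
    target = image_fingerprint(heading_image_url)

    def drop_if_duplicate(match: "re.Match[str]") -> str:
        src = SRC_ATTR.search(match.group(0))
        if src and image_fingerprint(src.group(1)) == target:
            return ""  # same image as the heading: remove the tag
        return match.group(0)  # different image: keep it untouched

    return IMG_TAG.sub(drop_if_duplicate, content_html)
```

A URL-based fingerprint is cheap but misses copies hosted under a different filename; hashing the downloaded image would catch those at the cost of extra requests.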
7) LOCALE-BASED DATES
We now use localized dates adapted to the platform’s language and geographical location. Each country uses a different date format, so each site now follows its local format, and the same format is used both on the website and in the newsletters. The names of the days and months are also fully translated.
8) ALMALINUX AND PHP 8.2
As announced previously, all of our servers were upgraded to AlmaLinux, and all sites were upgraded to PHP 8.2. Our plugins have been adapted to this new version of PHP, and we have bought the upgrades for the software we use.
9) IMPROVED CACHING
We added page caching for the sidebar and the widgets in the footer. Also, a lot of optimizations have been made to the existing W3TC page caching. We now use Redis instead of Memcached for page cache, object caching, fragment caching, and database caching.
The cache purging has also been improved after we had some problems with stale cache and the homepage not updating with the most recent posts.
10) AUTHOR PAGE
Some bugs were fixed on the author page that displays all of the posts from a specific source. The masonry layout was broken and sometimes infinite scroll wasn’t working.