How to Generate Sitemaps Using Laravel on the Fly

Written by spekulatius | Published 2020/03/30

TLDR Sitemaps are XML files containing structured data about the pages of a website. They help search engines discover all relevant pages and content on a website. If you are using a framework such as Laravel, you can create them on the fly or whenever you publish or update your content. The sitemap crawler described here relies heavily on PHP Spider, a crawler package for PHP, and uses some regex to identify the most interesting parts of the site. More detail can be found on the GitHub repo for Laravel-Sitemaps.

First things first: What is a sitemap?
Sitemaps are XML files containing structured data about the pages of a website. Each page has an entry similar to this one:
  <url>
    <loc>https://startupnamecheck.com</loc>
    <lastmod>2020-03-06T20:31:03+00:00</lastmod>
    <priority>0.9</priority>
    <changefreq>monthly</changefreq>
  </url>
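As a rough illustration of how such an entry could be produced from PHP, here is a minimal sketch using the built-in DOM extension. The function name `buildUrlEntry` and its parameters are made up for this example and are not part of any package.

```php
<?php

declare(strict_types=1);

/**
 * Build a single sitemap <url> entry as an XML string.
 * Hypothetical helper for illustration only.
 */
function buildUrlEntry(string $loc, DateTimeInterface $lastmod, float $priority, string $changefreq): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $url = $doc->createElement('url');
    $doc->appendChild($url);

    // createTextNode() escapes special characters such as "&" in URLs.
    $locNode = $doc->createElement('loc');
    $locNode->appendChild($doc->createTextNode($loc));
    $url->appendChild($locNode);

    // DATE_ATOM yields the W3C datetime format, e.g. 2020-03-06T20:31:03+00:00.
    $url->appendChild($doc->createElement('lastmod', $lastmod->format(DATE_ATOM)));
    $url->appendChild($doc->createElement('priority', number_format($priority, 1, '.', '')));
    $url->appendChild($doc->createElement('changefreq', $changefreq));

    // Passing the node to saveXML() returns only that node, without the XML declaration.
    return $doc->saveXML($url);
}

$entry = buildUrlEntry(
    'https://startupnamecheck.com',
    new DateTimeImmutable('2020-03-06T20:31:03+00:00'),
    0.9,
    'monthly'
);

echo $entry, PHP_EOL;
```

In a complete sitemap, entries like this one sit inside a single `<urlset>` root element.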
What are sitemaps good for?
Sitemaps help search engines discover all relevant pages and content on a website. While there are also sitemaps for images, the focus here is on web pages only.
How can I generate a sitemap?
A sitemap can be created in various ways. If you are using a framework such as Laravel, you can create it on the fly or whenever you publish or update your content.
After some experiments and checking several solutions on GitHub, I didn't find one that offered what I was looking for:
  • A simple, permanent crawler of the actual website.
  • It considers `noindex` robots tags as well as canonicals and, of course, the `article:modified_time` tag.
  • It ignores JavaScript, as Google mostly does. This lets it run much faster than launching a headless browser only to access a pure HTML5/CSS3 page.
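To make the second requirement more concrete, here is a minimal sketch of how these three signals could be read from a page's HTML. It uses PHP's DOM extension with XPath rather than regex, and the helper name `extractPageSignals` is hypothetical; this is not the package's actual code.

```php
<?php

declare(strict_types=1);

/**
 * Extract the signals a sitemap crawler might care about from a page's HTML.
 * Hypothetical helper for illustration; not taken from any package.
 *
 * @return array{noindex: bool, canonical: ?string, modified: ?string}
 */
function extractPageSignals(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them (HTML5 tags trigger some).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // <meta name="robots" content="noindex, follow">
    $robots = $xpath->evaluate('string(//meta[@name="robots"]/@content)');
    // <link rel="canonical" href="...">
    $canonical = $xpath->evaluate('string(//link[@rel="canonical"]/@href)');
    // <meta property="article:modified_time" content="...">
    $modified = $xpath->evaluate('string(//meta[@property="article:modified_time"]/@content)');

    return [
        'noindex'   => stripos($robots, 'noindex') !== false,
        'canonical' => $canonical !== '' ? $canonical : null,
        'modified'  => $modified !== '' ? $modified : null,
    ];
}

// Sample page; the URL is a placeholder.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex, follow">
  <link rel="canonical" href="https://example.com/post">
  <meta property="article:modified_time" content="2020-03-06T20:31:03+00:00">
</head>
<body><p>Hello</p></body>
</html>
HTML;

$signals = extractPageSignals($html);
var_dump($signals);
```

A crawler could skip pages flagged `noindex`, list the canonical URL instead of the crawled one, and use the modified time for `<lastmod>`.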
My solution for sitemaps on the fly
As mentioned, my research didn't turn up what I had in mind. So, being a developer at heart, I opted to build my own solution. It relies heavily on PHP Spider, a crawler package for PHP. Besides this, the package uses some regex to identify the most interesting parts of the website. Other values, such as `priority`, are guessed from the depth within the website (nesting level). More detail can also be found on the GitHub repo for Laravel-Sitemaps.
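To give an idea of what guessing a priority from the nesting level could look like, here is a small sketch. The formula (start at 1.0, subtract 0.1 per path segment, never go below 0.1) and the function name `guessPriority` are assumptions made for this example; the package may weigh depth differently.

```php
<?php

declare(strict_types=1);

/**
 * Guess a sitemap priority from how deeply a URL is nested.
 * Assumed formula for illustration: 1.0 minus 0.1 per path segment, floor of 0.1.
 */
function guessPriority(string $url): float
{
    // parse_url() returns null when the URL has no path component.
    $path = parse_url($url, PHP_URL_PATH);
    $path = is_string($path) ? $path : '';

    // Count the non-empty path segments, e.g. "/blog/post" has a depth of 2.
    $segments = array_filter(
        explode('/', $path),
        static function (string $segment): bool {
            return $segment !== '';
        }
    );
    $depth = count($segments);

    return max(0.1, round(1.0 - 0.1 * $depth, 1));
}

// The URLs below are placeholders.
foreach (['https://example.com', 'https://example.com/blog', 'https://example.com/blog/post'] as $url) {
    echo $url, ' => ', number_format(guessPriority($url), 1), PHP_EOL;
}
```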
How can I get this package?
The package is distributed via Composer and can be installed using:
composer require bringyourownideas/laravel-sitemap
This will automatically configure the required Laravel ServiceProvider. If you have opted out of package discovery, you can install it manually using:
php artisan vendor:publish --provider="BringYourOwnIdeas\LaravelSitemap\SitemapServiceProvider"
How to use the package?
The package registers an `artisan` command called `generate:sitemap`. It crawls your site and writes the sitemap to the public directory. For convenience, you can add this command to your deployment steps.
Regular updates of the sitemap
If you'd like to update the `sitemap.xml` regularly, you can add a new line to the `schedule` method in `app/Console/Kernel.php`:
/**
 * Define the application's command schedule.
 *
 * @param  \Illuminate\Console\Scheduling\Schedule  $schedule
 * @return void
 */
protected function schedule(Schedule $schedule)
{
    // Regenerate the sitemap once a day at midnight...
    $schedule->command('generate:sitemap')->daily();

    // ...or at a defined time instead (keep only one of the two lines).
    $schedule->command('generate:sitemap')->dailyAt('02:50');
}
Summary & Feedback
If you run into problems, please raise an issue on GitHub. To stay updated, please subscribe to my newsletter. More information can also be found in the BYOI article about the Laravel Sitemap Generator. Thanks for reading - I hope you like the sitemap crawler :)
Previously published at https://peterthaleikis.com/posts/how-to-generate-a-laravel-sitemap-dynamically/

Written by spekulatius | Building side-projects and learning new stuff every day.
Published by HackerNoon on 2020/03/30