How to Clean Up Notion URLs: From Ugly URLs to Pretty URLs in Minutes

Written by RichardJohnn | Published 2021/02/11
Tech Story Tags: notion | super | scripting | automation | programming | hackernoon-top-story | url | web-development | web-monetization

TLDR: How to clean up your Notion URLs and turn them into pretty URLs in minutes, with a little shell command and a Node.js script. The easier a URL is to read for humans, the better it is for search engines. The first step to getting all your links pretty and into Super is to get all the links off of your notion.so page; the next is to build an array of objects with all our Notion page IDs and the URLs they map to.

In this article, I'll show you how to quickly set up pretty URLs in Super with a little shell command and a Node.js script.
If you are reading this, you may already be aware that https://super.so helps you go public with your https://www.notion.so pages. What you may not know is that ugly URLs are bad for SEO. And Notion's got some ugly URLs.
It should come as no surprise that the easier a URL is to read for humans, the better it is for search engines.
- Rand Fishkin
Well, color me surprised! I've never needed to give SEO any thought before, but now it's my job to care! It's time to accommodate the web crawlers and get our new help and careers pages indexed.
So the first step to getting all your links pretty and into Super is to get all the links off of your Notion page. In the case of the new Hacker Noon help page (http://help.hackernoon.com/), there were 157 links to pretty up in this Super form:
That's a lot of copy-pasting, clicking around, and coming up with pretty URLs, so I decided to open up my Chrome browser's Network tab and see what was being sent to the server, so that I could just give it a single payload with all the links in it.
DISCLAIMER:  Using any site in a way not intended by the developers may result in loss of data.  Proceed at your own risk!  (maybe start with a little test where it won't hurt anything:)
To get to the Network tab, you first need to open the developer tools. On a Mac, this is done with Cmd+Option+I. You can also reach it by clicking the menu button (three dots, in a column), going to More Tools, and then finally Developer Tools.
A portion of your window should be covered with a new panel and at the top of that, you should see some tabs. One of those is called Network and will provide us with a template to use in our own payload to send to Super.
You can see Network selected here with the blue underline. If I click on the clear button 🚫 next to the red circle, it'll clear out any previous network calls so we can clearly see what is sent when we save our pretty URLs.
Here we see a bunch of requests that happened, but the POST that goes to a URL called "update" looks interesting for our use case. (You may need to right-click the column headers and select "Method" to show that column.) You can also use the filter box with
method:POST
to filter to just those types of calls, if you already know what you are looking for.
If you click on the method and look at the Headers tab that will show up, then scroll to the bottom of that, you will see the request payload.
Open that up and see where the stuff you typed into the pretty URL window is. In this case, there is a field called prettyUrls that is an array of objects with two fields,
pageId
which ends up just getting the ugly, unique identifier portion of a notion URL, and also a
url
field which is where it will redirect. So now that we know what we are doing, we need an array of objects with all our Notion page IDs and the URLs they are going to. Upwards and onwards!
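So, for example, the relevant part of the payload is shaped something like this (the page IDs and slugs here are made up for illustration, not real values):
{
  "prettyUrls": [
    { "pageId": "0123456789abcdef0123456789abcdef", "url": "hacker-noon-editor-101" },
    { "pageId": "fedcba9876543210fedcba9876543210", "url": "how-to-publish-a-story" }
  ]
}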
Right-click that method and select
Copy > Copy as Node.js fetch 
if you want to write a node.js script like I did. Maybe you're into curl or using fetch right in the browser's console.
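For what it's worth, the copied snippet has roughly this shape — the endpoint, headers, and body below are placeholders, not what Chrome will actually copy for you:
fetch("https://<the-update-url-chrome-copied>", {
  headers: {
    "content-type": "application/json",
    // ...plus whatever cookies/auth headers Chrome copied along with it
  },
  body: "{\"prettyUrls\":[ ... ]}", // the whole request payload as one big string
  method: "POST",
});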
Okay, so you've got this node code in your paste buffer and nowhere to put it! It's time to make a script and paste it in there.
We are going to ignore it for now though because I wrote this article a little out of order and I can't drag things around to reorder them and I'm just gonna roll with it!
Let's move on to getting all the notion links off the page. One way this can be done is by downloading the page and its links with:
wget -r -l0 https://help.hackernoon.com/
This took a few minutes and may have been overkill, but when it was done I had a neat little help.hackernoon.com folder with a bunch of HTML files. I ended up using the HTML files to come up with the pretty URL so maybe not overkill. Here's a bit of what it looked like in that directory when I listed the files:
help.hackernoon.com ls
0109b933c3084e9699b408a73b142e8a  5c115f8cb76445788bb16a4499758cd8  d1edc21b1f2d48d9afa43f4da6d4cc4f
0142ddfc63b94ef9a1ca01975c3d2cab  5c45cf10b99046939df336a23ddae597  d44cdbf1e6d94c0eab9fb5091c249e97
028466d28622424ca1523069faf1d7c6  5f54fe2e3b6641b3b1d49f7b47a961d7  db9dd7e7998141df9705628b03fcd75b
033557c93f714ae092e13e095ca9c2cf  604adc1e960840b782d7f2de2687f719  de55338768674dccbc80909d4ccea58f
Cool, so all these piles of alpha-numeric characters are the notion entry to put in the left side of that form.
To get the corresponding pretty URL to put on the right-hand side, I peeked inside one of the files and looked for where the header was by searching for
h1
There it is with the juicy "Hacker Noon Editor 101" meat/text inside. Now to start a script to iterate over all these files and parse and kebab-case the h1 headers. I used the library cheerio, which is jQuery for node.js, to do this. This is how things started to look:
const fs = require("fs");
const cheerio = require("cheerio");

(async () => {
  // list everything in the current directory (the wget download folder)
  const files = await fs.promises.readdir("./");
  for (const file of files) {
    const stat = await fs.promises.stat(file);

    if (stat.isDirectory())
      continue; // we just want the html files

    const html = fs.readFileSync(file).toString();
    const $ = cheerio.load(html);
    const contents = $('h1').contents()[0];
    if (!contents) continue;
    const url = contents.data; // the h1 text, which will become the pretty URL
    console.log(url);
  }
})();
So this script was run from inside that help.hackernoon.com directory so it could find the files easily with the
./ 
path. Next, we used the
stat
function to help us determine if it was a file or directory, as there was one
_next
directory to skip.
Maybe you think the
readFileSync
is whack, because it is synchronous? You are correct.
It doesn't matter though; this is just a one-off script to get the job done. We call
toString()
on that to be able to pass the wad of HTML text to cheerio, and now we can run selectors on it to scrape it.
I needed the first h1, so our selector is simply that, and we take
[0]
to get the first one in case it found more; if it didn't find any, we continue on to the next file. After that, the schema cheerio gives you dictates that to get the text out, you will need to ask for the
.data
property. All of these get logged out so I could see if things were looking good.
By the way, one nice way to work on things like this and get quick feedback is to use another package, https://nodemon.io/, which can be installed globally to use as a shell command, or used as a library for your own file-watching needs, like if you want to roll your own hot-reloading framework for node development. In this case, my script was called sendIt.js and so I could run
nodemon sendIt.js
in one split pane of my terminal, and any time I saved the file in another split, the script was re-run and I got a handy feedback loop.
Okay, it's time to combine the node.js fetch bit with this loop that gets all the URL mappings.
I'm going to plop in a gist now that should get the idea across. The new code, on top of what we already have, just creates a Set and then adds our pretty URL objects to it.
You may also notice that I am adjusting the pretty URL around line 22. It's there that we modify the h1 text by making it lowercase and removing invalid URL characters. Spaces become dashes, double dashes turn back into one dash,
/[^0-9a-z-]/gi
strips out everything that isn't alphanumeric or a dash. Finally, leading and trailing dashes are removed.
This could be condensed into fewer replace calls, but this is readable, and this code will only ever run when you want to update your pretty URLs, so I am not going to sweat it.
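For reference, pulled out into its own little function, that transform looks more or less like this (the prettify name is mine; the gist does the same thing inline):
// turn an h1 like "Hacker Noon Editor 101" into "hacker-noon-editor-101"
const prettify = (text) =>
  text
    .toLowerCase()
    .replace(/ /g, "-")           // spaces become dashes
    .replace(/--+/g, "-")         // double dashes turn back into one dash
    .replace(/[^0-9a-z-]/gi, "")  // strip everything that isn't alphanumeric or a dash
    .replace(/^-+|-+$/g, "");     // remove leading and trailing dashes

console.log(prettify("Hacker Noon Editor 101")); // hacker-noon-editor-101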
Next, around line 29, I skip some files with dots because those were downloaded but don't need mappings, like index.js. You could also just delete them from the folder of links that were downloaded.
Line 42, with the body, was created by copying the body string out of the Node.js fetch code we pasted in from the developer console, dropping it into a node REPL, and running
JSON.parse
on that. Now it's an object instead of a string, so we get a chance to take our Set of prettyUrls and get them into our payload.
Array.from(prettyUrls)
will turn the Set into an array.
The rest of the fetch is unchanged, save for the body section where I
JSON.stringify
it back into the original format we pasted in. I save the result of this fetch call as a variable so I can print it and see what issues come up. There are always issues with the first (through at least the 10th) attempt.
In my case, stripping out invalid characters and duplicate entries that were already in the list of pretty URLs were the issues that caused me to iterate a few times on this, so there were more console.logs sprinkled throughout this code to help me identify where things went wrong.
If the result of the fetch has a status 200 and the word OK, you are good. Refresh your pretty URL list and double-check. If you get a 500 and a Bad Gateway, you probably sent an invalid character in the URL or sent a duplicate entry.
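If it helps, the tail end of the script goes something like this — bodyTemplate, updateUrl, and copiedHeaders stand in for the pieces you lifted from the "Copy as Node.js fetch" output, and it all lives inside the same async function as the file loop (with node-fetch required if your Node doesn't have a built-in fetch):
// bodyTemplate is the body string from the copied fetch; parse it so we can edit it
const payload = JSON.parse(bodyTemplate);

// swap in the mappings we built up in the Set
payload.prettyUrls = Array.from(prettyUrls);

// send it back to the same endpoint with the same headers Chrome copied for us
const res = await fetch(updateUrl, {
  method: "POST",
  headers: copiedHeaders,
  body: JSON.stringify(payload),
});

console.log(res.status, await res.text()); // hoping for a 200 and an OK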
So there you have it. You can use these ideas in a lot of places other than just Notion and Super. Just be careful to try things out first on something you are not worried about screwing up if you send the wrong payload. The saying "Measure twice, cut once" comes to mind.
Anywho, I hope this was helpful. You can tell me your thoughts about it on Twitter if you really want. I primarily wrote this to help me get over my writer's block and just publish something. ✌️

Written by RichardJohnn | VP of Engineering at HackerNoon
Published by HackerNoon on 2021/02/11