I recently came across this cool article showing how to make URLs that don't suck. It cross-references other articles on the same topic by Jakob Nielsen and Tim Berners-Lee. Essentially, URLs should be shareable, shortenable, and human readable and guessable. Let's discuss some ways to accomplish this in Rails 5.
Shareable
Rails is RESTful by default: URLs contain the parameters that determine what to display on the page.
So it's harder to create worthless links like homesearch.com/homes/for_sale/search_results.asp.
It's definitely still possible through JS or reconfiguration, though, and should be avoided.
It doesn't end there.
Shareability isn't a switch, it's a spectrum, and it's affected by length and intuitiveness.
An intuitive URL is something lacking arbitrary gotchas - a user can probably guess it.
Rails does a good job with this: change the URL extension to .json and JSON will be sent.
It also promotes creating simple resources, so you avoid seemingly random URL hierarchies (e.g. site.com/user_name/_$!projects).
One area where Rails falls flat is long IDs. Imagine a site that creates a lot of DB tuples, such as a popular social network. If you use an integer id, user 1,000,000 will have to type a lot of characters to share their page. You could try ID slugs using FriendlyID, but problems can arise with duplicates and speed. A better solution is to create alphanumeric ids using a larger radix.
String Ids
URL-safe, non-reserved characters are a-z, A-Z, 0-9, and the symbols $-_.+!*'(), (62 alphanumerics plus 11 symbols).
These 73 characters allow you to create a base 73 ID that is much shorter than a decimal number (1 billion = zfGya).
However, we need to remove some characters from our lexicon.
Including both periods and commas would create an easy mixup, so let's remove '.'.
MiXeD cASe is possible but can be difficult to remember and describe. You definitely should not use mixed case in resources (Events/eVENTS/EVENTS), but IDs are a bit more nuanced; YouTube uses mixed case IDs, for example. Ultimately you should think about how users will share your links and decide if shorter IDs (due to including uppercase in the radix) will be better than standardized case.
That leaves us with base 46 (1 billion=4+fv$k).
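To see why a larger radix shortens an ID, here is a minimal sketch that encodes an integer in a 46-character alphabet. The names `ALPHABET`, `encode_base`, and `decode_base` are made up for illustration, and the ordering of the alphabet is an assumption, so the exact string produced will differ from the example above. It only demonstrates the length savings; it is not the random generation used later in this post.

```ruby
# Hypothetical 46-character alphabet: digits, lowercase letters, and the
# URL-safe symbols listed above minus the period. The ordering is an assumption.
ALPHABET = (('0'..'9').to_a + ('a'..'z').to_a + %w[$ - _ + ! * ' ( ) ,]).freeze

# Convert a non-negative integer into a string of alphabet characters.
def encode_base(number, alphabet = ALPHABET)
  return alphabet.first if number.zero?

  base = alphabet.length
  digits = []
  while number > 0
    number, remainder = number.divmod(base)
    digits.unshift(alphabet[remainder])
  end
  digits.join
end

# Convert an encoded string back into the original integer.
def decode_base(string, alphabet = ALPHABET)
  base = alphabet.length
  string.each_char.reduce(0) { |total, char| total * base + alphabet.index(char) }
end

puts encode_base(1_000_000_000).length  # 6 characters instead of 10 decimal digits
```

One billion needs 6 characters here because 46^5 is roughly 206 million and 46^6 is roughly 9.5 billion.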
For security, IDs should also be randomized to prevent inference attacks. This is all very similar to how YouTube creates video IDs.
First, let's set up our model to accept a String-based ID.
Let's use an "Event" model as an example.
In the migration, set id: false to disable auto-incrementing integer ids.
Then recreate the id column as a string.
class CreateEvents < ActiveRecord::Migration[5.1]
  def change
    create_table :events, id: false do |t|
      # index: { unique: true } creates a unique index on the id column
      t.string :id, null: false, index: { unique: true }
      #...
    end
  end
end
Run the migration. Inside the model, set the 'new' primary key.
class Event < ApplicationRecord
  self.primary_key = :id
  #...
end
Our model is ready to go!
Next, let's set up the word blacklist.
This will ensure the alphanumeric id does not contain offensive material.
Create lib/assets/blacklist_words.csv, paste in this (NSFW) filter, and perform whatever data cleanup you need.
For example, these words will be appearing in a URL with a max length, so you can remove all spaces and all phrases over your max length (8 characters for example).
Keeping this blacklist concise will greatly speed up ID generation.
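The cleanup described above could look something like the following sketch. `clean_blacklist` is a hypothetical helper, not part of any library, and the lowercasing step is my assumption (it only makes sense if your ids are lowercase, as they are in this post).

```ruby
# Hypothetical helper: normalizes a raw word list for use in short-ID checks.
def clean_blacklist(raw_words, max_len: 8)
  raw_words
    .map { |word| word.downcase.delete(" \t\r\n") }  # remove all whitespace (assumes lowercase ids)
    .reject(&:empty?)                                 # an empty entry would match every id
    .reject { |word| word.length > max_len }          # too long to ever appear inside an id
    .uniq
end

words = clean_blacklist(["Bad Word", "", "waytoolongphrase", "badword"])
puts words.inspect
```

Dropping empty entries matters more than it looks: every string "includes" the empty string, so a single blank line in the file would cause every candidate id to be rejected.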
Now we need to create the id generation logic.
This is not application specific - it will work for multiple resources and multiple projects - and thus should go in the lib/ folder as well.
Create lib/short_ids.rb and paste in the following code.
module ShortIds
  # AKA radix. URL safe, lowercase characters.
  # As written this list has 45 characters; add "'" for the full base 46 described above.
  @lexicon = ('0'..'9').to_a + ('a'..'z').to_a + ['$','-','_','+','!','*','(',')',',']

  # Load blacklisted words from CSV into an iterable data structure.
  # Double quotes are required: '\n' in single quotes is a literal backslash followed by n.
  # Blank entries are dropped because every id "includes" the empty string.
  @blacklist = File.read(Rails.root.join('lib', 'assets', 'blacklist_words.csv')).split("\n").map(&:strip).reject(&:empty?)
  #require 'csv' #if you have a more complicated CSV setup
  #@blacklist = CSV.parse(@blacklist, :headers => true, :encoding => 'ISO-8859-1')

  def self.create_id(max_len)
    id = ""
    # concat random characters from the lexicon together
    max_len.times do
      id.concat(@lexicon.sample)
    end
    return id
  end

  def self.id_is_acceptable?(id, model)
    # ensure no bad words in ID
    id_bad = @blacklist.any? { |word| id.include?(word) }
    # ensure no other tuple already has this id
    # (use || here: `x = a or b` parses as `(x = a) or b`, which would skip this check)
    id_bad = id_bad || model.exists?(id)
    return (not id_bad)
  end

  # Create an unused id for a given model
  def self.new(model, max_len=7)
    id = nil
    loop do
      id = self.create_id(max_len)
      break if self.id_is_acceptable?(id, model)
    end
    return id
  end
end
Notice @lexicon, which defines our id vocabulary.
If you don't like my character choices, feel free to add uppercase and such back in.
self.new creates candidate ids until it finds an acceptable one, then returns it.
Candidate ids are created by concatenating random characters together from the lexicon. I previously had another method of id creation here that was slower and more complex; you can see the benchmark here.
An id is acceptable if it does not contain profanity and is unique (doesn't yet exist in the model).
For a base 46 ID of 8 characters you can create 46^8 = 2.005×10¹³ unique IDs.
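The generate-and-check loop can be tried outside of Rails with a rough stand-in like the one below. Everything here is an assumption made for illustration: a `Set` plays the role of `Model.exists?`, `SecureRandom` is swapped in for `Array#sample`'s default generator, and `max_attempts` is an extra guard I added so the loop cannot spin forever if the id space fills up or the blacklist is too aggressive.

```ruby
require 'securerandom'
require 'set'

# Same characters as the module's @lexicon.
LEXICON = (('0'..'9').to_a + ('a'..'z').to_a + %w[$ - _ + ! * ( ) ,]).freeze

# Build one candidate of the given length from the lexicon.
# SecureRandom is an assumption; the module above uses Array#sample.
def candidate_id(length)
  Array.new(length) { LEXICON[SecureRandom.random_number(LEXICON.length)] }.join
end

# blacklist: array of banned substrings. taken: stand-in for Model.exists?
def acceptable?(id, blacklist, taken)
  blacklist.none? { |word| id.include?(word) } && !taken.include?(id)
end

# Keep generating candidates until one passes, giving up after max_attempts.
def new_short_id(blacklist, taken, length: 7, max_attempts: 1_000)
  max_attempts.times do
    id = candidate_id(length)
    return id if acceptable?(id, blacklist, taken)
  end
  raise "no free id found after #{max_attempts} attempts"
end

taken = Set.new
id = new_short_id(["bad"], taken)
taken << id
puts id
```

With a large id space and a short blacklist, the first candidate is accepted almost every time, so the retry loop is rarely exercised.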
To use our new lib file, the module and filename must follow Rails naming conventions (which they do) and lib/ must be added to the autoload paths in config/application.rb:
module MyApp
  class Application < Rails::Application
    config.autoload_paths << Rails.root.join('lib')
    #...
  end
end
Finally, let's ensure the Event model uses the new id.
Open app/models/event.rb and add a before_create callback to auto-assign the ID.
class Event < ApplicationRecord
  self.primary_key = :id #remember this from above?
  before_create :set_id

  private

  def set_id
    self.id = ShortIds.new(Event)
  end
end
There you have it! An alphanumeric short ID for an Events model. Note that the recommended method, and what YouTube does, is to create a table, populate it with available random IDs, and periodically regenerate IDs as the table runs low. We create IDs dynamically when the user asks for one; preallocating IDs lets you shift generation processing to off-peak times. I leave that as an exercise for the reader.
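As a starting point for that exercise, here is a rough in-memory sketch of the preallocation idea. `IdPool` and all of its parameters are hypothetical; a real app would back the pool with a database table and call `refill` from a scheduled off-peak job. The blacklist check is left out to keep the sketch short.

```ruby
require 'securerandom'
require 'set'

# Hypothetical in-memory pool; a real app would back this with a database table.
class IdPool
  LEXICON = (('0'..'9').to_a + ('a'..'z').to_a).freeze

  def initialize(length: 7, target_size: 100, low_water: 20)
    @length = length
    @target_size = target_size
    @low_water = low_water
    @available = Set.new  # generated but not yet handed out
    @issued = Set.new     # already handed out, never reused
    refill
  end

  # Top the pool back up to target_size; intended to run from an off-peak job.
  def refill
    while @available.size < @target_size
      id = Array.new(@length) { LEXICON[SecureRandom.random_number(LEXICON.length)] }.join
      @available << id unless @issued.include?(id)
    end
  end

  # True when the pool has dropped below the low-water mark.
  def low?
    @available.size < @low_water
  end

  # Hand out one pre-generated id; refills immediately if the pool is empty.
  def take
    refill if @available.empty?
    id = @available.first
    @available.delete(id)
    @issued << id
    id
  end

  def size
    @available.size
  end
end

pool = IdPool.new(target_size: 10, low_water: 3)
puts pool.take
```

The trade-off is that `take` becomes a cheap lookup at request time, while the random generation and uniqueness checks happen in bulk whenever `refill` runs.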
Shortenable
As mentioned previously, pure slugged URLs are undesirable due to speed and duplication issues. But URLs should definitely be readable, otherwise the user doesn't know what they are clicking on. Stack Overflow manages this well with user accounts. Notice what happens to the URL when you navigate to https://stackoverflow.com/users/4180797: it auto-completes to https://stackoverflow.com/users/4180797/james-lowrey. Actually, you can put anything after the ID and it will always become "james-lowrey": https://stackoverflow.com/users/4180797/bad-word-filters-are-fun.
Let's get the same setup in Rails.
Once we're done, we'll have multiple routes pointing to the same resource. Be sure to assign canonical URLs, or else you might take an SEO hit.
Shortenable Show Action (Stack Overflow-esque)
Note that all of the following applies to the edit action as well.
The first step is to fix some routing.
We want to accept garbage after the id in a resource's show action.
This ensures resource/id/anything-here will not 404 and will instead be handled by the same show action as resource/id.
Rails.application.routes.draw do
  resources :events
  get "/events/:id/*other" => "events#show" #if any txt is trailing id, also send this route to events#show
end
Next, let's obtain readable text from the model.
If you want a resource to have readable text URLs, there should be some field to pull it from.
Let's assume your Events model has a name field, and pull from that.
So name must be defined in the migration.
class CreateEvents < ActiveRecord::Migration[5.1]
  def change
    create_table :events do |t| #works with string or integer ids
      t.string :name, null: false
      #...other stuff...
    end
  end
end
Now we should parse and modify URL requests.
Let's again place our logic in the lib/ folder (so ensure the autoloader is on, as described in the previous section).
Create lib/shortenable_urls.rb and paste in the following:
module ShortenableUrls
  def self.redirect_for_readability?(request, id, readable_txt_parameterized)
    #check if column_name is not seen in the URL of an HTML request
    if request.format.html? #do not care how URL looks for json/data requests
      id = id.to_s
      #parse out the URL text following the id
      url = request.original_url
      id_end_indx = url.index(id) + id.length + 1 #+1 for '/' character
      #URL txt after id does not match readable_txt
      if url[id_end_indx..-1] != readable_txt_parameterized
        return true
      end
    end
    return false
  end
end
This stateless function requires the request, the object id and the object's parameterized readable text. It parses out the text following the id and returns whether or not it matches the expected text.
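The comparison itself can be tried outside of Rails. The sketch below is a hypothetical variant, not the module above: it looks only at the path (what `request.path` would give you in a controller), so the host name and query string cannot interfere with finding the id. `readable_redirect_needed?` is a made-up name.

```ruby
# Hypothetical variant that inspects only the request path (no host, no query string).
def readable_redirect_needed?(path, id, readable_txt)
  segments = path.split('/').reject(&:empty?)  # "/events/42/x" => ["events", "42", "x"]
  id_position = segments.index(id.to_s)
  return true if id_position.nil?              # id is not in the path at all

  # Everything after the id segment, rejoined, must equal the readable text
  trailing = segments[(id_position + 1)..-1].join('/')
  trailing != readable_txt
end

puts readable_redirect_needed?("/events/42", 42, "my-event")           # true
puts readable_redirect_needed?("/events/42/my-event", 42, "my-event")  # false
puts readable_redirect_needed?("/events/42/whatever", 42, "my-event")  # true
```

Matching on whole path segments also avoids a false hit when the id happens to appear as a substring elsewhere in the URL.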
Finally, we should redirect in Events#show if an incorrect value was in the request.
class EventsController < ApplicationController
  def show
    # assumes @event was already loaded, e.g. in a before_action
    readable_txt = @event.name.parameterize
    if ShortenableUrls.redirect_for_readability?(request, @event.id, readable_txt)
      redirect_to "/events/#{@event.id}/#{readable_txt}"
      # Do not continue with controller processing in a redirect. Once the redirect
      # finishes, the action is called again with the correct URL and this return is skipped.
      return
    end
    #...rest of the show logic
  end
  #...other controller actions
end
And you're done!
It's important to notice that return statement, and that this redirect logic is placed at the beginning of show.
Assuming you have other logic in the show action, returning early keeps you from wasting processing power when the page will just redirect.
After the redirect is complete, the URL will be correct, the return will be skipped, and your subsequent logic will be called!
Entering /events/id/ANYTHING or /events/id will always redirect to /events/id/name.
Note that this can also be accomplished client side with JS, and thus you could use the page <title> or any other text, but it would break on browsers with JS disabled.
Resources Aliasing
Let's say Event is your only resource that begins with the letter 'e'.
Seems like it would be useful to provide /e as an alias to /events (this is common for sites to do with /users and /u).
Thankfully this is much simpler than the other things we've encountered thus far.
Open up your config/routes.rb and add an event alias.
Rails.application.routes.draw do
  #alias the "events" routes
  match '/e/*other',
    via: :all,
    to: redirect { |path_params, req|
      #fullpath starts with '/e/', so the first occurrence of '/e' is the one we want
      #to substitute for '/events'. (In original_url, a host beginning with 'e', as in
      #'//example.com', would contain an earlier '/e'.) Sub the first and return
      req.fullpath.sub('/e','/events')
    }
  #...other stuff...
end
Usually you want to avoid match and stick to specific HTTP verbs, but we want this to be a catch-all pass-through for any given "/e" URLs. This routing will match "/e" URLs, substitute the first occurrence of "/e", and redirect to the desired "/events" controller while passing along all trailing URL info.
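If you end up with more than one alias, the substitution can be made stricter by anchoring it to the first path segment. The following is only a sketch in plain Ruby: `ALIASES` and `expand_alias` are hypothetical names, and the input is assumed to be a path with an optional query string (what `req.fullpath` returns in Rails).

```ruby
# Hypothetical alias table: short segment => full resource name.
ALIASES = { 'e' => 'events' }.freeze

# Expand an alias only when it is the entire first segment of the path.
def expand_alias(fullpath, aliases = ALIASES)
  # \A anchors to the start; the lookahead requires the segment to end at '/', '?' or end of string
  fullpath.sub(%r{\A/([^/?]+)(?=[/?]|\z)}) do
    segment = Regexp.last_match(1)
    "/#{aliases.fetch(segment, segment)}"
  end
end

puts expand_alias("/e/3k66ee-/event-name?ref=mail")  # /events/3k66ee-/event-name?ref=mail
puts expand_alias("/edit/e/1")                       # /edit/e/1 (unchanged)
```

Because the pattern matches a whole segment, paths that merely start with the letter "e" are left alone.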
Be careful with aliases. The worst thing you can do is break links, so you should be absolutely certain you want these aliases before you push to prod.
Originally I was using a different methodology for aliasing, but decided to go with this one due to its greater generalization and better handling of…
Canonical URLs
Duplicated content can carry a strict SEO penalty, but the answer is easy - canonical URLs. Canonical tags allow you to specify the preferred URL of your webpage and help direct search engines to display the correct page. In Rails, this is a breeze to set up with canonical-rails.
Add gem 'canonical-rails', github: 'jumph4x/canonical-rails' to your Gemfile and bundle install.
Generate the configuration via rails g canonical_rails:install and modify it at config/initializers/canonical_rails.rb.
I suggest setting protocol to HTTPS (you are using HTTPS, right?) and opengraph_url to true for basic setup.
Finally, paste <%= canonical_tag -%> into the <head> tag of your app/views/layouts/application.html.erb.
Run Rails, inspect a page, and behold the canonical glory - <link href="https://localhost:3000/events/3k66ee-/event-name" rel="canonical">.
We've created aliasing for resources and readable show actions, but since we've accomplished this with redirects no further customization to canonical urls is required. You're all set out of the box.
Any other URL rodeoing I've missed? Let me know in the comments!