JRL About

URLs are UI in Rails 5

July 11, 2017
Updated: November 30, 2021

Twitter Hacker News Reddit LinkedIn
Article Header Image
I

recently came across this cool article showing how to make URLs that don't suck. It cross references some other articles discussing the same topic - Jakob Nielsen and Tim Berners Lee. Essentially, URLs should be shareable, shortenable, and human readable+guessable. Lets discuss some methodologies to accomplish this in Rails 5.

Shareable

Rails by default uses a RESTful API, URLs contain parameters that are used to determine what to display on the page. So it's harder to create worthless links like homesearch.com/homes/for_sale/search_results.asp. Though it's definitely still possible through JS or reconfiguration, and should be avoided.

It doesn't end there. Shareability isn't a switch, it's a spectrum, and it's affected by length and intuitiveness. An intuitive URL is something lacking arbitrary gotchas - a user can probably guess it. Rails does a good job with this, change the URL extension to .json and json will be sent. It also promotes creating simple resources, so you avoid seemingly random URL hierarchies (e.g. site.com/user_name/_$!projects).

An area that rails falls flat is in long IDs. Imagine a site that creates a lot of DB tuples such as a popular social network. If you use an integer id, user 1,000,000 will have to type a lot of characters to share his page. You could try ID slugs using FriendlyID, but problems can arise with duplicates and speed. A better solution is to create alphanumeric ids using a larger radix.

String Ids

URL safe, non-reserved characters are a-z, A-Z, 0-9, and $-_.+!*'(),. These 73 characters allow you to create a base 73 ID that is much shorter than a decimal number (1 billion = zfGya). However, we need to remove some characters from our lexicon. Including both periods and commas would create an easy mixup, so lets remove '.'.

MiXeD cASe is possible but can be difficult to remember and describe. You definitely should not use mixed case in resources (Events/eVENTS/EVENTS) but IDs are a bit more nuanced, Youtube uses mixed case IDs for example. Ultimately you should think about how users will share your links and decide if shorter IDs (due to including UpperCase in the radix) will be better than standardized case.

That leaves us with base 46 (1 billion=4+fv$k).

For security, IDs should also be randomized to prevent inference attacks. This is all very similar to how Youtube creates video IDs.

First, let's setup our model to accept a String based ID. Let's use an "Event" model for example. In the migration set id:false to disable auto-incrementing integer ids. Then recreate the id column as a string.

1
2
3
4
5
6
7
8
class CreateEvents < ActiveRecord::Migration[5.1]
  def change
    create_table :events, id:false do |t|
      t.string :id, null: false, index: true, unique:true
      #...
    end
  end
end

Run the migration. Inside the model, set the 'new' primary key.

1
2
3
4
class Event < ApplicationRecord
    self.primary_key = :id
    #...
end

Our model is ready to go!

Next, let's set up the word blacklist. This will ensure the alphanumeric id does not contain offensive material. Create lib/assets/blacklist_words.csv, paste in this (NSFW) filter, and perform whatever data cleanup you need. For example, these words will be appearing in a URL with a max length, so you can remove all spaces and all phrases over your max length (8 characters for example). Keeping this blacklist concise will greatly speed up ID generation.

Now we need to create the id generation logic. This is not application specific - it will work for multiple resources and multiple projects - and thus should go in the lib/ folder as well. Create lib/short_ids.rb and paste in the following code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
module ShortIds
    #AKA @radix. URL safe, lowercase characters
    @lexicon = ('0'..'9').to_a + ('a'..'z').to_a + ['$','-','_','+','!','*','(',')',',']

    #Load blacklisted words from CSV into an iterable data structure
    @blacklist = File.read(Rails.root.join('lib', 'assets', 'blacklist_words.csv')).split('\n')
    #require 'csv' #if you have a more complicated CSV setup
    #@blacklist = CSV.parse(@blacklist, :headers => true, :encoding => 'ISO-8859-1')


    def self.create_id(max_len)
        id = ""
        #concat random letters from the lexicon together
        (1..max_len).each do |i|
            id.concat(@lexicon.sample)
        end
        return id
    end

    def self.id_is_acceptable?(id, model)
        #ensure no bad words in ID
        id_bad=false
        @blacklist.each    do |word|
            id_bad = id.include? word
            break if id_bad #choose a new id if this one has a blacklisted word
        end

        #ensure no other tuple already has this id
        id_bad = id_bad or model.exists?(id)

        return (not id_bad)
    end

    #Create an unused id for a given model
    def self.new(model, max_len=7)
        id = 0
        loop do
            id = self.create_id(max_len)
            break if self.id_is_acceptable?(id,model)
        end
        return id
    end
end

Notice @lexicon, this defines our id vocabulary. If you don't like my character choices, feel free to add uppercase and such back in.

self.new creates candidate ids until it finds an acceptable one, then returns it.

Candidate ids are created by concatenating random characters together from the lexicon. I previously had another method of id creation here that was slower and more complex, you can see the benchmark here.

An id is acceptable if it does not contain profanity and is unique (doesn't yet exist in the model).

For a base 46 ID of 8 characters you can create 46^8 = 2.005×10¹³ unique IDs.

To use our new lib file the module and filename must follow Rails naming conventions (which they do) and it must be autoloaded in config/application.rb:

1
2
3
4
5
6
module MyApp
  class Application < Rails::Application
    config.autoload_paths << Rails.root.join('lib')
    #...
  end
end

Finally, let's ensure the Event model uses the new id. Open app/models/event.rb and add a before_create function to auto-assign ID.

1
2
3
4
5
6
7
8
9
class Event < ApplicationRecord
    self.primary_key = :id #remember this from above?
    before_create :set_id

    private
        def set_id
            self.id = ShortIds.new(Event)
        end
end

There you have it! An alphanumeric short ID for an Events model. Note that what Youtube and the recommended method does is create a table, populate it with available+random IDs, and periodically recreates IDs as the table runs low. We create IDs dynamically when the user asks for one, preallocating IDs allows us to offset generation processing to off-peak times. I leave that as an exercise to the reader.

Shortenable

As mentioned previously, pure slugged URLs are undesirable due to speed and duplication issues. But URLs should definitly be readable, otherwise the user doesn't know what they are clicking on. Stack overflow manages this well with user accounts, notice what happens to the URL when you navigate to https://stackoverflow.com/users/4180797. It auto-completes to https://stackoverflow.com/users/4180797/james-lowrey. Actually, you can put anything after the ID and it will always become "james-lowrey": https://stackoverflow.com/users/4180797/bad-word-filters-are-fun.

Let's get the same setup in Rails.

Once we're done, we'll have multiple routes pointing to the same resource. Be sure to assign canonical URLs else you might get an SEO hit.

Shortenable Show Action (Stack Overflow-esque)

Note that all of the following applies to the edit action as well.

First step is to fix some routing. We want to accept garbage after the id in a resource's show action. This ensures resource/id/anything-here will not 404 and instead will be redirected to resource/id.

1
2
3
4
Rails.application.routes.draw do
  resources :events
  get "/events/:id/*other" => "events#show" #if any txt is trailing id, also send this route to events#show
end

Next, let's obtain readable text from the model. If you want a resource to have readable text URLs, there should be some field to pull it from. Let's assume your Events model has a name field, and pull from that. So name must be defined in the migration.

1
2
3
4
5
6
7
8
class CreateEvents < ActiveRecord::Migration[5.1]
  def change
    create_table :events do |t| #works with string or integer ids
      t.string  :name, null: false
      #...other stuff...
    end
  end
end

Now we should parse and modify URL requests. Let's again place our logic in the lib/ folder (so ensure the autoloader is on as described in previous section). Create lib/shortenable_urls.rb and paste in the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
module ShortenableUrls
    def self.redirect_for_readability?(request, id, readable_txt_parameterized)
        #check if column_name is not seen in the URL of an HTML request
        if request.format.html? #do not care how URL looks for json/data requests
            id = id.to_s

            #parse out the URL text following the id
            url = request.original_url
            id_end_indx = url.index(id) + (id).length + 1 #+1 for '/' character

            #URL txt after id does not match readable_txt
            if url[id_end_indx..-1] != readable_txt_parameterized
                return true
            end
        end

        return false
    end
end

This stateless function requires the request, the object id and the object's parameterized readable text. It parses out the text following the id and returns whether or not it matches the expected text.

Finally, we should redirect in Events#show if an incorrect value was in the request.

1
2
3
4
5
6
7
8
9
10
11
class EventsController < ApplicationController
  def show
    readable_txt = @event.name.parameterize
    if ShortenableUrls.redirect_for_readability?(request, @event.id, readable_txt)
      redirect_to "/events/#{@event.id}/#{readable_txt}"
      return #do not continue with controller processing in a redirect. Once redirect finishes, action called again with correct URL, and return will be skipped.
    end
    #...other controller actions
  end
  #...other stuff
end

And you're done! It's important to notice that return function, and that this redirect logic is placed at the beginning of show. Assuming you have other logic in the show action, returning you from wasting processing power when the page will just redirect. After the redirect is complete, the URL will be correct, the return will be skipped, and your subsequent logic will be called!

Entering /events/id/ANYTYHING or /events/id will always redirect to /events/id/name. Note that this can also be accomplished client side with JS, and thus you could you use page <title> or any other text, but it would break on browsers with JS disabled.

Resources Aliasing

Let's say Event is your only resource that begins with the letter 'e'. Seems like it would be useful to provide /e as an alias to /events (this is common for sites to do with /users and /u). Thankfully this is much simpler than the other things we've encountered thus far.

Open up your config/routes.rb and add an event alias.

1
2
3
4
5
6
7
8
9
10
11
Rails.application.routes.draw do
  #alias the "events" routes
  match '/e/*other',
    via: :all,
    to: redirect { |path_params, req|
      #We are matching /e from an absolute path. Thus, the first occurrence of '/e' is the one we want to substitute for '/events'. Sub the first and return
      req.original_url.sub('/e','/events')
    }

  #...other stuff...
end

Usually you want to avoid match and stick to specific HTTP verbs, but we want this to be a catch-all pass-through for any given "/e" URLs. This routing will match "/e" URLs, substitute its first occurance, and redirect to the desired "/events" controller while passing along all trailing URL info.

Be careful with aliases. The worst thing you can do is break links, so you should be absolutely certain you want these aliases before you push to prod.

Originally I was using a different methodology for aliasing, but decided to go with this one due to its greater generalization and better handling of…

Canonical URLs

Duplicated content can carry a strict SEO penalty, but the answer is easy - canonical URLs. Canonical tags allow you to specify the preferred URL location of your webpage and helps direct search engines to display the correct page. In Rails, this is a breeze to set up with canonical-rails.

Add gem 'canonical-rails', github: 'jumph4x/canonical-rails' to your Gemfile and bundle install. Generate the configuration via rails g canonical_rails:install and modify it at config/initializers/canonical_rails.rb. I suggest setting protocol to HTTPS (you are using HTTPS right?) and opengraph_url to true for basic setup.

Finally, paste <%= canonical_tag -%> into your app/views/layouts/application.html.erb's <head> tag. Run Rails, inspect a page, and behold the canonical glory - <link href="https://localhost:3000/events/3k66ee-/event-name" rel="canonical">.

We've created aliasing for resources and readable show actions, but since we've accomplished this with redirects no further customization to canonical urls is required. You're all set out of the box.

Any other URL rodeoing I've missed? Let me know in the comments!