How To Avoid N + 1 problem in Rails

Author profile picture

@kalavhanJosué

Full stack web developer Ruby | JavaScript | PHP | CSS | RoR

Almost any app that we build has a lot of interactive options with a lot of data, to this we can also have conditions to which data should we show to what type of user, making us request data from different tables based on a unique identifier, if the queries are not handled in a proper manner this can slow down the loading, and that is because of the "N + 1" problem.
What is the "N + 1" problem?
The problem occurs when we have to make a query to get each of the associations, "1" being the first query to get the parent and "N" the number of queries required to get all the associations, let's take a look into an example.
author.rb
class Author < ApplicationRecord
  has_many :articles
end
article.rb
class Article < ApplicationRecord
  belongs_to :author
end
articles_controller.rb
class SalesController < ApplicationController
  def index
    @articles_list = Article.where("created_at >= ? AND created_at <= ?", start_date, end_date)
  end
end
index.html.erb
<% @articles_list.each do |article| %>
  <h3><%= article.created_at %></h3>
  <h1><%= article.author.name %></h1>
  <br>
<% end %>
If we run the example above we will notice that we are going to get more than one query in the terminal.

We have on query to get all the articles in the range of date
Article Load (0.4ms)  SELECT "articles".* FROM "articles" WHERE (published_at >= '2020-03-13' AND published_at <= '2020-06-24')

And we have an extra query to get the authors name of each article
Author Load (0.1ms)  SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT ?  [["id", 3], ["LIMIT", 1]]
Author Load (0.1ms)  SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT ?  [["id", 4], ["LIMIT", 1]]
Author Load (0.1ms)  SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]
Author Load (0.1ms)  SELECT "authors".* FROM "authors" WHERE "authors"."id" = ? LIMIT ?  [["id", 2], ["LIMIT", 1]]
This is the "N + 1" problem, we have one query to get all the article but once we try to access the child (the relation to the author model) it will create a new query for each one.

How to avoid this?

There are Three ways to avoid this problem Preload, Includes and eager_load, we will not talk about preload since the behavior is pretty much the same as includes, but if we use Preload we can't use a "where" clause in the associated table and with Includes we can!
Includes
Includes helps us to fix the problem by making only two queries to get all the data, let's modify our example and take a look to the terminal.
articles_controller.rb
@articles_list = Article.includes(:author).where("published_at >= ? AND published_at <= ?", '2020-03-13', '2020-06-24')
We only have to add "includes(:name_of_association)" next to the name of the model and we will be using includes!
Now let's refresh and take a look at our terminal
Article Load (0.4ms)  SELECT "articles".* FROM "articles" WHERE (published_at >= '2020-03-13' AND published_at <= '2020-06-24')
  Author Load (0.5ms)  SELECT "authors".* FROM "authors" WHERE "authors"."id" IN (?, ?, ?, ?)  [["id", 3], ["id", 4], ["id", 1], ["id", 2]]
Now we only make two queries to get all the data of the main model and the specified associations if we want to get all the associations we have to add the name to the includes like this
Model.includes(:association_one, :association_two)
eager_load
This will create a single query using a left outer join to get the data of the main table and the association, we can also use a where clause to give conditions to the associated table, to use it is pretty much the same as we do with includes

article_controller.rb
@articles_list = Article.eager_load(:author).where("published_at >= ? AND published_at <= ?", start_date, end_date)
We have to add the name of the association in the parenthesis next to eager_load as we would do it with includes

Terminal
SQL (0.4ms)  SELECT "articles"."id" AS t0_r0, "articles"."title" AS t0_r1, "articles"."published_at" AS t0_r2, "articles"."author_id" AS t0_r3, "articles"."created_at" AS t0_r4, "articles"."updated_at" AS t0_r5, "authors"."id" AS t1_r0, "authors"."name" AS t1_r1, "authors"."age" AS t1_r2, "authors"."city" AS t1_r3, "authors"."created_at" AS t1_r4, "authors"."updated_at" AS t1_r5 FROM "articles" LEFT OUTER JOIN "authors" ON "authors"."id" = "articles"."author_id" WHERE (published_at >= '2020-03-13' AND published_at <= '2020-06-24')
As we can see with this we only need a single query to get all the data, the structure of the query will be created using a left outer join.

Which one to use?

As I metioned earlier we didn't used Preload since Includes does pretty much the same but we the difference that we can use a where clause, and Includes can also do the same as eager_load, we can force a left outer join using references
@articles_list = Article.includes(:author).references(:author).where("published_at >= ? AND published_at <= ?", '2020-03-13', '2020-06-24')
Not only that but when using includes even without the use of references sometimes it will automatically choose to use a left outer join query, then why don't we just stick with includes? well the reason is that although is designed to use the best possible pattern, sometimes it can go wrong and force a join where is not needed slowing down the process, you can get it right more often than rails.

Tags

The Noonification banner

Subscribe to get your daily round-up of top tech stories!