Iterating Through APIs with Multiple Pages

Anyone who has pulled data from an API understands how useful APIs can be for gathering lots of data on a particular topic. To pull data from one, you could use the following code:

require 'rest-client'
require 'json'

all_characters = RestClient.get('http://www.swapi.co/api/people/')
character_hash = JSON.parse(all_characters)

In that example, we’re pulling data from the Star Wars characters API. We call RestClient.get('http://www.swapi.co/api/people/') and assign the response to the variable all_characters. The response body is a JSON string, which we then parse into a usable hash with JSON.parse. The resulting hash looks like this:

{"count"=>87,
 "next"=>"http://www.swapi.co/api/people/?page=2",
 "previous"=>nil,
 "results"=>
  [{"name"=>"Luke Skywalker",
    "height"=>"172",
    "mass"=>"77",
    "hair_color"=>"blond",
    "skin_color"=>"fair",
    "eye_color"=>"blue",
    "birth_year"=>"19BBY",
    "gender"=>"male",
    "homeworld"=>"http://www.swapi.co/api/planets/1/",
    "films"=>
     ["http://www.swapi.co/api/films/2/",
      "http://www.swapi.co/api/films/6/",
      "http://www.swapi.co/api/films/3/",
      "http://www.swapi.co/api/films/1/",
      "http://www.swapi.co/api/films/7/"],
    "species"=>["http://www.swapi.co/api/species/1/"],
    "vehicles"=>["http://www.swapi.co/api/vehicles/14/", "http://www.swapi.co/api/vehicles/30/"],
    "starships"=>["http://www.swapi.co/api/starships/12/", "http://www.swapi.co/api/starships/22/"],
    "created"=>"2014-12-09T13:50:51.644000Z",
    "edited"=>"2014-12-20T21:17:56.891000Z",
    "url"=>"http://www.swapi.co/api/people/1/"},
   {"name"=>"C-3PO",
    "height"=>"167",
    "mass"=>"75",
    "hair_color"=>"n/a",
    "skin_color"=>"gold",
    "eye_color"=>"yellow",
    "birth_year"=>"112BBY",
    "gender"=>"n/a",
    "homeworld"=>"http://www.swapi.co/api/planets/1/",
    "films"=>
    ...

I’ve only included the first few lines of the output, but it is basically a hash whose “results” key holds information about each of the major characters. One important thing to note, though: even though we requested the whole people endpoint, only 10 characters come back on each page, while “count” reports 87 in total. In other words, you have to iterate through multiple pages in order to view or check information for all the characters.
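As a quick check of the paging keys, here is a minimal sketch that parses a hypothetical, heavily trimmed JSON string shaped like the first page above (names only, two results instead of ten). The per_page value of 10 is an assumption based on what the endpoint returned, not a documented constant.

```ruby
require 'json'

# Hypothetical, trimmed stand-in for the first page of the people endpoint.
# The real page carries 10 full character hashes in "results".
first_page_json = <<~JSON
  {"count": 87,
   "next": "http://www.swapi.co/api/people/?page=2",
   "previous": null,
   "results": [{"name": "Luke Skywalker"}, {"name": "C-3PO"}]}
JSON

character_hash = JSON.parse(first_page_json)

total      = character_hash["count"]   # characters across all pages
next_url   = character_hash["next"]    # URL of page 2 (nil on the last page)
names_here = character_hash["results"].map { |c| c["name"] }

# Assumed page size: 87 characters at 10 per page need ceil(8.7) = 9 pages.
per_page     = 10
pages_needed = (total / per_page.to_f).ceil

puts "#{total} characters, #{pages_needed} pages"
puts "next page: #{next_url}"
puts "names on this page: #{names_here.join(', ')}"
```

Note that JSON null becomes Ruby nil, which is why “previous” shows up as nil in the hash above.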

How can we do this? If you look at character_hash, you will notice that one of its keys, “next”, holds the URL of the next page, and on the last page that value is nil. If we have RestClient fetch that URL, we can loop (hint) through each page until we find what we are looking for.
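The “follow the next link” idea can be sketched on its own, separate from any character search. This is a minimal sketch, not the lab’s code: each_page is a hypothetical helper, and fetch is an assumed injection point (any callable that turns a URL into a JSON string), so the demo runs offline against two fake pages. With the rest-client gem installed, you could pass ->(url) { RestClient.get(url) } instead.

```ruby
require 'json'

# Hypothetical helper: yields each parsed page, starting at start_url and
# following the "next" URL until it is nil.
# `fetch` is any callable that takes a URL and returns a JSON string.
def each_page(start_url, fetch)
  url = start_url
  while url
    page = JSON.parse(fetch.call(url))
    yield page
    url = page["next"] # nil on the last page, which ends the loop
  end
end

# Hypothetical offline stand-in for the API: two tiny pages keyed by URL.
FAKE_PAGES = {
  "http://www.swapi.co/api/people/" =>
    { "next" => "http://www.swapi.co/api/people/?page=2",
      "results" => [{ "name" => "Luke Skywalker" }] }.to_json,
  "http://www.swapi.co/api/people/?page=2" =>
    { "next" => nil,
      "results" => [{ "name" => "C-3PO" }] }.to_json
}.freeze

fake_fetch = ->(url) { FAKE_PAGES.fetch(url) }

# Collect every name across all pages.
all_names = []
each_page("http://www.swapi.co/api/people/", fake_fetch) do |page|
  all_names.concat(page["results"].map { |c| c["name"] })
end
p all_names
```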

For the purposes of our lab, we wanted our method to take a character’s name as an argument, search through the pages for that character, and return an array with the data for each movie that character appeared in (each one fetched and parsed from a URL in the character’s “films” array). The resulting code was this:

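def get_character_movies_from_api(character)
  # Fetch and parse the first page of characters
  all_characters = RestClient.get('http://www.swapi.co/api/people/')
  character_hash = JSON.parse(all_characters)

  # Keep searching as long as there is a page to look at
  while character_hash
    # Look for the character on the current page
    character_data = character_hash["results"].find do |hash|
      hash["name"].downcase == character.downcase
    end

    # If found, fetch and parse each of the character's film URLs
    if character_data
      return character_data["films"].map { |film| JSON.parse(RestClient.get(film)) }
    end

    # Move to the next page, or set character_hash to nil to end the loop
    character_hash = character_hash["next"] ? JSON.parse(RestClient.get(character_hash["next"])) : nil
  end
end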

We use a while loop, which keeps running as long as character_hash is not false or nil. On each pass, it calls #find on the current page’s “results” to look for the character passed in as the argument. If it finds the character, it takes that character’s “films” array of URLs, maps over it to fetch and parse each film into a hash, and returns the resulting array.
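The per-page step in isolation looks roughly like this. films_on_page is a hypothetical helper and the page is a trimmed, made-up sample; it stops at the film URLs and leaves out the step where the method fetches each one. It also downcases both names so the argument’s capitalization doesn’t matter, which is an assumption about the desired behavior.

```ruby
# Hypothetical helper: searches one parsed page for a character
# (case-insensitive) and returns that character's "films" array of URLs,
# or nil if the character is not on this page.
def films_on_page(page, character)
  match = page["results"].find { |c| c["name"].downcase == character.downcase }
  match && match["films"]
end

# Hypothetical parsed page with trimmed data.
page = {
  "results" => [
    { "name" => "Luke Skywalker",
      "films" => ["http://www.swapi.co/api/films/2/", "http://www.swapi.co/api/films/6/"] },
    { "name" => "C-3PO", "films" => ["http://www.swapi.co/api/films/1/"] }
  ]
}

p films_on_page(page, "luke skywalker")
p films_on_page(page, "Yoda") # prints nil: Yoda is not on this page
```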

Once the character is found and its films are fetched, the method returns. However, if the character is not on the first page, we need to move to the next page. That is the job of the last line inside the loop: if the “next” key in character_hash holds a URL, we fetch and parse that URL to get the next page’s hash and repeat the loop. If the value of “next” is nil, character_hash is set to nil, the while condition fails, and the loop ends, so the method returns nil when the character is not on any page.
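To check the termination logic without hitting the network, here is a sketch of the same search with the HTTP call injected. character_movies, its fetch parameter, and the short “people/1”-style keys are all hypothetical; the fake responses put C-3PO on page 2 only, so the loop has to follow “next” once, and a missing character makes it run until “next” is nil.

```ruby
require 'json'

# Hypothetical, offline-testable variant of get_character_movies_from_api.
# `fetch` is any callable that takes a URL and returns a JSON string,
# e.g. ->(url) { RestClient.get(url) } if the rest-client gem is installed.
def character_movies(character, fetch, start_url)
  page = JSON.parse(fetch.call(start_url))

  while page
    match = page["results"].find { |c| c["name"].downcase == character.downcase }
    # Found: fetch and parse every film URL and return the array of film hashes.
    return match["films"].map { |film_url| JSON.parse(fetch.call(film_url)) } if match

    # Not found: load the next page, or set page to nil to end the loop.
    page = page["next"] ? JSON.parse(fetch.call(page["next"])) : nil
  end

  nil # every page was searched without a match
end

# Hypothetical canned responses keyed by made-up short URLs.
RESPONSES = {
  "people/1" => { "next" => "people/2",
                  "results" => [{ "name" => "Luke Skywalker", "films" => [] }] }.to_json,
  "people/2" => { "next" => nil,
                  "results" => [{ "name" => "C-3PO", "films" => ["films/1"] }] }.to_json,
  "films/1"  => { "title" => "A New Hope" }.to_json
}.freeze

fake_fetch = ->(url) { RESPONSES.fetch(url) }

p character_movies("c-3po", fake_fetch, "people/1")
p character_movies("yoda", fake_fetch, "people/1") # prints nil: not on any page
```

Passing the fetcher in keeps the loop identical to the RestClient version while letting you exercise both the found and not-found paths with canned data.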