Original article available at https://habrahabr.ru/post/275281/ In this article, I’m going to tell you about a content project for which I had to redesign the architecture. The project was initially built with a classic LAMP (Linux, Apache, MySQL, PHP) stack. However, the user base was growing steadily and approaching one million visitors, so the server couldn’t process all the traffic properly. First and foremost, I suggested getting another server, but in this market segment, partner programs don’t see particularly high conversion rates, so project managers wouldn’t stump up the money. database If you’d like to know what changes I had to make to the existing architecture and how I implemented a rating and rotation system, read on. What’s so special about this project is that it distributes video content hosted on donor sites like YouTube. The site had to display only videos by their IDs with BBCode, so there was no need to constantly generate HTML on the fly — it was done periodically (for example, once a day), and then the system was changed to rotate the content each 1,000 impressions. Apache was replaced with Nginx, which simply served generated static HTML content. Each time a user visits the site, they should see something new. But everything new is well-forgotten old, as is often the case. In short, I needed to implement a video preview rotation system. There are quite a few rotation algorithms. You can’t even imagine how ingenious marketers are. That’s why I’ll tell you about only one simplest algorithm. The first ten slots get filled with only new video previews. The next 90 slots get filled with video previews having the highest click-through rate (CTR, number of image clicks divided by the number of image impressions) in the category. A video may potentially be popular, while its preview doesn’t reflect its content very well. This scenario is highly likely, because instead of letting a human select the best moments from a video, a company may choose to use a computer program to generate a preview from randomly selected frames. That’s why an interesting video may have more downvotes than it has upvotes. To make the site content more diverse and to account for computer-generated previews, the local rating is used: each video has three generated previews that are also rotated, and only the most attractive are chosen through natural selection. Then there’s a voting system with its upvotes and downvotes, but its technical implementation is almost identical to that of a rotation system. But we’re here not to listen to tales, but to share technical details. To make a long story short, the whole LAMP stack was replaced with a site generator. Nginx was used to serve static content. What was left was to implement CTR calculation. SEO As the number of videos on the site was around 100K, it was OK to go with persistent in-memory storage, available options being , and . Redis Aerospike Tarantool Due to decent functionality and friendly support of the Russian-speaking developers from Mail.Ru Group, I chose Tarantool. I was still using MySQL to store video IDs for BBCode, lists of categories, content description and other information necessary for site generation. As the database was scarcely used, it was allocated minimum required memory. A few more details about Tarantool (I’ll be using T* for short). Numerous articles have been written about it, so I’ll try to explain how to use it in real life, omitting the installation and setup steps. But first we’ll need some boring theory to understand how things work. T* keeps all data in spaces that are analogous to SQL tables or MongoDB collections. As a table consists of rows and a collection of documents, so a space is made up of tuples. A tuple is comprised of elements or fields. I’m used to calling tuple elements fields, so I’ll stick with this name, which is in line with the . Unlike table rows, tuple fields have no names, but only ordinal numbers. However, as you’ll see, it doesn’t really make any difference. documentation Each tuple must have a primary key. Primary index can be of the following types: TREE, HASH, BITSET or RTREE. You can also define a secondary index on a space, which allows for unique selections that are impossible in Redis. To store ratings, we’ll create a space called . To do that programmatically, we need to execute the following commands in the console: stats box.cfg{} –- loading the default configurationbox.schema.space.create(“stats”) –- creating a new space Let’s check whether our space has been created: box.space—--- stats:temporary: falseengine: memtx… Assigning the space to the variable: stats stats = box.space If we were creating a schema for our database or MongoDB, we’d go with something like this: key — primary key, same as a video ID clicks_1 — image 1 clicks clicks_2 — image 2 clicks clicks_3 — image 3 clicks clicks_sum_1 — total image 1 clicks clicks_sum_2 — total image 2 clicks clicks_sum_3 — total image 3 clicks show_1 — same for impressions … ctr_1 — image 1 CTR for the last specified period ctr_2 — image 2 CTR for the last specified period ctr_3 — image 3 CTR for the last specified period ctr_sum_1 — image 1 CTR for the whole period ctr_sum_2 — image 2 CTR for the whole period ctr_sum_3 — image 3 CTR for the whole period ctr — CTR across all images for the last specified period ctr_sum — CTR across all images for the whole period The first column is the field number. Let’s assign each a constant number: -- the first field is a primary keyclicks_1 = 2clicks_2 = 3. . .ctr_sum = 22 Creating a primary key of the HASH type in our space: stats:create_index(‘primary’, {type = ‘hash’, parts = {1, ‘NUM’}}) Making sure it did work: stats.index---- 0: &0unique: trueparts:— type: NUMfieldno: 1id: 0space_id: 513name: primarytype: HASHprimary: *0… In case of success, we’re going to create a function that increments the field, but let’s first insert a few records for debugging: clicks_1 stats:insert{1,0,0,0,0,0,0}stats:insert{2,0,0,0,0,0,0}stats:insert{3,0,0,0,0,0,0} Checking one of the inserted records: stats:select{2}---- — [2, 0, 0, 0, 0, 0, 0]… Great, everything works! Now let’s write the code for incrementing the field: stats:update(2,{{‘+’,2,1}})stats:select{2}- [2, 1, 0, 0, 0, 0, 0]stats:update(2,{{‘+’,2,1}})- [2, 2, 0, 0, 0, 0, 0] The command has the following parameters: update primary key — key of the record for which update is performed list of actions, where each item is a triplet (list of size 3): action type (addition in our case), number of the field to change, and a number More information on the command is available in the . update documentation As you can see, with each the data in the second field of the record with primary key 2 gets incremented by 1. Let’s present it in a more readable form. Earlier in the text, we did the following: stats:update clicks_1 = 2 Let’s execute the command like this: stats:update(2,{{‘+’,clicks_1,1}})- [2, 4, 0, 0, 0, 0, 0] Wrapping it up in a function: function click_inc(key)stats:update(key,{{‘+’,clicks_1,1}})end Checking that the function works: click_inc(2)stats:select{2}---- — [2, 5, 0, 0, 0, 0, 0]…click_inc(2)stats:select{2}---- — [2, 6, 0, 0, 0, 0, 0]… Let’s add an image number to our function (images are counted from zero): function click_inc(key, img_num)stats:update(key,{{‘+’,clicks_1 + img_num, 1}})end After making sure it works as expected, let’s move the function to a separate file called and expand it a little: click.lua function click_inc(key, img_num)if img_num >3 thenreturn falseendbox.space.stats:update(key, {{‘+’,clicks_1 + img_num,1}})return trueend As you can see, the logic here is pretty straightforward: the first argument is a video ID, the second is the number of its preview. Let’s now see how it can be used in real life. In a web project, this function can be called in three different ways: through a user API: from PHP/Python/Perl/Java and other scripts through tarantool-http, where Nginx will be proxying all the requests, or a custom Lua script using http.lib or another web server (for example, Xavante) directly from Nginx by using the module nginx_upstream We opted for the last one. My article is too long as it is, so you can check this by the T* developers on how to install and configure the module. article So, our file will look like this: click.lua #!/usr/bin/tarantool box.cfg{log_level = 5;listen = 10001;} click_1 = 2; function click_inc(key, img_num)if img_num >3 thenreturn 0endbox.space.stats:update(key, {{‘+’,click_1 + img_num,1}})return 1end Let’s check how it works: curl — data ‘{“method”:”click_inc”,”params”:[2,1], “id”:0}’ {“id”:0,”result”:[[1]]} http://127.0.0.1:8081/echo Connecting to a running T* instance: console=require(“console”)console.connect(“127.0.0.1:10001”)tarantool: connected to 127.0.0.1:10001- truestats = box.space.statsstats:select{2}- — [2, 7, 0, 0]… We can also increment the counter of the second image: curl — data ‘{“method”:”click_inc”,”params”:[2,2], “id”:1}’{“id”:0,”result”:[[1]]} http://127.0.0.1:8081/echo Checking the result: stats:select{2}- — [2, 7, 1, 0]… We’ve learned how to easily implement click counting. Time to see how to count impressions properly. Each page containing previews that belong to one category should, in theory, call the function (that increments the impression counter) a hundred times (let’s pretend one category page contains a hundred previews from that category). As we all understand, it’s suboptimal, though. Another option would be to generate a variable in the body of an HTML page. show_inc <script>show_pictures=”1,2,3,4,5” /* list of IDs of all images to show */</script> Then we could send this list with the help of AJAX. But, apart from the ID of an image, here we need to pass in its impression option as well, which might result in the following list: “1–1, 2–1, 3–1, 4–2”, where the number after the hyphen specifies the impression option of an image. Unfortunately, Lua doesn’t have any functions analogous to in PHP, so I found this code snippet on the Internet: explode() function split(inputstr, sep)if sep == nil thensep = “%s”endlocal t={} ; i=1for str in string.gmatch(inputstr, “([^”..sep..”]+)”) dot[i] = stri = i + 1endreturn tend Let’s iterate over the table in a loop with the following function: function values(t)local i = 0return function()i = i + 1;return t[i]endend for it in values(tt) doshow_inc(it, 2)end As you’ve probably guessed, is almost the same as , except that we’re using instead of . That’s why we can create a more versatile function called : show_inc click_inc show_1 click_1 stat_inc function stat_inc(key, field, img_num)if img_num >3 thenreturn 0endbox.space.stats:update(key, {{‘+’,field + img_num,1}})return 1end As we’re calculating two types of CTR — for the period that started at the time of the last generation and for the whole period — let’s create the function that will be called through Nginx: click function click(key, img_num)stat_inc(key, clicks_1, img_num)stat_inc(key, clicks_sum_1, img_num)end The function will look as follows: show function show(key_list)list = split(key_list, ‘,’)for it in value(list)dopos = string.find(it, “-”);key = string.sub(it, 0, pos-1);img_num = string.sub(it,pos+1)stat_inc(key, shows_1, img_num)stat_inc(key, shows_sum_1, img_num)endend This way we’re calculating both clicks and impressions. Thanks for reading this article! is how hackers start their afternoons. We’re a part of the family. We are now and happy to opportunities. Hacker Noon @AMI accepting submissions discuss advertising & sponsorship To learn more, , , or simply, read our about page like/message us on Facebook tweet/DM @HackerNoon. If you enjoyed this story, we recommend reading our and . Until next time, don’t take the realities of the world for granted! latest tech stories trending tech stories