TL;DR: After reading this article, you will be able to build a model that generates 5-star Yelp reviews like these.

Samples of generated review text (unmodified):

<SOR>I had the steak, mussels with a side of chicken parmesan. All were very good. We will be back.<EOR>
<SOR>The food, service, atmosphere, and service are excellent. I would recommend it to all my friends<EOR>
<SOR>Good atmosphere, amazing food and great service.Service is also pretty good. Give them a try!<EOR>

I will show you how to:

- Acquire and prepare the training data
- Build the character-level language model
- Apply some tips for training the model
- Generate random reviews

Training the model could easily take a couple of days, even on a GPU. Luckily, the pre-trained model weights are available, so we can jump directly to the fun part and generate reviews.

Getting the Data ready

The Yelp Dataset is freely available in JSON format. After downloading and extracting it, you will find the two files we need in the dataset folder: review.json and business.json.

Those two files are quite large, especially review.json (3.7 GB). Each line of the file is one review as a JSON string. The two files do not have the JSON opening and closing square brackets "[ ]", so the content of each file as a whole is not a valid JSON string. Plus, it might be difficult to fit the whole file content into memory. So let's first convert them to CSV format line by line with our helper script:

python json_converter.py ./dataset/review.json
python json_converter.py ./dataset/business.json

After that, you will find the two converted files in the dataset folder. They are valid CSV files that we can open with the pandas library.

Here is what we are going to do: we only extract the text of 5-star reviews from businesses that have the "Restaurant" tag in their categories. Next, let's remove the newline characters in the reviews and drop any duplicated reviews.
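The filtering and cleaning steps above can be sketched with pandas. This is a minimal sketch on toy in-memory data; the column names (business_id, categories, stars, text) are assumptions based on the Yelp dataset schema and may differ slightly from the converted CSVs:

```python
import pandas as pd

# Toy stand-ins for the converted business.csv and review.csv files.
business = pd.DataFrame({
    "business_id": ["b1", "b2"],
    "categories": ["Restaurants, Italian", "Shopping"],
})
review = pd.DataFrame({
    "business_id": ["b1", "b1", "b2", "b1"],
    "stars": [5, 5, 5, 3],
    "text": ["Great\npasta!", "Great\npasta!", "Nice shop", "It was ok"],
})

# Keep only businesses tagged as restaurants.
restaurants = business[business["categories"].str.contains("Restaurant", na=False)]

# Keep 5-star reviews of those restaurants.
five_star = review[(review["stars"] == 5) &
                   (review["business_id"].isin(restaurants["business_id"]))]

# Remove newline characters and drop duplicated reviews.
texts = (five_star["text"]
         .str.replace("\n", " ", regex=False)
         .drop_duplicates()
         .tolist())
print(texts)  # ['Great pasta!']
```

On the real files you would read each CSV with pd.read_csv first; the filtering logic stays the same.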
We need to add special markers to our review texts to show the model where a review starts and ends. So one line in the finally prepared review file will look like this, as you expected:

"<SOR>Hummus is amazing and fresh! Loved the falafels. I will definitely be back. Great owner, friendly staff<EOR>"

Build the model

The model we are building here is a character-level language model, meaning the minimum distinguishable symbol is a character. You may also come across word-level models, where the input consists of word tokens.

There are some pros and cons to the character-level language model.

Pros:

- No need to worry about unknown vocabulary.
- Able to learn a large vocabulary.

Cons:

- We end up with very long sequences.
- Not as good as word-level language models at capturing long-range dependencies, that is, how the earlier parts of a sentence affect its later parts.

Character-level models are also more computationally expensive to train.

The model is quite similar to the official lstm_text_generation.py demo code, except that we are stacking RNN cells, which allows storing more information throughout the hidden states between the input and output layers. It generates more realistic Yelp reviews.

Before showing the code for the model, let's peek a little deeper into how stacking RNNs works.

You may have seen the standard deep neural network (built from Dense layers in Keras): the first layer takes the input x to compute the activation value a[1], and the next stacked layer computes the next activation value a[2].

Stacking RNNs is a bit like taking that standard neural network and "unrolling it in time". In this notation, a[l]<t> means the activation of layer l at timestep t.

Let's take a look at how an activation value is computed. To compute a[2]<3>, there are two inputs, a[2]<2> and a[1]<3>:

a[2]<3> = g(Wa[2] [a[2]<2>, a[1]<3>] + ba[2])

where g is the activation function, and Wa[2] and ba[2] are the layer-2 parameters.

As we can see, to stack RNNs, the previous RNN layer needs to return all of its timestep activations a<t> to the subsequent RNN layer.
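To make this concrete, here is a minimal sketch in plain NumPy of a two-layer stacked RNN. It uses a simple tanh RNN cell rather than an LSTM, and all the sizes are illustrative; the point is that the second layer consumes every timestep activation produced by the first layer:

```python
import numpy as np

def rnn_layer(x_seq, Wx, Wa, b):
    """Run one simple-RNN layer over a sequence.

    x_seq: (T, input_dim) inputs, one row per timestep.
    Returns all timestep activations a[l]<1..T>, shape (T, hidden_dim).
    """
    a = np.zeros(Wa.shape[0])  # a[l]<0> starts at zero
    activations = []
    for x_t in x_seq:
        # a[l]<t> = g(Wx x<t> + Wa a[l]<t-1> + b), with g = tanh
        a = np.tanh(Wx @ x_t + Wa @ a + b)
        activations.append(a)
    return np.stack(activations)

rng = np.random.default_rng(0)
T, d_in, h1, h2 = 5, 3, 4, 2  # 5 timesteps, toy layer sizes

x = rng.normal(size=(T, d_in))
# Layer 1 consumes the raw inputs and returns ALL timestep activations...
a1 = rnn_layer(x, rng.normal(size=(h1, d_in)), rng.normal(size=(h1, h1)), np.zeros(h1))
# ...so that layer 2 can consume a[1]<t> at every timestep t.
a2 = rnn_layer(a1, rng.normal(size=(h2, h1)), rng.normal(size=(h2, h2)), np.zeros(h2))

print(a1.shape, a2.shape)  # (5, 4) (5, 2)
```

If layer 1 returned only its last activation a[1]<T>, layer 2 would have nothing to consume at the earlier timesteps, which is exactly why stacked RNNs must pass every timestep forward.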
By default, an RNN layer such as LSTM in Keras only returns the last timestep's activation value a<T>. In order to return all timesteps' activation values, we set the return_sequences parameter to True.

So here is how we build the model in Keras. Each input sample is a one-hot representation of 60 characters, and there are 95 possible characters in total. Each output is a list of 95 predicted probabilities, one for each character. And here is the graphical model structure to help you visualize it.

Training the model

The idea behind training the model is simple: we train it with input/output pairs. Each input is 60 characters, and the corresponding output is the immediately following character.

In the data preparation step, we created a list of clean 5-star review texts, 1,214,016 lines of reviews in total. To simplify the training, we are only going to train on reviews that are 250 characters or shorter, which leaves 418,955 lines of reviews. Then we shuffle the order of the reviews so we don't train on 100 reviews for the same restaurant in a row.

We read all the reviews as one long text string, then create a Python dictionary (i.e., a hash table) to map each character to an index from 0 to 94 (95 unique characters in total).

The text corpus has a total of 72,662,807 characters. That is hard to process as a whole, so let's break it down into chunks of 90k characters each. For each chunk of the corpus, we generate pairs of inputs and outputs by shifting a pointer from the beginning to the end of the chunk, one character at a time when the step is set to 1.

Training one chunk for one epoch takes 219 seconds on a GPU (GTX 1070), so training the full corpus will take about 2 days:

72662807 / 90000 * 219 / 60 / 60 / 24 ≈ 2.0 days

Two Keras callbacks come in handy: ModelCheckpoint and ReduceLROnPlateau. ModelCheckpoint helps us save the weights every time the model improves. ReduceLROnPlateau automatically reduces the learning rate when the monitored metric, the loss, stops decreasing.
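Putting these pieces together, here is a minimal sketch of the stacked-LSTM model and the two callbacks. It assumes the TensorFlow Keras API; the layer sizes, checkpoint filename, and callback arguments here are illustrative assumptions, not the exact values from the original code:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

SEQ_LEN, NUM_CHARS = 60, 95  # 60-character windows, 95 possible characters

model = Sequential([
    Input(shape=(SEQ_LEN, NUM_CHARS)),
    # First LSTM must return all timesteps so the second LSTM can stack on it.
    LSTM(256, return_sequences=True),
    LSTM(256),                               # returns only the last activation a<T>
    Dense(NUM_CHARS, activation="softmax"),  # a probability for each character
])
model.compile(loss="categorical_crossentropy", optimizer="adam")

callbacks = [
    # Save the model every time the monitored loss improves.
    ModelCheckpoint("weights.keras", monitor="loss", save_best_only=True),
    # Halve the learning rate when the loss stops decreasing.
    ReduceLROnPlateau(monitor="loss", factor=0.5, patience=1),
]
print(model.output_shape)  # (None, 95)
```

The callbacks list is then passed to model.fit(..., callbacks=callbacks) for each chunk of the corpus.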
The main benefit of ReduceLROnPlateau is that we don't need to manually tune the learning rate. Its main weakness is that the learning rate only ever decreases; it never goes back up.

The code to train the model for 20 epochs looks like this. The full run would take a month or so, as you might guess, but training for about 2 hours already produced some promising results in my case, so feel free to give it a try.

Generate 5-star reviews

Whether you jumped right to this section or read through the previous ones, here is the fun part! With the pre-trained model weights, or ones you trained yourself, we can generate some interesting Yelp reviews.

Here is the idea: we "seed" the model with an initial 60 characters and ask it to predict the very next character. The "sampling index" process adds some variety to the final result by injecting randomness into the choice made from the predicted probabilities. If the temperature is very small, it will almost always pick the index with the highest predicted probability.

We generate 300 characters with the following code.

Summary and Further reading

In this post, you learned how to build and train a character-level text generation model from beginning to end. The source code is available on my GitHub repo, as well as the pre-trained model to play with.

The model shown here is trained in a many-to-one fashion. There is also an alternative implementation in a many-to-many fashion: consider an input sequence of 7 characters such as "The cak" with the expected output "he cake". You can check it out here: char_rnn_karpathy_keras.

Tony607/Yelp_review_generation - How to generate realistic yelp restaurant reviews with Keras (github.com)

Originally published at www.dlology.com. For more practical deep learning experiences.