Serverless Microservice CRUD REST API with MongoDB

In this post we will build a Serverless Microservice that exposes create, read, update, delete (CRUD) operations on a fully managed MongoDB NoSQL database. We will be using the newly released Lambda Layers to package the 3rd-party libraries needed to integrate with MongoDB. We will use Amazon API Gateway to create, manage and secure our REST API and its integration with AWS Lambda. The Lambda will be used to parse the request and perform the CRUD operations. I'm using my own open-sourced code and scripts, and the AWS Serverless Application Model (SAM) to test, package and deploy the stack.

Serverless is gaining more and more traction and can compete with or complement containers. In serverless there are still servers, it's just that you don't manage them; you pay per usage and it auto-scales for you. It's also event-driven, in that you can focus more on business logic code and less on inbound integration, thanks to the growing number of Lambda event-source triggers such as Alexa, S3, DynamoDB and API Gateway, built and maintained by AWS.

To make it more interesting, in this post I'm using a simplified version and subset of the pattern that I presented at AWS re:Invent 2016 and in the AWS Big Data Blog post, which you could use to implement real-time counters at scale with DynamoDB, from billions of web analytics events in Kinesis Streams, as used in our in-house RAVEN Data Science Platform. Here the focus will be on creating the front end for that rather than the backend, i.e. a serverless CRUD REST API using API Gateway, Lambda and MongoDB.

MongoDB and my History with it

There are many NoSQL databases out there, such as Amazon DynamoDB, which is a key-value and document store that integrates easily with the rest of the serverless stack in AWS, and for which there are many blog posts, books, documentation and video courses, including my own. As we have covered DynamoDB before, let's talk about using MongoDB instead. If we look at some of the DB rankings, it is doing very well at number 5 overall and number 1 for document-store databases, and it has some big names using it.

I've had some history with MongoDB. We began our journey of deploying machine learning in production in 2013 at JustGiving, a tech-for-good crowdfunding and fundraising platform that raised over $5 billion for good causes from 26 million users, and was acquired for $121M by Blackbaud in 2017. I chose to call the product PANDA, and the second system we built was an offline batch training and inference engine (back then, embedding data science in a product was extremely rare and it was more local data science on a laptop; there was no Apache Spark, serverless, or stable Docker either). Those batch scores, inferences or predictions would then be inserted at regular intervals into MongoDB, and we would serve the predictions, recommendations and suggestions via our PANDA REST API that our front-end services and mobile app would call. The beauty of this architecture I came up with was that we decoupled data science serving from front-end products, allowing us to move at different speeds and update the backends without them noticing.

At the time I chose MongoDB for the backend data store as it was a NoSQL database giving us the flexibility to store data in JSON, retrieve it at very low latency, and easily scale out through sharding. This was a great AI success story at JustGiving, and was the start of many subsequent case studies, keynotes, books and recognitions.
We no longer use MongoDB, but PANDA is still used, with exciting new features and experiments being added regularly. I was inspired to write this post after reading Adnan Rahić's post on building a serverless API using Node.js and MongoDB, but here I'm using the more up-to-date serverless features from re:Invent, my current favourite language Python, and the open-source AWS Serverless Application Model (SAM).

MongoDB Setup

MongoDB Atlas is what we will be using, as it has a free tier with no credit card required and is a fully managed service.

Setting up a New Cluster

1. In your browser, go to the MongoDB web interface
2. Choose Start free
3. Type your email, first name, last name and password
4. Check I agree to the terms of service.
5. Choose Get started free
6. Choose Off for Global Cluster Configuration
7. Choose AWS for Cloud Provider & Region
8. Choose a region with the free tier available
   * For those in America choose us-east-1
   * For those in Europe choose eu-central-1
   * For those in Asia choose ap-southeast-1 or ap-south-1
9. For Cluster Tier leave it at M0 (Shared RAM, 512 MB Storage)
10. For Additional Settings leave the defaults
11. For Cluster Name type user-visits
12. Choose Create cluster
    * You will get a captcha too
    * This will begin creating the cluster

Configuring and Connecting to your Cluster

Let's now create a database user that the Lambda function can use, allow the Lambda to access the database by whitelisting the IP range, and connect to the cluster.

1. Choose the Security tab and the MongoDB Users sub-tab
2. Choose Add New User
3. In the SCRAM Authentication section
   * For username type lambdaReadWriteUser
   * For password type a secure password
   * For User privilege choose Read and write to any databases
4. Choose the IP Whitelisting sub-tab
5. Choose Add IP Address
6. Choose Allow access from Anywhere
   * Security risk: you will see that the CIDR 0.0.0.0/0 has been added, allowing any system to access the database. This is generally a very bad security practice, but fine for a proof of concept with demo data here.
7. Choose Confirm
8. Choose the Overview tab
9. Choose Connect in the Sandbox window
10. Choose Short SRV connection string under Copy the connection string compatible with your driver
11. Install the dependent Python packages pymongo, dnspython and bson with `$ sudo pip install pymongo dnspython bson` or `sudo pip install -r requirements.txt`

Connecting to MongoDB Locally using Python

Create a Python script called mongo_config.py and type or paste the following, which stores the MongoDB connection settings:

```python
db_username = "lambdaReadWriteUser"
db_password = "<my-super-panda-password>"
db_endpoint = "user-visits-abcde.mongodb.net/test?retryWrites=true"
db_port = "27017"
```

The db_endpoint is the host in the Short SRV, i.e. the part after the @ symbol. Here test is going to be the name of the database; replace it with the desired name.

Security recommendations

There are some security risks with the above, so for production deployments I recommend:
* Use a MongoDB AWS VPC Peering Connection (you will need to use paid M10+ instances)
* Do not use 0.0.0.0/0 in the IP whitelist; rather launch the Lambda in a VPC and restrict the whitelist to that Classless Inter-Domain Routing (CIDR) or IP range
* Do not use mongo_config.py to store the password; instead use KMS to encrypt it and store it in the Lambda environment variables.

Let's now create the four CRUD methods.
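Before that, it is worth sanity-checking mongo_config.py and the IP whitelisting. The following connectivity snippet is my own sketch (not from the original post); it rebuilds the Short SRV connection string from the config module and pings the cluster:

```python
# test_mongo_connection.py -- a minimal connectivity check (my sketch, not the post's code)
from pymongo import MongoClient

import mongo_config

# Rebuild the Short SRV connection string, i.e. mongodb+srv://<user>:<password>@<host>
connection_string = "mongodb+srv://%s:%s@%s" % (
    mongo_config.db_username, mongo_config.db_password, mongo_config.db_endpoint)

client = MongoClient(connection_string)
print(client.server_info())  # raises an exception if the cluster is unreachable
```

If the whitelisting and credentials are correct, this prints the server build info; note that the mongodb+srv scheme is what requires the dnspython package we installed above.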
Creating the PUT Method

Create and run a Python script called mongo_modify_items.py (the full script is embedded in the original post; a hedged sketch of it appears at the end of this section).

Here I use the repository design pattern and create a class MongoRepository that abstracts and centralizes all the interactions with MongoDB, including the connection and the insertion of JSON records. When the MongoRepository is instantiated, we create the Short SRV connection string we saw earlier using the variables in mongo_config.py and the parameters mongo_db and table_name.

In the insert_mongo_event_counter_json() method I first check that the data has an EventId, then create an entity_id from the EventId and EventDay to serve as an update filter, similar to a primary key. The {"$set": {'EventCount': event_data.get('EventCount', 1)}} is the update action; here we overwrite the existing EventCount value for the update filter, as it's a PUT and so should be idempotent: calling it many times has the same effect. Here I'm using update_one() to show you the $set operator, but you could equally use the simpler insert_one() to add the JSON document to a MongoDB collection; see the Mongo documentation.

When you run this you should get the following in the console:

```
{'n': 1, 'nModified': 0, 'opTime': {'ts': Timestamp(946684800, 2), 't': 1}, 'electionId': ObjectId('7fffffff0000000000000001'), 'ok': 1.0, 'operationTime': Timestamp(1546799181, 2), '$clusterTime': {'clusterTime': Timestamp(946684800, 2), 'signature': {'hash': b'~\n\xa5\xd9Ar\xa5 \x06f\xbd\x8e\x9d\xc39;\x14\x85\xb6(', 'keyId': 6642569842735972353}}, 'updatedExisting': True}

Process finished with exit code 0
```

Notice the 'ok': 1.0, meaning that the update was successful, and 'n': 1, 'nModified': 0, indicating that one record was added and no others modified.

1. In your browser, go back to the MongoDB web interface
2. Choose user-visits under Clusters on the left navigation bar
3. Choose the Collections tab

You should see something like this. Run the script again and you will see that the EventCount remains at 1, which is the expected behavior.

Creating the GET Method

Let's now add another two methods, called query_mongo_by_entityid() and query_mongo_by_entityid_date(), to the class MongoRepository:

```python
def query_mongo_by_entityid(self, entity_id):
    results = self.event_collection.find({'EventId': entity_id})
    print("Query: %s found: %d document(s)" % (entity_id, results.count()))
    return dumps(results.sort("EventDay", pymongo.ASCENDING))

def query_mongo_by_entityid_date(self, entity_id, entity_date):
    entity_id = {'EventId': entity_id, 'EventDay': {"$gt": int(entity_date)}}
    results = self.event_collection.find(entity_id)
    print("Query: %s found: %d document(s)" % (entity_id, results.count()))
    return dumps(results.sort("EventDay", pymongo.ASCENDING))
```

query_mongo_by_entityid() queries MongoDB by EventId and sorts the results by EventDay in ascending order. query_mongo_by_entityid_date() queries MongoDB by EventId and for an EventDay greater than the specified entity_date parameter, and sorts the results by EventDay in ascending order.
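As promised above, here is a minimal sketch of what mongo_modify_items.py might look like, assembled from the descriptions in this section. The constructor defaults and the event_sample values are my assumptions, not the author's verbatim code; the GET methods shown above would be added to the same class, and they additionally rely on the pymongo and bson.json_util.dumps imports included here.

```python
# mongo_modify_items.py -- a hedged sketch based on the descriptions in this post
import pymongo                    # used by the GET methods shown earlier
from bson.json_util import dumps  # used by the GET and DELETE methods
from pymongo import MongoClient

import mongo_config


class MongoRepository:
    def __init__(self, mongo_db='test', table_name='user-visits'):
        # Build the Short SRV connection string, i.e. mongodb+srv://<user>:<password>@<host>
        connection_string = "mongodb+srv://%s:%s@%s" % (
            mongo_config.db_username, mongo_config.db_password, mongo_config.db_endpoint)
        self.mongo_client = MongoClient(connection_string)
        self.event_collection = self.mongo_client[mongo_db][table_name]

    def insert_mongo_event_counter_json(self, event_data):
        # EventId + EventDay act as the update filter, similar to a primary key
        entity_id = {'EventId': event_data.get('EventId', ''),
                     'EventDay': event_data.get('EventDay', '')}
        if event_data.get('EventId', '') != '':
            # $set overwrites EventCount, so repeated PUTs are idempotent
            return self.event_collection.update_one(
                entity_id,
                {"$set": {'EventCount': event_data.get('EventCount', 1)}},
                upsert=True).raw_result
        print("No EventId, skipping record")


def main():
    event_sample = {'EventId': '2011', 'EventDay': '20171013', 'EventCount': 1}
    mongo_repo = MongoRepository()
    print(mongo_repo.insert_mongo_event_counter_json(event_sample))


if __name__ == '__main__':
    main()
```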
Creating the POST Method

Let's now add another method, called upsert_mongo_event_counter_json(), to the class MongoRepository:

```python
def upsert_mongo_event_counter_json(self, event_data):
    entity_id = {'EventId': event_data.get('EventId', ''),
                 'EventDay': int(event_data.get('EventDay', 0))}
    if event_data.get('EventId', '') != '':
        return self.event_collection.update_one(
            entity_id,
            {"$inc": {"EventCount": event_data.get('EventCount', 1)}},
            upsert=True).raw_result
    else:
        print("No EventId, skipping record")
```

and call it in the main with:

```python
def main():
    print(mongo_repo.upsert_mongo_event_counter_json(event_sample))
```

You will now see that the EventCount is 2; run it again and you will see that the EventCount increases by 1 each time. This is because $inc increments the EventCount by the value in the event_sample dict if it is specified; otherwise it defaults to 1.

Creating the DELETE Method

Finally we need a method to delete the data too:

```python
def delete_mongo_event_counter_json(self, entity_id):
    return dumps(self.event_collection.delete_many({'EventId': entity_id}).deleted_count)
```

Here we choose to delete all the records that match the given entity_id parameter.

We now know how to connect locally and update records in MongoDB. Let's now create the Lambda function with the same Mongo CRUD code, as well as the additional code needed for parsing the request, forming the response, and controlling the execution flow based on the HTTP method. Before that, let's create the Role and policies needed by the Lambda.

Creating the Serverless Configuration

I assume that you have the AWS CLI installed and configured with your access keys, and, if you have to be on Windows, that you are running a Linux Bash shell. You should also have Python 3.6+ set up.

Creating Environment Variables

First I create a config file called common-variables.sh for storing all the environment variables. I like to do this as, later on, I can configure these in a CI/CD tool, and it makes it easier to port to different AWS accounts. Here I determine the AWS Account ID using the CLI, but you can also hardcode it as shown in the comments; you can do the same with the region. You will also see that I have some Layers variables that we will be using shortly to create the Layer with the MongoDB packages that the Lambda will need for CRUD access to MongoDB. (A sketch of this file appears at the end of this section.)

Creating a Lambda Execution Role

First let's create a Lambda IAM Role that will allow the Lambda to write to CloudWatch. Create a file called assume-role-lambda.json with the JSON trust policy, then create a shell script called create-role.sh (both files are sketched below). Here we create a new Role called lambda-mongo-data-api with the AWS managed policy AWSLambdaBasicExecutionRole attached. This will allow us to store the logs in CloudWatch for any debugging and monitoring.

Run the shell script with ./create-role.sh to create the Lambda execution role and attach the policy.
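The post embeds common-variables.sh, assume-role-lambda.json and create-role.sh as gists that are not reproduced here, so below are hedged sketches of all three. The variable names in common-variables.sh (AWS_ACCOUNT_ID, BUCKET_NAME and so on) are my own assumptions; the trust policy and the CLI calls are the standard ones the text describes.

```bash
#!/usr/bin/env bash
# common-variables.sh -- a sketch; variable names are assumptions
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# export AWS_ACCOUNT_ID="000000000000"      # or hardcode the account ID
export AWS_REGION="eu-west-1"               # or hardcode your region
export PYTHON_VERSION="python3.6"
export LAYER_NAME="mongodb-layer"           # the Layer we will create shortly
export LAYER_VERSION="1"                    # bumped each time the Layer is published
export BUCKET_NAME="lambda-mongo-data-api-deploy"  # assumption: an S3 bucket you own
```

The trust policy in assume-role-lambda.json is the standard one allowing the Lambda service to assume the role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "lambda.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
```

And create-role.sh creates the role and attaches the managed policy:

```bash
#!/usr/bin/env bash
# create-role.sh -- creates the execution role and attaches the managed policy
. ./common-variables.sh
aws iam create-role --role-name lambda-mongo-data-api \
    --assume-role-policy-document file://assume-role-lambda.json
aws iam attach-role-policy --role-name lambda-mongo-data-api \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```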
Creating the SAM Template

The AWS Serverless Application Model (SAM) is a framework that allows you to build serverless applications on AWS, which includes creating IAM Roles, API Gateway and Lambda resources. I prefer SAM over other popular frameworks like Serverless, which is the frontrunner, as SAM is supported by AWS and is based on Python rather than Node.js. It uses SAM Templates to define the serverless applications, and uses the AWS CLI to build and deploy them, which is based on CloudFormation.

Create a file called lambda-mongo-data-api.yaml (a hedged sketch appears at the end of this section).

From top to bottom, I first define the type of SAM template with a description, then we define a set of parameters that are passed in at deploy time from common-variables.sh. If the values are not passed in, they fall back to a default value:
* PythonVersion defaults to python3.6
* AccountId to the default value specified as 000000000000
* AWSRegion to the region
* LayerName to the name we will give to the Layer
* LayerVersion, as each Layer gets a new version each time you publish it

Each of these parameters is a placeholder that is replaced in the YAML with the CloudFormation [!Sub](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference-sub.html) function.

We then have the CORS settings and the Lambda configuration: handler, code, name and other settings. Then we use the IAM Role we just created. I prefer to create the Roles separately from SAM, as they can be reused more easily and will not get deleted with the serverless stack. I also have an environment variable called demo. I'm then listing the full methods GET, POST, PUT and DELETE explicitly for security reasons, but you could shorten it to Method: ANY.

Creating the Lambda Function

Next let's create the Lambda function, called lambda_crud_mongo_records.py (its skeleton is also sketched below). There are three classes here, from top to bottom:
* The class HttpUtils is a set of utility methods for parsing the query string and body, and returning a response.
* The class MongoRepository, which we talked about earlier, abstracts and centralizes all the MongoDB interactions.
* The class Controller controls the main flow and calls different Mongo methods based on the HTTP method, request parameters, and request body. Here the methods are GET, POST, PUT and DELETE. For the GET method there are two types of queries we can do on Mongo, depending on whether a startDate is provided or not.

I've coded most of it defensively, so things like invalid JSON and non-numbers in the request will not bring down the Lambda.
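Neither lambda-mongo-data-api.yaml nor lambda_crud_mongo_records.py is reproduced in the text, so here are hedged sketches of both, assembled from the descriptions above. Resource names, paths and the handler wiring are my assumptions rather than the author's verbatim code.

```yaml
# lambda-mongo-data-api.yaml -- a sketch of the SAM template described above
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Serverless CRUD REST API with API Gateway, Lambda and MongoDB
Parameters:
  PythonVersion: {Type: String, Default: python3.6}
  AccountId: {Type: String, Default: '000000000000'}
  AWSRegion: {Type: String, Default: eu-west-1}
  LayerName: {Type: String, Default: mongodb-layer}
  LayerVersion: {Type: String, Default: '1'}
Globals:
  Api:
    Cors:
      AllowOrigin: "'*'"
Resources:
  LambdaMongoDataApi:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: lambda-mongo-data-api
      Handler: lambda_crud_mongo_records.lambda_handler
      Runtime: !Sub '${PythonVersion}'
      CodeUri: lambda-mongo-data-api.zip
      Role: !Sub 'arn:aws:iam::${AccountId}:role/lambda-mongo-data-api'
      Layers:
        - !Sub 'arn:aws:lambda:${AWSRegion}:${AccountId}:layer:${LayerName}:${LayerVersion}'
      Environment:
        Variables:
          environment: demo
      Events:
        VisitsPut:
          Type: Api
          Properties: {Path: /visits, Method: PUT}
        VisitsGet:
          Type: Api
          Properties: {Path: /visits/{entityId}, Method: GET}
        # POST (on /visits) and DELETE (on /visits/{entityId}) are declared the same way
```

The handler skeleton below shows the control flow only; the real file inlines the MongoRepository class and is more defensive than shown here.

```python
# lambda_crud_mongo_records.py -- a sketch of the handler skeleton described above
import json

from mongo_modify_items import MongoRepository  # inlined in the real file


class HttpUtils:
    @staticmethod
    def parse_body(event):
        try:
            return json.loads(event.get('body') or '{}')
        except ValueError:
            return None  # defensive: invalid JSON must not bring down the Lambda

    @staticmethod
    def respond(status_code, body):
        return {'statusCode': status_code,
                'headers': {'Content-Type': 'application/json'},
                'body': body if isinstance(body, str) else json.dumps(body, default=str)}


class Controller:
    def __init__(self, mongo_repository):
        self.mongo_repository = mongo_repository

    def process_event(self, event):
        method = event.get('httpMethod')
        path_parameters = event.get('pathParameters') or {}
        query_parameters = event.get('queryStringParameters') or {}
        entity_id = path_parameters.get('entityId')  # assumption: /visits/{entityId}
        if method == 'GET':
            if query_parameters.get('startDate'):
                result = self.mongo_repository.query_mongo_by_entityid_date(
                    entity_id, query_parameters['startDate'])
            else:
                result = self.mongo_repository.query_mongo_by_entityid(entity_id)
        elif method == 'PUT':
            result = self.mongo_repository.insert_mongo_event_counter_json(
                HttpUtils.parse_body(event))
        elif method == 'POST':
            result = self.mongo_repository.upsert_mongo_event_counter_json(
                HttpUtils.parse_body(event))
        elif method == 'DELETE':
            result = self.mongo_repository.delete_mongo_event_counter_json(entity_id)
        else:
            return HttpUtils.respond(405, 'Unsupported method: %s' % method)
        return HttpUtils.respond(200, result)


def lambda_handler(event, context):
    return Controller(MongoRepository()).process_event(event)
```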
Creating the Packages

Once deployed to AWS, the Lambda will need the dependent Python packages pymongo, dnspython and bson. These need to be packaged and deployed for it to work. There are two ways this can be done. The first is using a virtualenv to build the dependent packages, add the Lambda code, and compress them all together as one Zip package. The second and newer way is to use Lambda Layers, which is to have one Zip package with the dependent packages, and another for the Lambda source code itself.

Create a Layer with the MongoDB Dependent Packages

What are Layers? They were introduced at re:Invent 2018 and are one of the most useful features for Lambda package and dependency management. In the past, any non-standard Python library or boto3 package had to be packaged together with the Lambda code. This made the package larger than it needed to be and also prevented reuse between Lambdas. With Layers, any code can be packaged and deployed separately from the Lambda function code itself. This allows a Layer to be reused by different Lambdas, and keeps the Lambda package containing your code smaller. Each Lambda function supports up to 5 Layers, allowing you to abstract not only 3rd-party dependency packages but also your own organization's code.

Here let's create a package with the three dependent Python packages we need for the Lambda to connect and perform CRUD operations on MongoDB.

Create a file called lambda-requirements.txt with the following:

```
bson>=0.5.6
dnspython>=1.15.0
pymongo>=3.6.1
```

This contains the dependent packages that will be packaged in the Layer. I've added this file as I don't want to include the other testing packages in the requirements.txt that are not needed.

Create a script called create-dependent-layer.sh (sketched at the end of this section).

All the environment variables come from the common-variables.sh shell script we ran above. This script first creates a virtualenv and installs the packages we specified in the lambda-requirements.txt, using several pip3 options to minimize the amount of code and the time it takes to install the dependent packages we are after.

The script then copies the dependent packages installed in the virtualenv into a ${packages_path}/${python_version}/site-packages/ directory, which here is the ../../build/python3.6/site-packages/ folder. Here I list the package folders explicitly, as I found that sometimes the packages you pip install have different folder names and so get missed out. For example, the dnspython package is actually stored under the dns folder and not dnspython like you would expect. I iterate over a list of packages ${packages_array[@]} and use rsync to copy the files and directories. Note that for the Layer to work, it needs to follow a directory convention, as shown in the Lambda Layers folder structure and build diagram. For Python 3.6 the convention for the folder structure is python/lib/python3.6/site-packages/.

We then create the Zip archive with the packages. Here ${target_package_folder#../} is used to strip off the leading ../ prefix, as we only want to go up one level, and we create the Zip archive under the package folder. This creates a mongodb-layer.zip archive with the 3rd-party packages in the correct python/lib/python3.6/site-packages/ folder structure.

We then copy the Zip archive to S3 using aws s3 cp so that it can be added as a Layer.

Finally we run aws lambda publish-layer-version to publish it as a Layer, again using the environment variables we created in common-variables.sh.

Run the script ./create-dependent-layer.sh to create the mongodb-layer.zip.

You will notice that each time you run this script, it will create a new Layer version. The current version that the Lambda will use is specified in common-variables.sh.

There are other ways to create the packages used in a Layer, such as using a Docker container or EC2, but essentially the process is similar, also using virtualenv and pip. At the moment of writing, AWS only has a SciPy layer available, but I expect more to be added in the future.

Now that we have the Layer, we need to build the Lambda archive.

Create a Package with the Lambda

We then Zip the Lambda code and config using create-lambda-package.sh (also sketched below). Here we create a Zip archive of the two Python scripts: lambda_crud_mongo_records.py, which has the Lambda code, and mongo_config.py, which has the MongoDB credentials.

Run the script ./create-lambda-package.sh to check that the Zip archive gets created.
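Here are hedged sketches of the two packaging scripts just described. The folder names, the zip layout and the exact pip options are assumptions based on the prose, not the author's verbatim scripts; the key point both preserve is the python/lib/python3.6/site-packages/ Layer convention.

```bash
#!/usr/bin/env bash
# create-dependent-layer.sh -- a sketch; paths and options are assumptions
. ./common-variables.sh
target_package_folder="../../build"
site_packages="${target_package_folder}/python/lib/${PYTHON_VERSION}/site-packages"

# Install the Layer dependencies into a clean virtualenv
virtualenv -p python3.6 temp-layer-env
. temp-layer-env/bin/activate
pip3 install -r lambda-requirements.txt --upgrade --no-cache-dir
deactivate

# Copy the installed packages; note dnspython installs under the "dns" folder
mkdir -p "${site_packages}"
packages_array=(pymongo bson dns)
for package in "${packages_array[@]}"; do
    rsync -a "temp-layer-env/lib/${PYTHON_VERSION}/site-packages/${package}" "${site_packages}/"
done

# Zip using the python/lib/python3.6/site-packages/ convention, upload and publish
(cd "${target_package_folder}" && zip -r -q mongodb-layer.zip python)
aws s3 cp "${target_package_folder}/mongodb-layer.zip" "s3://${BUCKET_NAME}/"
aws lambda publish-layer-version --layer-name "${LAYER_NAME}" \
    --content "S3Bucket=${BUCKET_NAME},S3Key=mongodb-layer.zip" \
    --compatible-runtimes "${PYTHON_VERSION}"
```

The Lambda package script only needs to zip the two source files, since the dependencies live in the Layer:

```bash
#!/usr/bin/env bash
# create-lambda-package.sh -- zips only the Lambda source; dependencies live in the Layer
mkdir -p ../../package
zip ../../package/lambda-mongo-data-api.zip lambda_crud_mongo_records.py mongo_config.py
```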
We now have the Lambda as a Zip file, lambda-mongo-data-api.zip, and its 3rd-party dependent packages as a reusable Layer, mongodb-layer.zip, that has already been deployed. This Layer can be used by any other Lambda in the account! Let's now look at how we can deploy API Gateway and the Lambda function using SAM.

Alternatively, you can have SAM create the Zip file, but I prefer to control this process, as, for example, a CI/CD step could create the Zip as an artifact that could be rolled back, and you could also introduce further optimizations to reduce the size, or use byte code for example.

Building and Deploying the Serverless Microservice

Now that we have the Zip packages, either as one fat Zip with the Lambda and its 3rd-party packages (as it would have been done in 2018 prior to AWS re:Invent), or as one Lambda Zip archive plus one Zip for the Layer that we already deployed, let's deploy the full stack.

Building and Deploying the Lambda and Packages as one Zip

Here are the contents of the shell script build-package-deploy-lambda-mongo-data-api.sh (sketched below):
* aws cloudformation package packages the artifacts that the AWS SAM template references, creates a lambda-mongo-data-api-output.yaml template, and uploads them to S3.
* aws cloudformation deploy deploys the specified AWS SAM / CloudFormation template by creating and then executing a change set; here that is the API Gateway and Lambda function.

You can see that I am passing in some parameters that we saw earlier in the SAM YAML template: AccountId, LayerName, LayerVersion and PythonVersion, which we specified in common-variables.sh.

Now we just need to run the script to create the Lambda Zip, package it, and deploy it along with the API Gateway:

```
$ ./build-package-deploy-lambda-mongo-data-api.sh
```
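For reference, a hedged sketch of what that script contains, based on the description above; the stack name apigateway-dynamo is inferred from the API Gateway console path used in the testing steps, and BUCKET_NAME comes from the common-variables.sh sketch earlier:

```bash
#!/usr/bin/env bash
# build-package-deploy-lambda-mongo-data-api.sh -- a sketch based on the description above
. ./common-variables.sh

# Rebuild the Lambda Zip, then package and deploy the SAM template
./create-lambda-package.sh
aws cloudformation package --template-file lambda-mongo-data-api.yaml \
    --output-template-file lambda-mongo-data-api-output.yaml \
    --s3-bucket "${BUCKET_NAME}"
aws cloudformation deploy --template-file lambda-mongo-data-api-output.yaml \
    --stack-name apigateway-dynamo \
    --parameter-overrides AccountId="${AWS_ACCOUNT_ID}" LayerName="${LAYER_NAME}" \
        LayerVersion="${LAYER_VERSION}" PythonVersion="${PYTHON_VERSION}" \
    --capabilities CAPABILITY_IAM
```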
Testing the Deployed API

Now that you understand how to deploy the serverless stack, you could test the API Gateway and Lambda in the AWS Management Console, or do automated testing of the API or Lambda, but here let's focus on using an API testing tool called Postman to manually check that it is behaving as expected. For the GET or DELETE methods we could use the browser, but we need a tool like Postman or Insomnia because, to test the PUT and POST methods, we need to provide a JSON body in the request.

1. Sign in to the AWS Management Console and open the API Gateway console
2. Choose Stages under APIs/apigateway-dynamo in the Amazon API Gateway navigation pane
3. Select PUT under Prod/visits/PUT to get the invoke URL
   * The invoke URL should look like `https://{restapi_id}.execute-api.{region}.amazonaws.com/Prod/visits`
   * We will use the invoke URL next
4. Download and install Postman
5. Launch Postman
6. Choose Request from Building Blocks, or New > Request from the menu
7. In the new Save Request window
   * In request name type put-api-test
   * In Select a collection or folder to save to: choose Create Collection "api-test" and select it in the list
   * Choose Save to api-test

Testing the Deployed API PUT Method

1. Open a new Postman tab
2. Choose PUT from the methods dropdown
3. In Enter Request URL, type your deployed PUT URL, e.g. `https://vjp3e7nvnh.execute-api.eu-west-1.amazonaws.com/Prod/visits`
   * Choose PUT under APIs > user-comments > Stages
4. Choose the Body tab
5. In the row under the body, select raw from the radio buttons and JSON (application/json) to its right
6. Type the following:

```json
{"EventId": "2011",
"EventDay": "20171013",
"EventCount": 2}
```

7. Choose Send
8. Check the response body; if it is {"n": 1, […] then it has been added, otherwise you will get an exception message, so check that the URL, JSON body and method are correct
9. Choose Send again twice; this should have no effect. Let's now look at the GET response.

Testing the Deployed API GET Method

1. Open the same tab in Postman
2. Change the method to GET
3. Append /2011 to the URL
4. Choose Send

You should get the following response body:

```json
[{"_id": {"$oid": "5c3281a1b816a500d6a85afc"},
"EventDay": "20171013",
"EventId": "2011",
"EventCount": 2}]
```

With the PUT method, the EventCount value remains constant no matter how many times you call it, which is what is known as an idempotent operation.

Testing the Deployed API POST Method

Now let's test the POST method, which increments a counter by the specified value each time it is called, i.e. a non-idempotent operation. This could be useful for implementing counters such as real-time page views or scores.

1. Open the same tab in Postman
2. Change the method to POST
3. Remove /2011 from the URL
   * Like the original PUT URL
4. Choose the Body tab
5. In the row under the body, select raw from the radio buttons and JSON (application/json) to its right
6. Type the following:

```json
{"EventId": "2011",
"EventDay": "20171013",
"EventCount": 2}
```

7. Choose Send
8. Choose GET on the left and Send

```json
[{"_id": {"$oid": "5c3282c6b816a500d6a88210"},
"EventDay": 20171013,
"EventId": "2011",
"EventCount": 4}]
```

Run it several times and you will see EventCount increment by 2; you can also increment it by less or more if you modify the EventCount value in the request JSON body.

Testing the Deployed API DELETE Method

1. Open the same tab in Postman
2. Change the method to DELETE
3. Append /2011 to the URL (like the GET)
4. Choose Send

You can also check the MongoDB Console, and you will see that there are no records:

1. Open the MongoDB Console
2. In the navigation, select Clusters
3. In the Overview tab, select user-visits
4. In user-visits, select Collections
5. Under Namespaces, choose Dev > user-visits
   * Mongo will run a query
6. You should get Query Results 0

Cleaning up

As we have used SAM to deploy the serverless API, to delete it simply run ./delete-stack.sh. The aws cloudformation delete-stack deletes the API Gateway and Lambda. The Layers are deleted one at a time with a for loop. Here the ${layer_version} is fixed from the environment variables declared in common-variables.sh, but it could easily be made dynamic by finding the current Layer version.

Then drop the collection in MongoDB:

1. Open the MongoDB Console
2. In the navigation, select Clusters
3. In the Overview tab, select user-visits
4. In user-visits, select Collections
5. Next to Dev > user-visits, select the Delete icon
6. In the Drop Collection window, type `user-visits`

Final Remarks

Well done, you have deployed a serverless microservice with a full CRUD RESTful API backed by a MongoDB NoSQL database. We have used the newly released Layers to package the MongoDB dependencies, and tested the API using Postman. I will be adding the full source code on GitHub shortly.

If you want to find out more about serverless microservices and have more complex use cases, please have a look at my video courses, and in doing so support me in writing more free technical blog posts. Additional implemented serverless pattern architectures, source code, shell scripts, config and walkthroughs are provided with my video courses.
* For beginners and intermediates, the code, configuration and a detailed walkthrough of a full Serverless Data API are covered in the Building a Scalable Serverless Microservice REST Data API Video Course.
* For intermediate or advanced users, I cover the implementation of 15+ serverless microservice patterns with original content, code, configuration and detailed walkthroughs in the Implementing Serverless Microservices Architecture Patterns Video Course.

Feel free to connect and message me on LinkedIn, Medium or Twitter.