AWS Lambda plus Layers is one of the best solutions for managing a data pipeline and for implementing a serverless architecture. This post shows how to build a simple data pipeline using AWS Lambda functions, S3 and DynamoDB.

## What this pipeline accomplishes

Every day an external data source exports data to S3, and the pipeline imports it into an AWS DynamoDB table.

## Prerequisites

- Serverless Framework
- Python 3.6
- Pandas
- Docker

## How this pipeline works

On a daily basis, an external data source exports the previous day's data in .csv format to an S3 bucket. The S3 event triggers an AWS Lambda function that runs the ETL process and saves the data to DynamoDB.

## Install Serverless Framework

Before getting started, install the Serverless Framework. Open up a terminal and type:

```bash
$ npm install -g serverless
```

## Create a new service

Create a new service using the AWS Python template, specifying a unique name and an optional path:

```bash
$ serverless create --template aws-python --path data-pipeline
```

## Install the Serverless plugin

Then run the following command in the project root directory to install the serverless-python-requirements plugin:

```bash
$ serverless plugin install -n serverless-python-requirements
```

Edit the serverless.yml file to look like the following:

```yaml
plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    dockerizePip: non-linux
    layer: true # Put dependencies into a Lambda Layer.
```

You need to have Docker installed to be able to set dockerizePip: true or dockerizePip: non-linux.

## Add the S3 event definition

This will create a dev.document.files bucket which fires the importCSVToDB function when a .csv file is added to the bucket:

```yaml
functions:
  importCSVToDB:
    handler: handler.importCSVToDB
    layers:
      - {Ref: PythonRequirementsLambdaLayer}
    environment:
      documentsTable: ${self:custom.documentsTableName}
      bucketName: ${self:custom.s3bucketName}
    events:
      - s3:
          bucket: ${self:custom.s3bucketName}
          event: s3:ObjectCreated:Put
          rules:
            - suffix: .csv
```

The full sample serverless.yml is available in the GitHub repo linked at the end of this post.

## Add the Lambda function

Now, let's update our handler.py to create a pandas DataFrame from the source csv in the S3 bucket, convert the DataFrame to a list of dictionaries, and load each dict into the DynamoDB table using the update_item method, as sketched below.
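The original code embed for handler.py is not reproduced here, so the following is a minimal sketch of the approach described above, not the post's exact code. It assumes the csv has a hypothetical id column used as the table's partition key, and that the remaining column names are simple identifiers; the complete handler.py lives in the GitHub repo linked below.

```python
import io
import os
import urllib.parse

import boto3
import pandas as pd

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")


def importCSVToDB(event, context):
    # Table name comes from the environment variable set in serverless.yml.
    table = dynamodb.Table(os.environ["documentsTable"])

    # The S3 put event tells us which object triggered the function.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # Read the csv from S3 into a pandas DataFrame. dtype=str keeps every
    # value a plain string, since DynamoDB does not accept Python floats.
    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_csv(io.BytesIO(obj["Body"].read()), dtype=str)

    # Convert the DataFrame to a list of dictionaries, one per row.
    rows = df.to_dict("records")

    # Upsert each row with update_item, keyed on the assumed "id" column.
    for row in rows:
        attrs = {k: v for k, v in row.items() if k != "id"}
        table.update_item(
            Key={"id": row["id"]},
            UpdateExpression="SET " + ", ".join(f"#{k} = :{k}" for k in attrs),
            ExpressionAttributeNames={f"#{k}": k for k in attrs},
            ExpressionAttributeValues={f":{k}": v for k, v in attrs.items()},
        )

    return {"statusCode": 200, "body": f"Imported {len(rows)} rows from {key}"}
```

Note that update_item creates an item if it does not exist and otherwise merges the new attributes into it, so re-running an import for the same day is idempotent rather than producing duplicates.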
As you can see from the Lambda function above, we use Pandas to read the csv file. Pandas is the most popular data manipulation package in Python, and DataFrames are the Pandas data type for storing tabular 2D data.

Let's deploy the service and test it out!

```bash
$ sls deploy --stage dev
```

To test the data import, we can manually upload a csv file to the S3 bucket, or use the AWS CLI to copy a local file to the bucket:

```bash
$ aws s3 cp sample.csv s3://dev.document.files
```

And there it is. The data is imported into the DocumentsTable DynamoDB table.

You can find the complete project in my GitHub repo: yai333/pythonserverlesssample

## AWS Data Pipeline

Alternatively, you can use AWS Data Pipeline to import a csv file into a DynamoDB table. AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks. You define the parameters of your data transformations and AWS Data Pipeline enforces the logic that you've set up.

See also: What is AWS Data Pipeline? (AWS Data Pipeline documentation)

## Learn more

- Building a Fully Serverless Realtime CMS using AWS Appsync and Aurora Serverless
- How To Add NodeJs Library Dependencies in a AWS Lambda Layer With Serverless Framework
- End to End Testing React apps with Selenium Python WebDriver Chrome PyTest and CircleCI
- Running Selenium and Headless Chrome on AWS Lambda Layers