In this article, we’ll cover the basics of AWS Lambda error handling and some popular methods that use Step Functions and X-Ray. Whether you’re an AWS Lambda expert or a new Lambda user, there’s always something new to learn. You’ve probably already run into Lambda errors that seemed challenging, because the mechanism behind Lambda retries often makes it very difficult to follow what is happening inside your serverless application.

Serverless is not just about executing code in a Lambda function; it’s a different type of architecture for your entire system. The system is made up of distributed nodes that are activated by asynchronous events, and every node has to be designed as a standalone part with its own API. To define all these nodes accurately, you have to know how to handle Lambda errors, and you also need to deal appropriately with Lambda’s retry behavior. So let’s jump right into how AWS Lambda retries and errors work, and why they matter so much.

Lambda Retry Behavior

Lambda functions can fail (and they will), and when they do, it’s because of one of these situations:

Lack of memory: if a function runs out of memory, Lambda often terminates it with the message ‘Process exited before completing the request.’ In the invocation’s REPORT log line, ‘Max Memory Used’ is then the same as ‘Memory Size.’
Raised unhandled exception: this can happen because of a programming bug, a failing external API, or invalid input.
Timeout: the message ‘Task timed out after X seconds’ appears when Lambda abruptly stops a function because it ran longer than its configured timeout. The maximum timeout is 15 minutes, and the default is three seconds (the Serverless Framework uses six seconds by default).
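The three failure causes can usually be distinguished from log text alone. The following minimal TypeScript sketch shows one way to do that. The message patterns come from the messages quoted above. The "Max Memory Used equals Memory Size" check is a heuristic, not a guarantee. Exact wording differs between runtimes, so treat the patterns as assumptions to adjust against your own logs.

```typescript
// Heuristic classification of Lambda failures from log text.
// Assumption: message wording matches the examples quoted in the article;
// other runtimes may phrase these messages differently.
type FailureKind = "timeout" | "out-of-memory" | "unhandled-exception" | "unknown";

// Extracts "<label>: <n> MB" from a REPORT line, e.g. "Memory Size: 128 MB".
function readMegabytes(reportLine: string, label: string): number | undefined {
  const match = reportLine.match(new RegExp(`${label}: (\\d+) MB`));
  return match ? Number(match[1]) : undefined;
}

function classifyFailure(logText: string): FailureKind {
  if (/Task timed out after [\d.]+ seconds/.test(logText)) {
    return "timeout";
  }
  const memorySize = readMegabytes(logText, "Memory Size");
  const maxUsed = readMegabytes(logText, "Max Memory Used");
  const exhaustedMemory =
    memorySize !== undefined && maxUsed !== undefined && maxUsed >= memorySize;
  if (/Process exited before completing (the )?request/.test(logText) || exhaustedMemory) {
    return "out-of-memory";
  }
  // Unhandled exceptions are logged at the ERROR level by the Node.js runtime.
  if (/\bERROR\b/.test(logText)) {
    return "unhandled-exception";
  }
  return "unknown";
}
```

A REPORT line where the two memory figures match is a strong hint, but it is not proof. A function can legitimately touch its whole allocation and still finish successfully.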
When a failure occurs, and at some point it will, you’ll most likely notice that Lambda retries based on the type of event source:

Stream-based events: this applies to Amazon DynamoDB Streams and Amazon Kinesis Data Streams. Lambda keeps invoking the failing function until the records are processed successfully or the data expires, and processing from that event source is blocked in the meantime.
Synchronous events: for synchronous invocations, such as API Gateway or a direct invocation through the SDK, the invoking application is responsible for retrying based on the response it gets from Lambda. This scenario is the least interesting one, since it resembles monolithic error handling.
Asynchronous events: most event sources invoke Lambda asynchronously, which means no application is waiting to react to a failure. Therefore, Lambda takes care of retries on its own. It triggers the function again with exactly the same event, usually twice within the following ~3 minutes (in rare cases retries can stretch over as long as six hours, and the number of retries can differ).

If all retries fail, it’s essential to record the event instead of throwing it away. That’s why the Dead Letter Queue (DLQ) feature is crucial: it lets you configure an Amazon SQS queue (or an SNS topic) that receives these failed events.

Consequences of AWS Lambda Retry Behavior

Every Lambda can be executed multiple times with the same input without the caller knowing about it. To perform the same operation several times safely, a Lambda has to be idempotent. Idempotency means that running the function more than once with the same input has no additional effects.

It’s worth mentioning that serverless functions aren’t the only place where idempotency matters. A standard example is network APIs: when a request doesn’t receive a response, the same request is sent again.
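The asynchronous case above can be modeled in a few lines. In this simplified in-memory TypeScript sketch, the same event is delivered up to three times in total, matching the default of two retries. An event whose attempts all fail goes to a dead letter queue instead of being dropped. The function and type names are hypothetical. The real service also waits between attempts, which the sketch leaves out.

```typescript
// In-memory model of asynchronous invocation retries with a DLQ.
// Assumption: no delays between attempts; real retries are spaced out.
type AsyncEvent = { id: string; payload: unknown };

interface InvocationResult {
  attempts: number;
  succeeded: boolean;
}

function invokeAsync(
  event: AsyncEvent,
  handler: (event: AsyncEvent) => void,
  deadLetterQueue: AsyncEvent[],
  maxRetries = 2, // Lambda's default number of retries for async invocations
): InvocationResult {
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      handler(event); // every attempt receives exactly the same event
      return { attempts: attempt, succeeded: true };
    } catch {
      // Ignore the error and try again; a real system would log it.
    }
  }
  deadLetterQueue.push(event); // record the event rather than discard it
  return { attempts: maxRetries + 1, succeeded: false };
}
```

Because the handler sees the identical event on every attempt, any side effect it performs before failing will be repeated. This is exactly why the idempotency discussion that follows matters.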
In serverless architectures, a similar case occurs when a Lambda times out before receiving a response. Even though it’s uncommon, mishandled retries can sometimes cause severe problems, such as a database (DB) integrity violation.

What is AWS Lambda Idempotency?

In computer science and mathematics, idempotency is the property of operations that can be applied several times without changing the result beyond the first application. Still, this can be confusing in practice. What happens if you want to execute the same operation several times, and it isn’t actually a retry? Say a Lambda receives a user operation log as input, and its only job is to record that log in a database. Here we need to distinguish a new invocation with the same input (the user performed the same operation again) from a retry, because the input is identical in both cases.

The right solution is to treat the Lambda’s request ID as part of the input itself. You’ll get the same ID only when Lambda retries an invocation. To read it, use context.awsRequestId in Node.js or the corresponding field in other languages. This gives you a general way to detect retried executions.

Relying on the request ID to make a function genuinely idempotent isn’t always convenient, though. As the previous example suggests, the ID would also have to be saved in the DB, so that later invocations can tell whether a new record needs to be added. There’s one more solution: using an in-memory data store. However, it can add significant overhead.
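The operation-log example can be made idempotent with the request ID as the deduplication key. In this minimal TypeScript sketch, a `Set` stands in for the database. With a real table you would use a conditional write (insert only if the ID is absent) so that concurrent retries cannot race each other. `LambdaContextLike`, `OperationLogStore`, and `recordOperation` are hypothetical names.

```typescript
// Request-ID-based idempotency for recording user operation logs.
// Assumption: the Set stands in for a DB table with a conditional insert.
interface LambdaContextLike {
  awsRequestId: string; // reused by Lambda when it retries the same invocation
}

interface OperationLog {
  userId: string;
  operation: string;
}

class OperationLogStore {
  private readonly seenRequestIds = new Set<string>();
  readonly records: Array<OperationLog & { requestId: string }> = [];

  // Returns false if this request ID was already stored (i.e. a retry).
  insertIfAbsent(requestId: string, log: OperationLog): boolean {
    if (this.seenRequestIds.has(requestId)) {
      return false;
    }
    this.seenRequestIds.add(requestId);
    this.records.push({ ...log, requestId });
    return true;
  }
}

function recordOperation(
  log: OperationLog,
  context: LambdaContextLike,
  store: OperationLogStore,
): "recorded" | "duplicate-retry" {
  // A user repeating the same action gets a new request ID, so it is stored;
  // a retry of the same invocation reuses the ID and is skipped.
  return store.insertIfAbsent(context.awsRequestId, log) ? "recorded" : "duplicate-retry";
}
```

Identical input under two different request IDs is stored twice, which is the behavior the user-log example calls for. Only a true retry is skipped.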
Using AWS Step Functions to Build a Control System for AWS Lambda Error Handling

AWS Lambda error handling can be done in different ways, for example with wrappers. AWS Step Functions, however, have proved incredibly beneficial for building serverless applications that deal with retries and errors appropriately. You can learn more about AWS Step Functions in our Ultimate Guide to Step Functions.

Take the Next Step

Let’s say the application has to perform multiple operations in response to an event. If they’re all combined in the same Lambda, then to keep it idempotent the code has to check, for every operation separately, whether it should be redone on a retry. That can cause severe headaches.

It helps to understand how this differs between a monolithic application and the Step Functions approach. In a monolithic application, the application itself can be responsible for forcing retries, because it’s able to wait between them; that isn’t possible in serverless. With Step Functions, however, you can run every operation in a separate Lambda and define suitable transitions between them for each specific case. You can also control retry behavior, both the delay between retries and their number, and tune it to suit your particular case, or disable it entirely when that’s the right call. Even for a single Lambda, creating a state machine is possibly the most straightforward way to disable unwanted retry behavior.

How to Implement Step Functions with Lambda?

Step Functions triggers are quite limited: at the time of writing, the only available triggers were API Gateway and manual execution through the SDK.
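Step Functions controls the delay and number of retries through a Retry rule with the Amazon States Language fields `IntervalSeconds`, `MaxAttempts`, and `BackoffRate`. The TypeScript sketch below computes the nominal wait before each retry that such a rule allows. The defaults (1 second, 3 attempts, rate 2.0) are the documented ASL defaults, and `MaxAttempts: 0` disables retries for the matched errors. Optional extras such as jitter are not modeled. The example rule and its error list are illustrative choices, not taken from the article.

```typescript
// Nominal retry schedule of a Step Functions Retry rule.
interface Retrier {
  ErrorEquals: string[];
  IntervalSeconds?: number;
  MaxAttempts?: number;
  BackoffRate?: number;
}

// Returns the wait in seconds before each retry the rule permits.
function retryDelays(retrier: Retrier): number[] {
  const interval = retrier.IntervalSeconds ?? 1;
  const maxAttempts = retrier.MaxAttempts ?? 3;
  const backoff = retrier.BackoffRate ?? 2.0;
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    delays.push(interval * Math.pow(backoff, attempt));
  }
  return delays;
}

// Illustrative rule for a Task that calls a Lambda: retry transient service
// errors with growing gaps; errors not listed here are not retried at all.
const transientErrors: Retrier = {
  ErrorEquals: ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
  IntervalSeconds: 2,
  MaxAttempts: 3,
  BackoffRate: 2,
};
```

For `transientErrors` the schedule is 2, 4, and 8 seconds. Anything not matched by a rule (for example, a bug in your own code) fails the Task immediately, which is how a state machine switches off unwanted retries.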
To deploy this Lambda, use the Serverless Framework together with the ‘serverless-resources-env’ plugin, so you can easily pass the state machine ARN to the function. You’ll also need the ‘serverless-pseudo-parameters’ and ‘serverless-step-functions’ plugins so you can define the state machine in serverless.yml.

In this setup, an SNS event is deliberately chosen to trigger an invoker Lambda, which starts the state machine, and the event is available as input to the first step’s Lambda. Everything becomes idempotent because the state machine execution is named after the invoker’s Lambda request ID. If the invoker Lambda is retried, AWS gives it the same request ID, and AWS won’t execute the state machine again under the same name. In a sense, the execution name is part of the state machine’s input as well. Even though this solution is advantageous in many scenarios, keep in mind that it also adds significant complexity, which affects the system’s overall observability and makes debugging harder.

Things to Note Regarding Step Functions Error Handling Mechanism

It’s essential to understand that the Step Functions error handling mechanism is quite different from the AWS Lambda one. For each Task state you can set a timeout, and if the Task isn’t completed in time, a States.Timeout error is generated. This timeout is practically unlimited. For a typical Task that executes a Lambda, though, things are different: the Lambda’s actual timeout is determined solely by its own configured value, and a longer Task timeout can’t extend it. Therefore, make sure the Task timeout is configured to equal the Lambda’s timeout.
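The invoker pattern described above can be sketched in TypeScript as follows. The execution is named after the invoker Lambda’s request ID, so a retried invocation cannot start a second run. `StateMachineClient` and `FakeStateMachine` are hypothetical stand-ins for the real StartExecution call (in the AWS SDK for JavaScript v3, `StartExecutionCommand` from `@aws-sdk/client-sfn`). The real service refuses to start a second execution under a name that is already in use. Depending on the situation, it either returns the existing execution or fails with `ExecutionAlreadyExists`, so the sketch treats that error as success.

```typescript
// Idempotent state machine start, keyed on the invoker's request ID.
// Assumption: the fake client below only mimics name uniqueness.
class ExecutionAlreadyExists extends Error {
  constructor(name: string) {
    super(`Execution ${name} already exists`);
    this.name = "ExecutionAlreadyExists";
  }
}

interface StateMachineClient {
  startExecution(name: string, input: string): void;
}

// In-memory fake that rejects a second execution with the same name.
class FakeStateMachine implements StateMachineClient {
  readonly executions = new Map<string, string>();
  startExecution(name: string, input: string): void {
    if (this.executions.has(name)) {
      throw new ExecutionAlreadyExists(name);
    }
    this.executions.set(name, input);
  }
}

function invokerHandler(
  snsMessage: string,
  context: { awsRequestId: string },
  stateMachine: StateMachineClient,
): "started" | "already-started" {
  try {
    stateMachine.startExecution(context.awsRequestId, snsMessage);
    return "started";
  } catch (err) {
    // A retry reuses the request ID, so a name clash means the work is
    // already under way; report success instead of failing again.
    if (err instanceof Error && err.name === "ExecutionAlreadyExists") {
      return "already-started";
    }
    throw err;
  }
}
```

Swallowing only `ExecutionAlreadyExists` keeps genuine failures (permissions, throttling) visible, so Lambda can still retry them.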
A Task’s retry behavior is disabled by default, and you can configure it per Task with a Retry field.

AWS Lambda Error Processor Sample Application

The Error Processor sample application shows how to use AWS Lambda to handle events from an Amazon CloudWatch Logs subscription. CloudWatch Logs lets you invoke a Lambda function when a log entry matches a particular pattern. The subscription in this application monitors a function’s log group for entries that contain the word ERROR, and in response it invokes a processor Lambda function. The processor function then retrieves the full log stream and the trace data for the request that caused the error, and stores them for later use.

The function code can be found in these files:

Processor: processor/index.js
Random error: random-error/index.js

You can deploy the sample within a few minutes with AWS CloudFormation and the AWS CLI.

Event Structure and Architecture

The sample application uses these AWS services:

Amazon S3: stores application output and deployment artifacts.
Amazon CloudWatch Logs: collects logs and invokes a function when a log entry matches a filter pattern.
AWS X-Ray: collects trace data, indexes traces for search, and generates a service map.
AWS Lambda: runs the function code, sends logs to CloudWatch Logs, and sends trace data to X-Ray.

A Lambda function in the application generates errors at random. When CloudWatch Logs detects the word ERROR in that function’s logs, it sends the processor function an event to process. In this CloudWatch Logs message event, the log data is base64-encoded and gzip-compressed. Once decoded, it contains details about the log event, which the function uses to identify the log stream and parse the log message to obtain the ID of the request that caused the error.
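Decoding the subscription event and pulling out the failing request ID takes only a few lines with Node’s built-in `zlib`. This TypeScript sketch is not the sample’s actual processor code. The decoded field names (`logGroup`, `logStream`, `logEvents`, and so on) follow the CloudWatch Logs subscription payload. The split on tabs assumes the Node.js runtime’s `<timestamp>\t<requestId>\t<level>\t<message>` line layout, which you should adjust for other runtimes or custom loggers. `encodeLogsEvent` is a hypothetical helper for local tests.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// Raw subscription event: awslogs.data is base64-encoded, gzip-compressed JSON.
interface CloudWatchLogsEvent {
  awslogs: { data: string };
}

interface DecodedLogsData {
  messageType: string;
  owner: string;
  logGroup: string;
  logStream: string;
  subscriptionFilters: string[];
  logEvents: Array<{ id: string; timestamp: number; message: string }>;
}

function decodeLogsEvent(event: CloudWatchLogsEvent): DecodedLogsData {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  return JSON.parse(gunzipSync(compressed).toString("utf8")) as DecodedLogsData;
}

// Assumption: Node.js runtime log layout, so the request ID is the 2nd field
// and the level is the 3rd.
function requestIdsWithErrors(data: DecodedLogsData): string[] {
  return data.logEvents
    .map((logEvent) => logEvent.message.split("\t"))
    .filter((fields) => fields.length >= 3 && fields[2] === "ERROR")
    .map((fields) => fields[1]);
}

// Hypothetical helper: builds an event in the same encoding for local tests.
function encodeLogsEvent(data: DecodedLogsData): CloudWatchLogsEvent {
  return { awslogs: { data: gzipSync(JSON.stringify(data)).toString("base64") } };
}
```

The decoded `logGroup` and `logStream` identify where to download the full log stream. The extracted request ID is what the processor uses to look up the matching X-Ray trace.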
The processor function uses the information from the CloudWatch Logs event to download the X-Ray trace and the full log stream for the request that caused the error, and stores both in an Amazon S3 bucket. To give the trace and the log stream time to finalize, the function waits for a short period before it accesses the data.

AWS X-Ray Instrumentation

The application uses AWS X-Ray to trace function invocations and the calls that the functions make to AWS services. X-Ray uses the trace data it receives from the functions to create a service map, which is a significant help for identifying errors. The service map for this application shows the random error function generating errors for some requests, as well as the processor function calling CloudWatch Logs, Amazon S3, and X-Ray.

Both Node.js functions are configured for active tracing in the template and are instrumented with the AWS X-Ray SDK for Node.js in code. With active tracing, Lambda adds a tracing header to incoming requests and sends a trace with timing details to X-Ray. In addition, the random error function uses the X-Ray SDK to record the request ID and user information in annotations. The annotations are attached to the trace, so you can use them to locate the trace for a specific request.

The processor function gets the request ID from the CloudWatch Logs event and uses the AWS SDK for JavaScript to search X-Ray for that request. It uses AWS SDK clients instrumented with the X-Ray SDK to download the trace and the log stream, and then stores them in the output bucket. The X-Ray SDK records these calls, and they appear in the trace as subsegments.
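The search step can be pictured as building the parameters for X-Ray’s `GetTraceSummaries` API, which accepts `StartTime`, `EndTime`, and a `FilterExpression`. This TypeScript sketch only builds those parameters and does not call the API. Two parts are assumptions: the annotation key `request_id` is hypothetical (use whatever key your function records), and so is the five-minute window around the error.

```typescript
// Builds X-Ray trace-search parameters for one failing request.
// Assumptions: annotation key "request_id" and the default window size.
interface TraceSearch {
  StartTime: Date;
  EndTime: Date;
  FilterExpression: string;
}

function traceSearchFor(
  requestId: string,
  loggedAt: number, // epoch milliseconds of the ERROR log event
  windowMs = 5 * 60 * 1000, // hypothetical search window on each side
): TraceSearch {
  return {
    StartTime: new Date(loggedAt - windowMs),
    EndTime: new Date(loggedAt + windowMs),
    // X-Ray filter expressions address annotations as annotation.<key>.
    FilterExpression: `annotation.request_id = "${requestId}"`,
  };
}
```

Centering the window on the log event’s timestamp keeps the search narrow. The short wait before fetching gives X-Ray time to index the trace.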
AWS CloudFormation Template and Additional Resources

The application is implemented in two Node.js modules and deployed with an AWS CloudFormation template and shell scripts. The template creates the processor function, the random error function, and the following supporting resources:

Execution role: an IAM role that grants the functions permission to access other AWS services.
Primer function: an additional function that invokes the random error function to create its log group.
Custom resource: an AWS CloudFormation custom resource that invokes the primer function during deployment to make sure the log group exists.
CloudWatch Logs subscription: a subscription on the log stream that triggers the processor function when the word ERROR is logged.
Resource-based policy: a permission statement on the processor function that allows CloudWatch Logs to invoke it.
Amazon S3 bucket: the storage location for the processor function’s output.

To work around a limitation of Lambda’s integration with CloudFormation, the template creates an additional function that runs during deployments. Every Lambda function comes with a CloudWatch Logs log group that stores the output of its executions, but that log group isn’t created until the function is invoked for the first time.

To create the subscription, which depends on the log group existing, the application uses a third Lambda function to invoke the random error function. The template includes the primer function’s code inline, and an AWS CloudFormation custom resource invokes it during deployment. DependsOn properties ensure that the log stream and the resource-based policy are created before the subscription.
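A custom resource such as the primer has to report back to CloudFormation, or the deployment hangs until it times out. This TypeScript sketch builds the response body defined by the custom resource protocol after the primer has (or has not) managed to invoke the random error function. The body would then be sent with an HTTP PUT to the request’s pre-signed `ResponseURL`, which is omitted here. The fallback physical ID `"primer"` and the function name are hypothetical.

```typescript
// Response body for a CloudFormation custom resource request.
interface CustomResourceRequest {
  RequestType: "Create" | "Update" | "Delete";
  ResponseURL: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  PhysicalResourceId?: string; // present on Update and Delete
}

interface CustomResourceResponse {
  Status: "SUCCESS" | "FAILED";
  Reason?: string;
  PhysicalResourceId: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
}

function buildPrimerResponse(
  request: CustomResourceRequest,
  error?: Error, // set if invoking the random error function failed
): CustomResourceResponse {
  return {
    Status: error ? "FAILED" : "SUCCESS",
    // CloudFormation requires a Reason when the status is FAILED.
    ...(error ? { Reason: error.message } : {}),
    // Reusing the existing physical ID avoids CloudFormation treating the
    // resource as replaced on updates.
    PhysicalResourceId: request.PhysicalResourceId ?? "primer",
    StackId: request.StackId,
    RequestId: request.RequestId,
    LogicalResourceId: request.LogicalResourceId,
  };
}
```

Echoing `StackId`, `RequestId`, and `LogicalResourceId` back unchanged is what lets CloudFormation match the response to the pending resource operation.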
How Dashbird can help handle your AWS Lambda Errors faster and easier

Serverless architectures fundamentally change how we develop, deploy, and monitor applications. As you now know, services such as AWS Lambda come with their own limits and idiosyncrasies: limited memory and execution time, retry behavior, and more can create side effects that easily become monitoring nightmares.

Composing multiple services for compute, data storage, queues, and so on magnifies the problem, because the number of potential issues is multiplied by the interactions and dependencies throughout the cloud stack. Running such architectures at scale is even more challenging. We can’t expect the stack to behave the same way at every level of traffic; AWS Lambda functions might scale faster than a database, for example.

Dashbird is designed to give developers easy ways to navigate such complex problems while achieving a high degree of visibility and quality in any serverless architecture. Dashbird was created by serverless developers, for serverless developers, to improve monitoring and operations specifically for AWS services at scale. By continuously collecting and filtering your log data, Dashbird automatically detects code exceptions, timeouts, configuration errors, and other anomalies in real time, and sends you a notification immediately if there’s an error or something is about to break.

Wrapping up

AWS Lambda error handling in serverless architectures may seem confusing, and it’s hard to grasp how it can affect your entire system, which is exactly why it’s vital to understand it thoroughly. You need to know how to manage AWS Lambda retry behavior successfully, and the same goes for Step Functions. A retry counter field in the context parameter is undoubtedly a feature that’s been missing.
Besides the techniques covered in this article, various other methods can help with AWS Lambda error handling; using wrappers is just one example. The Step Functions architecture we’ve discussed is useful in many cases, and AWS Lambda error handling is one of them. Beyond helping you control Lambda retries, it also encourages separation of elements, which is an excellent practice in the serverless world.

Also published at: https://dashbird.io/blog/aws-lambda-error-handling-step-functions/