I am getting issues when logging to cloudwatch where it appears intermittent. Not all errors are logging to Cloudwatch and I am getting Read-only file system Exceptions in the default lambda log group.
The architecture of the system involves a lambda (dotnet core 2.1) being triggered by and receiving a message from a SQS queue. The lambda then makes a http requests to a 3rd party and uploads the response to S3. There are 1000+ messages on the queue at a time and I have set the lambdas reserved concurrency to 100 to deal with this which processes 1 record at a time, which all appears fine.
I am using Nlog to format and filter logging in the lambda, the following config shows that I only have AWSTargets and no file system targets. I have added async wrappers as an attempt to stop these issues but still no luck.
<?xml version="1.0" encoding="utf-8" ?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
throwConfigExceptions="true">
<extensions>
<add assembly="NLog.AWS.Logger" />
</extensions>
<targets>
<target name="CloudwatchTraceTargetAsync" xsi:type="AsyncWrapper" overflowAction="Block">
<target name="CloudwatchTraceTarget" type="AWSTarget" logGroup="/aws/lambda/adx-dev-batch-processor-consumer-trace" region="eu-west-2" layout="${longdate}|${uppercase:${level}}|${logger}|${message} ${exception:format=tostring}"/>
</target>
<target name="CloudwatchErrorTargetAsync" xsi:type="AsyncWrapper" overflowAction="Block">
<target name="CloudwatchErrorTarget" type="AWSTarget" logGroup="/aws/lambda/adx-dev-batch-processor-consumer-error" region="eu-west-2" layout="${longdate}|${uppercase:${level}}|${logger}|${message} ${exception:format=tostring}"/>
</target>
</targets>
<rules>
<logger name="*" minlevel="Trace" maxlevel="Trace" writeTo="CloudwatchTraceTargetAsync" enabled="false" />
<logger name="*" minlevel="Error" maxlevel="Error" writeTo="CloudwatchErrorTargetAsync" enabled="true" />
</rules>
</nlog>
The entrypoint to the lambda is as follows. I am constructor injecting loggers into each of my services and loggings as normal.
public class QueueConsumerLambda
{
private const string AspnetCoreEnvironmentVariable = "ASPNETCORE_ENVIRONMENT";
private IServiceProvider _serviceProvider;
[LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]
public async Task ProcessQueueMessage(SQSEvent sqsEvent, ILambdaContext context)
{
var stopwatch = new Stopwatch();
stopwatch.Start();
var environmentName = Environment.GetEnvironmentVariable(AspnetCoreEnvironmentVariable);
LogManager.Configuration = new XmlLoggingConfiguration($"nlog.Consumer.{environmentName}.config");
var logger = LogManager.GetCurrentClassLogger();
try
{
logger.Trace($"QueueConsumer.ProcessQueueMessage - started with {sqsEvent.Records.Count} records.");
var config = new ConfigurationBuilder()
.SetBasePath(Environment.CurrentDirectory)
.AddJsonFile($"appsettings.{environmentName}.json", false, false)
.AddEnvironmentVariables()
.Build();
_serviceProvider = ConfigureServices(config);
var dataRetrievalController = _serviceProvider.GetRequiredService<QueueConsumerController>();
foreach (var record in sqsEvent.Records)
{
logger.Trace($"QueueConsumer.ProcessQueueMessage - messageId: {record.MessageId}, eventSource: {record.EventSource}, record.Body: {record.Body}");
var messageBody = JsonConvert.DeserializeObject<SqsMessageModel>(record.Body);
await dataRetrievalController.GetData(messageBody, record.ReceiptHandle);
}
stopwatch.Stop();
logger.Trace($"QueueConsumer.ProcessQueueMessage - finished and it took {stopwatch.Elapsed.Seconds} seconds.");
}
catch (Exception ex)
{
logger.Error(ex, "Stopped program because of exception");
}
finally
{
LogManager.Shutdown();
}
}
private static IServiceProvider ConfigureServices(IConfiguration config)
{
var services = new ServiceCollection()
.AddLogging(loggingBuilder =>
{
// configure Logging with NLog
loggingBuilder.ClearProviders();
loggingBuilder.AddConfiguration(config.GetSection("Logging"));
loggingBuilder.AddNLog(config);
})
// registering more services
}
}
I am successfully getting logs into the separate error log group but not all of them. Some errors are not getting logged and I know this because the expected output file from the http request is not in S3 and when I debug locally I am hitting the expected exceptions. My only guess as to what is happening is the Read-only file system error that is being logged to the default log group is a symptom. It is also worth noting that not always the same error is missing, if I process the same requests the next day a different number of errors are not being logged.
I am confused, not sure if I have my logging set up correctly in the context of a lambda function or if I am hitting cloudwatch limits or any other reason. So any insight is appreciated. Let me know if more information is required