Text to PDF Conversion Using AWS Lambda and S3 with Java


Prerequisites -:

  • AWS Account

  • Intermediate knowledge of Java and Maven

  • IDE (I am using VS Code)

  • Basic understanding of AWS

Flow Diagram -:

Before we get into the code, we will first create the S3 bucket and the Lambda function. Once those are done, we will move on to the Java code.

Steps -:

  1. Create the S3 bucket.

  2. Create the Lambda function.

  3. Set up the trigger in the Lambda function.

  4. Add the “AmazonS3FullAccess” policy to the role of your Lambda function.

  5. Write a Java program and build a jar file to upload to the Lambda function.

  6. Test (upload a file and wait for the conversion to PDF).

Step 1 -: How to create the S3 bucket?

  1. Log in to your AWS account, search for “S3”, and open it.

  2. Click on “Create Bucket”. Give the bucket a name, which must be unique, and leave the other settings at their defaults. Then scroll down and click the “Create bucket” button.

Once the bucket is created, you can see it under “General purpose buckets”. We are now done with creating the bucket; next we will create the Lambda function.

Step 2 -: How to create the Lambda function?

  1. Search for “Lambda Function”.

  2. Click on “Create Function”. Give the function a name, then select “Java 17” as the runtime (by default it will be “Node.js 22.x”). Leave the other settings at their defaults, scroll down, and click on “Create Function”.

    Note -: These settings can be changed later.

  3. Once the function is created, you can see it under “Functions”. Next, we will create the trigger.

Step 3 -: How to set up the trigger in the Lambda function?

What is the use of a trigger -:

The trigger runs the Lambda function whenever a new .txt file is uploaded to the S3 bucket. After the function converts the file to a PDF, the PDF is pushed back to the same S3 bucket.

  1. Select the function you created, then click on “Add Trigger”.

  2. In the dropdown, search for “S3” and select it.

  3. Select the bucket for which you want to set the trigger. You can leave “Event Type” as default.

    Now we will give the “suffix”, which tells Lambda which file uploads should trigger the function. In our case we will give “.txt” and click on the “Add” button.

    Note -: Please do not leave the suffix empty. Because the event type is “All object create events” and the PDF is written back to the same bucket, an empty suffix would make every generated PDF trigger the function again, which can create an infinite loop.

    Once added, you can see the trigger in your function.
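As an extra safeguard against the loop described in the note, the handler itself can ignore any key that is not a .txt file, even if the trigger filter is misconfigured later. A minimal sketch, assuming a hypothetical helper class `TriggerGuard` that is not part of the original project:

```java
// Hypothetical helper (not part of the original project): decides whether
// an uploaded object key should be converted at all.
class TriggerGuard {

    // Returns true only for keys ending in ".txt", mirroring the trigger suffix.
    // The generated ".pdf" files fail this check, so they cannot start
    // another conversion.
    static boolean shouldConvert(String objectKey) {
        return objectKey != null && objectKey.endsWith(".txt");
    }

    public static void main(String[] args) {
        System.out.println(shouldConvert("notes.txt")); // prints true
        System.out.println(shouldConvert("notes.pdf")); // prints false
    }
}
```

A handler could call `shouldConvert` on the object key first and return early when it is false.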

Step 4 -: How to add the policy to the Lambda function role?

Whenever you create a Lambda function, a role is automatically created in IAM. The name of this role starts with the name of your Lambda function.

Now select that “Role Name”, click on “Manage permissions”, and then select “add policies”.

Now search for “s3”, select “AmazonS3FullAccess”, then scroll down and select “add permission”.

Once the policy is added, you can see it under permissions.

Step 5 -: Write a Java program to convert a txt file into a pdf file.

  1. Create a Spring Boot project with the “web starter” dependency.

  2. Open your “pom.xml” file and add the dependencies below under “dependencies”. They are required to connect to the S3 bucket and the Lambda function, and we will use iText 7 for the txt to pdf conversion.

            <dependency>
                 <groupId>software.amazon.awssdk</groupId>
                 <artifactId>s3</artifactId>
                 <version>2.20.0</version>
             </dependency>
             <dependency>
                 <groupId>com.amazonaws</groupId>
                 <artifactId>aws-lambda-java-core</artifactId>
                 <version>1.2.2</version>
             </dependency>
             <dependency>
                 <groupId>com.amazonaws</groupId>
                 <artifactId>aws-lambda-java-events</artifactId>
                 <version>3.10.0</version>
             </dependency>
             <dependency>
                 <groupId>com.itextpdf</groupId>
                 <artifactId>itext7-core</artifactId>
                 <version>9.0.0</version>
                 <type>pom</type>
             </dependency>
    
  3. Now add the plugin below to create a jar file that contains your classes together with their dependencies.

                 <plugin>
                     <groupId>org.apache.maven.plugins</groupId>
                     <artifactId>maven-shade-plugin</artifactId>
                     <version>3.2.4</version>
                     <executions>
                         <execution>
                             <phase>package</phase>
                             <goals>
                                 <goal>shade</goal>
                             </goals>
                             <configuration>
                                 <createDependencyReducedPom>false</createDependencyReducedPom>
                             </configuration>
                         </execution>
                     </executions>
                 </plugin>
    
  4. Now we implement “RequestHandler”, which is provided by the Lambda Java library.

    Note -: To create the jar, you can use the “mvn clean package” command in your terminal/cmd.

     import java.io.BufferedReader;
     import java.io.File;
     import java.io.InputStreamReader;
     import java.net.URLDecoder;
     import java.nio.charset.StandardCharsets;
     import java.util.HashMap;
     import java.util.Map;
     import org.springframework.boot.SpringApplication;
     import org.springframework.boot.autoconfigure.SpringBootApplication;
    
     import com.amazonaws.services.lambda.runtime.Context;
     import com.amazonaws.services.lambda.runtime.RequestHandler;
     import com.amazonaws.services.lambda.runtime.events.S3Event;
     import com.itextpdf.kernel.pdf.PdfDocument;
     import com.itextpdf.kernel.pdf.PdfWriter;
     import com.itextpdf.layout.Document;
     import com.itextpdf.layout.element.Paragraph;
    
     import software.amazon.awssdk.core.ResponseInputStream;
     import software.amazon.awssdk.core.sync.RequestBody;
     import software.amazon.awssdk.services.s3.S3Client;
     import software.amazon.awssdk.services.s3.model.GetObjectRequest;
     import software.amazon.awssdk.services.s3.model.GetObjectResponse;
     import software.amazon.awssdk.services.s3.model.PutObjectRequest;
    
     @SpringBootApplication
     public class TextToPdfApplication implements RequestHandler<S3Event, Map<String, Object>> {
         private final S3Client s3Client = S3Client.builder().build();
    
         public static void main(String[] args) {
             SpringApplication.run(TextToPdfApplication.class, args);
         }
    
         @Override
         public Map<String, Object> handleRequest(S3Event event, Context context) {
             // Build a fresh response per invocation instead of sharing one map between invocations
             Map<String, Object> response = new HashMap<>();
    
             // Get the bucket name and object key from the event.
             // Note -: the object key is the name of the file that was uploaded to the bucket.
             // S3 URL-encodes keys in event notifications (a space arrives as "+"), so decode the key.
             String bucketName = event.getRecords().get(0).getS3().getBucket().getName();
             String objectKey = URLDecoder.decode(
                     event.getRecords().get(0).getS3().getObject().getKey(), StandardCharsets.UTF_8);
    
             context.getLogger().log("Bucket Name: " + bucketName);
             context.getLogger().log("Object Name: " + objectKey);
    
             // The PDF is uploaded under the same key, with ".pdf" instead of ".txt"
             String pdfKey = objectKey.replace(".txt", ".pdf");
             // Lambda can only write files under /tmp, so the PDF is created there first
             // and uploaded afterwards. Only the file name is used locally, because
             // folders from the key do not exist under /tmp.
             File pdfFile = new File("/tmp/" + new File(pdfKey).getName());
    
             // Download the file from the S3 bucket
             context.getLogger().log("Downloading the file from the S3 bucket");
             GetObjectRequest objectRequest = GetObjectRequest.builder().bucket(bucketName).key(objectKey).build();
    
             // try-with-resources closes the S3 stream and the reader even if the conversion fails
             try (ResponseInputStream<GetObjectResponse> textObjectResponse = s3Client.getObject(objectRequest);
                     BufferedReader fileContent = new BufferedReader(
                             new InputStreamReader(textObjectResponse, StandardCharsets.UTF_8))) {
                 context.getLogger().log("Starting the conversion of the file to PDF");
                 PdfWriter writer = new PdfWriter(pdfFile.getPath());
                 PdfDocument pdf = new PdfDocument(writer);
                 Document document = new Document(pdf);
                 String line;
                 // Each line of the text file becomes one paragraph in the PDF
                 while ((line = fileContent.readLine()) != null) {
                     document.add(new Paragraph(line));
                 }
                 document.close();
                 context.getLogger().log("File converted to PDF successfully");
    
                 // Upload under pdfKey, not under the local /tmp path. With a key like
                 // /tmp/{fileName}.pdf, S3 would show two extra folders and place the PDF inside them.
                 PutObjectRequest putObjectRequest = PutObjectRequest.builder()
                         .bucket(bucketName)
                         .key(pdfKey)
                         .build();
    
                 s3Client.putObject(putObjectRequest, RequestBody.fromFile(pdfFile));
                 context.getLogger().log("PDF uploaded to the S3 bucket successfully");
                 response.put("statusCode", 200);
                 return response;
             } catch (Exception e) {
                 context.getLogger().log("Error while converting the file to PDF");
                 context.getLogger().log("Cause: " + e.getMessage());
                 response.put("statusCode", 500);
                 return response;
             }
         }
     }
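The key handling is the part of the handler that is easiest to get wrong: S3 event notifications deliver object keys URL-encoded (a space arrives as "+"), and the PDF key has to be derived from the text key. A minimal sketch of that logic in isolation, assuming a hypothetical helper class `PdfKeyMapper` that is not in the original repository:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not in the original repository) that isolates
// the key handling so it can be checked without AWS.
class PdfKeyMapper {

    // Decodes a key as delivered in an S3 event notification.
    static String decodeKey(String rawKey) {
        return URLDecoder.decode(rawKey, StandardCharsets.UTF_8);
    }

    // Swaps a trailing ".txt" for ".pdf" and leaves the rest of the key untouched.
    static String toPdfKey(String textKey) {
        if (!textKey.endsWith(".txt")) {
            throw new IllegalArgumentException("Not a .txt key: " + textKey);
        }
        return textKey.substring(0, textKey.length() - ".txt".length()) + ".pdf";
    }

    public static void main(String[] args) {
        String key = decodeKey("reports/my+notes.txt");
        System.out.println(toPdfKey(key)); // prints reports/my notes.pdf
    }
}
```

Replacing only the trailing suffix avoids surprises with names such as "my.txt.notes.txt", where a plain `replace` would change both occurrences.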
    
  5. Once the jar file is generated, we need to upload it to the Lambda function, then click on “save”.

    Note -: You will find your jar file in the "target" folder. The jar with the larger file size is the one that includes the dependencies, and it is the one we need to upload.

  6. Once the file is uploaded, you can check the “code properties” and “last modified” fields to confirm that the upload was successful.

    Note -: Please refresh the page once.

  7. Now we need to update the "Handler" field with the path of our class and method. When a file is uploaded and the event is triggered, this field defines which method Lambda should invoke.

    Note -:

    1. com.example.textToPDF.TextToPdfApplication::handleRequest (Please change this as per your package, class and method name)

    package name = com.example.textToPDF

    Class name = TextToPdfApplication

    method name = handleRequest

    2. For Java, the class and the method are separated by a double colon ( :: ), like a “method reference”.

    3. For Python, only a dot (.) is used to separate the file name and the function.
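To make the Handler format concrete, here is a small sketch that builds the value from the package, class and method name and splits it back apart. The class `HandlerString` is a hypothetical illustration, not something Lambda or the original project provides:

```java
// Hypothetical illustration of the Java handler format: package.Class::method
class HandlerString {

    // Joins the three parts into the value expected in the "Handler" field.
    static String build(String packageName, String className, String methodName) {
        return packageName + "." + className + "::" + methodName;
    }

    // Splits a handler value into { fully qualified class name, method name }.
    static String[] split(String handler) {
        int separator = handler.indexOf("::");
        if (separator < 0) {
            throw new IllegalArgumentException("Missing '::' in handler: " + handler);
        }
        return new String[] {
            handler.substring(0, separator),
            handler.substring(separator + 2)
        };
    }

    public static void main(String[] args) {
        String handler = build("com.example.textToPDF", "TextToPdfApplication", "handleRequest");
        System.out.println(handler);
        // prints com.example.textToPDF.TextToPdfApplication::handleRequest
    }
}
```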

Step 6 -: Mandatory Step (If you are going to upload a file larger than 10 MB).

Note -: By default, the timeout is set to 15 seconds and memory to 512 MB. Please change these settings as per your requirement. For bigger files, this is a mandatory step. Here, I have changed the timeout.

Step 7 -: Test

Upload a .txt file to the S3 bucket and wait a few seconds; the converted PDF will be pushed to the same S3 bucket automatically.

Note -: Please click “Upload” button to upload the file.

Upload → add file (select your file) → upload (Scroll down and click on upload)

GitHub -: Shivendra-99/textToPDF (https://github.com/Shivendra-99/textToPDF)