

Extracting Text from an Image with AWS Textract and .NET
source link: https://nodogmablog.bryanhogan.net/2023/02/extracting-text-from-an-image-with-aws-textract-and-net/

Download full source code.
Extracting text from an image is a common task for many applications. In this and the next few blog posts, I’ll show how to do this with the AWS Textract service. It requires no machine learning/artificial intelligence expertise, and there is no infrastructure to set up. You use an SDK to call the service, passing the image to analyze. The response contains the text extracted from the image.
This first post will extract text from a single image, the first page of a book.
I use Blazor to load the image, send it to Textract, and display both the image and the extracted text on the page. Keep in mind when looking at the code that my Blazor skills are pretty limited.
I upload the file to the web server, rather than sending it directly to Textract, so that I can display the image on the page before processing it (by default, you can’t serve a file that lies outside of the wwwroot folder).
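As a rough sketch of that upload step (not the code from the zip), the helper below copies an uploaded stream into a subfolder of the web root so the page can reference it. The names UploadStore and SaveToWebRootAsync and the "uploads" subfolder are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class UploadStore
{
    // Hypothetical helper: copies an uploaded stream into <webRootPath>/uploads
    // and returns the full path of the saved file.
    public static async Task<string> SaveToWebRootAsync(
        Stream source, string originalName, string webRootPath)
    {
        string uploadsDir = Path.Combine(webRootPath, "uploads");
        Directory.CreateDirectory(uploadsDir); // no-op if it already exists

        // Use a generated name and keep only the extension, so a client-supplied
        // file name never becomes part of the path on disk.
        string fileName = Guid.NewGuid().ToString("N") + Path.GetExtension(originalName);
        string fullPath = Path.Combine(uploadsDir, fileName);

        await using FileStream target = new FileStream(fullPath, FileMode.CreateNew, FileAccess.Write);
        await source.CopyToAsync(target);
        return fullPath;
    }
}
```

In a Blazor component, the source stream could come from IBrowserFile.OpenReadStream and the web root from IWebHostEnvironment.WebRootPath. The saved image can then be shown with a path relative to wwwroot.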
The attached zip has the full source code, so I won’t go through it all here. Instead, I’ll show only the Textract parts.
Using Textract
Create the client -
private IAmazonTextract textractClient = new AmazonTextractClient();
private DetectDocumentTextResponse? detectDocumentResponse;
Send the uploaded image to Textract -
processingMessage = $"Working on file {sourceImage.Name}...";

// Read the uploaded file into memory; Document.Bytes takes a MemoryStream.
await using FileStream fileStream = new FileStream(uploadedFilePath, FileMode.Open, FileAccess.Read);
using MemoryStream memoryStream = new MemoryStream();
await fileStream.CopyToAsync(memoryStream);
memoryStream.Position = 0; // rewind after the copy

Amazon.Textract.Model.Document document = new Document
{
    Bytes = memoryStream
};
var detectDocumentTextRequest = new DetectDocumentTextRequest
{
    Document = document
};
// Send the image to Textract and wait for the detected text.
detectDocumentResponse = await textractClient.DetectDocumentTextAsync(detectDocumentTextRequest);
detectDocumentResponse contains the extracted text, as a list of blocks (pages, lines, and words) in its Blocks property.
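To give an idea of how the text can be read out of the response, here is a minimal sketch that keeps only the LINE blocks and returns their text. It assumes the AWSSDK.Textract NuGet package; Blocks, BlockType.LINE, and Text come from that SDK, while the TextractLines class and GetLines method are hypothetical names and not necessarily how the downloadable source does it.

```csharp
using System.Collections.Generic;
using System.Linq;
using Amazon.Textract;
using Amazon.Textract.Model;

public static class TextractLines
{
    // Hypothetical helper: returns the text of every LINE block in the response,
    // in the order Textract returned them. PAGE and WORD blocks are skipped.
    public static List<string> GetLines(DetectDocumentTextResponse response)
    {
        // Guard against a null collection (an empty response has no blocks).
        IEnumerable<Block> blocks = response.Blocks ?? new List<Block>();

        return blocks
            .Where(block => block.BlockType == BlockType.LINE)
            .Select(block => block.Text)
            .ToList();
    }
}
```

With the response from the call above, TextractLines.GetLines(detectDocumentResponse) gives one string per detected line, which is convenient for a simple loop in a Razor page.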
I use some simple Blazor code to display the extracted text beside the source image.

Conclusion
Getting started with Textract is easy, but it can do much more than extract text from an image.
In follow-up posts, I’ll show some other fun and useful things you can do with Textract.