(Available from v2.5.0)
With optical character recognition (OCR) it is possible to convert screen content into text. In combination with Sakuli, it is also possible to automate software based on the text shown on screen.
Sakuli containers ship with a complete setup for the OCR module of Sakuli, so the OCR module can be used there without any further setup.
Installing the OCR module of Sakuli on a workstation or virtual machine is possible, but requires some additional steps and dependencies. We recommend using the ready-to-use container for most use cases. A native installation should only be considered if the containers don’t fit the requirements of your check.
To add the OCR module, please add "@sakuli/ocr": "next" to your package.json.
{
  "name": "sakuli-project",
  "version": "0.8.15",
  "scripts": {
    "test": "sakuli run test-suite"
  },
  "dependencies": {
    "@sakuli/cli": "next",
    "@sakuli/ocr": "next"
  }
}
After altering your package.json, the module can be installed with npm install on the command line.
For more information on how to set up your access to the modules, please have a look at the modules documentation.
Sakuli OCR uses Tesseract to read text from the screen. For performance and quality reasons, a native installation of Tesseract is required. Please make sure to install Tesseract version 4.1.1 or later. For more information on how to obtain and install Tesseract on your machine, please have a look at the official Tesseract documentation.
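The version requirement can be verified before a suite is run. The following is a minimal sketch and not part of Sakuli: the helper names parseVersion and isSupportedTesseractVersion are hypothetical, and the code assumes that the text passed in is the output of tesseract --version and that the first major.minor.patch token in it is the Tesseract version.

```javascript
// Hypothetical helpers, not part of Sakuli or Tesseract.
// Assumption: the input is the output of `tesseract --version` and the
// first "major.minor.patch" token in it is the Tesseract version.
const MINIMUM_VERSION = [4, 1, 1];

function parseVersion(versionOutput) {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(versionOutput);
  if (!match) {
    return null; // no version-like token found
  }
  return match.slice(1, 4).map(Number);
}

function isSupportedTesseractVersion(versionOutput) {
  const version = parseVersion(versionOutput);
  if (version === null) {
    return false;
  }
  // Compare major, minor and patch in order of significance
  for (let i = 0; i < MINIMUM_VERSION.length; i++) {
    if (version[i] > MINIMUM_VERSION[i]) return true;
    if (version[i] < MINIMUM_VERSION[i]) return false;
  }
  return true; // exactly the minimum version
}
```

The string to check could, for example, be captured by running tesseract --version on the target machine and passing its output to isSupportedTesseractVersion().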
To read text from the screen, Sakuli OCR ships with the test step _getTextFromRegion(). The function returns all text found in the given region of the screen, including line feeds.
const text = await _getTextFromRegion(new Region());
const regionToRead = new Region(50, 100, 400, 50);
const textOfRegion = await _getTextFromRegion(regionToRead);
new Region(50, 100, 400, 50); creates a region whose top-left corner is at x = 50, y = 100 and which is 400 pixels wide and 50 pixels high.
The constant textOfRegion will only contain the text from this specific region of the screen.
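Because the returned text includes line feeds, it is often convenient to split it into lines before working with it. The following is a minimal sketch: toLines is a hypothetical helper and not part of Sakuli, and the sample string only stands in for a value returned by _getTextFromRegion().

```javascript
// Hypothetical helper, not part of Sakuli: splits OCR output into clean lines.
function toLines(ocrText) {
  return ocrText
    .split(/\r?\n/) // split at line feeds, tolerating CRLF
    .map((line) => line.trim()) // drop leading and trailing whitespace
    .filter((line) => line.length > 0); // drop empty lines
}

// Sample string standing in for the result of _getTextFromRegion()
const sampleText = "Offer number: 42-XBC-09453\n\n  Total: 100 EUR \n";
console.log(toLines(sampleText));
```

In a test case, the result of await _getTextFromRegion(regionToRead) could be passed to toLines() in the same way.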
With Sakuli, it is also possible to automate software based on the text shown on the screen. _getRegionByText() provides the possibility to search for a specified text on the screen or in a specified region of the screen. Once the location of the text is determined, it is possible to, for example, move the mouse to the location and click on it.
await _getRegionByText("Continue").click();
This sample would search the screen for the text “Continue” and subsequently click on it.
const searchRegion = new Region(50, 100, 400, 50);
await _getRegionByText("Continue", searchRegion).click();
new Region(50, 100, 400, 50); creates a region whose top-left corner is at x = 50, y = 100 and which is 400 pixels wide and 50 pixels high.
In this sample, the search for the text “Continue” would be limited to the specified screen region.
One common use case is to extract text from a certain region of the screen, e.g. from an offer or invoice, for further validation.
(async () => {
    const testCase = new TestCase("Check offer");
    try {
        // ...
        // Locate the text "offer number" and enlarge the found region with grow(10)
        const offerNumberRegion = await _getRegionByText("offer number").grow(10);
        // Read all text inside the enlarged region
        const offerNumber = await _getTextFromRegion(offerNumberRegion);
        const expectedOfferNumber = /42-XBC-09453/;
        // Fail the check if the expected offer number is not part of the text
        await _assert(Promise.resolve(!!offerNumber.match(expectedOfferNumber)),
            `Found ${offerNumber} instead of ${expectedOfferNumber}`);
        // ...
    } catch (e) {
        await testCase.handleException(e);
    } finally {
        await testCase.saveResult();
    }
})();
This check searches for the text “offer number” in the PDF, reads the whole offer number next to it and compares it to an expected value.
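The check above accepts the expected value anywhere in the recognized text. If an exact comparison is preferred, the number can be extracted first. The following is a minimal sketch under stated assumptions: extractOfferNumber is a hypothetical helper and not part of Sakuli, offer numbers are assumed to always look like 42-XBC-09453 (two digits, three capital letters, five digits), and the tolerance for stray spaces around the hyphens is an assumption about possible OCR output.

```javascript
// Hypothetical helper, not part of Sakuli.
// Assumption: offer numbers consist of 2 digits, 3 capital letters and
// 5 digits, separated by hyphens (e.g. "42-XBC-09453").
const OFFER_NUMBER_PATTERN = /\b(\d{2})\s*-\s*([A-Z]{3})\s*-\s*(\d{5})\b/;

function extractOfferNumber(ocrText) {
  const match = OFFER_NUMBER_PATTERN.exec(ocrText);
  // Rebuild the number without any whitespace found around the hyphens
  return match ? `${match[1]}-${match[2]}-${match[3]}` : null;
}

// Sample string standing in for the result of _getTextFromRegion()
const recognizedText = "Offer number: 42 - XBC - 09453\n";
console.log(extractOfferNumber(recognizedText) === "42-XBC-09453");
```

Inside the test case, the extracted value could then be compared with strict equality and the result wrapped in Promise.resolve() for _assert(), as in the check above.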
The quality of OCR results highly depends on the screen content. Here are some aspects to consider when OCR does not recognize the text you expect:
If you want some insight into what Tesseract “sees” during a Sakuli automation, you can adapt the log level to show details of the OCR process.
_getTextFromRegion() to the logs
_getRegionByText() to the logs