Why Stamp Invoice?

Paper documents have long served to convey information, but today most information reaches us in digital form: emails, text messages, files, images, videos, and so on. Will this electronic data completely replace printed documents? According to our survey, people still prefer printed documents for contracts, commitments, certifications, and invoices. A printed copy bearing a signature or seal appears to be seen as more trustworthy than its electronic counterpart.

Published: April 10, 2023
Category: Development
Client: TBU

Analyzing and extracting information from printed documents has therefore become a practical, and challenging, need. The first challenge is classifying the documents by format. Classification can be based on layout, on semantics, or, more specifically, on characteristics that identify particular brands and organizations, such as logos, signatures, and stamps.

Preparing a set of templates for extracting information and text is a reasonable approach, but it raises an immediate problem: how do you classify a large volume of incoming documents every day when human resources are limited?

To address this problem, we provide an effective solution that classifies documents by their stamps: the Stamp Invoice application.

How Does It Work?

Our solution analyzes the visual characteristics of the seals that appear in commercial contracts. A seal represents the commitment and confirmation of the company’s board of directors; in other words, it is a mark that identifies which company a document belongs to. This makes it a reliable feature for classifying documents from different companies. Our target data is the company seal on Japanese invoices. Visual characteristics of invoice stamps, such as shape, color, and pattern, are clustered and used to detect the stamps as target objects. The detected stamps are then extracted, embedded, and stored in the database.
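
To make the color cue concrete, here is a minimal sketch of locating a stamp candidate by color alone. It is an illustration under stated assumptions, not the application's actual detector: it assumes the stamp is printed in red ink on a light page, and the function name find_stamp_bbox and all thresholds are hypothetical. It ignores shape and pattern entirely.

```python
import numpy as np


def find_stamp_bbox(rgb, min_pixels=50):
    """Return (top, left, bottom, right) around red-dominant pixels, or None.

    rgb is an H x W x 3 uint8 array. The thresholds below are illustrative
    assumptions and would need tuning on real scans.
    """
    # Cast to a signed type so channel differences cannot wrap around.
    r = rgb[..., 0].astype(np.int32)
    g = rgb[..., 1].astype(np.int32)
    b = rgb[..., 2].astype(np.int32)

    # Assumed rule: a pixel is "stamp-colored" when red clearly dominates.
    mask = (r > 150) & (r - g > 60) & (r - b > 60)
    if mask.sum() < min_pixels:
        return None  # too few red pixels: treat the page as having no stamp

    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    # The bottom and right bounds are exclusive, so the box can be used
    # directly as a slice.
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1
```

The returned box can be used to crop the page, for example page[top:bottom, left:right], before the crop is passed to whatever embedding model is in use.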

We provide a convenient GUI for registering the seals that represent different companies. To classify a document, we detect the stamp, extract it, and compute its embedding; we then match that embedding vector against the registered ones and return the stamp ID or company ID. Documents that contain no stamp, or only an unclear one, are labeled Unknown or Need Review. A human can step in at this point to make corrections if necessary.
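
The matching step can be sketched as a nearest-neighbor lookup over registered embeddings. This is a minimal sketch, not the application's implementation. Several things here are assumptions: cosine similarity as the matching measure, the two thresholds, the StampRegistry class and its method names, and the rule that a missing stamp maps to Unknown while a weak match maps to Need Review. The embedding model itself is out of scope, so plain vectors stand in for its output.

```python
import numpy as np

UNKNOWN = "Unknown"
NEED_REVIEW = "Need Review"


class StampRegistry:
    """Hypothetical in-memory store of one reference embedding per company."""

    def __init__(self, accept=0.85, review=0.60):
        # Assumed thresholds on cosine similarity; tune them on real data.
        self.accept = accept
        self.review = review
        self._ids = []
        self._vectors = []

    def register(self, company_id, embedding):
        """Store a unit-normalized copy of the stamp embedding."""
        v = np.asarray(embedding, dtype=np.float64)
        norm = np.linalg.norm(v)
        if norm == 0:
            raise ValueError("embedding must be non-zero")
        self._ids.append(company_id)
        self._vectors.append(v / norm)

    def classify(self, embedding):
        """Return (label, score); label is a company ID or a fallback label."""
        if embedding is None or not self._ids:
            return UNKNOWN, 0.0  # no stamp detected, or nothing registered
        q = np.asarray(embedding, dtype=np.float64)
        norm = np.linalg.norm(q)
        if norm == 0:
            return UNKNOWN, 0.0
        # Stored vectors are unit length, so the dot product is the cosine.
        sims = np.stack(self._vectors) @ (q / norm)
        best = int(np.argmax(sims))
        score = float(sims[best])
        if score >= self.accept:
            return self._ids[best], score
        if score >= self.review:
            return NEED_REVIEW, score  # plausible but unclear: ask a human
        return UNKNOWN, score


registry = StampRegistry()
registry.register("company_a", [1.0, 0.0, 0.0])
registry.register("company_b", [0.0, 1.0, 0.0])
label, score = registry.classify([0.9, 0.1, 0.0])
```

Keeping a separate Need Review band between the two thresholds routes only borderline matches to a human, rather than every document that fails to match confidently.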

Process & Results

This application has been tested on a set of invoices from several Japanese companies. We were pleased to find that such a simple application is highly effective at classifying documents. It greatly reduces the time and effort required compared with the traditional approach of classifying documents by eye, which is slow and tedious work.

We are continuing to research and analyze data so that we can offer more classification options based on other document-specific objects, rather than focusing only on stamps and Japanese documents. We are also going beyond document classification toward analyzing and extracting targeted information.