training not giving the expected results #1321

Open · RamadanHussein opened this issue Oct 15, 2024 · 8 comments

RamadanHussein commented Oct 15, 2024

Hi,
I'm trying to train EasyOCR on new Arabic fonts. We created a dataset of 1,200 images with labels, but after training, the new model gives very poor results when checking some images. The YAML file and a sample of the images and labels are attached.
Can anyone support with this, or guide me if I'm doing something wrong?

log_dataset.txt
log_train.txt
opt.txt

easyOCr.zip

romanvelichkin commented Oct 18, 2024

Your config doesn't contain saved_model - are you training from scratch?

There can be many reasons why it doesn't work for you (see the config excerpt after this list):

  1. Why num_class: 103? Isn't that the number of symbols you're going to train on? You have far fewer than 103 symbols.
  2. Your learning rate (lr) is low.
  3. It's not clear how much validation data you have.
  4. 1,200 images may not be enough - for a small, specific task I have a training dataset of 8,000 images and a validation dataset of 4,000 images, and I'm still far from perfect.
  5. Train and validation data can differ, so you need to increase the training data by adding more relevant data.
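
A minimal excerpt of the trainer-config keys I mean, for fine-tuning rather than training from scratch (the path and values here are placeholders, not taken from your files):

saved_model: 'saved_models/pretrained_recognizer.pth'  # omit this and training starts from scratch
FT: True     # fine-tune from the saved model instead of reinitializing
lr: 1.0      # the Adadelta default; whether a value is "low" depends on the optimizer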

I found that the data generation used in EasyOCR may not be efficient enough. Using the images I'm actually working with turned out to be a much more effective way to increase performance.
I automated the creation of training data from my images: I run EasyOCR over my images and then cut them up according to the results. Then I can fix the images or the detected text as needed, and all failure cases become obvious.

My config file, which works for me:
ru_filtered_config.txt

RamadanHussein (Author) commented:

Appreciate your reply and help. I increased the sample to 10K images for training and ~4K for validation, also updated the configuration, and started to see better results. What do you think - is there any other room for enhancement to get better results?
I also want to ask about "I automated creation of train data from my images. I run easyocr over my images and then cut it according to results. Then I can fix images or detected text the way I need, also all fail cases are becoming obvious."
How can this be done? Any support?

ar_filtered_config.txt

romanvelichkin commented Oct 22, 2024

I can't really tell how much room for improvement there is. It depends on the model - how much it can learn and memorize. But from my experience, I think it can handle tens of thousands of images, if not hundreds of thousands.

I automated creation of train data from my images

  1. The method helps if you're not training the model from scratch, so it already recognizes something (a sketch of steps 1.1-1.2 follows after this list).
    1.1. You feed a bunch of images with text to EasyOCR.
    1.2. You get a scan result for each image that contains the detected and recognized data: bounding box coordinates, text, and confidence.
    1.3. Extract the bounding box coordinates and text from the scan result.
    1.4. Cut a small piece from the original image according to the bounding box, using OpenCV or Pillow.
    1.5. Write the text data into a text file, so it holds the cut file name and the text data for that file.
    1.6. Now you can look for failure cases: what was recognized incorrectly, and fix it.
    1.7. Augment the data if needed and add it to the train or test data.

  2. You can teach the model to understand different fonts. Create a .doc file with as many words, symbols, and fonts as you need. Save each page as an image. Use the method I provided above.

  3. You can use data generators: https://github.com/Belval/TextRecognitionDataGenerator.
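
A minimal sketch of steps 1.1-1.2, assuming the standard easyocr API; the file names and the CSV column layout ('0', '1', '2') are placeholders chosen to match the cutting script in the next comment:

import csv
import easyocr

reader = easyocr.Reader(['ar'])  # the languages you work with

# readtext returns a list of (bounding_box, text, confidence) tuples
results = reader.readtext('page_1.jpg')

# One CSV per image; the cutting script renames columns '0'/'1'/'2'
# to box/value/accuracy, so write them under those headers.
with open('page_1.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['0', '1', '2'])
    for box, text, conf in results:
        # cast coords to plain floats so the box string survives ast.literal_eval later
        box = [[float(x), float(y)] for x, y in box]
        writer.writerow([box, text, conf])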

romanvelichkin commented Oct 22, 2024

Here is the code I wrote to cut images as described in my method. You have to modify it a bit to make it work, because it's adapted to my project structure.

import ast
import os
import pandas as pd
from PIL import Image

# These paths come from my project structure - set them to yours.
PATH_IMAGE = 'images'              # original page images, one folder per PDF
PATH_EXTRACT = 'extracted'         # cropped images and label files go here
PATH_SCAN_RESULT = 'scan_results'  # per-image CSVs with EasyOCR output


def get_coords(box_string):
    # The box is stored as a string like '[[x1, y1], [x2, y2], ...]';
    # parse it back into a list of corner coordinates.
    box_coords = ast.literal_eval(box_string)

    x_left = 9999
    x_right = -1
    y_top = 9999
    y_bottom = -1

    # Take the enclosing axis-aligned rectangle over all four corners.
    for corner_coords in box_coords:
        x = round(corner_coords[0])
        y = round(corner_coords[1])

        if x < x_left:
            x_left = x

        if x > x_right:
            x_right = x

        if y < y_top:
            y_top = y

        if y > y_bottom:
            y_bottom = y

    return x_left, x_right, y_top, y_bottom


def extract_images(scan_result_filepath, img_dir_path=PATH_IMAGE, extract_dir=PATH_EXTRACT):
    scan_result_filename = os.path.basename(scan_result_filepath)
    scan_result_name = os.path.splitext(scan_result_filename)[0]

    # Scan results are named '<pdf_name>_<page>', so recover the source PDF name.
    pdf_name_length = scan_result_name.find('_')
    pdf_name = scan_result_name[:pdf_name_length]

    img_name = scan_result_name

    img_filepath = os.path.join(img_dir_path, pdf_name + '.pdf', img_name + '.jpg')
    print(img_filepath)

    df = pd.read_csv(scan_result_filepath)
    df = df.rename(columns={'0': 'box', '1': 'value', '2': 'accuracy'})
    print(len(df))
    print()

    text_lines = []

    with Image.open(img_filepath) as im:
        for i in range(len(df)):
            try:
                coords = df.iloc[i, 0]
                x_left, x_right, y_top, y_bottom = get_coords(coords)
                value = df.iloc[i, 1]
                print(x_left, x_right, y_top, y_bottom, value)

                # Crop the detected word out of the page image and save it.
                im_crop = im.crop((x_left, y_top, x_right, y_bottom))
                im_crop_path = os.path.join(extract_dir, scan_result_name + '_' + str(i) + '.jpg')
                im_crop.save(im_crop_path, quality=90, optimize=True)

                text_lines.append(scan_result_name + '_' + str(i) + '.jpg,' + value)
            except Exception:
                # Skip rows with malformed boxes or values.
                pass

    # Write the labels file: one 'crop_file_name,text' line per crop.
    extract_txt_filepath = os.path.join(extract_dir, scan_result_name + '.txt')
    with open(extract_txt_filepath, 'w', encoding='utf-8') as file:
        for line in text_lines:
            try:
                file.write(line + '\n')
            except Exception:
                pass


for pdf_filename in os.listdir(PATH_SCAN_RESULT):
    pdf_dir = os.path.join(PATH_SCAN_RESULT, pdf_filename)
    for csv_filename in os.listdir(pdf_dir):
        scan_result_filepath = os.path.join(pdf_dir, csv_filename)
        extract_images(scan_result_filepath)

TiagoFSoares commented:

Hello, @romanvelichkin.

I’ve been fine-tuning the model specifically to better detect the * symbol, and I’ve achieved low training and validation losses (around 0.0001 for both). However, I’m facing an issue: even during validation, where the predicted label matches the ground truth, the confidence scores remain consistently low (always below 0.5).

I recognize that my dataset is relatively small (500 images for training and 100 for validation), but I'm actively working to expand it. My primary concern is that when I use the fine-tuned model's weights, the outputs appear completely random (the pretrained weights I used are latin_g2.pth). This behavior is puzzling, given that the model has been fine-tuned and the weights were properly saved.

My suspicion is that this might be an overfitting issue. However, if that were the case, I would expect the model to at least perform well on images from the validation dataset. Unfortunately, this isn’t happening.

Do you have any insights into what might be causing these problems or suggestions for how to address them?

romanvelichkin commented Nov 27, 2024

However, I’m facing an issue: even during validation, where the predicted label matches the ground truth, the confidence scores remain consistently low (always below 0.5).

So you need the model to have a high confidence score, right? I haven't trained models that many times, so I can't tell how much data you need to make the model more confident.

Make sure the training and validation data are somewhat similar and not completely different. The low confidence score could be because of that.

I would advise increasing the input resolution for the scanner; a sketch follows below. I personally prefer 2560 - it gives fine results with average inference time. I've checked even bigger resolutions - the model can then detect smaller elements much better, but inference speed drops badly.
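
A minimal sketch of what I mean, assuming the standard easyocr API (canvas_size caps the longer image side during detection; the file name is a placeholder):

import easyocr

reader = easyocr.Reader(['en'])
# canvas_size=2560 is the trade-off I use between accuracy and speed
results = reader.readtext('page.jpg', canvas_size=2560)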

Increase the amount of data - use data generation (https://github.com/Belval/TextRecognitionDataGenerator) and lots of augmentations.
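
For example, a sketch using TRDG's Python generator API from its README (the vocabulary and parameters here are illustrative assumptions - check the TRDG docs for the full option list):

from trdg.generators import GeneratorFromStrings

generator = GeneratorFromStrings(
    ['Highland', 'Weaver', 'Overpowered'],  # your vocabulary
    count=1000,  # number of samples to produce
    size=64,     # image height in pixels
)

# The generator yields (PIL image, label) pairs to save as training data.
for i, (img, label) in enumerate(generator):
    img.save(f'out/gen_{i}.jpg')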

Also, I found that the models used in EasyOCR are large, so they have to be trained for a long time. Try increasing the number of epochs. Usually you keep training until validation accuracy starts getting worse.
You can also reduce the validation rate if training starts taking too much wall-clock time.

Try experimenting with the learning rate and batch size. The golden standard is 32 images per batch. For my tasks I train the EasyOCR recognizer with a batch size of 64.

I didn't get that part:

when I use the fine-tuned model’s weight the outputs appear completely random

Does it not show the same results on the validation data as you got during training?

Keep in mind that even with a well-trained recognizer, EasyOCR may not work properly for small symbols. You also need to train the CRAFT detector to detect those symbols. The detector finds the pieces of an image containing symbols and sends them to the recognizer. If the detector is not trained to find * symbols, it won't send them to the recognizer - no matter how well the recognizer was trained.

My suspicion is that this might be an overfitting issue.

Overfitting means validation loss/accuracy would have to get worse during training. It doesn't look like your case.

TiagoFSoares commented:

Allow me to clarify:

I fine-tuned a model starting from the latin_g2.pth weights (let’s call the resulting model fine_tuned.pth).

When using the latin_g2.pth weights for inference, the results are generally good, although some symbols are not detected correctly. However, when using the fine-tuned weights (fine_tuned.pth), the output becomes significantly worse. Here’s an example:

Ground Truth: Highland " Weaver Overpowered
Using latin_g2.pth:
[([[0, 8], [172, 8], [172, 64], [0, 64]], 'Highland', 0.9999913177335946), ([[197, 0], [597, 0], [597, 58], [197, 58]], 'Weaver Overpowered', 0.9959445322353768)]

Using fine_tuned.pth:
[([[0, 8], [172, 8], [172, 64], [0, 64]], 'Hghtn', 0.09663682101367746), ([[197, 0], [597, 0], [597, 58], [197, 58]], 'wvrJ@3rwe', 0.12083536141679106)]

For the fine-tuning process, I used the dataset referenced in the EasyOCR repository as a test. However, the results are still not satisfactory.

Below is the configuration file (config.yaml) I used for fine-tuning:

number: '0123456789'
symbol: "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ €"
lang_char: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzªÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿĀāĂ㥹ĆćČčĎďĐđĒēĖėĘęĚěĞğĨĩĪīĮįİĶķĹĺĻļĽľŁłŃńŅņŇňŒœŔŕŘřŚśŞşŠšŤťŨũŪūŮůŲųŸŹźŻżŽžƏƠơƯưȘșȚțə̇ḌḍḶḷṀṁṂṃṄṅṆṇṬṭẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặẸẹẺẻẼẽẾếỀềỂểỄễỆệỈỉỊịỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞởỠỡỢợỤụỦủỨứỪừỬửỮữỰựỲỳỴỵỶỷỸỹ€"
experiment_name: 'test'
train_data: 'C:\Users\EasyOCR\dataset'
valid_data: 'C:\Users\EasyOCR\dataset\new_val' # all_data/en_val
manualSeed: 1111
workers: 6
batch_size: 32 # 32
num_iter: 800
valInterval: 2
saved_model: 'C:\Users\EasyOCR\trainer\saved_models\latin_g2.pth' # 'saved_models/en_filtered/iter_300000.pth'
FT: True
optim: 'adam' # default is Adadelta
lr: 1
beta1: 0.9
rho: 0.95
eps: 0.00000001
grad_clip: 5

# Data processing
select_data: 'new_train' # this is the dataset folder in train_data
batch_ratio: '1'
total_data_usage_ratio: 1.0
batch_max_length: 34
imgH: 64
imgW: 600
rgb: False
sensitive: True
PAD: True
contrast_adjust: 0.0
data_filtering_off: False

# Model Architecture
Transformation: 'None'
FeatureExtraction: 'VGG'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 256
hidden_size: 256
decode: 'greedy'
new_prediction: False
freeze_FeatureFxtraction: True
freeze_SequenceModeling: True

Let me know your thoughts on this.

romanvelichkin commented:

I’ve been fine-tuning the model specifically to better detect the * symbol
Ground Truth: Highland " Weaver Overpowered

I think the problem appears because you're fine-tuning for *, but expecting good results on general text. The recognizer has been over-tuned on your specific dataset.

  1. Check whether the detector actually finds the * symbol. Look at the scan results - does that symbol get bounding boxes, or is it not present in the results at all? In the second case, you have to fine-tune the detector, not the recognizer. The original model can probably recognize * well enough.

  2. If the problem is not in the detector, then try to fine-tune the recognizer more gently (see the config sketch after this list).

2.1. Check how fast you reach high accuracy on the validation data - if you reach 90% in just a few epochs and then train for 1000 more epochs just to gain another 2%, this can lead not to fine-tuning but to completely re-tuning the model, especially if your dataset is small. The model is pretty big; it will learn 500 images very fast.

2.2. Try training with a reduced learning rate and fewer epochs.

2.3. Train it on all your data, not just the * symbols.

2.4. Increase the dataset.
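
A sketch of the config changes that 2.1-2.2 point at, relative to the config.yaml above (the values are illustrative assumptions, not tested settings; note that with optim: 'adam' a learning rate near 1 is far above the usual 1e-3 range, while lr: 1 is the convention for the default Adadelta):

optim: 'adam'
lr: 0.0001      # much gentler than lr: 1 when using Adam
num_iter: 300   # fewer iterations so the model isn't completely re-tuned
valInterval: 20 # validate less often to save wall-clock time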
