blog image

Automation of the Excel to PDF conversion with Python and AutoIt

We had a requirement to migrate specific Excel files containing VB scripts which were signed with signatures which already expired. These files were used to calculate insurance formulas relevant for decision process. With the approach to reimplement functionalities in the new portal they wanted to have them in the new portal as PDF files. Plus the CRM entity Action with all respective fields expose as a PDF report merged with the pdf resulting from excel.

 This is a example of such a entity in the CRM which needs to be exported into single PDF. Data from UI /DB plus the calculated values from Excel by VB scripts doing insurance math.

 sanierungactionsrnsht

The resulting report comprising information from this entity should look in this format :

sanrep1fin

The task consisted from following subtasks : 

  1. Prepare script which converts Nx1000 pcs of Excel files with VB and signature from version 2003 to 2023 /xls-xlsx/
  2. Convert the xlsx Excel file to PDF each sheet as separate a PDF
  3. Merge the report pdf with pdf generated from excels
  4.  automate it

To complete this task we decided for python with win32.com , pythoncom libs for converting and another python script for merging the pdf files based on given logic.

During implementation we had challenge, altough we used settings :

 excel = win32com.client.Dispatch("Excel.Application")
 excel.Visible = False
 excel.DisplayAlerts = False
 excel.Interactive = False

 we still had different error message poping up due to signed macros with expired signature.

 sanirerr1

We did not find solution how to supress these messages, so we decided to use automate tool to click this modal dialogboxes. We have chosed AutoIt https://www.autoitscript.com/site/ freeware BASIC-like scripting language designed for automating the Windows GUI and general scripting. It uses a combination of simulated keystrokes, mouse movement and window manipulation in order to automate tasks in a way not possible or reliable with other languages.

Now the solution :

Python Script to convert the xls to xlsx and then each sheet export as pdf

import os
import win32com.client
import pythoncom
import re
import shutil
from datetime import datetime
def sanitize_filename(filename):
    """Remove or replace invalid characters for filenames."""
    return re.sub(r'[<>:"/\\|?*]', '_', filename)
def convert_xls_to_xlsx_and_pdf(input_folder, output_folder, processed_folder, sheet_names):
    pythoncom.CoInitialize()
    excel = None
    def start_excel():
        nonlocal excel
        if excel is not None:
            try:
                excel.Quit()
            except:
                pass
        excel = win32com.client.Dispatch("Excel.Application")
        excel.Visible = False
        excel.DisplayAlerts = False
        excel.Interactive = False
        return excel
    try:
        excel = start_excel()
        # Create output and processed folders if they don’t exist
        if not os.path.exists(output_folder):
            os.makedirs(output_folder)
        if not os.path.exists(processed_folder):
            os.makedirs(processed_folder)
        for filename in os.listdir(input_folder):
            if filename.lower().endswith(".xls"):
                input_path = os.path.join(input_folder, filename)
                file_size = os.path.getsize(input_path) / (1024 * 1024)  # Size in MB
                print(f"Processing {filename}, size: {file_size:.2f} MB")
                temp_xlsx_filename = sanitize_filename(os.path.splitext(filename)[0] + ".xlsx")
                temp_xlsx_path = os.path.join(input_folder, temp_xlsx_filename)
                
                # Step 1: Convert .xls to .xlsx
                workbook = None
                try:
                    workbook = excel.Workbooks.Open(os.path.abspath(input_path))
                    workbook.SaveAs(os.path.abspath(temp_xlsx_path), FileFormat=51)
                    print(f"Converted {filename} to {temp_xlsx_filename}")
                except Exception as e:
                    print(f"Error converting {filename} to xlsx: {str(e)}")
                    if "RPC" in str(e):  # Restart Excel on RPC error
                        print("Restarting Excel due to RPC error...")
                        excel = start_excel()
                    continue
                finally:
                    if workbook is not None:
                        try:
                            workbook.Close(SaveChanges=False)
                        except Exception as e:
                            print(f"Error closing workbook for {filename}: {str(e)}")
                # Step 2: Convert specified sheets to PDF
                try:
                    workbook = excel.Workbooks.Open(os.path.abspath(temp_xlsx_path))
                    available_sheets = [workbook.Worksheets(i + 1).Name for i in range(workbook.Worksheets.Count)]
                    for sheet_name in sheet_names:
                        if sheet_name in available_sheets:
                            try:
                                sheet = workbook.Worksheets(sheet_name)
                                pdf_filename = sanitize_filename(f"{os.path.splitext(filename)[0]}_{sheet_name}.pdf")
                                pdf_path = os.path.join(output_folder, pdf_filename)
                                sheet.ExportAsFixedFormat(
                                    Type=0,
                                    Filename=os.path.abspath(pdf_path),
                                    Quality=0,
                                    IncludeDocProperties=True,
                                    IgnorePrintAreas=False
                                )
                                print(f"Converted {sheet_name} from {temp_xlsx_filename} to {pdf_filename}")
                            except Exception as e:
                                print(f"Error converting {sheet_name} in {temp_xlsx_filename}: {str(e)}")
                        else:
                            print(f"Sheet {sheet_name} not found in {temp_xlsx_filename}")
                except Exception as e:
                    print(f"Error processing {temp_xlsx_filename} for PDF conversion: {str(e)}")
                    if "RPC" in str(e):
                        print("Restarting Excel due to RPC error...")
                        excel = start_excel()
                finally:
                    if workbook is not None:
                        try:
                            workbook.Close(SaveChanges=False)
                        except Exception as e:
                            print(f"Error closing workbook for {temp_xlsx_filename}: {str(e)}")
                # Step 3: Move original .xls to processed folder
                processed_path = os.path.join(processed_folder, filename)
                try:
                    shutil.move(input_path, processed_path)
                    print(f"Moved {filename} to {processed_path}")
                except shutil.Error as e:
                    print(f"Error moving {filename} to processed folder: {str(e)}")
                
                # Clean up temporary .xlsx file
                if os.path.exists(temp_xlsx_path):
                    try:
                        os.remove(temp_xlsx_path)
                        print(f"Deleted temporary file {temp_xlsx_filename}")
                    except Exception as e:
                        print(f"Error deleting {temp_xlsx_filename}: {str(e)}")
    except Exception as e:
        print(f"Unexpected error in convert_xls_to_xlsx_and_pdf: {str(e)}")
    finally:
        if excel is not None:
            try:
                excel.Quit()
            except Exception as e:
                print(f"Error quitting Excel: {str(e)}")
        pythoncom.CoUninitialize()
def move_processed_directory(processed_folder, new_base_name="processed"):
    """Move the processed directory to a timestamped location."""
    if not os.path.exists(processed_folder):
        print(f"Processed folder '{processed_folder}' does not exist. Skipping move.")
        return
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    parent_dir = os.path.dirname(processed_folder)
    new_folder_name = f"{new_base_name}_{timestamp}"
    new_folder_path = os.path.join(parent_dir, new_folder_name)
    try:
        shutil.move(processed_folder, new_folder_path)
        print(f"Moved '{processed_folder}' to '{new_folder_path}'")
    except shutil.Error as e:
        print(f"Error moving '{processed_folder}': {str(e)} (Target may already exist)")
    except Exception as e:
        print(f"Unexpected error moving '{processed_folder}': {str(e)}")
if __name__ == "__main__":
    # Example usage
    input_folder = r"C:\temp\Sanierungsexcel"  # Replace with your input folder path
    output_folder = r"C:\temp\Sanierungsexcel\pdfs"  # Replace with your output folder path
    processed_folder = r"C:\temp\Sanierungsexcel\processed"  # Replace with processed folder path
    sheet_names = ["Formular", "MobiPro", "KMU", "Formulaire", "PMU", "Modulo"]  # Replace with your target sheet names
    # Process .xls files: convert to .xlsx, export to PDF, move original
    convert_xls_to_xlsx_and_pdf(input_folder, output_folder, processed_folder, sheet_names)
    # Move the processed directory to a timestamped location
    move_processed_directory(processed_folder) 

The AutoIT script to find modal dialogues of this 

#include <MsgBoxConstants.au3>
#include <File.au3>
#include <GUIConstantsEx.au3>  ; For $GUI_ENABLE constant
; Log file for debugging
Global $sLogFile = @ScriptDir & "\script_log.txt"
; Function to log messages
Func LogMessage($sMessage)
    _FileWriteLog($sLogFile, $sMessage)
EndFunc
; Initial message
MsgBox($MB_SYSTEMMODAL, "Sanierungsmigration!", "Starting!")
; Set options
Opt("WinTitleMatchMode", 2) ; Partial title matching
Opt("WinWaitDelay", 100)    ; Faster response
While 1
    ; Check for standard dialog windows (#32770)
    Local $aWindows = WinList("[CLASS:#32770]")
    For $i = 1 To $aWindows[0][0]
        Local $hWnd = $aWindows[$i][0] ; Window handle
        Local $sTitle = WinGetTitle($hWnd)
        Local $sClass = WinGetClassList($hWnd)
        LogMessage("Detected dialog - Handle: " & $hWnd & ", Title: " & $sTitle & ", Class: " & $sClass)
        ; Check if dialog is related to Excel or Visual Basic
        If StringInStr($sTitle, "Microsoft Excel") Or StringInStr($sTitle, "Microsoft Visual Basic") Or $sTitle = "" Or StringRegExp($sTitle, "^0x[0-9A-Fa-f]+$") Then
            LogMessage("Processing potential target dialog - Handle: " & $hWnd & ", Title: " & $sTitle)
            ; MsgBox($MB_SYSTEMMODAL, "Sanierungsmigration!", "Found dialog - Handle: " & $hWnd & ", Title: " & $sTitle)
            ; Activate the dialog
            WinActivate($hWnd)
            If WinWaitActive($hWnd, "", 5) Then
                ; Try clicking buttons by checking text
                Local $aButtonIDs = ["Button1", "Button2", "Button3"]
                For $sButtonID In $aButtonIDs
                    Local $sButtonText = ControlGetText($hWnd, "", $sButtonID)
                    LogMessage("Button " & $sButtonID & " text: " & $sButtonText & " on dialog: " & $sTitle)
                    If StringInStr($sButtonText, "11Continue") Or StringInStr($sButtonText, "Weiter") Or StringInStr($sButtonText, "End") Or StringInStr($sButtonText, "Save") Or StringInStr($sButtonText, "Yes") Or StringInStr($sButtonText, "Save changes and discard signature") Then
                        If ControlClick($hWnd, "", $sButtonID) Then
                            LogMessage("Clicked button with text: " & $sButtonText & " on dialog: " & $sTitle)
                            ExitLoop ; Exit button loop after clicking
                        Else
							ControlClick($hWnd, "", "Button1")
                            LogMessage("Failed to click button " & $sButtonID & " on dialog: " & $sTitle)
                        EndIf
                    EndIf
                Next
                ; Fallback: Try clicking Button1
                If Not StringInStr($sButtonText, "Continue") And Not StringInStr($sButtonText, "Weiter") Then
                    If ControlClick($hWnd, "", "Button1") Then
                        LogMessage("Clicked fallback Button1 on dialog: " & $sTitle)
                    Else
                        LogMessage("No clickable button found on dialog: " & $sTitle)
                    EndIf
                EndIf
            Else
                LogMessage("Failed to activate dialog - Handle: " & $hWnd & ", Title: " & $sTitle)
            EndIf
        EndIf
    Next
    ; Check for non-standard Excel dialogs (bosa_sdm_msExcel)
    Local $aExcelWindows = WinList("[CLASS:bosa_sdm_msExcel]")
    For $i = 1 To $aExcelWindows[0][0]
        Local $hWnd = $aExcelWindows[$i][0]
        Local $sTitle = WinGetTitle($hWnd)
        Local $sClass = WinGetClassList($hWnd)
        LogMessage("Detected non-standard Excel dialog - Handle: " & $hWnd & ", Title: " & $sTitle & ", Class: " & $sClass)
       ;  MsgBox($MB_SYSTEMMODAL, "Sanierungsmigration!", "Found non-standard Excel dialog - Handle: " & $hWnd & ", Title: " & $sTitle)
        WinActivate($hWnd)
        If WinWaitActive($hWnd, "", 5) Then
            Local $aButtonIDs = ["Button1", "Button2", "Button3"]
            For $sButtonID In $aButtonIDs
                Local $sButtonText = ControlGetText($hWnd, "", $sButtonID)
                LogMessage("Button " & $sButtonID & " text: " & $sButtonText & " on dialog: " & $sTitle)
                If StringInStr($sButtonText, "11Continue") Or StringInStr($sButtonText, "Weiter") Or StringInStr($sButtonText, "End") Or StringInStr($sButtonText, "Save") Or StringInStr($sButtonText, "Yes") Or StringInStr($sButtonText, "Save changes and discard signature") Then
                    If ControlClick($hWnd, "", $sButtonID) Then
                        LogMessage("Clicked button with text: " & $sButtonText & " on dialog: " & $sTitle)
                        ExitLoop
                    Else
                        LogMessage("Failed to click button " & $sButtonID & " on dialog: " & $sTitle)
                    EndIf
                EndIf
            Next
            ; Fallback: Try clicking Button1
            If Not StringInStr($sButtonText, "Continue") And Not StringInStr($sButtonText, "Weiter") Then
                If ControlClick($hWnd, "", "Button1") Then
                    LogMessage("Clicked fallback Button1 on non-standard dialog: " & $sTitle)
                Else
                    LogMessage("No clickable button found on non-standard dialog: " & $sTitle)
                EndIf
            EndIf
        Else
            LogMessage("Failed to activate non-standard dialog - Handle: " & $hWnd & ", Title: " & $sTitle)
        EndIf
    Next
    Sleep(100) ; Check every 100ms
WEnd 

and finally script to merge the generated pdfs consisting of BIP report from the CRM entity and converted excel sheets into pdfs :

import os
import re
import
PyPDF2
import shutil
 
def merge_pdfs(pdf_list, output):
    print(f"Starting to merge
{len(pdf_list)} PDFs into {output}")
    pdf_writer = PyPDF2.PdfWriter()
 
    for pdf in pdf_list:
       
print(f"Adding pages from: {pdf}")
        pdf_reader =
PyPDF2.PdfReader(pdf)
        for page in
range(len(pdf_reader.pages)):
           
pdf_writer.add_page(pdf_reader.pages[page])
 
    with open(output, 'wb') as out_pdf:
       
pdf_writer.write(out_pdf)
    print(f"Merged PDF saved as:
{output}\n")
 
def find_related_pdfs(directory, id_pattern):
    related_pdfs = []
   
    print(f"Searching for PDFs with
ID pattern: {id_pattern} in directory: {directory}")
    for filename in os.listdir(directory):
        # Check for .PDF
or .pdf files and if the filename contains the ID pattern
        if
filename.lower().endswith('.pdf') and (id_pattern in filename):
           
related_pdfs.append(os.path.join(directory, filename))
           
print(f"Found related PDF: {filename}")
 
    return related_pdfs
 
def find_files_with_pattern(directory, pattern):
    matching_files = []
   
    print(f"Searching for files
with pattern: {pattern} in directory: {directory}")
    for filename in os.listdir(directory):
        if pattern in
filename:
           
matching_files.append(os.path.join(directory, filename))
           
print(f"Found matching file: {filename}")
   
    return matching_files
 
# Directory where your PDF files are located
pdf_directory = r'C:\temp\Sanierungsexcel\pdfs'  # Your
directory
output_directory =
r'C:\temp\Sanierungsexcel\pdfsfinal'  # Directory for merged PDFs
finished_pdfs_directory =
r'C:\temp\Sanierungsexcel\finishedpdfs'  # Directory for processed PDFs
search_pattern = 'MOBI_SANIERUNGS_ACTIVITY_REPORT_DEU'
 
# Create finished PDFs directory if it doesn't exist
os.makedirs(finished_pdfs_directory, exist_ok=True)
 
# Find files matching the search pattern
activity_report_files =
find_files_with_pattern(pdf_directory, search_pattern)
 
for activity_report in activity_report_files:
    # Extract ID from the filename using
regex
    match =
re.search(r'_(\d+-[A-Z0-9#]+)#MOBI_SANIERUNGS_ACTIVITY_REPORT',
activity_report)
    if match:
        id_value =
match.group(1)  # This captures the ID
       
print(f"Found ID: {id_value} in file: {activity_report}")
 
        # Find all
related PDFs for this ID
        related_pdf_files
= find_related_pdfs(pdf_directory, id_value)
 
        # Check if there
is at least one other related PDF
        if
len(related_pdf_files) > 1:  # Ensure there are at least two related
PDFs
           
# Create a list to hold the PDFs in the desired order
           
pdf_files = []
 
           
# Create a set to track already added files
           
added_files_set = set()
 
           
# Add the main activity report if it hasn't been added yet
           
if activity_report not in added_files_set:
               
pdf_files.append(activity_report)
               
added_files_set.add(activity_report)
 
           
# Add Formular or Formulaire PDFs, ensuring no duplicates are added
           
formular_pdfs = [file for file in related_pdf_files if "Formular" in
file or "Formulaire" in file]
           
for file in formular_pdfs:
               
if file not in added_files_set:
                   
pdf_files.append(file)
                   
added_files_set.add(file)
 
           
# Add any remaining related PDFs (that are not Formular or Formulaire),
ensuring no duplicates are added
           
remaining_pdf_files = [file for file in related_pdf_files if file not in
added_files_set]
           
pdf_files.extend(remaining_pdf_files)
 
           
# Output PDF file name based on ID
           
output_file = os.path.join(output_directory,
f'SANIERUNG_REPORT_{id_value}.pdf')
 
           
# Merge the PDFs
           
merge_pdfs(pdf_files, output_file)
 
           
# Move processed PDFs to finished directory
           
for pdf in pdf_files:
               
shutil.move(pdf, os.path.join(finished_pdfs_directory, os.path.basename(pdf)))
 
        else:
           
print(f"Only one PDF found for ID: {id_value}, skipping merge.\n")
    else:
        print(f"No
valid ID found in file: {activity_report}\n") 

For us the game changer was the discovery of autoIT to click the dialogues which we could not supress. Another solution was to use JAVA licensed library to convert excel to pdf but this one has also its disadvantages. This way we could through avalaible time convert and prepare the data for new user portal.

Enjoy and contact us if you need solve your problems ! 

Skôr ako začneme: nahliadnite do spracovania vašich osobných údajov

Ak navštívite stránku, ktorá zapisuje cookies, v počítači sa vám vytvorí malý textový súbor, ktorý sa uloží vo vašom prehliadači. Ak rovnakú stránku navštívite nabudúce, pripojíte sa vďaka nemu na web rýchlejšie. Náš web vám ponúkne relevantné informácie a bude sa vám pracovať jednoduchšie.

Súbory cookies používame najmä na anonymnú analýzu návštevnosti a vylepšovanie našich web stránok. Ak si nastavíte blokovanie zápisu cookies do vášho prehliadača, je možné, že web sa spomalí a niektoré jeho časti nemusia fungovať úplne korektne. Viac info k spracúvaniu cookies.