Python Web Scraping in SAP Business Objects CMC: Getting the List of Schedules

Recently, I faced a mission: organizing all SAP Business Objects schedules into an Excel file. The manual process was tedious—copying each schedule name and details and pasting them into Excel. With hundreds of schedules, it meant endless copy-pasting.

I scoured the internet for solutions and found that the primary solution is Query Builder. However, exporting the Power Builder results from the web into an Excel file is still challenging.

I also tried the SAP Business Objects API, it could fetch all schedules, but the number of recurring schedules is not the same as what we saw in SAP BO CMC.

So, I turned to Python, and it works! If you're grappling with the same challenge, my code might be the solution you're seeking.

Importing Libraries

import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
import time

Here, it’s importing necessary libraries. pandas for data manipulation, selenium for web scraping, BeautifulSoup for HTML parsing, and time for adding pauses between operations.

Setting up SAP BO Connection Details

bo_cmc = 'http://localhost:port/BOE/CMC'
bo_user_name = 'username'
bo_password = 'password'

It defines the SAP BusinessObjects Central Management Console (CMC) URL and login credentials.

Function to Get Soup (HTML Content)

def get_soup(driver, xpath):
    element = driver.find_element(By.XPATH, xpath)
    soup = BeautifulSoup(element.get_attribute("innerHTML"), 'lxml')
    soup.prettify()
    return soup

This function uses Selenium to locate an HTML element using XPath, extracts its inner HTML, and then creates a BeautifulSoup object for parsing.

Functions to Convert HTML to List and DataFrame

def bo_instances_html_to_list(soup):
    n = int(len(soup.select('tr td div')) / 16)
    values = [[] * n for _ in range(n)]
    for i in range(n):
        for j in range(16):
            values[i].append(soup.select('tr td div')[i * 16 + j].text)
    return values

def bo_values_to_df(values):
    df = pd.DataFrame(values)
    df.columns = ['', 'title', 'type', 'status', 'location', 'owner', 'completion_time', 'next_run_time',
                  'submission_time',
                  'start_time', 'duration', 'recurrence', 'expiry', 'server', 'error', 'title2']
    return df

def bo_instance_to_df(soup):
    values = bo_instances_html_to_list(soup)
    df = bo_values_to_df(values)
    return df

These functions convert HTML content of SAP BO instances into a list and then into a Pandas DataFrame.

Selenium Automation Function

def bo_selenium_to_df():
    options = webdriver.ChromeOptions()
    prefs = {'profile.default_content_settings.popups': 0}
    options.add_experimental_option('prefs', prefs)
    options.add_experimental_option("detach", True)
    driver = webdriver.Chrome(options=options)
    driver.get(bo_cmc)
    driver.switch_to.frame("servletBridgeIframe")
    time.sleep(3)

    username_input = driver.find_element(By.XPATH, '//*[@id="_id2:logon:USERNAME"]')
    password_input = driver.find_element(By.XPATH, '//*[@id="_id2:logon:PASSWORD"]')

    time.sleep(1)
    username_input.send_keys(bo_user_name)
    time.sleep(1)
    password_input.send_keys(bo_password)
    time.sleep(1)

    login_button = driver.find_element(By.XPATH, '//*[@id="_id2:logon:logonButton"]')
    time.sleep(1)
    login_button.click()

    time.sleep(5)
    html_text = driver.execute_script("return document.documentElement.outerHTML")
    contentFrame = driver.find_element(By.XPATH, '//*[@id="contentFrame"]')
    driver.switch_to.frame(contentFrame)
    time.sleep(2)

    innerContent = driver.find_element(By.XPATH, '//*[@id="innerContent"]')
    driver.switch_to.frame(innerContent)
    time.sleep(2)

    instance_manager_href = driver.find_element(By.XPATH, '//*[@id="manageList"]/li[1]/a[2]')
    instance_manager_href.click()
    time.sleep(1)

    # instancemanager
    ## switch to innerContent iframe
    html_page = driver.execute_script("return document.documentElement.outerHTML")
    contentFrame = driver.find_element(By.XPATH, '//*[@id="contentFrame"]')
    driver.switch_to.frame(contentFrame)
    time.sleep(1)

    innerContent = driver.find_element(By.XPATH, '//*[@id="innerContent"]')
    driver.switch_to.frame(innerContent)
    time.sleep(1)

    frames = []

    soup = get_soup(driver, xpath='//*[@id="UniversalRepositoryExplorer_detailView_mainTableBody"]')
    n = int(len(soup.select('tr td div')) / 16)
    df_start = bo_instance_to_df(soup=soup)
    frames.append(df_start)

    # In SAP BusinessObjects (SAP BO), schedules default to displaying 50 schedules per page.
    while n == 50:
        next_page_button = driver.find_element(By.XPATH, '//*[@id="UniversalRepositoryExplorer_goForwardButton"]')
        next_page_button.click()
        time.sleep(2)
        soup = get_soup(driver, xpath='//*[@id="UniversalRepositoryExplorer_detailView_mainTableBody"]')
        df = bo_instance_to_df(soup=soup)
        frames.append(df)
        n = int(len(soup.select('tr td div')) / 16)

    df = pd.concat(frames, axis=0)
    driver.close()
    drop_column = ['', 'completion_time', 'start_time', 'duration', 'server', 'error', 'title2']

    for column in drop_column:
        df = df.drop(column, axis=1)

    df['index'] = range(1, len(df) + 1)
    df = df[['index', 'title', 'type', 'status', 'location', 'owner', 'next_run_time',
             'submission_time', 'recurrence', 'expiry']]
    return df

This function uses Selenium to automate the process of logging into SAP BO, navigating to the schedule manager, extracting schedule details, and finally converting them into a Pandas DataFrame.

Executing the Selenium Function

df = bo_selenium_to_df()

This line calls the Selenium function and stores the resulting DataFrame in the variable df.

Thank you for taking the time to explore data-related insights with me. I appreciate your engagement. If you find this information helpful, I invite you to follow me or connect with me on LinkedIn. Happy exploring!👋

Also appears here.