Enterprise AI focuses on integrating cutting-edge AI technologies into enterprise operations to drive efficiency and solve complex business challenges. Large language models (LLMs) have emerged as a powerful tool for addressing business problems that were once too complex or time-intensive to tackle, opening a new chapter for Enterprise AI. In this engineering blog, we highlight some of the Enterprise AI efforts within Celonis. In particular, we will talk about how we leverage AI to enhance existing Celonis products.
We want to emphasize that the projects highlighted here are only a subset of what our AI teams have been working on. Please stay tuned for more from the Celonis Engineering Blog on AI in Celonis in the future.
We are investing in AI capabilities that provide elevated experiences throughout our product suite. These capabilities enable faster time to task completion, lower barriers for non-technical users, and deliver more engaging product experiences. Examples of such AI capabilities include:
AI Assistant for Data Integration, which helps users quickly extract and transform data from various source systems into the Celonis Process Intelligence Graph through SQL generation and validation.
AI Assistant for Studio, enabling users to build studio components and summarize studio views conversationally (see our ICPM paper on this topic).
Let’s take the AI Assistant for Data Integration as an example of how AI can help with the ETL problem in Celonis. ETL is a key bottleneck in data-driven analytics, and it is frequently reported as the main hurdle in getting data ready for process mining within Celonis.
To significantly reduce the time it takes to go from extractor connectivity to data that is ready to build on in Object-Centric Process Mining (OCPM), we need to automate as much of the pipeline creation as possible. We do that through LLM-assisted SQL generation.
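To make this concrete, here is a minimal sketch of what schema-grounded SQL generation could look like. The prompt layout, the `build_extraction_prompt` helper, the `call_llm` placeholder, and the sample SAP table are illustrative assumptions, not the actual Celonis implementation.

```python
# Illustrative sketch of LLM-assisted extraction SQL generation.
# The prompt layout and the call_llm() helper are assumptions for
# demonstration; they do not reflect the actual Celonis backend.
from textwrap import dedent

def build_extraction_prompt(table_name: str, columns: list[str],
                            dialect: str, user_request: str) -> str:
    """Assemble a prompt that grounds the model in the source schema."""
    return dedent(f"""
        You generate read-only {dialect} extraction queries.
        Source table: {table_name}
        Columns: {", ".join(columns)}
        Task: {user_request}
        Return a single SELECT statement and nothing else.
    """).strip()

def call_llm(prompt: str) -> str:
    """Placeholder for an internal LLM-as-a-service endpoint (hypothetical)."""
    raise NotImplementedError

if __name__ == "__main__":
    prompt = build_extraction_prompt(
        table_name="EKKO",
        columns=["EBELN", "BUKRS", "AEDAT", "LIFNR"],
        dialect="SAP HANA SQL",
        user_request="Extract purchase orders created after 2023-01-01.",
    )
    # sql = call_llm(prompt)
    # e.g. SELECT EBELN, BUKRS, AEDAT, LIFNR FROM EKKO WHERE AEDAT > '20230101'
```

The key idea is that the source schema and the target dialect are injected into the prompt rather than assumed to be known by the model, which matters for the less common source systems discussed below.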
While SQL generation is a relatively well-studied area, enterprise ETL poses several technical challenges for generating high-quality ETL SQL across different scenarios:
The schema variety and complexity of different enterprise ERP/CRM and ad-hoc source database systems, together with their different SQL syntax flavors, require us to build a generic solution that remains applicable even when LLMs do not understand those source systems natively;
There is often customized logic in both data extraction from a source system and data transformation between a source system and a target OCPM schema, and LLMs may not understand this customized logic natively;
Besides accuracy, the security and safety of the generated SQL is a top concern; for example, the generated SQL must be safe to run against the source system.
We are building the ETL Assistant backend to interface with existing Celonis products and address those challenges (see the picture above). Users can invoke the ETL Assistant through the data integration interface in the Celonis EMS platform, where they can ask SQL generation questions and receive automatically generated SQL. In the backend, the ETL Assistant uses an LLM-as-a-service layer built internally by our ML Infra team. It also leverages existing OCPM content and services to generate the best possible ETL SQL, for example through dynamic few-shot example selection from our OCPM content pool.
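As an illustration of the dynamic few-shot selection idea, the following sketch ranks examples from a content pool by embedding similarity to the user's request and keeps the top k for inclusion in the prompt. The `embed` placeholder and the pool format are assumptions for demonstration; the actual content pool and service are internal to Celonis.

```python
# Illustrative sketch of dynamic few-shot example selection: pick the
# examples from a content pool that are most similar to the user's
# request and prepend them to the prompt. The embed() helper and the
# pool format are hypothetical.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for an embedding model behind the internal LLM service."""
    raise NotImplementedError

def select_few_shot_examples(user_request: str,
                             example_pool: list[dict],
                             k: int = 3) -> list[dict]:
    """Rank pool examples by cosine similarity to the request, keep top k."""
    query_vec = embed(user_request)

    def score(example: dict) -> float:
        vec = embed(example["description"])
        return float(np.dot(query_vec, vec) /
                     (np.linalg.norm(query_vec) * np.linalg.norm(vec)))

    return sorted(example_pool, key=score, reverse=True)[:k]

# Each pool entry pairs a natural-language description with a vetted SQL
# snippet, e.g.
# {"description": "Map purchase order items to the Order object",
#  "sql": "SELECT ... FROM EKPO ..."}
```

Retrieving a handful of closely related, vetted examples gives the model concrete patterns for the customized extraction and transformation logic it would otherwise not know.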
We are building the ETL Assistant backend in two phases: the ETL Extraction Assistant and the ETL Transformation Assistant. The ETL Extraction Assistant has already been integrated into the EMS extractor frontend; the AI-assisted extractions editor was revealed at Celosphere 2024 and is now in LA status.

Before the AI-assisted extractions editor, extraction configuration relied on a complex, static, wizard-like approach to generate SQL extraction queries. Moreover, our data integration service employs a query parsing grammar to validate filters, restricted to a predefined set of functions. This setup makes it hard to extend functionality and incorporate new functions, limiting the flexibility of data extraction and restricting customer access to other database functions.

We are lifting these limitations by introducing a new AI-assisted extraction mode. It allows our customers either to enter extraction queries directly into a new SQL input textbox or to let the Extraction Assistant automatically generate the SQL extraction query. The generated or user-provided SQL query then goes through both an LLM-based and a rule-based validation engine to ensure its safety and security before being executed. Initial customer feedback has been very positive, and users love the flexibility and convenience of the AI-assisted extractor.
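To give a flavor of the rule-based half of that validation, here is a simplified, keyword-level check that rejects anything other than a single read-only SELECT statement. The real validation engine (LLM-based plus rule-based) is considerably more sophisticated; this snippet is only a hypothetical sketch.

```python
# Simplified sketch of a rule-based validation pass for generated SQL:
# reject anything that is not a single read-only SELECT before it is run
# against the source system. Intentionally naive; for illustration only.
import re

FORBIDDEN_KEYWORDS = {
    "insert", "update", "delete", "drop", "alter", "truncate",
    "grant", "revoke", "merge", "create",
}

def validate_extraction_sql(sql: str) -> None:
    """Raise ValueError if the statement looks unsafe to execute."""
    statements = [s for s in sql.split(";") if s.strip()]
    if len(statements) != 1:
        raise ValueError("Exactly one statement is allowed.")
    if not statements[0].lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed.")
    tokens = set(re.findall(r"[a-z_]+", statements[0].lower()))
    hits = tokens & FORBIDDEN_KEYWORDS
    if hits:
        raise ValueError(f"Forbidden keywords found: {hits}")

# Passes: a single read-only SELECT against a sample table.
validate_extraction_sql("SELECT EBELN, AEDAT FROM EKKO WHERE AEDAT > '20230101'")
```

A check like this runs cheaply before any query touches the source system, while the LLM-based validation can catch subtler issues such as queries that are syntactically safe but semantically wrong.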
Stay tuned for further updates on the ETL Assistant and other products from Celonis AI.