SAP: Advancing enterprise AI research with first real ERP dataset

August 21, 2025. Generative AI has proven enormously useful for text, from writing emails to answering questions to drafting wedding speeches. Models trained for text, such as large language models (LLMs), have driven this utility, and they continue to improve at natural-language tasks.

Symbolic image software development / pixabay TyliJura

Contact info

Silicon Saxony

Marketing, Kommunikation und Öffentlichkeitsarbeit

Manfred-von-Ardenne-Ring 20 F

Phone: +49 351 8925 886

Fax: +49 351 8925 889

redaktion@silicon-saxony.de

Contact person:

Applying these models to the structured, tabular data that companies depend on for their operations is harder, however. This imbalance is partly due to the availability of training data. Text for training models is plentiful and is often pulled from the internet. Tabular data, on the other hand, especially data spanning multiple linked tables, is scarce.

To carry the advances of AI into the enterprise context, researchers who train and benchmark models for business settings need realistic tabular data. For this reason, SAP has developed “Sales Autocompletion Linked Business Tables” (SALT), a curated dataset of anonymized data from a customer’s ERP system.

SALT was developed specifically to support researchers working on AI models for practical business contexts. SALT is accessible via Hugging Face and GitHub.

Challenges: Sourcing and handling company data

Until now, making realistic company data such as SALT available to the research community has not been an easy undertaking. Privacy, confidentiality and commercial interests make it difficult to procure large, cleansed, high-quality company datasets for training and benchmarking models for specific use cases. This means that the gap between the data that researchers work with and the actual company data is growing.

Compounding the lack of availability, company data is complex. First, business data is usually stored in several interconnected tables. For example, an entry in a sales order may be linked to numerous tables, such as customer numbers linked to a supplier table with address data. Second, tables are inherently heterogeneous in the types of data they contain. For example, one field may hold free text, while another may contain numeric or categorical values. Finally, business data often shows significant imbalances within columns. A certain product category, for example, may appear in up to 90 percent of all customer orders, while others are rare.
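The three properties described above (linked tables, mixed field types, skewed value distributions) can be illustrated with a small sketch. All table names, field names and values below are invented for illustration and are not taken from SALT; the sketch uses only the Python standard library.

```python
from collections import Counter

# Hypothetical customer table, keyed by customer number (invented values).
customers = {
    "C001": {"name": "Acme GmbH", "country": "DE"},
    "C002": {"name": "Globex Ltd", "country": "GB"},
}

# Hypothetical sales orders. Each row mixes field types: an integer ID,
# a foreign key, a categorical field, a numeric field and free text.
sales_orders = [
    {
        "order_id": i,
        "customer_id": "C001" if i % 2 else "C002",
        "category": "HARDWARE" if i < 9 else "SERVICE",
        "net_value": 100.0 * (i + 1),
        "note": f"order number {i}",
    }
    for i in range(10)
]


def join_orders_with_customers(orders, customer_table):
    """Resolve each order's customer_id against the linked customer table."""
    return [{**order, **customer_table[order["customer_id"]]} for order in orders]


def category_shares(orders):
    """Return the fraction of orders that fall into each product category."""
    counts = Counter(order["category"] for order in orders)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


joined = join_orders_with_customers(sales_orders, customers)
shares = category_shares(sales_orders)
print(shares)
```

In this toy data, one category covers nine of ten orders, which mirrors the kind of column imbalance the article describes.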

The best way to help researchers build enterprise AI models that cope with these challenges is to provide accurate business data.

SALT – the new dataset

Accurate business data is in short supply in AI research. The SALT dataset addresses this by giving the research community the first real ERP dataset. SALT uses actual industry data from an ERP system in which customer orders are recorded. To maintain confidentiality, the data has been minimally processed.

“There is a gap between academia and industry when it comes to data. This is not so easy to close for data protection reasons,” says Tassilo Klein from the Research/SALT department at SAP. “But we want the research community to work on real problems, not just simulated ones.”

ERP systems help companies manage core business processes such as finance and expense management. With millions of entries and extensive interlinked relational tables, mostly from the sales domain, the SALT dataset replicates customer interactions in an ERP system. Because it contains real-world company data, SALT gives models a solid foundation for learning the characteristics of enterprise data and gives researchers a benchmark for validating model performance. SALT should also help researchers develop better foundation models for linked business data.
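To make the benchmarking idea concrete, here is a minimal sketch of how a model predicting a categorical order field might be compared against a trivial baseline. The labels, the split and the choice of metrics are assumptions made for illustration; they are not the evaluation protocol used by SALT.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _row: most_common_label


def accuracy(predictions, targets):
    """Fraction of predictions that match the target exactly."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def macro_recall(predictions, targets):
    """Average per-class recall, so rare classes weigh as much as frequent ones."""
    recalls = []
    for label in set(targets):
        relevant = [p for p, t in zip(predictions, targets) if t == label]
        recalls.append(sum(p == label for p in relevant) / len(relevant))
    return sum(recalls) / len(recalls)


# Hypothetical, heavily imbalanced labels (invented for illustration).
train_labels = ["HARDWARE"] * 9 + ["SERVICE"]
test_labels = ["HARDWARE"] * 8 + ["SERVICE"] * 2

predict = majority_baseline(train_labels)
predictions = [predict(None) for _ in test_labels]

baseline_accuracy = accuracy(predictions, test_labels)
baseline_macro_recall = macro_recall(predictions, test_labels)
print(baseline_accuracy, baseline_macro_recall)
```

On imbalanced columns, a baseline that never predicts the rare class can still reach high accuracy, so a class-balanced metric alongside plain accuracy gives a fairer picture of what a model has actually learned.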

If successful, all of this will advance automation in organizations, as many business processes rely heavily on data in structured tabular formats. Although this data plays a crucial role in the day-to-day operations of companies, revolutionary generative AI has not yet succeeded in fully unlocking its potential.

“SALT is a first step in providing researchers with authentic, representative industry data that offers a small insight into actual company data. For now, we are starting with just one customer and one use case,” explains Johannes Hoffart, Chief Technology Officer of Business AI at SAP. “However, we plan to release more datasets covering a wider range of customers and use cases. Together with SALT, these can then serve as a basis for pre-training, customization and benchmarking of models.”

Another motivation for publishing this data is cooperation with universities.

“At SAP, we hope to collaborate with partners from academia who can normally only publish their results in open repositories,” says Klein. “Another hope is that this dataset will encourage more people to test and validate new methods that help foundation models to better handle tabular enterprise data.”

What SAP is doing

In addition to engaging with the open research community through SALT, SAP is developing the SAP Foundation Model to process tabular enterprise data. This AI model, built specifically for tabular data, is intended to shorten the time to value for predictive tasks. The foundation model should be able to work with tabular data out of the box, with little or no additional training data. The PORTAL paper, which was published in conjunction with SALT, offers a first glimpse of what this model could look like.

Knowledge graphs play an important role here. They build on metadata, the who, what and when of data, to make the links between pieces of information explicit and usable. The result is a structured, networked representation of the data that AI models can readily interpret and use. Using SAP Knowledge Graph, the SAP Foundation Model can be scaled to a variety of different use cases and customized through minor fine-tuning.
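As a rough illustration of how metadata links can be represented and followed, the sketch below stores relationships between tables as subject-relation-object triples and walks them. The table names and relations are hypothetical, and this is not the SAP Knowledge Graph API; it only shows the general idea of a graph built from metadata.

```python
from collections import deque

# Hypothetical metadata triples: (subject, relation, object).
TRIPLES = [
    ("SalesOrder", "has_field", "customer_id"),
    ("SalesOrder", "references", "Customer"),
    ("SalesOrderItem", "references", "SalesOrder"),
    ("Customer", "references", "Address"),
]


def build_reference_graph(triples):
    """Build an adjacency list from the 'references' relations only."""
    graph = {}
    for subject, relation, obj in triples:
        if relation == "references":
            graph.setdefault(subject, []).append(obj)
            graph.setdefault(obj, [])
    return graph


def reachable_tables(graph, start):
    """Breadth-first traversal: tables reachable from start, in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        for neighbor in graph.get(queue.popleft(), []):
            if neighbor not in seen:
                seen.append(neighbor)
                queue.append(neighbor)
    return seen[1:]


graph = build_reference_graph(TRIPLES)
linked = reachable_tables(graph, "SalesOrderItem")
print(linked)
```

Starting from an order item, the traversal surfaces the order, the customer and the address as related context, which is the kind of linked information a model could draw on.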

– – – – – –

Further links

👉 www.sap.com 

Photo: pixabay
