OpenAI

Bar chart showing cumulative ABAP understanding success rates by model and feedback round.

SAP’s ABAP-1 Loses Every ABAP Benchmark, Even “Explaining”

Previous post (code generation benchmark): Benchmarking LLMs for ABAP. Live benchmark results (old + new): abap-llm-benchmark.marianzeis.de. In my first evaluation (based on the TH Köln benchmark paper), I extended the original setup with additional models and focused on a very concrete question: how well can LLMs generate ABAP code that actually compiles and passes ABAP Unit tests? I also tested SAP’s model ABAP-1, and it performed very poorly at code generation. To be fair, SAP states this in its documentation: ABAP-1 is primarily meant for explaining ABAP code, not for reliably generating full working implementations. ...

Bar chart showing cumulative ABAP code generation success rates by model and feedback round.

Benchmarking LLMs for ABAP: Why ABAP-1 Isn't a Code Generator (Yet)

Live benchmark results: abap-llm-benchmark.marianzeis.de. In a lot of SAP webcasts and webinars, especially around AI, one question comes up very early: which model are you using, and which one do you recommend? For CAP and UI5, the answer is usually pretty simple: use the current best model from Anthropic. If you add good context via MCP servers from the community or SAP, you are basically fine. There is a lot of public knowledge available, and most of it is in JavaScript/TypeScript, which LLMs handle extremely well. ...

Line chart of average monthly question quality scores (LLM-rated) over time.

Has the Quality of SAP Community Questions Gotten Worse? A Data-Driven Perspective

This post is a mirrored copy of my LinkedIn article, kept here so it remains searchable and independent of external platforms. You can still find the original on LinkedIn (LinkedIn Pulse article). The SAP Community is and will remain the central point of entry for all issues and discussions relating to SAP. Over the past few years (and since the migration at the beginning of the year), I have had the feeling that the quality of questions in the technology area (the area I follow most closely) has gradually declined. ...