During our time in the MSc of AI & Data Science programme at Frankfurt School of Finance & Management, we got to work on two real corporate AI projects as part of the Company Cooperation module in our curriculum: one with CLAAS, an agricultural machinery company, and the other with RWE, a major European energy provider. Both pushed us far beyond what any textbook could teach.
Imagine a tractor breaking down in the middle of harvest season. Every hour it sits idle costs the farmer money. Workshops need to give fast, accurate repair quotes, and that means knowing exactly which parts are needed before the technician even opens the hood. That was our challenge with CLAAS. Our team of four had to build a system that could look at a machine’s reported issue and predict which components would likely need replacing. Sounds straightforward. It wasn’t.
We started with over two million invoice and inspection records pulled from different systems. They were messy, inconsistent, and full of gaps. Before we could even think about machine learning, we had to clean everything up. The trickiest part was what we called “temporal linking”: matching the exact moment an issue was reported with the moment a replacement part was ordered. Timing matters a lot here; connect the wrong records and your model learns the wrong patterns. We also had to build in strict compatibility checks, because a predicted part is useless if it doesn’t actually fit the machine model in question. This phase took weeks. It wasn’t glamorous. But it was everything.
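To make the linking step concrete, here is a minimal Python sketch, not the code we actually used. It assumes that each part order belongs to the most recent earlier issue report on the same machine, provided the order arrives within a fixed window (14 days here, an arbitrary choice), and that compatibility is available as a lookup from part number to machine models. The record fields, `link_orders`, and all IDs, models, and part numbers are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record layouts; real invoice/inspection data had many more fields.
@dataclass
class IssueReport:
    machine_id: str
    model: str
    reported_at: datetime
    issue_code: str


@dataclass
class PartOrder:
    machine_id: str
    part_number: str
    ordered_at: datetime


def link_orders(reports, orders, compatible, window=timedelta(days=14)):
    """Link each part order to the latest earlier issue report on the same machine.

    An order is kept only if it falls within `window` after that report and
    the part is listed as compatible with the machine's model.
    """
    # Group reports per machine, sorted by time, so they can be binary-searched.
    history = defaultdict(list)
    for report in sorted(reports, key=lambda r: r.reported_at):
        history[report.machine_id].append(report)
    report_times = {m: [r.reported_at for r in rs] for m, rs in history.items()}

    links = []
    for order in orders:
        times = report_times.get(order.machine_id, [])
        # Index of the most recent report at or before the order timestamp.
        idx = bisect_right(times, order.ordered_at) - 1
        if idx < 0:
            continue  # no preceding report: cannot attribute this order
        report = history[order.machine_id][idx]
        if order.ordered_at - report.reported_at > window:
            continue  # too late to plausibly belong to this report
        if report.model not in compatible.get(order.part_number, set()):
            continue  # compatibility check: part does not fit this model
        links.append((report.issue_code, order.part_number))
    return links


# Hypothetical compatibility table: part number -> machine models it fits.
compatible = {
    "BEARING-01": {"MODEL_A"},
    "SEAL-07": {"MODEL_A", "MODEL_B"},
    "FILTER-99": {"MODEL_B"},
}
reports = [
    IssueReport("M1", "MODEL_A", datetime(2024, 7, 1), "ENGINE_NOISE"),
    IssueReport("M1", "MODEL_A", datetime(2024, 7, 10), "HYDRAULIC_LEAK"),
]
orders = [
    PartOrder("M1", "BEARING-01", datetime(2024, 7, 3)),
    PartOrder("M1", "SEAL-07", datetime(2024, 7, 11)),
    PartOrder("M1", "SEAL-07", datetime(2024, 8, 20)),    # outside the window
    PartOrder("M1", "FILTER-99", datetime(2024, 7, 11)),  # incompatible part
]
print(link_orders(reports, orders, compatible))
# [('ENGINE_NOISE', 'BEARING-01'), ('HYDRAULIC_LEAK', 'SEAL-07')]
```

Linking each order to the nearest preceding report, rather than to every report in the window, is one way to avoid attributing the same part to two different issues on the same machine.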
Once the data was ready, we trained an XGBoost multi-label classification model. We chose this approach because repairs rarely involve just one part. Machines fail in patterns. A worn bearing might take a seal and a gasket with it. The model needed to learn those combinations. Watching it start to make accurate predictions felt genuinely exciting. But we didn’t stop at handing over a working model. We dug into which features drove the predictions and why. That analysis gave CLAAS a clear roadmap for scaling the solution and improving how they collect data going forward.
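A multi-label target means each repair is encoded as a 0/1 vector with one position per part, rather than as a single class. The sketch below assumes repairs are already available as sets of part names (all names are hypothetical) and shows that encoding, a count of which parts get replaced together, and a per-repair Jaccard score for comparing a predicted part set with the actual one. It stops before the model itself: a matrix like `Y` could be fed to one binary XGBoost classifier per part or, in recent XGBoost releases, to a single `XGBClassifier` fit on a 2-D label matrix. We are not claiming either is exactly how our model was configured.

```python
from collections import Counter
from itertools import combinations


def build_label_space(repairs):
    """Sorted list of every part that appears in any repair (one label per part)."""
    return sorted(set().union(*repairs))


def multi_hot(parts, label_space):
    """Encode one repair's part set as a 0/1 vector over the label space."""
    return [1 if part in parts else 0 for part in label_space]


def co_occurrence(repairs):
    """Count how often each pair of parts is replaced in the same repair."""
    counts = Counter()
    for parts in repairs:
        for pair in combinations(sorted(parts), 2):
            counts[pair] += 1
    return counts


def jaccard(predicted, actual):
    """Per-repair overlap between predicted and actual part sets (1.0 = exact match)."""
    if not predicted and not actual:
        return 1.0
    return len(predicted & actual) / len(predicted | actual)


# Hypothetical toy repairs: a worn bearing often takes a seal (and gasket) with it.
repairs = [
    {"bearing", "seal", "gasket"},
    {"bearing", "seal"},
    {"filter"},
]
labels = build_label_space(repairs)
Y = [multi_hot(parts, labels) for parts in repairs]
print(labels)  # ['bearing', 'filter', 'gasket', 'seal']
print(Y)       # [[1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0]]
print(co_occurrence(repairs).most_common(1))  # [(('bearing', 'seal'), 2)]
```

The co-occurrence count is the kind of pattern a multi-label model can exploit and a single-label model cannot represent.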
While the CLAAS team worked on spare parts, another group of us took on a completely different problem at RWE. Power plants run on detailed engineering diagrams called P&IDs (piping and instrumentation diagrams). These show every pipe, valve, pump, and sensor in the facility. The problem? Thousands of these diagrams exist only as scanned PDFs of old paper drawings. Finding specific equipment codes inside them, known as KKS keys (from the German Kraftwerk-Kennzeichensystem, the identification system for power plant equipment), takes hours of manual searching. Our job was to automate that process entirely.
Technical drawings are enormous. One of our test files was nearly 10,000 by 14,000 pixels. Most AI models simply can’t handle an image that size; they crash or produce garbage output. Our solution was to slice each image into overlapping tiles of 1024×1024 pixels. The overlap was intentional; we didn’t want to accidentally cut a code in half at a tile boundary. But overlapping tiles created a new headache: duplicate detections. The same text would get flagged multiple times. We fixed this with Non-Maximum Suppression (NMS), a technique that, among detections that overlap heavily, keeps only the most confident one and discards the rest.
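The tiling and deduplication steps can be sketched in plain Python. This is an illustration under assumptions, not our production code: the 128-pixel overlap and the 0.5 IoU threshold are placeholder values, boxes are `(x0, y0, x1, y1)` tuples, and all function names are hypothetical. The last tile on each axis is shifted back so it ends flush with the image edge instead of running past it.

```python
def tile_origins(length, tile=1024, overlap=128):
    """Start offsets along one axis so that overlapping tiles cover [0, length)."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # last tile is flush with the image edge
    return origins


def make_tiles(width, height, tile=1024, overlap=128):
    """(x0, y0, x1, y1) boxes for every tile, in full-image pixel coordinates."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def to_global(box, tile_x0, tile_y0):
    """Shift a detection box from tile-local to full-image coordinates."""
    x0, y0, x1, y1 = box
    return (x0 + tile_x0, y0 + tile_y0, x1 + tile_x0, y1 + tile_y0)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much.

    `detections` is a list of (box, score, text) tuples in global coordinates.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


tiles = make_tiles(10_000, 14_000)
print(len(tiles))  # 192

# The same code detected twice: near the right edge of one tile and near the
# left edge of its neighbour (boxes already mapped to global coordinates).
detections = [
    ((900, 100, 1000, 120), 0.91, "1LAB10AA101"),
    ((902, 100, 1002, 120), 0.84, "1LAB10AA101"),
    ((5000, 3000, 5100, 3020), 0.88, "2HAD20AP001"),
]
print([text for _, _, text in nms(detections)])  # ['1LAB10AA101', '2HAD20AP001']
```

This greedy loop is the textbook form of NMS; if detections are already PyTorch tensors, `torchvision.ops.nms(boxes, scores, iou_threshold)` does the same job more efficiently.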
We started with Tesseract for text recognition. It’s a well-known OCR tool, but it struggled badly with the specialized fonts used in engineering drawings. We switched to PaddleOCR, a deep-learning-based OCR engine, and the improvement was immediate and dramatic. Even then, we hit another wall. Space on the drawings is tight, so KKS codes were often split across two lines. We had to write custom logic that measured the distance between text fragments and checked whether combining them produced a valid code pattern. It felt like teaching the system to read between the lines, literally.
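The line-merging idea can be sketched as follows, assuming OCR output has already been turned into text fragments with bounding boxes. The regular expression is a deliberately simplified, hypothetical stand-in for the real KKS format, and the thresholds (`max_gap_ratio`, `max_x_offset`) are illustrative values, not the ones we tuned. Two fragments are joined only if the second sits just below the first, is roughly left-aligned with it, and the concatenation forms a valid code.

```python
import re
from dataclasses import dataclass

# Simplified, hypothetical KKS pattern for illustration only, e.g. "1LAB10AA101":
# digit, 3 letters, 2 digits, 2 letters, 3 digits. Real KKS keys have more variants.
KKS_PATTERN = re.compile(r"\d[A-Z]{3}\d{2}[A-Z]{2}\d{3}")


@dataclass
class Fragment:
    """One OCR text fragment with its bounding box in pixel coordinates."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def is_kks(text):
    return KKS_PATTERN.fullmatch(text) is not None


def merge_split_codes(fragments, max_gap_ratio=0.8, max_x_offset=20):
    """Return valid KKS codes, joining fragments that were split across two lines."""
    frags = sorted(fragments, key=lambda f: (f.y0, f.x0))  # rough reading order
    texts = [re.sub(r"\s+", "", f.text).upper() for f in frags]
    used = set()
    codes = []
    for i, upper in enumerate(frags):
        if i in used:
            continue  # already consumed as the second half of a merged code
        if is_kks(texts[i]):
            codes.append(texts[i])
            continue
        line_height = upper.y1 - upper.y0
        for j in range(i + 1, len(frags)):
            if j in used:
                continue
            lower = frags[j]
            gap = lower.y0 - upper.y1
            # The second half must start just below the first line...
            if gap < 0 or gap > max_gap_ratio * line_height:
                continue
            # ...and be roughly left-aligned with it.
            if abs(lower.x0 - upper.x0) > max_x_offset:
                continue
            candidate = texts[i] + texts[j]
            if is_kks(candidate):  # only accept merges that form a valid code
                codes.append(candidate)
                used.update((i, j))
                break
    return codes


fragments = [
    Fragment("1LAB10", 100, 200, 160, 212),
    Fragment("AA101", 100, 214, 150, 226),  # second half, directly below
    Fragment("2HAD20AP001", 500, 200, 610, 212),
    Fragment("PUMP", 900, 600, 940, 612),
]
print(merge_split_codes(fragments))  # ['1LAB10AA101', '2HAD20AP001']
```

Checking the merged string against the code pattern is what keeps unrelated labels that happen to be stacked vertically from being glued together.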
The RWE pipeline ended up achieving 95.5% precision for KKS key detection. We also added a symbol recognition module using YOLOv8 to identify 32 types of electrical components. That was a satisfying bonus. Both projects left us with the same core insight: real industrial data is messy by nature. It doesn’t arrive clean, labeled, and ready to use. Whether you’re dealing with fragmented repair invoices or faded paper blueprints from the 1980s, off-the-shelf tools will only take you partway there. The rest comes down to domain knowledge, creative problem-solving, and the patience to work through the unglamorous parts. We didn’t just build models. We built something that actually works in the real world, and that made all the difference.


Abhishek Sharma,
Master in Applied Data Science, Class of 2026
As part of the Company Cooperation module in their studies, Abhishek participated in a project with RWE.