| Student: | Malin Wen Xia |
|---|---|
| Email: | malin.xia@tum.de |
| Status: | STARTING |
| Supervisor: | |
| Abstract: | High-quality datasets are essential for reliable and trustworthy AI systems. In AI-driven 3D model generation, OpenSCAD, a script-based parametric CAD language, offers a unique opportunity because of its textual and modular structure. However, publicly available OpenSCAD datasets often suffer from low quality, redundancy, and legal ambiguities. This thesis investigates how OpenSCAD datasets can be systematically assessed and curated to make them more suitable for AI training. A comprehensive quality assessment pipeline was developed and applied to 72,197 OpenSCAD projects (189,158 .scad files in total) sourced from GitHub, Printables, and Thingiverse. The pipeline combines legal, technical, and semantic evaluation stages: license filtering and PII detection, dependency resolution and compilation checks, semantic and heuristic analysis, deduplication, and comment-based labeling. Each step is logged and documented to ensure reproducibility. After processing, the dataset was reduced to 55,573 high-quality projects with one .scad file each. This improved reliability, relevance, semantic richness, legal compliance, and interpretability while preserving design diversity. Compilation and dependency checks ensured syntactic validity, semantic filters removed low-information or mesh-like files, and deduplication minimized redundancy. NLP-assisted labeling and comment analysis improved readability and usability for downstream AI applications. The results demonstrate that domain-specific, automated quality assessment can transform raw OpenSCAD collections into curated, AI-ready datasets, operationalizing Trustworthy AI principles. This work provides both a replicable methodology and a conceptual framework for quality assurance of code-based 3D design data. It lays the foundation for downstream model evaluation and for adaptation to other CAD languages or 3D formats. |
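The abstract lists deduplication as a pipeline stage but does not say how it was implemented. Purely as an illustration, the Python sketch below shows one common way to detect exact duplicates among .scad files. It strips `//` and `/* */` comments, collapses whitespace, and compares SHA-256 hashes of the normalized text. The function names (`normalize_scad`, `fingerprint`, `deduplicate`) and the dictionary input format are hypothetical. The regex-based comment removal is a simplification that does not handle comment markers inside string literals.

```python
import hashlib
import re

# Hypothetical helpers for illustration; this is not the thesis implementation.
# Simplification: "//" or "/*" inside string literals is treated as a comment.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"//[^\n]*")


def normalize_scad(source: str) -> str:
    """Remove comments and collapse every whitespace run into a single space."""
    # Replace block comments with a space so tokens on either side stay separate.
    without_blocks = _BLOCK_COMMENT.sub(" ", source)
    without_comments = _LINE_COMMENT.sub("", without_blocks)
    return " ".join(without_comments.split())


def fingerprint(source: str) -> str:
    """SHA-256 hex digest of the normalized source text."""
    return hashlib.sha256(normalize_scad(source).encode("utf-8")).hexdigest()


def deduplicate(files: dict[str, str]) -> dict[str, str]:
    """Keep the first file (in dict order) for each distinct fingerprint."""
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for name, source in files.items():
        digest = fingerprint(source)
        if digest not in seen:
            seen.add(digest)
            kept[name] = source
    return kept


if __name__ == "__main__":
    files = {
        "a.scad": "// box\ncube([10, 10, 10]);\n",
        "b.scad": "/* same box, re-indented */\n    cube([10, 10, 10]);",
        "c.scad": "sphere(r = 5);",
    }
    print(sorted(deduplicate(files)))  # ['a.scad', 'c.scad']
```

Exact hashing only catches copies that differ in comments, indentation, or blank lines. Near-duplicates, such as a remix with changed parameter values, would need a similarity measure (for example MinHash over token shingles), which this sketch does not attempt.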
Files
Documentation
--> GitRepo <--
Workflow
Start
- Topic specification
- Definition of work packages
- Composition of a project proposal and time plan
- Project Talk with Prof. Diepold
- Registration of the thesis
- Creation of a wiki page (supervisor)
- Creation of a GitLab repository or branch
- Access to lab and computers
Finalization
- Check code base and data
- Check documentation
- Provide an example notebook that describes the workflow/usage of your code (in your repo)
- Proofread written composition
- Rehearsal presentation
- Submission of written composition
- Submission of presentation
- Recording of presentation / Presentation in group meeting
- Final Presentation