Content for workshops on computer vision @ HPI's AI Service Center
Updated Nov 9, 2024 - Jupyter Notebook
Implementation of TSDS: Data Selection for Task-Specific Model Finetuning. An optimal-transport framework for selecting domain-specific and task-specific training data to improve LLM finetuning and instruction tuning.
Harnessing Large Language Models for Curated Code Reviews
HyperView curates datasets and provides model introspection in hyperbolic and Euclidean geometries.
Image description/tagging tool
[ACL 2024 (Findings)] ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation
AI-assisted curation and organization for large media datasets
NAACL 2025 | How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?
An image deduplication GUI that uses CLIP to deduplicate datasets for image generation models.
Annotate any LeRobot dataset with a swarm of agents. Turn raw demos into training-ready data for robotic foundation models.
Manage and process paired RGB and depth images, with options to view, export, and exclude images using various colormaps.
Keyboard-driven desktop tool for reviewing, cleaning, editing, and organizing object-detection datasets (image folder + CSV labels).
Pipeline for querying and turning NASA's ADS publications metadata into curated, analysis-ready datasets, topic maps, and citation networks.
Multi-axis VLM-as-judge data curation for robot demonstrations. Five specialist critics (visual / kinematic / task / strategy / safety) score every LeRobot episode with rationale and timestamp evidence. Model-agnostic — any OpenAI-compatible vision endpoint.
Your dataset discovery and curation buddy.
Biomedical Image Processing BAP (Scientific Research Project) - Piri Reis University
Comprehensive framework for curating and validating biomedical datasets for clinical AI applications
Dynamic cluster-based data sampling for efficient and long-tail-aware vision-language model pre-training.
AIWG training-complete framework — corpus-to-dataset pipeline with SKILL.md agentic surface and optional Python runtime backend. Marketplace plugin for AIWG.
A dataset curator for training person LoRAs.