cv

Applied Scientist / Research Engineer: Multimodal & Foundation Models · Research-to-System Translation. Curriculum vitae.

Basics

Name Simone Rossetti
Label Applied Scientist / Research Engineer
Email simone[dot]rossetti[at]live[dot]com
Phone (+39)[space]349[space]105[space]9384
URL https://rossettisimone.github.io/
Summary Multimodal and foundation model specialist with first-author publications at NeurIPS, ICCV, and ECCV. Translates research into robust systems, from methodology design (data-efficient training, cross-modal alignment, evaluation frameworks) to implementation (distributed training pipelines, model serving, CI-based validation), pairing rigorous evaluation with reliable engineering. Industrial R&D orientation, with four years leading applied research from prototype to field-validated AI systems.

Work

  • 2026.03 - Present
    Independent
    Transition: R&D in embodied AI
    Pursuing research and development in embodied AI, focusing on multimodal foundation models (Vision-Language, Vision-Language-Action), large-scale Transformers, probabilistic and generative modeling, and system-level pipelines for production-grade intelligent systems.
    • Goal: contribute to next-generation AI platforms and intelligent embodied systems.
  • 2021.10 - 2026.02
    Co-Founder and Lead Applied Researcher
    DeepPlants S.r.l.
    Defined the technical vision for multimodal, AI-driven decision-support systems and led a five-person R&D team, advancing prototypes from TRL 1 to TRL 5 within 12 months.
    • Designed data-efficient training strategies and implemented distributed PyTorch pipelines (Hugging Face, OpenCLIP, xFormers), reducing annotation requirements by 70% while maintaining ≥85% performance across heterogeneous conditions.
    • Developed large-scale multimodal evaluation suites (~50K VQA and ~25K structured-QA items) and established reproducible training and evaluation frameworks, with CI-based validation, as internal R&D standards.
    • Conducted masked pretraining and fine-tuned vision-language models (Qwen3-VL, LLaMA 3); implemented containerized serving (Ollama, vLLM, SGLang) and agentic pipelines (LangGraph, RAG) instrumented for latency profiling.
    • Defined technical methodologies and experimental validation strategies for EU-funded research initiatives in AI-driven sustainable agriculture.
  • 2021.01 - 2021.10
    Research Fellow
    AlcorLAB – Sapienza University (DIAG)
    Designed and implemented distributed multi-GPU training pipelines for spatiotemporal models on the AVA and YouTube-VIS benchmarks.
    • Conducted systematic ablation studies to quantify architectural trade-offs under memory and latency constraints, establishing evaluation protocols for cross-dataset robustness.

Education

  • 2021.11 - 2025.01

    Rome, Italy

    PhD
    Sapienza Università di Roma
    Computer Science Engineering
    • Advisors: Pirri F.; Amerini I.
    • Thesis: Reducing supervision in semantic segmentation through advancements in Bayesian prior modelling (UNITesi 2025)
  • 2019.10 - 2021.10

    Rome, Italy

    MSc
    Sapienza Università di Roma
    Artificial Intelligence and Robotics
    • Master's thesis on fast instance segmentation and tracking for YouTube-VIS 2021
  • 2015.10 - 2019.03

    Rome, Italy

    BSc
    Università degli Studi Roma Tre
    Computer Engineering
    • Bachelor's thesis on an iterative learning control (ILC) algorithm for a 2-DOF robotic arm in MATLAB/Simulink

Certificates

DeepLearn '22
Advanced Training 2022-01-01
ICVSS '22
Advanced Training 2022-01-01

Publications

Skills

Multimodal & Vision-Language
Vision-Language Models (CLIP, BLIP, Qwen-VL/LLaMA/GPT)
Vision-Language-Action (PaLM-E, RT-X, GR00T-N1)
Segment Anything (SAM), Object Detection, Instance Segmentation, Action Recognition, Multimodal Fusion
Learning Paradigms
Weakly- and Unsupervised Segmentation
Self-Supervised (DINO, SwAV, SeLA, BYOL), Contrastive (SimCLR, MoCo)
Masked Modeling (BERT, MAE), Diffusion, VAEs, GANs
Training & Distributed Optimization
Multi-GPU (PyTorch DDP, FSDP, DeepSpeed ZeRO, Megatron-LM TP)
LoRA, PyTorch Lightning, Hugging Face, xFormers
Evaluation & Benchmarking
Benchmark Design, Ablation Studies, Cross-Dataset Robustness
Weights & Biases, MLflow, Hydra, OmegaConf
Model Serving & Agentic Integration
LangGraph, LangChain, LlamaIndex, SGLang, Ollama, vLLM
Milvus, Qdrant, Multimodal RAG, Docker, FastAPI, CI/CD
Tooling & Workflow Automation
Python, PyTorch, TIMM, OpenCLIP, Torchvision, OpenCV, Albumentations
TensorBoard, Git, Linux

Languages

Italian
Native (C2)
English
Fluent (C1)

Projects