Progettazione e sviluppo di una soluzione modulare basata su intelligenza artificiale generativa per l'integrazione di MLLM in ambienti enterprise

The growing adoption of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has opened new perspectives for the integration of generative artificial intelligence into enterprise environments. This thesis presents the design and development of ModulAIr: a modular platform oriented toward the use of MLLMs, capable of handling diverse conversational, visual, and document-based tasks, with native support for tool calling and multiple providers (OpenAI, Hugging Face, Ollama). The architecture, implemented in FastAPI and containerized with Docker, integrates a central orchestrator that manages sessions, history, metrics, and costs, leveraging Redis for low-latency persistence and InfluxDB for telemetry. The system is provider-agnostic and agent-based: through AutoGen, it enables multi-agent planning, task decomposition, and external tool invocation, including a fine-grained billing and logging subsystem. Developed and validated during a curricular internship at APRA Information Technologies [1979], the project was tested in real-world application scenarios. Validation includes UML diagrams, real use cases, and benchmarks on the /chat, /rag, /vision, and /agents modules, highlighting history consistency, end-to-end traceability, and component modularity. ModulAIr is proposed as a reusable foundation for enterprise applications, operational assistants, and advanced knowledge retrieval environments. Keywords: Artificial Intelligence, Generative Artificial Intelligence, Python, LLM, MLLM, FastAPI, AutoGen, Modular Orchestration, Tool Calling, Agent-based Systems, Logging, Billing, Multimodality, Retrieval-Augmented Generation (RAG), Redis, InfluxDB, Docker, REST API, Multi-provider AI, Conversational Agents, Local Models, Remote API-based Models, Enterprise Integration, Hugging Face, OpenAI, Ollama.

La crescente diffusione dei modelli linguistici di grandi dimensioni (LLM) e dei modelli multimodali (MLLM) ha aperto nuove prospettive per l’integrazione dell’intelligenza artificiale generativa in contesti aziendali. Questa tesi presenta la progettazione e lo sviluppo di ModulAIr: una piattaforma modulare orientata all’uso di MLLMs, capace di gestire diversi task conversazionali, visivi e documentali, con supporto nativo al tool-calling e a più provider (OpenAI, Hugging Face, Ollama). L’architettura, implementata in FastAPI e containerizzata con Docker, integra un orchestratore centrale che gestisce sessioni, cronologia, metriche e costi, sfruttando Redis per la persistenza a bassa latenza e InfluxDB per la telemetria. Il sistema è provider-agnostico e agent-based : tramite AutoGen abilita pianificazione multi-agente, decomposizione dei compiti e invocazione di strumenti esterni, includendo un sottosistema di billing e logging a grana fine. Nell’ambito di un tirocinio curricolare presso APRA Information Technologies APRA Information Technologies [1979], il progetto è stato sviluppato e validato su scenari applicativi reali. La validazione comprende diagrammi UML, casi d’uso reali e benchmark sui moduli /chat, /rag, /vision e /agents, evidenziando coerenza dello storico, tracciabilità end- to-end e modularità dei componenti. ModulAIr si propone come base riutilizzabile per applicazioni enterprise, assistenti operativi e ambienti avanzati di knowledge retrieval. Keyword: Intelligenza Artificiale, Intelligenza Artificiale Generativa, Python, LLM, MLLM, FastAPI, AutoGen, Orchestrazione Modulare, Tool Calling, Agent-based Systems, Logging, Billing, Multimodalità, Retrieval-Augmented Generation (RAG), Redis, InfluxDB, Docker, API REST, Multi-provider AI, Conversational Agents, Modelli Locali, Modelli Remoti Via API, Enterprise Integration, Hugging Face, OpenAI, Ollama