Audio Assistant in MISP

The aim is to enhance the MISP user experience with audio interaction.

Which users I had in mind:
A) A novice MISP user who is overwhelmed by the information presented. Reading is more cognitively demanding than listening, so audio lets a novice user go through more events with the same energy and increases learning effectiveness (at least for me).

B) A CTI handler whose attention is split across tools. By using audio, the screen/keyboard/mouse stay free for interacting with other tools.

Current state:

  • The content of an event (the scope) is either read to the user directly via the browser Speech API, which is entirely client-side, or the scope data is piped through an LLM and the LLM's output is read aloud. The latter is currently done with a locally run LLM.
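
The client-side option can be sketched roughly as below. Long event texts are split into smaller utterances because some speech engines truncate very long ones; the helper names and chunk size are illustrative assumptions, not the plugin's actual code.

```javascript
// Split text into utterance-sized chunks on word boundaries
// (illustrative helper; maxLen is an assumed, tunable limit).
function chunkText(text, maxLen = 200) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const word of words) {
    if (current && current.length + 1 + word.length > maxLen) {
      chunks.push(current);
      current = word;
    } else {
      current = current ? current + " " + word : word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Browser-only part: queue one utterance per chunk via the Speech API.
function speakEventText(text) {
  if (typeof speechSynthesis === "undefined") return; // not in a browser
  for (const chunk of chunkText(text)) {
    speechSynthesis.speak(new SpeechSynthesisUtterance(chunk));
  }
}
```

Since `speechSynthesis` queues utterances, the chunks are read back to back without extra scheduling code.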

Next steps:

  • Now: Test better models
  • Now: Extend the scoping to include attributes
  • Later: Extend to take audio from user into account (bi-directional)

Note: This project currently has the lowest priority of the 3 hackathon projects I am doing. If you like it, let me know; that helps with setting priorities.

FYI, the currently tested model is llama3.2:1b; despite being very cheap to run (~1.3 GB), the summarization it produces adds value. Let me know if you have time to test it with a slightly more potent model.

Using gemma3:4b is a significant improvement; cost: 4.3 GB.
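
For testing other models, the LLM path boils down to one call against the local Ollama server's `/api/generate` endpoint. A minimal sketch, assuming a default Ollama setup; the prompt wording and function names are my own, not the plugin's:

```javascript
// Build a non-streaming Ollama generate request for an event summary.
// Prompt wording is an assumption for illustration.
function buildSummaryRequest(ollamaUrl, model, eventJson) {
  return {
    url: ollamaUrl.replace(/\/$/, "") + "/api/generate",
    body: {
      model: model,
      prompt:
        "Summarize this MISP event for text-to-speech, in short plain sentences:\n" +
        JSON.stringify(eventJson),
      stream: false, // one complete response instead of a token stream
    },
  };
}

// Usage (requires a running Ollama instance):
async function summarizeEvent(ollamaUrl, model, eventJson) {
  const req = buildSummaryRequest(ollamaUrl, model, eventJson);
  const res = await fetch(req.url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req.body),
  });
  return (await res.json()).response; // Ollama returns the text in "response"
}
```

Swapping models for comparison is then just a matter of changing the `model` string (e.g. `llama3.2:1b` vs `gemma3:4b`).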

Plus, config parameters for the plugin are exposed via “Administration” > “Server Settings & Maintenance” > “AudioExplain”:

  • Plugin.AudioExplain_ollama_url
  • Plugin.AudioExplain_ollama_model
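
These settings can presumably also be scripted via MISP's cake console, the usual way plugin settings are set from the CLI; the values below (Ollama's default port, the gemma3:4b model) are example assumptions:

```shell
# Hypothetical example values; adjust URL and model to your setup.
sudo -u www-data /var/www/MISP/app/Console/cake Admin setSetting "Plugin.AudioExplain_ollama_url" "http://127.0.0.1:11434"
sudo -u www-data /var/www/MISP/app/Console/cake Admin setSetting "Plugin.AudioExplain_ollama_model" "gemma3:4b"
```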