Implement data-filtering features in Cocktailparty

Up until now, cocktailparty has focused solely on streaming vast amounts of cybersecurity data, such as passive DNS or Certificate Transparency logs. This can overwhelm end users, as it can amount to several hundred GB of data per day.

To truly embody the ‘Cocktail Party effect’ it is named after, this task plans to add filtering support so that users can zero in on relevant data amidst the noise.

There are several possible sub-tasks:

  • Integrate GenStage into cocktailparty to create a data-processing module, and get backpressure as a bonus,
  • Implement regex filtering on text,
  • Implement JSON filtering on deserialized JSON streams,
  • Implement support for poppy Bloom filters via ex_poppy, for added privacy and larger filtering lists (like the lists from typosquatting-finder).
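To make the three filtering ideas above more concrete, here is a rough, language-neutral sketch in Python. It is not cocktailparty code and does not use GenStage, poppy or ex_poppy: the function names, the `BloomFilter` class, and the `rrname`/`rrtype` fields in the sample messages are all assumptions made up for illustration. The filters are written as generators, so each message is only pulled when the consumer asks for it, which is loosely similar in spirit to demand-driven processing.

```python
import hashlib
import json
import re
from typing import Callable, Iterable, Iterator


def regex_filter(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield only the raw text lines that match the regular expression."""
    compiled = re.compile(pattern)
    for line in lines:
        if compiled.search(line):
            yield line


def json_filter(lines: Iterable[str],
                predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Decode each line as JSON and yield the objects the predicate accepts."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed messages instead of stopping the stream
        if isinstance(record, dict) and predicate(record):
            yield record


class BloomFilter:
    """Toy Bloom filter (hypothetical, NOT the poppy format or API).

    It stores only bit positions, never the items themselves, so a list can
    be shared without revealing its content. Lookups may give false
    positives but never false negatives.
    """

    def __init__(self, size_bits: int = 1024, hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting SHA-256 with a seed.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Illustrative messages; the field names are assumptions, not a real schema.
stream = [
    '{"rrname": "example.com", "rrtype": "A"}',
    '{"rrname": "examp1e.com", "rrtype": "A"}',
    'not json',
    '{"rrname": "other.org", "rrtype": "MX"}',
]

watchlist = BloomFilter()
watchlist.add("examp1e.com")

# 1. Regex filtering on the raw text.
text_hits = list(regex_filter(stream, r"examp.e\.com"))
# 2. Filtering on a field of the decoded JSON.
json_hits = list(json_filter(stream, lambda r: r.get("rrtype") == "A"))
# 3. Membership test against a Bloom filter watchlist.
bloom_hits = [r["rrname"]
              for r in json_filter(stream, lambda r: r.get("rrname") in watchlist)]
```

In a real pipeline the three filters would sit behind the same subscription interface, and a user would pick one per stream. The Bloom filter variant is the one that scales to large lists, at the cost of occasional false positives that the consumer has to tolerate or re-check.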