Index Engines’ CyberSense®: Machine-Learning Approach

Index Engines’ CyberSense takes a unique approach to machine learning: it combines feature engineering techniques with a back-end ensemble of 10 diverse, shallow, tree-based models. These models work together to accurately predict the presence of ransomware and other data corruption attacks.
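To make the idea of several shallow tree-based models voting on one prediction concrete, here is a minimal sketch. It is not CyberSense's implementation: the Stump class, the three feature columns, the thresholds, and the 0.5 cutoff are all hypothetical, and each "tree" is reduced to a single split (depth 1) to keep the example short.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stump:
    """Hypothetical depth-1 decision tree: one feature, one threshold."""
    feature: int
    threshold: float

    def predict(self, x: Sequence[float]) -> int:
        # Vote 1 (attack) when the feature exceeds the threshold, else 0.
        return 1 if x[self.feature] > self.threshold else 0


def ensemble_score(stumps: Sequence[Stump], x: Sequence[float]) -> float:
    """Fraction of models that vote 'attack' for feature vector x."""
    votes = [s.predict(x) for s in stumps]
    return sum(votes) / len(votes)


def is_suspicious(stumps: Sequence[Stump], x: Sequence[float],
                  cutoff: float = 0.5) -> bool:
    """Flag x when at least `cutoff` of the models vote 'attack'."""
    return ensemble_score(stumps, x) >= cutoff


# Assumed feature layout (illustrative only):
#   x[0] = mean entropy increase of modified files (bits per byte)
#   x[1] = fraction of files modified
#   x[2] = fraction of files whose extension does not match their content
STUMPS = (
    [Stump(0, t) for t in (0.5, 1.0, 1.5, 2.0)]
    + [Stump(1, t) for t in (0.2, 0.4, 0.6)]
    + [Stump(2, t) for t in (0.1, 0.3, 0.5)]
)  # 10 models in total

attack_like = [3.0, 0.9, 0.8]
normal_like = [0.1, 0.05, 0.0]
```

With these made-up thresholds, is_suspicious(STUMPS, attack_like) is true and is_suspicious(STUMPS, normal_like) is false. In a real system the trees would be learned from labeled data rather than written by hand.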

CyberSense’s machine-learning approach analyzes and categorizes the infection strategies of over 1,500 different instances of ransomware into attack vector classifications. These classifications, as opposed to individual ransomware characteristics, enable CyberSense to recognize ransomware data attacks it has not previously encountered.

The feature information that CyberSense’s machine-learning algorithms digest includes combinations of the following: file similarity, Shannon entropy measures and thresholds, file creation/deletion/modification, file type/extension mismatches, file type changes, file corruption detection, file type frequency distributions, and well-known ransomware extensions. Some of the feature generation is novel, such as the file similarity measure, which can statistically distinguish normal file edits from the encryption used in ransomware attacks. The feature information is based on full-content indexing of the data. Index Engines has found through empirical studies that using file content-based features, as opposed to a metadata-only approach, is critical to accurately determining the presence of a data attack while keeping false positives to a minimum.
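Two of the content-based features named above, Shannon entropy and file type/extension mismatch, can be sketched in a few lines. This is an illustration only, not CyberSense's feature code: the 7.5 bits-per-byte threshold is an assumed value, and the MAGIC table is a small, illustrative subset of common file signatures.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Assumed heuristic: near-maximal entropy suggests encrypted or
    compressed content. The threshold is hypothetical."""
    return shannon_entropy(data) >= threshold


# Illustrative subset of leading-byte signatures for a few file types.
MAGIC = {
    ".pdf": b"%PDF",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".zip": b"PK\x03\x04",
}


def extension_mismatch(filename: str, head: bytes) -> bool:
    """True when the extension is in MAGIC but the file's first bytes
    do not carry the expected signature."""
    ext = os.path.splitext(filename)[1].lower()
    expected = MAGIC.get(ext)
    return expected is not None and not head.startswith(expected)
```

Entropy alone cannot separate encryption from legitimate compression, since both produce near-random bytes. That is one reason a single measure like this would normally be combined with the other features listed above.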

CyberSense can scan files and databases on the network or within backup images. When CyberSense analyzes backup data, the feature information is extracted not only from file and backup metadata but also from the data content. Leveraging the backup eliminates the need to scan any host computers or network servers. Index Engines’ support for direct backup image processing is based on an efficient, high-performance, multi-threaded architecture that enables backup stream processing to exceed 1 TB/hour per node with the right hardware configuration. CyberSense can accurately predict the presence of ransomware by comparing just two backup sets from any one host; a prediction algorithm that needs only a single backup set is planned for a future release.

When CyberSense scans live network servers, the feature information is extracted from multiple scans of the servers. CyberSense uses these analytics to accurately predict the presence of ransomware by comparing only two consecutive indexes of a server; a prediction algorithm that needs only a single index is planned for a future release.
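The comparison of two consecutive indexes (or two backup sets) can be pictured as deriving change features from a pair of snapshots. The sketch below is an assumption about what such a comparison might look like, not the product's algorithm: FileRecord, compare_indexes, and the returned feature names are all hypothetical, and each index is modeled as a simple mapping from file path to a content digest and an entropy value.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file entry in one index."""
    digest: str     # content hash recorded at scan time
    entropy: float  # bits per byte, measured from the file content


def compare_indexes(old: Dict[str, FileRecord],
                    new: Dict[str, FileRecord]) -> Dict[str, float]:
    """Derive simple change features from two consecutive indexes."""
    created = new.keys() - old.keys()
    deleted = old.keys() - new.keys()
    # A file counts as modified when its path persists but its digest changes.
    modified = [p for p in old.keys() & new.keys()
                if old[p].digest != new[p].digest]
    rises = [new[p].entropy - old[p].entropy for p in modified]
    return {
        "created": len(created),
        "deleted": len(deleted),
        "modified": len(modified),
        "mean_entropy_rise": sum(rises) / len(rises) if rises else 0.0,
    }


# Made-up example data: one file is rewritten with high-entropy content,
# one is deleted, and one appears under a new name.
old_index = {
    "a.txt": FileRecord("d1", 4.0),
    "b.txt": FileRecord("d2", 4.5),
    "c.txt": FileRecord("d3", 5.0),
}
new_index = {
    "a.txt": FileRecord("d1", 4.0),
    "b.txt": FileRecord("x2", 7.5),
    "d.txt.locked": FileRecord("x4", 8.0),
}
features = compare_indexes(old_index, new_index)
```

A feature vector like this, computed per host from two scans, is the kind of input a classifier could score. The actual features and their weighting are not described in this document.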