Catalyst Data Profiling
File-level knowledge you need to make decisions on unstructured data

The Catalyst Data Profiling Engine processes all forms of unstructured files and document types, creating a searchable index of what exists, where it is located, who owns it, when it was last accessed and, optionally, what key terms are in it.

High-level summary reports give instant insight into enterprise storage, providing knowledge of data assets that was not available before. Through this process, mystery data can be managed and classified, including content that has outlived its business value or that is owned by ex-employees and now sits abandoned on the network.

IE advantage

Data profiling, known to some as file analysis, relies on an enterprise-class index of metadata from user files and email databases, such as last modified or accessed time, number of duplicates, size, owner, location, file type and more. Indexing occurs at unprecedented speed and efficiency, handling environments that measure data in petabytes. Integration with Active Directory adds the intelligence needed to make decisions about active and inactive users, departmental groups and more.
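
To make the idea of a metadata index concrete, here is a minimal Python sketch that walks a directory tree and records the kinds of fields listed above. It is an illustration only, not Catalyst's implementation: the names `FileProfile`, `profile_tree` and `duplicate_groups` are hypothetical, duplicates are detected with a SHA-256 content hash (an assumption), and the owner is recorded as the numeric ID returned by the operating system rather than an Active Directory account.

```python
import hashlib
import os
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileProfile:
    """One index entry; the field names are illustrative, not Catalyst's schema."""
    path: str
    size: int
    modified: float   # last modified time, seconds since the epoch
    accessed: float   # last accessed time, seconds since the epoch
    owner_id: int     # numeric owner ID from the OS (0 on platforms without one)
    extension: str
    digest: str       # SHA-256 of the content, used here to spot duplicates


def profile_tree(root):
    """Walk root and return a FileProfile for every regular file found."""
    profiles = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Reading whole files is fine for a sketch; a real indexer would stream.
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            profiles.append(FileProfile(
                path=path,
                size=st.st_size,
                modified=st.st_mtime,
                accessed=st.st_atime,
                owner_id=st.st_uid,
                extension=os.path.splitext(name)[1].lower(),
                digest=digest,
            ))
    return profiles


def duplicate_groups(profiles):
    """Group paths whose content hashes match; each group has two or more paths."""
    by_digest = defaultdict(list)
    for p in profiles:
        by_digest[p.digest].append(p.path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `duplicate_groups(profile_tree("/some/share"))` would list sets of identical files under that path. Note that last accessed times depend on how the underlying file system is mounted and may not always be updated.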

Optionally, data profiling can look beyond metadata and go deep within documents and email to find content that matches keyword searches or contains confidential information such as personally identifiable information (PII). It can also support compliance audits for sensitive content misplaced behind the firewall in PSTs or on the wrong server. Once data is classified, Catalyst’s disposition module can be used to take action on it.
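
As a rough illustration of what a content scan for sensitive data can look like, the Python sketch below searches text for strings shaped like U.S. Social Security numbers and for card-like digit runs that pass the Luhn checksum. The patterns and the names `SSN_RE`, `CARD_RE`, `luhn_valid` and `find_sensitive` are assumptions made for this example; they do not describe how Catalyst detects PII, and a production scanner would need broader patterns and more validation.

```python
import re

# Assumed patterns for illustration only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit, counting from the right.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, matched_text) pairs for SSN-like and card-like strings."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        # The checksum filters out digit runs that merely look like card numbers.
        if luhn_valid(m.group()):
            hits.append(("card", m.group()))
    return hits
```

The Luhn check is a cheap way to reduce false positives from order numbers and other long digit strings, at the cost of still flagging any random number that happens to pass the checksum.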

Benefits of Data Profiling

  • Flexible Deployment: Catalyst is available as software that supports connection to network shares, desktops, email servers, SharePoint, backup tape and other sources.
  • Easy Discovery of Data Sources: Connect to network sources quickly and easily. Specify the IP address range of the sources for processing, or leverage the AD integration to automate the discovery of network shares.
  • Unprecedented Performance: Catalyst delivers the industry’s fastest indexing speeds, at 1 TB per hour per node.
  • Intelligently Clean Up “Admin” Owned Data: Catalyst allows intelligent analysis of files owned by generic “Admin” accounts, so that ownership can be cleaned up and reassigned to the rightful user or department.
  • Dynamic Reporting: Pre-defined reports focus on data age, location, owner, access times, file types, size and more. Reports can be used to further filter the results and refine the analysis based on your needs.
  • Powerful Active Directory Integration: Users belong to groups in AD, and Catalyst can summarize and profile by these groups, including the inactive user group for analysis of ex-employees’ data.
  • Automated Action Queries and Reports: Store pre-defined reports and set up a schedule for reports to be run. All reports can be logged and managed for historical views into the data.
  • Deep Content Analysis: Using full content indexing, files and email (Exchange, Notes, etc.) can be processed and their full text profiled for keywords or sensitive content, including Social Security or credit card numbers.
  • Flexible Disposition Options: Catalyst can manage the disposition of content, including defensible deletion, copying and moving without corrupting metadata, and archiving of sensitive content with long-term value.
  • Extreme Scalability: Data profiling can help manage small user shares that start at 5 TB and scale to the largest enterprise repositories consisting of petabytes of unstructured user data.
  • ACL Security: Integrates with Active Directory to support security assessments and audits. Allows organizations to detect sensitive documents and determine who has access to them, or investigate employees and determine what they have access to.
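
For planning purposes, the quoted rate of 1 TB per hour per node can be turned into a rough time estimate. The Python sketch below assumes that throughput scales linearly with the number of nodes and that the quoted rate is sustained across the whole data set; both are simplifying assumptions, and the function name `estimate_index_hours` is hypothetical.

```python
def estimate_index_hours(data_tb, nodes, tb_per_hour_per_node=1.0):
    """Estimate indexing time in hours, assuming throughput scales linearly with nodes."""
    if data_tb < 0 or nodes < 1 or tb_per_hour_per_node <= 0:
        raise ValueError("data_tb must be >= 0, nodes >= 1 and the rate > 0")
    return data_tb / (nodes * tb_per_hour_per_node)
```

Under these assumptions, a 5 TB share on a single node would take about 5 hours, and 1 PB (taken as 1,000 TB) spread across 4 nodes would take about 250 hours. Real throughput depends on network, storage and file mix, so treat the result as an upper-level estimate only.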