A cloud-based (CRM) data storage system with query support is provided. The system includes a memory storing processor-executable routines and a processor configured to execute routines to initiate a data backup operation for customers of the system. The processor is configured to receive source data having at least one of data schema, records/CRM data, and blob data and to transform the source data into a columnar storage format to generate converted source data and corresponding index files. Further, versioning data for the source data, converted source data and corresponding index files for each customer are stored. The processor is configured to receive a data query having query parameters from customers and to identify index files and/or data files to be scanned from the converted source data, and to scan the identified index files and/or data files to generate data query results.
G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A system includes a data management server and a data store. The data store uses an external file system to store the data of a client virtual machine (VM) which uses an internal file system. The server comprises an agent VM. Responsive to receiving a request to retrieve a file from a filesystem snapshot, the agent VM determines a path of the internal file system of the client VM and mounts a virtual file system at the agent VM based on the determined path. The virtual file system represents the internal file system of the client virtual machine. The agent VM uses a block device protocol to translate the internal file address to an external file address of the external file system. The agent VM retrieves the file stored in the data store based on the external file address and provides the retrieved file to the target device.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
3.
System and method for dynamically managing data protection recovery point objective SLA
The invention provides a dynamic data-backup scheduling system. The system includes a memory storing one or more processor-executable routines and a processor communicatively coupled to the memory. The processor is configured to execute one or more processor-executable routines to determine a dynamic backup schedule for one or more computing machines based upon data change patterns of the respective computing machines. The processor is further configured to access user inputs corresponding to data backup for the one or more computing machines. The user inputs comprise at least one of a tolerable data loss threshold, critical data type, file sensitivity data, and exceptions data time period information. The processor is further configured to create one or more service level agreement (SLA) configurations for data backup based on the user inputs. The processor is further configured to access historical data corresponding to one or more data backup cycles of the one or more computing machines. The historical data comprises one or more of time series changes, the addition and/or change of data information for pre-determined data sets, and corresponding file metadata of the data sets. The processor is further configured to implement one of the active and passive modes of data backup using the user inputs and the historical data. The processor is further configured to trigger a backup job for one or more computing machines based on a comparison between the SLA configurations and dynamic information received about data addition and/or data change of the computing machines in the active mode. The processor is further configured to trigger a backup job for one or more computing machines based on a comparison between the SLA configurations and historical time series change and/or addition of data of the computing machines in the passive mode.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A system includes a duplicative data store different from a cloud storage. The data store stores duplicative copies of data that are used by one or more virtual machines. The system receives a request to retrieve from the cloud storage data associated with a VM. The cloud storage is configured to store data in a first chunk granularity larger than a second chunk granularity of the duplicative data store. The system determines whether a duplicative copy of the requested data is stored in the data store, and responsive to determining the duplicative copy of the requested data is stored in the data store, the system retrieves the duplicative copy of requested data from the data store. The system bypasses a retrieval of the chunk from the cloud storage, and provides the duplicative copy as a response to the request.
The invention relates to a system and method for restoring data of a virtual machine. The system includes a memory for storing one or more processor-executable routines, and a processor communicatively coupled to the memory to execute the one or more processor-executable routines to access a plurality of virtual machine snapshots of the virtual machine from a storage. The snapshots include block-level snapshots. The processor is configured to scan the plurality of virtual machine snapshots to identify one or more malicious files present in the snapshots, and generate a report with details of malicious files present in the snapshots. The processor is configured to implement a first workflow to patch the identified malicious files while restoring the data. The processor is further configured to implement a second workflow to identify snapshots that comprise block signatures similar to the malicious files and to mark the identified snapshots as malicious snapshots.
A system for dynamic resource allocation during a data backup and/or restore of a backup data is presented. The system includes a resource allocation map generator configured to generate a resource allocation map for the current data backup and/or restore based on a mathematical model, real-time operating data corresponding to operating states of one or more resources, and historical data corresponding to data back-up and restore of one or more historical datasets. The system further includes a resource allocation recommender configured to generate a recommendation for resource allocation for the current data backup and/or restore based on the resource allocation map and a threshold value corresponding to a particular resource. The system moreover includes a resource allocator configured to dynamically initiate a change in resource allocation based on the generated recommendation. A related method is also presented.
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
7.
SYSTEM AND METHOD FOR PERFORMING A BACKUP OPERATION
A backup agent for performing a backup operation is provided. The backup agent comprises a memory storing one or more processor-executable routines and a processor communicatively coupled to a data storage system and configured to access unstructured data stored therein. The processor comprises a master node and a plurality of proxy nodes. The master node is configured to perform a backup operation by generating a plurality of threads to perform a scan operation, wherein during the scan operation a plurality of batches of data stored on a data storage device is scanned. The processor is further configured to create a global queue of the plurality of batches of data upon completion of the scan and assign the plurality of batches of data from the global queue to a plurality of proxy nodes and/or to the master node. The master node and each proxy node are configured to perform an upload operation by uploading the assigned batch of data on the cloud network, wherein the master node and the proxy nodes operate concurrently.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
8.
SYSTEM AND METHOD FOR PERFORMING ANTIVIRUS SCAN OF A VIRTUAL MACHINE
A system for performing an antivirus scan of backup snapshots of a virtual machine is provided. The system includes a memory storing one or more processor-executable routines and a processor communicatively coupled to the memory. The processor is configured to execute the one or more processor-executable routines to access a plurality of backup snapshots of the virtual machine and perform a complete antivirus scan of one or more snapshots to identify a reference snapshot. The reference snapshot includes one or more malicious files. The processor is further configured to compare file system metadata of a first snapshot with respective file system metadata of the reference snapshot to identify files that have changed between the first and reference snapshots and perform the antivirus scan of the identified changed files to detect one or more malicious files and to generate an updated list of the malicious files. The processor is further configured to access a second snapshot and the updated list of malicious files with corresponding metadata of the files and perform first, second and third screenings of the second snapshot to detect an infected snapshot. The processor is further configured to compare disk offsets and checksum of corresponding data at the disk offsets, metadata and file checksum respectively of the malicious files with corresponding files of the second snapshot to perform the first, second and third screenings in a sequential manner. The processor is further configured to repeat the first, second and third screenings for the plurality of backup snapshots of the virtual machine.
A data backup system configured to enable point in time recovery is presented. The data backup system is configured to enable point in time recovery using a full backup storage space and a unique log backup storage space thereby enabling parallel full backups and archive log backups. A related method is also presented.
A data restore system is provided. The data restore system includes a backup data storage configured to store data for a client and a data restore module configured to receive a restore trigger from the client and to initiate restore operation for selected data from the backup data storage in response to the received trigger. The data restore module is further configured to receive information regarding the selected data to be restored and access a metadata store to receive metadata information for the selected data and provide the metadata information and the downloaded data blocks to a controller to facilitate sorting of the downloaded data blocks based on the files they belong to and store the downloaded restored data to a target data storage. The data restore module is further configured to interact with the checkpointing module to track the progress of restore operation in persistent storage and to minimize rework when restore operation is restarted from interrupt.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/172 - Caching, prefetching or hoarding of files
G06F 16/174 - Redundancy elimination performed by the file system
A file system and a related method are presented. The file system includes a data storage including a plurality of data blocks; a merge index including a plurality of namespace entries, wherein the plurality of namespace entries include a plurality of blockmap entries and a plurality of local reference entries; a deduplication database including a plurality of deduplication indices and a plurality of global reference entries for a plurality of datasets; and an indexing system configured to generate the plurality of namespace entries and the plurality of global reference entries.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/172 - Caching, prefetching or hoarding of files
G06F 16/174 - Redundancy elimination performed by the file system
A file system and a related method are presented. The file system includes a data storage including a plurality of data blocks; a merge index including a plurality of namespace entries, wherein the plurality of namespace entries include a plurality of blockmap entries and a plurality of local reference entries; a deduplication database including a plurality of deduplication indices and a plurality of global reference entries for a plurality of datasets; and an indexing system configured to generate the plurality of namespace entries and the plurality of global reference entries.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/172 - Caching, prefetching or hoarding of files
G06F 16/174 - Redundancy elimination performed by the file system
System for delivering an event journal during a back-up session in a distributed file system is presented. The system includes an event intake module, a load balancer, a plurality of object creation modules, a journal manager, and a journal service module. Each object creation module of the plurality of object creation modules further includes an ingestor and a drainer. A related method is also presented. The system and method provide for reliable and time-ordered delivery of events in the event journal.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
A data management system manages concurrent readers and writers for large file scans. The data management system may read a plurality of data chunks of the file starting from different offsets and generate a bounded number of read requests, which causes a data chunk identifiable by a data offset to be loaded into a data buffer. The system may queue the loaded data chunks for generating write requests to release the loaded data chunks. One or more write requests are generated responsive to one or more data chunks being associated with a consecutive order of data offsets being successfully loaded to data buffers. The system may write data chunks released from the buffer-rounded reading stage to the data storage in a checkpointed writing stage. The checkpointed writing stage creates a checkpoint based on the data offset of the data chunks that have been completely transferred to the data storage.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/176 - Support for shared access to files; File sharing support
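The release rule in the abstract above, where chunks are written only once a consecutive run of offsets has been loaded and the checkpoint follows the last fully written offset, can be sketched as follows. This is a minimal single-threaded illustration under stated assumptions, not the claimed implementation; `OrderedReleaser`, `on_chunk_loaded`, and the in-memory `written` buffer are hypothetical names introduced here.

```python
class OrderedReleaser:
    """Buffers chunks that arrive out of order and releases them in offset order."""

    def __init__(self):
        self.pending = {}           # offset -> chunk bytes loaded but not yet released
        self.checkpoint = 0         # every byte before this offset has been written
        self.written = bytearray()  # hypothetical stand-in for the data storage

    def on_chunk_loaded(self, offset, data):
        """Record a loaded chunk, then write every chunk that is now contiguous."""
        self.pending[offset] = data
        # Release chunks only while the next expected offset is available.
        while self.checkpoint in self.pending:
            chunk = self.pending.pop(self.checkpoint)
            self.written += chunk
            self.checkpoint += len(chunk)
        return self.checkpoint


releaser = OrderedReleaser()
releaser.on_chunk_loaded(4, b"efgh")  # arrives early, stays buffered
releaser.on_chunk_loaded(8, b"ijkl")  # still waiting for offset 0
releaser.on_chunk_loaded(0, b"abcd")  # gap filled, all three chunks are released
```

A restart after an interruption would resume reading at `releaser.checkpoint`, since everything before that offset is already in storage.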
15.
SYSTEM AND METHOD FOR DATA BACK-UP USING A PROXY POOL
A data back-up system configured to back-up one or more data sets from one or more devices to a data back-up server by using a proxy pool is presented. The data back-up system includes a load-balancer configured to distribute the one or more data sets across the proxy pool. The load balancer includes a data receiver configured to receive types of files, number of files, and total size of each file in the one or more data sets; a load estimator configured to estimate a weighted average load of each data set based on the number of files, the total size of each file, a compressibility factor for each file type, and an encryption factor for each file type; and a load distributor configured to distribute the one or more data sets as a plurality of workloads across the proxy pool.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
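The load estimator and load distributor described in the proxy-pool abstract above can be illustrated with a short sketch. The abstract names the inputs (file counts, sizes, a compressibility factor and an encryption factor per file type) but not a formula, so the weighting below, the factor values, and the greedy least-loaded placement are all assumptions made for illustration.

```python
import heapq

# Hypothetical per-file-type factors; the abstract does not give values.
COMPRESSIBILITY = {"txt": 0.5, "jpg": 1.0}
ENCRYPTION = {"txt": 1.25, "jpg": 1.25}


def estimate_load(files):
    """Return an average load per file for a data set.

    `files` is a list of (file_type, size_in_bytes) pairs. Scaling each size by
    its type's two factors is an assumed formula, not one taken from the source.
    """
    if not files:
        return 0.0
    total = sum(size * COMPRESSIBILITY[ftype] * ENCRYPTION[ftype] for ftype, size in files)
    return total / len(files)


def distribute(data_sets, proxy_count):
    """Assign each data set to the currently least-loaded proxy (greedy heuristic)."""
    heap = [(0.0, proxy) for proxy in range(proxy_count)]
    heapq.heapify(heap)
    assignment = {proxy: [] for proxy in range(proxy_count)}
    # Place the heaviest data sets first so large jobs do not pile onto one proxy.
    ordered = sorted(data_sets.items(), key=lambda kv: estimate_load(kv[1]), reverse=True)
    for name, files in ordered:
        load, proxy = heapq.heappop(heap)
        assignment[proxy].append(name)
        heapq.heappush(heap, (load + estimate_load(files), proxy))
    return assignment


data_sets = {"a": [("jpg", 100)], "b": [("txt", 100)], "c": [("txt", 100)]}
plan = distribute(data_sets, proxy_count=2)
```

In this example the single image data set occupies one proxy while the two more compressible text data sets share the other.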
16.
SYSTEM AND METHOD FOR DATA DEDUPLICATION USING A BACKUP DATA BLOCK
A client-side deduplication system, and a related method, for a plurality of backup streams generated by a backup and recovery client from a client database are presented. The system includes a stream handler configured to generate a unique file name for an underlying file of each backup stream of the plurality of backup streams based on one or more data blocks in each backup stream. The system further includes a file creator configured to create a data file corresponding to each backup stream of the plurality of backup streams in a local cache of a client database server, wherein each data file has a file name corresponding to the unique file name generated by the stream handler. The system furthermore includes a dedupe module configured to dedupe subsequent backup streams based on the data files in the local cache.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
17.
SYSTEM AND METHOD FOR REFERENCE-AWARE APPLICATION RECREATION IN CONTAINER DEPLOYMENT ENVIRONMENTS
A system for reference-aware application recreation in a container deployment environment is presented. The system includes a reference detection module configured to detect and store one or more reference paths corresponding to each resource type of a plurality of resources in the container deployment environment; a resource ordering module configured to generate a recreation sequence by ordering the plurality of resources based on corresponding references at the reference paths, such that a referenced resource is recreated before the referring resource; and an application recreation module configured to recreate an application based on the recreation sequence. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
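The ordering constraint in the abstract above, that a referenced resource is recreated before the resource referring to it, is a topological ordering of the reference graph. A minimal sketch using Python's standard `graphlib` module follows; the resource names and the `references` mapping are hypothetical, and the source does not state which algorithm the resource ordering module uses.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the set of resources it references.
references = {
    "Pod/app": {"PersistentVolumeClaim/data", "ConfigMap/settings"},
    "PersistentVolumeClaim/data": {"StorageClass/fast"},
    "ConfigMap/settings": set(),
    "StorageClass/fast": set(),
}


def recreation_sequence(refs):
    """Order resources so each referenced resource precedes its referrers."""
    # TopologicalSorter treats the mapped values as predecessors of each key.
    return list(TopologicalSorter(refs).static_order())


order = recreation_sequence(references)
```

`TopologicalSorter` raises `graphlib.CycleError` when references form a cycle, which a real system would need to handle separately.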
18.
SYSTEM AND METHOD FOR DYNAMIC SCHEDULING OF BACKUPS IN SCALE-OUT COMPUTE ENVIRONMENTS
A data management system may schedule data backup operations. The system receives a request for backing up data stored in one or more disks in a data source and identifies one or more proxy slots for backing up the data. For at least one of the disks, the system maps the disk to each of the one or more proxy slots and maps each mapped proxy slot to a data store of the one or more data stores. The system estimates a scan duration time for backing up data from the disk using each mapped proxy slot with each mapped data store corresponding to the mapped proxy slot, selects a proxy slot based in part on the estimated scan duration time as the backup proxy slot for the disk, and instructs the selected proxy slot for backing up the data from the corresponding disk.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A data management system may maintain a total-size counter for the file system. The total-size counter represents a sum of data size related to snapshots backed up to the file system. The data management system may perform data operation cycles to the file system. Each data operation cycle corresponds to a snapshot that includes files. Each data operation cycle may include incrementing the total-size counter by the data size of the files in the snapshot exchanged with the file system and adding, to a snapshot record, the amount of increment in incrementing the total-size counter as an increment-size counter. A data management system may perform a correction operation to correct the total-size counter. The correction operation may change the total-size counter by a difference between the total of the increment-size counters in the snapshot records and the total data size of file data exchanged with the file system.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/172 - Caching, prefetching or hoarding of files
G06F 16/174 - Redundancy elimination performed by the file system
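The counter bookkeeping in the total-size abstract above can be sketched as follows. This is an illustration under assumptions: the class and method names are hypothetical, and the sign convention of the correction (actual data size minus the sum of recorded increments) is one reading of the abstract rather than something it states.

```python
class FileSystemAccounting:
    """Tracks a total-size counter plus one increment-size counter per snapshot."""

    def __init__(self):
        self.total_size = 0
        self.snapshot_records = {}  # snapshot id -> increment-size counter

    def record_cycle(self, snapshot_id, file_sizes):
        """One data operation cycle: bump the total and remember the increment."""
        increment = sum(file_sizes)
        self.total_size += increment
        self.snapshot_records[snapshot_id] = increment

    def correct(self, actual_total):
        """Shift the total by (actual data size - sum of recorded increments)."""
        recorded = sum(self.snapshot_records.values())
        drift = actual_total - recorded
        self.total_size += drift
        return drift


acc = FileSystemAccounting()
acc.record_cycle("snap-1", [100, 50])
acc.record_cycle("snap-2", [150])
# Suppose only 250 bytes actually reached the file system, e.g. after a failed upload.
drift = acc.correct(actual_total=250)
```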
A system to optimize scheduling of a data backup and/or restore of a backup data in a data backup/restore environment is presented. The system includes a training module configured to train an artificial intelligence (AI) model based on historical data corresponding to data backup and/or restore of one or more training datasets. The system further includes a time estimator configured to estimate an estimated time taken for the data backup and/or restore of the backup data to a data backup server or a restore location based on the trained AI model and operating data corresponding to operating states of one or more resources in the data backup/restore environment. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
A system for performing root-cause analysis of cost and/or usage anomalies in a shared data protection environment is presented. The shared data protection environment includes a backup/restore system configured to back up data in a storage server and/or restore data from the storage server. The system is configured to perform the root-cause analysis based on storage server data and backup/restore system telemetry data.
A system for secure recovery of an application group in a container deployment environment is presented. The system includes a backup controller configured to access an application group token and generate a corresponding backup token. The system further includes a backup module configured to initiate a backup based on the backup token and create a corresponding recovery point on a backup server. The system further includes a recovery access token module configured to access a recovery access token for a determined recovery point. The system further includes a recovery controller configured to generate a recovery token corresponding to the determined recovery point based on the recovery access token, and a recovery module configured to initiate a recovery of the application group from the backup server in a destination cluster based on the recovery token. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
23.
Space-efficient change journal for a storage system
A space-efficient change journal for a storage system is provided. The storage system includes a memory having computer readable instructions stored therein. The system further includes a processor configured to access a log structure merge (LSM) tree-based metadata index having metadata for the storage system. The LSM tree-based metadata index includes indices placed in a plurality of indexing layers and one or more indices are merged within the indexing layers in response to updates to metadata, or as a background task. The processor is configured to identify one or more indices of the LSM tree-based metadata index as entries of a change journal of the storage system. The processor is further configured to maintain the change journal of the storage system based upon the identified entries.
Techniques and mechanisms described herein provide for global deduplication in a cloud-based storage system. According to various embodiments, a global segment reference map can be created for data segments when a data segment has not been previously added to the global segment reference map. For each data segment not added to the global segment reference map, those data segments can be deleted from a cloud storage location.
A system for dynamically optimizing redundant backup of one or more data sets of a plurality of data sets from a client device to a tertiary storage is presented. The system includes a user input module, a parameter comparison module, a backup path selector, and a redundant backup module. The system is configured to dynamically switch between two backup paths including: (A) direct redundant backup of the data set from the client device to the tertiary storage, or (B) back up of the data set from the client device to a secondary storage and redundant backup of the data set from the secondary storage to the tertiary storage. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
26.
Cloudcache implementation for an object storage-based file system
The present invention discloses a file storage system including an object storage for storing data blocks for a client, a merge index database to store metadata corresponding to the stored data blocks using a merge index, a cloudcache located on a premise of the client, and a cloudcache implementation module communicatively coupled to the object storage, the cloudcache and the merge index database. The cloudcache implementation module facilitates data backup and restore operations for the client in accordance with a data retention policy, where one or more data blocks are stored on the cloudcache and a sync operation is performed between the cloudcache and the object storage. A backup of the data blocks is performed to the cloudcache in a backup operation, data blocks are compacted, and a restore operation for data blocks stored on the cloudcache is performed in accordance with the data retention policy.
A server manager for detecting ransomware includes a server interface to retrieve, from a storage device, a backup of a plurality of files stored by a client device. A ransomware detection module includes a statistical filter to generate a standard pattern of file activities of the client device for a time period. A statistical behavior analysis is performed on the backup of the plurality of files based on the standard pattern to identify a portion of the backup corresponding to a statistical anomaly different from the standard pattern. The statistical anomaly corresponds to an abnormal file activity. An entropy detector generates an entropy score for the portion of the backup. The entropy score represents a randomness of a distribution of bits in a block of a file in the portion of the backup. It is determined whether the backup includes the ransomware based on the generated entropy score.
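The entropy score described in the ransomware-detection abstract above can be illustrated with Shannon entropy over a block's byte distribution. This is a sketch under assumptions: the abstract speaks of the randomness of a distribution of bits and does not name a formula, and `entropy_score`, `looks_encrypted`, and the 7.5 threshold are hypothetical.

```python
import math
from collections import Counter


def entropy_score(block: bytes) -> float:
    """Shannon entropy of the byte distribution in `block`, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_encrypted(block: bytes, threshold: float = 7.5) -> bool:
    """Flag a block whose entropy exceeds a hypothetical threshold."""
    return entropy_score(block) > threshold


uniform_block = bytes(range(256)) * 4  # every byte value equally likely
constant_block = b"\x00" * 1024        # a single repeated byte value
```

Encrypted or compressed data tends toward the top of the 0 to 8 range, which is why such a score is only meaningful when combined with the anomaly filtering the abstract describes.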
A system for performing ransomware scan is presented. The system includes a snapshot access module configured to access a base snapshot corresponding to a dataset. The system further includes a log access module configured to access a log of modified metadata and/or data blocks from a data back-up server corresponding to a subsequent snapshot versus the base snapshot. The system moreover includes an incremental block module configured to download one or more incremental metadata and/or data blocks from the data back-up server based on the log of modified metadata and/or data blocks. The system further includes a snapshot write module configured to write the one or more incremental metadata and/or data blocks on the base snapshot to generate an incremental snapshot. The system furthermore includes a ransomware scan module configured to scan the incremental snapshot to check for ransomware. A related method is also presented.
G06F 21/56 - Computer malware detection or handling, e.g. anti-virus arrangements
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A system for scanning a file system is presented. The system includes a memory storing one or more processor-executable routines; and a processor communicatively coupled to the memory. The processor is configured to execute the one or more processor-executable routines to execute a file system scan using a depth-first concurrent scan method; create one or more checkpoints during the file system scan based on one or more predefined time intervals; and restart a scan from a latest checkpoint of the plurality of checkpoints. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
A data management server may receive data associated with a blockchain unit generated on a blockchain. The data received may include on-chain data and off-chain data. The data management server may create a data collection associated with the blockchain unit. The data collection may include the received data that is stored in one or more entries of transactions associated with the blockchain unit. One of the entries of transactions may include the on-chain data that is stored on the blockchain. The data management server may store, off-chain, the data collection associated with the blockchain unit. The data management server may generate an off-chain address for a user to retrieve the data collection. The off-chain address allows the user to review one of the entries of transactions off-chain. The off-chain address may be a global namespace that can be accessed without a particular system or file structure.
An incremental backup agent performs backup operations that synchronize a database on the client side to a server database. In one embodiment, such backup operations are incremental backups, where the agent may identify differences between the current directory and the latest backed-up version. The agent may issue a direct RPC using SMB or NFS protocols to fetch all entries of a directory with metadata in a single RPC call, instead of issuing one call to fetch metadata for each directory entry. The agent may identify changes efficiently by checking for checksum changes in a DFS manner. Starting from a root directory, the agent may generate a checksum for each directory and compare the checksums on the client side with the retrieved fingerprints, and if the backup agent identifies that the fingerprints match, the backup agent may then go to a deeper level and compare the fingerprints for child directories.
A data management system manages concurrent readers and writers for large file scans. The data management system may read a plurality of data chunks of the file starting from different offsets and generate a bounded number of read requests, which causes a data chunk identifiable by a data offset to be loaded into a data buffer. The system may queue the loaded data chunks for generating write requests to release the loaded data chunks. One or more write requests are generated responsive to one or more data chunks being associated with a consecutive order of data offsets being successfully loaded to data buffers. The system may write data chunks released from the buffer-rounded reading stage to the data storage in a checkpointed writing stage. The checkpointed writing stage creates a checkpoint based on the data offset of the data chunks that have been completely transferred to the data storage.
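The in-order release described in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: the function name release_in_order, the fixed chunk_size, and the use of a min-heap are hypothetical choices, not details taken from the patent. Chunks may finish loading in any order, but a chunk is written only once every chunk before it has been loaded, and the checkpoint is the offset up to which data has been fully written.

```python
import heapq

def release_in_order(loaded_offsets, chunk_size, start=0):
    """Simulate releasing loaded chunks to the writer in consecutive offset order.

    loaded_offsets: offsets in the order their reads completed (hypothetical input).
    Returns the offsets in the order they were written and the final checkpoint.
    """
    pending = []         # min-heap of loaded offsets that cannot be written yet
    checkpoint = start   # every byte before this offset has been written
    written = []
    for offset in loaded_offsets:
        heapq.heappush(pending, offset)
        # Write as long as the smallest pending offset is the next expected one.
        while pending and pending[0] == checkpoint:
            written.append(heapq.heappop(pending))
            checkpoint += chunk_size
    return written, checkpoint

written, checkpoint = release_in_order([8, 0, 4, 16], chunk_size=4)
```

In this run the chunk at offset 16 stays buffered because offset 12 has not been loaded, so a restart would resume from the checkpoint rather than from the beginning.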
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/176 - Support for shared access to files; File sharing support
33.
System and method for context-aware application management in container deployment environments
A system for context-aware application group management in a container deployment environment is presented. The system includes a memory storing one or more processor-executable routines and a processor. The processor is configured to execute the one or more processor-executable routines to receive an instance identifier (ID) corresponding to an application group based on a re-registration request for the application group by the user; identify an instance corresponding to the instance ID from an instance database and a context based on the instance from a context database; generate a re-registration token comprising a context ID corresponding to the context, and initiate a re-registration workflow to create a new instance of the application group based on the re-registration token and the context ID. A related method is also presented.
A system for automatically identifying an application group in a container deployment environment is presented. The system includes a reference detection module configured to detect and store one or more reference paths corresponding to each resource type of a plurality of resources in the container deployment environment. The system further includes a resource classification module configured to assign a resource class to each resource type of the plurality of resources. The system moreover includes a resource grouping module configured to group the plurality of resources into one or more resource groups, for each namespace, based on the corresponding resource type, resource class, and one or more reference paths. The system furthermore includes an application group definition module configured to generate an application group definition based on the one or more resource groups. A related method is also presented.
A file system and a related method are presented. The file system includes an object storage configured to store file data for one or more files and a plurality of namespace entries corresponding to file data and/or metadata of the one or more files as one or more objects. Each namespace entry of the plurality of namespace entries includes an operation type conducted on the file data and/or metadata captured in a particular snapshot and a version number corresponding to the particular snapshot. The file system further includes an indexing system configured to generate the plurality of namespace entries; store the plurality of namespace entries as one or more objects in the object storage; and identify, in response to a search query, one or more files for retrieval from the object storage based on a list of the plurality of namespace entries sorted on the version numbers.
A file system and a related method are presented. The file system includes a data storage including a plurality of data blocks; a merge index including a plurality of namespace entries, wherein the plurality of namespace entries include a plurality of blockmap entries and a plurality of local reference entries; a deduplication database including a plurality of deduplication indices and a plurality of global reference entries for a plurality of datasets; and an indexing system configured to generate the plurality of namespace entries and the plurality of global reference entries.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/172 - Caching, prefetching or hoarding of files
G06F 16/174 - Redundancy elimination performed by the file system
A data management system may maintain a total-size counter for the file system. The total-size counter represents a sum of data size related to snapshots backed up to the file system. The data management system may perform data operation cycles to the file system. Each data operation cycle corresponds to a snapshot that includes files. Each data operation cycle may include incrementing the total-size counter by the data size of the files in the snapshot exchanged with the file system and adding, to a snapshot record, the amount of increment in incrementing the total-size counter as an increment-size counter. A data management system may perform a correction operation to correct the total-size counter. The correction operation may change the total-size counter by a difference between the total of the increment-size counters in the snapshot records and the total data size of file data exchanged with the file system.
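A minimal sketch of the counter bookkeeping described in this abstract follows. The class name SizeTracker, its method names, and the sign convention of the correction (subtracting any overcount) are assumptions made for illustration; the abstract does not specify them.

```python
class SizeTracker:
    """Hypothetical bookkeeping for a total-size counter with per-snapshot increments."""

    def __init__(self):
        self.total_size = 0
        self.snapshot_records = {}  # snapshot id -> increment-size counter

    def record_cycle(self, snapshot_id, file_sizes):
        # One data operation cycle: bump the total and remember the increment.
        increment = sum(file_sizes)
        self.total_size += increment
        self.snapshot_records[snapshot_id] = increment
        return increment

    def correct(self, actual_total):
        # Assumed convention: remove the gap between what the snapshot
        # records claim and what was actually exchanged with the file system.
        recorded = sum(self.snapshot_records.values())
        difference = recorded - actual_total
        self.total_size -= difference
        return difference

tracker = SizeTracker()
tracker.record_cycle("snap-1", [10, 20])
tracker.record_cycle("snap-2", [5])
drift = tracker.correct(actual_total=33)
```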
G06F 16/172 - Caching, prefetching or hoarding of files
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/174 - Redundancy elimination performed by the file system
A data restore system is provided. The data restore system includes a backup data storage configured to store data for a client and a data restore module configured to receive a restore trigger from the client and to initiate restore operation for selected data from the backup data storage in response to the received trigger. The data restore module is further configured to receive information regarding the selected data to be restored and access a metadata store to receive metadata information for the selected data and provide the metadata information and the downloaded data blocks to a controller to facilitate sorting of the downloaded data blocks based on the files they belong to and store the downloaded restored data to a target data storage. The data restore module is further configured to interact with the checkpointing module to track the progress of restore operation in persistent storage and to minimize rework when restore operation is restarted from interrupt.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/172 - Caching, prefetching or hoarding of files
G06F 16/174 - Redundancy elimination performed by the file system
A file system and a related method are presented. The file system includes a data storage including a plurality of data blocks; a merge index including a plurality of namespace entries, wherein the plurality of namespace entries include a plurality of blockmap entries and a plurality of local reference entries; a deduplication database including a plurality of deduplication indices and a plurality of global reference entries for a plurality of datasets; and an indexing system configured to generate the plurality of namespace entries and the plurality of global reference entries.
G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/172 - Caching, prefetching or hoarding of files
G06F 16/174 - Redundancy elimination performed by the file system
A system to optimize scheduling of a data backup and/or restore of a backup data in a data backup/restore environment is presented. The system includes a training module configured to train an artificial intelligence (AI) model based on historical data corresponding to data backup and/or restore of one or more training datasets. The system further includes a time estimator configured to estimate an estimated time taken for the data backup and/or restore of the backup data to a data backup server or a restore location based on the trained AI model and operating data corresponding to operating states of one or more resources in the data backup/restore environment. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A resource allocation map generator generates a resource allocation map for a current data backup and/or restore based on a mathematical model, real-time operating data corresponding to operating states of one or more resources, and historical data corresponding to data back-up and restore of one or more historical datasets. A resource allocation recommender generates a recommendation for resource allocation for the current data backup and/or restore based on the resource allocation map and a threshold value corresponding to a particular resource. A resource allocator dynamically initiates a change in resource allocation based on the generated recommendation.
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
42.
System and method for redundant backup of datasets
A system for dynamically optimizing redundant backup of one or more data sets of a plurality of data sets from a client device to a tertiary storage is presented. The system includes a user input module, a parameter comparison module, a backup path selector, and a redundant backup module. The system is configured to dynamically switch between two backup paths including: (A) direct redundant backup of the data set from the client device to the tertiary storage, or (B) back up of the data set from the client device to a secondary storage and redundant backup of the data set from the secondary storage to the tertiary storage. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
43.
SYSTEM AND METHOD FOR REFERENCE-AWARE APPLICATION IDENTIFICATION IN CONTAINER DEPLOYMENT ENVIRONMENTS
A system for identifying a plurality of resources that define an application in a container deployment environment is presented. The system includes a reference detection module configured to detect and store one or more reference paths corresponding to each resource type. The system includes a resource identification module configured to receive at least one information corresponding to an application definition from a user and identify each resource corresponding to at least one information. The system includes an application definition module configured to (a) scan one or more references at a reference path of each identified resource to identify one or more additional referenced resources; (b) repeat step (a) for the one or more additional referenced resources until all the resources of the plurality of resources that define the application are identified; and (c) generate an application definition based on all the resources identified. A related method is also presented.
A system for reference-aware application recreation in a container deployment environment is presented. The system includes a reference detection module configured to detect and store one or more reference paths corresponding to each resource type of a plurality of resources in the container deployment environment; a resource ordering module configured to generate a recreation sequence by ordering the plurality of resources based on corresponding references at the reference paths, such that a referenced resource is recreated before the referring resource; and an application recreation module configured to recreate an application based on the recreation sequence. A related method is also presented.
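The ordering constraint in this abstract, that a referenced resource is recreated before the resource referring to it, is a topological ordering of the reference graph. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the function name and the example resource names are hypothetical, and the patent does not say which ordering algorithm is used.

```python
from graphlib import TopologicalSorter

def recreation_sequence(references):
    """Order resources so that each referenced resource comes first.

    references maps a resource name to the set of resources it refers to.
    """
    # TopologicalSorter treats the mapped values as predecessors, so
    # static_order() yields referenced resources before their referrers.
    return list(TopologicalSorter(references).static_order())

# Hypothetical reference graph for illustration only.
references = {
    "Service": {"Deployment"},
    "Deployment": {"ConfigMap", "Secret"},
    "ConfigMap": set(),
    "Secret": set(),
}
order = recreation_sequence(references)
```

A circular reference would raise graphlib.CycleError, which a real system would need to handle separately.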
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
45.
System and method for secure recovery of application group in container deployment environments
A system for secure recovery of an application group in a container deployment environment is presented. The system includes a backup controller configured to access an application group token and generate a corresponding backup token. The system further includes a backup module configured to initiate a backup based on the backup token and create a corresponding recovery point on a backup server. The system further includes a recovery access token module configured to access a recovery access token for a determined recovery point. The system further includes a recovery controller configured to generate a recovery token corresponding to the determined recovery point based on the recovery access token, and a recovery module configured to initiate a recovery of the application group from the backup server in a destination cluster based on the recovery token. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
46.
Folder scan system and method for incremental backup
A folder scan system configured to identify modified folders in a storage module including a plurality of folders during an incremental backup scan is presented. The folder scan system is configured to identify modified folders using a learning-based technique. A related method is also presented.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
47.
Data guardianship in a cloud-based data storage system
Techniques and mechanisms described herein provide for verification of data across cloud-based and on-premises data storage systems. According to various embodiments, a backup client implemented on a first compute node can store a data file in a backup data repository. A data guardianship can store first data file state information describing the data file in a key-value store accessible via the internet. A data verification instance can analyze the backup data repository to verify that the data file is stored intact in the backup data repository.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A system and a method for on-demand search of a large data-set is presented. The system includes a data indexer, an index writer, and an index reader. The data indexer is configured to index the data set. The index writer is configured to create a multi-level directory including a plurality of directories having one or more hash partitions. The index writer is further configured to generate a hash table for each directory and write data from the indexed data set into a corresponding hash partition of a directory. The index reader is configured to identify and query a hash partition in each directory based on a search term and a corresponding hash table for the directory. The index reader is further configured to retrieve one or more relevant records, and present the one or more relevant records to a user.
A system and a method for client-side deduplication of a plurality of backup streams, generated by a backup and recovery client from a client database, are presented. The system includes a stream handler configured to generate a unique file name for an underlying file of each backup stream of the plurality of backup streams based on one or more data blocks in each backup stream. The system further includes a file creator configured to create a data file corresponding to each backup stream of the plurality of backup streams in a local cache of a client database server, wherein each data file has a file name corresponding to the unique file name generated by the stream handler. The system furthermore includes a dedupe module configured to dedupe subsequent backup streams based on the data files in the local cache.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
50.
System and method for data back-up using a proxy pool
A data back-up system configured to back-up one or more data sets from one or more devices to a data back-up server by using a proxy pool is presented. The data back-up system includes a load-balancer configured to distribute the one or more data sets across the proxy pool. The load balancer includes a data receiver configured to receive types of files, number of files, and total size of each file in the one or more data sets; a load estimator configured to estimate a weighted average load of each data set based on the number of files, the total size of each file, a compressibility factor for each file type, and an encryption factor for each file type; and a load distributor configured to distribute the one or more data sets as a plurality of workloads across the proxy pool.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
51.
Restoration of warm and cold data blocks based on storage time by batches
A data restoration system including a data management server. The data management server receives one or more data restoration requests for restoring a plurality of data blocks. The data management server determines, based on metadata associated with the data blocks, a first subset of warm data blocks corresponding to warm-tier data and a second subset of cold data blocks corresponding to cold-tier data. The data management server retrieves the warm data blocks in the first subset and restores the warm data blocks in the first subset. The data management server groups the cold data blocks based in part on storage times of the cold data blocks to generate a plurality of cold-tier data retrieval requests. The data management server retrieves the cold data blocks by batches, each batch corresponding to one of the cold-tier data retrieval requests. The data management server restores the cold data blocks in the second subset.
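The split between warm and cold blocks and the batching of cold blocks by storage time might be sketched as follows. The record fields (id, tier, stored_at), the function name plan_restore, and the one-hour grouping window are all assumptions for illustration; the abstract only says that grouping is based in part on storage times.

```python
from itertools import groupby

def plan_restore(blocks, window_seconds=3600):
    """Return warm block ids plus cold block ids grouped into retrieval batches."""
    warm = [b["id"] for b in blocks if b["tier"] == "warm"]
    cold = sorted(
        (b for b in blocks if b["tier"] == "cold"),
        key=lambda b: b["stored_at"],
    )
    # Assumed policy: cold blocks stored in the same time window share a batch.
    batches = [
        [b["id"] for b in group]
        for _, group in groupby(cold, key=lambda b: b["stored_at"] // window_seconds)
    ]
    return warm, batches

blocks = [
    {"id": "a", "tier": "warm", "stored_at": 10},
    {"id": "b", "tier": "cold", "stored_at": 100},
    {"id": "c", "tier": "cold", "stored_at": 7300},
    {"id": "d", "tier": "cold", "stored_at": 200},
]
warm, batches = plan_restore(blocks)
```

Each inner list would correspond to one cold-tier retrieval request, while the warm blocks can be restored immediately.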
A file system may include an object storage, a merged index, and a distributed database. When a file is stored in the file system, the file may be converted to an object and be stored in the object storage. The deduplication index of the file may be stored in the distributed database. The namespace metadata of the file may be stored in the merged index. The merged index generates namespace entries of the file when the file is created, deleted, and/or modified. A namespace entry may be associated with a specific file and may include a creation version and a deletion version. When a file is deleted or modified, instead of modifying the existing namespace entries, new entries associated with different versions and including different creation or deletion versions are created. The status of a file may be monitored by one or more entries associated with a file.
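The append-only versioning described in this abstract can be illustrated with a small sketch. The names NamespaceEntry and MergedIndex, and the rule that a file deleted at version N is no longer visible at version N, are assumptions for illustration rather than details from the patent. The point shown is that deletions add new entries instead of modifying existing ones.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NamespaceEntry:
    path: str
    creation_version: int
    deletion_version: Optional[int] = None

class MergedIndex:
    """Hypothetical append-only index of namespace entries."""

    def __init__(self):
        self.entries = []

    def create(self, path, version):
        self.entries.append(NamespaceEntry(path, version))

    def delete(self, path, version):
        versions = [e.creation_version for e in self.entries if e.path == path]
        if not versions:
            raise KeyError(path)
        # Append a new entry carrying the deletion version; nothing is rewritten.
        self.entries.append(NamespaceEntry(path, max(versions), version))

    def exists(self, path, version):
        mine = [e for e in self.entries if e.path == path]
        created = {e.creation_version for e in mine if e.creation_version <= version}
        deleted = {
            e.creation_version
            for e in mine
            if e.deletion_version is not None and e.deletion_version <= version
        }
        return bool(created - deleted)

index = MergedIndex()
index.create("/a", 1)
index.delete("/a", 3)
index.create("/a", 5)
```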
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/13 - File access structures, e.g. distributed indices
A system includes a data store and a data management server. The data store stores a plurality of backup snapshots that capture states of a device at different times. The data management server receives a request to restore the device that is potentially malware affected. The data management server retrieves a first backup snapshot of the device. The data management server determines that a first file stored in the first backup snapshot is malware affected. The data management server checks one or more corresponding versions of the first file captured in one or more previous backup snapshots to identify a clean version of the first file. The data management server determines that a second file stored in the first backup snapshot is clean. The data management server restores data in the device.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 21/56 - Computer malware detection or handling, e.g. anti-virus arrangements
A system for selectively restoring data from a data back-up server is presented. The system includes a data access module configured to access a stateN of the data from a primary data source at a point N. The system further includes a log access module configured to access a log of modified meta-data and data blocks (MMDBs), from the primary data source or the data back-up server, corresponding to a data back-up point previous to the point N. The system furthermore includes a data restore module configured to iteratively perform selective restore of the data, based on the stateN and the MMDBs, from the data back-up server to a restore destination, until the data is restored to a stateRP corresponding to a recovery point (RP), as defined by a user. A related method is also presented.
A system includes a data management server and a data store. The data store uses an external file system to store data blocks of a client virtual machine. The client virtual machine uses an internal file system. The data management server comprises a proxy agent and a staging virtual machine. In response to receiving a request to retrieve a file indexed by the client virtual machine, the proxy agent imports data of the internal file system to a staging virtual machine. The proxy agent determines an internal file address that corresponds to the requested file. The staging virtual machine translates the internal file address to an external file address of the external file system. The staging virtual machine retrieves the file stored in the data store based on the external file address. The staging virtual machine provides the retrieved file to the target device.
G06F 7/00 - Methods or arrangements for processing data by operating upon the order or content of the data handled
G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
A system for delivering an event journal during a back-up session in a distributed file system is presented. The system includes an event intake module, a load balancer, a plurality of object creation modules, a journal manager, and a journal service module. Each object creation module of the plurality of object creation modules further includes an ingestor and a drainer. A related method is also presented. The system and method provide for reliable and time-ordered delivery of events in the event journal.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
A system identifies and causes transmission of differential data generated during device migration. An administrative server accesses a first backup snapshot of a retiring client device. The first backup snapshot includes a set of files stored in the retiring client device during a first checkpoint. The administrative server transmits the set of files in the first backup snapshot to a replacement client device. A cloud server stores the first backup snapshot and a second backup snapshot of the retiring client device. The second backup snapshot is created during a second checkpoint occurring after transmission of the set of files. The cloud server receives an indication that a user has logged on to the replacement client device and causes a transmission of differential data to the replacement client device. The differential data includes at least one file in the second backup snapshot that is not included in the first backup snapshot.
A system for data backup is provided. The system includes a memory having computer-readable instructions stored therein and a processor configured to execute the computer-readable instructions to receive a request for full data and/or incremental backup for a volume and to perform a full backup of the volume based on a first block size in response to a full data backup request. The processor is configured to generate a digital fingerprint of the full backup and determine if the full backup exists on a backup media based on the generated digital fingerprint and to upload the full backup to the backup media if it is determined that the first backup is unavailable on the backup media and perform an incremental backup of the volume based on a second block size in response to an incremental backup request. The second block size is substantially smaller than the first block size.
G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
59.
Event based aggregation for distributed scale-out storage systems
A system for estimating one or more data storage parameters and/or statistics in a data storage system is presented. The data storage system includes a plurality of storage containers. The system includes a snapshot module, a container stats aggregator, a synchronization module, a global stats aggregator, and a storage stats estimator.
A system for dynamic file chunking is provided. The system includes a memory and a processor configured to access one or more files to be chunked for a data backup operation and to identify a type of the one or more files. The type of the file is based upon an extension of the respective file. The processor is configured to analyze storage data associated with each type of files corresponding to a plurality of chunking techniques. The processor is configured to associate each of the files with a corresponding data chunk size and a chunking technique class based upon the analyzed storage data and to analyze data backup parameters in real time during the data backup operation and to update at least one of the data chunk size and the chunking technique for each of the type of files based upon the data backup parameters.
A backup management system may include a data management server, a warm-tier data store, and a cold-tier data store. Snapshots may be captured from various client devices. A data block stored in the warm-tier data store may be referenced by multiple backup snapshots and/or referenced by one or more users. When a data block's total reference count is equal to the cold reference count or equal to or less than a threshold total reference count, the data management server may determine that the data block is ready to be migrated to the cold-tier data store. The data management server may send the data block into a candidate queue. In the queue, data blocks with similar retention periods or similar expected restoration may be grouped as a unit. The unit may be transmitted to the cold-tier data store in a single write request.
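The migration test and the grouping step in this abstract might look like the sketch below. The field names, the function names, and the choice to treat "similar retention periods" as exactly equal retention values are simplifying assumptions for illustration only.

```python
from collections import defaultdict

def is_migration_candidate(total_refs, cold_refs, threshold):
    # Ready when every reference is a cold reference, or when the total
    # reference count is at or below the threshold.
    return total_refs == cold_refs or total_refs <= threshold

def group_for_cold_tier(blocks, threshold):
    """Group candidate block ids into units keyed by retention period."""
    units = defaultdict(list)
    for block in blocks:
        if is_migration_candidate(block["total_refs"], block["cold_refs"], threshold):
            # Assumption: equal retention stands in for "similar" retention.
            units[block["retention_days"]].append(block["id"])
    return dict(units)

blocks = [
    {"id": "x", "total_refs": 3, "cold_refs": 3, "retention_days": 30},
    {"id": "y", "total_refs": 5, "cold_refs": 1, "retention_days": 30},
    {"id": "z", "total_refs": 1, "cold_refs": 0, "retention_days": 90},
    {"id": "w", "total_refs": 2, "cold_refs": 2, "retention_days": 30},
]
units = group_for_cold_tier(blocks, threshold=1)
```

Each value in the result would be sent to the cold-tier data store as one unit in a single write request.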
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A proactive data recovery system is provided. The system includes a memory having computer-readable instructions stored therein and a processor configured to execute the computer-readable instructions to access a data storage platform and to monitor a plurality of parameters indicative of a requirement of data restore and/or recovery for the data storage platform. The requirement corresponds to a predicted occurrence of a disaster event. The processor is further configured to trigger backup of data stored in the data storage platform based upon the plurality of parameters to create a restore package and to initiate the data restore and/or data recovery operation for the data storage platform using the restore package in response to the occurrence of the disaster event.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
Techniques and mechanisms described herein provide for facilitating communications between one or more client machines and one or more cloud storage providers. According to various embodiments, a virtual machine may communicate with one or more client machines via a standard data storage protocol. The virtual machine may also communicate with one or more cloud storage providers via the internet. The virtual machine may then perform operations such as receiving data from a client machine and storing it to the cloud, retrieving data from the cloud and providing it to the client machine, querying data stored in the cloud, reporting on and verifying data stored in the cloud, and transferring and/or transforming data stored in the cloud.
A system for recommending a disaster recovery failover region of a public cloud service provider is provided. The system includes a memory having computer-readable instructions stored therein and a processor configured to execute the computer-readable instructions to detect a disaster recovery requirement for one or more clients of the public cloud service provider. The one or more clients are predicted to be affected by a disaster. The processor is further configured to monitor one or more disaster recovery (DR) factors associated with geological and meteorological conditions, legal and compliance requirements, network latency and costs for a plurality of disaster recovery regions associated with the public cloud service provider and to recommend a disaster recovery failover region for each of the one or more clients affected by the occurrence of the disaster based on the one or more DR factors.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
Disclosed embodiments include a method (system and non-transitory computer-readable medium) for backing up updated portions of a plurality of files having hierarchical relationships through object storage. In one or more embodiments, a file is segregated into chunks, and objects corresponding to the chunks are generated for storage at an object storage. For a chunk, an object for storing the chunk and additional objects for storing mapping information are generated. The mapping information may include path information identifying a path of the file in a hierarchical structure, a file version list identifying a version of the file, a chunk list describing an association between the file and the chunks, a chunk version list identifying a version of the chunk, etc. When a portion of the file is updated, objects corresponding to the updated portion of the file can be generated and stored at the object storage.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/185 - Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
66.
Global namespace in a cloud-based data storage system
Techniques and mechanisms described herein provide for verifying data across cloud-based and on-premises data storage systems. According to various embodiments, data can be received by a gateway from a client machine and stored in a file data repository accessible via the internet. The stored data can have a common master namespace. Indications of updates to the master namespace can be received. Updated namespace data and metadata can be transmitted to the gateway.
A space-efficient change journal for a storage system is provided. The storage system includes a memory having computer readable instructions stored therein. The system further includes a processor configured to access a log structure merge (LSM) tree-based metadata index having metadata for the storage system. The LSM tree-based metadata index includes indices placed in a plurality of indexing layers and one or more indices are merged within the indexing layers in response to updates to metadata, or as a background task. The processor is configured to identify one or more indices of the LSM tree-based metadata index as entries of a change journal of the storage system. The processor is further configured to maintain the change journal of the storage system based upon the identified entries.
A data storage system allows data to be encrypted and de-duplicated at the same system. By way of example, a server of the data storage system may request a client device which intends to upload a data block to transmit a first fingerprint of the data block to the server. The first fingerprint may be derived from the plaintext of the data block. The server may apply a one-way function to the first fingerprint to generate an encryption key and transmit the encryption key to the client device. The client device uses the encryption key to encrypt the data block and generates a second fingerprint which is derived from the ciphertext of the data block. The server uses both the first fingerprint and the second fingerprint to verify the data block and the legitimacy of the client attempting to upload the data block.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
H04L 9/00 - Arrangements for secret or secure communications; Network security protocols
G06F 3/06 - Digital input from, or digital output to, record carriers
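The encrypt-then-deduplicate flow in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented method: SHA-256 as the fingerprint, HMAC-SHA256 as the one-way function, the `SERVER_SECRET` value, the toy XOR keystream cipher, and the verification rule in `server_verify` are all assumptions introduced here. The toy cipher is deterministic on purpose, so identical plaintexts yield identical ciphertexts, but it is not secure and stands in for a real cipher only.

```python
import hashlib
import hmac

# Hypothetical server-side secret; not taken from the source.
SERVER_SECRET = b"hypothetical-server-secret"


def fingerprint(data: bytes) -> bytes:
    """Fingerprint of a byte string (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def derive_key(first_fingerprint: bytes) -> bytes:
    """Server side: one-way function from plaintext fingerprint to key."""
    return hmac.new(SERVER_SECRET, first_fingerprint, hashlib.sha256).digest()


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Deterministic XOR keystream; illustration only, NOT secure.

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        stream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ s for b, s in zip(data[offset:offset + 32], stream))
    return bytes(out)


def client_upload(block: bytes):
    """Client side: fingerprint plaintext, encrypt, fingerprint ciphertext."""
    fp1 = fingerprint(block)
    key = derive_key(fp1)  # in the abstract the server computes and returns this
    ciphertext = toy_cipher(key, block)
    fp2 = fingerprint(ciphertext)
    return fp1, fp2, ciphertext


def server_verify(fp1: bytes, fp2: bytes, ciphertext: bytes) -> bool:
    """One possible check (an assumption): both fingerprints must match."""
    if not hmac.compare_digest(fingerprint(ciphertext), fp2):
        return False
    plaintext = toy_cipher(derive_key(fp1), ciphertext)
    return hmac.compare_digest(fingerprint(plaintext), fp1)
```

Because the key depends only on the plaintext fingerprint, two clients uploading the same block produce the same ciphertext, which is what lets encrypted blocks still be deduplicated.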
A snapshot usage tracking system and method for a versioned storage is provided. The system includes a memory having computer-readable instructions stored therein and a snapshot repository configured to store a plurality of snapshots of a versioned storage. Each of the plurality of snapshots includes one or more data blocks. The system further includes a processor communicatively coupled to the snapshot repository and configured to maintain a set of snapshot counters corresponding to each of the previous snapshots created on or before a current snapshot based upon a size of the data blocks that are modified or deleted in the current snapshot.
A system and method for an index-based smart scan for a cloud-computing provider network is provided. The system includes a memory having computer-readable instructions stored therein and a snapshot repository configured to store a plurality of snapshots of a plurality of block storage volumes. Each of the plurality of block storage volumes is configured to perform volume based block storage operations for the cloud-computing provider network. The system further includes a processor communicatively coupled to the snapshot repository. The processor is configured to access contents of each of the plurality of snapshots. Each of the plurality of snapshots includes a point-in-time capture of the respective block storage volume. In addition, the processor is configured to perform a full scan of each of the plurality of snapshots to identify one or more files of the respective block storage volume. The processor is further configured to generate a folder index table for each of the files based upon the scan. The folder index table includes a listing of the file path and an associated modification time for each of the files. Furthermore, the processor is configured to update the folder index table over a pre-determined period of time and locate a file in the plurality of block storage volumes based on the folder index table.
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
71.
Systems and methods for adaptive bandwidth throttling
Aspects of the current patent document include systems and methods for adaptive bandwidth throttling, for example, for use in data backup systems and data recovery systems. In embodiments, bandwidth estimation can be performed while sending data. In embodiments, the bandwidth estimation is used in data backups to send data to be backed up. In embodiments, a server performs network bandwidth estimation by receiving relatively small data packets and estimating bandwidth until bandwidth reliability conditions are satisfied.
G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
H04L 12/825 - Adaptive control, at the source or intermediate nodes, upon congestion feedback, e.g. X-on X-off
H04L 29/06 - Communication control; Communication processing characterised by a protocol
H04L 12/911 - Network admission control and resource allocation, e.g. bandwidth allocation or in-call renegotiation
H04L 12/927 - Allocation of resources based on type of traffic, QoS or priority
A smart folder scan system and method is provided. The system includes a memory having computer-readable instructions stored therein and a storage module having a plurality of file folders configured to store data. The system further includes a processor communicatively coupled to the storage module. The processor is configured to access the plurality of file folders stored in the storage module. In addition, the processor is configured to scan and identify one or more modified file folders stored in the storage module. Further, the processor is configured to generate a folder activity table for each of the plurality of file folders based upon the scan. The folder activity table comprises a listing of the file folders and an associated modification time for each of the file folders. The processor is further configured to generate a skip table database based upon the modification time of each of the file folders. The skip table database includes a listing of one or more file folders to be skipped from a full scan. In addition, the processor is configured to identify one or more file folders for the full scan based upon the folder activity table and the skip table database. Furthermore, the processor is configured to perform a full scan of the identified one or more file folders.
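The folder activity table and skip table described in the smart folder scan abstract can be illustrated with a small sketch. The names `FolderActivity`, `build_skip_table`, and `folders_to_scan` are hypothetical, and the skip rule used here (skip any folder not modified since the last scan) is an assumption; the abstract only says the skip table is derived from modification times.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class FolderActivity:
    """One row of the folder activity table: a folder and its modification time."""
    path: str
    mtime: float


def build_skip_table(activity: List[FolderActivity], last_scan_time: float) -> Set[str]:
    """Folders unchanged since the last scan are listed as skippable (assumed rule)."""
    return {row.path for row in activity if row.mtime <= last_scan_time}


def folders_to_scan(activity: List[FolderActivity], skip_table: Set[str]) -> List[str]:
    """Folders selected for a full scan: everything not in the skip table."""
    return [row.path for row in activity if row.path not in skip_table]


activity_table = [
    FolderActivity("/data/reports", mtime=100.0),
    FolderActivity("/data/logs", mtime=250.0),
    FolderActivity("/data/archive", mtime=50.0),
]
skip = build_skip_table(activity_table, last_scan_time=200.0)
selected = folders_to_scan(activity_table, skip)
```

With a last scan time of 200.0, only `/data/logs` is selected for the full scan; the other two folders land in the skip table.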
A file system may include an object storage, a merged index, and a distributed database. When a file is stored in the file system, the file may be converted to an object and be stored in the object storage. The deduplication index of the file may be stored in the distributed database. The namespace metadata of the file may be stored in the merged index. The merged index generates namespace entries of the file when the file is created, deleted, and/or modified. A namespace entry may be associated with a specific file and may include a creation version and a deletion version. When a file is deleted or modified, instead of modifying the existing namespace entries, new entries associated with different versions and including different creation or deletion versions are created. The status of a file may be monitored by one or more entries associated with a file.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/21 - Design, administration or maintenance of databases
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/13 - File access structures, e.g. distributed indices
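The append-only namespace entries in the file system abstract above, each carrying a creation version and a deletion version, can be sketched as follows. The class names and the visibility rule are assumptions made for illustration: a file counts as present at version v if some creation at or before v has no matching deletion entry at or before v.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class NamespaceEntry:
    """Immutable namespace entry; deletion_version is None for a creation entry."""
    path: str
    creation_version: int
    deletion_version: Optional[int] = None


class MergedIndex:
    """Toy stand-in for the merged index: entries are only ever appended."""

    def __init__(self) -> None:
        self.entries: List[NamespaceEntry] = []

    def _live_creations(self, path: str, version: int) -> Set[int]:
        created = {e.creation_version for e in self.entries
                   if e.path == path and e.deletion_version is None
                   and e.creation_version <= version}
        deleted = {e.creation_version for e in self.entries
                   if e.path == path and e.deletion_version is not None
                   and e.deletion_version <= version}
        return created - deleted

    def create(self, path: str, version: int) -> None:
        self.entries.append(NamespaceEntry(path, version))

    def delete(self, path: str, version: int) -> None:
        live = self._live_creations(path, version)
        if not live:
            raise KeyError(path)
        # Record the deletion as a new entry instead of editing the old one.
        self.entries.append(NamespaceEntry(path, max(live), version))

    def exists(self, path: str, version: int) -> bool:
        return bool(self._live_creations(path, version))
```

Since old entries are never rewritten, the state of a file at any earlier version can still be answered from the same list of entries.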
A keyphrase extraction system and method is provided. The keyphrase extraction system includes a memory having computer-readable instructions stored therein. The keyphrase extraction system also includes a processor configured to access a document. The processor is configured to identify a plurality of candidate phrases from the document based upon a part-of-speech tag pattern. Each of the plurality of candidate phrases comprises one or more candidate terms. In addition, the processor is further configured to access an external knowledge base to determine a vocabulary frequency count of the one or more candidate terms. The vocabulary frequency count of the one or more candidate terms corresponds to a count of appearance of the respective candidate term in a plurality of documents accessible by the external knowledge base. Further, the processor is configured to estimate a phrase score for each of the plurality of candidate phrases based upon the vocabulary frequency count of the one or more candidate terms of each of the plurality of candidate phrases. Furthermore, the processor is configured to filter the plurality of candidate phrases based upon the estimated phrase score and pre-determined thresholds to determine one or more key phrases present in the document.
A server manager for detecting ransomware includes a server interface to retrieve, from a storage device, a backup of a plurality of files stored by a client device. A ransomware detection module includes a statistical filter to generate a standard pattern of file activities of the client device for a time period. A statistical behavior analysis is performed on the backup of the plurality of files based on the standard pattern to identify a portion of the backup corresponding to a statistical anomaly different from the standard pattern. The statistical anomaly corresponds to an abnormal file activity. An entropy detector generates an entropy score for the portion of the backup. The entropy score represents a randomness of a distribution of bits in a block of a file in the portion of the backup. It is determined whether the backup includes the ransomware based on the generated entropy score.
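An entropy score of the kind mentioned in the ransomware abstract above can be approximated with Shannon entropy. The sketch below is an assumption-laden illustration: it measures entropy over byte values (0 to 8 bits per byte) rather than individual bits, and the `threshold` default in `looks_encrypted` is a hypothetical value, not one given in the source.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a block in bits per byte (0.0 for an empty block)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(block: bytes, threshold: float = 7.5) -> bool:
    """Flag blocks whose byte distribution is close to uniform.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    return shannon_entropy(block) >= threshold


uniform_block = bytes(range(256)) * 4   # every byte value equally frequent
constant_block = b"\x00" * 1024         # a single repeated byte value
```

A block with every byte value equally frequent scores the maximum of 8 bits per byte, while a block of one repeated byte scores 0. Compressed files also score high, which is presumably why the abstract pairs the entropy score with a statistical analysis of file activity.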
Aspects of the current patent document include systems and methods to perform search in an index system. In one embodiment, an index system may be implemented in an object storage. A distributed database index is used in conjunction with the object storage. In some cases, data stored in the distributed database may be encrypted and moved to object storage. The object storage stores a plurality of blocks containing words. Each block can contain a large number of words, such as one million words.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Disclosed embodiments include a method (system and non-transitory computer-readable medium) for backing up updated portions of a plurality of files having hierarchical relationships through object storage. In one or more embodiments, a file is segregated into chunks, and objects corresponding to the chunks are generated for storage at an object storage. For a chunk, an object for storing the chunk and additional objects for storing mapping information are generated. The mapping information may include path information identifying a path of the file in a hierarchical structure, a file version list identifying a version of the file, a chunk list describing an association between the file and the chunks, a chunk version list identifying a version of the chunk, etc. When a portion of the file is updated, objects corresponding to the updated portion of the file can be generated and stored at the object storage.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/185 - Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
78.
Systems and methods for virtual machine file exclusion
The present invention relates generally to backups and more specifically to virtual machine (VM) backups including file exclusion. Aspects of the present invention relate to using a specialized buffer to identify a file for exclusion. In embodiments, a file system used by the VM can be used to search for the specialized buffer. In embodiments, when the specialized buffer is located, offsets are noted related to the file associated with the specialized buffer. In embodiments, the offsets are used to zero out blocks associated with the offsets. Thus, the file can be effectively excluded from the backup.
G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/14 - Details of searching files based on file metadata
79.
Time-based data retirement for globally de-duplicated archival storage
Data from computing devices is backed up regularly, storing a snapshot of the data and corresponding metadata in a data store. The backup data are stored for a relatively short period of time before being archived to long-term storage. For snapshots with files with identical data that are not archived together, archive storage space and computing resources may be conserved by not storing duplicates of the data. When the data is added to the archive storage, the archive storage location is added to backup reference entries for other files with identical data. When all files referencing an archive storage location are expired from the backup data store, an archive retention period is initiated, and an entry is added to a time-based archive expiration database indicating the storage location and an expiration time for the archived data. At the expiration time, the archived data is designated for deletion from the archive.
G06F 17/30 - Information retrieval; Database structures therefor
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
Disclosed embodiments include a method (system and non-transitory computer storage readable medium) for load-balancing a distributed database. The distributed database includes one or more storage machines configured to store a plurality of partitions, where each partition includes key-value pairs. In one embodiment, the distributed database prepares for load-balancing by determining a partition to redistribute (or repartition) and generating smaller partitions of the determined partition. In one aspect, each of the smaller partitions is smaller than the determined partition. The redistribution of the partition can occur when an amount of requests to access one or more key-value pairs stored in the database increases beyond a predetermined request level or when the size of a partition exceeds a predetermined size. Key-value pairs of the determined partition can be split into different sets of key-value pairs, and each set of key-value pairs is copied to a corresponding smaller partition.
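The two repartitioning triggers and the split into smaller partitions described in the load-balancing abstract can be sketched as below. The function names, the threshold parameters, and the choice to split by sorted key range are assumptions for illustration; the abstract does not say how keys are assigned to the smaller partitions.

```python
import math
from typing import Dict, List


def should_split(partition: Dict[str, str], request_count: int,
                 max_requests: int, max_size: int) -> bool:
    """Split when requests exceed a request level or the partition grows too large."""
    return request_count > max_requests or len(partition) > max_size


def split_partition(partition: Dict[str, str], parts: int = 2) -> List[Dict[str, str]]:
    """Copy the key-value pairs into `parts` smaller partitions by sorted key range."""
    keys = sorted(partition)
    if not keys:
        return []
    size = math.ceil(len(keys) / parts)
    return [{k: partition[k] for k in keys[i:i + size]}
            for i in range(0, len(keys), size)]


hot_partition = {"k1": "a", "k2": "b", "k3": "c", "k4": "d", "k5": "e"}
if should_split(hot_partition, request_count=1200, max_requests=1000, max_size=100):
    smaller_partitions = split_partition(hot_partition, parts=2)
else:
    smaller_partitions = [hot_partition]
```

Here the request count crosses the assumed threshold, so the five pairs are copied into two partitions of three and two keys. The original partition is left untouched, matching the abstract's description of copying sets of pairs into the new partitions.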
A system, method, and non-transitory computer-readable medium provide backup and archive services for computing devices. Typically, multiple backup snapshots are archived together in each archive cycle. A catalogue for the current archive is efficiently created by starting with a copy of the previous archive catalogue and updating it based on metadata associated with the backup snapshots.
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
When a backup client sends a request to back up a file to a backup server, the file and an index (e.g., checksum, hash, encryption, etc.) of the file are stored on the backup server in an efficient deduplication storage. If a backup client sends a request to back up a modified version of a file already stored on a backup server, the modified portion of the file is stored. In addition, an index of the modified portion is generated and stored along with the modified portions on the backup server. The indices can be used to reconstruct the file or modified version of the file when retrieved. The efficient deduplication storage method ensures that multiple copies of files or portions of files do not exist on the servers.
G06F 16/174 - Redundancy elimination performed by the file system
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A system and a method are disclosed for pre-seeding data to backup servers and determining servers for additional backups. Backup data is received from a client device through a first backup request and sent to a primary server for storage. Additional backup data from a second backup request is received. After the initial backup data is stored on the primary server and pre-seeded by the primary server on a secondary server, a status of backup servers associated with the client device is received. The backup servers include the primary server and can include the secondary server. Responsive to the status indicating availability of a server in the backup servers, a recipient server is identified from the backup servers and the additional backup data is sent to the identified recipient server.
G06F 17/30 - Information retrieval; Database structures therefor
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
84.
Efficient deduplicated data storage with tiered indexing
A deduplicated data storage system provides high performance storage to heterogeneous clients that connect to it via a communications network. The deduplicated data storage system provides fast access to deduplication data by caching the most frequently accessed deduplication data in a hyperindex. Updates to the non-cached deduplication data are serialized by use of a store queue and hold queue.
A distributed, cloud-based storage system provides a reliable, deduplicated, scalable and high performance backup service to heterogeneous clients that connect to it via a communications network. The distributed cloud-based storage system guarantees consistent and reliable data storage while using structured storage that lacks ACID compliance. Consistency and reliability are guaranteed using a system that includes: 1) back references from shared objects to referring objects, 2) safe orders of operation for object deletion and creation, and 3) simultaneous access to shared resources through sub-resources.