Techniques are provided for implementing additional compression for existing compressed data. Format information stored within a data block is evaluated to determine whether the data block is compressed or uncompressed. In response to the data block being compressed according to a first compression format, the data block is decompressed using the format information. The data block is compressed with one or more other data blocks to create compressed data having a second compression format different than the first compression format.
Systems and methods for preserving storage efficiency during restoration of data from the cloud are provided. In one embodiment, a CBMAP is maintained that maps cloud block numbers (CBNs) to respective corresponding block numbers of a volume of a data storage system in which previously restored data has been stored by a previously restored file. By making use of the CBMAP during the restoration process, storage of duplicate file data blocks on the volume may be avoided by sharing with a current file being restored a reference to the corresponding file data block previously stored on the volume and associated with the previously restored file. In addition to preserving storage efficiency, use of the CBMAP facilitates avoidance of repeated GET operations for data associated with CBNs previously retrieved from the cloud and stored to the volume, thereby reducing data access costs as well as latency of the restore operation.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
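As an illustration of the CBMAP described above, the following is a minimal sketch, assuming a simple dict-based map and hypothetical cloud and volume interfaces (get, write, add_block_reference); it is not the patented implementation, but it shows how tracking CBN-to-volume-block mappings avoids both duplicate GET operations and duplicate block writes during a restore.

```python
# Minimal sketch (illustrative interfaces, not the patented implementation):
# the CBMAP is a dict from cloud block number (CBN) to the volume block number
# where that block has already been restored.

class CloudRestorer:
    def __init__(self, cloud, volume):
        self.cloud = cloud      # assumed interface: get(cbn) -> bytes
        self.volume = volume    # assumed interface: write(data) -> volume block number
        self.cbmap = {}         # CBN -> volume block number

    def restore_block(self, cbn, current_file):
        """Restore one file data block identified by its CBN."""
        vbn = self.cbmap.get(cbn)
        if vbn is None:
            data = self.cloud.get(cbn)     # one GET per unique CBN
            vbn = self.volume.write(data)  # block is stored once on the volume
            self.cbmap[cbn] = vbn
        # A duplicate block is shared by reference rather than rewritten.
        current_file.add_block_reference(vbn)
        return vbn
```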
Techniques are provided for dynamically implementing quality of service policies using a configurable quality of service provider pipeline. A quality of service policy is defined for throttling I/O operations received by a node based upon whether resources of the node have become overutilized. The quality of service policy is used to dynamically construct a quality of service provider pipeline with select quality of service providers that improve the ability to efficiently utilize resources compared to conventional static policies that cannot adequately react to changing considerations and resource utilization/saturation. With conventional static policies, an administrator manually defines a minimum amount of guaranteed resources and/or a maximum resource usage cap that could be set to values that result in inefficient operation and resource starvation. Dynamically constructing and utilizing the quality of service provider pipeline results in more efficient operation and mitigates resource starvation.
Techniques are provided for multi-tier write allocation. A storage system may store data within a multi-tier storage environment comprising a first storage tier (e.g., storage devices maintained by the storage system), a second storage tier (e.g., a remote object store provided by a third party storage provider), and/or other storage tiers. A determination is made that data (e.g., data of a write request received by the storage system) is to be stored within the second storage tier. The data is stored into a staging area of the first storage tier. A second storage tier location identifier, for referencing the data according to a format utilized by the second storage tier, is assigned to the data and provided to a file system hosting the data. The data is then destaged from the staging area into the second storage tier, such as within an object stored within the remote object store.
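A rough sketch of the staging and destaging flow described above, using hypothetical staging-area, object-store, and file-system interfaces (put, allocate_location, record_location, etc.); it only shows the ordering of staging, location assignment, and destaging, not the real write allocator.

```python
# Rough sketch with hypothetical interfaces; names are illustrative.

class MultiTierWriteAllocator:
    def __init__(self, staging_area, object_store, file_system):
        self.staging = staging_area   # first-tier staging area
        self.objects = object_store   # second tier (remote object store)
        self.fs = file_system

    def write_to_second_tier(self, data):
        staged_ref = self.staging.put(data)
        # The file system references the data by its second-tier location
        # even though the bytes still live in the staging area for now.
        location_id = self.objects.allocate_location()
        self.fs.record_location(location_id)
        self.staging.mark_for_destage(staged_ref, location_id)
        return location_id

    def destage(self):
        # Later, push staged data into objects within the remote object store.
        for staged_ref, location_id in self.staging.pending():
            self.objects.put(location_id, self.staging.read(staged_ref))
            self.staging.release(staged_ref)
```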
One or more systems, devices, computer program products, and/or computer-implemented methods are provided herein to use a redundant array of disks. A system can comprise a memory that stores computer executable components, and a processor that executes the computer executable components stored in the memory, wherein the computer executable components can comprise a control component that directs, for n physical drives of a redundant array of disks (RAID) storing data for at least n logical volumes, log-structured writing of data of each logical volume of the at least n logical volumes vertically across chunks of only a single physical drive of the n physical drives, wherein the control component further directs writing of parity data at each of the physical drives, which parity data at each physical drive respectively corresponds to other ones of the physical drives of the n physical drives.
Systems and methods for supporting dynamic disk growth within a storage appliance are provided. According to one embodiment, a portion of a logical size of each of multiple disks (e.g., hyperscale disks or Logical Unit Numbers (LUNs)) is provisioned for use by a storage system as backing for respective file system disks. To accommodate growth, block numbers for the file system disks are pre-allocated within a sparse space of a contiguous sequence of block numbers corresponding to a number of blocks represented by the logical size. Metadata is maintained for the file system disks regarding a range of the pre-allocated block numbers that are available for use. Responsive to a triggering condition, the provisioned portion of a disk is increased and subsequently, responsive to detecting a change in a size of the disk by the storage system, a size of the corresponding file system disk is updated within the metadata.
Techniques are provided for implementing a file system format for persistent memory. A node, comprising persistent memory, receives an operation comprising a file identifier and file system instance information. A list of file system info objects are evaluated to identify a file system info object matching the file system instance information. An inofile, identified by the file system info object as being associated with inodes of files within an instance of the file system targeted by the operation, is traversed to identify an inode matching the file identifier. If the inode comprises an indicator that the file is tiered into the persistent memory, then the inode is utilized to facilitate execution of the operation upon the persistent memory. Otherwise, the operation is routed to a storage file system tier for execution by a storage file system upon storage associated with the node.
Systems and methods for reducing read amplification in a virtual storage system are provided. According to one embodiment, heuristic data may be tracked and utilized in real-time by a file system of the virtual storage system at the level of granularity of a volume, thereby allowing a fast path flag to be enabled/disabled at a volume level during various phases of operation of a workload. The heuristic data for a given volume may be indicative of a correlation between (i) data blocks stored on the given volume being located within a compressible zone of a zoned checksum scheme and (ii) the respective data blocks containing compressed data and a corresponding checksum. Based on the heuristic data, read requests may be selectively directed to the read path (e.g., a fast path or a slow path) expected to mitigate read amplification when data compression is enabled for a zoned checksum scheme.
Systems, methods, and machine-readable media for monitoring a storage system and correcting demand imbalances among nodes in a cluster are disclosed. A performance manager for the storage system may detect performance imbalances that occur over a period of time. When operating below an optimal performance capacity, the manager may cause a volume to be moved from a node with a high load to a node with a lower load to achieve a preventive result. When operating at or near optimal performance capacity, the manager may cause a QOS limit to be imposed to prevent the workload from exceeding the performance capacity, to achieve a proactive result. When operating abnormally, the manager may cause a QOS limit to be imposed to throttle the workload to bring the node back within the optimal performance capacity of the node, to achieve a reactive result. These actions may be performed independently, or in cooperation.
Systems and methods for performing a zero-copy volume move between nodes of a distributed storage system are provided. In one example, an approach for performing a zero-copy volume move is proposed in which volume data may be maintained in place within a storage pod and need not be copied to move a given volume between the source node and the destination node. In one embodiment, metadata (e.g., a top-most physical volume block number (PVBN) of a node tree representing the volume at issue) of a write-anywhere file system is copied from the source node to the destination node. Since the storage pod is associated with a global PVBN space that is visible and accessible to all nodes of the distributed storage system, as a result of copying the top-most PVBN of the volume to the destination node, anything below the top-most PVBN will automatically be visible to the destination node.
Systems and methods for reducing read amplification in a virtual storage system are provided. According to one embodiment, read amplification is reduced when AZCS compression is being utilized by avoiding restarting of a read process via a slow path through a RAID layer of the virtual storage system when a data block associated with a read request and obtained via a first fast path read has been found not to be compressed. Instead, a second fast path read may be performed to obtain the corresponding checksum. Alternatively, or additionally, heuristics may be used to predict the odds of the data block being compressed. For example, when information encoded within a PVBN of the data block that identifies the PVBN as being within a compressed AZCS zone has been shown to be sufficiently/insufficiently predictive of the data block being compressed, then a flag may be set to enable/disable fast path reads.
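The per-volume heuristic can be pictured as a simple hit-rate counter; the sketch below is illustrative only (threshold values and names are assumptions), showing how the fast path flag might be enabled or disabled based on how predictive the AZCS-zone encoding of PVBNs has proven to be.

```python
# Illustrative only: per-volume counters that decide whether the fast path
# stays enabled, based on how often the AZCS-zone encoding of a PVBN correctly
# predicted that the data block was compressed. Threshold values are assumptions.

class VolumeReadHeuristics:
    def __init__(self, threshold=0.9, min_samples=1000):
        self.correct = 0
        self.total = 0
        self.threshold = threshold
        self.min_samples = min_samples
        self.fast_path_enabled = True

    def record(self, predicted_compressed: bool, actually_compressed: bool) -> None:
        self.total += 1
        if predicted_compressed == actually_compressed:
            self.correct += 1
        if self.total >= self.min_samples:
            # Keep the fast path only while the encoding remains predictive.
            self.fast_path_enabled = (self.correct / self.total) >= self.threshold
```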
A data management system can include a set of storage media configured to implement a storage space and a set of controllers. The set of controllers can be configured to write to the storage space and to implement a set of nodes. The set of controllers can include a first controller that implements a first node and includes a first persistent memory, a second controller that implements a second node and includes a second persistent memory and a third controller that implements a third node and includes a third persistent memory. The third node can be configured to write third node journal data to the first persistent memory. The first node can be configured to generate first node journal data based on a first request received from a backend, write the first node journal data to the first persistent memory, and replicate the first node journal data to the second persistent memory.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
13.
Distributed File System that Provides Scalability and Resiliency
A distributed storage management system comprising nodes that form a cluster, a distributed block layer that spans the nodes in the cluster, and file system instances deployed on the nodes. Each file system instance comprises a data management subsystem and a storage management subsystem disaggregated from the data management subsystem. The storage management subsystem comprises a node block store that forms a portion of the distributed block layer and a storage manager that manages a key-value store and virtualized storage supporting the node block store. A file system volume hosted by the data management subsystem maps to a logical block device hosted by the virtualized storage in the storage management subsystem. The key-value store includes, for a data block of the logical block device, a key that comprises a block identifier for the logical block device and a value that comprises the data block.
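As a toy illustration of the key-value mapping described above, the following sketch assumes content-derived block identifiers (a SHA-256 of the block) and an in-memory dict standing in for the node block store; the production key-value store and virtualized storage are far richer.

```python
# Toy sketch assuming content-derived block identifiers; the dict stands in for
# the key-value store backing the node block store.

import hashlib

class NodeBlockStore:
    def __init__(self):
        self.kv = {}  # block identifier -> data block

    @staticmethod
    def block_id(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        bid = self.block_id(data)
        self.kv.setdefault(bid, data)  # identical blocks map to one stored copy
        return bid

    def get(self, bid: str) -> bytes:
        return self.kv[bid]
```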
The disclosed technology enables quicker initialization of a new master node for a cluster when a previous master node fails by tracking node state in the cluster prior to being designated the new master node. In a particular example, a method includes, in a first node, designated as a current master node for the cluster, managing the cluster based on states of the nodes determined by the first node. While the first node is designated the master node, the method includes each of the nodes collecting, and storing locally, the states of the nodes. In response to a failure of the first node, the method includes selecting a second node of the nodes as a new master node. Upon being designated the new master node, the method includes the second node managing the cluster of nodes based on the states of the nodes that the second node collected and stored locally.
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
H04L 41/0668 - Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
Techniques are provided for lock reservations for shared storage. A reserve command to reserve a storage structure is received by a driver from a node. The reserve command is formatted according to a storage protocol. The driver translates the reserve command into a lease acquire command formatted according to an object store protocol and targeting an object stored within an object store and corresponding to the storage structure. A lease identifier derived from a node identifier of the node is inserted into the lease acquire command. The lease acquire command is routed to the object store for obtaining a lease on the object for granting the node exclusive write access to the object.
G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
G06F 3/06 - Digital input from, or digital output to, record carriers
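A minimal sketch of the translation step described above, assuming a hypothetical object-store client exposing an acquire_lease call and deriving the lease identifier deterministically from the node identifier; command fields and method names are illustrative.

```python
# Minimal sketch; acquire_lease and the command fields are assumptions, not a
# real object store API.

import uuid

def lease_id_for_node(node_id: str) -> str:
    # Derive a stable lease identifier from the node identifier (illustrative).
    return str(uuid.uuid5(uuid.NAMESPACE_OID, node_id))

def translate_reserve(reserve_cmd: dict, object_store) -> None:
    """Translate a storage-protocol reserve command into an object-store lease acquire."""
    object_name = reserve_cmd["storage_structure"]  # object backing the shared structure
    lease_id = lease_id_for_node(reserve_cmd["node_id"])
    # Holding the lease grants the node exclusive write access to the object.
    object_store.acquire_lease(object_name, lease_id)
```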
16.
DYNAMICALLY SCALING APPLICATION AND STORAGE SYSTEM FUNCTIONS BASED ON A HETEROGENEOUS RESOURCE POOL AVAILABLE FOR USE BY A DISTRIBUTED STORAGE SYSTEM
Systems and methods for scaling application and/or storage system functions of a distributed storage system based on a heterogeneous resource pool are provided. According to one embodiment, the distributed storage system has a composable, service-based architecture that provides scalability, resiliency, and load balancing. The distributed storage system includes a cluster of nodes each potentially having differing capabilities in terms of processing, memory, and/or storage. The distributed storage system takes advantage of different types of nodes by selectively instantiating appropriate services (e.g., file and volume services and/or block and storage management services) on the nodes based on their respective capabilities. Furthermore, disaggregation of these services, facilitated by interposing a frictionless layer (e.g., in the form of one or more globally accessible logical disks) therebetween, enables independent and on-demand scaling of either or both of application and storage system functions within the cluster while making use of the heterogeneous resource pool.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Techniques are provided for compacting indirect blocks. For example, an object is represented as a structure comprising data blocks within which data of the object is stored and indirect blocks comprising block numbers of where the data blocks are located in storage. Block numbers within a set of indirect blocks are compacted into a compacted indirect block comprising a base block number, a count of additional block numbers after the base block number in the compacted indirect block, and a pattern of the block numbers in the compacted indirect block. The compacted indirect block is stored into memory for processing access operations to the object. Storing compacted indirect blocks into memory allows for more block numbers to be stored within memory. This improves the processing of access operations because reading the block numbers from memory is faster than loading the block numbers from disk.
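The compaction itself can be illustrated with a short sketch: a run of block numbers that follows a fixed stride collapses to a (base, count, pattern) triple and can be expanded back on demand. The encoding below is illustrative, not the on-disk format.

```python
# Illustrative compaction of a run of block numbers into (base, count, stride).

def compact(block_numbers):
    """Return (base, count_after_base, stride) if the run is compactible, else None."""
    if len(block_numbers) < 2:
        return None
    stride = block_numbers[1] - block_numbers[0]
    for prev, cur in zip(block_numbers, block_numbers[1:]):
        if cur - prev != stride:
            return None
    return (block_numbers[0], len(block_numbers) - 1, stride)

def expand(base, count_after_base, stride):
    """Reconstruct the original block numbers from the compacted form."""
    return [base + i * stride for i in range(count_after_base + 1)]

# Example: five sequential block numbers compact to a single triple.
assert compact([100, 101, 102, 103, 104]) == (100, 4, 1)
assert expand(100, 4, 1) == [100, 101, 102, 103, 104]
```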
Techniques are provided for implementing a distributed control plane to facilitate communication between a container orchestration platform and a distributed storage architecture. The distributed storage architecture hosts worker nodes that manage distributed storage that can be made accessible to applications within the container orchestration platform through the distributed control plane. The distributed control plane includes control plane controllers that are each paired with a single worker node of the distributed storage architecture. The distributed control plane is configured to selectively route commands to control plane controllers that are paired with worker nodes that are current owners of objects targeted by the commands. If ownership of an object has changed from one worker node to another worker node, then subsequent commands will be re-routed to a control plane controller paired with the other worker node now owning the object.
Techniques are provided for input/output operations per second (IOPS) and throughput monitoring for dynamic and/or optimal resource allocation. These techniques provide automated monitoring of resources, such as memory and processor utilization by a container accessing a volume. The automated monitoring is performed in order to generate and execute intelligent recommendations for improved resource utilization. Resource allocations can be scaled up to meet I/O load demand and satisfy service level agreements (SLAs). Resource allocations can be scaled down or adjusted to conserve resources, such as by consolidating containers or pods hosted in multiple virtual machines into a single virtual machine and decommissioning virtual machines no longer hosting containers or pods.
Data traffic management in a computing environment utilizing direct memory access functionality is disclosed. A management agent is configured to operate within a storage node. The management agent includes a storage interface to communicate with a first set of storage devices, a management memory interface to communicate with a first management memory, and an interconnect (IC) interface to communicate with a remote peer node. The management agent controls data traffic between the storage node and the peer node to provide at least mirroring of the first management memory to the peer node and mirroring of a second management memory on the peer node to the storage node. The management agent further controls the data traffic using a traffic control approach selected based on at least a performance evaluation of an IC fabric accessible via the IC interface.
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Techniques are provided for persistent memory file system reconciliation. As part of the persistent memory file system reconciliation, high level file system metadata associated with a persistent memory file system of persistent memory is reconciled. Client access to the persistent memory file system is inaccessible until reconciliation of the high level file system metadata has completed. A first scanner is executed to traverse pages of the persistent memory in order to fix local inconsistencies associated with the pages. A local inconsistency of a first set of metadata or data of a page is fixed using a second set of metadata or data of the page. The first scanner is executed asynchronously in parallel with processing client I/O directed to the persistent memory file system.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
In one embodiment, distributed data storage systems and methods are described for integrating a change tracking manager with scalable databases. According to one embodiment, a computer implemented method comprises managing storage of objects and continuously tracking changes of the objects in a distributed object storage database, creating a record for an object having an object name, the object being stored in a bucket of the distributed object storage database, linking the bucket to a peer bucket based on a directive, generating a peer marker field for the record to store one peer marker of multiple different peer markers depending on a relationship between the bucket and the peer bucket; and automatically adding a work item for the object to the secondary index of a chapter database based on the record being created in the bucket and the peer marker for the peer bucket.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
Presented herein are methods, non-transitory computer readable media, and devices for integrating a hybrid model of fine-grained locking and data-partitioning wherein fine-grained locking is added to existing systems that are based on hierarchical data-partitioning in order to increase parallelism with minimal code re-write. Methods for integrating a hybrid model of fine-grained locking and data-partitioning are disclosed which include: creating, by a network storage server, a plurality of domains for execution of processes of the network storage server, the plurality of domains including a domain; creating a hierarchy of storage filesystem subdomains within the domain, wherein each of the subdomains corresponds to one or more types of processes, wherein at least one of the storage filesystem subdomains maps to a data object that is locked via fine-grained locking; and assigning processes for simultaneous execution by the storage filesystem subdomains within the domain and the at least one subdomain that maps to the data object locked via fine-grained locking.
Techniques are provided for enforcing policies at a sub-logical unit number (LUN) granularity, such as at a virtual disk or virtual machine granularity. A block range of a virtual disk of a virtual machine stored within a LUN is identified. A quality of service policy object is assigned to the block range to create a quality of service workload object. A target block range targeted by an operation is identified. A quality of service policy of the quality of service policy object is enforced upon the operation using the quality of service workload object based upon the target block range being within the block range of the virtual disk.
G06F 12/109 - Address translation for multiple virtual address spaces, e.g. segmentation
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/174 - Redundancy elimination performed by the file system
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
42 - Scientific, technological and industrial services, research and design
Goods & Services
Design, deployment and management of artificial intelligence for the purpose of managing business data; Technical consulting in the field of artificial intelligence (AI) software customization; artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence (AI) for use in database management; artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence (AI) for use in machine learning; design, deployment and management of artificial intelligence for the purpose of managing business data and workflows; artificial intelligence training models for business data management; providing online and cloud-based non-downloadable software for deploying, implementing, monitoring, securing, optimizing, analyzing, storing, managing, and troubleshooting artificial intelligence (AI) platforms and internal processes; technical consulting in the field of artificial intelligence (AI) software.
26.
GRANULAR CLOUD RESTORE WITH MULTI STORAGE CLASS SUPPORT
Techniques are provided for performing a storage operation targeting objects stored across multiple storage tiers of a cloud storage environment. A volume may be backed up as objects stored across the multiple storage tiers of the cloud storage environment, such as a standard storage tier directly accessible to the storage operation, an archival storage tier not directly accessible to the storage operation, etc. The storage operation may target the objects, such as where the storage operation is a directory restore operation to restore a directory of the volume. The storage operation can be successfully implemented such as to restore the directory even though objects of the storage operation are stored across the multiple storage tiers of the cloud storage environment.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
27.
DATA FORMAT FOR EFFICIENT MANAGEMENT OF CHECKPOINT SUPPORT
Techniques are provided for a data format for efficient management of checkpoint support. The data format corresponds to a base metafile and a set of instance metafiles used to track storage operations such as a directory restore operation. The base metafile and the set of instance metafiles can be used to resume the storage operation from where the storage operation left off in the event of a failure. The base metafile and the set of instance metafiles can be used to track progress of the storage operation processing objects stored within an object store of a cloud storage environment.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/185 - Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
28.
RESTARTING A STORAGE OPERATION UTILIZING AN INSTANCE METAFILE
Techniques are provided for a data format for efficient management of checkpoint support. The data format corresponds to a base metafile and a set of instance metafiles used to track storage operations such as a directory restore operation. The base metafile and the set of instance metafiles can be used to resume the storage operation from where the storage operation left off in the event of a failure. The base metafile and the set of instance metafiles can be used to track progress of the storage operation processing objects stored within an object store of a cloud storage environment.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
Systems and methods for providing a file system with object versioning support are provided. Rather than adding object records for each version of an object to a chapter database, in one example, the chapter database may be limited to a single object record for a given object including: (i) a name of the object; (ii) an object file handle containing information regarding a file containing data of a current version of multiple versions of the object; and (iii) a version table file handle containing information regarding a file containing a version table. In this manner, enumeration of objects associated with a given chapter may be performed more efficiently and prior versions of objects may be maintained separately within the version table without causing disproportionate growth of object records and without increasing the search depth with objects that are not referenced by the search at issue.
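A simplified sketch of the single-record-per-object layout described above, using illustrative Python structures: each object keeps one chapter record pointing at the current version's file handle and at a separate version table, so enumeration never scans prior versions.

```python
# Illustrative structures only; handles are plain strings standing in for file handles.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ObjectRecord:
    name: str
    object_file_handle: str        # file containing the current version's data
    version_table_handle: str      # file containing the version table

@dataclass
class ChapterDatabase:
    records: Dict[str, ObjectRecord] = field(default_factory=dict)
    version_tables: Dict[str, List[str]] = field(default_factory=dict)

    def put_version(self, name: str, new_handle: str) -> None:
        rec = self.records.get(name)
        if rec is None:
            self.records[name] = ObjectRecord(name, new_handle, f"vt-{name}")
            self.version_tables[f"vt-{name}"] = []
        else:
            # The prior version moves into the version table; the record stays single.
            self.version_tables[rec.version_table_handle].append(rec.object_file_handle)
            rec.object_file_handle = new_handle

    def enumerate(self):
        # Enumeration touches one record per object, regardless of version count.
        return sorted(self.records)
```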
Techniques are provided for a snapshot difference interface integrated into an object store data management container. The snapshot difference interface is capable of interpreting an object format and snapshot file system format of snapshots backed up to an object store within objects formatted according to the object format. The snapshot difference interface can identify differences between snapshots, such as files that changed between the snapshots, while the snapshots are still resident within the object store. Because the snapshot difference interface does not retrieve the snapshots from the object store, security is improved, resource and network consumption is reduced, there is less of an impact upon client I/O processing, and a catalog of the snapshots can be more efficiently built and recovered in the event of corruption.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
31.
RESYNCHRONIZATION OF INDIVIDUAL VOLUMES OF A CONSISTENCY GROUP (CG) WITHIN A CROSS-SITE STORAGE SOLUTION WHILE MAINTAINING SYNCHRONIZATION OF OTHER VOLUMES OF THE CG
Systems and methods are provided for bringing a volume of a consistency group (CG) into an in-synchronization (InSync) state while other volumes of the CG remain in the InSync state. According to an example, in order to support recovery from disruptive events in a manner that ensures a zero recovery point objective (RPO) guarantee and insulates an application making use of the CG from adverse impacts, responsive to a triggering event, a Fast Resync process may first be attempted to promptly bring an affected volume back into the InSync state from an out of synchronization (OOS) state while allowing other members of the CG to remain in the InSync state. Should the Fast Resync process be unsuccessful in bringing the volume back into the InSync state within a predetermined or configurable time threshold, then a second type of resynchronization process may be employed at the CG level.
Techniques are provided for supporting a lookup structure for a file system implementing hierarchical reference counting. A write operation to write data to a page maintained by the file system is received. A lookup within a lookup structure is performed using information related to the page in order to identify a lookup entry within the lookup structure. A hash generation count within the lookup entry is compared to a file system info generation count within a file system info object for a volume associated with the page. In response to the hash generation count not matching the file system info generation count, a file system tree of the file system is traversed to determine a reference count for the page, and the write operation is implemented based upon the reference count. Otherwise, the lookup entry is utilized to access the page for processing the write operation.
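The generation-count check can be sketched as follows, with walk_tree standing in for the hierarchical reference-count traversal of the file system tree (an assumed helper supplied by the file system):

```python
# Sketch only; walk_tree is an assumed helper that performs the hierarchical
# reference-count traversal of the file system tree.

from collections import namedtuple

LookupEntry = namedtuple("LookupEntry", ["generation", "refcount"])

def reference_count_for_page(page_key, lookup_table, fs_info_generation, walk_tree):
    entry = lookup_table.get(page_key)
    if entry is not None and entry.generation == fs_info_generation:
        return entry.refcount                      # cached entry is still valid
    # Stale or missing entry: fall back to traversing the file system tree.
    refcount = walk_tree(page_key)
    lookup_table[page_key] = LookupEntry(fs_info_generation, refcount)
    return refcount
```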
Systems and methods for reducing delays between the time at which a need for a resynchronization of data replication between a volume of a local CG and its peer volume of a remote CG is detected and the time at which the resynchronization is triggered (Reseed Time Period) are provided. According to an example, information indicative of the direction of data replication between the volume and the peer volume is maintained within a cache of a node. Responsive to a disruptive operation (e.g., relocation of the volume from a first node to a second node), the Reseed Time Period is lessened by proactively adding a passive cache entry to a cache within the second node at the time the CG relationship is created when the second node represents an HA partner of the first node and prior to the volume coming online when the second node represents a non-HA partner.
G06F 16/178 - Techniques for file synchronisation in file systems
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 16/172 - Caching, prefetching or hoarding of files
Systems and methods are described for a streamlined garbage collection process during which data integrity checking is also performed for a distributed key-value (KV) store utilized by a distributed storage system. According to one embodiment, by making use of full or truncated block IDs (rather than an intermediate probabilistic data structure, such as a Bloom filter) for garbage collection, data integrity checking can be performed concurrently almost for free. During garbage collection, a block ID compare list may be compared to block IDs within the distributed KV store. If a particular block ID is present in the distributed KV store but is missing from the block ID compare list, the corresponding data block represents garbage to be collected. If the particular block ID is present in the block ID compare list but missing from the distributed KV store, a data integrity error has been identified.
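The combined garbage collection and integrity check reduces to a comparison of two ID sets, as in this minimal sketch (block IDs are illustrative strings):

```python
# Minimal sketch: one set comparison classifies each block ID as live, garbage,
# or a data integrity error.

def collect_and_check(compare_list_ids, kv_store_ids):
    """Return (garbage_ids, missing_ids) from one set comparison."""
    compare = set(compare_list_ids)   # block IDs still referenced by metadata
    stored = set(kv_store_ids)        # block IDs actually present in the KV store
    garbage = stored - compare        # present but unreferenced: reclaim
    missing = compare - stored        # referenced but absent: integrity error
    return garbage, missing

garbage, missing = collect_and_check({"a1", "b2", "c3"}, {"a1", "b2", "d4"})
assert garbage == {"d4"} and missing == {"c3"}
```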
Techniques are provided for orchestrating operations between a storage environment and a computing environment hosting virtual machines. A virtual machine proxy, associated with a computing environment hosting a virtual machine, is accessed by an orchestrator to identify the virtual machine and properties of the virtual machine. A storage proxy, associated with a storage environment comprising a volume within which snapshots of the virtual machine are to be stored, is accessed by the orchestrator to initialize a backup procedure. The orchestrator utilizes the virtual machine proxy to create a snapshot of the virtual machine. The orchestrator utilizes the storage proxy to back up the snapshot to the volume using the backup procedure.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
36.
Distributed File System that Provides Scalability and Resiliency
In various examples, data storage is managed using a distributed storage management system that is resilient. Data blocks of a logical block device may be distributed across multiple nodes in a cluster. The logical block device may correspond to a file system volume associated with a file system instance deployed on a selected node within a distributed block layer of a distributed file system. Each data block may have a location in the cluster identified by a block identifier associated with each data block. Each data block may be replicated on at least one other node in the cluster. A metadata object corresponding to a logical block device that maps to the file system volume may be replicated on at least another node in the cluster. Each data block and the metadata object may be hosted on virtualized storage that is protected using a redundant array of independent disks (RAID).
Techniques are provided for restoring a directory from a snapshot of a volume backed up to an object store. The snapshot may be backed up from a node to the object store, such as a cloud computing environment. A user may want to restore the directory within the volume without having to restore the entire volume, which otherwise would waste computing resources, storage, network bandwidth, and time. Accordingly, the techniques provided herein are capable of restoring just the directory from the snapshot that is stored within the object store. Because snapshot data of the snapshot may be stored across multiple objects within the object store, certain objects are identified as comprising snapshot data (backup data) of the directory and content items within the directory. In this way, the snapshot data of the directory is restored from these objects to a restore directory at a restore target.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
38.
MIRRORING OBJECTS BETWEEN DIFFERENT CLOUD PROVIDERS
Techniques are provided for mirroring objects between object stores hosted by cloud providers that could have different data layout requirements. An object may be stored within an object store that supports a unified object format where the object is capable of storing compressed data. The object may be mirrored to a destination object store that may also support the unified object format or to a destination object store that does not support the unified object format. If the destination object store does not support the unified object format, then slot header metadata within the object is used to decompress the data within the object into an uncompressed format. The data is packaged from being in the uncompressed format into a fixed offset format supported by the destination object store to create a mirrored object that is stored into the destination object store while retaining compression of the data.
Techniques are provided for virtual machine hosting and serverless disaster recovery. A virtual machine is hosted by a first hypervisor that may be located on-premise. Snapshots of virtual machine disks of the virtual machine are backed up to a cloud storage environment. The snapshots are used to on-demand host a new instance of the virtual machine within a destination environment such as within the cloud storage environment through a second hypervisor. The new instance of the virtual machine is hosted for various reasons such as part of a disaster recovery operation if the virtual machine fails, load balancing of I/O operations, migration to a different hosting environment (e.g., a cheaper or more performant environment), development testing, etc.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
40.
VIRTUAL MACHINE HOSTING AND DISASTER RECOVERY ACROSS VIRTUAL MACHINE HOSTING ENVIRONMENTS SUPPORTING DIFFERENT VIRTUAL MACHINE FORMATS
Techniques are provided for virtual machine hosting and disaster recovery across virtual machine hosting environments, such as hypervisors, supporting different virtual machine formats. A virtual machine is hosted by a first hypervisor that supports a first virtual machine format. Snapshots capturing virtual machine disks of the virtual machine are created and backed up to a cloud storage environment. The snapshots are used to restore the virtual machine as a destination virtual machine hosted by a second hypervisor supporting a second virtual machine format different than the first virtual machine format. As part of restoring the destination virtual machine, virtual machine disks of the virtual machine are reformatted according to the second virtual machine format supported by the second hypervisor.
The technology disclosed herein enables accelerated data transmission between producers and consumers. In a particular example, a method includes receiving a first request from a producer-connector component of a producer component to store a payload to a storage repository. In response to the first request, the method includes providing a unique identifier to the connector component. The connector component provides the unique identifier to the distributed-clustered application. The method further includes storing the payload in association with the unique identifier to the storage repository. The method also includes retrieving the payload from the storage repository using the unique identifier to identify the payload in the storage repository. The method includes receiving a second request from a consumer-connector component of the consumer component to retrieve the payload. In response to the second request, the method includes supplying the payload to the consumer component.
Systems and methods for sharing a namespace of an ephemeral storage device by multiple consumers are provided. In an example, an NVMe driver of a virtual storage system deployed within a compute instance of a cloud environment facilitates sharing of the namespace by exposing an API through which the multiple consumers access an ephemeral storage device associated with the compute instance. During initialization processing performed by each consumer, for example, during boot processing of the virtual storage system, the consumers may share the namespace by reserving for their own use respective partitions within the namespace via the API and thereafter restrict their usage of the namespace to their respective partitions, thereby retaining the functionality provided by the multiple consumers when the host on which the compute instance is deployed has fewer ephemeral storage devices than consumers that rely on the availability of vNVRAM backed by ephemeral storage.
Systems and methods for multiple device consumption of shared namespaces of ephemeral storage devices by a consumer of a virtual storage system are provided. In an example, multiple namespaces of respective ephemeral storage devices are shared among multiple consumers of a virtual storage system by creating multiple partitions within each of the namespaces for use by respective consumers of the multiple consumers. Corresponding partitions of respective shared namespaces may then be treated as a stripe set to facilitate multiple device consumption for a subsystem (e.g., operation log journaling) of the virtual storage system by striping data associated with input/output (I/O) requests of a consumer (e.g., a journaling driver) across one or more stripe units of one or more stripes within the stripe set.
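The stripe-set layout can be sketched with simple arithmetic: a logical journal offset maps to a device index and an offset within that device's partition. The stripe-unit size and device count below are illustrative.

```python
# Illustrative layout math for striping a journal across corresponding
# partitions of shared namespaces, one stripe unit per device, round-robin.

def stripe_location(offset, stripe_unit_size, num_devices):
    """Map a logical journal offset to (device index, offset within that device's partition)."""
    stripe_unit = offset // stripe_unit_size
    device_index = stripe_unit % num_devices
    stripe_row = stripe_unit // num_devices
    device_offset = stripe_row * stripe_unit_size + (offset % stripe_unit_size)
    return device_index, device_offset

# Example: 64 KiB stripe units across 4 ephemeral devices.
assert stripe_location(0, 65536, 4) == (0, 0)
assert stripe_location(65536, 65536, 4) == (1, 0)
assert stripe_location(4 * 65536, 65536, 4) == (0, 65536)
```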
Techniques are provided for dynamically implementing quality of service policies for a distributed storage system based upon resource saturation. A quality of service policy is defined for throttling I/O operations received by a node of the distributed storage system based upon whether resources of the node have become saturated. The quality of service policy is dynamically implemented based upon ever changing resource utilization and saturation. Dynamically implementing the quality of service policy improves the ability to efficiently utilize resources of the node compared to conventional static policies that cannot adequately react to such changing considerations and resource utilization/saturation. With conventional static policies, an administrator manually defines a minimum amount of guaranteed resources and/or a maximum resource usage cap that could be set to values that result in inefficient operation and resource starvation. Dynamically implementing the quality of service policy results in more efficient operation and mitigates resource starvation.
Techniques are provided for mirroring objects between object stores hosted by cloud providers that have different data layout requirements. An object may be stored within a first object store that supports a fixed offset format where uncompressed data is stored according to fixed offsets and boundaries within fixed size objects. A mirroring operation may be used to mirror the object to a second object store that supports a unified object format where compressed data can be stored at non-fixed offsets and boundaries within variable sized objects. The mirroring operation selects a compression algorithm and compresses the object on the fly to create a mirrored object having the unified object format. The mirrored object, populated with the compressed data and slot header metadata comprising compression information for how to locate and decompress the data in the mirrored object, is stored into the second object store.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
G06F 16/25 - Integrating or interfacing systems involving database management systems
Intelligent self-adjusting metric collection is described. A first rule set is distributed that describes a first set of one or more metrics corresponding to operation of elements of the receiving entities. One or more metrics based on the first rule set are received. A second rule set is generated in response to an indication of a condition change. The second rule set can be generated using machine learning techniques. The second rule set that describes a second set of one or more metrics is distributed. Metrics based on the second rule set are received.
Techniques are provided for processing read operations while splitting a clone volume from a parent volume whose data is stored within objects of an object store. A transfer map is created to track mappings of child object identifiers to parent object identifiers of the parent objects to copy as child objects having the child object identifiers. The transfer map for the object store is traversed to copy the parent objects as the child objects for the split clone operation. The child objects are verified as being successfully created with valid data. In response to determining that the parent object has been copied as the child object, a context check is performed using the reverse map to verify a block within the child object. In response to a successful context check, the read operation is processed using the block of the child object.
Techniques are provided for splitting a clone volume from a parent volume whose data is stored within objects of an object store. A transfer map is used to track mappings of selectively created child object identifiers used to subsequently copy the one or more parent objects to create child objects corresponding to the child object identifiers. A consistency point phase is performed. For each child object identifier processed during the consistency point phase, an object state for a corresponding child object is set to a copy pending state. A reverse map is populated with a reverse map entry. The transfer map is traversed to copy the one or more parent objects as the child objects for splitting the clone volume from the parent volume. The reverse map is used to verify that the child objects are successfully created with valid data.
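A condensed sketch of the split flow using illustrative names: the transfer map drives the copies, per-child object state tracks progress through the copy-pending phase, and the reverse map records child-to-parent associations for later context checks.

```python
# Illustrative sketch; object_store.get/put are assumed interfaces.

COPY_PENDING, COPIED = "copy_pending", "copied"

def split_clone(transfer_map, object_store):
    """transfer_map: child object ID -> parent object ID."""
    object_state = {child: COPY_PENDING for child in transfer_map}
    reverse_map = {}
    for child_id, parent_id in transfer_map.items():
        data = object_store.get(parent_id)
        object_store.put(child_id, data)       # copy the parent object as a child object
        reverse_map[child_id] = parent_id      # used later for context checks
        object_state[child_id] = COPIED
    return object_state, reverse_map
```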
Systems and methods for preserving storage efficiency during restoration of data from the cloud are provided. In one embodiment, a CBMAP is maintained that maps cloud block numbers (CBNs) to respective corresponding block numbers of a volume of a data storage system in which previously restored data has been stored by a previously restored file. By making use of the CBMAP during the restoration process, storage of duplicate file data blocks on the volume may be avoided by sharing with a current file being restored a reference to the corresponding file data block previously stored on the volume and associated with the previously restored file. In addition to preserving storage efficiency, use of the CBMAP facilitates avoidance of repeated GET operations for data associated with CBNs previously retrieved from the cloud and stored to the volume, thereby reducing data access costs as well as latency of the restore operation.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
50.
MANAGING ATTRIBUTES IN CHILD NODES FOR A MULTI-PART FILE
Approaches for setting file attributes in a distributed file system using a multipart file structure are described. A request to set attributes for one or more parts of a multipart file is received. In response to the request, a rectify indicator is set to indicate the attributes for the multipart file that are to be set. In response to the request, an entry corresponding to the request is created in a rectify database. The attributes for the one or more parts of the multipart file are set using at least the entry in the rectify database.
Approaches for providing a non-disruptive file move are disclosed. A request to move a target file from a first constituent to a second constituent is received. The target file has an associated file handle. A new file is created in the second constituent. The target file in the first constituent is converted to a multipart file that includes a file location for the new file. Contents of the target file are moved to the new file on the second constituent while access is maintained via the associated file handle through the multipart file. The target file is then deleted from the first constituent.
Techniques are provided for dynamically implementing quality of service policies using a configurable quality of service provider pipeline. A quality of service policy is defined for throttling I/O operations received by a node based upon whether resources of the node have become overutilized. The quality of service policy is used to dynamically construct a quality of service provider pipeline with select quality of service providers that improve the ability to efficiently utilize resources compared to conventional static policies that cannot adequately react to changing considerations and resource utilization/saturation. With conventional static policies, an administrator manually defines a minimum amount of guaranteed resources and/or a maximum resource usage cap that could be set to values that result in inefficient operation and resource starvation. Dynamically constructing and utilizing the quality of service provider pipeline results in more efficient operation and mitigates resource starvation.
Techniques are provided for implementing data requests associated with objects of an object store. A data connector component may be instantiated as a container for processing data requests associated with backup data stored within objects of an object store. The data connector component may evaluate the object store to identify snapshots stored as the backup data within the objects of the object store according to an object format. The data connector component may provide a client device with access to backup data of the snapshots.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
54.
SLICE FILE RECOVERY USING DEAD REPLICA SLICE FILES
Techniques are provided for repairing a primary slice file, affected by a storage device error, by using one or more dead replica slice files. The primary slice file is used by a node of a distributed storage architecture as an indirection layer between storage containers (e.g., a volume or LUN) and physical storage where data is physically stored. To improve resiliency of the distributed storage architecture, changes to the primary slice file are replicated to replica slice files hosted by other nodes. If a replica slice file falls out of sync with the primary slice file, then the replica slice file is considered dead (out of sync) and could potentially comprise stale data. If a storage device error affects blocks storing data of the primary slice file, then the techniques provided herein can repair the primary slice file using non-stale data from one or more dead replica slice files.
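A minimal sketch of the repair idea, assuming each slice-file block carries a sequence number in surviving metadata so that a block taken from a dead (out-of-sync) replica is accepted only when it is not stale for that particular block; structures and field names are illustrative.

```python
# Illustrative repair of damaged primary slice-file blocks from dead replicas,
# assuming per-block sequence numbers recorded in surviving metadata.

def repair_primary(primary_blocks, bad_block_ids, dead_replicas):
    """primary_blocks / each replica: dict block_id -> (sequence_number, data)."""
    repaired, unrepairable = {}, []
    for block_id in bad_block_ids:
        want_seq, _ = primary_blocks[block_id]
        candidate = None
        for replica in dead_replicas:
            entry = replica.get(block_id)
            # A dead replica may be behind overall yet still hold a non-stale
            # copy of this particular block.
            if entry is not None and entry[0] == want_seq:
                candidate = entry[1]
                break
        if candidate is None:
            unrepairable.append(block_id)
        else:
            repaired[block_id] = candidate
    return repaired, unrepairable
```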
Systems and methods include negotiating a primary bias state for primary and secondary storage sites when a mediator is temporarily unavailable for a multi-site distributed storage system. In one example, a computer-implemented method comprises detecting, with the primary storage site having a primary storage cluster, a temporary loss of connectivity to a mediator or a failure of the mediator. The computer-implemented method includes negotiating the primary bias state and setting the primary bias state on a secondary storage cluster of the secondary storage site when the secondary storage cluster detects a temporary loss of connectivity to the mediator, determining whether the primary storage cluster receives a confirmation of the secondary storage cluster setting the primary bias state, and setting the primary bias state on the primary storage cluster when the primary storage cluster receives the confirmation.
According to an example, a computer-implemented method comprises initiating a first process for atomically setting the primary bias state with a first node of a primary storage cluster of a multi-site distributed storage system due to a temporary loss of connectivity to a mediator or a temporary mediator failure, releasing an atomic lock for the first process on the first node of the primary storage cluster, sending the first process and an associated first generation indicator to a first node of a secondary storage cluster of the multi-site distributed storage system to handle the first process for setting the primary bias state, and initiating a second process for atomically clearing a primary bias state with the first node or any node of the primary storage cluster based on detecting a connection to the mediator or detecting that the mediator is available.
A method, computing device, and non-transitory machine-readable medium for performing asynchronous write-backs. Data is written to a cache file in a cache. The cache corresponds to a volume. A tracking metafile is updated based on the data written to the cache file. A record in the tracking metafile is determined to be full. The record corresponds to a group of blocks in the cache file. A write-back, to the volume, of the data stored in the group of blocks in the cache file that corresponds to the record is initiated. The write-back is determined to have been completed. The tracking metafile is updated to indicate that the write-back has been completed.
A method, computing device, and non-transitory machine-readable medium for write-back caching within a same or different clusters. A client write request to write data to a volume (for which the client has mounted the corresponding cache) may be received at a network module of a node and processed to generate a write request that can be forwarded to a disk module hosting the cache (at a same or different node than received the client write request). The data is written to the cache and confirmation of the write is sent to the client. Accumulated data in the cache is written back to the volume (hosted by a different node than the cache) when at least one of a cache file threshold or a cache threshold is met. These parameters are set to values that reduce write latency, increase throughput, and help ensure data consistency and resiliency.
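The threshold-driven write-back can be sketched as follows; the per-file and cache-wide threshold values and method names are illustrative, not the tunables used by the system.

```python
# Illustrative write-back caching thresholds: the client is acknowledged once
# data lands in the cache; accumulated dirty data is written back to the origin
# volume when a per-file or cache-wide threshold is met.

class WriteBackCache:
    def __init__(self, file_threshold=16 * 1024 * 1024, cache_threshold=1 * 1024 ** 3):
        self.file_threshold = file_threshold      # dirty bytes per cache file
        self.cache_threshold = cache_threshold    # dirty bytes across the cache
        self.dirty_per_file = {}
        self.total_dirty = 0

    def write(self, file_id, data, send_to_origin):
        self.dirty_per_file[file_id] = self.dirty_per_file.get(file_id, 0) + len(data)
        self.total_dirty += len(data)
        # The client write is acknowledged here; write-back is asynchronous.
        if self.total_dirty >= self.cache_threshold:
            for fid in list(self.dirty_per_file):
                send_to_origin(fid)               # flush every file's accumulated data
            self.dirty_per_file.clear()
            self.total_dirty = 0
        elif self.dirty_per_file[file_id] >= self.file_threshold:
            self.total_dirty -= self.dirty_per_file.pop(file_id)
            send_to_origin(file_id)               # flush this file's accumulated data
```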
A method and computing device for write-back caching. A client write request to write new data to a selected file on a volume (for which the client has mounted the corresponding cache) may be received at a network module of a node and processed to generate a write request that can be forwarded to a disk module hosting the cache (at a same or different node than received the client write request). The data is written to the cache and confirmation of the write is sent to the client. Accumulated data in the cache is written back to the volume (hosted by a different node than the cache) when at least one of a cache file threshold or a cache threshold is met. These parameters are set to values that reduce write latency, increase throughput, and help ensure data consistency and resiliency.
G06F 3/06 - Digital input from, or digital output to, record carriers
60.
METHODS AND SYSTEMS TO IMPROVE RESUMPTION TIME OF INPUT/OUTPUT (I/O) OPERATIONS BASED ON PREFETCHING OF CONFIGURATION DATA AND EARLY ABORT OF CONFLICTING WORKFLOWS DURING A NON-DISRUPTIVE AUTOMATIC UNPLANNED FAILOVER FROM A PRIMARY COPY OF DATA AT A PRIMARY STORAGE SYSTEM TO A MIRROR COPY OF THE DATA AT A CROSS-SITE SECONDARY STORAGE SYSTEM
Multi-site distributed storage systems and computer-implemented methods are described for improving a resumption time for processing of input/output (I/O) operations during an automatic unplanned failover (AUFO). A first storage cluster includes a first set of consistency groups (CGs) and a second storage cluster includes a second mirrored set of CGs. A computer-implemented method includes prefetching, with a user space of the second storage cluster, configuration information from a replicated database prior to starting the AUFO workflow, sending the configuration information to a kernel space of the second storage cluster on a per CG level while queuing the AUFO workflow, and determining if any in progress workflows conflict with the AUFO workflow.
Techniques are provided for backing up and restoring a file system or storage virtual machine located within a remote object store. A specification is parsed to identify resources associated with and including a primary resource hosted within a remote object store and to identify REST API endpoints of the resources. GET operations targeting the REST API endpoints of the resources are performed to retrieve the resources and properties of the resources. A link relationship specification is parsed to identify links corresponding to dependencies amongst the resources. A backup of the primary resource is generated to include the resources, the properties of the resources, and dependency information derived from the links. The backup can be used to restore the primary resource to the remote object store in a manner that preserves the dependencies amongst the resources.
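An illustrative-only sketch of the backup flow described above. The shape of the two specifications and the fetch() helper are assumptions; the point is simply to show resources gathered via GET operations and dependencies recorded from link relationships, then bundled into a single backup document.

```python
# Sketch: walk a resource specification, GET each REST endpoint, record
# dependencies from a link relationship specification, and bundle the result.
import json

def build_backup(spec, link_spec, fetch):
    """spec: {resource_name: rest_endpoint}; link_spec: [(child, parent), ...];
    fetch: callable performing a GET and returning the resource's properties."""
    resources = {name: fetch(endpoint) for name, endpoint in spec.items()}
    dependencies = [{"resource": child, "depends_on": parent}
                    for child, parent in link_spec]
    return json.dumps({"resources": resources, "dependencies": dependencies})

# usage with a stubbed fetch (endpoint paths are hypothetical):
backup = build_backup({"svm": "/api/svm/svms/1"}, [("volume", "svm")],
                      fetch=lambda endpoint: {"endpoint": endpoint})
```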
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
Techniques are provided for forwarding operations to bypass persistent memory. A modify operation, targeting an object, may be received at a persistent memory tier of a node. If a forwarding policy indicates that forwarding is not enabled for the modify operation and the target object, then the modify operation is executed through a persistent memory file system. If the forwarding policy indicates that forwarding is enabled for the modify operation and the target object, then the modify operation is forwarded to a file system tier as a forwarded operation for execution through a storage file system.
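A minimal sketch of the routing decision described above, assuming a simple policy table keyed by (operation, target object); the function and parameter names are invented. The modify operation executes in the persistent memory file system unless forwarding is enabled for that operation and target, in which case it is forwarded to the file system tier.

```python
# Sketch: consult a forwarding policy and route a modify operation accordingly.
def route_modify_op(policy, op, target, pmem_fs_execute, storage_fs_execute):
    if policy.get((op, target), False):        # forwarding enabled for this op/object
        return storage_fs_execute(op, target)  # forwarded to the file system tier
    return pmem_fs_execute(op, target)         # executed through the pmem file system

# usage (hypothetical object path):
policy = {("write", "/vol/db/log"): True}      # forward writes targeting this object
route_modify_op(policy, "write", "/vol/db/log",
                pmem_fs_execute=lambda op, t: f"pmem:{op}:{t}",
                storage_fs_execute=lambda op, t: f"fs:{op}:{t}")
```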
Systems and methods for sampling a set of block IDs to facilitate estimating an amount of data stored in a data set of a storage system having one or more characteristics are provided. According to an example, metadata (e.g., block headers and block IDs) may be maintained regarding multiple data blocks of the data set. When one or more metrics relating to the data set are desired, an efficiency set, representing a subset of the block IDs of the data set, may be created to facilitate efficient calculation of the metrics by sampling the block IDs of the data set. Finally, the metrics may be estimated based on the efficiency set by analyzing one or more of the metadata (e.g., block headers) and the data contained in the data blocks corresponding to the subset of the block IDs and extrapolating the metrics for the entirety of the data set.
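A sketch under stated assumptions of the sampling idea above: block IDs whose hash falls under a cutoff form the efficiency set, a metric is measured over that sample (here, a stubbed block-header predicate), and the result is extrapolated to the whole data set. The hash-based selection, the predicate, and the sample rate are illustrative choices, not the document's method.

```python
# Sketch: build an efficiency set by sampling block IDs, then extrapolate a metric.
import hashlib

def efficiency_set(block_ids, sample_rate=0.01):
    cutoff = int(sample_rate * 2**32)
    return [b for b in block_ids
            if int.from_bytes(hashlib.sha256(b).digest()[:4], "big") < cutoff]

def estimate_metric(block_ids, headers, predicate, sample_rate=0.01):
    sample = efficiency_set(block_ids, sample_rate)   # subset of the data set's block IDs
    if not sample:
        return 0
    hits = sum(1 for b in sample if predicate(headers[b]))   # analyze sampled headers
    return int(hits / len(sample) * len(block_ids))          # extrapolate to the full set
```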
The technology disclosed herein enables access to a file system by a portable executable program. In a particular example, a method includes, in a host executing the portable executable program, recognizing that the portable executable program is executing on one or more processing systems of the host and determining that the portable executable program is configured to access the file system. The method also includes directing the portable executable program to create a module therein for file system access and creating an abstraction layer with which the module exchanges file system commands. In the abstraction layer, the method includes translating the file system commands to translated commands for the file system and exchanging the translated commands between the abstraction layer and the file system.
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
Systems and methods for performing a fast resynchronization of a mirrored aggregate of a distributed storage system using disk-level cloning are provided. According to one embodiment, responsive to a failure of a disk of a plex of the mirrored aggregate utilized by a high-availability (HA) pair of nodes of a distributed storage system, disk-level clones of the disks of a healthy plex may be created external to the distributed storage system and attached to the degraded HA partner node. After detection of the cloned disks by the degraded HA partner node, mirror protection may be efficiently re-established by assimilating the cloned disks within the failed plex and then resynchronizing the mirrored aggregate by performing a level-1 resync of the failed plex with the healthy plex based on a base file system snapshot of the healthy plex. In this manner, a more time-consuming level-0 resync may be avoided.
G06F 3/06 - Digital input from, or digital output to, record carriers
66.
VERIFICATION OF A PUBLISHED IMAGE HAVING A PREDEFINED PORTION THAT HAS BEEN ALTERED BY A CLOUD PROVIDER PRIOR TO BEING MADE AVAILABLE VIA A MARKETPLACE OF THE CLOUD PROVIDER
Systems and methods for verifying an executable portion of a published cloud image represents an unaltered version of an executable portion of a corresponding original cloud image are provided. In one embodiment, modification of a predefined portion of a cloud image by a cloud provider prior to its publication via a marketplace of the cloud provider is proactively addressed as part of (i) an automated signing process performed by a software publisher on the original cloud image prior to delivery to the cloud provider and (ii) a corresponding background verification process performed on the published cloud image on behalf of users by a management platform. The signing and verification processes are operable to exclude the predefined portion when creating their respective digests, thereby allowing the signed digest created prior to the modification to remain useful as part of a subsequent digest comparison performed by the verification process.
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system
A system, method, and machine-readable storage medium for retrieving data are provided. In some embodiments, a cache may receive a request for data from a client. The cache may determine that a first subset of the data is stored on a storage device and that a second subset of the data is stored at a cloud address located at a cloud storage endpoint. The cache may also receive from the storage device the first subset of data. The cache further receives from the cloud storage endpoint the second subset of data in response to transmitting a request for the second subset of data stored at the cloud address to the cloud storage endpoint. The cache then transmits to the client the first and second subsets of data from the various sources in response to the data request.
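A hypothetical sketch of serving a single client read from two sources, as described above; the callables and the local_index membership test are assumptions standing in for the cache's storage device and cloud endpoint. Blocks found locally are read from the storage device, the remainder are fetched from the cloud address, and the combined result is returned to the client in the requested order.

```python
# Sketch: split one client read between local storage and a cloud storage endpoint.
def serve_read(requested_blocks, local_index, read_local, read_cloud):
    local = [b for b in requested_blocks if b in local_index]
    remote = [b for b in requested_blocks if b not in local_index]
    data = {}
    data.update(read_local(local))    # first subset from the storage device
    data.update(read_cloud(remote))   # second subset from the cloud endpoint
    return [data[b] for b in requested_blocks]   # original order back to the client

# usage: serve_read(["b1", "b2"], {"b1"},
#                   read_local=lambda bs: {b: b"L" for b in bs},
#                   read_cloud=lambda bs: {b: b"C" for b in bs})
```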
H04L 67/568 - Storing data temporarily at an intermediate stage, e.g. caching
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 12/0866 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
Techniques are provided for atomic writes for persistent memory. In response to receiving a write operation, a new per-page structure with a new page block number is allocated. New data of the write operation is persisted to a new page of the persistent memory having the new page block number, and the new per-page structure is persisted to the persistent memory. If the write operation targets a hole after the new data and the new per-page structure have been persisted, then a new per-page structure identifier of the new per-page structure is inserted into a parent indirect page of a page comprising the new data. If the write operation targets old data after the new data and the new per-page structure have been persisted, then an old per-page structure of the old data is updated with the new page block number.
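An illustrative sketch of the ordering described above; the Pmem helper is an invented in-memory stand-in, not the persistent memory layer itself. The new page and its per-page structure are persisted first, and only then published, either by inserting the new per-page structure identifier into the parent indirect page (hole case) or by repointing the old per-page structure at the new page block number (overwrite case). A crash before the final step leaves the old state visible, which is what makes the write atomic.

```python
# Sketch: persist-then-publish ordering for atomic writes to persistent memory.
import itertools

class Pmem:
    """In-memory stand-in for persistent memory pages and per-page structures."""
    def __init__(self):
        self.pages, self.pps = {}, {}
        self._ids = itertools.count(1)
    def allocate_page(self):
        return next(self._ids)
    def persist_page(self, pbn, data):
        self.pages[pbn] = data
    def persist_per_page_structure(self, pbn):
        pps_id = next(self._ids)
        self.pps[pps_id] = pbn
        return pps_id
    def update_per_page_structure(self, pps_id, new_pbn):
        self.pps[pps_id] = new_pbn

def atomic_write(pmem, parent_indirect, offset, new_data):
    new_pbn = pmem.allocate_page()
    pmem.persist_page(new_pbn, new_data)                 # 1. persist the new data
    pps_id = pmem.persist_per_page_structure(new_pbn)    # 2. persist the new per-page structure
    old_pps = parent_indirect.get(offset)
    if old_pps is None:                                   # write targets a hole
        parent_indirect[offset] = pps_id                  # 3a. insert new structure id into parent
    else:                                                 # write targets old data
        pmem.update_per_page_structure(old_pps, new_pbn)  # 3b. repoint old structure at new page
```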
Approaches and mechanisms for cloning a file are described. A first node requests a clone of a file at a time when it also requests an exclusive delegation of the original file from a second node where the original file is stored. The second node marks the original file as delegated to the first node and the second node records an intent to create the clone file and a delegation record for the clone file. The second node creates the clone file. The delegation and the identity of the clone file are returned to the first node. The first node marks in the delegation record that the clone file was committed in response to modification. If the clone file was committed, the delegation is cleared and the clone file is kept, and if the clone file was not committed, the delegation is cleared and the clone file is deleted.
METHODS AND SYSTEMS TO REDUCE LATENCY OF INPUT/OUTPUT (I/O) OPERATIONS BASED ON FILE SYSTEM OPTIMIZATIONS DURING CREATION OF COMMON SNAPSHOTS FOR SYNCHRONOUS REPLICATED DATASETS OF A PRIMARY COPY OF DATA AT A PRIMARY STORAGE SYSTEM TO A MIRROR COPY OF THE DATA AT A CROSS-SITE SECONDARY STORAGE SYSTEM
Multi-site distributed storage systems and computer-implemented methods are described for improving a resumption time of input/output (I/O) operations during a common snapshot process for storage objects. A computer-implemented method comprises performing a baseline transfer from at least one storage object of a first storage node to at least one replicated storage object of a second storage node, starting the common snapshot process including stopping processing of I/O operations, performing a snapshot create operation on the primary storage site for the at least one storage object of the first storage node, resuming processing of I/O operations, and assigning a new universal unique identifier (UUID) to the at least one storage object of the second storage node after resuming processing of I/O operations, with the new UUID identifying when file system contents differ from the baseline transfer.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/178 - Techniques for file synchronisation in file systems
71.
AUTOMATED REMEDIATION OF DEVIATIONS FROM BEST PRACTICES IN A DATA MANAGEMENT STORAGE SOLUTION
Systems and methods for automated remediation of deviations from best practices in the context of a data management storage system are provided. Deployed assets of a storage solution vendor may periodically deliver telemetry data to the vendor. The telemetry data may be processed by an AIOps platform to perform predictive analytics and arrive at “community wisdom” from the vendor's installed base. In one embodiment, an insight-based approach is used to facilitate risk detection and remediation including proactively addressing deviations from best practices before they turn into more serious problems. Based on the community wisdom and making a rule set and a remediation set derived therefrom available for use by an auto-healing service associated with a customer's storage system, a risk (e.g., a deviation from a best practice) to which the storage system is exposed may be determined and a corresponding remediation may be deployed to address or mitigate the risk.
Systems and methods that make use of cluster-level redundancy within a distributed storage system to address various node-level error scenarios are provided. Rather than making use of a generalized one-size-fits-all approach in an effort to reduce complexity, an approach tailored to the node-level error scenario at issue may be performed to avoid doing more than necessary. According to one embodiment, responsive to identification of a failed RAID stripe by a node of a cluster of a distributed storage management system, for each block ID of multiple block IDs associated with the failed RAID stripe, a data block is restored corresponding to the block ID by reading the data block from another node of the cluster having a redundant copy of the data block; and writing the redundant copy of the data block to a storage area of the node that is unaffected by the failed RAID stripe.
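A minimal sketch of the per-block repair loop described above; the three callables stand in for cluster services (locating a peer with a redundant copy, reading from it, and writing locally) that are not specified here.

```python
# Sketch: restore each block of a failed RAID stripe from cluster-level redundancy.
def repair_failed_stripe(block_ids, find_peer_with_copy, read_from_peer,
                         write_to_unaffected_area):
    for block_id in block_ids:
        peer = find_peer_with_copy(block_id)        # another node holding a redundant copy
        data = read_from_peer(peer, block_id)       # read the redundant copy of the block
        write_to_unaffected_area(block_id, data)    # store it outside the failed stripe
```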
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
73.
TECHNIQUES FOR COORDINATING PARALLEL PERFORMANCE AND CANCELLATION OF COMMANDS IN A STORAGE CLUSTER SYSTEM
Various embodiments are directed to techniques for coordinating at least partially parallel performance and cancellation of data access commands between nodes of a storage cluster system. An apparatus may include a processor component of a first node coupled to a first storage device storing client device data; an access component to perform replica data access commands of replica command sets on the client device data, each replica command set assigned a set ID; a communications component to analyze a set ID included in a network packet to determine whether a portion of a replica command set in the network packet is redundant, and to reassemble the replica command set from the portion if the portion is not redundant; and an ordering component to provide the communications component with set IDs of replica command sets of which the access component has fully performed the set of replica data access commands.
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
H04L 67/1095 - Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
Techniques are provided for implementing a defragmentation process during a merge operation performed by a re-compaction process upon a log structured merge tree. The log structured merge tree is used to store keys of key-value pairs within a key-value store. As the log structured merge tree fills with keys over time, the re-compaction process is performed to merge keys down to lower levels of the log structured merge tree to re-compact the keys. Re-compaction can result in fragmentation because there is a lack of spatial locality in where the re-compaction process re-writes the keys within storage. Fragmentation increases read and write amplification when accessing the keys stored in different locations within the storage. Accordingly, the defragmentation process is performed during a last merge operation of the re-compaction process in order to store keys together within the storage, thus reducing read and write amplification when accessing the keys.
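A sketch under stated assumptions of the last-merge defragmentation idea above: during the final merge of re-compaction, surviving keys are rewritten contiguously into a fresh extent rather than in place, so related keys end up adjacent on storage and read/write amplification drops. The run ordering convention and write_extent callable are assumptions.

```python
# Sketch: final merge of re-compaction writes surviving keys into one contiguous extent.
import heapq

def last_merge_with_defrag(sorted_runs, write_extent):
    """sorted_runs: iterables of (key, value) sorted by key, newest run first.
    write_extent: callable that appends records contiguously to new storage."""
    merged, seen = [], set()
    for key, value in heapq.merge(*sorted_runs, key=lambda kv: kv[0]):
        if key not in seen:               # newest version wins; older duplicates dropped
            seen.add(key)
            merged.append((key, value))
    write_extent(merged)                  # one contiguous, defragmented write
    return merged

# usage: last_merge_with_defrag([[("a", 2)], [("a", 1), ("b", 1)]], write_extent=print)
```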
The disclosed technology enables quicker initialization of a new master node for a cluster when a previous master node fails by tracking node state in the cluster prior to being designated the new master node. In a particular example, a method includes, in a first node, designated as a current master node for the cluster, managing the cluster based on states of the nodes determined by the first node. While the first node is designated the master node, the method includes each of the nodes collecting, and storing locally, the states of the nodes. In response to a failure of the first node, the method includes selecting a second node of the nodes as the new master node. Upon being designated the new master node, the method includes the second node managing the cluster of nodes based on the states of the nodes that the second node collected and stored locally.
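A hypothetical sketch of the idea above; the class and method names are invented. Every node keeps its own locally stored view of peer states while a master is active, so a newly designated master can begin managing the cluster from that pre-collected view instead of rebuilding it from scratch.

```python
# Sketch: each node tracks peer states locally so promotion to master is fast.
class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.local_states = {}        # peer name -> last observed state
        self.is_master = False

    def observe(self, peer, state):   # runs on every node, master or not
        self.local_states[peer] = state

    def promote(self):                # called when the old master fails
        self.is_master = True
        return dict(self.local_states)   # manage using the pre-collected states

# usage:
# a, b = ClusterNode("a"), ClusterNode("b")
# for n in (a, b): n.observe("c", "healthy")
# states = b.promote()   # b becomes the new master using its local view
```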
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
H04L 41/0668 - Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
Systems and methods for creation and retention of immutable snapshots to facilitate ransomware protection are provided. According to one embodiment, multiple use cases for retention of snapshots are supported, including (i) maintaining a locked snapshot on a source volume of a first storage system on which it was originally created for at least an associated immutable retention time; (ii) replicating the locked snapshot to a destination volume of a second storage system and also maintaining the replica of the locked snapshot on the destination volume for at least the associated immutable retention time; and (iii) maintaining an unlocked snapshot on the source volume, replicating the unlocked snapshot to the destination volume, locking the replicated snapshot on the destination volume when it has an associated non-zero immutable retention time, and thereafter maintaining the replica on the destination volume in accordance with the immutable retention time.
With a forever incremental snapshot configuration and a typical caching policy (e.g., least recently used), a storage appliance may evict stable data blocks of an older snapshot, such as unchanged data blocks of the snapshot baseline. If stable data blocks have been evicted, restoring a recent snapshot will suffer the time penalty of re-downloading those stable blocks. Creating synthetic baseline snapshots and refreshing eviction data of stable data blocks can avoid eviction of stable data blocks and reduce the risk of violating a recovery time objective.
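A minimal sketch, assuming a timestamp-based least-recently-used cache, of one way "refreshing eviction data" could work: stable blocks referenced by the latest snapshot have their timestamps bumped so the LRU policy does not evict them and a restore of a recent snapshot does not have to re-download them. The function names and the timestamp mechanism are assumptions.

```python
# Sketch: pin stable snapshot blocks by refreshing their LRU eviction data.
import time

def refresh_stable_blocks(cache_access_times, latest_snapshot_blocks):
    now = time.time()
    for block_id in latest_snapshot_blocks:
        if block_id in cache_access_times:
            cache_access_times[block_id] = now   # looks recently used to the LRU policy

def evict_lru(cache_access_times, count):
    victims = sorted(cache_access_times, key=cache_access_times.get)[:count]
    for block_id in victims:
        del cache_access_times[block_id]
    return victims
```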
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 12/121 - Replacement control using replacement algorithms
G06F 16/13 - File access structures, e.g. distributed indices
H04L 67/568 - Storing data temporarily at an intermediate stage, e.g. caching
42 - Scientific, technological and industrial services, research and design
Goods & Services
Technical consulting in the field of artificial intelligence (AI) software customization; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence (AI) for use in database management; Artificial intelligence as a service (AIAAS) services featuring software using artificial intelligence (AI) for use in machine learning; Design, deployment and management of artificial intelligence for the purpose of managing business data and workflows; Artificial intelligence training models for business data management; Providing online and cloud-based non-downloadable software for deploying, implementing, monitoring, securing, optimizing, analyzing, storing, managing, and troubleshooting artificial intelligence (AI) platforms and internal processes; technical consulting in the field of artificial intelligence (AI) software
Systems, methods, and machine-readable media are disclosed for isolating and reporting a volume placement error for a request to place a volume on a storage platform. A volume placement service requests information from a database using an optimized database query to determine an optimal location to place a new volume. The database returns no results. The volume placement service deconstructs the optimized database query to extract a plurality of queries. The volume placement service iterates over the plurality of queries, combining queries in each iteration, to determine a cause for the database to return no results. Based on the results of each iterative database request, the volume placement service determines a cause for the database to return an empty result. The volume placement service provides an indication of the cause for returning an empty result.
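An illustrative sketch of the isolation step described above, framed in terms of query predicates, which is an assumption about how the optimized query is deconstructed. The combined placement query returned no rows, so progressively larger subsets of its predicates are re-run, and the first predicate whose addition empties the result is reported as the cause.

```python
# Sketch: rebuild the placement query predicate by predicate to find what emptied it.
def isolate_empty_result(run_query, predicates):
    active = []
    for predicate in predicates:
        active.append(predicate)
        if not run_query(active):      # adding this predicate emptied the result set
            return predicate           # report it as the cause
    return None                        # the full query actually returns rows

# usage (hypothetical predicates):
# isolate_empty_result(lambda preds: [] if "zone=eu" in preds else [1],
#                      ["tier=ssd", "zone=eu"])   # -> "zone=eu"
```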
Methods and multi-site systems to provide recovery point objective (RPO) protection, snapshot retention between secondary storage site and tertiary storage site, and automatically initiating realignment and reconfiguration of a protection configuration from the secondary storage site to the tertiary storage site upon primary storage site failure
Multi-site distributed storage systems and computer-implemented methods are described for providing common snapshot retention and automatic fanout reconfiguration for an asynchronous leg after a failure event that causes a failover from a primary storage site to a secondary storage site. A computer-implemented method comprises providing an asynchronous replication relationship with an asynchronous update schedule from one or more storage objects of the first storage node to one or more replicated storage objects of the third storage node, creating a snapshot copy of the one or more storage objects of the first storage node, transferring the snapshot copy to the third storage node based on an asynchronous mirror policy, and intercepting the snapshot create operation on the primary storage site and transferring the snapshot copy to the second storage node to provide a common snapshot between the second storage node and the third storage node.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
A system can comprise a memory that stores computer executable components, and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a forecasting component that, based on performance data for a storage system, forecasts a performance metric for a storage unit subset of the storage system, wherein the performance metric is based on saturation of a capacity at the storage system related to the storage unit subset. An execution component can execute a modification at the storage system, wherein the modification at the storage system comprises changing a functioning of the storage system relative to the storage unit subset. The performance metric can be based on at least one of storage capacity or performance capacity for the subset.
A system is described. The system includes a processing resource and a non-transitory computer-readable medium, coupled to the processing resource, having stored therein instructions that when executed by the processing resource cause the processing resource to detect an unrecognized Internet Protocol Security (IPsec) packet associated with an IP address at a first node within a cluster, retrieve one or more selector fields from the IPsec packet, query a security policy database to determine whether a destination IP address included in the one or more retrieved selector fields matches one or more matching outbound IPsec policies associated with a destination IP address, determine whether a matching outbound IPsec policy includes an IPsec policy associated with the destination address entry, and establish a first IPsec security association (SA) communication session between the first node and the client based on the outbound IPsec policy.
Techniques are provided for combining data block and checksum block I/O into a single I/O operation. Many storage systems utilize checksums to verify the integrity of data blocks stored within storage devices managed by a storage stack. However, when a storage system reads a data block from a storage device, a corresponding checksum must also be read to verify integrity of the data in the data block. This results in increased latency because two read operations are being processed through the storage stack and are being executed upon the storage device. To reduce this latency and improve I/O operations per second, a single combined I/O operation corresponding to a contiguous range of blocks including the data block and the checksum block is processed through the storage stack instead of two separate I/O operations. Additionally, I/O operations may be combined into a single request that is executed upon the storage device.
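A minimal sketch of the combined read described above, assuming the checksum block sits at a fixed, known offset immediately after its data block(s) and that a CRC32 is used; the on-device layout, block size, and device_read callable are assumptions. One contiguous read covers both the data and the checksum, and integrity is verified from the single buffer.

```python
# Sketch: one contiguous I/O for a data block plus its adjacent checksum block.
import zlib

BLOCK_SIZE = 4096   # assumed block size

def read_and_verify(device_read, data_block_no, blocks_per_checksum=1):
    # Single I/O covering the data block(s) and the checksum block that follows them.
    length = (blocks_per_checksum + 1) * BLOCK_SIZE
    buf = device_read(data_block_no * BLOCK_SIZE, length)
    data = buf[:blocks_per_checksum * BLOCK_SIZE]
    stored = int.from_bytes(buf[blocks_per_checksum * BLOCK_SIZE:
                                blocks_per_checksum * BLOCK_SIZE + 4], "big")
    if zlib.crc32(data) != stored:
        raise IOError("checksum mismatch for block %d" % data_block_no)
    return data
```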
G06F 3/06 - Digital input from, or digital output to, record carriers
86.
Methods and multi-site systems to provide recovery point objective (RPO) protection and automatically initiate realignment and reconfiguration of a protection configuration from the secondary storage site to the tertiary storage site upon primary storage site failure
A computer-implemented method comprises providing a synchronous replication relationship from one or more storage objects of a first storage node to one or more replicated storage objects of a second storage node, providing an asynchronous replication relationship with an asynchronous update schedule from the one or more storage objects of the first storage node to one or more replicated storage objects of the third storage node to provide a protection configuration, tracking, with the third storage node of the tertiary site, a state of the secondary storage site, automatically performing a failover from the primary storage site to the secondary storage site and activating a synchronous mirror copy for the one or more replicated storage objects of the second storage node, and automatically initiating realignment and reconfiguration of the protection configuration to the tertiary storage site based upon the state of the secondary storage site.
G06F 11/16 - Error detection or correction of the data by redundancy in hardware
G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
87.
METHODS FOR HIERARCHICAL PROPAGATION IN TREE STRUCTURES AND DEVICES THEREOF
Methods, non-transitory machine readable media, and computing devices that provide more efficient hierarchical propagation in tree structures are disclosed. With this technology, a first delta record for a first interior node is created optionally in an atomic transaction along with updating a first tally record for a leaf node based on a first value. The transaction is in response to an action associated with the leaf node and the first interior node is a parent of the leaf node in a hierarchical tree. A timer associated with the first delta record is then set. A second value is updated in a second tally record for the first interior node based on the first value, when the timer has expired. Accordingly, this technology advantageously maintains recursive properties or values throughout a hierarchical tree continually, with reduced cost, even in a distributed network and in hierarchical trees with large numbers of nodes.
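A sketch under assumptions (the names delta and tally mirror the abstract, but the Node class and propagation details are invented) of the lazy propagation described above: an action on a leaf updates the leaf tally and accumulates a delta record on its parent; when the timer associated with the delta expires, the delta is folded into the parent's tally and propagated one level further up, so accurate recursive values are maintained without touching every ancestor synchronously.

```python
# Sketch: delta records plus timers propagate tallies up a hierarchical tree lazily.
class Node:
    def __init__(self, parent=None):
        self.parent, self.tally, self.delta = parent, 0, 0

def record_action(leaf, value):
    leaf.tally += value                 # update the leaf's tally record
    if leaf.parent is not None:
        leaf.parent.delta += value      # create/accumulate the parent's delta record
        # a timer associated with the delta record would be set here

def on_timer_expired(node):
    node.tally += node.delta            # fold the delta into the interior node's tally
    if node.parent is not None:
        node.parent.delta += node.delta # propagate one level up, again lazily
    node.delta = 0

# usage:
# root = Node(); mid = Node(root); leaf = Node(mid)
# record_action(leaf, 5)      # leaf.tally == 5, mid.delta == 5
# on_timer_expired(mid)       # mid.tally == 5, root.delta == 5
```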
The instant disclosure provides a data structure store system and a method of managing data in the store. The method includes receiving, by a data structure store management system, a request for storing data from a client and creating a data structure in which each data element includes a portion of the data. On receiving a read request for at least part of the data, the data structure store management system provides at least part of the data to a recipient device. The data elements are stored in persistent memory in the form of one or more non-volatile random access devices, wherein during a time interval between receiving the storing request and providing the at least part of the data by the data structure store management system to the recipient device, the data structure store management system provides no portion of the data for writing to a hard disk drive.
Techniques are provided for maintaining and recomputing reference counts in a persistent memory file system of a node. Primary reference counts are maintained for pages within persistent memory of the node. In response to receiving a first operation to link a page into a persistent memory file system of the persistent memory, a primary reference count of the page is incremented before linking the page into the persistent memory file system. In response to receiving a second operation to unlink the page from the persistent memory file system, the page is unlinked from the persistent memory file system before the primary reference count is decremented. Upon the node recovering from a crash, the persistent memory file system is traversed in order to update shadow reference counts for the pages with correct reference count values, which are used to overwrite the primary reference counts with the correct reference count values.
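An illustrative sketch of the ordering invariant described above (increment before link, unlink before decrement), with dictionaries standing in for the persistent structures. After a crash, a traversal recomputes shadow counts and overwrites the primary counts, so the worst case left behind by the ordering is an over-count (a leaked page), never a dangling reference.

```python
# Sketch: reference-count ordering plus crash recovery via shadow counts.
def link_page(fs_tree, key, page, refcount):
    refcount[page] = refcount.get(page, 0) + 1   # 1. bump the primary refcount first
    fs_tree[key] = page                          # 2. then link the page into the file system

def unlink_page(fs_tree, key, refcount):
    page = fs_tree.pop(key)                      # 1. unlink from the file system first
    refcount[page] -= 1                          # 2. then decrement the primary refcount

def recover(fs_tree, refcount):
    shadow = {}
    for page in fs_tree.values():                # traverse the persistent memory file system
        shadow[page] = shadow.get(page, 0) + 1
    refcount.clear()
    refcount.update(shadow)                      # overwrite primaries with correct values
```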
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
Techniques are provided for storage tier verification checks. A determination is made that a mount operation of an aggregate of a set of volumes stored within a multi-tier storage environment has completed. A first metafile and a second metafile are maintained to track information related to the storage of objects of a volume of the aggregate within a remote object store that is a tier of the multi-tier storage environment. A distributed verification is performed between the first metafile and the second metafile to identify an inconsistency. Accordingly, the first metafile and the second metafile are reconciled to address the inconsistency so that storage information within the first metafile and the second metafile is consistent.
Techniques are provided for caching data during an on-demand restore using a cloud block map. A client may be provided with access to an on-demand volume during a restore process that copies backup data from a snapshot within a remote object store to the on-demand volume stored within local storage. In response to receiving a request from the client for a block of the backup data not yet restored from the snapshot to the on-demand volume, the block may be retrieved from the snapshot in the remote object store. The block may be cached within a cloud block map stored within the local storage as a cached block. The client may be provided with access to the cached block.
G06F 3/06 - Digital input from, or digital output to, record carriers
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
Techniques are provided for implementing a persistent key-value store for caching client data, journaling, and/or crash recovery. The persistent key-value store may be hosted as a primary cache that provides read and write access to key-value record pairs stored within the persistent key-value store. The key-value record pairs are stored within multiple chains in the persistent key-value store. Journaling is provided for the persistent key-value store such that incoming key-value record pairs are stored within active chains, and data within frozen chains is written in a distributed manner across distributed storage of a distributed cluster of nodes. If there is a failure within the distributed cluster of nodes, then the persistent key-value store may be reconstructed and used for crash recovery.
Systems and methods that make use of cluster-level redundancy within a distributed storage management system to address various node-level error scenarios are provided. Rather than using a generalized one-size-fits-all approach to reduce complexity, an approach tailored to the node-level error scenario at issue may be performed to avoid doing more than necessary. According to one embodiment, after identifying a missing branch of a tree implemented by a KV store of a first node of a cluster of a distributed storage management system, a branch resynchronization process may be performed, including, for each block ID in the range of block IDs of the missing branch (i) reading a data block corresponding to the block ID from a second node of the cluster that maintains redundant information relating to the block ID; and (ii) restoring the block ID within the KV store by writing the data block to the first node.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Systems and methods for enhancing container security are provided by reducing the attack surface. In one example, the exposure of containers to potential security vulnerabilities is reduced by identifying dynamically loaded symbols by an application via performance of static analysis (which may be referred to herein as static symbol analysis). Static symbol analysis may include examining one or more sections of an executable to identify dynamically loaded symbols corresponding to functions contained within shared libraries (e.g., shared object files and dynamic libraries). Based on a given shared library's usage of functions within standard libraries (e.g., the standard C library) and a known mapping between functions of standard libraries and kernel system calls, those kernel system calls potentially accessed by the application may be identified and a security policy may be generated and configured for enforcement by a kernel security module to limit kernel system call usage accordingly.
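A highly simplified sketch of the pipeline described above. The dynamic symbol list and the libc-function-to-syscall mapping are supplied as plain data here; in practice they would come from inspecting the executable's dynamic symbol table and from a precomputed mapping for the standard library, neither of which is shown.

```python
# Sketch: derive an allowlist of kernel system calls from dynamically loaded symbols.
def build_syscall_allowlist(dynamic_symbols, libc_to_syscalls):
    allowed = set()
    for sym in dynamic_symbols:                        # symbols the application imports
        allowed.update(libc_to_syscalls.get(sym, ()))  # syscalls those functions may use
    return sorted(allowed)

# usage: the resulting list would seed a policy enforced by a kernel security
# module (e.g. a seccomp-style allowlist) limiting the container's syscall usage.
policy = build_syscall_allowlist(
    ["fopen", "malloc"],
    {"fopen": ["openat", "read", "close"], "malloc": ["brk", "mmap"]})
```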
G06F 21/51 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems at application loading time, e.g. accepting, rejecting, starting or inhibiting executable software based on integrity or source reliability
G06F 21/52 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure
Techniques are provided for implementing garbage collection and bin synchronization for a distributed storage architecture of worker nodes managing distributed storage composed of bins of blocks. As the distributed storage architecture scales out to accommodate more storage and worker nodes, garbage collection used to free unused blocks becomes unmanageable and slow. Accordingly, garbage collection is improved by utilizing heuristics to dynamically speed up or slow down garbage collection and to set sizes for subsets of a bin to process instead of the entire bin. This ensures that garbage collection does not use stale information about which blocks are in use, and that garbage collection does not unduly impact client I/O processing or, conversely, fall behind. Garbage collection can be incorporated into a bin sync process to improve the efficiency of the bin sync process so that unused blocks are not needlessly copied by the bin sync process.
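A hypothetical sketch of the heuristic described above; the sizing formula, the load and backlog inputs, and the limits are all invented for illustration. Each garbage-collection pass scans only a subset of a bin, sized up when collection has fallen behind and down when client I/O load is heavy, so in-use information stays fresh without starving client I/O.

```python
# Sketch: heuristic sizing of the bin subset processed by one garbage-collection pass.
def gc_pass(bin_blocks, in_use, io_load, backlog,
            min_subset=128, max_subset=4096):
    # Speed up as the backlog grows, slow down when client I/O load (0..1) is high.
    size = int(max_subset * backlog / (backlog + 1) * (1.0 - io_load))
    size = max(min_subset, min(max_subset, size))
    subset = bin_blocks[:size]
    freed = [b for b in subset if b not in in_use]   # unused blocks in this subset
    return freed, bin_blocks[size:]                  # remaining blocks for later passes
```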
Techniques are provided for upgrading an external distributed storage layer that provides storage services to containerized applications hosted within a container hosting platform. An operator within the container hosting platform is custom configured to orchestrate, from within the container hosting platform, the upgrade for the external distributed storage layer. Because the external distributed storage layer and the container hosting platform are separate computing environments that utilize different namespaces, semantics, operating states, and/or application programming interfaces, a cluster controller within the container hosting platform is custom configured to reformat/translate commands between the external distributed storage layer and the container hosting platform for performing the upgrade. Because the external distributed storage layer upgrade may be part of an overall upgrade that upgrades the containerized applications hosted within the container hosting platform, the operator and cluster controller provide a single upgrade orchestration point for performing both upgrades in an orchestrated manner.
Systems and methods for reducing the provisioned storage capacity of a disk or aggregate of disks of a storage appliance while the storage appliance continues to serve clients are provided. According to one embodiment, the size of the aggregate may be reduced by shrinking the file system of the storage appliance and removing a selected disk from the aggregate. When an identified shrink region includes the entire addressable PVBN space of the selected disk, the file system may be shrunk by relocating valid data from the selected disk elsewhere within the aggregate. After the valid data is relocated, the selected disk may be removed from the aggregate, thereby reducing the provisioned storage capacity of the aggregate by the size of the selected disk.
G06F 3/06 - Digital input from, or digital output to, record carriers
99.
ENHANCING CONTAINER SECURITY BY PERFORMING CONTAINER VULNERABILITY REDUCTION BASED ON STATIC ANALYSIS OF DYNAMICALLY LOADED SYMBOLS AND SYSTEM CALL BLOCKING
Systems and methods for enhancing container security are provided by reducing the attack surface. In one example, the exposure of containers to potential security vulnerabilities is reduced by identifying dynamically loaded symbols by an application via performance of static analysis (which may be referred to herein as static symbol analysis). Static symbol analysis may include examining one or more sections of an executable to identify dynamically loaded symbols corresponding to functions contained within shared libraries (e.g., shared object files and dynamic libraries). Based on a given shared library's usage of functions within standard libraries (e.g., the standard C library) and a known mapping between functions of standard libraries and kernel system calls, those kernel system calls potentially accessed by the application may be identified and a security policy may be generated and configured for enforcement by a kernel security module to limit kernel system call usage accordingly.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
ENHANCING CONTAINER SECURITY BY PERFORMING CONTAINER VULNERABILITY REDUCTION BASED ON STATIC AND DYNAMIC ANALYSIS OF DYNAMICALLY LOADED SYMBOLS AND SYSTEM CALL BLOCKING
Systems and methods for enhancing container security are provided by reducing the attack surface. In one example, the exposure of containers to potential security vulnerabilities is reduced by identifying dynamically loaded symbols by an application via performance of static symbol analysis by examining a section of an executable to identify dynamically loaded symbols corresponding to functions contained within shared libraries. Based on a given shared library's usage of functions within standard libraries and a known mapping between functions of standard libraries and system calls, those system calls potentially accessed by the application may be identified and a security policy may be generated and configured for enforcement by a kernel security module to limit system call usage accordingly. Thereafter, the security policy enforced by the kernel security module may be refined based on performance of dynamic symbol analysis to identify system calls that are actually called by the application during runtime.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities