The described system aims to reduce or eliminate inaccuracies and hallucinations in responses generated by a machine learning model when processing user queries. The data platform parses and categorizes the text within data files to create structured textual representations. The user submits multiple prompts which are collectively assessed to refine and modify the initial queries.
The modified query is used to identify segments of data files that are most relevant to the query. These relevant portions are then compiled into a Retrieval-Augmented Generation (RAG) context block. This RAG context block is fed into a prompt response machine learning model, which processes the enriched information to generate a well-informed and accurate response to the user's query. Finally, this response is displayed back to the user through the chat interface, completing a cycle that enhances the reliability and relevance of machine-generated answers.
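The retrieve-then-prompt cycle described above can be sketched in Python. Everything here is an illustrative assumption, not the patented implementation: `score_segment` uses keyword overlap where a real system would use embedding similarity, and `stub_model` stands in for an actual prompt-response machine learning model.

```python
def score_segment(segment: str, query: str) -> float:
    """Toy relevance score: fraction of query words that appear in the segment."""
    qwords = set(query.lower().split())
    return len(qwords & set(segment.lower().split())) / max(len(qwords), 1)

def build_rag_context(segments, query, top_k=2):
    """Compile the top_k most relevant segments into one RAG context block."""
    ranked = sorted(segments, key=lambda s: score_segment(s, query), reverse=True)
    return "\n---\n".join(ranked[:top_k])

def answer(query, segments, model):
    """Feed the RAG context block plus the query to a prompt-response model."""
    prompt = f"Context:\n{build_rag_context(segments, query)}\n\nQuestion: {query}"
    return model(prompt)

# Stub standing in for the real prompt-response ML model.
stub_model = lambda prompt: f"grounded answer using {prompt.count('---') + 1} segment(s)"

segs = [
    "virtual warehouses scale compute elastically",
    "snapshots reduce storage cost over time",
    "RAG context reduces hallucinations in model answers",
]
reply = answer("how does RAG reduce hallucinations", segs, stub_model)
```

The point of the sketch is the shape of the cycle: rank, compile a context block, then let the model answer only from that block.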
Techniques for managing states of virtual warehouses in a multi-tenant network-based data system are described. A “resolver” may be provided in each warehouse scheduling service thread. The resolver may maintain a current state of the virtual warehouse and may generate a target state of the virtual warehouse based on an operation request, such as a resume operation, a suspend operation, resize operation, etc. The resolver may generate an action plan to converge the current state to the target state.
Questionnaire completion systems and methodologies for a data platform. The data platform receives from a consumer an unstructured questionnaire to be completed based on structured, semi-structured, and unstructured database objects stored on the data platform by a provider. The data platform generates a secured completion of the unstructured questionnaire based on a questionnaire completion model and the unstructured questionnaire. The data platform determines a confidence score for the completion. In response to determining that the confidence score does not exceed a threshold value, the data platform generates a structured query based on the unstructured questionnaire and a structured query model, and generates the secured completion based on querying the structured database objects using the structured query. The data platform applies a security function to the secured completion to generate a completion of the unstructured questionnaire and provides the completion to the consumer.
A query engine can use partition-granular statistics to optimize query performance. A query can reference a table with a plurality of partitions and include a predicate. A partition-granular selectivity estimate for the predicate can be generated based on statistics stored regarding the plurality of partitions of the table. A query plan can be generated based on the partition-granular selectivity estimate to optimize query processing.
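The selectivity estimation described above can be sketched as follows. The per-partition statistics tuple, the uniform-distribution assumption, and both function names are illustrative, not taken from the patent.

```python
# Hypothetical per-partition statistics: (min_value, max_value, row_count).
partitions = [(0, 99, 1000), (100, 199, 1000), (200, 299, 1000)]

def partition_selectivity(stats, upper):
    """Fraction of a partition's rows expected to satisfy `value < upper`,
    assuming values are uniformly spread between the stored min and max."""
    lo, hi, _ = stats
    if upper <= lo:
        return 0.0          # partition can be pruned entirely
    if upper > hi:
        return 1.0          # every row qualifies
    return (upper - lo) / (hi - lo + 1)

def table_selectivity(parts, upper):
    """Row-weighted selectivity for the whole table, combining the
    partition-granular estimates."""
    total = sum(rows for _, _, rows in parts)
    qualifying = sum(partition_selectivity(p, upper) * p[2] for p in parts)
    return qualifying / total
```

A planner using this estimate can skip partitions whose selectivity is 0 and size join or scan operators by the weighted total.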
Methods, systems, and computer programs are presented for providing performance metrics in an online performance analysis system employing customer objects, such as database tables. A plurality of metric source data associated with a plurality of objects is accessed and a subset of the plurality of objects is determined that satisfies stableness criteria based on the plurality of metric source data to identify a set of stable objects. A set of metrics is generated based on the subset of the plurality of objects that satisfies the stableness criteria.
Antagonistic queries can have a high resource and time footprint, triggering a range of issues such as degrading the compilation performance of other queries and causing machine failures. Described herein are techniques for automatically identifying antagonistic queries and redirecting them to dedicated resources. This redirecting can help better balance the workload across different work clusters and isolate antagonistic workloads from impacting the compilation and execution performance of other queries.
Various embodiments described herein provide for systems, methods, devices, instructions, and the like for generating synthetic data. According to various embodiments, synthetic data generation comprises receiving input specifying one or more source tables and join key columns, and generating synthetic data that preserves statistical similarity and referential integrity among columns of the source data.
A system is disclosed for recovering historical table data in a database environment. The system includes at least one hardware processor and at least one memory. The memory stores instructions that, when executed, cause the system to receive a request to recover historical table data of a source table. The historical table data includes multiple partition files, and each partition file includes a deleted file designation. The system performs a recovery process on the partition files by determining a recoverable time range for the source table based on lifecycle information and restoring the partition files based on the recoverable time range. The system retrieves a schema associated with the historical table data and generates metadata corresponding to the schema. The metadata is associated with the recovered partition files to reconstruct the historical table data. This approach allows efficient and reliable recovery of deleted or lost table data.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
G06F 16/22 - Indexing; Data structures therefor; Storage structures
9.
FILE-BASED ERROR HANDLING DURING INGESTION WITH TRANSFORMATION
A data platform including an error handling framework for loading of input data. The data platform generates input data columns based on an input file and generates result data columns based on the input data columns and evaluating expressions. The data platform detects projection errors during the generating of the result data columns and stores result error indicators in error indicator arrays of the result data columns based on the projection errors. The data platform generates filtered result data columns based on the result data columns and the result error indicator arrays of the result data columns and stores the filtered result data columns in a database of the data platform.
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
G06F 16/25 - Integrating or interfacing systems involving database management systems
10.
EXECUTING DIFFERENTIALLY PRIVATE QUERY USING STRUCTURED LANGUAGE PARSING
Various example embodiments described herein provide for systems, methods, devices, instructions, and the like for structured language parsing to execute a differentially private query on a database system. According to some example embodiments, a user (e.g., an analyst) submits to a data system (e.g., data platform) a differentially private query using a structured language interface (e.g., SQL interface), which causes the calling of one or more stored procedures on the data system, where the one or more stored procedures encapsulate or facilitate use of a differential privacy engine, which can execute the differentially private query and generate a differentially private query result.
A system is disclosed that includes one or more hardware processors and at least one memory storing instructions. The system receives a first query directed towards a shared dataset and accesses a first set of data from a first table in the shared dataset. The system determines that an aggregation constraint policy is attached to the first table, which restricts output of data values stored in the table. The system performs a uniqueness check on join keys for a join operation associated with the first table, verifying that at least one row from the first table is not amplified in the result. The system enforces the aggregation constraint policy on the first query based on this verification. The system generates an output to the first query based on the first set of data. This approach helps control data aggregation and ensures privacy when accessing shared datasets.
Systems and methods are provided for creating a secure database execution environment. The system generates, by a database system executing on a secure enclave, attestation information. The system transmits the attestation information to a remote entity. The system obtains, by the database system executing on the secure enclave, one or more encryption keys in response to the remote entity authenticating the attestation information. The system performs, by the database system executing on the secure enclave, one or more database operations on encrypted data stored on the database system using the one or more encryption keys.
Described is a system for join constraints for query processing. The system receives a first query directed towards a shared dataset in a data clean room; assesses the first query to identify that one or more functions include at least a join function; determines that the first query is configured to join a first set of data from the shared dataset with a second set of data using the join function; determines that a join constraint policy is to be enforced in relation to the first query; and generates an output to the first query based on the execution of the one or more functions, the output omitting data values stored in a portion of the first set of data based on determining that the join constraint policy is to be enforced in relation to the first query.
The subject technology sends, from a child rowset operator (RSO) instance, a first request for performing a user defined aggregate function (UDAF) to a user defined function (UDF) server to initialize an aggregate state for a set of aggregation groups and update aggregated states for each aggregation group from the set of aggregation groups, the first request including a set of input rows. The subject technology receives information comprising a computation status of the UDAF. The subject technology sends, by the child RSO instance, a second request to the UDF server to update the aggregated states for each aggregation group from the set of aggregation groups, the second request including a second set of input rows. The subject technology receives an aggregate states vector with one entry per aggregation group. The subject technology sends, by the child RSO instance, the aggregate states vector to a parent RSO instance.
A join decision manager (JDM) generates a data processing pipeline. The data processing pipeline includes at least one join operation associated with build-side row data and probe-side row data. The JDM determines the maximum cardinality associated with the probe-side row data. The JDM determines the size of the build-side row data at a decision node of the at least one join operation. The JDM configures execution of the at least one join operation as either a broadcast join or a hash-hash join based on the size of the build-side row data and the maximum cardinality.
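The decision-node logic described above can be sketched as a simple rule. The two thresholds and the function name are illustrative assumptions; the patent does not disclose concrete cutoff values.

```python
BROADCAST_BYTES_LIMIT = 64 * 1024 * 1024   # assumed cap on build-side size
REPARTITION_CARDINALITY = 1_000_000        # assumed probe size where hash-hash pays off

def choose_join_distribution(build_side_bytes, probe_max_cardinality):
    """Broadcast the build side only when it is small enough to copy to every
    worker AND the probe side is large enough that repartitioning both sides
    (hash-hash) would cost more than the broadcast."""
    small_build = build_side_bytes <= BROADCAST_BYTES_LIMIT
    large_probe = probe_max_cardinality >= REPARTITION_CARDINALITY
    return "broadcast" if small_build and large_probe else "hash-hash"
```

Broadcasting avoids shuffling the (large) probe side at the price of copying the build side to every worker, which is why both inputs feed the decision.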
Disclosed are techniques for routing and filtering telemetry data based on custom telemetry definitions provided by a user. A telemetry filter definition comprising rules for routing and filtering telemetry data may be converted into a common expression language (CEL) abstract syntax tree (AST). The CEL AST may be provided to a filtering component, which may compile the CEL AST into a CEL filter program comprising the rules for routing and filtering telemetry data. In response to receiving telemetry data, the filtering component may filter the received telemetry data based on the CEL filter program to generate filtered telemetry data.
Systems and methods are provided for controlling the deletion of data in a database system. The system receives input comprising a deletion criterion for a database system. The system applies the deletion criterion to a set of tables of the database system. The system determines that an individual portion of the set of tables satisfies the deletion criterion. In response to determining that the individual portion of the set of tables satisfies the deletion criterion, the system transfers the individual portion of the set of tables to a temporary storage system.
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
18.
SHARING EVENTS AND OTHER METRICS IN NATIVE APPLICATIONS
Disclosed is an execution information sharing system that writes execution information to a provider target (and other targets) in a secure manner. Execution information generated by an application may be written to a consumer stage, wherein the application is shared by a provider account of a data exchange with a consumer account that executes the application. A consumer exchange service (ES) of the data exchange may send a request to a copy service of the data exchange to copy the execution information from the consumer stage to the provider stage, wherein the consumer ES is a part of the data exchange and is protected from actions of the consumer account. A copy operation may be executed to copy the execution information from the consumer stage to the provider stage using the copy service of the data exchange. The execution information is ingested from the provider stage to a provider table.
Disclosed are techniques for using an application control framework to build, share, and manage access to and usage of applications via a data sharing platform. An application control framework may provide a number of predefined controls and may receive values for certain predefined controls as well as custom control definitions and corresponding values from a provider. The application control framework may also receive application logic and may build an application package comprising the application logic and a set of controls, including the predefined and custom controls, to manage access to and usage of the application. In response to a consumer of the data sharing platform importing the application package, the application control framework may call a set of install scripts to install an instance of the application in the consumer account using the application logic and manage access to the application instance by the consumer using the set of controls.
A hot server is identified from a plurality of servers based on one or more server metrics associated with the hot server. A hot data range stored by the hot server is identified based on one or more read density metrics. The hot data range comprises a range of data values with a higher volume of access requests compared to other data values stored by the hot server. The hot data range is replicated across a number of additional servers.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 16/25 - Integrating or interfacing systems involving database management systems
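The hot-range detection and replication described in the entry above can be sketched as follows. The mean-based hotness test, the round-robin placement, and all names are illustrative assumptions rather than the patented metrics.

```python
def find_hot_ranges(read_counts, ratio=2.0):
    """read_counts maps a data range (lo, hi) to its access-request count.
    A range is 'hot' when it is read more than `ratio` times the mean."""
    mean = sum(read_counts.values()) / len(read_counts)
    return [r for r, c in read_counts.items() if c > ratio * mean]

def replicate_ranges(hot_ranges, servers, copies=2):
    """Place `copies` replicas of each hot range on additional servers,
    round-robin, so reads can be spread away from the hot server."""
    return {
        r: [servers[(i + j) % len(servers)] for j in range(copies)]
        for i, r in enumerate(hot_ranges)
    }
```

A read router could then pick any holder of a hot range, diluting the load that made the original server hot.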
Embodiments of the present disclosure provide techniques for efficient computation over a wide table. A processing device determines that a first number of columns of a first table is greater than a threshold number of columns. The processing device transforms the first table into a second table based on the determination, where the second table includes a second number of columns that is less than the first number of columns, and where the second table includes a first column whose fields identify columns of the first table, a second column whose fields identify data types of fields of the first table, and a third column whose fields contain the data of the fields of the first table. The processing device executes a user-defined table function (UDTF) on the second table.
Cloning operations can be used to generate snapshots of tables at specified times. The snapshot objects can be stored in a first-tier storage with the table, where the cloned versions of the tables and the table may share files, such as micro-partition files, to conserve storage resources. After a first expiration time, snapshot objects can be transferred from the first-tier storage to a second-tier storage to further save on storage costs. After a second expiration time (e.g., full retention period), the snapshot objects can be deleted from the second-tier storage as well.
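The two-stage expiration policy above reduces to a small age-based state function. A minimal sketch follows; the 7-day and 90-day cutoffs are invented for illustration, not taken from the source.

```python
def snapshot_tier(age_days, first_expiry_days=7, retention_days=90):
    """Which storage tier a snapshot object occupies at a given age.
    Cutoff values here are illustrative assumptions."""
    if age_days >= retention_days:
        return "deleted"        # past the full retention period
    if age_days >= first_expiry_days:
        return "second-tier"    # cheaper storage
    return "first-tier"         # shares micro-partition files with the live table
```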
A data platform is provided that uses a replication cache to replicate data. The data platform is designed to receive a replication request from a secondary deployment that includes a request for a data transfer of data files from a primary deployment. The data platform analyzes metadata of a replication cache and the primary deployment to identify the data files for replication. Based on this metadata, the data platform determines whether to route the data transfer through the replication cache or directly from the primary deployment to the secondary deployment. The data transfer is then routed accordingly, and the receipt of the data transfer at the secondary deployment is verified.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of interruptions or of input-output operations
Systems and methods for replicating unstructured staged data between remote database deployments are disclosed. The system includes at least one hardware processor and memory storing instructions that identify staged data at a first database deployment for replication to a second, remote database deployment. The staged data includes unstructured data items stored in a storage resource associated with the first database deployment. The system replicates a directory from the first database deployment to the second, where the directory includes information identifying the unstructured data items. Metadata is also replicated, including references to the locations of the unstructured data items in the storage resource. The second database deployment is enabled to access one or more unstructured data items from the storage resource of the first database deployment using the directory and references, without duplicating the data. Incremental replication of additional staged data is facilitated based on a comparison of directories between deployments.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/25 - Integrating or interfacing systems involving database management systems
25.
DYNAMIC TABLES WITH EXTERNALLY MANAGED ICEBERG SOURCE TABLES
Provided herein are systems and methods for configuring dynamic tables with externally-managed Iceberg source tables. An example method for updating a dynamic table using data from an Iceberg source table includes generating, for each row in an Iceberg source table, a row identifier derived from immutable metadata associated with a physical storage location of the row and a position of the row within the physical storage location. The method further includes generating, for each of a first version and a second version of the Iceberg source table, a set of the row identifiers by computing the row identifier for each row present in the respective version. The sets of the row identifiers are compared between the first version and the second version of the Iceberg source table to identify changes at a row level. A dynamic table associated with the Iceberg source table is updated based on the identified changes.
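The row-identifier comparison described above can be sketched by modeling a table version as a list of data files and deriving each row's identifier from its immutable file path plus its position in that file. The in-memory representation and function names are illustrative assumptions.

```python
def row_ids(version):
    """version: list of (file_path, rows). The row identifier is derived from
    the immutable file path plus the row's position within that file."""
    return {
        (path, pos): row
        for path, rows in version
        for pos, row in enumerate(rows)
    }

def row_level_changes(old_version, new_version):
    """Set-compare the row identifiers of two table versions to find
    row-level inserts, deletes, and updates."""
    old, new = row_ids(old_version), row_ids(new_version)
    inserted = [rid for rid in new if rid not in old]
    deleted = [rid for rid in old if rid not in new]
    updated = [rid for rid in old if rid in new and old[rid] != new[rid]]
    return inserted, deleted, updated
```

The identified changes are exactly what an incremental refresh of the dependent dynamic table needs to apply.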
To provide outbound private link support for a multi-tenant data system with tenant isolation, a separate, dedicated virtual network is provided, referred to as private link (PL) virtual network. The PL virtual network may host a plurality of host interface endpoints and resource endpoints. A core virtual network and the PL virtual network may be peered together to work in conjunction. The private endpoints in the PL virtual network may then be connected to external systems using a private link without exposure to the public internet.
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
27.
HASH-JOIN BROADCAST DECISION MAKING IN DATABASE SYSTEMS
A system includes at least one hardware processor and memory storing instructions. The processor generates a query plan for a received query. The query plan includes multiple hash-join-build and hash-join-probe operations. A primary decision node is configured in the query plan. The primary decision node receives build-side data information from the hash-join-build operations. For each hash-join-build operation, a memory amount for performing a broadcast is determined. A subset of hash-join-build operations is selected for broadcast join distribution by comparing the memory amount to a broadcast memory threshold. The system selects a broadcast join distribution for the subset and a hash-hash join distribution for the remaining hash-join-build operations. The query plan is executed using the broadcast join distribution for the selected subset and the hash-hash join distribution for the remaining operations. This approach optimizes memory usage and join distribution during query execution.
The subject technology receives a first query plan corresponding to a query, the first query plan comprising a set of predicates. The subject technology receives, during execution of a first portion of the first query plan, a set of rowsets. The subject technology determines a set of metrics for a first number of rows from a plurality of rows, the first number of rows corresponding to a first predicate order. The subject technology determines, using a heuristic, a second predicate order based at least in part on the set of metrics. The subject technology processes, during execution of the first portion of the first query plan using the second predicate order, a second set of rowsets, the second set of rowsets comprising a second plurality of rows that correspond to the first portion of the first query plan that has been executed based on the second predicate order.
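The adaptive reordering above can be sketched with a simple selectivity heuristic: measure how often each predicate passes on rows already processed, then run the most selective predicate first so later rowsets short-circuit early. All names and the heuristic itself are illustrative assumptions.

```python
def reorder_predicates(predicates, observed_rows):
    """Rank predicates by selectivity measured on already-processed rows,
    so the predicate that eliminates the most rows runs first."""
    def selectivity(pred):
        return sum(1 for r in observed_rows if pred(r)) / max(len(observed_rows), 1)
    return sorted(predicates, key=selectivity)

def apply_predicates(rows, predicates):
    """Evaluate predicates in order; `all` short-circuits on the first failure."""
    return [r for r in rows if all(p(r) for p in predicates)]

sample = list(range(100))
p_even = lambda r: r % 2 == 0        # passes 50% of sampled rows
p_small = lambda r: r < 5            # passes 5% of sampled rows
order = reorder_predicates([p_even, p_small], sample)
```

Subsequent rowsets processed with `order` pay the cheap, selective check first, which is the benefit the abstract describes.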
Provided herein are systems and methods for configuring managed dynamic Iceberg tables. An example method includes parsing, by at least one hardware processor, a table definition to determine a lag duration value, an external volume indicator, and a location indicator. A dynamic table (DT) manager generates a dynamic Iceberg table based on the table definition. The generating is based on selecting an external storage volume of a network-based database system based on the external volume indicator and the location indicator. The DT manager stores a base Iceberg table at a storage location associated with the external storage volume. The DT manager configures the base Iceberg table as the dynamic Iceberg table based on the lag duration value. The lag duration value indicates a maximum time period that a result of a prior refresh of the dynamic Iceberg table lags behind a current time instance.
Various embodiments described herein provide for systems, methods, devices, instructions, and the like for swapping artificial intelligence models, such as large language models (LLMs), based on inference request monitoring. In particular, some embodiments monitor inference requests submitted to various inference engines (where each inference engine comprises a group of software containers sharing assigned computing resources) and, based on analysis of inference request data, available models, currently loaded models, or a combination thereof, determine whether to swap out a set of AI models currently active on a select inference engine with another set of AI models available on the select inference engine.
A tag propagator may obtain a SQL statement. As a result of obtaining the SQL statement, object dependencies between objects referenced in the SQL statement may be determined. Tags associated with the determined object dependencies may be further determined. The tags may be propagated.
As described herein, an N-Gram index may be created and the search may be conducted using the index, leading to faster search results. The N-Gram index may also include partial N-Gram components to capture more relevant data. Moreover, the search may also take into account recent log data that has not yet been indexed. Techniques for building an index store using log data and efficiently searching the index store and log data to process search requests are described herein.
G06F 7/14 - Merging, i.e. combining at least two sets of record carriers each arranged in the same ordered sequence to produce a single set having the same ordered sequence
G06F 16/22 - Indexing; Data structures therefor; Storage structures
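The index-then-verify search in the entry above can be sketched as follows: an inverted index from n-grams to line numbers, intersected postings for a term's n-grams, and a final substring check because n-gram intersection alone can false-match. The trigram size and all names are illustrative assumptions.

```python
def ngrams(text, n=3):
    """All n-grams of `text`; a string shorter than n yields itself."""
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def build_index(log_lines, n=3):
    """Map each n-gram to the set of line numbers that contain it."""
    index = {}
    for lineno, line in enumerate(log_lines):
        for gram in ngrams(line, n):
            index.setdefault(gram, set()).add(lineno)
    return index

def search(index, log_lines, term, n=3):
    """Intersect the posting sets of the term's n-grams, then confirm each
    candidate with a real substring check."""
    postings = [index.get(g, set()) for g in ngrams(term, n)]
    candidates = set.intersection(*postings) if postings else set()
    return sorted(line for line in candidates if term in log_lines[line])
```

Unindexed recent log lines, as the abstract notes, would simply be scanned directly and merged into the same result set.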
The subject technology receives a query, the query including a set of statements, the set of statements including a function call, the function call including a declaration of a vector data type as an argument of the function call. The subject technology processes the query, the processing including invoking the function call. The subject technology provides a set of query results from processing the query, the set of query results including a vector data structure corresponding to the vector data type, the vector data structure including a set of elements, each element comprising a numerical data type.
Provided herein are systems and methods for configuring automatic evolution of dynamic tables. An example method includes parsing, by at least one hardware processor, a query associated with a dynamic table to determine a current base object dependency of the dynamic table on at least a first base object. A prior base object dependency of the dynamic table on at least a second base object is retrieved. A delta between data stored by the at least first base object and data stored by the at least second base object is determined. The dynamic table is updated based on the delta.
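The delta-and-update step above can be sketched by modeling each base object's data as a mapping from row id to value; the representation and function names are illustrative assumptions.

```python
def compute_delta(prior_rows, current_rows):
    """Delta between the prior and current base-object data, keyed by row id:
    rows to upsert into the dynamic table and row ids to delete from it."""
    upserts = {k: v for k, v in current_rows.items() if prior_rows.get(k) != v}
    deletes = [k for k in prior_rows if k not in current_rows]
    return upserts, deletes

def apply_delta(dynamic_table, upserts, deletes):
    """Bring the dynamic table in line with the new base object."""
    for k in deletes:
        dynamic_table.pop(k, None)
    dynamic_table.update(upserts)
    return dynamic_table
```

Applying only the delta, rather than rebuilding the dynamic table, is what makes the refresh incremental.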
Embodiments of the present disclosure provide techniques for mountless querying of listing data. A processing device obtains a query that includes a universal listing identifier of a database, wherein the universal listing identifier is different from an identifier for the database. The processing device activates, at runtime, at least one role for accessing the database and shared objects based on the universal listing identifier. The processing device generates, based on the universal listing identifier and the at least one activated role, an in-memory placeholder object associated with the database. The processing device provides access to data of the database based on the in-memory placeholder object and the query.
A data platform is provided. The data platform is configured to receive a request from a client device of a user to run a web application within a computing environment. It initiates an execution of the web application and determines the availability of a cached user interface state of the web application. Upon determining that the cached user interface state is available, the data platform fetches the cached user interface state from the datastore and communicates it to the client device. This allows for displaying an initial user interface to a user by the client device using the cached user interface state while continuing to initialize the web application as the initial user interface is displayed.
G06F 16/957 - Browsing optimisation, e.g. caching or content distillation
G06F 9/451 - Execution arrangements for user interfaces
G06F 16/953 - Querying, e.g. using web search engines
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems, during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure, by executing in a restricted environment, e.g. sandbox or secure virtual machine
37.
CONFIGURING INTERACTIONS BETWEEN PYTHON AND SQL CELLS IN A NOTEBOOK
Provided herein are systems and methods for configuring interactions between Python and SQL cells in a notebook. An example method includes detecting a run cell message received from a notebook UI application. The run cell message specifies a set of cells of a notebook. At least a first cell of the set of cells is configured as an SQL cell within the notebook. A query within the at least one SQL statement associated with the SQL cell is executed to generate cell results. The cell results of the SQL cell are stored in a global namespace of the notebook. Access to the cell results in the global namespace is configured for at least a second cell of the set of cells.
Data replication can be used to copy database data from a primary deployment to a secondary deployment in a network-based data system. A logical representation of the cloned tables in the secondary deployment can be used to reduce data transfer and storage costs. In response to a refresh request, the data system may clone from existing tables stored in the secondary deployment by applying a difference operation on the existing tables instead of copying entire cloned tables for each refresh request.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/174 - Redundancy elimination performed by the file system
39.
DISCRETE WORKLOAD PROCESSING USING A PROCESSING PIPELINE
Provided herein are systems and methods for discrete workload processing using a file processing service. An example method includes retrieving a manifest file from a work queue. The manifest file includes metadata associated with a plurality of workloads. A plurality of processing configurations corresponding to the plurality of workloads is generated. A processing configuration of the plurality of processing configurations is associated with scheduling execution of one or more tasks for a workload of the plurality of workloads. A processing pipeline definition of the manifest file is generated. The processing pipeline definition includes the plurality of processing configurations. The processing pipeline definition is registered with a pipeline definition registry of a network-based database system to generate a definition registration.
Provided herein are systems and methods for source monitoring associated with discrete workload processing. An example method includes generating a processing pipeline definition comprising a plurality of configurations associated with a corresponding plurality of notification fetching jobs. A source monitor definition is generated based on the processing pipeline definition. A source monitor definition instance is instantiated based on the source monitor definition. One or more notifications associated with a data source are fetched based on executing at least one notification fetching job of the plurality of notification fetching jobs configured in the source monitor definition instance.
Provided herein are systems and methods for data table auto-refresh. An example method includes configuring a first processing pipeline definition comprising a first plurality of configurations associated with a corresponding plurality of notification fetching jobs for metadata of a database table. A second processing pipeline definition is configured to include a second plurality of configurations associated with the metadata. A source monitor pipeline is instantiated based on the first processing pipeline definition to fetch a manifest file based on the first plurality of configurations. A refresh pipeline is instantiated based on the second processing pipeline definition to perform a refresh operation of the metadata and generate refreshed metadata based on the second plurality of configurations.
Replication of data is disclosed. A method includes replicating the data stored in a primary deployment hosted by a first cloud storage provider such that the data is further stored in a secondary deployment hosted by a second cloud storage provider. The method includes determining that the primary deployment transitioned from an available state to an unavailable state. The method includes executing one or more transactions on the data at the secondary deployment to cause a change to the data in response to determining that the primary deployment is unavailable.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for the network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
43.
DATA CONSISTENCY SERVICE FOR INTERNAL AND EXTERNAL VOLUMES
A network-based database system that performs consistency checks on data files to which the network-based database system does not have write access is provided. The network-based database system monitors a data file stored in a read-only storage system for changes. Upon detecting a change, the network-based database system performs a data consistency check using the content of the data file and its first metadata. If an inconsistency between the content and the first metadata is detected, the network-based database system sets a flag in second metadata, which is stored in a writable storage system, indicating the detected inconsistency. The network-based database system detects this flag during the execution of a query against a data object of the data file and executes the query without query performance tuning based on the detection of the flag, ensuring accurate query results.
Various embodiments described herein provide for systems, methods, devices, instructions, and the like for generating a structured language data query based on a natural language request and context data relating to a schema of a data store (e.g., a database or the like). In particular, some embodiments use a set of large language models to generate, based on the natural language request and the context data, a response comprising a structured language data query for a data store and a natural language explanation of the structured language data query.
The subject technology receives blob metadata from a key-value store. The subject technology retrieves a blob file from a blob store based on the blob metadata; the blob file comprises at least one of a snapshot file or a delta file. The subject technology transforms the blob file from a first format to a column file format, the transformation comprising converting data from the blob file to rowsets and writing the rowsets into a file in the column file format. The subject technology stores the file in a local cache.
A data platform for managing an application as a first-class database object. The data platform includes at least one processor and a memory storing instructions that cause the at least one processor to perform operations including detecting a data request from a browser for a data object located on the data platform, executing a stored procedure, the stored procedure containing instructions that cause the at least one processor to perform additional operations including instantiating a User Defined Function (UDF) server, an application engine, and the application within a security context of the data platform based on a security policy determined by an owner of the data object. The data platform then communicates with the browser using the application engine as a proxy server.
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
47.
CONFIGURATION OF ACCESS CONTROL FOR A PACKAGES POLICY
A system includes one or more hardware processors and at least one memory storing instructions. The hardware processors receive a packages policy for a cloud data platform account, the packages policy including at least one allowlist and at least one blocklist. The hardware processors receive a request to generate a report associated with the packages policy. In response, the hardware processors generate a report identifying, for the account, packages or versions of packages allowed by the allowlist and packages or versions of packages blocked by the blocklist, at a specified time or over a specified period. The hardware processors generate a notification to a user when a package is added to or removed from the allowlist or blocklist, the notification including a summary of changes and a reference to access an updated version of the report.
G06F 21/44 - Program or device authentication
G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems, during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure, by executing in a restricted environment, e.g. sandbox or secure virtual machine
48.
COLUMNAR DATA ANONYMIZATION USING SEMANTIC AND PRIVACY CATEGORY-BASED EXPANSION AND TRANSFORMATION
An approach is disclosed that retrieves data from a data set organized in multiple columns, where a first column includes both a first and a second data type. The approach expands the first column into a second column for the first data type and a third column for the second data type; determines a semantic category for each data type; and assigns a privacy category to each semantic category. The approach then anonymizes the second column using a first anonymization technique based on the first privacy category, and anonymizes the third column using a second anonymization technique based on the second privacy category. In turn, the approach generates an anonymized view of the data set using the anonymized data.
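The expansion-and-transformation flow above can be sketched minimally. The classification heuristic, category names, and anonymization techniques below are illustrative assumptions, not the disclosed implementation: a mixed column is split per data type, each type gets a semantic and privacy category, and each privacy category selects a transformation:

```python
import hashlib

# Toy rows: one column mixing two data types (email and phone).
rows = ["alice@example.com", "555-0100", "bob@example.com", "555-0199"]

def classify(value):
    """Assign a semantic category to a value (illustrative heuristic)."""
    return "email" if "@" in value else "phone"

# Expand the mixed column into one column per data type.
expanded = {"email": [], "phone": []}
for value in rows:
    expanded[classify(value)].append(value)

# Map each semantic category to a privacy category; each privacy
# category selects an anonymization technique.
privacy_category = {"email": "direct_identifier", "phone": "quasi_identifier"}

def anonymize(value, category):
    if category == "direct_identifier":      # irreversible: hash the value
        return hashlib.sha256(value.encode()).hexdigest()[:12]
    return value[:-4] + "XXXX"               # quasi-identifier: mask suffix

anonymized = {
    col: [anonymize(v, privacy_category[col]) for v in values]
    for col, values in expanded.items()
}
print(anonymized["phone"])  # ['555-XXXX', '555-XXXX']
```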
Provided herein are systems and methods for dynamic table replication. A method includes configuring a first DT within a first failover group. The method further includes causing replication of the first DT from a primary deployment of a network-based database system to a second DT in a secondary deployment of the network-based database system. The method further includes configuring the second DT as a primary DT in the secondary deployment based on detecting a failover event in the primary deployment. The method further includes performing an automatic refresh of the primary DT in the secondary deployment based on a scheduling state of the first DT in the primary deployment prior to the failover event.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Disclosed are techniques for selectively sharing with a provider account of a data exchange, events generated by an application shared by the provider account. A set of telemetry definitions may be defined for a data listing via which an application is shared by a provider account of a data sharing platform. Each of the set of telemetry definitions specifies a type of event generated by the application and a corresponding sharing requirement for the type of event. The set of telemetry definitions are persisted as metadata associated with the data listing. The application may be installed in a consumer account of the data exchange. In response to the application generating a plurality of events, a subset of the plurality of events may be shared with the provider account, wherein the subset of the plurality events that is shared is based in part on the set of telemetry definitions.
An entity-level privacy system receives a query directed towards a shared dataset, the shared dataset comprising one or more data entries associated with one or more distinct entities, each entity of the one or more distinct entities being identifiable by one or more unique entity identifiers. The entity-level privacy system implements an entity-level privacy constraint, the entity-level privacy constraint comprising a dynamic aggregation constraint based on the one or more unique entity identifiers. The entity-level privacy system determines that the one or more unique entity identifiers satisfy a threshold condition comprising a minimum number of entities. The entity-level privacy system enforces the entity-level privacy constraint on the query and generates an output to the query based on the entity-level privacy constraint and the dynamic aggregation constraint while maintaining entity-level privacy associated with the one or more distinct entities.
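The minimum-entity threshold described above resembles a k-anonymity-style suppression rule, sketched below under stated assumptions: the `K_MIN_ENTITIES` value, the row layout, and the aggregation function are all illustrative, not the disclosed system. A group's aggregate is released only when it covers enough distinct entity identifiers:

```python
K_MIN_ENTITIES = 3  # illustrative minimum-entity threshold

shared_rows = [
    {"entity_id": "e1", "region": "west", "spend": 10},
    {"entity_id": "e2", "region": "west", "spend": 20},
    {"entity_id": "e3", "region": "west", "spend": 30},
    {"entity_id": "e4", "region": "east", "spend": 40},
    {"entity_id": "e4", "region": "east", "spend": 5},
]

def aggregate_with_entity_privacy(rows, group_key, value_key):
    """Return per-group sums, suppressing any group whose number of
    distinct entities falls below the minimum-entity threshold."""
    groups = {}
    for row in rows:
        g = groups.setdefault(row[group_key], {"entities": set(), "total": 0})
        g["entities"].add(row["entity_id"])
        g["total"] += row[value_key]
    # Dynamic aggregation constraint: enough distinct entity identifiers.
    return {k: g["total"] for k, g in groups.items()
            if len(g["entities"]) >= K_MIN_ENTITIES}

result = aggregate_with_entity_privacy(shared_rows, "region", "spend")
print(result)  # {'west': 60} — 'east' suppressed: one distinct entity
```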
Various embodiments described herein provide for systems, methods, devices, instructions, and the like for generating a structured language data query based on a natural language question and semantic data associated with a schema of a data store (e.g., a database or the like). In particular, some embodiments use a set of large language models to generate a structured language data query for a data store based on the semantic data and the natural language question, determine whether the structured language data query is valid, cause the structured language data query to be performed on the data store in response to determining that it is valid, and generate a response that comprises a query result from the data store.
Some embodiments include information retrieval through query history insights by accessing query history of a first user, processing the query history of the first user using a first machine learning model to identify naming characteristics of the query history specific for the first user, and enriching a database comprising data associated with the first user with the identified naming characteristics of the query history. The system receives a new search query in natural language from the first user, processes the new search query in the natural language using a second machine learning model to identify embeddings within the new search query, identifies one or more recommended tables and corresponding columns, and causes display of the recommended tables and corresponding columns for each of the recommended tables by a user device of the first user.
Provided herein are systems and methods for a zero-copy clone of a DT. A method includes performing a clone operation on a dynamic table (DT) to generate a cloned DT. The DT is based on a query applied on a base table. The cloned DT is based on the query applied on a cloned base table corresponding to the base table. A first delta is determined based on at least one change in the base table between a first version of the base table used by the DT at a time of the clone operation and a second version of the base table generated prior to the clone operation. A first refresh operation of the cloned DT is performed based on the first delta.
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/22 - Indexing; Data structures therefor; Storage structures
A data platform that performs a file existence check is provided. The data platform creates a bounded page and selects a set of selected metadata files from a set of metadata files, where each selected metadata file includes a set of data file metadata files. Each member of the set of data file metadata files includes a file name of a respective data file. The data platform stores the set of data file metadata files of each selected metadata file in a first sorted list in the bounded page. The data platform retrieves a second sorted list of file names of a set of data files stored on a data storage system. The data platform determines the existence of each respective data file of each member of the set of data file metadata files on the data storage system by comparing the first sorted list to the second sorted list.
G06F 7/08 - Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information carried thereon
G06F 16/174 - Redundancy elimination performed by the file system
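The existence check described in the abstract — comparing one sorted list of metadata file names against a sorted listing of the storage system — can be sketched as a single merge-style pass. The function and variable names are illustrative assumptions:

```python
def check_existence(metadata_names, storage_names):
    """Given two sorted lists of file names — one collected from metadata
    files, one listed from the storage system — report for each metadata
    entry whether the data file actually exists, in one merge-style pass."""
    exists = {}
    i = 0
    for name in metadata_names:          # first sorted list (metadata)
        while i < len(storage_names) and storage_names[i] < name:
            i += 1                       # advance through second sorted list
        exists[name] = i < len(storage_names) and storage_names[i] == name
    return exists

metadata_names = sorted(["a.parquet", "b.parquet", "d.parquet"])
storage_names = sorted(["a.parquet", "c.parquet", "d.parquet"])
result = check_existence(metadata_names, storage_names)
print(result)  # {'a.parquet': True, 'b.parquet': False, 'd.parquet': True}
```

Because both lists are sorted, the comparison is linear in their combined length rather than quadratic in the number of files.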
56.
INTEGRATE SQL DATABASE WITH CONTAINER EXECUTION MANAGEMENT
The subject technology receives a first set of statements, the first set of statements comprising at least a first statement to create a particular service associated with a container service. Further, the subject technology instantiates the particular service associated with the container service, in response to the first statement, at a first cluster, the first cluster including a first set of worker nodes.
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
The subject technology receives a first request to create a container service, the request indicating a service specification for creating the container service. The subject technology generates a set of endpoints based on the service specification. The subject technology generates a set of roles based on the service specification. The subject technology stores service metadata related to the set of endpoints and the set of roles in a metadata database. The subject technology instantiates the container service at a container services cluster, the container services cluster including a set of worker nodes, the container service being deployed on a worker node from the set of worker nodes, and enforces security policies based on the roles and service metadata. The subject technology coordinates with Role-Based Access Control (RBAC) and network policies of the subject database system and transparently enforces the same policies in the subject container system.
Techniques for providing an interface for viewing real-time metadata stored in different locations and in different formats are described. A monitoring schema may process queries related to user metadata using the techniques described herein. The monitoring schema may also provide a single interface with fine-grained access control for viewing metadata, based on role-based access control, with limitless retention using different storage locations.
An assignment of a resource for a service to a compute node in a compute cluster is evaluated. The evaluating of the assignment includes determining one or more capacity consumption metrics associated with compute capacity consumed by the resource and determining one or more available capacity metrics associated with the compute node. The one or more capacity consumption metrics are compared with the one or more available capacity metrics to determine whether the compute node has available capacity for the assignment of the resource. A determination whether to confirm the assignment of the resource to the compute node is made based on the evaluating.
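A minimal sketch of the assignment evaluation above, under illustrative assumptions (the metric names `cpu_millis` and `memory_mib` and the dictionary layout are hypothetical): the resource's consumption metrics are compared metric-by-metric against the node's available-capacity metrics before the assignment is confirmed:

```python
def has_capacity(consumption, available):
    """True only if every consumption metric fits within the node's
    corresponding available-capacity metric."""
    return all(consumption[m] <= available.get(m, 0) for m in consumption)

def evaluate_assignment(resource, node):
    """Evaluate assigning a resource to a compute node: compare capacity
    consumption against available capacity and confirm or reject."""
    return has_capacity(resource["needs"], node["free"])

node = {"free": {"cpu_millis": 2000, "memory_mib": 4096}}
fits = evaluate_assignment(
    {"needs": {"cpu_millis": 500, "memory_mib": 1024}}, node)
too_big = evaluate_assignment(
    {"needs": {"cpu_millis": 500, "memory_mib": 8192}}, node)
print(fits, too_big)  # True False
```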
Systems and methods are provided for classifying data. The systems and methods access an automatic classification profile comprising one or more conditions for triggering data classification and access a classification scope that identifies one or more tables to be classified. The systems and methods determine that a set of attributes of the one or more tables identified by the classification scope corresponds to the one or more conditions of the automatic classification profile. The systems and methods, in response to determining that the set of attributes of the one or more tables identified by the classification scope corresponds to the one or more conditions of the automatic classification profile, automatically classify data stored in one or more columns of the one or more tables.
The subject technology initiates a reinstallation process of a key-value storage device and locks the key-value storage device. The subject technology performs a bootstrap process for a blob manager and a blob worker. The subject technology performs a restoration process of a storage server. The subject technology applies a set of mutation logs to the storage server. The subject technology unlocks the key-value storage device and enables network traffic for the key-value storage device.
G06F 11/14 - Error detection or correction of the data by redundancy in operation, e.g. by using different operation sequences leading to the same result
A data platform that upgrades applications having containerized services across multiple consumer user accounts when the data platform receives a new version from a provider user. For each consumer account utilizing the application, the data platform performs a series of upgrade operations. The operations include identifying the relevant set of services linked to the application and executing an upgrade command for each service to transition to the new version. The data platform actively monitors the health and version status of each service, ensuring they meet the upgrade criteria. The upgrade is deemed successful and confirmed by the data platform once all services are verified to be healthy and aligned with the new version, thus ensuring a seamless and efficient upgrade experience.
Systems and methods for an organization-level account for an organization on a data platform, users of which can possess administrative or management privileges with respect to the organization and across one or more other accounts of the organization.
Row-level security (RLS) may provide fine-grained access control based on flexible, user-defined access policies to databases, tables, objects, and other data structures. A RLS policy may be an entity or object that defines rules for row access. A RLS policy may be decoupled or independent from any specific table. This allows more robust and flexible control. A RLS policy may then be attached to one or more tables. The RLS policy may include a Boolean-valued expression.
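The decoupling described above — a policy object holding a Boolean-valued expression, defined once and then attached to one or more tables — can be sketched as follows. The class name, context fields, and `scan` helper are illustrative assumptions, not the platform's API:

```python
# A row access policy is an object decoupled from any specific table: it
# holds a Boolean-valued expression evaluated per row for the current user.
class RowAccessPolicy:
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate  # (row, context) -> bool

    def allows(self, row, context):
        return bool(self.predicate(row, context))

# Define the policy once...
region_policy = RowAccessPolicy(
    "region_policy",
    lambda row, ctx: ctx["role"] == "admin" or row["region"] == ctx["region"],
)

# ...then attach it to one or more tables.
tables = {
    "sales": [{"region": "west", "amount": 10},
              {"region": "east", "amount": 20}],
    "leads": [{"region": "west", "name": "n1"}],
}
attached = {"sales": region_policy, "leads": region_policy}

def scan(table_name, context):
    """Return only the rows the attached policy allows for this context."""
    policy = attached.get(table_name)
    return [r for r in tables[table_name]
            if policy is None or policy.allows(r, context)]

print(scan("sales", {"role": "analyst", "region": "west"}))
```

Because the policy is independent of any table, changing the access rule in one place changes enforcement for every table it is attached to.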
Some embodiments include receiving a first query directed towards a shared dataset, accessing a first set of data from the shared dataset to perform one or more functions, determining that a row access policy is to be enforced in relation to the first query, and generating an output to the first query based on an execution of the one or more functions.
Disclosed are techniques for anomaly detection in time series data using an ML model. An untrained time series forecasting machine learning (ML) model may be provided as part of a class that includes an anomaly detection function, a features module, and a target transform module. In response to the class being invoked, an instance of the time series forecasting ML model may be trained using training time series data specified in the invocation of the class. The trained instance of the forecasting ML model may be persisted in an anomaly detection object along with instances of the anomaly detection function, the features module, and the target transform module. In response to receiving a call to the anomaly detection object, anomaly detection is performed on time series data specified in the call using at least the trained instance of the forecasting ML model and the instance of the anomaly detection function.
Disclosed is a method of detecting anomalies in time series data. The method includes computing a first bound for a first window of the time series data and a second bound for a second window of the time series data, wherein the second window includes more samples of the time series data. The method also includes generating a first outlier status that indicates whether a current value of the time series data exceeds the first bound, and generating a second outlier status that indicates whether the current value of the time series data exceeds the second bound. The method also includes determining, by a processing device, whether an anomaly is detected in the time series data based on values of the first outlier status and the second outlier status. The method also includes generating an alert in response to determining that the anomaly is detected and sending the alert to a notification system.
H04L 41/0604 - Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
H04L 41/16 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks, using machine learning or artificial intelligence
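The two-window scheme in the abstract above can be sketched concretely. The abstract does not specify how the bounds are computed, so the mean-plus-three-sigma bound, the window lengths, and the rule that an anomaly requires both outlier statuses are illustrative assumptions:

```python
from statistics import mean, stdev

def upper_bound(window):
    """Mean + 3-sigma bound for a window (illustrative bound choice)."""
    return mean(window) + 3 * stdev(window)

def detect_anomaly(series, short_n=5, long_n=20):
    """Flag an anomaly only when the current value exceeds the bounds of
    BOTH a short window and a longer window of the series."""
    current = series[-1]
    short_bound = upper_bound(series[-1 - short_n:-1])
    long_bound = upper_bound(series[-1 - long_n:-1])
    first_outlier = current > short_bound    # first outlier status
    second_outlier = current > long_bound    # second outlier status
    return first_outlier and second_outlier, first_outlier, second_outlier

series = [10.0, 11.0, 9.0, 10.5, 9.5] * 4 + [50.0]
anomaly, short_status, long_status = detect_anomaly(series)
print(anomaly)  # True: 50.0 exceeds both windows' bounds
```

Requiring agreement between a short and a long window suppresses alerts on brief spikes that only the short window would flag.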
68.
LARGE LANGUAGE MODEL-BASED COMMUNICATION CONTENT GENERATION
Various embodiments described herein provide for systems, methods, devices, instructions, and like for generating communication content using one or more large language models (LLMs). In particular, some embodiments provide a communication content generation system that generates content for a communication to a target organization using one or more LLMs and information regarding the target organization provided by an organization database, which can comprise curated organization-intelligence data.
Systems and methods are provided for processing a query with one or more predicates in a database system. The systems and methods receive a query comprising one or more predicates. The systems and methods process metadata associated with a database comprising a plurality of files to identify a set of fully-matched files and a set of partially-matched files, the set of fully-matched files comprising a first group of files in which all rows of each file match each of the one or more predicates of the query, the set of partially-matched files comprising a second group of files having rows that possibly match the one or more predicates of the query. The systems and methods perform, based on the query, one or more database operations on the set of fully-matched files prior to processing the set of partially-matched files.
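The full/partial split above can be illustrated with per-file min/max metadata, a common pruning summary. The metadata layout and the range predicate below are illustrative assumptions: a file whose entire min/max range lies inside the predicate is fully matched, a file whose range overlaps it is partially matched, and a non-overlapping file is pruned outright:

```python
# Per-file metadata: min/max of a column, usable as a pruning summary.
files = {
    "f1": {"min": 0,  "max": 9},
    "f2": {"min": 10, "max": 19},
    "f3": {"min": 5,  "max": 25},
}

def split_by_predicate(files, lo, hi):
    """Classify files for the predicate `lo <= value <= hi` using metadata
    only: fully matched (every row must match) vs partially matched (some
    rows possibly match); files that cannot match are dropped entirely."""
    full, partial = [], []
    for name, md in sorted(files.items()):
        if md["max"] < lo or md["min"] > hi:
            continue                     # no row can match — prune the file
        if lo <= md["min"] and md["max"] <= hi:
            full.append(name)            # all rows match: no row scan needed
        else:
            partial.append(name)         # must scan rows to confirm matches
    return full, partial

full, partial = split_by_predicate(files, lo=0, hi=15)
print(full, partial)  # ['f1'] ['f2', 'f3']
```

Operating on the fully-matched set first lets results flow without per-row predicate evaluation, deferring the more expensive scans of partially-matched files.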
A network egress request is received from a container service within a cloud data platform. A cryptographically signed egress policy associated with the network egress request is received by a trusted service controller of the cloud data platform. The network egress request is validated against the cryptographically signed egress policy. Based on the validation, a determination of whether the network egress request complies with the cryptographically signed egress policy is established. Upon validation, the network egress request is granted or denied based on the determination.
Provided herein are systems and methods for hash-join broadcast decision making. For example, a method includes generating a query plan for a received query. The query plan includes a plurality of join operations with a plurality of hash-join-build (HJB) operations and a plurality of hash-join-probe (HJP) operations. A decision node of a plurality of decision nodes of the query plan is configured as a primary decision node. Build-side data information associated with build-side data and received from the plurality of HJB operations is decoded by the primary decision node. A data distribution method is determined by the primary decision node for each HJB operation of the plurality of HJB operations based on the build-side data information. The query plan is executed based on distributing the build-side data to the plurality of HJP operations using the data distribution method for each HJB operation of the plurality of HJB operations.
A data platform for executing containers is provided. In some examples, the data platform receives an application from an application package of a provider account, the application including a setup script and a manifest of a service. The data platform activates access roles based on the manifest and creates the service and a compute pool using the setup script and a specification file accessed from the application package using an access role. The service is executed in the compute pool, accessing objects of the application package and of the data platform using the access roles.
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
73.
LIVE METRIC AUTO-SCALING LEVERAGING TELEMETRY SERVICES
Autoscaling techniques can optimize usage of computing resources in a data system while also quickly reacting to change in workloads. The computing resources are arranged in different clusters. Autoscaling can be partitioned into two separate, independent autoscaling phases: a slow autoscaler and a fast autoscaler.
The subject technology receives, by one or more hardware processors, a request to execute a user-defined function (UDF) within a sandbox process. The subject technology establishes a secure egress path for the UDF using an overlay network, where the overlay network includes a dedicated DNS resolver at a proxy service. The subject technology receives, from the UDF, a DNS request to resolve a hostname. The subject technology validates, by the proxy service, that the hostname is included in an allowed host list associated with the UDF. The subject technology resolves, by the dedicated DNS resolver, the hostname to an IP address using a UDP listener configured to handle DNS protocol traffic on a designated port of the proxy service. The subject technology enables the UDF to communicate with a host at the resolved IP address via the secure egress path.
The subject technology receives a first semi-structured object. The subject technology iterates through a list of fields specified by a target object type. The subject technology, for each field, determines whether a field with a same name is present in the first semi-structured object. The subject technology, in response to the field being found in the first semi-structured object, converts a value of the field to a target field type according to defined type conversion rules. The subject technology stores the converted value in a unified representation comprising a data structure that stores both structured and semi-structured data types. The subject technology processes a query using the unified representation.
Techniques for creating and modifying database entities, such as tables, tasks, etc., using declarative statements are described. Declarative statements specify a target state of the entity without specifying specific actions. The techniques described herein apply changes to the database entity atomically and incrementally.
A method includes retrieving, by at least one hardware processor in a database system, a database table. The database table includes a plurality of partitions. A plurality of batches is generated for the database table based on a file selection task of the database system. Each batch of the plurality of batches includes a partition subset of the plurality of partitions. A plurality of execution jobs is configured based on an execution management task of the database system. Each execution job of the plurality of execution jobs includes a batch subset of the plurality of batches, and the skew of batch sizes for the batch subset is below a threshold skew. Concurrent execution of the plurality of execution jobs is performed to cluster the partition subset associated with each of the plurality of execution jobs.
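The batching and skew-bounded job assignment above can be sketched with a greedy heuristic. The abstract does not specify the packing or balancing strategy, so the capacity-based batching, the lightest-job-first assignment, and the `(max - min) / max` skew measure are all illustrative assumptions:

```python
def make_batches(partition_sizes, batch_capacity):
    """Greedily pack partitions into batches of roughly equal total size."""
    batches, current, total = [], [], 0
    for size in partition_sizes:
        if current and total + size > batch_capacity:
            batches.append((current, total))
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        batches.append((current, total))
    return batches

def assign_jobs(batches, n_jobs):
    """Assign each batch to the currently lightest job (largest batches
    first) to keep the skew of batch sizes across jobs low."""
    jobs = [{"batches": [], "total": 0} for _ in range(n_jobs)]
    for batch, size in sorted(batches, key=lambda b: -b[1]):
        lightest = min(jobs, key=lambda j: j["total"])
        lightest["batches"].append(batch)
        lightest["total"] += size
    totals = [j["total"] for j in jobs]
    skew = (max(totals) - min(totals)) / max(totals)  # 0 = perfectly even
    return jobs, skew

batches = make_batches([5, 7, 3, 8, 6, 4, 2, 9], batch_capacity=10)
jobs, skew = assign_jobs(batches, n_jobs=2)
print(round(skew, 2))
```

The computed skew would then be compared against the threshold skew before the execution jobs are launched concurrently.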
Techniques for continuous ingestion of files using custom file formats are described. A custom file format may include formats not natively supported by a data system. Unstructured files (e.g., images) may also be considered custom file formats. A custom file format may be set using a user defined table function and scanner options.
Various embodiments provide for managing differential privacy on a database system using one or more differential privacy policies and one or more differential privacy budgets associated with the one or more differential privacy policies.
Various embodiments provide for using one or more differential privacy domains on a database system to execute a differentially private query on the database system.
42 - Scientific, technological and industrial services, research and design
Goods and services
Data migration services; Cloud computing featuring software for use in data migration, data backup, and data retrieval; Software as a service (SAAS) services featuring software for use in data migration, data backup, and data retrieval; Providing temporary use of non-downloadable software for data extraction, data integration, data management, data consolidation, data migration, data configuration, data unification and data loading; Platform as a service (PAAS) featuring computer software platforms for use by others in connection with data migration; Computer consulting services in the field of data management, data migration, data analysis, and data reporting
A low-code web application testing platform is provided. The low-code web application testing platform automates the testing process of web applications. The low-code web application testing platform executes a script that simulates the frontend of a web application, capturing output messages that detail the UI elements. The low-code web application testing platform then interprets these messages to construct a navigable structure that represents the application's UI. To emulate user interactions, the low-code web application testing platform performs test actions within this structure, subsequently rerunning the script with these interactions to capture additional output messages that reflect the application's response. Finally, the platform generates a test report based on the application's reaction to the emulated interactions, providing an assessment of the application's functionality and user experience.
Embodiments of the present disclosure describe systems, methods, and computer program products for redacting sensitive data within a database. An example method can include receiving a data query referencing unredacted data of a database, wherein the data query comprises a value identifying a type of sensitive data to be redacted from the unredacted data; responsive to the data query, executing, by a processing device, a redaction operation to identify candidate sensitive data that matches the type of sensitive data to be redacted within the unredacted data of the database; and returning a redacted data set in which the presentation of the candidate sensitive data is based on an authentication level utilized for execution of the redaction operation.
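A minimal sketch of such a type-driven redaction operation is given below. The detector patterns, the `ADMIN` authentication level, and the partial-masking rule are illustrative assumptions, not details from the disclosure.

```python
import re

# Hypothetical per-type detectors; a real system would use richer classifiers.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(rows, sensitive_type, auth_level):
    """Return rows with candidate matches of `sensitive_type` masked.

    Higher-privilege callers see partially masked values; all others
    see a full mask, so the same query yields auth-dependent output.
    """
    pattern = DETECTORS[sensitive_type]

    def mask(match):
        value = match.group(0)
        if auth_level == "ADMIN":
            return value[:2] + "***"   # partial redaction
        return "[REDACTED]"

    return [pattern.sub(mask, row) for row in rows]
```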
Systems and methods are provided for controlling the deletion of data in a database system. The system receives input comprising a deletion criterion for a database system. The system applies the deletion criterion to a set of tables of the database system. The system determines that an individual portion of the set of tables satisfies the deletion criterion. In response to determining that the individual portion of the set of tables satisfies the deletion criterion, the system transfers the individual portion of the set of tables to a temporary storage system.
G06F 16/215 - Improving data quality; Data cleansing, e.g. deduplication, removing invalid entries or correcting typographical errors
G06F 16/11 - File system administration, e.g. details of archiving or snapshots
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
A method implementing a fault-tolerant data warehouse using availability zones includes allocating a plurality of processing units to a data warehouse, the processing units located in different availability zones, an availability zone comprising one or more data centers. The method further includes routing a query to a processing unit within the data warehouse, the query having a common session identifier with a query previously provided to the processing unit, the processing unit determined to be caching a data segment associated with a cloud storage resource independent of the plurality of processing units. The method further includes, as a result of monitoring a number of queries running at an input degree of parallelism, determining that the processing capacity of the processing units has reached a threshold; and changing a total number of processing units using the input degree of parallelism and the number of queries.
H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
H04L 41/0896 - Bandwidth or capacity management of networks, i.e. automatically increasing or decreasing capacities
H04L 41/5025 - Ensuring fulfilment of the service level agreement by proactively reacting to service quality change, e.g. by reconfiguration after service quality degradation or upgrade
H04L 43/0817 - Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability, by checking functioning
H04L 67/1008 - Server selection for load balancing based on parameters of servers, e.g. available memory or workload
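The capacity-driven scaling and session-sticky routing described in the fault-tolerant warehouse method might look like the following sketch. The slot model, the ceiling-division rule, and the hash-based routing are assumptions for illustration only.

```python
def target_unit_count(running_queries, degree_of_parallelism, slots_per_unit=8):
    """Hypothetical scaling rule: each running query consumes
    `degree_of_parallelism` slots; return the smallest number of
    processing units whose total slots cover current demand."""
    demand = running_queries * degree_of_parallelism
    # Ceiling division, keeping at least one unit allocated.
    return max(1, -(-demand // slots_per_unit))

def route(session_id, units):
    """Sticky routing: queries sharing a session identifier hash to the
    same processing unit, so cached data segments are reused."""
    return units[hash(session_id) % len(units)]
```

When `target_unit_count` diverges from the current allocation, units in other availability zones can be added or drained without the query losing access to the cloud storage resource, which is independent of the processing units.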
88.
Distributed in-database vectorized operations using user defined table functions
The subject technology determines a set of shards of rows from a data set based on a number of rows and a number of execution nodes to execute a request for determining a correlation. For each shard from the set of shards, the subject technology sends a particular user defined table function (UDTF), including a particular shard of rows, to a different execution node to perform a set of operations for determining the correlation. The subject technology provides a set of output values of each particular UDTF corresponding to each shard from the set of shards in a second UDTF. The subject technology sends the second UDTF to a particular execution node to perform an aggregate operation using the set of output values of each particular UDTF. The subject technology receives a value of the correlation from the particular execution node based on the aggregate operation.
G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor, of structured data, e.g. relational data
G06F 16/22 - Indexing; Data structures therefor; Storage structures
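The two-stage UDTF pattern in entry 88 can be sketched as follows: a first-stage function computes per-shard partial sums, and a second-stage function aggregates them into a Pearson correlation. Running the stages locally instead of on execution nodes, and the specific partial-sum representation, are assumptions for the sketch.

```python
import math

def shard(rows, num_nodes):
    """Split (x, y) rows into roughly equal shards, one per execution node."""
    k = max(1, math.ceil(len(rows) / num_nodes))
    return [rows[i:i + k] for i in range(0, len(rows), k)]

def partial_sums(shard_rows):
    """Per-shard 'UDTF': partial sums needed for Pearson correlation."""
    n = len(shard_rows)
    sx = sum(x for x, _ in shard_rows)
    sy = sum(y for _, y in shard_rows)
    sxx = sum(x * x for x, _ in shard_rows)
    syy = sum(y * y for _, y in shard_rows)
    sxy = sum(x * y for x, y in shard_rows)
    return (n, sx, sy, sxx, syy, sxy)

def aggregate(partials):
    """Second-stage 'UDTF': combine all partial sums into the correlation."""
    n, sx, sy, sxx, syy, sxy = map(sum, zip(*partials))
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return num / den
```

Because the partial sums are additive, each shard can be processed on a different execution node and only six numbers per shard travel to the aggregating node.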
89.
ENHANCED SEARCHING USING FINE-TUNED MACHINE LEARNING MODELS
An advanced search system leverages a pre-trained large language model to enhance user query responses. The system, equipped with hardware processors, receives a search query via an interface and accesses a pre-trained large language model designed to respond to the search query. The system fine-tunes the model to generate a task-specific generative model. The system employs the task-specific generative model to generate a search result for the search query and analyzes the search result based on a performance metric associated with the task-specific generative model. The system refines the task-specific generative model based on the analysis of the search result.
G06F 16/9538 - Presentation of query results
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URLs]
Techniques for generating and storing application telemetry data are described. An example method includes generating, by an application, a unit of telemetry data comprising metrics related to a runtime state of the application. The method also includes generating a character string comprising the metrics. The method also includes writing, by a processing device executing the application, a zero-byte file to a storage system using the character string as a file name of the zero-byte file.
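The zero-byte-file technique above can be sketched directly: the metrics are serialized and percent-encoded into the file name, so the storage system's directory listing alone carries the telemetry. The timestamp prefix and JSON encoding are assumptions; note that real file systems cap file-name length (commonly 255 bytes), so a production encoding would need to respect that limit.

```python
import json
import os
import time
import urllib.parse

def write_telemetry(storage_dir, metrics):
    """Encode runtime metrics into a file name and write a zero-byte file.

    Reading the telemetry back requires only listing the directory;
    no file contents are ever written or read.
    """
    payload = json.dumps(metrics, sort_keys=True, separators=(",", ":"))
    # Percent-encode so the character string is safe as a file name.
    name = f"{int(time.time())}-{urllib.parse.quote(payload, safe='')}"
    path = os.path.join(storage_dir, name)
    open(path, "w").close()          # the zero-byte file
    return path
```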
A data platform having an anti-abuse analysis pipeline is provided. The anti-abuse analysis pipeline detects an image referenced in an application package and schedules both an application scan of the application source files and an image scan of the referenced image. The pipeline extracts the application source files from the application package and executes the application scan using them to generate application scan results. The pipeline extracts artifacts from the referenced image and executes the image scan using the artifacts to generate image scan results. Upon determining that both the application scan and the image scan are complete, the pipeline generates a combined scan result from the application scan results and the image scan results.
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
G06F 16/538 - Presentation of query results
Techniques for providing adaptive warehouses in a multi-tenant data system are described. The workloads for the account can be multiplexed in the adaptive warehouse environment. Warehouse endpoints in a warehouse layer can be defined for an account in the multi-tenant data system. A compute layer for the account can be divided into workload regions, where each workload region corresponds to a different workload type.
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
93.
Secure database environment with third-party verification
Systems and methods are provided for creating a secure database execution environment. The system generates, by a database system executing on a secure enclave, attestation information. The system transmits the attestation information to a remote entity. The system obtains, by the database system executing on the secure enclave, one or more encryption keys in response to the remote entity authenticating the attestation information. The system performs, by the database system executing on the secure enclave, one or more database operations on encrypted data stored on the database system using the one or more encryption keys.
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
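The attest-then-release-keys flow of entry 93 can be sketched as below. Real secure enclaves sign attestation quotes with hardware-rooted keys; the shared HMAC secret here merely stands in for that trust root, and all names are illustrative.

```python
import hashlib
import hmac
import os

# Hypothetical shared secret standing in for hardware-rooted attestation keys.
ATTESTATION_KEY = b"demo-root-of-trust"

def generate_attestation(enclave_measurement, nonce):
    """Enclave side: bind the database system's code measurement to the
    verifier's nonce, proving freshness and identity."""
    mac = hmac.new(ATTESTATION_KEY, enclave_measurement + nonce,
                   hashlib.sha256).digest()
    return enclave_measurement, mac

def verify_and_release_key(expected_measurement, nonce, attestation):
    """Remote entity: authenticate the attestation, then hand the enclave
    an encryption key for operating on the encrypted data."""
    measurement, mac = attestation
    good = hmac.new(ATTESTATION_KEY, measurement + nonce,
                    hashlib.sha256).digest()
    if measurement != expected_measurement or not hmac.compare_digest(mac, good):
        raise PermissionError("attestation failed")
    return os.urandom(32)            # the data encryption key
```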
Systems and methods are provided for generating personalized service disruption notifications. The system allocates resources of a database system to a plurality of entities, the resources of the database system being distributed in a cloud environment and analyzes a plurality of signals on the database system. The system, in response to analyzing the plurality of signals, detects a likelihood of a service availability disruption on the database system for a first entity of the plurality of entities. The system notifies the first entity of the service availability disruption in response to detecting the likelihood of the service availability disruption.
A system includes at least one hardware processor and at least one memory storing instructions that cause the at least one hardware processor to perform operations. The operations include encoding data stored at a first account of a user. The first account is configured at a primary deployment of a database system. The encoding is based on a first encryption key. The operations include detecting a network intrusion event associated with the first account of the user. The operations include performing a failover of the first account to a second account of the user based on the detecting of the network intrusion event. The failover grants the user access to a replicated version of the data based at least on a second encryption key.
Techniques described herein can allow users to share cached results of an original query with other users while protecting sensitive information. The techniques described herein can check whether the other users have access to the underlying data queried before allowing those users to see the stored query results. That is, the system may perform privilege checks on the shared users before giving them access to the stored query results but without having to re-run the original query.
G06F 16/248 - Presentation of query results
H04L 9/32 - Arrangements for secret or secure communications; Network security protocols, including means for verifying the identity or authority of a user of the system
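The privilege-check-without-re-execution idea can be sketched as follows. Recording the set of tables a query read alongside its cached rows, and comparing that set against a simple user-to-tables ACL, are assumptions made for this illustration.

```python
class ResultCache:
    """Hypothetical cache of query results plus the tables each query read."""

    def __init__(self, acl):
        self.acl = acl          # user -> set of tables the user may read
        self.store = {}         # result_id -> (tables_read, rows)

    def save(self, result_id, tables_read, rows):
        self.store[result_id] = (set(tables_read), rows)

    def fetch_shared(self, user, result_id):
        """Return cached rows only if `user` may read every underlying
        table; the original query is never re-run."""
        tables_read, rows = self.store[result_id]
        if not tables_read <= self.acl.get(user, set()):
            raise PermissionError(f"{user} lacks access to underlying tables")
        return rows
```

A user who lacks access to even one underlying table is denied the shared result, which protects sensitive information while still saving the cost of re-running the query for authorized users.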
The subject technology receives a query. The subject technology generates a set of query blocks based on parsing the query. The subject technology stores query block metadata for each query block from the set of query blocks. The subject technology restores a set of logical query block boundaries. The subject technology performs a hash-based query block matching. The subject technology generates, after performing the hash-based query block matching, a final query plan. In an example, if the same query is executed again, the same logical query blocks produce the same metadata (e.g., identifier, name, and hash); this metadata can be used to match logical query blocks across multiple executions of the same query.
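Hash-based block matching of this kind can be sketched as below. The whitespace/case normalization and the reuse of per-block metadata are assumptions for the example, not details from the description.

```python
import hashlib

def block_hash(block_text):
    """Stable hash of a normalized logical query block."""
    normalized = " ".join(block_text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

def match_blocks(previous_blocks, current_blocks):
    """Match logical blocks between two executions of the same query by
    hash, so stored per-block metadata (e.g. hypothetical cardinality
    estimates) can be reused when building the final plan."""
    by_hash = {block_hash(text): meta for text, meta in previous_blocks.items()}
    return {text: by_hash.get(block_hash(text)) for text in current_blocks}
```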
Techniques for sharing application packages in a multi-tenant database system are described. A provider account can create and share an application package with provider key information. A consumer application can be installed in a consumer account based on the application package and consumer account can be registered using an onboard service user and the provider key information. A unique consumer service user can be registered in the provider account corresponding to the consumer account.
H04L 9/30 - Public key, i.e. encryption algorithm being computationally infeasible to invert, and users' encryption keys not requiring secrecy
A search engine of a data exchange may receive from a user, a query comprising a set of search terms, and retrieve a set of data listings based on the search terms of the query. A data ranking module of the search engine may analyze each of the set of retrieved data listings to determine, for each of the set of retrieved data listings, a set of listing-specific signals and a set of external signals. Listing-specific signals may correspond to attributes or characteristics of data/content within a data listing, while external signals may correspond to a measure of activity in the data exchange that involves a data listing. Based on the listing-specific signals and the external signals analyzed for each retrieved data listing, the set of retrieved data listings may be ordered and presented to the user.
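A blended ranking of the kind described might be sketched as below. The specific signal fields, the weights, and the linear blend are all assumptions; the abstract does not specify how the two signal sets are combined.

```python
def rank_listings(listings, w_listing=0.6, w_external=0.4):
    """Hypothetical ranking: blend listing-specific signals (content
    quality, freshness) with an external signal (exchange activity,
    here view counts normalized against the most-viewed listing)."""
    max_views = max(l["views"] for l in listings)

    def score(l):
        listing_score = 0.5 * l["quality"] + 0.5 * l["freshness"]
        external_score = l["views"] / (1 + max_views)
        return w_listing * listing_score + w_external * external_score

    return sorted(listings, key=score, reverse=True)
```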
The subject technology receives first data. The subject technology transforms first data to a wide format, the wide format comprising a second table. The subject technology splits a set of rows into a set of shards of rows. The subject technology, for each shard from the set of shards, sends a particular user defined table function (UDTF), including a particular shard of rows, to a different execution node to perform a set of operations for determining a rolling correlation over a window size. The subject technology provides a set of output values of each particular UDTF corresponding to each shard from the set of shards in a second UDTF. The subject technology sends the second UDTF to a particular execution node to perform an aggregate operation. The subject technology receives a value of the rolling correlation from the particular execution node based on the aggregate operation.
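The rolling-correlation variant differs from the plain correlation in that each output value covers only a trailing window of rows. A local sketch of the window logic, with Pearson correlation assumed as the measure, is below; the distribution of window shards across execution nodes is omitted.

```python
import math

def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return (n * sxy - sx * sy) / den

def rolling_correlation(pairs, window):
    """Correlation over each trailing window of `window` rows; in the
    distributed setting each window's work could run in a separate UDTF."""
    return [pearson(pairs[i - window:i])
            for i in range(window, len(pairs) + 1)]
```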