Databricks, Inc.

United States of America

1-100 of 128 for Databricks, Inc.

Aggregations
IP Type
Patent	114
Trademark	14

Jurisdiction
United States	116
World	9
Canada	2
Europe	1

Date
New (last 4 weeks)	6
2025 April (MTD)	4
2025 March	7
2025 February	4
2025 January	7
2024 December	1
2025 (YTD)	22
2024	35
2023	16
2022	13
2021	10
2020	14
Before 2020	18
IPC Class
G06F 16/22 - Indexing; Data structures therefor; Storage structures	32
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models	19
G06F 16/2453 - Query optimisation	18
G06F 16/2455 - Query execution	16
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation	14
G06F 16/23 - Updating	12
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules	11
G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor	10
G06F 16/14 - Details of searching files based on file metadata	9
G06F 16/21 - Design, administration or maintenance of databases	9
G06F 16/25 - Integrating or interfacing systems involving database management systems	9
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]	9
G06N 20/00 - Machine learning	8
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance	7
G06F 17/30 - Information retrieval; Database structures therefor	7
G06F 11/30 - Monitoring	5
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems	5
G06F 16/242 - Query formulation	5
G06F 16/24 - Querying	4
G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries	4
G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor	4
G06F 16/901 - Indexing; Data structures therefor; Storage structures	4
G06F 9/54 - Interprogram communication	4
G06N 5/022 - Knowledge engineering; Knowledge acquisition	4
G06F 16/245 - Query processing	3
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]	3
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines	3
G06F 9/46 - Multiprogramming arrangements	3
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt	3
G06F 11/36 - Prevention of errors by analysis, debugging or testing of software	2
NICE Class
42 - Scientific, technological and industrial services, research and design	13
09 - Scientific and electric apparatus and instruments	10

Status
Pending	33
Registered / In Force	95

1. DATA ASSET SHARING BETWEEN ACCOUNTS AT A DATA PROCESSING SERVICE USING CLOUD TOKENS

Application Number	18491500
Status	Pending
Filing Date	2023-10-20
First Publication Date	2025-04-24
Owner	Databricks, Inc. (USA)
Inventor	Sun, Xiaotong Chakankar, Abhijit Chandra, Ramesh

Abstract

A data processing service receives an indication that a recipient will request access to data assets of a provider and provides a request for credentials from a recipient governance module. The recipient governance module stores a recipient metastore including an object for a provider metastore. In response to determining that the assets are associated with the provider metastore, the service provides a request for credentials to a provider governance module. The provider governance module stores the provider metastore describing data assets of the provider and permissions for accessing data assets. The provider metastore includes a recipient object attached to the data assets with an identifier for the recipient metastore. In response to verifying that the recipient was provided access to the data assets, the service provides a token to the recipient governance module. The service then provides the token to a computing resource to provide access to the data assets.

IPC Classes

G06F 21/31 - User authentication

2. DATA SHARING FOR NETWORK CONNECTED SYSTEMS

Application Number	18958728
Status	Pending
Filing Date	2024-11-25
First Publication Date	2025-04-24
Owner	Databricks, Inc. (USA)
Inventor	Zaharia, Matei Zhu, Shixiong Sun, Xiaotong Chandra, Ramesh Armbrust, Michael Paul Ghodsi, Ali

Abstract

The present application discloses a method, system, and computer system for providing access to data. The method includes receiving, by a data manager service from a data requesting service, a request using an identifier for a high-level data object to access a set of data associated with the high-level data object, determining, by the data manager service, low-level data object(s) corresponding to the set of data based on the identifier for the high-level data object, determining whether a user associated with the request has permission to access at least a subset of the low-level data object(s), and in response to determining that the user associated with the request has permission to access at least the subset of the low-level data object(s), generating, by the data manager service, a uniform resource locator (URL) via which at least the subset of the low-level data object(s) is accessible by the user.

IPC Classes

G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 21/60 - Protecting data

3. AUTO MAINTENANCE FOR DATA TABLES IN CLOUD STORAGE

Application Number	18986345
Status	Pending
Filing Date	2024-12-18
First Publication Date	2025-04-24
Owner	Databricks, Inc. (USA)
Inventor	Prabhakaran, Vijayan Raja, Himanshu Potharaju, Rahul Bhanoori, Naga Raju Ma, Lin Parangi Sharabhalingappa, Rajesh Liang, Jintian Schuermann, Zachary Vaughn Ting, Kam Cheung

Abstract

Disclosed is a configuration for managing the organization of data tables in cloud-based storage. The configuration receives metrics for data processing operations on the data table. Metrics include at least one of a size of the data table, a size of each file in the data table, and metadata describing the data table. The configuration automatically executes a cost-benefit analysis based on the one or more metrics for each candidate maintenance operation in a plurality of candidate maintenance operations. The configuration automatically selects a maintenance operation from the candidate maintenance operations to automate based on the cost-benefit analysis of the one or more candidate maintenance operations. The selected maintenance operation is automated and scheduled on the data table.
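The selection step in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the two candidate operations, the scoring formulas, and names such as `TableMetrics`, `SMALL_FILE_BYTES`, and `select_maintenance` are all hypothetical, chosen only to show a cost-benefit comparison driven by table size and per-file sizes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

GIB = 2**30
SMALL_FILE_BYTES = 32 * 2**20  # hypothetical cutoff for a "small" file


@dataclass
class TableMetrics:
    table_bytes: int        # total size of the data table
    file_sizes: List[int]   # size of each file in the table


def score_compaction(m: TableMetrics) -> float:
    """Toy score: files eliminated by merging small files, minus GiB rewritten."""
    small = [s for s in m.file_sizes if s < SMALL_FILE_BYTES]
    if len(small) < 2:
        return 0.0
    return (len(small) - 1) - sum(small) / GIB


def score_recluster(m: TableMetrics) -> float:
    """Toy score: 0.1 per file of assumed scan savings, minus GiB rewritten."""
    return 0.1 * len(m.file_sizes) - m.table_bytes / GIB


# Hypothetical candidate maintenance operations and their scoring functions.
CANDIDATES: Dict[str, Callable[[TableMetrics], float]] = {
    "compact": score_compaction,
    "recluster": score_recluster,
}


def select_maintenance(m: TableMetrics) -> Optional[str]:
    """Return the candidate with the best net benefit, or None if none pays off."""
    scores = {name: score(m) for name, score in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A table made of many 1 MiB files would select `"compact"` under these toy scores, while a table holding a single large file would select nothing.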

IPC Classes

G06F 16/21 - Design, administration or maintenance of databases
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/22 - Indexing; Data structures therefor; Storage structures

4. USING LLM FUNCTIONS TO EVALUATE AND COMPARE LARGE TEXT OUTPUTS OF LLMS

Application Number	18518155
Status	Pending
Filing Date	2023-11-22
First Publication Date	2025-04-17
Owner	Databricks, Inc. (USA)
Inventor	Gupta, Ridhima Kannan, Prithvi Sheth, Sunish Sohil Uhlenhuth, Kasey Zub, Hubert Zumar, Corey

Abstract

A method for evaluating textual output of one or more machine-learned language models is presented. The method includes receiving, from a user of a client device, a first prompt for input to one or more machine-learned language models, providing the first prompt to the one or more models for execution, and receiving a set of generated responses to the first prompt from the one or more models. The method further includes generating a user interface (UI) on the client device displaying the first prompt and generated responses as a table user interface element. The method applies a selected evaluation function to the generated response to evaluate the response with respect to an evaluation objective and identifies words that influence the evaluation. The method generates one or more UI elements on the UI to display the results of the evaluation for the generated responses.

IPC Classes

G06F 40/40 - Processing or translation of natural language
G06F 40/103 - Formatting, i.e. changing of presentation of documents
G06F 40/30 - Semantic analysis

5. CONCURRENT OPTIMISTIC TRANSACTIONS FOR TABLES WITH DELETION VECTORS

Application Number	18928982
Status	Pending
Filing Date	2024-10-28
First Publication Date	2025-03-27
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Stavrakakis, Christos

Abstract

A disclosed configuration receives a first indication that a first transaction is committed to update a first subset of records in a data table at a first version to generate a second version of the data table, and receives a second indication to commit a second transaction to update a second subset of records in a data file of the data table at the first version. The configuration determines a logical prerequisite based on whether the first subset of records changes content of one or more records in the second subset of records and determines a physical prerequisite based on whether the second subset of records corresponds to respective data records in data files of the second version of the data table. The configuration commits the second transaction to generate a third version of the data table by updating elements of a deletion vector if the prerequisites are satisfied.
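As a rough illustration of the two checks, the Python sketch below simplifies heavily. It assumes rows are identified by `(file_path, row_index)` pairs, treats the logical prerequisite as plain disjointness of the two row sets, and treats the physical prerequisite as "every touched file still exists in the second version". The function name `try_commit` and its parameters are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Row = Tuple[str, int]  # (file_path, row_index), an assumed row identifier


def try_commit(
    first_txn_rows: Set[Row],
    second_txn_rows: Set[Row],
    files_in_second_version: Set[str],
    deletion_vectors: Dict[str, Set[int]],
) -> Optional[Dict[str, Set[int]]]:
    """Return deletion vectors for a third version, or None on conflict."""
    # Logical prerequisite (simplified): the first transaction did not
    # touch any record the second transaction wants to update.
    logical_ok = first_txn_rows.isdisjoint(second_txn_rows)
    # Physical prerequisite (simplified): the files holding those records
    # were not rewritten away in the second version.
    physical_ok = all(path in files_in_second_version for path, _ in second_txn_rows)
    if not (logical_ok and physical_ok):
        return None
    # Commit by marking the second transaction's rows in copies of the
    # per-file deletion vectors, leaving the second version untouched.
    third = {path: set(rows) for path, rows in deletion_vectors.items()}
    for path, row in second_txn_rows:
        third.setdefault(path, set()).add(row)
    return third
```

Under these assumptions, a second transaction that touches a different row of the same file commits without rewriting that file, which is the benefit deletion vectors are meant to provide.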

IPC Classes

G06F 16/23 - Updating

6. Clean room generation for data collaboration and executing clean room task in data processing pipeline

Application Number	18474708
Grant Number	12260003
Status	In Force
Filing Date	2023-09-26
First Publication Date	2025-03-25
Grant Date	2025-03-25
Owner	Databricks, Inc. (USA)
Inventor	Chau, William Chakankar, Abhijit Mahoney, Stephen Michael Morris, Daniel Seth Weiss, Itai Shlomo

Abstract

A data processing service facilitates the creation and processing of data processing pipelines that process data processing jobs defined with respect to a set of tasks in a sequence and with data dependencies associated with each separate task such that the output from one task is used as input for a subsequent task. In various embodiments, the set of tasks include at least one cleanroom task that is executed in a cleanroom station and at least one non-cleanroom task executed in an execution environment of a user where each task is configured to read one or more input datasets and transform the one or more input datasets into one or more output datasets.

IPC Classes

G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules

7. RESOURCE MANAGEMENT WITH INTERMEDIARY NODE IN KUBERNETES ENVIRONMENT

Application Number	18368919
Status	Pending
Filing Date	2023-09-15
First Publication Date	2025-03-20
Owner	Databricks, Inc. (USA)
Inventor	Davidson, Aaron Daniel Garnier, Thomas Guo, Lin He, Zhe Li, Manlin Liu, Yang Wang, Feng Zhang, Hong Zhu, Weirong

Abstract

A resource management configuration may receive an API request from an API server. The API request specifies task information from a plurality of tenants. The configuration transmits status information of a plurality of VMs to the API server to assign tasks to one or more VMs based on the task information and the status information. Tasks assigned to a VM of the plurality of VMs are for one tenant of the plurality of tenants. The configuration configures, on an untrusted network, network security groups for managing communications of tenants such that a network security group configured for a tenant permits communications between VMs assigned to the same tenant but prevents communications between VMs assigned to different tenants. The configuration pins each assigned VM of the one or more assigned VMs to perform the task based on the task information of the corresponding tenant.

IPC Classes

G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
G06F 9/54 - Interprogram communication

8. STRUCTURED CLUSTER EXECUTION FOR DATA STREAMS

Application Number	18745847
Status	Pending
Filing Date	2024-06-17
First Publication Date	2025-03-13
Owner	Databricks, Inc. (USA)
Inventor	Armbrust, Michael Paul Das, Tathagata Xin, Shi Zaharia, Matei

Abstract

A system for executing a streaming query includes an interface and a processor. The interface is configured to receive a logical query plan. The processor is configured to determine a physical query plan based at least in part on the logical query plan. The physical query plan comprises an ordered set of operators. Each operator of the ordered set of operators comprises an operator input mode and an operator output mode. The processor is further configured to execute the physical query plan using the operator input mode and the operator output mode for each operator of the query.

IPC Classes

G06F 16/2453 - Query optimisation
G06F 16/2455 - Query execution

9. K-D Tree Balanced Splitting

Application Number	18772758
Status	Pending
Filing Date	2024-07-15
First Publication Date	2025-03-13
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Jain, Prakhar

Abstract

A system for clustering data into corresponding files comprises one or more processors and a memory. The one or more processors is/are configured to: 1) determine to cluster a set of data into a set of files; 2) determine a set of split points in a corresponding set of dimensions of the set of data to determine the set of files, wherein each file of the set of files has an approximate target size; and 3) store one or more items of the set of data into a corresponding file of the set of files based at least in part on the set of split points. The memory is coupled to the one or more processors and configured to provide the processor with instructions.
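One way to picture split points that yield files of an approximate target size is a median split on alternating dimensions, sketched below in Python. This is a generic k-d partitioning sketch, not the method claimed in the application; the function name `kd_split` and the use of row count as the size measure are assumptions.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def kd_split(rows: Sequence[Row], target_rows: int, depth: int = 0) -> List[List[Row]]:
    """Split rows at the median of alternating dimensions until each
    resulting "file" holds at most target_rows rows."""
    if target_rows < 1:
        raise ValueError("target_rows must be at least 1")
    if len(rows) <= target_rows:
        return [list(rows)]
    dim = depth % len(rows[0])                  # cycle through the dimensions
    ordered = sorted(rows, key=lambda r: r[dim])
    mid = len(ordered) // 2                     # median index is the split point
    return (kd_split(ordered[:mid], target_rows, depth + 1)
            + kd_split(ordered[mid:], target_rows, depth + 1))
```

Splitting at the median keeps the two halves balanced, so every file ends up between roughly half the target and the full target size.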

IPC Classes

G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

10. Reducing cluster start up time

Application Number	17514988
Grant Number	12248818
Status	In Force
Filing Date	2021-10-29
First Publication Date	2025-03-11
Grant Date	2025-03-11
Owner	Databricks, Inc. (USA)
Inventor	Mao, Yandong Davidson, Aaron Daniel

Abstract

The present application discloses a method, system, and computer system for starting up and maintaining a cluster in a warmed up state, and/or allocating clusters from a warmed up state. The method includes instantiating a set of virtual machines, wherein instantiating the set of virtual machines includes setting a temporary security credential for each virtual machine of the set of virtual machines, receiving a virtual machine allocation request associated with a workspace, a customer, or a tenant, in response to the virtual machine allocation request: allocating a virtual machine, wherein allocating the virtual machine comprises replacing the temporary security credential with a security credential associated with the workspace, the customer, or the tenant.
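The credential swap at allocation time can be sketched in a few lines of Python. The classes `VM` and `WarmPool` and the string credentials are hypothetical stand-ins; a real system would call a cloud provider's APIs rather than mutate an in-memory object.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class VM:
    vm_id: int
    credential: str


class WarmPool:
    """Pre-started VMs that each hold only a temporary credential."""

    def __init__(self, size: int) -> None:
        self._idle = deque(VM(i, f"temp-{i}") for i in range(size))

    def allocate(self, tenant_credential: str) -> VM:
        """Hand out a warm VM, replacing its temporary credential with
        the credential of the requesting workspace, customer, or tenant."""
        if not self._idle:
            raise RuntimeError("warm pool exhausted")
        vm = self._idle.popleft()
        vm.credential = tenant_credential
        return vm
```

Because the VMs are instantiated before any request arrives, allocation reduces to a dequeue and a credential replacement.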

IPC Classes

G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 21/45 - Structures or tools for the administration of authentication

11. Data lineage tracking

Application Number	18162562
Grant Number	12242441
Status	In Force
Filing Date	2023-01-31
First Publication Date	2025-03-04
Grant Date	2025-03-04
Owner	Databricks, Inc. (USA)
Inventor	Feng, Tao Sun, Menglei Wang, Zhuoying

Abstract

The present application discloses a method, system, and computer system for managing lineage data for data entities. The method includes generating lineage data, and storing and indexing, in a data structure, the lineage data in association with a selected data entity. Generating the lineage data includes selecting the selected data entity, obtaining a query tree that was used to generate the selected data entity, and determining lineage data for the selected data entity based at least in part on the query tree.

IPC Classes

G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/23 - Updating
G06F 16/906 - Clustering; Classification
G06F 17/18 - Complex mathematical operations for evaluating statistical data

12. Automated Processing of Multiple Prediction Generation Including Model Tuning

Application Number	18738025
Status	Pending
Filing Date	2024-06-09
First Publication Date	2025-02-20
Owner	Databricks, Inc. (USA)
Inventor	Wilson, Benjamin Thomas Zumar, Corey

Abstract

The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.

IPC Classes

G06N 20/00 - Machine learning
G06F 18/20 - Analysing
G06F 18/2132 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis

13. STATE REBALANCING IN STRUCTURED STREAMING

Application Number	18822023
Status	Pending
Filing Date	2024-08-30
First Publication Date	2025-02-20
Owner	Databricks, Inc. (USA)
Inventor	Balikov, Alexander Das, Tathagata Ramasamy, Karthikeyan

Abstract

A data processing service performs a rebalancing process for rebalancing stateful tasks on a cluster computing system. In one instance, the method for rebalancing stateful tasks is performed such that the per-operator partitions are spread across available executors of a cluster of the cluster computing system with respect to one or more statistics of the tasks. In one instance, the method for rebalancing stateful tasks is also performed such that the total number of stateful tasks are balanced per executor as long as this rebalancing does not imbalance the per-operator placements. In this way, the processing of stateful tasks can be spread across multiple executors in a relatively uniform manner, even though there may be an upfront cost of breaking the local caching on an executor.

IPC Classes

G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/2455 - Query execution

14. Checkpoint and restore based startup of executor nodes of a distributed computing engine for processing queries

Application Number	18412438
Grant Number	12229137
Status	In Force
Filing Date	2024-01-12
First Publication Date	2025-02-18
Grant Date	2025-02-18
Owner	Databricks, Inc. (USA)
Inventor	Ge, Xinyang Ao, Lixiang Jing, Haonan Davidson, Aaron Daniel

Abstract

A system performs efficient startup of executors of a distributed computing engine used for processing queries, for example, database queries. The system starts an executor node and processes a set of queries using the executor node to warm up the executor node. The system performs a checkpoint of the warmed-up executor node to create an image. The image is restored in the target executor nodes. The system may store a checkpoint image for each configuration of an executor node. The configuration is determined based on various factors including the hardware of the executor node, memory allocation of the processes, and so on. The use of restore based on checkpoint images improves the efficiency of executor node startup.

IPC Classes

G06F 16/2453 - Query optimisation

15. Clustering key selection based on machine-learned key selection models for data processing service

Application Number	18501830
Grant Number	12229169
Status	In Force
Filing Date	2023-11-03
First Publication Date	2025-02-18
Grant Date	2025-02-18
Owner	Databricks, Inc. (USA)
Inventor	Kim, Terry Ma, Lin Mahadev, Rahul Shivu Potharaju, Rahul

Abstract

The disclosed configurations provide a method (and/or a computer-readable medium or system) for determining, from a table schema describing keys of a data table, one or more clustering keys that can be used to cluster data files of a data table. The method includes generating features for the data table, generating tokens from the features, generating a prediction for each token by applying to the token a machine-learned transformer model trained to predict a likelihood that the key associated with the token is a clustering key for the data table, determining clustering keys based on the predictions, and clustering data records of the data table into data files based on key-values for the clustering keys.

IPC Classes

G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06F 16/21 - Design, administration or maintenance of databases
G06F 16/22 - Indexing; Data structures therefor; Storage structures

16. MESSAGING DEDUPLICATION IN PUBLISH / SUBSCRIBE SYSTEM

Application Number	18224981
Status	Pending
Filing Date	2023-07-21
First Publication Date	2025-01-23
Owner	Databricks, Inc. (USA)
Inventor	Anand, Pranav Gattu, Praveen Shrigondekar, Anish Wang, Huanli

Abstract

A system for using message identifiers for publish/subscribe messaging deduplication is described. The system may fetch one or more sets of data records from a data source, and each data record is associated with a message identifier. The system may store the one or more sets of data records in a data file, which is associated with metadata comprising the message identifier, a file path and a row number for each data record. The system may determine whether one or more of the data records are duplicated based on the associated message identifiers. In response to determining that the one or more data records are duplicated, the system may generate second metadata comprising the file paths and row numbers associated with the duplicated data records.
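The duplicate-detection step can be sketched as follows in Python, assuming the per-record metadata is available as `(message_id, file_path, row_number)` tuples. The function name `find_duplicates` and the "first occurrence wins" rule are illustrative assumptions, not details taken from the application.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, int]  # (message_id, file_path, row_number)


def find_duplicates(file_metadata: Iterable[Record]) -> List[Tuple[str, int]]:
    """Return (file_path, row_number) for every record whose message
    identifier was already seen earlier in the metadata."""
    seen = set()
    duplicates = []
    for message_id, file_path, row_number in file_metadata:
        if message_id in seen:
            duplicates.append((file_path, row_number))  # entry for the second metadata
        else:
            seen.add(message_id)
    return duplicates
```

The returned list plays the role of the second metadata: it records where the duplicated rows live without rewriting the data files themselves.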

IPC Classes

G06F 16/174 - Redundancy elimination performed by the file system
G06F 16/14 - Details of searching files based on file metadata
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems

17. MODEL ML REGISTRY AND MODEL SERVING

Application Number	18885322
Status	Pending
Filing Date	2024-09-13
First Publication Date	2025-01-16
Owner	Databricks, Inc. (USA)
Inventor	Davidson, Aaron Daniel Mewald, Clemens Nykodym, Tomas

Abstract

A system includes an interface, a processor, and a memory. The interface is configured to receive a version of a model from a model registry. The processor is configured to store the version of the model, start a process running the version of the model, and update a proxy with version information associated with the version of the model, wherein the updated proxy indicates to redirect an indication to invoke the version of the model to the process. The memory is coupled to the processor and configured to provide the processor with instructions.
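A minimal Python sketch of the store, start, and update-proxy sequence follows. Everything here is a stand-in: the model "process" is just a Python callable, and `ServingProxy` and `deploy` are hypothetical names rather than any real serving API.

```python
from typing import Any, Callable, Dict

Model = Callable[[Any], Any]


class ServingProxy:
    """Routes an invocation for a model version to the process serving it."""

    def __init__(self) -> None:
        self._routes: Dict[str, Model] = {}

    def update(self, version: str, process: Model) -> None:
        self._routes[version] = process

    def invoke(self, version: str, payload: Any) -> Any:
        return self._routes[version](payload)


def deploy(proxy: ServingProxy, store: Dict[str, Model], version: str, model: Model) -> None:
    """Store the received model version, "start" it, and update the proxy."""
    store[version] = model        # store the version of the model
    process = store[version]      # stand-in for starting a serving process
    proxy.update(version, process)
```

Keeping the routing table in the proxy means a new version becomes callable as soon as `update` runs, without touching versions that are already being served.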

IPC Classes

G06F 16/21 - Design, administration or maintenance of databases
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
G06N 5/022 - Knowledge engineering; Knowledge acquisition

18. Clean room generation for data collaboration

Application Number	18473992
Grant Number	12197400
Status	In Force
Filing Date	2023-09-25
First Publication Date	2025-01-14
Grant Date	2025-01-14
Owner	Databricks, Inc. (USA)
Inventor	Chau, William Chakankar, Abhijit Mahoney, Stephen Michael Morris, Daniel Seth Weiss, Itai Shlomo

Abstract

A data processing service receives a request from a first collaborator to create a clean room for data sharing collaboration with at least a second collaborator. In response, the data processing service creates an execution environment separate from the data environment of the first collaborator and the second collaborator. The first and second collaborators can then add content into the clean room in the form of data tables and executable notebooks. Approval from each collaborator is required before a notebook can be executed using any data table shared into the clean room. Upon receiving notebook approval from each collaborator, the data processing service creates a notebook job to execute the notebook on one or more cluster computing resources of the data processing service to generate an output.
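The approval rule described here (every collaborator must approve a notebook before it runs) can be modeled with a short Python sketch. The class `CleanRoom` and its methods are hypothetical and leave out the separate execution environment and job creation entirely.

```python
from typing import Dict, Iterable, Set


class CleanRoom:
    """Tracks which collaborators have approved each shared notebook."""

    def __init__(self, collaborators: Iterable[str]) -> None:
        self.collaborators = frozenset(collaborators)
        self.approvals: Dict[str, Set[str]] = {}

    def add_notebook(self, name: str) -> None:
        self.approvals[name] = set()

    def approve(self, name: str, collaborator: str) -> None:
        if collaborator not in self.collaborators:
            raise PermissionError(f"{collaborator} is not part of this clean room")
        self.approvals[name].add(collaborator)

    def can_run(self, name: str) -> bool:
        """A notebook may run only once every collaborator has approved it."""
        return self.approvals.get(name) == self.collaborators
```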

IPC Classes

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/21 - Design, administration or maintenance of databases

19. Efficient Merging of Tabular Data with Post-Processing Compaction

Application Number	18769269
Status	Pending
Filing Date	2024-07-10
First Publication Date	2025-01-09
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Das, Tathagata Kroll, Lars Cui, Yijia Sompolski, Juliusz Van Bussel, Tom Jain, Prakhar

Abstract

A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s).

IPC Classes

G06F 16/2453 - Query optimisation
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

20. DATA FILE CLUSTERING WITH KD-CLASSIFIER TREES

Application Number	18218410
Status	Pending
Filing Date	2023-07-05
First Publication Date	2025-01-09
Owner	Databricks, Inc. (USA)
Inventor	Jain, Prakhar Johnson, Frederick Ryan Kim, Terry Prabhakaran, Vijayan Samwel, Bart

Abstract

A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a KD-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and parent nodes of the node. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.
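To make the splitting conditions concrete, the Python sketch below routes a record down a small tree whose internal nodes test one key against a threshold. It is a simplification: files are attached to leaves only, each condition is a single less-than test, and the names `Node` and `route` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    key: Optional[str] = None            # None marks a leaf
    threshold: Optional[float] = None
    left: Optional["Node"] = None        # records with record[key] < threshold
    right: Optional["Node"] = None       # records with record[key] >= threshold
    files: List[str] = field(default_factory=list)


def route(root: Node, record: Dict[str, float]) -> Node:
    """Follow the splitting conditions from the root to the leaf whose
    data files should hold this record."""
    node = root
    while node.key is not None:
        node = node.left if record[node.key] < node.threshold else node.right
    return node
```

Because a record reaches exactly one leaf, an update only needs to rewrite the files assigned along that path rather than the whole table.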

IPC Classes

G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/13 - File access structures, e.g. distributed indices

21. DATA FILE CLUSTERING WITH KD-EPSILON TREES

Application Number	18218766
Status	Pending
Filing Date	2023-07-06
First Publication Date	2025-01-09
Owner	Databricks, Inc. (USA)
Inventor	Jain, Prakhar Johnson, Frederick Ryan Samwel, Bart

Abstract

A data tree for managing data files of a data table and performing one or more transaction operations on the data table is described. The data tree is configured as a KD-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and conditions associated with parent nodes of the node. A parent node may correspond to a file including a buffer that stores changes to data files reachable by this parent node, and also includes dedicated storage for pointers to the child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.

IPC Classes

G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/2453 - Query optimisation
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

22. Data Retrieval Using Distributed Workers in a Large-Scale Data Access System

Application Number	18771892
Status	Pending
Filing Date	2024-07-12
First Publication Date	2025-01-02
Owner	Databricks, Inc. (USA)
Inventor	Khurana, Amandeep Li, Nong

Abstract

Disclosed herein are enhancements for operating a data access application service executing on a data access server system and an external computing system. In the data access server system, a request is received from a client device executing at least one of multiple application services for a dataset from one or more of multiple storage systems. In the data access server system, a data retrieval instruction is generated for the client device to access the dataset from the one or more of the multiple storage systems. The data retrieval instruction comprises task descriptions and a temporary credential. The data retrieval instruction is transferred to the external computing system via the client device and the requested dataset is retrieved and deployed based on the task descriptions and the temporary credential from the one or more of the multiple storage systems.

IPC Classes

G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 9/54 - Interprogram communication
G06F 16/2455 - Query execution
G06F 16/25 - Integrating or interfacing systems involving database management systems
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

23. Data sharing for network connected systems

Application Number	18162353
Grant Number	12182292
Status	In Force
Filing Date	2023-01-31
First Publication Date	2024-12-31
Grant Date	2024-12-31
Owner	Databricks, Inc. (USA)
Inventor	Zaharia, Matei Zhu, Shixiong Sun, Xiaotong Chandra, Ramesh Armbrust, Michael Paul Ghodsi, Ali

Abstract

IPC Classes

G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
G06F 21/60 - Protecting data

24. FEATURE FUNCTION BASED COMPUTATION OF ON-DEMAND FEATURES OF MACHINE LEARNING MODELS

Application Number	18206460
Status	Pending
Filing Date	2023-06-06
First Publication Date	2024-12-12
Owner	Databricks, Inc. (USA)
Inventor	Zaharia, Matei Singh, Avesh Parkhe, Mani Lukiyanov, Maxim Meng, Xiangrui Talati, Aakrati Liang, Chenen Uhlenhuth, Kasey

Abstract

A system performs training and execution of machine learning models that use on-demand features using feature functions. The system receives commands for registering metadata associated with a machine learning model. The machine learning model may process a set of features including on-demand features as well as other features such as batch features. The system executes the command by storing an association between the machine learning model and the feature functions associated with any on-demand features processed by the machine learning model. The feature functions are executed using an end point of a data asset service. The use of the data asset service for invoking the feature functions ensures that the same set of instructions is executed during model training and model inferencing, thereby avoiding model skew.

IPC Classes ?

G06N 20/00 - Machine learning

25. Fetching Query Results Through Cloud Object Stores

Application Number	18614380
Status	Pending
Filing Date	2024-03-22
First Publication Date	2024-11-28
Owner	Databricks, Inc. (USA)
Inventor	Ghit, Bogdan Ionut Sompolski, Juliusz Xin, Shi Samwel, Bart

Abstract

The system is configured to: 1) receive a client request; 2) determine executor(s) to generate a response to the client request; 3) provide each of the executor(s) with an indication; 4) receive for each indication a response including an output of either a cloud output or an in-line output to generate a group of in-line outputs and a group of cloud outputs; 5) determine whether the group of in-line outputs comprises all outputs; and 6) in response to the group of in-line outputs not comprising all the outputs for the client request: a) convert the group of in-line outputs to a converted group of cloud outputs; b) generate metadata for the converted group of cloud outputs and the group of cloud outputs; and c) provide a response to the client request including the metadata for the converted group of cloud outputs and the group of cloud outputs.
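The branching in steps 5 and 6 can be sketched as follows. This is a minimal illustration, not the claimed implementation: `build_response`, `upload_to_memory`, the `mode` field, and the `mem://` links are all hypothetical names, and an in-memory dict stands in for a cloud object store.

```python
def build_response(outputs, upload):
    """Combine executor outputs into one client response.

    outputs: list of (kind, payload) pairs, kind is "inline" or "cloud".
    upload:  hypothetical callable that stores an inline payload in an
             object store and returns a link to it.
    """
    # If every executor answered inline, return the data directly.
    if all(kind == "inline" for kind, _ in outputs):
        return {"mode": "inline", "results": [payload for _, payload in outputs]}
    # Otherwise convert the inline outputs to cloud outputs and return links only.
    links = [payload if kind == "cloud" else upload(payload) for kind, payload in outputs]
    return {"mode": "cloud", "links": links}


# In-memory stand-in for a cloud object store (assumption for illustration).
store = {}

def upload_to_memory(payload):
    link = f"mem://{len(store)}"
    store[link] = payload
    return link
```

With a mixed group of outputs, the inline payloads are uploaded first, so the client receives a uniform list of links rather than a mix of data and links.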

IPC Classes ?

G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/242 - Query formulation
G06F 16/25 - Integrating or interfacing systems involving database management systems

26. Hash based rollup with passthrough

Application Number	18162093
Grant Number	12153558
Status	In Force
Filing Date	2023-01-31
First Publication Date	2024-11-26
Grant Date	2024-11-26
Owner	Databricks, Inc. (USA)
Inventor	Behm, Alexander Dave, Ankur

Abstract

A system includes a plurality of computing units. A first computing unit of the plurality of computing units comprises: a communication interface configured to receive an indication to roll up data in a data table; and a processor coupled to the communication interface and configured to: build a preaggregation hash table based at least in part on a set of columns and the data table by aggregating input rows of the data table; for each preaggregated hash table entry of the preaggregated hash table: provide the preaggregated hash table entry to a second computing unit of the plurality of computing units based at least in part on a distribution hash value; receive a set of received entries from computing units of the plurality of computing units; and build an aggregation hash table based at least in part on the set of received entries by aggregating the set of received entries.
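The two-phase aggregation described above can be sketched in a single process. This is a simplified illustration under stated assumptions: each list in `partitions` stands in for one computing unit's share of the table, the aggregate is a sum, and the function names are hypothetical.

```python
from collections import defaultdict


def preaggregate(rows, key_cols, value_col):
    """Build a local pre-aggregation hash table: group key -> partial sum."""
    table = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        table[key] += row[value_col]
    return dict(table)


def route(table, num_units):
    """Pick a destination unit for each entry from a hash of its key."""
    outboxes = [[] for _ in range(num_units)]
    for key, partial in table.items():
        outboxes[hash(key) % num_units].append((key, partial))
    return outboxes


def aggregate(received):
    """Build the final aggregation hash table from received partial entries."""
    final = defaultdict(int)
    for key, partial in received:
        final[key] += partial
    return dict(final)


def rollup(partitions, key_cols, value_col):
    """Simulate all units: pre-aggregate locally, exchange by hash, aggregate."""
    num_units = len(partitions)
    inboxes = [[] for _ in range(num_units)]
    for rows in partitions:
        local = preaggregate(rows, key_cols, value_col)
        for dest, entries in enumerate(route(local, num_units)):
            inboxes[dest].extend(entries)
    result = {}
    for inbox in inboxes:
        result.update(aggregate(inbox))  # each key lands in exactly one inbox
    return result
```

Because every entry with the same key hashes to the same destination, each unit can finish its keys without further coordination.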

IPC Classes ?

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/13 - File access structures, e.g. distributed indices
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/242 - Query formulation
G06F 16/2455 - Query execution
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

27. Data sharing for network connected systems

Application Number	17733485
Grant Number	12147555
Status	In Force
Filing Date	2022-04-29
First Publication Date	2024-11-19
Grant Date	2024-11-19
Owner	Databricks, Inc. (USA)
Inventor	Zaharia, Matei Zhu, Shixiong Sun, Xiaotong Chandra, Ramesh Armbrust, Michael Paul Ghodsi, Ali

Abstract

IPC Classes ?

G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
G06F 21/60 - Protecting data

28. Auto maintenance for data tables in cloud storage

Application Number	18144647
Grant Number	12204510
Status	In Force
Filing Date	2023-05-08
First Publication Date	2024-11-14
Grant Date	2025-01-21
Owner	Databricks, Inc. (USA)
Inventor	Prabhakaran, Vijayan Raja, Himanshu Potharaju, Rahul Bhanoori, Naga Raju Ma, Lin Parangi Sharabhalingappa, Rajesh Liang, Jintian Schuermann, Zachary Vaughn Ting, Kam Cheung

Abstract

IPC Classes ?

G06F 16/21 - Design, administration or maintenance of databases
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/22 - Indexing; Data structures therefor; Storage structures

29. Short query prioritization for data processing service

Application Number	18140323
Grant Number	12210521
Status	In Force
Filing Date	2023-04-27
First Publication Date	2024-10-31
Grant Date	2025-01-28
Owner	Databricks, Inc. (USA)
Inventor	Gudesa, Venkata Sai Akhil Van Hövell Tot Westerflier, Herman Rudolf Petrus Catharina Nakandala, Supun Chathuranga

Abstract

A cluster computing system maintains a first set of queues for short queries and a second set of queues for longer queries. The first set is allocated a majority of the cluster's processing resources and processes queries on a first in first out basis. The second set is allocated a minority of the cluster's processing resources which are shared among queries in the second set. Accordingly, the system assigns each query to the first set of queues for a fixed amount of resource time. While a query is processing, the system monitors the query's resource time and reassigns the query to the second set of queues if the query has not completed within the allotted amount of resource time. Thus, short queries receive the necessary resources to complete quickly without getting stuck behind longer queries while ensuring that longer queries continue to make progress.
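The demotion rule can be sketched with a small simulation. This is a deliberately simplified, sequential model: real resource shares and concurrency are not represented, and `run`, `budget`, and `slow_slice` are hypothetical names.

```python
from collections import deque


def run(queries, budget, slow_slice=1):
    """Return query names in completion order.

    queries:    list of (name, required_resource_time)
    budget:     resource time each query may spend in the first (fast) set
    slow_slice: time given per turn in the shared second (slow) set
    """
    fast = deque(queries)  # first set: first in, first out
    slow = deque()         # second set: resources shared round-robin
    finished = []
    while fast:
        name, need = fast.popleft()
        if need <= budget:
            finished.append(name)              # completed within the allotment
        else:
            slow.append((name, need - budget))  # demoted with remaining work
    while slow:
        name, need = slow.popleft()
        need -= slow_slice
        if need <= 0:
            finished.append(name)
        else:
            slow.append((name, need))           # keep making progress
    return finished
```

In this model a short query submitted behind a long one still finishes first, because the long query is moved aside once it exhausts its budget.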

IPC Classes ?

G06F 16/24 - Querying
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/2453 - Query optimisation
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

30. Retrieval and caching of object metadata across data sources and storage systems

Application Number	18135078
Grant Number	12204523
Status	In Force
Filing Date	2023-04-14
First Publication Date	2024-10-17
Grant Date	2025-01-21
Owner	Databricks, Inc. (USA)
Inventor	Li, Zhaoxing Singh, Rayman Preet Efeoglu, Fuat Can Tenedorio, Daniel Cai, Sarah

Abstract

A system for retrieving and caching metadata from a remote data source is described. The system may receive a request from a client device. The request is to perform a query operation on a set of data objects stored in the remote data source. The system may access a metadata cache storing metadata information on one or more data objects of the remote data source and identify metadata corresponding to the set of data objects for the query operation in the metadata cache. The system may determine whether the identified metadata for the set of data objects meets an update condition. In response to the identified metadata meeting the update condition, the system may fetch updated metadata for at least the set of data objects from the remote data source, and store the updated metadata in the metadata cache.
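A minimal sketch of the lookup-then-refresh flow follows. The abstract does not say what the update condition is, so this example assumes a maximum-age check; `MetadataCache`, `fetch_remote`, and `max_age_seconds` are hypothetical names.

```python
import time


class MetadataCache:
    """Cache object metadata and refresh entries that meet an update condition."""

    def __init__(self, fetch_remote, max_age_seconds, clock=time.monotonic):
        self._fetch_remote = fetch_remote  # callable: list of names -> {name: metadata}
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}                 # name -> (metadata, fetched_at)

    def _needs_update(self, name):
        # Assumed update condition: the entry is missing or older than max_age.
        entry = self._entries.get(name)
        return entry is None or self._clock() - entry[1] > self._max_age

    def get(self, names):
        """Return metadata for the requested objects, refreshing stale entries."""
        stale = [n for n in names if self._needs_update(n)]
        if stale:
            now = self._clock()
            for name, meta in self._fetch_remote(stale).items():
                self._entries[name] = (meta, now)
        return {n: self._entries[n][0] for n in names}
```

Only the stale subset is fetched, so a query touching mostly fresh objects makes one small remote call instead of a full listing.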

IPC Classes ?

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/23 - Updating
G06F 16/2455 - Query execution

31. Multiple pass sort

Application Number	17875176
Grant Number	12105690
Status	In Force
Filing Date	2022-07-27
First Publication Date	2024-10-01
Grant Date	2024-10-01
Owner	Databricks, Inc. (USA)
Inventor	Armstrong, Timothy Krishnan, Arvind Sai Guliyev, Khayyam

Abstract

A system for multipass sort includes a communication interface and a processor. The communication interface is configured to receive from a client device a request to sort a dataset that includes a plurality of rows. The processor is configured to perform a first sort pass on the dataset in part by: extracting prefixes associated with a first schema element associated with the dataset for the plurality of rows; and sorting the extracted prefixes utilizing an integer sort algorithm based on a sort order included in the request to sort the dataset, where sorting the extracted prefixes includes utilizing NULL values to resolve a tied range that includes at least two rows of the plurality of rows having a same extracted prefix.

IPC Classes ?

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/2455 - Query execution

32. Scaling delta table optimize command

Application Number	18093916
Grant Number	12079167
Status	In Force
Filing Date	2023-01-06
First Publication Date	2024-09-03
Grant Date	2024-09-03
Owner	Databricks, Inc. (USA)
Inventor	Mahadev, Rahul Shivu Yavuz, Burak Das, Tathagata

Abstract

The interface is to receive an indication to execute an optimize command. The processor is to receive a file name; determine whether adding a file of the file name to a current bin causes the current bin to exceed a bin threshold; associate the file with the current bin in response to determining that adding the file does not cause the current bin to exceed the bin threshold; in response to determining that adding the file to the current bin causes the current bin to exceed the bin threshold: associate the file with a next bin, indicate that the current bin is closed, and add the current bin to a batch of bins; determine whether a measure of the batch of bins exceeds a batch threshold; and in response to determining that the measure exceeds the batch threshold, provide the batch of bins for processing.
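The bin and batch logic can be sketched as a streaming generator. This is an illustration under assumptions: the abstract does not define the batch "measure", so total bytes in closed bins is used here, and `pack_files` and its parameters are hypothetical names.

```python
def pack_files(files, bin_threshold, batch_threshold):
    """Group files into bins and emit batches of closed bins.

    files: iterable of (name, size) pairs, consumed one at a time.
    Yields lists of bins; each bin is a list of file names.
    """
    current_bin, current_size = [], 0
    batch, batch_size = [], 0
    for name, size in files:
        # Would adding this file push the current bin over the bin threshold?
        if current_bin and current_size + size > bin_threshold:
            batch.append(current_bin)            # close the current bin
            batch_size += current_size
            current_bin, current_size = [], 0    # the file goes to the next bin
            if batch_size > batch_threshold:     # assumed measure: total bytes
                yield batch
                batch, batch_size = [], 0
        current_bin.append(name)
        current_size += size
    # Flush whatever is left once the file listing ends.
    if current_bin:
        batch.append(current_bin)
    if batch:
        yield batch
```

Because batches are yielded as soon as they cross the threshold, processing can start before the full file listing has been read.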

IPC Classes ?

G06F 16/172 - Caching, prefetching or hoarding of files
G06F 16/22 - Indexing; Data structures therefor; Storage structures

33. Data ingestion using data file clustering with KD-epsilon trees

Application Number	18218400
Grant Number	12072863
Status	In Force
Filing Date	2023-07-05
First Publication Date	2024-08-27
Grant Date	2024-08-27
Owner	Databricks, Inc. (USA)
Inventor	Jain, Prakhar Johnson, Frederick Ryan Samwel, Bart

Abstract

IPC Classes ?

G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/23 - Updating
G06F 16/245 - Query processing
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

34. Data maintenance transaction rollbacks

Application Number	17580475
Grant Number	12072843
Status	In Force
Filing Date	2022-01-20
First Publication Date	2024-08-27
Grant Date	2024-08-27
Owner	Databricks, Inc. (USA)
Inventor	Jain, Prakhar Samwel, Bart Yavuz, Burak

Abstract

The present application discloses a method, system, and computer system for managing data in a storage system. The method includes receiving a first transaction that modifies or deletes first data stored in a storage system, determining that the first data is subject to an intervening re-arrangement transaction, and in response to determining that the first data is subject to the intervening re-arrangement transaction, rolling back the re-arrangement transaction at least with respect to the first data and committing the first transaction.

IPC Classes ?

G06F 16/174 - Redundancy elimination performed by the file system

35. MULTI-CLUSTER QUERY RESULT CACHING

Application Number	18221735
Status	Pending
Filing Date	2023-07-13
First Publication Date	2024-08-08
Owner	Databricks, Inc. (USA)
Inventor	Garg, Saksham Ghit, Bogdan Ionut Stevens, Christopher Stuart, Christian

Abstract

A multi-cluster computing system which includes a query result caching system is presented. The multi-cluster computing system may include a data processing service and client devices communicatively coupled over a network. The data processing service may include a control layer and a data layer. The control layer may be configured to receive and process requests from the client devices and manage resources in the data layer. The data layer may be configured to include instances of clusters of computing resources for executing jobs. The data layer may include a data storage system, which further includes a remote query result cache store. The query result cache store may include a cloud storage query result cache which stores data associated with results of previously executed requests. As such, when a cluster encounters a previously executed request, the cluster may efficiently retrieve the cached result of the request from the in-memory query result cache or the cloud storage query result cache.
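The two-level lookup can be sketched as follows. This is a hedged illustration: a plain dict stands in for the cloud storage cache, the cache key is assumed to be a hash of the query text, and invalidation when underlying data changes is not modeled. `TieredResultCache` is a hypothetical name.

```python
import hashlib


class TieredResultCache:
    """Per-cluster in-memory cache backed by a cache shared across clusters."""

    def __init__(self, remote_store):
        self.local = {}             # in-memory query result cache of this cluster
        self.remote = remote_store  # dict-like stand-in for the cloud storage cache

    @staticmethod
    def key(query_text):
        # Assumed cache key: a hash of the raw query text.
        return hashlib.sha256(query_text.encode("utf-8")).hexdigest()

    def get_or_compute(self, query_text, execute):
        """Return (result, source), where source names the tier that answered."""
        k = self.key(query_text)
        if k in self.local:
            return self.local[k], "memory"
        if k in self.remote:
            self.local[k] = self.remote[k]  # promote to the local tier
            return self.local[k], "cloud"
        result = execute(query_text)
        self.local[k] = result
        self.remote[k] = result
        return result, "executed"
```

A second cluster sharing the same remote store can reuse a result that only the first cluster computed.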

IPC Classes ?

G06F 16/2453 - Query optimisation
G06F 16/25 - Integrating or interfacing systems involving database management systems
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

36. Multi-cluster query result caching

Application Number	18222343
Grant Number	12189625
Status	In Force
Filing Date	2023-07-14
First Publication Date	2024-08-08
Grant Date	2025-01-07
Owner	Databricks, Inc. (USA)
Inventor	Ghit, Bogdan Ionut Garg, Saksham Stuart, Christian Stevens, Christopher

Abstract

IPC Classes ?

G06F 16/24 - Querying
G06F 16/2453 - Query optimisation
G06F 16/25 - Integrating or interfacing systems involving database management systems
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

37. RUNTIME ERROR ATTRIBUTION FOR DATABASE QUERIES SPECIFIED USING A DECLARATIVE DATABASE QUERY LANGUAGE

Application Number	CN2023073691
Publication Number	2024/156113
Status	In Force
Filing Date	2023-01-29
Publication Date	2024-08-02
Owner	DATABRICKS, INC. (USA)
Inventor	Fan, Wenchen Rielau, Serge Shen, Entong

Abstract

A system executes database queries specified using a declarative database query language such as the structured query language (SQL). The system determines whether a runtime error is encountered during execution of a query, for example, a division by zero error, resource usage errors such as out of memory error, time out error, and so on. The system reports such runtime errors encountered during execution of a database query. The system identifies one or more origins of the runtime error in the database query. The origin identifies a portion of the database query that represents a cause of the runtime error. Reporting the origin of a runtime error in the database query simplifies the task of development and testing of database queries.

IPC Classes ?

G06F 16/21 - Design, administration or maintenance of databases
G06F 16/24 - Querying

38. STATIC APPROACH TO LAZY MATERIALIZATION IN DATABASE SCANS USING PUSHED FILTERS

Application Number	18160850
Status	Pending
Filing Date	2023-01-27
First Publication Date	2024-08-01
Owner	Databricks, Inc. (USA)
Inventor	Palkar, Shoumik Behm, Alexander Mokhtar, Mostafa Krishnamurthy, Sriram

Abstract

Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. The method includes receiving a request to perform a new query in a columnar database containing a plurality of columns. A step in the method includes accessing a set of data in a column of the plurality of columns based on the query. The method includes generating an input to a machine-learned model comprising characteristics of the set of data in the column. From the machine-learned model, the method includes generating a likelihood value indicative of whether a filter of a first portion of the set of data in the column has greater efficiency than a download followed by a filter of the set of data in the column. The method further includes comparing the likelihood value to a threshold value. Based on the comparison, the method includes filtering the first portion of the set of data before downloading the set of data if the likelihood value is equal to or above the threshold value.

IPC Classes ?

G06F 16/2453 - Query optimisation
G06F 16/22 - Indexing; Data structures therefor; Storage structures

39. Adaptive approach to lazy materialization in database scans using pushed filters

Application Number	18160861
Grant Number	12124450
Status	In Force
Filing Date	2023-01-27
First Publication Date	2024-08-01
Grant Date	2024-10-22
Owner	Databricks, Inc. (USA)
Inventor	Palkar, Shoumik Behm, Alexander Mokhtar, Mostafa Krishnamurthy, Sriram

Abstract

Disclosed herein is a method for determining whether to apply a lazy materialization technique to a query run. A data processing service receives a request to perform a query identifying a filter column and a non-filter column in a columnar database. The data processing service accesses a first task of contiguous rows in the filter column from a cloud-based object storage. The data processing service applies a filter defined by the query to the first task. The data processing service generates filter results for the first task that may include a percentage of the first task discarded and a run-time. The data processing service determines, based on the filter results for the first task, a likelihood value that indicates a likelihood of gaining a performance benefit by applying the lazy materialization technique to a second task of the query.

IPC Classes ?

G06F 16/2453 - Query optimisation
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/22 - Indexing; Data structures therefor; Storage structures

40. Evaluating expressions over dictionary data

Application Number	18162607
Grant Number	12210528
Status	In Force
Filing Date	2023-01-31
First Publication Date	2024-08-01
Grant Date	2025-01-28
Owner	Databricks, Inc. (USA)
Inventor	Agarwal, Utkarsh Palkar, Shoumik Behm, Alexander Krishnamurthy, Sriram

Abstract

Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator for a columnar dataset on cloud storage. At least one column in the dataset is based on a dictionary, and the dictionary maps one or more values for a column to one or more respective identifiers. The method evaluates the operator on one or more values of the dictionary to generate an updated dictionary comprising updated values. The method may decode the updated dictionary into an updated column comprising updated data values.
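The idea of evaluating an operator once per dictionary entry rather than once per row can be sketched as below. The column layout (a dict from identifier to value plus a list of identifiers) and the function names are assumptions for illustration.

```python
def apply_operator(dictionary, operator):
    """Evaluate the operator on each dictionary value, producing an updated dictionary."""
    return {code: operator(value) for code, value in dictionary.items()}


def decode(dictionary, codes):
    """Expand a dictionary-encoded column into its data values."""
    return [dictionary[code] for code in codes]


# A column of four rows encoded with a two-entry dictionary.
dictionary = {0: "apple", 1: "kiwi"}
codes = [0, 1, 0, 0]

updated = apply_operator(dictionary, str.upper)  # two evaluations, not four
column = decode(updated, codes)
```

The saving grows with the ratio of rows to distinct values, since the encoded identifiers are reused unchanged.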

IPC Classes ?

G06F 16/2455 - Query execution
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/22 - Indexing; Data structures therefor; Storage structures

41. Dictionary filtering and evaluation in columnar databases

Application Number	18162616
Grant Number	12242485
Status	In Force
Filing Date	2023-01-31
First Publication Date	2024-08-01
Grant Date	2025-03-04
Owner	Databricks, Inc. (USA)
Inventor	Agarwal, Utkarsh Palkar, Shoumik Behm, Alexander Krishnamurthy, Sriram

Abstract

Disclosed herein is a method, system, or non-transitory computer readable medium for evaluating a query on a columnar dataset comprising one or more dictionaries associated with columns in the dataset. The method includes receiving a request to perform a query comprising at least an operator and a request to return information about a value of interest in a columnar dataset stored on cloud storage. At least one column in the columnar dataset is based on a dictionary. The dictionary maps one or more values for a column to one or more respective identifiers. The method determines whether to perform dictionary filtering for the query by calculating a metric based on one or more factors. Responsive to the metric being below a threshold, which may be predetermined, the method performs the dictionary filtering.
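The metric-gated decision can be sketched as follows. The abstract does not name the factors behind the metric, so this example assumes a simple one: distinct dictionary values per row. `filter_rows` and `threshold` are hypothetical names.

```python
def filter_rows(dictionary, codes, predicate, threshold=0.5):
    """Return indices of rows whose decoded value satisfies the predicate."""
    # Assumed metric: distinct values per row. A low ratio means the predicate
    # can be evaluated far fewer times on the dictionary than on the rows.
    metric = len(dictionary) / len(codes) if codes else float("inf")
    if metric < threshold:
        # Dictionary filtering: test each distinct value once, then match codes.
        matching = {code for code, value in dictionary.items() if predicate(value)}
        return [i for i, code in enumerate(codes) if code in matching]
    # Fallback: decode and test every row.
    return [i for i, code in enumerate(codes) if predicate(dictionary[code])]
```

Both branches return the same rows; the metric only decides which path is expected to be cheaper.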

IPC Classes ?

G06F 16/24 - Querying
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/2455 - Query execution

42. EXECUTION AND ATTESTATION OF USER DEFINED FUNCTIONS IN DATABASES

Application Number	18161475
Status	Pending
Filing Date	2023-01-30
First Publication Date	2024-08-01
Owner	Databricks, Inc. (USA)
Inventor	Grund, Martin Van Hövell Tot Westerflier, Herman Rudolf Petrus Catharina Leone, Stefania

Abstract

A system executes user defined functions (UDFs) invoked by database queries. The UDF includes UDF code specified using a programming language distinct from a database query language. A hash value from the UDF code provided by a client application for creating the UDF is compared with a hash value generated from UDF code invoked by database queries to determine whether the two UDF codes match. If the two hash values fail to match, the system takes an action, for example, storing an indication of UDF code mismatch or disabling subsequent executions of the database queries invoking the UDF. The system may use encoded UDF code that is decoded by the system at runtime using a key obtained from a separate system such as the client application. The client application can disable execution of database queries executing the UDF code by refusing to provide the key.
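The hash comparison can be sketched as below. This is an illustration only: SHA-256 is an assumed choice of hash, `UdfRegistry` is a hypothetical class, and the key-based decoding described in the abstract is not modeled.

```python
import hashlib


def code_hash(udf_source):
    """Hash UDF source text (SHA-256 is an assumption, not taken from the abstract)."""
    return hashlib.sha256(udf_source.encode("utf-8")).hexdigest()


class UdfRegistry:
    """Record the hash of UDF code at creation and check it at invocation."""

    def __init__(self):
        self._registered = {}  # UDF name -> hash of the code supplied at creation
        self.disabled = set()  # UDFs blocked after a mismatch

    def register(self, name, source):
        self._registered[name] = code_hash(source)

    def attest(self, name, source_at_invocation):
        """Return True if the invoked code matches what was registered."""
        if name in self.disabled:
            return False
        if code_hash(source_at_invocation) != self._registered[name]:
            self.disabled.add(name)  # one possible action: disable later executions
            return False
        return True
```

Once a mismatch is seen, this sketch keeps the UDF disabled even if matching code is presented later, which is one of the actions the abstract lists as an example.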

IPC Classes ?

G06F 16/242 - Query formulation
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 21/60 - Protecting data

43. NUMA AWARENESS ARCHITECTURE FOR VM-BASED CONTAINER IN KUBERNETES ENVIRONMENT

Application Number	18162659
Status	Pending
Filing Date	2023-01-31
First Publication Date	2024-08-01
Owner	Databricks, Inc. (USA)
Inventor	Chen, Shuo Qiao, Yuming Liu, Anders

Abstract

Disclosed herein is a method for resource management in a web-based container orchestrating environment. A disclosed method includes initializing a set of micro-virtual machines (VMs) within a macro-VM environment. The method each container within a micro-VM based sandbox. The method assigns a virtual central processing unit (vCPU) to a micro-VM based on an estimated memory required by the micro-VM and the estimated available memory associated with the vCPU. The method pins the vCPU with a physical CPU based on the pod location of the physical CPU and an estimated available memory associated with the vCPU and an available local memory of the physical CPU. The method maintains a state of the vCPU and the physical CPU in a resource manager.

IPC Classes ?

G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

44. RUNTIME ERROR ATTRIBUTION FOR DATABASE QUERIES SPECIFIED USING A DECLARATIVE DATABASE QUERY LANGUAGE

Application Number	18296876
Status	Pending
Filing Date	2023-04-06
First Publication Date	2024-08-01
Owner	Databricks, Inc. (USA)
Inventor	Wang, Gengliang Fan, Wenchen Rielau, Serge Shen, Entong

Abstract

IPC Classes ?

G06F 11/36 - Prevention of errors by analysis, debugging or testing of software
G06F 16/25 - Integrating or interfacing systems involving database management systems
G06F 16/901 - Indexing; Data structures therefor; Storage structures

45. Concurrent optimistic transactions for tables with deletion vectors

Application Number	18156109
Grant Number	12147412
Status	In Force
Filing Date	2023-01-18
First Publication Date	2024-07-18
Grant Date	2024-11-19
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Stavrakakis, Christos

Abstract

IPC Classes ?

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/23 - Updating

46. State rebalancing in structured streaming

Application Number	18219314
Grant Number	12099525
Status	In Force
Filing Date	2023-07-07
First Publication Date	2024-06-20
Grant Date	2024-09-24
Owner	Databricks, Inc. (USA)
Inventor	Balikov, Alexander Das, Tathagata Ramasamy, Karthikeyan

Abstract

IPC Classes ?

G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 16/2455 - Query execution

47. SYSTEMS AND METHODS FOR A VIRTUAL SANDBOX DATABASE

Application Number	18429163
Status	Pending
Filing Date	2024-01-31
First Publication Date	2024-05-23
Owner	DATABRICKS, INC. (USA)
Inventor	Khurana, Amandeep Li, Nong

Abstract

Various embodiments of the present technology generally relate to management of big data storage and data access control systems. In some embodiments, a data access system for use in multiple application service and multiple storage service environments comprises a sandbox database for users, wherein the sandbox database is a virtual database environment via which a user may access datasets according to one or more access policies. In some embodiments, the data access system receives a user request to access a dataset stored in a database into the sandbox environment, wherein the database is associated with the data access system. In response to the request, the data access system may retrieve the corresponding data from the database, determine any associated sandbox access policies, and generate an anonymized data table in the sandbox environment.

IPC Classes ?

G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 16/248 - Presentation of query results
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules

48. Model ML registry and model serving

Application Number	18512028
Grant Number	12117983
Status	In Force
Filing Date	2023-11-17
First Publication Date	2024-05-09
Grant Date	2024-10-15
Owner	Databricks, Inc. (USA)
Inventor	Davidson, Aaron Daniel Mewald, Clemens Nykodym, Tomas

Abstract

IPC Classes ?

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/21 - Design, administration or maintenance of databases
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
G06N 5/022 - Knowledge engineering; Knowledge acquisition

49. EFFICIENTLY VECTORIZED IMPLEMENTATION OF OPERATIONS IN A GLOBAL GRID INDEXING LIBRARY

Application Number	18501839
Status	Pending
Filing Date	2023-11-03
First Publication Date	2024-05-09
Owner	Databricks, Inc. (USA)
Inventor	Cheong Zhi Xi, Desmond Karavelas, Menelaos

Abstract

A data processing service generates code for iteratively applying a geospatial function to geospatial data. The generated code includes at least a first iterative loop and a second iterative loop. The data processing service compiles the generated code to generate compiled code that vectorizes at least the second iterative loop. The data processing service receives a request from a client device to perform one or more data processing operations including applying the geospatial function to a data table of geospatial cell indices. The data processing service compiles the request into one or more tasks including at least a vectorized operation based on the compiled code and executes the one or more tasks by at least invoking the vectorized operation on the set of worker nodes.

IPC Classes ?

G06F 8/41 - Compilation

50. Fetching query results through cloud object stores

Application Number	17841946
Grant Number	11960494
Status	In Force
Filing Date	2022-06-16
First Publication Date	2024-04-16
Grant Date	2024-04-16
Owner	Databricks, Inc. (USA)
Inventor	Ghit, Bogdan Ionut Sompolski, Juliusz Xin, Shi Samwel, Bart

Abstract

IPC Classes ?

G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/242 - Query formulation
G06F 16/25 - Integrating or interfacing systems involving database management systems

51. Function creation for database execution of deep learning model

Application Number	18162291
Grant Number	11948084
Status	In Force
Filing Date	2023-01-31
First Publication Date	2024-04-02
Grant Date	2024-04-02
Owner	Databricks, Inc. (USA)
Inventor	Hong, Sue Ann Xin, Shi Hunter, Timothee Ghodsi, Ali

Abstract

A function creation method is disclosed. The method comprises defining one or more database function inputs, defining cluster processing information, defining a deep learning model, and defining one or more database function outputs. A database function is created based at least in part on the one or more database function inputs, the cluster set-up information, the deep learning model, and the one or more database function outputs. In some embodiments, the database function enables a non-technical user to utilize deep learning models.

IPC Classes ?

G06N 3/08 - Learning methods
G06N 3/04 - Architecture, e.g. interconnection topology
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06N 5/02 - Knowledge representation; Symbolic representation
G06N 5/022 - Knowledge engineering; Knowledge acquisition
G06F 16/14 - Details of searching files based on file metadata
G06F 16/22 - Indexing; Data structures therefor; Storage structures

52. EFFICIENT MERGE OF TABULAR DATA USING A PROCESSING FILTER

Application Number	17895872
Status	Pending
Filing Date	2022-08-25
First Publication Date	2024-02-29
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Das, Tathagata Kroll, Lars Cui, Yijia Sompolski, Juliusz Van Bussel, Tom

Abstract

A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first, second, and third jobs, and obtaining a resulting table based at least in part on the second job resulting file(s) and third job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and obtaining the second job resulting file(s). Performing the third job includes determining unmatched rows for target table files and storing the unmatched rows in third job resulting file(s).

IPC Classes ?

G06F 7/14 - Merging, i.e. combining at least two sets of record carriers each arranged in the same ordered sequence to produce a single set having the same ordered sequence
G06F 16/14 - Details of searching files based on file metadata
G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems

53. Efficient merging of tabular data with post-processing compaction

Application Number	17895877
Grant Number	12056126
Status	In Force
Filing Date	2022-08-25
First Publication Date	2024-02-29
Grant Date	2024-08-06
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Das, Tathagata Kroll, Lars Cui, Yijia Sompolski, Juliusz Van Bussel, Tom Jain, Prakhar

Abstract

IPC Classes ?

G06F 17/30 - Information retrieval; Database structures therefor
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/2453 - Query optimisation
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

54. EFFICIENT MERGE OF TABULAR DATA USING MIXING

Application Number	17895882
Status	Pending
Filing Date	2022-08-25
First Publication Date	2024-02-29
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Das, Tathagata Kroll, Lars Cui, Yijia Sompolski, Juliusz Van Bussel, Tom

Abstract

A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, and obtaining other resulting files based at least in part on a second set of unmatched rows among the target table and the source table that results from the first set of unmatched rows having been processed in the second job, and obtaining a resulting table based on (i) second job resulting file(s), and (ii) other resulting files. Performing the first job includes determining a set of matching target table files and storing target table information indicating for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a first matching action based on matched rows and a second matching action based on a subset of unmatched rows.

IPC Classes

G06F 16/2455 - Query execution
G06F 16/22 - Indexing; Data structures therefor; Storage structures

55. Efficient merge of tabular data with deletion indications

Application Number	17895890
Grant Number	12045220
Status	In Force
Filing Date	2022-08-25
First Publication Date	2024-02-29
Grant Date	2024-07-23
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Das, Tathagata Kroll, Lars Cui, Yijia Sompolski, Juliusz Stavrakakis, Chirstos

Abstract

A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, persisting, in one or more deletion vector files, one or more deletion vectors for corresponding rows of the one or more target table files, and obtaining a resulting table based at least in part on the second job resulting file(s). Performing the first job includes determining a set of matching target table files and storing target table information indicating, for each of the set of matching target table files, a particular set of rows having matching rows. Performing the second job includes performing a matching action based on matched rows and one or more deletion vectors associated with previously removed rows of the matching target table files and obtaining the second job resulting file(s).
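
The idea of marking rows as deleted instead of rewriting whole files can be illustrated with a minimal Python sketch. It is not the patented method: the function name, the in-memory file layout, and the upsert semantics are all assumptions made for illustration, and the two jobs of the abstract are collapsed into a single pass.

```python
def merge_with_deletion_vectors(target_files, source, key):
    """Hypothetical upsert sketch.

    target_files: {file_name: [row_dict, ...]} (existing files, never rewritten)
    source:       [row_dict, ...] (incoming rows)
    Returns (deletion_vectors, new_rows).
    """
    source_by_key = {row[key]: row for row in source}
    deletion_vectors = {}  # file_name -> set of row indexes marked as deleted
    new_rows = []          # rows that would be written to a new file
    matched_keys = set()

    for name, rows in target_files.items():
        for index, row in enumerate(rows):
            if row[key] in source_by_key:
                # Matched row: mark the old copy deleted, emit the new version.
                deletion_vectors.setdefault(name, set()).add(index)
                new_rows.append(source_by_key[row[key]])
                matched_keys.add(row[key])

    # Unmatched source rows are plain inserts.
    new_rows.extend(r for k, r in source_by_key.items() if k not in matched_keys)
    return deletion_vectors, new_rows
```

A reader of the resulting table would skip any row index listed in a file's deletion vector and then read the new rows.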

IPC Classes

G06F 17/30 - Information retrieval; Database structures therefor
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 16/22 - Indexing; Data structures therefor; Storage structures

56. Scan parsing

Application Number	18162366
Grant Number	12189628
Status	In Force
Filing Date	2023-01-31
First Publication Date	2024-02-22
Grant Date	2025-01-07
Owner	Databricks, Inc. (USA)
Inventor	Menon, Prashanth Behm, Alexander Krishnamurthy, Sriram

Abstract

The present application discloses a method, system, and computer system for parsing files. The method includes receiving an indication that a first file is to be processed, determining to begin processing the first file using a first processing engine based at least in part on one or more predefined heuristics, indicating to process the first file using the first processing engine, determining whether a particular error in processing the first file using the first processing engine has been detected, in response to determining that the particular error has been detected, indicating to stop processing the first file using the first processing engine and indicating to continue processing using a second processing engine, and storing in memory information obtained based on processing the first file by one or more of the first processing engine and the second processing engine.
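
The engine fallback described in this abstract can be sketched in Python as follows. Both engines, the error type, and the `use_fast_first` heuristic flag are hypothetical stand-ins; for simplicity the sketch restarts the file in the second engine rather than continuing from the point of failure.

```python
class ParseError(Exception):
    """Raised by the hypothetical fast engine on input it cannot handle."""


def fast_parse(lines):
    # Hypothetical "first engine": handles plain integers only.
    rows = []
    for line in lines:
        if not line.strip().lstrip("-").isdigit():
            raise ParseError(f"unsupported value: {line!r}")
        rows.append(int(line))
    return rows


def robust_parse(lines):
    # Hypothetical "second engine": slower, but also accepts floats.
    return [float(line) for line in lines]


def parse_file(lines, use_fast_first=True):
    """Start with the fast engine if the heuristic allows; fall back on error."""
    if use_fast_first:
        try:
            return "fast", fast_parse(lines)
        except ParseError:
            pass  # stop using the first engine and hand over to the second
    return "robust", robust_parse(lines)
```

Calling `parse_file(["1", "2.5"])` exercises the fallback path, because the fast engine rejects `"2.5"`.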

IPC Classes

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/2453 - Query optimisation
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

57. Scan parsing

Application Number	17892376
Grant Number	12072880
Status	In Force
Filing Date	2022-08-22
First Publication Date	2024-02-22
Grant Date	2024-08-27
Owner	Databricks, Inc. (USA)
Inventor	Menon, Prashanth Behm, Alexander Krishnamurthy, Sriram

Abstract

IPC Classes

G06F 9/00 - Arrangements for program control, e.g. control units
G06F 16/2453 - Query optimisation
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

58. Update and query of a large collection of files that represent a single dataset stored on a blob store

Application Number	18236516
Grant Number	12189607
Status	In Force
Filing Date	2023-08-22
First Publication Date	2023-12-07
Grant Date	2025-01-07
Owner	Databricks, Inc. (USA)
Inventor	Armbrust, Michael Paul Zhu, Shixiong Yavuz, Burak

Abstract

A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
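
The commit protocol in this abstract is a form of optimistic concurrency control, which can be sketched in Python as follows. The `TransactionLog` class, its put-if-absent `try_write` method, and the `commit` helper are hypothetical names, and an in-memory dictionary stands in for the blob store.

```python
class TransactionLog:
    """Hypothetical in-memory log: position -> files updated by that commit."""

    def __init__(self):
        self.entries = {}

    def try_write(self, position, updated_files):
        # Put-if-absent: succeeds only if no commit already holds this position.
        if position in self.entries:
            return False
        self.entries[position] = set(updated_files)
        return True


def commit(log, current_position, read_set, updated_files):
    """Try position N+1; on a collision, retry at the next position
    as long as the winning commit did not touch anything we read."""
    position = current_position + 1
    while not log.try_write(position, updated_files):
        winner_files = log.entries[position]
        if winner_files & set(read_set):
            raise RuntimeError(f"conflict at position {position}")
        position += 1
    return position
```

If another writer has already taken position N+1 with files outside the read set, `commit` lands at N+2. If the files overlap, the transaction fails and would have to be retried from a fresh snapshot.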

IPC Classes

G06F 16/14 - Details of searching files based on file metadata
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/23 - Updating

59. K-D tree balanced splitting

Application Number	17738609
Grant Number	12061586
Status	In Force
Filing Date	2022-05-06
First Publication Date	2023-11-09
Grant Date	2024-08-13
Owner	Databricks, Inc. (USA)
Inventor	Samwel, Bart Jain, Prakhar

Abstract

IPC Classes

G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

60. QUERY WATCHDOG

Application Number	18200316
Status	Pending
Filing Date	2023-05-22
First Publication Date	2023-11-09
Owner	Databricks, Inc. (USA)
Inventor	Luszczak, Alicja Shankar, Srinath Xin, Shi

Abstract

A system for monitoring job execution includes an interface and a processor. The interface is configured to receive an indication to start a cluster processing job. The processor is configured to determine whether processing a data instance associated with the cluster processing job satisfies a watchdog criterion; and in the event that processing the data instance satisfies the watchdog criterion, cause the processing of the data instance to be killed.
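
A watchdog criterion of this kind can be sketched in Python as follows. The abstract does not say what the criterion is; the ratio of output rows to input rows, the threshold value, and the function name are assumptions for illustration.

```python
def run_with_watchdog(tasks, max_output_ratio=1000):
    """tasks: iterable of (name, input_rows, output_rows) tuples.

    Hypothetical criterion: a task is killed when it produces more than
    max_output_ratio output rows per input row.
    """
    completed, killed = [], []
    for name, rows_in, rows_out in tasks:
        ratio = rows_out / max(rows_in, 1)  # avoid division by zero
        if ratio > max_output_ratio:
            killed.append(name)      # watchdog criterion satisfied
        else:
            completed.append(name)
    return completed, killed
```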

IPC Classes

G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 11/30 - Monitoring

61. Automated processing of multiple prediction generation including model tuning

Application Number	17896281
Grant Number	12033041
Status	In Force
Filing Date	2022-08-26
First Publication Date	2023-08-03
Grant Date	2024-07-09
Owner	Databricks, Inc. (USA)
Inventor	Wilson, Benjamin Thomas Zumar, Corey

Abstract

IPC Classes

G06N 20/00 - Machine learning
G06F 18/20 - Analysing
G06F 18/2132 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis

62. OPTIMIZATION OF TUNING FOR MODELS USED FOR MULTIPLE PREDICTION GENERATION

Application Number	17587793
Status	Pending
Filing Date	2022-01-28
First Publication Date	2023-08-03
Owner	Databricks Inc. (USA)
Inventor	Wilson, Benjamin Thomas Zumar, Corey

Abstract

The present application discloses a method, system, and computer system for tuning a set of models. The method includes determining a set of one or more models to optimize, determining a plurality of optimizer modules with which to optimize the set of one or more models, causing the plurality of optimizer modules to respectively perform a respective optimizing process with respect to at least one model of the set of one or more models, and deploying an optimized model obtained based at least in part on optimizing metrics of the set of the one or more models.

IPC Classes

G06N 20/00 - Machine learning

63. ACCESS OF DATA AND MODELS ASSOCIATED WITH MULTIPLE PREDICTION GENERATION

Application Number	17587820
Status	Pending
Filing Date	2022-01-28
First Publication Date	2023-08-03
Owner	Databricks Inc. (USA)
Inventor	Wilson, Benjamin Thomas Zumar, Corey

Abstract

The present application discloses a method, system, and computer system for querying a model associated with a dataset. The method includes providing an input interface via which a first entity inputs a dataset, receiving the dataset, and providing a selection interface that exposes to a second entity the plurality of models determined for the dataset and/or the plurality of results corresponding to the plurality of models using the index entries. The dataset comprises a plurality of keys and a plurality of key-value relationships, and the dataset is formatted according to a predefined format, wherein index entries are generated for a plurality of models and a plurality of results corresponding to the plurality of models.

IPC Classes

G06F 16/903 - Querying
G06N 20/00 - Machine learning

64. AUTOMATED PROCESSING OF MULTIPLE PREDICTION GENERATION INCLUDING MODEL TUNING

Application Number	US2022014580
Publication Number	2023/146549
Status	In Force
Filing Date	2022-01-31
Publication Date	2023-08-03
Owner	DATABRICKS INC. (USA)
Inventor	Wilson, Benjamin Thomas Zumar, Corey

Abstract

The present application discloses a method, system, and computer system for building a model associated with a dataset. The method includes receiving a dataset, the dataset comprising a plurality of keys and a plurality of key-value relationships, determining a plurality of models to build based at least in part on the dataset, wherein determining the plurality of models to build comprises using the dataset format information to identify the plurality of models, building the plurality of models, and optimizing at least one of the plurality of models.

IPC Classes

G06N 7/00 - Computing arrangements based on specific mathematical models
G06N 20/00 - Machine learning

65. Systems and methods for a virtual sandbox database

Application Number	18170585
Grant Number	11971981
Status	In Force
Filing Date	2023-02-17
First Publication Date	2023-06-22
Grant Date	2024-04-30
Owner	DATABRICKS, INC. (USA)
Inventor	Khurana, Amandeep Li, Nong

Abstract

IPC Classes

G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 16/248 - Presentation of query results
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules

66. Hash based rollup with passthrough

Application Number	17099467
Grant Number	11675767
Status	In Force
Filing Date	2020-11-16
First Publication Date	2023-06-13
Grant Date	2023-06-13
Owner	Databricks, Inc. (USA)
Inventor	Behm, Alexander Dave, Ankur

Abstract

IPC Classes

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06F 16/242 - Query formulation
G06F 16/2455 - Query execution
G06F 16/13 - File access structures, e.g. distributed indices

67. Model ML registry and model serving

Application Number	18162579
Grant Number	11853277
Status	In Force
Filing Date	2023-01-31
First Publication Date	2023-06-08
Grant Date	2023-12-26
Owner	Databricks, Inc. (USA)
Inventor	Davidson, Aaron Daniel Nykodym, Tomas Mewald, Clemens

Abstract

IPC Classes

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/21 - Design, administration or maintenance of databases
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
G06N 5/022 - Knowledge engineering; Knowledge acquisition

68. FEATURE STORE WITH INTEGRATED TRACKING

Application Number	18162625
Status	Pending
Filing Date	2023-01-31
First Publication Date	2023-06-08
Owner	Databricks, Inc. (USA)
Inventor	Parkhe, Mani Mewald, Clemens Zaharia, Matei Singh, Avesh

Abstract

The present application discloses a method, system, and computer system for managing a plurality of features and storing lineage information pertaining to the features. The method includes obtaining one or more datasets, determining a first feature, wherein the first feature is determined based at least in part on the one or more datasets, and storing the first feature in a feature store. The first feature is stored in association with a dataset indication of the one or more datasets from which the first feature is determined. The feature store comprises a plurality of features.
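
Storing a feature together with an indication of the datasets it was derived from can be sketched in Python as follows. The `FeatureStore` class and its `store` and `lineage` methods are hypothetical names for illustration, not an actual product API.

```python
class FeatureStore:
    """Hypothetical in-memory feature store that keeps lineage per feature."""

    def __init__(self):
        self._features = {}

    def store(self, name, values, source_datasets):
        # Keep the feature values together with the datasets they came from.
        self._features[name] = {
            "values": list(values),
            "sources": tuple(source_datasets),
        }

    def lineage(self, name):
        # Return the dataset indication recorded when the feature was stored.
        return self._features[name]["sources"]


store = FeatureStore()
clicks = [3, 5, 2]  # stand-in for a dataset called "raw_clicks"
store.store("avg_clicks", [sum(clicks) / len(clicks)], ["raw_clicks"])
```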

IPC Classes

G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

69. Integrated native vectorized engine for computation

Application Number	18158258
Grant Number	11874832
Status	In Force
Filing Date	2023-01-23
First Publication Date	2023-05-25
Grant Date	2024-01-16
Owner	Databricks, Inc. (USA)
Inventor	Xin, Shi Behm, Alexander Palkar, Shoumik Van Hovell Tot Westerflier, Herman Rudolf Petrus Catharina

Abstract

A system comprises an interface, a processor, and a memory. The interface is configured to receive a query. The processor is configured to: determine a set of nodes for the query; determine whether a node of the set of nodes comprises a first engine node type or a second engine node type, wherein determining whether the node of the set of nodes comprises the first engine node type or the second engine node type is based at least in part on determining whether the node is able to be executed in a second engine; and generate a plan based at least in part on the set of nodes. The memory is coupled to the processor and is configured to provide the processor with instructions.
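
The per-node engine decision in this abstract can be sketched in Python as follows. The operator names, the `NATIVE_SUPPORTED` set, and the engine labels are assumptions for illustration; the abstract does not say how support is determined.

```python
# Hypothetical set of operators the second (native) engine can execute.
NATIVE_SUPPORTED = {"scan", "filter", "project"}


def plan_query(nodes):
    """Tag each plan node with the engine type that will run it."""
    return [
        (op, "native" if op in NATIVE_SUPPORTED else "fallback")
        for op in nodes
    ]
```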

IPC Classes

G06F 16/2453 - Query optimisation
G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
G06F 16/25 - Integrating or interfacing systems involving database management systems

70. Structured cluster execution for data streams

Application Number	17976361
Grant Number	12032573
Status	In Force
Filing Date	2022-10-28
First Publication Date	2023-05-11
Grant Date	2024-07-09
Owner	Databricks, Inc. (USA)
Inventor	Armbrust, Michael Paul Das, Tathagata Xin, Shi Zaharia, Matei

Abstract

IPC Classes

G06F 16/2453 - Query optimisation
G06F 16/2455 - Query execution

71. Dataflow graph processing

Application Number	18089349
Grant Number	12019682
Status	In Force
Filing Date	2022-12-27
First Publication Date	2023-05-04
Grant Date	2024-06-25
Owner	Databricks, Inc. (USA)
Inventor	Armbrust, Michael Paul Neumann, Andreas Murthy, Mukul Mio, Jonathan

Abstract

A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries and/or commands. The processor is coupled to the communication interface and configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; determine the dataflow graph by determining in-line expressions for tables of the dataflow graph aggregating calculations associated with a subset of dataflow graph nodes designated as view nodes; and provide the dataflow graph.
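
Turning query dependencies into a DAG and an execution order can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The query names and the `build_execution_order` helper are hypothetical, and the in-lining of view nodes mentioned in the abstract is not modelled.

```python
from graphlib import TopologicalSorter


def build_execution_order(dependencies):
    """dependencies maps each query name to the set of queries it reads from.

    Raises graphlib.CycleError if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: "report" reads "cleaned", which reads "raw".
deps = {"raw": set(), "cleaned": {"raw"}, "report": {"cleaned"}}
order = build_execution_order(deps)
```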

IPC Classes

G06F 16/901 - Indexing; Data structures therefor; Storage structures
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/245 - Query processing

72. Function creation for database execution of deep learning model

Application Number	15610062
Grant Number	11599783
Status	In Force
Filing Date	2017-05-31
First Publication Date	2023-03-07
Grant Date	2023-03-07
Owner	Databricks, Inc. (USA)
Inventor	Hong, Sue Ann Xin, Shi Hunter, Timothee Ghodsi, Ali

Abstract

IPC Classes

G06N 3/08 - Learning methods
G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
G06N 5/02 - Knowledge representation; Symbolic representation
G06N 3/04 - Architecture, e.g. interconnection topology
G06N 5/022 - Knowledge engineering; Knowledge acquisition
G06F 16/14 - Details of searching files based on file metadata
G06F 16/22 - Indexing; Data structures therefor; Storage structures

73. Scaling delta table optimize command

Application Number	17384486
Grant Number	11567900
Status	In Force
Filing Date	2021-07-23
First Publication Date	2023-01-31
Grant Date	2023-01-31
Owner	Databricks, Inc. (USA)
Inventor	Mahadev, Rahul Shivu Yavuz, Burak Das, Tathagata

Abstract

IPC Classes

G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/172 - Caching, prefetching or hoarding of files

74. Managed metastorage

Application Number	17514982
Grant Number	12277237
Status	In Force
Filing Date	2021-10-29
First Publication Date	2022-11-24
Grant Date	2025-04-15
Owner	Databricks, Inc. (USA)
Inventor	Zaharia, Matei Lewis, David Lian, Cheng Huo, Yuchen Ghodsi, Ali

Abstract

The present application discloses a method, system, and computer system for providing access to information stored on a system for data storage. The method includes receiving a data request from a user, determining data corresponding to the data request, determining whether the user has requisite permissions to access the data, and in response to determining that the user has requisite permissions to access the data: determining a manner by which to provide access to the data, wherein the data comprises a filtered subset of stored data, and generating a token based at least in part on the user and the manner by which access to the data is to be provided.

IPC Classes

G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 3/06 - Digital input from, or digital output to, record carriers

75. FEATURE STORE WITH INTEGRATED TRACKING

Application Number	17514997
Status	Pending
Filing Date	2021-10-29
First Publication Date	2022-11-24
Owner	Databricks Inc. (USA)
Inventor	Parkhe, Mani Mewald, Clemens Zaharia, Matei Singh, Avesh

Abstract

IPC Classes

G06F 16/28 - Databases characterised by their database models, e.g. relational or object models
G06F 30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model

76. FEATURE STORE WITH INTEGRATED TRACKING

Application Number	US2022027387
Publication Number	2022/245536
Status	In Force
Filing Date	2022-05-03
Publication Date	2022-11-24
Owner	DATABRICKS INC. (USA)
Inventor	Parkhe, Mani Mewald, Clemens Zaharia, Matei Singh, Avesh

Abstract

IPC Classes

G06F 8/65 - Updates
G06F 8/60 - Software deployment
G06F 8/71 - Version control; Configuration management
G06F 17/18 - Complex mathematical operations for evaluating statistical data
G06N 20/00 - Machine learning

77. LIFO based spilling for grouping aggregation

Application Number	17116230
Grant Number	11481398
Status	In Force
Filing Date	2020-12-09
First Publication Date	2022-10-25
Grant Date	2022-10-25
Owner	Databricks Inc. (USA)
Inventor	Behm, Alexander Dave, Ankur Deng, Ryan Palkar, Shoumik

Abstract

A system for spilling comprises an interface and a processor. The interface is configured to receive an indication to perform a GROUP BY operation, wherein the indication comprises an input table and a grouping column. The processor is configured to: for each input table entry of the input table, determine a key, wherein the key is based at least in part on the input table entry and the grouping column; add the key to a grouping hash table, wherein adding the key to the grouping hash table comprises last-in, first-out (LIFO) spilling when necessary; create an output table based at least in part on the grouping hash table; and provide the output table.
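
LIFO spilling during a hash-based GROUP BY can be sketched in Python as follows. The `capacity` limit (assumed to be at least 1), the COUNT aggregate, and the in-memory `spilled` list standing in for disk are all assumptions for illustration. The sketch relies on Python dictionaries preserving insertion order, so `popitem()` removes the most recently inserted group.

```python
def group_count_with_lifo_spill(rows, key_index, capacity):
    """Count rows per grouping key with a bounded hash table (capacity >= 1)."""
    table = {}    # grouping hash table; dicts keep insertion order
    spilled = []  # partial aggregates pushed out of memory ("disk" stand-in)

    for row in rows:
        key = row[key_index]
        if key not in table and len(table) >= capacity:
            # LIFO: spill the group that was inserted last.
            spilled.append(table.popitem())
        table[key] = table.get(key, 0) + 1

    # Build the output table by merging spilled partial counts back in.
    result = dict(table)
    for key, count in spilled:
        result[key] = result.get(key, 0) + count
    return result
```

Spilling the newest group keeps the oldest groups resident in memory, so their running aggregates are not repeatedly evicted and re-created.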

IPC Classes

G06F 16/2455 - Query execution
G06F 16/22 - Indexing; Data structures therefor; Storage structures

78. Automated processing of multiple prediction generation including model tuning

Application Number	17587806
Grant Number	11468369
Status	In Force
Filing Date	2022-01-28
First Publication Date	2022-10-11
Grant Date	2022-10-11
Owner	Databricks Inc. (USA)
Inventor	Wilson, Benjamin Thomas Zumar, Corey

Abstract

IPC Classes

G06N 20/00 - Machine learning
G06K 9/62 - Methods or arrangements for recognition using electronic means

79. Dataflow graph processing

Application Number	17362450
Grant Number	11567998
Status	In Force
Filing Date	2021-06-29
First Publication Date	2022-09-29
Grant Date	2023-01-31
Owner	Databricks, Inc. (USA)
Inventor	Armbrust, Michael Paul Neumann, Andreas Murthy, Mukul Mio, Jonathan

Abstract

IPC Classes

G06F 16/901 - Indexing; Data structures therefor; Storage structures
G06F 16/245 - Query processing
G06F 16/22 - Indexing; Data structures therefor; Storage structures

80. DATAFLOW GRAPH PROCESSING WITH EXPECTATIONS

Application Number	US2022020378
Publication Number	2022/203903
Status	In Force
Filing Date	2022-03-15
Publication Date	2022-09-29
Owner	DATABRICKS INC. (USA)
Inventor	Armbrust, Michael Paul Neumann, Andreas Murthy, Mukul Mio, Jonathan

Abstract

A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries. The processor is coupled to the communication interface and is configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; insert a node in the DAG of nodes to generate an updated DAG to enforce an expectation; determine a dataflow graph based on the updated DAG; and provide the dataflow graph.
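
Inserting a node into the DAG to enforce an expectation can be sketched in Python as follows. The DAG is represented as a dictionary mapping each node to the set of nodes it reads from; the `insert_expectation` helper and the naming scheme for the check node are hypothetical.

```python
def insert_expectation(dag, table, predicate_name):
    """Return an updated DAG in which every consumer of `table`
    reads from a new check node instead of reading `table` directly."""
    check = f"{table}__expect_{predicate_name}"  # hypothetical node name
    updated = {
        node: {check if dep == table else dep for dep in deps}
        for node, deps in dag.items()
    }
    updated[check] = {table}  # the check node itself reads the original table
    return updated


dag = {"raw": set(), "report": {"raw"}}
updated_dag = insert_expectation(dag, "raw", "not_null")
```

The original `dag` is left unchanged, so the updated DAG can be compared against it or discarded.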

IPC Classes

G06F 16/2453 - Query optimisation

81. Dataflow graph processing with expectations

Application Number	17362456
Grant Number	12008040
Status	In Force
Filing Date	2021-06-29
First Publication Date	2022-09-29
Grant Date	2024-06-11
Owner	Databricks, Inc. (USA)
Inventor	Armbrust, Michael Paul Neumann, Andreas Murthy, Mukul Mio, Jonathan

Abstract

A system for dataflow graph processing comprises a communication interface and a processor. The communication interface is configured to receive an indication to generate a dataflow graph, wherein the indication includes a set of queries. The processor is coupled to the communication interface and is configured to: determine dependencies of each query in the set of queries on another query; determine a DAG of nodes based at least in part on the dependencies; insert a node in the DAG of nodes to generate an updated DAG to enforce an expectation; determine a dataflow graph based on the updated DAG; and provide the dataflow graph.

IPC Classes

G06F 16/901 - Indexing; Data structures therefor; Storage structures
G06F 16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/245 - Query processing

82. Update and query of a large collection of files that represent a single dataset stored on a blob store

Application Number	17695411
Grant Number	11775499
Status	In Force
Filing Date	2022-03-15
First Publication Date	2022-08-11
Grant Date	2023-10-03
Owner	Databricks, Inc. (USA)
Inventor	Armbrust, Michael Paul Zhu, Shixiong Yavuz, Burak

Abstract

A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.

IPC Classes

G06F 16/14 - Details of searching files based on file metadata
G06F 16/22 - Indexing; Data structures therefor; Storage structures
G06F 16/23 - Updating

83. INTEGRATED NATIVE VECTORIZED ENGINE FOR COMPUTATION

Application Number	US2021050581
Publication Number	2022/066490
Status	In Force
Filing Date	2021-09-16
Publication Date	2022-03-31
Owner	DATABRICKS INC. (USA)
Inventor	Xin, Shi Behm, Alexander Palkar, Shoumik Van Hovell Tot Westerflier, Herman Rudolf Petrus Catharin

Abstract

IPC Classes

G06F 12/126 - Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
G06T 1/60 - Memory management
G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data

84. Integrated native vectorized engine for computation

Application Number	17237979
Grant Number	11586624
Status	In Force
Filing Date	2021-04-22
First Publication Date	2022-03-31
Grant Date	2023-02-21
Owner	Databricks, Inc. (USA)
Inventor	Xin, Shi Behm, Alexander Palkar, Shoumik Van Hövell Tot Westerflier, Herman Rudolf Petrus Catharina

Abstract

IPC Classes

G06F 16/2453 - Query optimisation
G06F 16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
G06F 16/25 - Integrating or interfacing systems involving database management systems

85. Model ML registry and model serving

Application Number	17324907
Grant Number	11693837
Status	In Force
Filing Date	2021-05-19
First Publication Date	2022-03-24
Grant Date	2023-07-04
Owner	Databricks, Inc. (USA)
Inventor	Davidson, Aaron Daniel Nykodym, Tomas Mewald, Clemens

Abstract

IPC Classes

G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
G06F 16/21 - Design, administration or maintenance of databases
G06F 16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
G06N 5/022 - Knowledge engineering; Knowledge acquisition

86. Query watchdog

Application Number	17537124
Grant Number	11693723
Status	In Force
Filing Date	2021-11-29
First Publication Date	2022-03-17
Grant Date	2023-07-04
Owner	Databricks, Inc. (USA)
Inventor	Luszczak, Alicja Shankar, Srinath Xin, Shi

Abstract

IPC Classes

G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation
G06F 11/30 - Monitoring

87. MOSAICML

Serial Number	90788504
Status	Registered
Filing Date	2021-06-22
Registration Date	2022-08-23
Owner	DATABRICKS, INC. ()
NICE Classes	42 - Scientific, technological and industrial services, research and design

Goods & Services

Providing temporary use of online non-downloadable computer software for use in accessing algorithmic speed-up modules in standard deep learning frameworks; Providing temporary use of online non-downloadable computer software for use in visualizing costs and performance for deep learning training jobs; Providing temporary use of online non-downloadable computer software for use in training employees in deep learning, including performance analysis, visualization and computing resources

88. MOSAICML

Serial Number	90788499
Status	Registered
Filing Date	2021-06-22
Registration Date	2022-08-23
Owner	DATABRICKS, INC. ()
NICE Classes	09 - Scientific and electric apparatus and instruments

Goods & Services

Downloadable computer software for use in accessing algorithmic speed-up modules in standard deep learning frameworks; Downloadable computer software for use in visualizing costs and performance for deep learning training jobs; Downloadable computer software for use in training employees in deep learning, including performance analysis, visualization and computing resources

89. Systems and methods for a virtual sandbox database

Application Number	16935690
Grant Number	11609986
Status	In Force
Filing Date	2020-07-22
First Publication Date	2021-06-10
Grant Date	2023-03-21
Owner	DATABRICKS, INC. (USA)
Inventor	Khurana, Amandeep Li, Nong

Abstract

IPC Classes

G06F 21/53 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity, buffer overflow or preventing unwanted data erasure by executing in a restricted environment, e.g. sandbox or secure virtual machine
G06F 16/248 - Presentation of query results
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules

90. Data retrieval using distributed workers in a large-scale data access system

Application Number	17026772
Grant Number	12050619
Status	In Force
Filing Date	2020-09-21
First Publication Date	2021-03-25
Grant Date	2024-07-30
Owner	DATABRICKS, INC. (USA)
Inventor	Khurana, Amandeep Li, Nong

Abstract

IPC Classes

G06F 16/27 - Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F 9/54 - Interprogram communication
G06F 16/2455 - Query execution
G06F 16/25 - Integrating or interfacing systems involving database management systems
G06F 16/28 - Databases characterised by their database models, e.g. relational or object models

91. BACKGROUND DATASET MAINTENANCE

Application Number	16935654
Status	Pending
Filing Date	2020-07-22
First Publication Date	2021-03-18
Owner	DATABRICKS, INC. (USA)
Inventor	Khurana, Amandeep Li, Nong

Abstract

Various embodiments of the present technology generally relate to management of big data storage and the physical removal of data via data access systems for large data processing environments having multiple application services and multiple storage services. In some embodiments, a method of physically removing data from a storage system provides for identifying one or more files needing data removal treatment, determining that a file needing data removal treatment should be queued, and populating a queue with the file. Determining that a file should be queued is based, at least in part, on a staleness tolerance. The method further provides for treating the file and replacing a previous version of the file in storage with the updated file. In some implementations, treating the file includes removing data from the file to create an updated file and may further include additional changes to the file.
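
Queueing files for removal treatment based on a staleness tolerance can be sketched in Python as follows. The abstract does not define how the tolerance is applied; this sketch assumes that a file is queued once its removal request has been pending for at least `staleness_tolerance` time units, and the function name and data layout are hypothetical.

```python
from collections import deque


def queue_stale_files(pending, now, staleness_tolerance):
    """pending: {file_name: time at which data removal was requested}.

    Returns a queue of files whose request is at least
    staleness_tolerance old, oldest request first.
    """
    queue = deque()
    for name, requested_at in sorted(pending.items(), key=lambda item: item[1]):
        if now - requested_at >= staleness_tolerance:
            queue.append(name)
    return queue
```

A background worker would then pop files from the queue, rewrite each one without the removed data, and replace the previous version in storage.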

IPC Classes

G06F 16/16 - File or folder operations, e.g. details of user interfaces specifically adapted to file systems
G06F 16/17 - Details of further file system functions
G06F 16/23 - Updating
G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
G06F 9/54 - Interprogram communication

92. DELTA ENGINE

Application Number	1577778
Status	Registered
Filing Date	2021-01-25
Registration Date	2021-01-25
Owner	Databricks, Inc. (USA)
NICE Classes	42 - Scientific, technological and industrial services, research and design

Goods & Services

Software as a service (SAAS) services; software as a service (SAAS) services featuring cloud-based software for performing data queries; software as a service (SAAS) services featuring cloud-based software for use in data optimization, data processing, data analytics, data integration, data warehousing, data mining, data sharing, data collection, data interpretation, and data visualization.

93. REDASH

Application Number	1575917
Status	Registered
Filing Date	2020-12-23
Registration Date	2020-12-23
Owner	Databricks, Inc. (USA)
NICE Classes ?	09 - Scientific and electric apparatus and instruments; 42 - Scientific, technological and industrial services, research and design

Goods & Services

Downloadable software for use in data visualization, data queries, data analytics, data processing, data integration, data warehousing, data mining, data sharing, data collection, and data interpretation. Software as a service (SAAS) services featuring cloud-based software for use in data visualization, data queries, data analytics, data processing, data integration, data warehousing, data mining, data sharing, data collection, and data interpretation.

94. DELTA LIVE TABLES

Serial Number	90495216
Status	Registered
Filing Date	2021-01-28
Registration Date	2022-09-20
Owner	Databricks, Inc. (USA)
NICE Classes ?	42 - Scientific, technological and industrial services, research and design

Goods & Services

Software as a service (SAAS) services featuring cloud-based software for building data pipelines; Software as a service (SAAS) services featuring cloud-based software for data transformation (ETL); Software as a service (SAAS) services featuring cloud-based software for use in data optimization, data processing, data analytics, data integration, data warehousing, data mining, data sharing, data collection, data interpretation, and data visualization

95. DELTA ENGINE

Serial Number	90478105
Status	Registered
Filing Date	2021-01-20
Registration Date	2022-01-11
Owner	Databricks, Inc. (USA)
NICE Classes ?	42 - Scientific, technological and industrial services, research and design

Goods & Services

Software as a service (SAAS) services featuring cloud-based software for performing data queries; Software as a service (SAAS) services featuring cloud-based software for use in data optimization, data processing, data analytics, data integration, data warehousing, data mining, data sharing, data collection, data interpretation, and data visualization

96. Update and query of a large collection of files that represent a single dataset stored on a blob store

Application Number	16941227
Grant Number	11308071
Status	In Force
Filing Date	2020-07-28
First Publication Date	2021-01-14
Grant Date	2022-04-19
Owner	Databricks Inc. (USA)
Inventor	Armbrust, Michael Paul; Zhu, Shixiong; Yavuz, Burak

Abstract

A system includes an interface and a processor. The interface is configured to receive a table indication of a data table and to receive a transaction indication to perform a transaction. The processor is configured to determine a current position N in a transaction log; determine a current state of the metadata; determine a read set associated with a transaction; attempt to write an update to the transaction log associated with a next position N+1; in response to a transaction determination that a simultaneous transaction associated with the next position N+1 already exists, determine a set of updated files; and in response to a determination that there is not an overlap between the read set associated with the current transaction and the set of updated files associated with the simultaneous transaction, attempt to write the update to the transaction to the transaction log associated with a further position N+2.
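The commit protocol in this abstract resembles optimistic concurrency control over an append-only log. The Python sketch below is a rough illustration only, not the patented system. The in-memory `TransactionLog` class, its put-if-absent `try_write` method, the `ConflictError` exception, and the generalization from "N+1, then N+2" to a bounded retry loop are all hypothetical choices.

```python
class ConflictError(Exception):
    """Raised when a concurrent commit touched files this transaction read."""


class TransactionLog:
    """Hypothetical in-memory log: position -> files updated by that commit."""

    def __init__(self):
        self._entries = {}

    def latest_position(self):
        return max(self._entries, default=0)

    def try_write(self, position, updated_files):
        # Put-if-absent: only one writer can claim a given position.
        if position in self._entries:
            return False
        self._entries[position] = frozenset(updated_files)
        return True

    def updated_files_at(self, position):
        return self._entries[position]


def commit(log, start_position, read_set, updated_files, max_attempts=10):
    """Try position N+1; if it is taken and there is no overlap, try N+2, and so on."""
    position = start_position + 1
    for _ in range(max_attempts):
        if log.try_write(position, updated_files):
            return position
        # A simultaneous transaction already holds this position.
        winner_files = log.updated_files_at(position)
        if winner_files & set(read_set):
            raise ConflictError(f"read set overlaps commit at position {position}")
        position += 1
    raise ConflictError("gave up after too many attempts")
```

A transaction that read only `part-a` at position 0 and finds position 1 taken by a commit that updated `part-b` lands at position 2. If the competing commit had updated `part-a`, the sketch raises `ConflictError` instead.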

IPC Classes ?

G06F 16/23 - Updating
G06F 16/14 - Details of searching files based on file metadata
G06F 16/22 - Indexing; Data structures therefor; Storage structures

97. Autoscaling using file access or cache usage for cluster machines

Application Number	17020573
Grant Number	11379272
Status	In Force
Filing Date	2020-09-14
First Publication Date	2020-12-31
Grant Date	2022-07-05
Owner	Databricks Inc. (USA)
Inventor	Shankar, Srinath; Liang, Eric Keng-Hao

Abstract

The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a list of cluster machines storing one or more intermediate data files of a set of intermediate data files; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of a set of tasks executing or pending on the set of cluster machines nor storing the one or more intermediate data files of the set of intermediate data files, where the set of intermediate data files is associated with the set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.
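The idle-machine rule in this abstract comes down to set subtraction: a machine is idle only if it neither runs an executing or pending task nor stores an intermediate file belonging to such a task. The Python sketch below illustrates that rule under assumptions. The `Task` and `IntermediateFile` types and their fields are hypothetical, and actual deactivation of machines is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    task_id: str
    machine: Optional[str]  # None while the task is pending and not yet placed


@dataclass(frozen=True)
class IntermediateFile:
    path: str
    machine: str   # machine that stores the file
    task_id: str   # task the file is associated with


def find_idle_machines(machines, active_tasks, intermediate_files):
    """Return machines with no active task and no intermediate file of an active task."""
    active_ids = {t.task_id for t in active_tasks}
    running = {t.machine for t in active_tasks if t.machine is not None}
    # Only files tied to executing or pending tasks keep a machine alive.
    holding = {f.machine for f in intermediate_files if f.task_id in active_ids}
    return set(machines) - running - holding
```

In this sketch, a machine holding only files from finished tasks counts as idle, since those files are not associated with any executing or pending task.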

IPC Classes ?

G06F 9/46 - Multiprogramming arrangements
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
H04L 67/5682 - Policies or rules for updating, deleting or replacing the stored data

98. Miscellaneous Design

Application Number	1564813
Status	Registered
Filing Date	2020-10-02
Registration Date	2020-10-02
Owner	Databricks, Inc. (USA)
NICE Classes ?	09 - Scientific and electric apparatus and instruments; 42 - Scientific, technological and industrial services, research and design

Goods & Services

Downloadable computer software for big data analysis; downloadable computer software for use in data integration, data warehousing, data mining, data processing, data sharing, data collection, data interpretation, data queries, data visualization, and data analytics; downloadable computer software platforms for data integration, data warehousing, data mining, data processing, data sharing, data collection, data interpretation, data queries, data visualization, and data analytics; downloadable cloud computer software for data integration, data warehousing, data mining, data processing, data sharing, data collection, data interpretation, data queries, data visualization, and data analytics; downloadable computer software for application database integration; desktop and mobile computing and operating platforms consisting of data transceivers, wireless networks and gateways, for collection, analysis, sharing, interpretation and management of data. Data mining; software as a service (SAAS) services, namely, hosting software for use by others for big data processing; software as a service (SAAS) services, namely, hosting software for use by others for data integration, data warehousing, data mining, data processing, data sharing, data collection, data interpretation, data queries, data visualization, and data analytics; providing temporary use of nondownloadable analytics software for data importing, data wrangling, data mining, data processing, data sharing, data collection, data interpretation, data queries, and data visualization; custom design and development of computer software; software as a service (SAAS) services featuring software for big data processing and analytics; development and creation of computer programs for data processing and analysis; software as a service (SAAS) services featuring software for data storage, data computation, data analysis, data processing, and database management; computer services, namely, hosting of search platforms on the Internet to allow users to index, integrate, warehouse, mine, process, share, collect, interpret, research, query, visualize, and analyze data; platform as a services (PAAS) featuring computer software platforms for use in data management, integration, warehousing, mining, interpretation, processing, sharing, collecting, research, queries, visualization, and analysis; providing temporary use of on-line non-downloadable cloud computing software for big data processing; providing temporary use of on-line non-downloadable cloud computing software for data integration, data warehousing, data mining, data processing, data sharing, data collection, data interpretation, data queries, data visualization, and data analytics; application service provider, namely, hosting, managing, developing and maintaining applications, and software of others in the fields of data importing, data storage, data management, data queries, data processing, data interpretation, data analytics, and data visualization.

99. Autoscaling using file access or cache usage for cluster machines

Application Number	16188989
Grant Number	10810051
Status	In Force
Filing Date	2018-11-13
First Publication Date	2020-10-20
Grant Date	2020-10-20
Owner	Databricks Inc. (USA)
Inventor	Shankar, Srinath; Liang, Eric Keng-Hao

Abstract

The allocation system comprises an interface and a processor. The interface is configured to receive an indication to deactivate idle cluster machines of a set of cluster machines. The processor is configured to determine a set of tasks executing or pending on the set of cluster machines; determine a set of idle cluster machines of the set of cluster machines that are neither running one or more tasks of the set of tasks nor storing one or more intermediate data files of a set of intermediate data files, where the set of intermediate data files is associated with a set of tasks executing or pending on the cluster machines; and deactivate each cluster machine of the set of idle cluster machines.

IPC Classes ?

G06F 9/46 - Multiprogramming arrangements
G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
G06F 9/38 - Concurrent instruction execution, e.g. pipeline or look ahead
H04L 29/08 - Transmission control procedure, e.g. data link level control procedure

100. CHEVRON DESIGN

Application Number	206978400
Status	Registered
Filing Date	2020-10-02
Registration Date	2023-09-27
Owner	Databricks, Inc. (USA)
NICE Classes ?	09 - Scientific and electric apparatus and instruments; 42 - Scientific, technological and industrial services, research and design

Goods & Services

(1) Downloadable computer software for big data analysis; downloadable computer software for use in data integration, data warehousing, data mining, and data visualization; downloadable computer software for use in data processing, data sharing, data collection, data interpretation, data queries, and data analytics, namely, computer software that provides real-time, integrated business management intelligence by combining information from various databases, for database management, and for software for the integration of artificial intelligence and machine learning in the field of Big Data; downloadable computer software platforms for data integration, data warehousing, data mining, and data visualization; downloadable computer software platforms for data processing, data sharing, data collection, data interpretation, data queries, and data analytics, namely, computer software that provides real-time, integrated business management intelligence by combining information from various databases, for database management, and for software for the integration of artificial intelligence and machine learning in the field of Big Data; downloadable cloud computer software for data integration, data warehousing, data mining, and data visualization; downloadable cloud computer software for data processing, data sharing, data collection, data interpretation, data queries, and data analytics, namely, computer software that provides real-time, integrated business management intelligence by combining information from various databases, for database management, and for software for the integration of artificial intelligence and machine learning in the field of Big Data; downloadable computer software for application database integration; desktop and mobile computing and operating platforms consisting of data transceivers, wireless networks and gateways, for collection, analysis, sharing, interpretation and management of data, namely, database management software. 
(1) Data mining; software as a service (SAAS) services, namely, hosting software for use by others for big data processing; software as a service (SAAS) services, namely, web hosting of computer software applications of others for data integration, data warehousing, data mining, data processing, data sharing, data collection, data interpretation, data queries, data visualization, and data analytics; providing temporary use of non-downloadable analytics software for data wrangling, data mining and data visualization; providing temporary use of non-downloadable analytics software for data importing, data processing, data sharing, data collection, data interpretation, and data queries, namely, for providing real-time integrated business management intelligence by combining information from various databases, for database management, and for software for the integration of artificial intelligence and machine learning in the field of Big Data; custom design and development of computer software; software as a service (SAAS) services featuring software for big data processing and analytics; development and creation of computer programs for data processing and analysis; software as a service (SAAS) services featuring software for data storage, namely, providing cloud storage facilities for use as a data center for others, data computation, data analysis, and data processing, namely, database management; computer services, namely, providing search platforms to allow users to index, integrate, warehouse, mine, process, share, collect, interpret, research, query, visualize, and analyze data; platform as a services (PAAS) featuring computer software platforms for use in data integration, warehousing, mining, and visualization; platform as a services (PAAS) featuring computer software platforms for use in data management, interpretation, processing, sharing, collecting, research, queries, and analysis, namely, for providing real-time integrated business management intelligence by combining information from various databases, for database management, and for software for the integration of artificial intelligence and machine learning in the field of Big Data; providing temporary use of on-line non-downloadable cloud computing software for big data processing; providing temporary use of on-line non-downloadable cloud computing software for data integration, data warehousing, data mining, and data visualization; providing temporary use of on-line non-downloadable cloud computing software for data processing, data sharing, data collection, data interpretation, data queries, and data analytics, namely, for providing real-time integrated business management intelligence by combining information from various databases, for database management, and for software for the integration of artificial intelligence and machine learning in the field of Big Data; application service provider, namely, hosting, managing, developing and maintaining applications, and software of others in the fields of data importing, data storage, data management, data queries, data processing, data interpretation, data analytics, and data visualization.
