So let's see how Azure Cosmos DB partitions data, starting with what a partition key actually is.
What is a partition key?
A partition key is a primary lookup used to find a set of rows, i.e. a partition.
In Azure Cosmos DB, the partition key is a property that exists on every item in a container, and it is best chosen so that it groups similar items together.
In the image above, all the car data is logically partitioned by car color.
Logical partitions are formed based on the value of a partition key that is associated with each item in a container. All the items in a logical partition have the same partition key value.
Now, what is a logical partition?
The data grouped by a partition key is not stored in some separate physical memory; instead, all the items that share the same partition key value are stored together.
For example, in a container that contains data about food nutrition, all items contain a foodGroup property. You can use foodGroup as the partition key for the container. Groups of items that have specific values for foodGroup, such as "Baked Products" and "Sausages and Luncheon Meats", form distinct logical partitions. You don't have to worry about deleting a logical partition when the underlying data is deleted.
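The grouping described above can be sketched in plain Python. This is a conceptual model only, not the Cosmos DB SDK or its storage engine: the helper name group_into_logical_partitions and the sample food items are hypothetical, and the sketch handles only top-level properties (a path such as "/foodGroup").

```python
from collections import defaultdict
from typing import Any


def group_into_logical_partitions(
    items: list[dict[str, Any]], partition_key_path: str
) -> dict[Any, list[dict[str, Any]]]:
    """Group items by the value of their partition key property.

    Hypothetical helper: every item must carry the partition key property,
    and items that share a value land in the same logical partition.
    """
    prop = partition_key_path.lstrip("/")  # "/foodGroup" -> "foodGroup"
    partitions: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        if prop not in item:
            raise KeyError(f"item {item.get('id')!r} has no {prop!r} property")
        partitions[item[prop]].append(item)
    return dict(partitions)


# Hypothetical sample items from a food nutrition container.
foods = [
    {"id": "1", "description": "Bagels, plain", "foodGroup": "Baked Products"},
    {"id": "2", "description": "Croissants, butter", "foodGroup": "Baked Products"},
    {"id": "3", "description": "Bologna, beef", "foodGroup": "Sausages and Luncheon Meats"},
]

logical = group_into_logical_partitions(foods, "/foodGroup")
for key, members in logical.items():
    print(key, [m["id"] for m in members])
```

Two logical partitions come out of three items: one per distinct foodGroup value.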
Note: Selecting a partition key with a wide range of possible values ensures that the container is able to scale. Also, a single logical partition can hold at most 20 GB of data.
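To make the note concrete, the sketch below reports how many distinct partition key values a container has, which value holds the most data, and whether any logical partition exceeds the 20 GB limit. It assumes you already know the stored bytes per key value (for example from your own bookkeeping); the function name and the car-color sizes are hypothetical, and 1 GB is taken as 1024**3 bytes.

```python
# Maximum storage for one logical partition (assumption: 1 GB = 1024**3 bytes).
LOGICAL_PARTITION_LIMIT_BYTES = 20 * 1024**3


def partition_key_report(sizes_by_key: dict[str, int]) -> dict[str, object]:
    """Summarize how stored data is spread across partition key values.

    sizes_by_key maps each partition key value to the bytes stored under it
    (must be non-empty).
    """
    total = sum(sizes_by_key.values())
    largest_key = max(sizes_by_key, key=sizes_by_key.get)
    return {
        "distinct_values": len(sizes_by_key),
        "largest_key": largest_key,
        "largest_share": sizes_by_key[largest_key] / total if total else 0.0,
        # Logical partitions that would hit the 20 GB ceiling.
        "over_limit": [k for k, size in sizes_by_key.items()
                       if size > LOGICAL_PARTITION_LIMIT_BYTES],
    }


GB = 1024**3
# Hypothetical sizes for a car container partitioned by color.
by_color = {"red": 3 * GB, "blue": 19 * GB, "white": 21 * GB}
report = partition_key_report(by_color)
print(report["over_limit"])  # ['white']
```

A low-cardinality key such as car color concentrates data under a handful of values, which is how a single logical partition ends up over the limit; a key with many distinct values spreads the same data more thinly.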
Now, the main point of logical partitions is that, internally, one or more logical partitions are mapped to a single physical partition. Hence, smaller containers may have many logical partitions but still require only a single physical partition.
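The many-to-one mapping can be illustrated with a toy hash placement. Azure Cosmos DB uses hash-based partitioning, but its actual hash function and range assignment are internal, so this sketch substitutes MD5 and a simple modulo; the function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def physical_partition_for(key_value: str, physical_count: int) -> int:
    """Map a partition key value to one of physical_count physical partitions.

    MD5 + modulo is only a stand-in for Cosmos DB's internal hashing.
    """
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


def place_logical_partitions(key_values: list[str], physical_count: int) -> dict[int, list[str]]:
    """Group logical partitions (key values) by the physical partition they land on."""
    placement: dict[int, list[str]] = defaultdict(list)
    for value in key_values:
        placement[physical_partition_for(value, physical_count)].append(value)
    return dict(placement)


users = [f"user-{i}" for i in range(1000)]

# Small container: 1,000 logical partitions all share one physical partition.
small = place_logical_partitions(users, physical_count=1)
print(len(small), len(small[0]))  # 1 1000

# Larger container: the same logical partitions spread over 4 physical ones.
large = place_logical_partitions(users, physical_count=4)
print({p: len(keys) for p, keys in sorted(large.items())})
```

Each key value always hashes to the same physical partition, so a logical partition is never split; only the number of physical partitions changes as the container grows.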
Note: Physical partitions are an internal implementation of the system and they are entirely managed by Azure Cosmos DB. When developing your solutions, don’t focus on physical partitions because you can’t control them. Instead, focus on your partition keys. If you choose a partition key that evenly distributes throughput consumption across logical partitions, you will ensure that throughput consumption across physical partitions is balanced.
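One way to check whether a candidate key spreads throughput evenly is to compare the busiest logical partition with the average. The sketch below assumes you have measured or estimated the RU/s consumed per partition key value; the skew metric, the function name throughput_skew and the numbers are hypothetical.

```python
def throughput_skew(ru_by_key: dict[str, float]) -> float:
    """Return max / mean RU/s consumption across logical partitions.

    1.0 means perfectly even consumption; larger values mean one key is "hot".
    """
    if not ru_by_key:
        raise ValueError("no partition key values supplied")
    values = list(ru_by_key.values())
    mean = sum(values) / len(values)
    return max(values) / mean if mean else 1.0


# Hypothetical RU/s consumed per key under two candidate partition keys.
by_country = {"US": 900.0, "CA": 50.0, "MX": 50.0}
by_user = {f"user-{i}": 10.0 for i in range(100)}

print(round(throughput_skew(by_country), 2))  # 2.7
print(throughput_skew(by_user))  # 1.0
```

The per-user key keeps consumption even across logical partitions, which is the condition the note says leads to balanced physical partitions.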
Benefits of Partition Keys
For large read-heavy containers, you might want to choose a partition key that appears frequently as a filter in your queries. Queries can be efficiently routed to only the relevant physical partitions by including the partition key in the filter predicate.
In other words, when designing the database structure, choose the field that is used most often as a query filter as the partition key: this reduces the number of cross-partition queries and therefore improves overall query speed.
For example, if you frequently run a query that filters on UserID, then selecting UserID as the partition key would reduce the number of cross-partition queries.
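The routing idea can be modeled as follows: a query whose filter fixes the partition key to one value touches a single physical partition, while any other filter fans out to all of them. This is a simplified model with hypothetical names; the MD5-plus-modulo placement stands in for Cosmos DB's internal hashing, and filters such as IN lists or ranges on the key are not covered.

```python
import hashlib
from typing import Any


def physical_partition_for(key_value: Any, physical_count: int) -> int:
    """Stand-in hash placement (not Cosmos DB's real hash function)."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


def partitions_to_query(filters: dict[str, Any], partition_key: str,
                        physical_count: int) -> list[int]:
    """Return the physical partitions a query must visit.

    An equality filter on the partition key routes to one physical partition;
    anything else becomes a cross-partition query over all of them.
    """
    if partition_key in filters:
        return [physical_partition_for(filters[partition_key], physical_count)]
    return list(range(physical_count))


print(len(partitions_to_query({"UserID": "u-42"}, "UserID", 8)))  # 1
print(len(partitions_to_query({"City": "Oslo"}, "UserID", 8)))    # 8
```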
However, if your container is small, you probably don’t have enough physical partitions to need to worry about the performance impact of cross-partition queries. Most small containers in Azure Cosmos DB only require one or two physical partitions.
If your container could grow to more than a few physical partitions, then you should make sure you pick a partition key that minimizes cross-partition queries. Your container will require more than a few physical partitions when either of the following is true:
- Your container will have over 30,000 RU/s provisioned
- Your container will store over 100 GB of data
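The two rules of thumb combine with a logical OR, as a small helper makes explicit. The function and constant names are hypothetical; the thresholds are the ones listed above.

```python
# Rule-of-thumb thresholds from the list above.
RU_THRESHOLD = 30_000          # provisioned RU/s
STORAGE_THRESHOLD_GB = 100     # stored data in GB


def likely_needs_many_physical_partitions(provisioned_ru: int, storage_gb: float) -> bool:
    """True if either rule of thumb says the container needs more than a few physical partitions."""
    return provisioned_ru > RU_THRESHOLD or storage_gb > STORAGE_THRESHOLD_GB


print(likely_needs_many_physical_partitions(400, 5))      # False
print(likely_needs_many_physical_partitions(40_000, 5))   # True
print(likely_needs_many_physical_partitions(1_000, 250))  # True
```

If this returns False, the container is likely small enough that cross-partition queries are not a major concern; if True, the choice of partition key deserves more care.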