4

Part6: Using sharding to build multi-tenant apps using ASP.NET Core and EF Core

 2 years ago
source link: https://www.thereformedprogrammer.net/part6-using-sharding-to-build-multi-tenant-apps-using-asp-net-core-and-ef-core/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Part6: Using sharding to build multi-tenant apps using ASP.NET Core and EF Core

This article describes how to use EF Core and ASP.NET Core to create a multi-tenant application where each different groups (known as tenants) has its own database – this is known as sharding. A second section describes how to build a hybrid a multi-tenant design which supports multiple tenants in one database and tenants that has its own database, i.e. sharding. 

NOTE: Multi-tenant applications are also referred to as SaaS (Software as a Service). In these articles I use the term multi-tenant as SaaS can also cover having one application + database for each company / tenant.

This article is part of the series that covers .NET multi-tenant applications in general. Also the designs shown in this article comes from the library called AuthPermissions.AspNetCorelibrary (shortened to AuthP in these articles) which provide pre-built (and tested) code to help you build multi-tenant apps using ASP.NET Core and EF Core. The other articles in “Building ASP.NET Core and EF Core multi-tenant apps” series are:

TL;DR; – Summary of this article

  • Multi-tenant applications provide a service to many tenants. Each tenant has their own set of data that is private to them.
  • Sharding is the name given to multi-tenant applications where each different tenant has its own database. The other approach is to put all the tenants in one database where each tenant has a unique key which makes sure only data marked with the tenant’s key is returned (known as shared-database).
  • The good parts of the sharding approach are its faster and data is more secure than the shared-database approach, but sharding comes with the price of having many databases.
  • There is a third, hybrid approach which supports both shared-database and sharding at the same time – this allows you to manage the cost / performance by putting a group of tenants with small data / usage into one database while tenants with high data / usage can have their own database.
  • I detail 7 steps to create a sharding multi-tenant application using EF Core and ASP.NET Core. The steps are a mixture of ASP.NET Core code and EF Core’s code.
  • Then I detail 8 steps (3 of which the same as the sharding approach) that implements the hybrid approach. This implementation has been added in version 3 of the AuthP library and there is an example called Example6.SingleLevelSharding which you can run.

Setting the scene – what is sharding and why it is useful?

Wikipedia says that database sharding “A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard is held on a separate database server instance, to spread load”. I emphasized the last sentence because that’s the key part – a multi-tenant / SaaS application will have a database for each separate tenant. The alternative to using sharding is to store all the data in one database and the tenant’s data are differentiated by a unique key for each tenant – I will refer to as shared-database, while sharding is a dedicated-database approach or sharding, which is shorter.

There are a number of pros / cons to each approach, but the biggest is the cost verses performance issue. A sharding approach should be quicker than the shared-database approach, but sharding’s performance comes from having lots of databases, which costs more money. The other pro for the sharding approach is that each tenant’s data is more isolated from each other, as each tenant has its own database.

NOTE: This Microsoft document describes some other differences between sharding and shared-database, plus a comparison of three ways to provide a service to many tenants.

There is a third, hybrid approach that allows you to balance the cost /performance. This design uses sharding for tenants that has a lot of data / demand, while tenants with less data / demand go the shared-database approach. The benefit of this approach is you can offer smaller tenants a lower price by putting them in shared database, while tenants that have higher demands will pay for a dedicated database.

Some years ago, I was asked to design and build a multi-tenant application with thousands of tenants, with tenants ranging from a less than a hundred users to a few large tenants with thousands of users. My client wanted a hybrid approach to cover this wide range of tenant types, which is why I have added both sharding and the hybrid approach to my AuthP library.

Here is a diagram to show all three approaches with a summary of their pros and cons.

Finally, I should add that Azure has a useful feature called SQL Server Elastic Pools which can help with the cost / performance by providing an overall level of database performance which is shared across all the databases in the pool. I will talk more about that in the next article.

How I implemented sharding in version 3 of my AuthP library

In one of my tweets about building multi-tenant applications a number of people said they used sharding. AuthP version 2 only supports the shared-database approach, but this feedback made me make the focus of version 3 release of the library to implementing sharding for multi-tenant application.

In addition, the AuthP sharding feature is designed to support the hybrid approach as well, which means you can use the shared-database approach and / or dedicated-database (sharding) approach. As I have already explained, this allows you to balance the cost / performance for each tenant if you want to.

I have split the description of the EF Core / ASP.NET Core code into two parts:

  1. Implement a sharding-only multi-tenant application.
  2. Implement a hybrid multi-tenant application.

1. Implement a sharding-only multi-tenant application

The figure below shows what the sharding-only design would look like, with a database containing information about the users and tenants (top left) and a database for each tenant (bottom).

Here are the steps to implement sharding for a multi-tenant application:

  1. Decide on where the database connection strings should be stored
  2. Hold information of the tenant and its users in admin database
  3. When a user logs in, then add a ConnectionName claim to their claims
  4. Provide a service to convert the user’s ConnectionName claim to a database connection
  5. Provide the connection string to the application’s DbContext
  6. Use EF Core’s SetConnectionString method to set the database connection
  7. Migrate the tenant database if not used before

1. Decide on how the database connection strings are managed

You need a list of connection strings for each tenant database and the most obvious place to hold connection strings is in the ASP.NET Core appsettings.json file. Another reason to using the appsettings file is security: connection strings can contain sensitive data such as username and password by ASP.NET Core / Azure offers lots of ways to hide / replace the connection strings.

The second decision is how we obtain to the connection string when a tenant user accesses the application? The best performance approach is to add a claim with the connection string when a tenant user logs in, but if you are using JWT Bearer Token that would expose the connection string with its username / password sensitive data.

In the end I used the names of the connection strings from the “ConnectionString” section in the appsetting file – see below:

{
"ConnectionStrings": {
"DefaultConnection": "Server=… admin database connection string.",
"Shard1": "Server=… tenant database connection string.",
"Shard2": "Server=… tenant database connection string.",
…etc.
},
//… other parts left out
}

By referring to the database by the connection string name, e.g. “DefaultConnection”, “Shard1”,  the sensitive data is hidden and the admin user has a simple / obvious name to assign to a tenant.

NOTE: In the next article I will describe how you can add extra connection strings to the appsetting file while the application is running.

2. Hold information of the tenant and its users in admin database

The way a multi-tenant application is there are tenants, with many users linked to each tenant. Typically, the tenant will have a unique key, often a primary key provided by the admin database, a name, e.g. “Company XYZ”, and in this case it would contain the name of the connection string.

ASP.NET Core handles the authentication of a user and provides a unique id, often a string, for each user. You need to add extra data to link a user’s id to a tenant – one simple way would add a collection of users’ id using a one-to-many relationship.

NOTE: The AuthP library has built-in AuthUser and Tenant classes, with admin code to manage these and link to the ASP.NET Core authentication handler. This the AuthP documentation page called “Multi-tenant explained” for how that works.

3. When a user logs in, then add a ConnectionName claim to their claims

When a user logs in you need to detect if they are linked to a tenant, then you need to add a claim containing the connection string name. This requires you to intercept the login process and use the user’s id to obtain the connection string name held in the tenant admin class.

Intercepting the login process depends on the ASP.NET Core authentication handler you are using – the article called “Advanced techniques around ASP.NET Core Users and their claims” provides information on the main authentication handlers.

NOTE: The AuthP library automatically adds the ConnectionName claim if sharding is turned on.

4. Provide a service to convert the user’s ConnectionName claim to a database connection

Having decided to use the connection string name in the claim, you need a way to access the “ConnectionStrings” object in the appsetting file. At the same time, I want to be able to add new connection strings while the application is running. This means can’t use the normal IOption<T> options, but I have to use the IOptionsSnapshot<T> option which reads current data in the appsetting file.

I built a service called ShardingConnections which uses IOptionsSnapshot<T> to get the latest “ConnectionStrings”. The code below shows the specific parts of this service to get the connection string from a connection name (thanks to ferarias answer to this stack overflow question). This service should be set up as a scoped service.

public class ConnectionStringsOption : Dictionary<string, string> { }
public class ShardingConnections : IShardingConnections
{
private readonly ConnectionStringsOption _connectionDict;
public ShardingConnections(
IOptionsSnapshot<ConnectionStringsOption> optionsAccessor)
{
_connectionDict = optionsAccessor.Value;
}
public string GetNamedConnectionString(string connectionName)
{
return _connectionDict.ContainsKey(connectionName)
? _connectionDict[connectionName]
: null;
}
//Other methods not shown
}

NOTE: The full service contains extra methods useful for the admin when assigning a connection name to a tenant.

5. Provide the connection string to the application’s DbContext

You need to inject a service into the tenant application’s DbContext which contains the connection string for the current user. To do that we need two parts:

  • Get the ConnectionName claim from the current user
  • Use the ShardingConnections service to get the connection string

The code below shows a Scoped service that uses the IHttpContextAccessor to get the logged-in user (if present) with its claims. From this it can obtain the ConnectionName claim and passes the connextion string name to the ShardingConnections’s GetNamedConnectionString method. This returns the connection string that the DbContext.

public class GetShardingData : IGetShardingDataFromUser
{
public GetShardingDataUserNormal(IHttpContextAccessor accessor,
IShardingConnections connectionService)
{
var connectionStringName = accessor.HttpContext?
.User?.Claims.SingleOrDefault(x =>
x.Type == PermissionConstants.ConnectionNameType)?.Value
if (connectionStringName != null)
ConnectionString = connectionService
.GetNamedConnectionString(connectionStringName);
}
public string ConnectionString { get; }
}

6. Use EF Core’s SetConnectionString method to set the database connection

The tenant application’s DbContext needs to link to the database that has been provided by the GetShardingData service in the last section. The  EF Core’s SetConnectionString method (added in EF Core 5) sets the connection string to be used for this instance of the DbContext. The code below shows the constructor on the DbContext than handles the tanant’s data.

public class ShardingSingleDbContext : DbContext
{
public ShardingSingleDbContext(
DbContextOptions<ShardingSingleDbContext> options,
IGetShardingDataFromUser shardingData)
: base(options)
{
Database.SetConnectionString
(shardingData.ConnectionString);
}
//… other parts left out
}

NOTE: The EF Core team says using SetConnectionString doesn’t have much of an overhead so sharding shouldn’t be slowed down by changing databases. You may also be interested how you would use DbContext pooling when building multi-tenant applications.

7. Migrate a database if not used before

At some point you need to migrate a database that hasn’t been used before. In the AuthP library the creation of a new tenant causes a call a method in a service written by the developer that follows the ITenantChangeService interface. When working with sharding I added the following code to check the database exists and migrate if it has no tables in it. It returns an error string if the database isn’t found, or null if it finished successfully.

NOTE: This stack overflow question has lots of useful ways to detect if a database exist and so on.

private static async Task<string> CheckDatabaseExistsAndMigrateIfNew(
ShardingSingleDbContext context, Tenant tenant,
bool migrateEvenIfNoDb)
{
if (!await context.Database.CanConnectAsync())
{
//The database doesn't exist
if (migrateEvenIfNoDb)
await context.Database.MigrateAsync();
else
{
return $"The database defined by the connection string"+
"'{tenant.ConnectionName}' doesn't exist.";
}
}
else if (!await context.Database
.GetService<IRelationalDatabaseCreator>()
.HasTablesAsync())
//The database exists but needs migrating
await context.Database.MigrateAsync();
return null;
}

The migrateEvenIfNoDb parameter is there because EF Core’s Migrate can create a database if you have the authority, e.g. when in development mode and are using a local SQL Server. But if you don’t have authority, e.g. when in production and using Azure SQL Server, then the code must return an error if there isn’t an database.

That’s the end of the code needed to implement the sharding-only approach to multi-tenant applications. The next section shows the extra steps to support the hybrid approach that allows you to use shared-database approach and / or dedicated-database (sharding) approach at the same time.

2. Implement a hybrid multi-tenant application.

The figure below shows what the hybrid design for multi-tenants where each database can either have many tenants in one database (see left and right databases) or only one tenant in a database (see the middle two databases).

HybridMultiTenantApproach.png

NOTE: You don’t have to use both shared-database approach and dedicated-database (sharding). You can use just use sharding if you want.

There is a runnable example of a hybrid multi-tenant in the AuthP repo. Simply clone the repo and set the Example6.MvcWebApp.Sharding project as the startup project. The example assumes you have a locadb SQL Server and seeds the DefaultConnection database with three, non-sharding tenants. The home page shows you how you can move one of the tenants to another database and make it a sharding tenant.

Here is a screenshot just after a non-sharding tenant called “Pets Ltd.” has been moved into its own database and the tenant is now using sharding.

MoveTenantToOwnDatabase-1024x570.png

Here are the steps for the hybrid approach, with changes from the sharding approach shown in bold.

  1. Decide on where the database connection strings should be stored
  2. Hold extra information of the tenant and its users in admin database
  3. When a user logs in, then add a ConnectionName and DataKey claim to their claims
  4. Provide a service to convert the user’s ConnectionName claim to a database connection
  5. Provide the connection string and DataKey to the application’s DbContext
  6. Use EF Core’s SetConnectionString method to set the database connection and DataKey
    1. “Turn off” the query filter
    2. Stop setting the DataKey on entites
  7. Migrate the tenant database if not used before
  8. Extra features available in a hybrid design.

1 Decide on where the database connection strings should be stored

Same as sharding-only approach – see this section.

2. Hold extra information of the tenant and its users in admin database

The hybrid approach needs an additional way to handle and shared-database tenants, as they need some form of filter when accessing one tenant out of all the tenants in that database. In the AuthP library the Tenant creates a unique string for each tenant called the DataKey. This DataKey in injected to the tenant application’s DbContext and used in EF Core’s global query filter to only return the data linked to the Tenant. This is explained in the in the article called “Building ASP.NET Core and EF Core multi-tenant apps – Part1: the database”.

In addition, the AuthP Tenant contains a HasOwnDb boolean property, which is true if the tenant is using sharding. This HasOwnDb property is used in a few ways, for instance to remove the query filter on sharding tenants and to return an error if someone tries to add another tenant into a database that has already go a sharding tenant in it – see section 2fi later for more on that.

3. When a user logs in, then add a ConnectionName and DataKey claim to their claims

The hybrid approach needs both the ConnectionName claim and DataKey claim to handle the two types of database arrangement: the ConnectionName is used by every tenant to get the correct database and the DataKey is needed for the shared-database tenants.

However, you have one tenant DbContext to handle both shared-database tenants and sharding tenants and you can’t change the query filter. This means you would be running a query filter on a sharding tenant which doesn’t need it – see section 2e for how I “turn off” the query filter on sharding tenants.

4. Provide a service to convert the user’s ConnectionName claim to a database connection

Same as sharding-only approach – see this section.

5. Provide the connection string and DataKey to the application’s DbContext

For the hybrid approach we need the connection string and DataKey property is added to the Scoped service that uses the IHttpContextAccessor to get the logged-in user (if present) with its claims. The updated service (see GetShardingDataUserNormal class) provides both the ConnectionString and the DataKey to the tenant application’s DbContext.

Obtaining the DataKey is much easier than the connection string because the DataKey was already calculated when the claim was added, so it just about copying the DataKey claim’s Value into a DataKey property in the service.

6. Use EF Core’s SetConnectionString method to set the database connection and DataKey

In a hybrid approach a tenant can be using the shared-database approach or the sharding approach. Therefore, you have to add extra code to every tenant (including sharding) to handle the shared-database DataKey. This extra code includes adding a DataKey property / column to a tenant data classes and the tenant DbContext must have a global query filter configuring on all of the tenant data classes – this is covered in detail in the Part1 article in sections 6 and 7.   

However, we don’t want a sharding tenant to take a performance hit because of the (unnecessary) global query filter, so how do we handle that? The solution is to use the Tenant’s HasOwnDb property to alter the DataKey.

In the AuthP library if the Tenant’s HasOwnDb property is true (and the tenant type is single-level), then the GetTenantDataKey method doesn’t return the normal DataKey, but returns the “NoQueryFilter” string. This allows two things to happen:

6.1. The query filter is “turned off”

The AuthP contains an extension method called SetupSingleTenantShardingQueryFilter which adds a global query filter with a query that can be forced to true if the special DataKey string of “NoQueryFilter” – the code below shows what manual setup of what the code does (NOTE that the recommended automatic approach uses EF Core’s  metadata methods).

modelBuilder.Entity<Invoice>().HasQueryFilter(
x => DataKey == "NoQueryFilter" ||
x.DataKey == DataKey);
modelBuilder.Entity<Invoice>().HasIndex(x => x.DataKey);
modelBuilder.Entity<Invoice>().Property(x => DataKey).IsUnicode(false);
modelBuilder.Entity<Invoice>().Property(x =>
DataKey).HasMaxLength("NoQueryFilter".Length);

The important line is line 2, where the DataKey from the claim is compared with the “NoQueryFilter” string. If the 2 part of the query filter is true, then there is no need to filter on a DataKey . The SQL Server’s execution planner will see that the WHERE clause is always true and will remove WHERE clause from the execution.

NOTE: The AuthP library also supports a hierarchical multi-tenant type (see this article about a hierarchical multi-tenant) and in that case you still need a DataKey to access the various levels in the hierarchical data. Therefore, AuthP won’t turn off the query filer for hierarchical multi-tenant even if you give its own database.

6.2. It stops the setting of the DataKey

The other part of using a DataKey is to set the DataKey in any newly created entity. In a hybrid design if the DataKey is “NoQueryFilter”, then it returns immediately, thus removing the compute time to detect and update the entities that needed a DataKey. See the code below for the updated MarkWithDataKeyIfNeeded method.

public static void MarkWithDataKeyIfNeeded(this DbContext context, string accessKey)
{
if (accessKey == MultiTenantExtensions.DataKeyNoQueryFilter)
//Not using query filter so don't take the time to update the
return;
foreach (var entityEntry in context.ChangeTracker.Entries()
.Where(e => e.State == EntityState.Added))
{
var hasDataKey = entityEntry.Entity as IDataKeyFilterReadWrite;
if (hasDataKey != null && hasDataKey.DataKey == null)
hasDataKey.DataKey = accessKey;
}
}

This change doesn’t save much compute time, but still think its worth doing.

7. Migrate the tenant database if not used before

Same as sharding-only approach – see this section.

8. Extra features available in a hybrid design.

To make this work in a real application you need some extra code, for instance how to add new connection string the appsetting file while the application is running. There are also maintenance issues, such as converting a tenant from a shared-database to a dedicated-database tenant (or the other way around) while the application is running.

I will cover these issues and using Azure SQL elastic pools for sharding in a future article.

Conclusion

Multi-tenant applications allow you to serve many tenants from one software source. Having different levels of your service is also a good idea, as it allows your tenants to choose what level of service they want to pay for. And that payment needs to cover your costs for all the cloud service and databases you need to provide your multi-tenant application.

Within the features of your multi-tenant application is its performance, that is the speed (how quickly a query takes) and scalability (Wikipedia defines scalability as the property of a system to handle a growing amount of work by adding resources to the system). The more tenants you have then it takes more work to provide a good performance.

Providing a good performance requires lots of different parts: running multiple instances of the ASP.NET Core applications, upgrading your web server / database server, caching and so on. Sharding is a good approach for handing tenant’s data, but like the other performance improving options it increases the costs.

The client I talked about wanted something working as soon as possible (no surprize there!) so I build a shared-database tenant approach first. But they knew their big tenants would want their own database not just for the performance but for the security. The hybrid handles that and that’s why the AuthP library supports that.

So, if you are building a multi-tenant application for a company you might consider using my open source AuthP library to help you. The library contains all the setup and admin code which there is lot to look through, but that’s because building a multi-tenant application isn’t simple to do. There is a lot of documentation, article and a few videos and if something isn’t clear, then raise an issue and I will try to update the documentation.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK