If as a .Net developer you’re serious about making sure your code is properly tested, one of the biggest problem areas has always been around database code. Even with the more widespread adoption of Object Relational Mapping (ORM) frameworks that abstract some of the complexity of database access, unit testing code that accesses the database is still difficult.
Over the years there have been various strategies to unit test database code that developers have come up with, and at various times I’ve seen projects that use all of them. I’ve also seen examples of large projects where you could see several of these techniques used in different parts of the code base.
The simplest technique – which isn’t really a technique as such – is just to test with the real database. Often this will be a special instance of the database created by the test run into which test data is loaded. The biggest argument against this idea is this isn’t really a unit test and should more correctly be considered an integration test. The biggest problem with this technique is using the real database is pretty slow, and that often leads to compromises to allow the test suite to be run in a reasonable time frame, either with reducing the number of tests, or not starting each test with a clean database configuration. Reducing the tests increases the risk that important conditions may not be properly tested, whilst not cleaning the database can lead to unexpected interactions between different tests. However in situations where you have complex logic in stored procedures in the database, sometimes this is the only way you can test them.
If you are practising Test Driven Development, where you are running unit tests repeatedly, having a unit test suite that takes even just minutes to run is a real problem.
A step on from using the real database is to use an alternative that is faster than your real database, for example an in memory database. This idea has come to more prominence recently as Microsoft has added an in memory database provider to their latest version of their current ORM, Entity Framework Core. However there have been third-party in memory options around for a while such as Effort. In both the official offering and third-party options they are drop in providers that can use the same entity framework code, but just go to an in memory provider instead. Purists will argue that even using an in memory provider this is still really an integration test rather than a unit test, you are merely replacing the dependent database rather than removing it. However to a software developer it can be an attractive option compared to the effort required in stubbing, mocking or faking a full ADO.Net provider. The other criticism of this technique is that because this is a different type of database being used from the live system there is the risk of behavioural differences between that and the real database. Having said that since Microsoft are highlighting testing as a benefit of their new in memory provider hopefully those will be few and far between.
Moving on from using an in memory database, the next option, at least until Entity Framework version 6 came along was to build a fake context object that could be used for testing. I’m not going to go into a detailed explanation of how to do this, as there are a number of good tutorials around, including this one from a Microsoft employee. The basic idea is that you construct a complete fake context object that gets injected into the code being tested instead of the real database context. Although you generally only need to construct the fake database context once, it is comparatively a lot of code, so it is pretty obvious why developers are delighted at the in memory provider included in Entity Framework Core. If you’re not needing to use the full abilities of the context, you do have the option of only partially implementing the fake context. The main criticism of using fakes is that again you’re running the risk of behavioural differences. This time it is because you’re using a different type of context, in particular under the covers you’re using the Microsoft LINQ to Objects classes to talk to the fake object, whereas the real database code will be using LINQ to Entities classes. Put simply whilst the syntax will be the same, you’re not exercising the actual database access code you will be using in the live system. You’re relying on Microsoft LINQ to Objects and Microsoft LINQ to Entities behaving in a similar fashion.
With the arrival of Entity Framework 6, there were some changes made that made it a lot easier to use a mocking framework instead of fake objects. Microsoft have a good guide to testing using a Mocking Framework in their Entity Framework documentation, alongside a revised guide to using a fake object as a test double. The amount of code to fully mock a context is similar to a faked context, but again if you only need to use part of the functionality of the context in your tests you only need mock the parts of the context you need. As with any mocked object it’s important that your mock behaviour is the same as the real object you’re trying to simulate for the tests, and this can be pretty complex with an object like a database context. Particularly problematic areas are around the behaviour of the SaveChanges functionality, where some fairly subtle bugs can creep in with code that passes a test but doesn’t work in production if for example you test by just expecting the SaveChanges method to be called.
That takes us on to a collection of other techniques that are more about isolating the database access code to make it easier to test.
The long standing way to do this is based around the Repository and Unit of Work patterns. There are a variety of ways you can implement these, for example you don’t necessarily need the Unit of Work and could just use the Repository pattern alone. There is a good Microsoft tutorial on the pattern using Entity Framework 5. The basic idea with the repository is to wrap the database code in the repository, and then mock the repository for subsequent tests. The database code in the repository just consists of simple create, read, update and delete (CRUD) functions. Whilst this was a common pattern before Entity Framework, and persisted with early versions of Entity Framework that are difficult to mock or fake, it has largely gone out of fashion. This is not least because the Entity Framework DbSet is an implementation of the same repository pattern so it is totally unnecessary to create an additional implementation of the Repository pattern for mocking now you can just mock or fake DbSet itself.
The other method that has been used for a long while is a traditional data access layer. The actual database code is hidden abstracted behind a series of method calls that take the parameters and return the data which can be easily mocked. Rather than being generic, the code inside each of those methods is for particular queries, and whilst that will be fairly simple database code that can be easily tested, there will be a single function for each query. There are good ways and bad ways of doing this, for example I have seen projects with vast library classes containing all of the queries used by the business logic – a bit of a maintenance nightmare at times. Probably a better design and more in keeping with SOLID principles is to have smaller classes more closely related to how the queries are being used. Either way there is a big overhead with lots of query functions together in a big data access layer.
Data access layers again have started to go out of fashion, however some of the principles behind them can still be applied. The single responsibility principle part of SOLID can be interpreted as suggesting that even if you don’t have a formal data access layer, you shouldn’t be putting database access code in the same method as business logic. The business logic should be working taking and receiving generic collections, rather than retrieving data and working directly on DbSets all in one method. You really shouldn’t have one method that queries data, manipulates it and writes it back. That application of the single responsibility then gives the separation of concerns that can make your code easier to test. The business logic can be tested using simple unit tests rather than having to write complicated unit tests that prime an in memory database or mock, call a function and then examine database contents to see what has happened. The database access methods are again a lot simpler, often just retrieving data, and can easily be supported by a simple mock of the part of the database context being used – a full blown in memory database, or fake or mock context isn’t needed.
In conclusion unit testing code that is accessing a database has always been difficult, and whilst some of the problems have been addressed over the years, it is still not easy. However if you are following good design practices such as DRY and SOLID the occasions when the only way to test something is via a database context should be fairly minimal. If you are finding that you are needing to do that it is well worth looking again at whether you have inadvertently violated the single responsibility principle. Even though the advent of the in memory database makes database context based testing easier, that doesn’t mean you should be using it everywhere. A simple unit test of a loosely coupled method will always be faster than testing a more complex method even using an in memory database. It is well worth considering whether your design would be improved by not coupling your business logic directly to you database access code.
Also published on Medium.