Tag Archives: Software Engineering

How Do I Unit Test Database Access?

If as a .Net developer you’re serious about making sure your code is properly tested, one of the biggest problem areas has always been around database code. Even with the more widespread adoption of Object Relational Mapping (ORM) frameworks that abstract some of the complexity of database access, unit testing code that accesses the database is still difficult.

Over the years there have been various strategies to unit test database code that developers have come up with, and at various times I’ve seen projects that use all of them. I’ve also seen examples of large projects where you could see several of these techniques used in different parts of the code base.

The simplest technique – which isn’t really a technique as such – is just to test with the real database. Often this will be a special instance of the database created by the test run into which test data is loaded. The biggest argument against this idea is this isn’t really a unit test and should more correctly be considered an integration test. The biggest problem with this technique is using the real database is pretty slow, and that often leads to compromises to allow the test suite to be run in a reasonable time frame, either with reducing the number of tests, or not starting each test with a clean database configuration. Reducing the tests increases the risk that important conditions may not be properly tested, whilst not cleaning the database can lead to unexpected interactions between different tests. However in situations where you have complex logic in stored procedures in the database, sometimes this is the only way you can test them.

If you are practising Test Driven Development, where you are running unit tests repeatedly, having a unit test suite that takes even just minutes to run is a real problem.

A step on from using the real database is to use an alternative that is faster than your real database, for example an in memory database. This idea has come to more prominence recently as Microsoft has added an in memory database provider to their latest version of their current ORM, Entity Framework Core. However there have been third-party in memory options around for a while such as Effort. In both the official offering and third-party options they are drop in providers that can use the same entity framework code, but just go to an in memory provider instead. Purists will argue that even using an in memory provider this is still really an integration test rather than a unit test, you are merely replacing the dependent database rather than removing it. However to a software developer it can be an attractive option compared to the effort required in stubbing, mocking or faking a full ADO.Net provider. The other criticism of this technique is that because this is a different type of database being used from the live system there is the risk of behavioural differences between that and the real database. Having said that since Microsoft are highlighting testing as a benefit of their new in memory provider hopefully those will be few and far between.

Moving on from using an in memory database, the next option, at least until Entity Framework version 6 came along was to build a fake context object that could be used for testing. I’m not going to go into a detailed explanation of how to do this, as there are a number of good tutorials around, including this one from a Microsoft employee. The basic idea is that you construct a complete fake context object that gets injected into the code being tested instead of the real database context. Although you generally only need to construct the fake database context once, it is comparatively a lot of code, so it is pretty obvious why developers are delighted at the in memory provider included in Entity Framework Core. If you’re not needing to use the full abilities of the context, you do have the option of only partially implementing the fake context. The main criticism of using fakes is that again you’re running the risk of behavioural differences. This time it is because you’re using a different type of context, in particular under the covers you’re using the Microsoft LINQ to Objects classes to talk to the fake object, whereas the real database code will be using LINQ to Entities classes. Put simply whilst the syntax will be the same, you’re not exercising the actual database access code you will be using in the live system. You’re relying on Microsoft LINQ to Objects and Microsoft LINQ to Entities behaving in a similar fashion.

With the arrival of Entity Framework 6, there were some changes made that made it a lot easier to use a mocking framework instead of fake objects. Microsoft have a good guide to testing using a Mocking Framework in their Entity Framework documentation, alongside a revised guide to using a fake object as a test double. The amount of code to fully mock a context is similar to a faked context, but again if you only need to use part of the functionality of the context in your tests you only need mock the parts of the context you need. As with any mocked object it’s important that your mock behaviour is the same as the real object you’re trying to simulate for the tests, and this can be pretty complex with an object like a database context. Particularly problematic areas are around the behaviour of the SaveChanges functionality, where some fairly subtle bugs can creep in with code that passes a test but doesn’t work in production if for example you test by just expecting the SaveChanges method to be called.

That takes us on to a collection of other techniques that are more about isolating the database access code to make it easier to test.

The long standing way to do this is based around the Repository and Unit of Work patterns. There are a variety of ways you can implement these, for example you don’t necessarily need the Unit of Work and could just use the Repository pattern alone. There is a good Microsoft tutorial on the pattern using Entity Framework 5. The basic idea with the repository is to wrap the database code in the repository, and then mock the repository for subsequent tests. The database code in the repository just consists of simple create, read, update and delete (CRUD) functions. Whilst this was a common pattern before Entity Framework, and persisted with early versions of Entity Framework that are difficult to mock or fake, it has largely gone out of fashion. This is not least because the Entity Framework DbSet is an implementation of the same repository pattern so it is totally unnecessary to create an additional implementation of the Repository pattern for mocking now you can just mock or fake DbSet itself.

The other method that has been used for a long while is a traditional data access layer. The actual database code is hidden abstracted behind a series of method calls that take the parameters and return the data which can be easily mocked. Rather than being generic, the code inside each of those methods is for particular queries, and whilst that will be fairly simple database code that can be easily tested, there will be a single function for each query. There are good ways and bad ways of doing this, for example I have seen projects with vast library classes containing all of the queries used by the business logic – a bit of a maintenance nightmare at times. Probably a better design and more in keeping with SOLID principles is to have smaller classes more closely related to how the queries are being used. Either way there is a big overhead with lots of query functions together in a big data access layer.

Data access layers again have started to go out of fashion, however some of the principles behind them can still be applied. The single responsibility principle part of SOLID can be interpreted as suggesting that even if you don’t have a formal data access layer, you shouldn’t be putting database access code in the same method as business logic. The business logic should be working taking and receiving generic collections, rather than retrieving data and working directly on DbSets all in one method. You really shouldn’t have one method that queries data, manipulates it and writes it back. That application of the single responsibility then gives the separation of concerns that can make your code easier to test. The business logic can be tested using simple unit tests rather than having to write complicated unit tests that prime an in memory database or mock, call a function and then examine database contents to see what has happened. The database access methods are again a lot simpler, often just retrieving data, and can easily be supported by a simple mock of the part of the database context being used – a full blown in memory database, or fake or mock context isn’t needed.

In conclusion unit testing code that is accessing a database has always been difficult, and whilst some of the problems have been addressed over the years, it is still not easy. However if you are following good design practices such as DRY and SOLID the occasions when the only way to test something is via a database context should be fairly minimal. If you are finding that you are needing to do that it is well worth looking again at whether you have inadvertently violated the single responsibility principle. Even though the advent of the in memory database makes database context based testing easier, that doesn’t mean you should be using it everywhere. A simple unit test of a loosely coupled method will always be faster than testing a more complex method even using an in memory database. It is well worth considering whether your design would be improved by not coupling your business logic directly to you database access code.

How I Got Started in Software Development

Craig tagged me for this ages ago, so I guess I’d probably best put in my answers!

How old were you when you started programming?
I think I probably started aged around ten or eleven, with the arrival of a Sinclair Spectrum at home and a BBC Model B at school.

How did you get started in programming?
I’m not really sure of this one, but I guess it was just the interest in how the respective computers worked, and the challenge of getting them to do things.

What was your first language?
Definitely Spectrum Basic. Until I was actually doing a proper Computer Science course at school – something that seems to have fallen by the wayside for generic ICT classes now – most of the software development was at home, thanks in part to the listings that the computer magazines produced. Indeed back then there was even an entire magazine devoted to listings, Sincliar Programs, which I used to read and copy in the listings from.

What was the first real program that you wrote?
I’m not really sure, probably the old classic “Hello World” in Spectrum Basic. In terms of a real serious project, that was probably the programming project for my A-Level in Computer Science.

What languages have you used since you started programming?
Basic – Spectrum/BBC/VB6/VB.Net, C, C++, Modula-2, 68000 assembler, Pop-11, Occam, COBOL, SQLWindows, Java, C#.

What Was Your First Programming Gig?
This was a summer job as a result of a work experience week, working for the British Holstein Friesian Cattle Society in Rickmansworth. Their business was keeping pedigree records for several different breeds of cattle and sheep, and on request producing the lineage of an animal, usually when the animal was being bought or sold. I spent the summer there during a key period from the point of view of their computers as they were moving from an old system written in COBOL running on an ageing ICL ME-29, over to a system written in C on a UNIX box. The COBOL was probably the experience that left the biggest impression, as in COBOL the indentation of the lines of code is important, and the editor only went forwards through the file, and only allowed you to edit the line at the bottom of the screen!

If you knew then what you know now, would you have started programming?
Definitely yes, not least because if I hadn’t have gone into programming professionally I would have probably become one of the worryingly large group of amateur programmers I’ve come across hacking together key business applications, all of whom would much rather be doing a programming job than the one they ended up in.

If there is one thing you learned along the way that you would tell new developers what would that be?
One of the main tips I’d give is to give yourself a broad and general base – having done a couple of crash courses in VB doesn’t make you a programmer. Most of the best ones I’ve come across over the years have been flexible and adaptable, and usuall spent several years doing a broad based course – generally a degree of some sort – where they have been given the broad basis that allows them to cross-train quickly as technologies change. They’ve also got the broad IT knowledge to understand what else is going on around their job – you might not ever program professionally in assembly, or do any deep level AI research, but it is surprising how much of that sort of stuff is useful in ‘normal’ programming. Don’t get me wrong, I’ve met quite a few people who have learnt programming on-the-job and can cope, but equally I’ve met a lot of others who aren’t adaptable and end up stuck. For example one contractor we’ve had at work is self-taught and writes great ASP code – unfortunately he was hired to write ASP.Net, and he really doesn’t get the differences. With our entire suite of software being rewritten in a multi-tier C# design, using an object-oriented design – because he’d been taught how to write ASP pages in VB, rather than being taught software engineering he was effectively unable to do what we needed.

Along those lines, I’d also advise anybody to keep your skills current, and be especially wary of companies that don’t keep up to date, or program in something that isn’t mainstream. The biggest problem to get into is ending up in a very small niche programming market as I did with SQLWindows. Luckily I found a new employer who was happy to cross-train me into VB. You’re never going to totally escape from maintaining the creaky old VB6 systems, but make sure you’re in a company where you get a balance.

What’s the most fun you’ve ever had… programming?
I’ve had times when I’ve enjoyed programming professionally, but still one of the most fun bits of programming I’ve had was on the LPMud we had running at university. The game was all running in a variation of C, and one you reached wizard level you got to add to the game by writing code – indeed the wizards got programming level access to the guts of the game so you could actually manipulate the game environment on the fly. Programming for that there wasn’t really any pressure, and in terms of the game there weren’t really any limitations on what you could produce. Professional programming, the majority of the time you are working to a spec, and for a customer, and you don’t get nearly so much freedom in what you do.