Testing Databases

github repository : https://github.com/DarrenForsythe/testing-databases

Welcome to the first in hopefully a series of blogs that will discuss some tools on how to test more effectively.

While my examples are in java the core concepts discussed not specific, even some tooling has implementations in other languages. I’ll finish each post with links to tools used and try to include other language implementations.

These blogs will discuss what I’ve seen as poorly, under-tested, or simply not tested integrations in services.

I’ll discuss some examples and how we could improve them to increase our confidence.

1
2
3
public static void main(String[] args){
    System.out.println("Lets go");
}

Mocks are always the answer

How often have you came across pieces of code close to this,

A simple service saves some arbitrary entity.

Note: The UserRepository could be any data access client e.g. JdbcTemplate with a row mapper provided, mongoose, or SQLAlchemy/Django-ORM. The point is all of these are higher level abstractions.

public class UserService {

    private final UserRepository userRepository;

    public UserService(final UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    public void saveUser(String name, String employeeId) {
        var user = new User();
        user.setName(name);
        user.setEmployeeId(employeeId);
        userRepository.save(user);
    }
}

Our user entity is a simple class,

@Entity
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;
    private String name;
    private String employeeId;
    
    //Getters, Setters, equals, hashcode omitted for clarity.
}

With an accompanying test,

class UserServiceTest {

    private UserService userService;
    private UserRepository userRepository;

    @BeforeEach
    void setUp() {
        userRepository = mock(UserRepository.class);
        userService = new UserService(userRepository);
    }

    @Test
    void userShouldSaveWithoutModificationOfNameOrEmployeeId() {
        userService.saveUser("darren", "1");
        var savedUser = new User();
        savedUser.setName("darren");
        savedUser.setEmployeeId("1");
        verify(userRepository).save(savedUser);
        verifyNoMoreInteractions(userRepository);
    }
}

Looks fine? If we run a coverage tool against the test it will report that the class is 100% tested! Yay, job done.

Wrong!

Let’s take a look at the UserRepository,

public interface UserRepository extends JpaRepository<User, Long> {
}

That’s it, the only thing we needed to do was add some arguments the type the repository will handle, User and what the ID will be a long in this case. There is a ton of complexity that Spring Data handles for us that developers do not either care about, or think about until it goes wrong.

The Spring Data team has tested its JpaRepository implementation that when the save(S entity) method is called an entity it will work its way through Hibernate, and eventually the entity will end up in a database.

Should we care about testing this any more than expected?

Yes we should. Of the top of my head in any semi-complex application,

This is “only” a “simple” save method.

Unless you work on the Spring Data team then I would doubt that a person can know all the potential flows, gotchas, and possible outcomes of the save method, don’t forget that this is abstracting Hibernate too, which in turns requires a datasource, which in turn would require a driver.

I do not trust any code base which only has mocked data access clients, as I am 100% sure any testing will involve the following.

me: I’ve updated the entity, the changelogs, and added some new methods to the UserRepository. How do we test that?

developer: We just deploy it and run through some [Rest API/UI] and verify there’s no errors

Would it not be better to write a better test to increase your confidence that you are using the JpaRepository and other interacting components/tools as expected?

What’s the alternative?

Hopefully it has been made obvious we need some sort of database to connect to and use real objects over mocks for these complex abstractions.

What’s our choices?

In-Memory

I am sure most people are well aware using an in-memory database is a pretty good solution to avoiding mocking and give us our first step to gaining more confidence.

Using some Spring Boot Test black magic, @DataJpaTest, an in-memory database can easily be created with a datasource automatically created for us.

Note: While this does contain Spring Boot black magic the same concept can be applied to any language. Configure an in-memory database, configure a datasource and connect the client to it etc.

@DataJpaTest
class UserServiceH2Test {

    private UserService userService;

    @BeforeEach
    void setUp(@Autowired UserRepository userRepository) {
        userService = new UserService(userRepository);
    }

    @Test
    void userShouldSaveWithoutModificationOfNameOrEmployeeId(
            @Autowired UserRepository userRepository) {
        userService.saveUser("darren", "1");

        assertThat(userRepository.findAll())
                .isNotEmpty()
                .hasSize(1)
                .element(0)
                .extracting(User::getEmployeeId, User::getName, User::getId)
                .containsOnly("1", "darren", 1L);
    }
}

No longer do we need to write a tedious bit of verification code that the object passed to the UserRepository#save method has changed. We have in fact went further and verified it hasn’t changed down to the database layer.

On the flip side we have introduced some Spring Boot annotations and magic which can lead to some interesting exceptions on start-up of the test, but I’ll discuss that in another blog.

Is this now good enough? We are,

No, it’s not. You do not deploy against H2. It does its very best with compatability modes, but it cannot replicate what would be used at runtime e.g. MySQL, DB2, Postgres, MSSQL etc.

Ideally you would not be using Hibernate’s DDL Auto mode to create the table for you, this has its own set of issues. Race conditions for greater than 1 instance, not knowing exactly what types for what database it should use. What next?

Test against what you will use at runtime

If you want to get to the highest levels of confidence you should be testing against what you will be connecting to at runtime.

Two years ago I still setup tests that would enable if they found a local MySql database that could be connected. We even found an issue between our in-memory database, and our deployed applications specifically around MySQL’s JSON Data Type. We didn’t need to wait until runtime and pickup on increased errors rates or manual tests.

That isn’t a great solution, it’s a local stateful database. We need a blank slate each time if we want them to be repeatable, and contained with in the repository to allow a CI pipeline to run them.

Testcontainers

Testcontainers to the rescue. I believe I found Testcontainers sometime mid 2019, and it has changed my life when it comes to testing abstractions or dealing with something that would cause network I/O.

Note: Various implementations exist, it was a java library at first (or rather a github gist ) but now has support with multiple languages with differing levels of support, but the same core idea.

Let’s make our simple application a little more complicated and introduce a changelog that will setup the required schema for the entity to be saved to the table, and assume we are going to use MySql at runtime.

<changeSet id="create-user-table" author="darrenforsythe">
    <createTable tableName="user"> 
        <column name="id" type="bigint" autoIncrement="true">
            <constraints primaryKey="true"/>
        </column>
        <column name="name" type="varchar(255)">
            <constraints nullable="false"/>
        </column>
        <column name="employee_id" type="varchar(255)"/>
    </createTable>
</changeSet>

Let’s breakdown our changelog,

  1. We are creating a user table for our user entity
  2. First column is our id table. The column is auto-incrementing of type BIGINT which maps directly to Java’s Long as both hold 8 Bytes of data.
  3. Next the name column which is not nullable and of type varchar(255)
  4. Finally, the employee_id column of type varchar(255) with nothing fancy about it.

We cannot easily test this against H2 it will run against MySQL. This is also a simple example, which I had to correct three times as I had named the employee_id column named wrong, and the incorrect column types that Hibernate was expecting.

Now let’s update our test to give us even more confidence by using Testcontainers

@DataJpaTest(excludeAutoConfiguration = TestDatabaseAutoConfiguration.class)
@Testcontainers
class UserServiceTestcontainersTest {

    @Container private static final MySQLContainer<?> MY_SQL_CONTAINER = new MySQLContainer<>();

    @DynamicPropertySource
    static void setupMySQLDB(DynamicPropertyRegistry propertyRegistry) {
        propertyRegistry.add("spring.datasource.username", MY_SQL_CONTAINER::getUsername);
        propertyRegistry.add("spring.datasource.password", MY_SQL_CONTAINER::getPassword);
        propertyRegistry.add("spring.datasource.url", MY_SQL_CONTAINER::getJdbcUrl);
    }

    private UserService userService;

    @BeforeEach
    void setUp(@Autowired UserRepository userRepository) {
        userService = new UserService(userRepository);
    }

    @Test
    void userShouldSaveWithoutModificationOfNameOrEmployeeId(
            @Autowired UserRepository userRepository) {
        userService.saveUser("darren", "1");

        assertThat(userRepository.findAll())
                .isNotEmpty()
                .hasSize(1)
                .element(0)
                .extracting(User::getEmployeeId, User::getName)
                .containsOnly("1", "darren");
    }
}

Not much has changed, we’ve disabled the TestDatabaseAutoConfiguration which was creating the H2 in-memory database for us, and added a MySQLContainer and a bit of Spring Framework magic will allow us to set dynamic properties easily.

We have not changed our assertion, and it still passes successfully but now we have increased confidence that our User entity, Liquibase changelog, and MySQL will all work together.

We could also add some more tests to further test our changelogs are acting as we expect.

@Test
void nameCannotBeNull() {
    assertThatThrownBy(() -> userService.saveUser(null, "1"))
            .hasCauseInstanceOf(ConstraintViolationException.class);
}

With 5 lines of code there’s now a guarantee that if the name column ever was updated nullable it would be caught before it ever got to into a development region.

Let’s do something more ridiculous,

@RepeatedTest(5)
void idShouldIncrementCorrectly(
        @Autowired UserRepository userRepository, RepetitionInfo repetitionInfo) {
    userService.saveUser("darren_" + repetitionInfo.getCurrentRepetition(), "1");
    assertThat(
                    userRepository.findById(
                            Integer.toUnsignedLong(repetitionInfo.getCurrentRepetition())))
            .isNotEmpty()
            .get()
            .extracting(User::getName)
            .isEqualTo("darren_" + repetitionInfo.getCurrentRepetition());
}

Using the Junit5 RepeatedTest annotation we repeat the same test 5 times, set the name to be darren_<currentIncrement> and then validate that on each pass we finding a record with the correct name by the current test increment.

Not something you would normally test for, but something that may work differently on different databases or slightly differently is easily tested for.

Conclusion

Hopefully it’s easy to see why you should seriously consider using testcontainers the next time you writing a test for a component in which a component uses a database client please do not mock it. The amount of confidence that the test will give you is superficial here.

Testcontainers is super simple to set up, all it requires is Docker which is ubiquitous across software development. There is no need to assume the functionality of the abstractions you are mocking.

It gets you incredibly close to being able to write complex tests, and re-create bugs, far more simply than if you start mocking everything.

I will show off more complex uses of testcontainers in the future, it is one of my favourite tools and have uncovered many bugs before they ever got into any release.

Working examples of the code in this blog is available at the top!

Thanks for reading.