How to test a multi-join mysql query



Hello all,
I have inherited a fairly gnarly multi-join mysql query (mariadb). The results of this query are used to populate invoice data - so it’s essential that it’s accurate. The data that the query references is created in several steps by different users of a website.

I’ve been googling for ways to test the output of the query, as I’m not happy to put it into production without at least some verification.

I have a database snapshot and non-production website deployed locally to work from, so I was wondering if the following would work:

  • follow the standard user workflow to create very specific data in the database (date, time, customer, location etc)
  • test the method which creates the invoice data to see if the invoice matches what I’ve put in

However, this will mean having to redo the manual workflow each time. Can anyone suggest a more repeatable way of doing this?


TBH I don’t know SQL so I don’t really know what I’m talking about, buuut:

Can you grab the SQL statements that are run during the standard user workflow and run those independently to create the test data rather than running it manually through the system? That could make it easier to automate testing?


There are a bunch of different ways to test something like this, depending on what the goal of the testing is.

If the goal of the testing is to determine whether the query itself works in isolation, you should create a set of data unit tests.

Start with the “happy path”. Make a test database with the same schema as the production snapshot, but no data. Add just enough data using SQL INSERT statements run by your testing system (which might just be a SQL script) to show the query working correctly.

Next explore every other way you can think of that the query is expected to perform correctly. Maybe in the first test, you identified a customer by their customer_name, but sometimes someone entering an invoice enters a customer_id instead. Set up data for each of these cases. Start from an empty database each time, so that you have a controlled environment for each test. Use the data in your database snapshot to figure out what all of these cases are. Are there IFs or CASEs in your query? Make sure you test every branch (THEN and ELSE).

Then try to think of each way the query could fail. Maybe you’re trying to join to a table that doesn’t have a matching row? Or has more matching rows than you expect? Is it possible for the person entering data to mix up the customer_name and customer_id fields? For each of these, create another test case and make sure the behavior is right.

The end result of this is a set of tests. Maybe they’re SQL scripts that return known values. Maybe you’re using a testing framework in the language of your choice. The goal here is to have a set of things you can run and if they all pass you can be pretty confident that the query is performing as expected.

If the goal of the testing is to determine if the system works correctly, then yes, you probably do want to follow the usual workflow (and every variation you can think of) to populate the data in the usual ways. You can start out by making test cases as lists of steps: “enter this in this field. submit the form. make sure the invoice says 123” that a human tester can step through. Again, explore every way the test should succeed, and as many ways as you can think of that it might fail.

Once you know what these tests should be you can investigate running through these steps automatically. Does each step have an API call associated with it? Maybe you could write a script using something like Requests to write a script that steps through them. Or if browser-side JS is critical to the flow, you could use a tool that automates a browser, like Selenium.