The Icon Bar: Testing on RISC OS

Introduction

It's my belief that when writing RISC OS software, people don't consider writing code that is testable (i.e. abstracting it into modules that have discrete testable elements). This means that their code is mixed into the logic of the harness that it lives within (whether that be command line, RISC OS module, desktop tool or whatever), and so they argue that it's not possible to do testing easily.

The argument that they cannot easily test desktop applications is reasonable - but only to that point. Testing an application in the desktop is a form of system testing, and requires you to actually be able to automate those operations that a user might perform. There are tools for RISC OS, like KeyStroke, which allow system testing, and of course others could be written. But it's not commonly done. Doing system testing, though, is a later stage of automated testing, usually preceded by unit testing and integration testing.

Similarly, it is a fact that testing modules which execute in SVC mode can be difficult. If something goes wrong, the machine may just die. This is also a system test - the product is being tested as a whole.

Unit testing exercises the smallest units of the code. Integration testing brings those units together to test them in combination. Then system testing puts the whole together and tests the product's functionality. Usually there's system integration testing above that, where you're testing not just the product, but the product's interactions with the systems that it will actually be used in - the desktop, possibly in conjunction with real hardware.

Knowing what these are, how does this help with testing a module or desktop application? Well, all applications consist of interaction points where the user (who in the case of a module, might be another program) does something, and the application or module acts on it. In general these points are wired to the Wimp_Poll loop, or module entry points (SWIs, services, commands). Well structured applications and modules will have the work within those Poll entry and module entry points set up to just call internal functions. Those internal functions do the actual work, so it's easy to see how you might split off the code to test those internal functions.
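
As a rough illustration of that structure, here is a minimal C sketch. All of the names (document_t, document_append, event_t, handle_event) are invented for the example, and the event record is only a stand-in for a real Wimp_Poll block. The point is that the event handler decodes the event and then just calls an internal function which knows nothing about the desktop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical document type: it owns no desktop state at all. */
typedef struct {
    char   text[64];
    size_t length;
} document_t;

/* Internal function: does the actual work and makes no Wimp calls,
 * so a small command line program can call it directly. */
int document_append(document_t *doc, const char *s)
{
    size_t n;

    assert(doc != NULL && s != NULL);
    n = strlen(s);
    if (doc->length + n >= sizeof(doc->text))
        return 0;                       /* would overflow: refuse */
    memcpy(doc->text + doc->length, s, n + 1);
    doc->length += n;
    return 1;
}

/* Hypothetical event record standing in for a poll block.
 * The value 8 mirrors the Wimp's key-pressed reason code, but nothing
 * in this sketch depends on that. */
typedef struct {
    int  reason;
    char key;
} event_t;

enum { EVENT_KEY_PRESSED = 8 };

/* Poll-loop side: decode the event, then hand over to the internal code. */
int handle_event(document_t *doc, const event_t *ev)
{
    char s[2];

    assert(doc != NULL && ev != NULL);
    if (ev->reason != EVENT_KEY_PRESSED)
        return 0;                       /* not an event we handle here */
    s[0] = ev->key;
    s[1] = '\0';
    return document_append(doc, s);
}
```

With the code in this shape, document_append can be exercised by a test program that never goes near the desktop, while handle_event stays thin enough that there is little in it to go wrong.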

But not everyone thinks that way - this is one of the traits of 'legacy code'. 'Legacy code'¹ is generally accepted to be any code that doesn't have tests. One of the reasons that it doesn't have tests is because the code is heavily entangled, and it is difficult to see where you can test. If you've got Wimp_Poll code that does a lot of its work inside the poll loop, or in logic that is heavily reliant on its execution within the desktop, then this is the sort of code we're talking about. Similarly, if your SWI handler does a lot of its logic in the middle of the SWI dispatch code then this is the same problem. It's usually not this simple - you might have desktop-specific code in the depths of the code that lives inside functions you call from the poll loop. But it's important to start from where there are points of contention if you're going to make your code testable.

Where can you split code?

In some places you should be able to see that there are points where the code joins two things together - an interaction between the back end code that handles your documents, and the front end that handles you interacting with it like clicks, keys or messages or whatever. You should hopefully be able to see where you switch from handling a click to what that means for the documents or objects that your application handles. It will almost certainly mean moving some things around - refactoring the code so that it's easier to see the transition. You can then begin to split your code apart so that the logic that handles the desktop application (or the module SWI interface, or whatever) is on one side, and your document specific bits are on the other side.

If you keep doing this, you will eventually get to a point where the bit of code that initialises your document state (creating new documents, loading them, starting from templates, etc) is freed from any dependency on the application. And at that point you shouldn't have actually changed the operation of the application - just moved some things around. That always has a risk, but the point of testing is to reduce the risk of things breaking, so this initial pain will just have to be suffered. Trust that you're doing the right thing - not only will the code be testable, but it will become easier to follow because what's being done will be separate from how it is triggered².

There's a number of benefits to splitting code up in this way, in addition to that of being able to test more easily. You should be able to reason about what each section of code is responsible for more easily. This should make it easier to see when it's doing something that it shouldn't be - if you see some system calls to do hourglass operations in the middle of data operations, consider splitting them out to separate routines, because they're not related to the data operations. If you're producing separate pieces of code that have defined functions, it's easier to reuse them in other places. And, if you have split off a lot of the operations from the interactions, it becomes easier to re-target your code - instead of being a desktop application, your code can become a command line tool, or if it were appropriate, a module. And if it's less tied to the user interface, it can be made cross-platform more easily. There are many benefits to modular and reusable designs.

But maybe it's not so easy to see how you split up the user interface logic from the document logic. Try moving to a different level of the code - instead of working from the top, try working from a different point. Say you have an application that knows how to import a different document format - is it possible to split that off so that you can make it easy to be invoked with no (or few) dependencies? Yes? Do it!

Or maybe you can spot a smaller library that handles certain types of resources - fonts, images, structured application data, etc - which doesn't need to know about any other part of the system. They might be lower down the application, but if you can split them off so that they have minimal dependencies, then it will make it easier to see other parts that are isolatable, or at least less dependent.

What does splitting the code mean?

I've suggested that the code be split up, but not said what that means. That may depend on the particular code that you have.

In BASIC, I would suggest moving the code to a separate library that has prefixes on its procedure and function names - FNdocument_create, for example. I have very little BASIC code from which to draw examples, but in the dim past I wrote an IRC client which had little bubbles for the conversation¹⁰. The IRC client part had separate BASIC library files for the connection, for handling Channels, for Colouring, for messages related to mode changes, for configuration options, for processing user commands, and others. When the bubbles were added, they had three major parts: text block handling, sizing and rendering; bubble rendering with text blocks; and lists of bubbles which contain text. Including many libraries in a BASIC program shouldn't be a concern, and you can always use a compressor to make a single file if you like.

In assembler, you certainly want to move them to a separate file¹¹. It's also useful to be able to make those files able to describe their resources so that you're not mixing the resources with the rest of the assembler code. What I mean by that is to provide symbols that say how large the resource data is, so that other modules can allocate your space, without caring about the layout of your file's workspace. Or you could provide a couple of functions in the file that create workspace blocks - cnp_create to create the structures that are used by the cut-and-paste code, for example. This means you might have a few extra pointers to blocks sitting around in your assembler workspace, but if that's painful to you it may be time to consider moving to a high level language which allows more structured code. Do it anyway. It'll be better all around (smile).

In C, you may move them to a separate file to make them less dependent on other parts of your application. For C code you might create a completely separate library, which is built on its own, and you link against. In either case, I would use the number of #include directives you have in your source as an indication of how entwined the code is. If you can get down to just the system include files, then you've definitely got a unit-testable file.

In object oriented languages, like C++ or Python, it might be that you can move to a class first, and if that class needs to be separated from the parent file, you move it into its own module. In Python you might create small modules within a package - splitting out small sections of code from the main code because they themselves are more isolatable.

A way to split the code up

I have found a pattern that is useful for making this work on existing code, whether it be BASIC, Assembler, C, Perl, Python, JavaScript, or whatever (those are the ones I've actively worked with recently). It might be different for you, but here's what I've found works when you've got a file that mixes things together that you want to refactor¹². This works for cases where you've got a reasonable sized file, and it mixes code for different purposes, and may be dotted about all over the place.

Start to bring the related code together in the file.
- This means moving the code around so that sections that work on the same things are close together.
- Maybe you delimit the code with big ################# marks to show that you're dealing with something else.
- If you need to create new functions for pieces of code that you can see as isolated, do so, but only where that code lives at the end of another function - eg if you have a function that responds to the wimp, does some system calls, and then processes the data, split off the data processing part first.
- The reason for not doing more than simple movement and refactors at this point is to focus subsequent changes on the structured area, and not be working all over the file. This is particularly important if you're working with a group of people on the same code¹³, because moving all the sections you're working on together will mean that you don't tread on them as much.
Create functions for the loose code, in your new section.
- There will be the odd bits and pieces of code in the rest of the file that do things related to the code you're refactoring. Start to bring them into that section as new functions.
- If there's a place where the code is directly poking into structures that your refactored code owns, create a function for that. If you're writing assembler you might hesitate at changing an STR r0, some_place into a function. Tough luck... you're trying to make the code more maintainable and testable, so that means that special cases like that need to be really special to be valid. If you're feeling generous, make those sorts of operations into macros so that they're hiding the operation inside something that the refactored code controls. That's probably good enough for now.
- If a sequence of operations is repeated in the code in a number of places, then consider the right way to refactor it. For example, you might have code that does 'remove cursors, add some text, restore cursors' in multiple places in the code. If the sequence of operations is always related to the refactored code, then you can place it into the refactored area. But if it's not related, it can be moved into a function outside the refactored code, instead of repeated. In this case, it's not the testing of the refactored code that we're improving, but the manner in which the refactored code is being called - if it's always going through the same function to do an operation we can make sure that the function does it exactly the same way.
Restructure your resource usage
- This goes back to the example I gave previously of having a function which creates the workspace for the newly refactored code. Essentially, this means making the data used by the refactored code isolated as well.
- In assembler, you may get away with moving the definitions of the memory used into a separate header and referencing from your main block definitions. But you may be forced to create new functions for creating memory blocks for the refactored code.
- In BASIC it's similar, but you're going to be hurt by the fact that your variables are global. Still, try to use individual variables with prefixes on them, rather than repeating variable names that might clash with other parts of the code. And rather than sharing memory blocks for general usage, try to allocate dedicated memory for your new refactored code.
- In C it's very similar, really. Disentangle any structures you might have from the main application data. Create new structs for the new code, and don't include anything that's not needed in it. Maybe that means that you have to pass in the structure pointer to the functions of the new code, rather than referencing global variables. If your code can only ever be used once, then maybe you can get away with globals, but consider the implications.
- In C++ and Python (or other object oriented systems) these sorts of resource problems are less of an issue, as moving to using classes or namespaces will take care of them for you.
Lift the code into a new file and start using it.
- This stage is simple, if the others have gone well. You just take all the code out of the block in the file you were working in and you move it to a new file so that it's physically isolated from the rest of the code, rather than just logically isolated.
- You may need to create headers as well, if you're writing C or assembler, but they should only be moving the existing code.
- At this point the binary you build (in the case of compiled code) may actually be identical to how it was before the move. But even if it's not, you're changing no code or calls, just where the code lives.
- Inevitably it won't be quite that easy, and you'll have missed one or two things, but they should be easier to fix.
Try to use the file on its own.
- Having isolated the code into its own file, you may be able to use it on its own without a lot of the application. Name the file something clear about its function. Names are important to focus the intent. Naming the file Wimp13, or vduk is not an option¹⁴.
- Add to the top of a file a 'file prologue comment' (some form of comment block at the top of the file) which explains what the file is and how it relates to the rest of the system¹⁵. This should describe the idea you had in your head about what you were refactoring out. This is mostly a development thing, but it means that the scope of the file is constrained, and anyone adding things that don't relate to its intent is changing that scope. This should keep the file on target, and not turn into a collection of other functions, which you have to split up again.
- Try to create a small program that can use this library on its own. Maybe you need to include some support and lower level functions to create the program, but if you're finding you need higher level functions, you'll want to move things around so that you can avoid that.
- Sometimes you find that you do actually need the higher level functions, and it's not as clean as it was. That's ok. It's irritating, but it's ok. Because now your code is more split up and structured than it was, and you can at least see those functions. You may need to do some real design and restructuring to make it nicer rather than this simple refactor.
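
To make the 'Restructure your resource usage' step a little more concrete, here is a minimal C sketch of code that owns its own workspace instead of reaching into the application's globals. Only the name cnp_create comes from the text above. The structure layout and the other functions (cnp_destroy, cnp_copy, cnp_paste) are assumptions made up for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Workspace owned entirely by the cut-and-paste code (hypothetical layout). */
typedef struct cnp_state {
    char  *clip;        /* private copy of the most recently copied text */
    size_t clip_len;
} cnp_state;

/* Create the workspace; the caller keeps only the pointer. */
cnp_state *cnp_create(void)
{
    return calloc(1, sizeof(cnp_state));   /* NULL if allocation fails */
}

void cnp_destroy(cnp_state *cnp)
{
    if (cnp == NULL)
        return;
    free(cnp->clip);
    free(cnp);
}

/* Remember a copy of 'len' bytes of text; returns 1 on success, 0 on failure. */
int cnp_copy(cnp_state *cnp, const char *text, size_t len)
{
    char *copy;

    assert(cnp != NULL && text != NULL);
    copy = malloc(len + 1);
    if (copy == NULL)
        return 0;                          /* previous clip is left intact */
    memcpy(copy, text, len);
    copy[len] = '\0';
    free(cnp->clip);
    cnp->clip = copy;
    cnp->clip_len = len;
    return 1;
}

/* Return the current clip, or an empty string if nothing has been copied. */
const char *cnp_paste(const cnp_state *cnp)
{
    assert(cnp != NULL);
    return cnp->clip != NULL ? cnp->clip : "";
}
```

Because every function takes the cnp_state pointer, a test program can create as many independent workspaces as it likes, and the file needs nothing beyond the system headers.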

This process has worked for me in many cases where there's been code that's tied up in different ways - and it does work in many languages. It also works on writing new code. You can introduce the code into an existing file in a block, and start wiring your new features in directly. Essentially, you're writing code in the form that it will be in the first stage. If it turns out it needs to stay with the rest of the code because it's not worth moving, or you don't have time, or it just isn't as easy to isolate, then it's already in a place on its own. And if you do find it's worth moving, it's easier to do so.

How do you test?

Let's say that you've found some interface points and split some code from the main application. How do you test them? An argument I've heard is that there isn't a good testing framework to make things easy. It might be true that you don't have such a framework. But they exist elsewhere, and you can create your own and - most importantly - that's just passing the problem off to someone else. If you care about your product, you take steps to remove those barriers. So basically I toss that argument aside as being invalid, but also an excuse for not getting on and doing what needs to be done. But how do you test things without having a library to help you? You know how your code works, so use it in a program and check the results. That's all testing is, at the end of the day.

What does that mean? Say, you've got a small section of code that can load a document, or create a new one. You want to test that it works. It will almost certainly have a number of different forms it'll need to be called with - if you're loading a document you could have many stock documents to try. So that's what you do, you store a number of documents with different features in your code. And you write a little program that tries to call the 'load document' code with those files. At its most basic level, you've now got a test. If your little program crashes with those documents, you've got a bug, and your test is doing its job. Create a makefile that can build your code. Create a target 'test' which runs the result of that build. And run amu test or make test whenever you need to check that you've not broken things.
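
A minimal C sketch of such a 'little program' might look like the following. The loader here (document_load) and its trivial line-counting 'format' are invented stand-ins for your real loader, and the stock documents are embedded as strings only to keep the sketch self-contained; in practice they would more likely be files stored alongside the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical loaded document: just line statistics for this sketch. */
typedef struct {
    size_t lines;
    size_t longest;
} document;

/* Stand-in for the real loader. Returns 1 on success, 0 on failure. */
int document_load(const char *data, size_t len, document *doc)
{
    size_t i, run = 0;

    assert(doc != NULL);
    doc->lines = 0;
    doc->longest = 0;
    if (data == NULL)
        return 0;
    for (i = 0; i < len; i++) {
        if (data[i] == '\n') {
            doc->lines++;
            if (run > doc->longest) doc->longest = run;
            run = 0;
        } else {
            run++;
        }
    }
    if (run > 0) {                      /* final line without a newline */
        doc->lines++;
        if (run > doc->longest) doc->longest = run;
    }
    return 1;
}

/* Stock documents with different features (assumed examples). */
static const char *const stock_documents[] = {
    "",                     /* empty document */
    "hello",                /* no trailing newline */
    "hello\nworld\n",       /* ordinary document */
    "\n\n\n"                /* nothing but blank lines */
};

/* The whole test: feed every stock document to the loader. It checks
 * nothing except that we get to the end without crashing, and returns
 * the number of documents that were tried. */
int run_loader_smoke_test(void)
{
    size_t i;
    size_t n = sizeof(stock_documents) / sizeof(stock_documents[0]);

    for (i = 0; i < n; i++) {
        document doc;
        printf("Loading stock document %lu\n", (unsigned long)i);
        (void)document_load(stock_documents[i],
                            strlen(stock_documents[i]), &doc);
    }
    return (int)n;
}
```

A test program's main would do little more than call run_loader_smoke_test, and the 'test' target in the makefile would run that program.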

Is that really testing?

You'll surely be yelling "but I didn't test anything" at this point. Yes, you did. You created a very simple regression test that checks that when given a bunch of files your loader code doesn't crash. Maybe it doesn't load the right thing. Maybe it creates a document with garbage in, but the point is that that function doesn't crash. We'll come to more specific testing later, but this is a huge step up from where you were before with a desktop application and manual testing.

Knowing that it doesn't crash when you load the document means that if a user reports that your application crashed when they loaded their document, you can reason two things... either their document exhibits something special that you weren't testing for, or the problem lies outside the loader. You can hopefully take their document and add it to your collection of stock documents, and run it through your loader test. If it doesn't crash in the loader test, then the crash lies elsewhere, and you've cut your search space. If it does crash then you add the document to those that the loader will test (now your tests are failing), and fix the problem (now the tests are fixed). These stock documents should either be checked in with your source, or available from a repository that you can find them in ³ ¹⁷.

You'll notice that I said 'if it doesn't crash in the loader test, then the crash lies elsewhere', and not '... the problem lies elsewhere'. The loader test is only checking that you're not crashing. As I mentioned earlier (and you yelled about), we're not actually checking anything in the code. Ok. That doesn't mean that the loader test isn't valuable - it was helping with the basic crash case, but also (and equally importantly) it's given you some test code that you can expand when you find more things that are important.

Let us say that the document that your user supplied triggers a crash some time later because the loader put things in the wrong place - it created a structure wrongly - and this crashes when you try to use it. The loader test didn't catch that. But it probably should have. You've found where it crashes, and what variables it was trying to use. At this point you might be trying to debug how it got to that point, putting in logging to show the value of variables and the like. That's ok, but that sort of debugging is generally only useful for the instance that you use it. The point of automated tests is to make sure that you don't get into this state again. So add a little code into your loader test that checks whether the structure was constructed right. Hopefully, you would find that the test crashes, because the loader had in fact done the wrong thing. Then you can fix it, and your tests will work again!
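
As a sketch of what 'checks whether the structure was constructed right' could mean in C, suppose (purely as an assumption for this example) that the loader builds a linked list of paragraphs with a count and a tail pointer. A hypothetical document_check function can then verify that those pieces agree with each other, and the loader test can call it after every load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure that a loader might build. */
typedef struct paragraph {
    const char       *text;
    struct paragraph *next;
} paragraph;

typedef struct {
    paragraph *first;
    paragraph *last;
    size_t     count;
} document;

/* Check that the pieces of the structure agree with one another.
 * Returns the number of problems found; 0 means it looks consistent. */
int document_check(const document *doc)
{
    const paragraph *p;
    const paragraph *tail = NULL;
    size_t seen = 0;
    int problems = 0;

    assert(doc != NULL);
    /* Stop one past the expected count so a looped list cannot hang us. */
    for (p = doc->first; p != NULL && seen <= doc->count; p = p->next) {
        if (p->text == NULL)
            problems++;                 /* paragraph with no text */
        tail = p;
        seen++;
    }
    if (seen != doc->count)
        problems++;                     /* count disagrees with the list */
    if (tail != doc->last)
        problems++;                     /* tail pointer is wrong */
    return problems;
}
```

In the loader test, an assert(document_check(&doc) == 0) after each load turns a crash that would otherwise show up much later in the application into an immediate test failure.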

But if the structure was all fine after the loader did its job, then either that's not the problem, or the problem lies elsewhere. Doesn't matter - you've improved your test code. Even though you haven't fixed the user's problem (which is, after all, the ultimate goal) you have a better product for it. Not in a way that changes the features that the user sees, or even in its current stability, but because in the future there's one fewer thing to go wrong because it's being checked. So be happy.

Ok, that's enough celebration - the user's still unhappy because their document crashes. If it's not in the loader that there's a problem, then something else must be affecting the code. Maybe you've got some data corruption? Maybe there's an oddity in how that document is processed when it's displayed on the screen? We can guess on what's going on, but one way to deal with the issue at this point is to look at what the product (and the system) is doing. Presumably your application calls the loader, then it does a bunch of desktop things, and maybe opens a window on that document. Any of those operations could have caused the loaded document to be corrupted or otherwise able to crash.

So have a look at them. Are there any that can be isolated out in the same way we did with the loader? We can create a new test, which calls the loader and then does the bit that you've isolated on the loaded document. Maybe this finds the problem. Maybe it doesn't. But again, you've got a little bit more test code to make you feel better about the product. What about memory corruption? If there's something weird going on then maybe we're overwriting things when some condition is triggered by this user's magic document.

We probably won't find this if we build these little tests as they stand, because the memory usage patterns will be completely different. But these are now little programs that do the smallest amount of work. We can introduce memory checking to them, without hurting the main application's performance and memory usage. I recommend the C 'Fortify' library for this, although other solutions exist⁴. Maybe it turns out that your loader was allocating a structure with too few bytes, and then the important data was being overwritten in a later call - Fortify would tell you. Fortify (and others of its ilk) try to replace your memory allocation functions with code that tracks those heap blocks, and who allocated them. Every so often (or every time, if you wish) they can check that these blocks haven't become broken - that nobody has written anything before or after them accidentally. This tells you whether your blocks were too small or someone is doing things wrong. Similarly, because it is tracking the memory you use, it can report whether the memory you have used was not freed on leaving a block of code - if you apply this at the end of your program's execution you can see all the memory that you haven't freed. It's useful to ensure that you're not intentionally leaving blocks allocated, so that these diagnostics are clearer.
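
To show the idea behind this kind of checking, here is a deliberately tiny C sketch of guard bytes placed before and after each block. This is not Fortify's interface, and it does none of the tracking of who allocated what; the names guarded_alloc, guarded_check and guarded_free are invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8
#define GUARD_BYTE 0xA5

/* Header recording the user's size, padded to 8 bytes so that the
 * block handed back keeps the alignment that malloc gave us. */
typedef union {
    size_t        size;
    unsigned char pad[8];
} guard_header;

static unsigned char *raw_from_user(void *p)
{
    return (unsigned char *)p - GUARD_SIZE - sizeof(guard_header);
}

/* Layout: [header][guard bytes][user block][guard bytes] */
void *guarded_alloc(size_t size)
{
    unsigned char *raw;

    raw = malloc(sizeof(guard_header) + GUARD_SIZE + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    ((guard_header *)raw)->size = size;
    memset(raw + sizeof(guard_header), GUARD_BYTE, GUARD_SIZE);
    memset(raw + sizeof(guard_header) + GUARD_SIZE + size,
           GUARD_BYTE, GUARD_SIZE);
    return raw + sizeof(guard_header) + GUARD_SIZE;
}

/* Returns 0 if both guards are intact, with bit 0 set if something wrote
 * before the block and bit 1 set if something wrote after it. */
int guarded_check(void *p)
{
    unsigned char *raw, *before, *after;
    int i, result = 0;

    assert(p != NULL);
    raw = raw_from_user(p);
    before = raw + sizeof(guard_header);
    after = (unsigned char *)p + ((guard_header *)raw)->size;
    for (i = 0; i < GUARD_SIZE; i++) {
        if (before[i] != GUARD_BYTE) result |= 1;
        if (after[i] != GUARD_BYTE)  result |= 2;
    }
    return result;
}

/* Free a block, stopping the program if its guards have been damaged. */
void guarded_free(void *p)
{
    if (p == NULL)
        return;
    assert(guarded_check(p) == 0);
    free(raw_from_user(p));
}
```

A real library does far more than this, but even this much would catch a loader that allocates a structure a few bytes too small and then writes past the end of it.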

You can, of course, build your whole application with memory checking, but it'll be slower and more memory hungry and probably not releasable like that. Do that, for your own use, then when something goes wrong you'll be able to see warnings about corruption and double frees well in advance of it reaching a user. Just because you want to create unit and integration tests, doesn't mean you can't make your life easier when you're manually testing. Maybe when a user sees a problem you give them your special build and it might tell you about a problem you weren't catching in your tests.

Any time that you improve the unit test to make it recognise a problem, or to exercise some code that would be buried in the middle of your application, you've made the debugging more isolated. You can tie a problem to a smaller area of the code, and you should only need to look at that area of the code - and none of the rest of the application. This saves you time, and can improve your confidence in the product.

I've talked about the example of a desktop application here. This won't be possible in some applications that are purely interacting with other applications, but that sort of application is actually less common. Almost all applications have some internal state that they work with, data that they process, or files that they load or create, which you can isolate. And if you can isolate it, you can test it.

What about modules?

But what about modules? Like applications they will also have interface points where two functions or operations interact with one another. These can be extracted into simple command line tools, just like we did with the above examples for the applications. If the modules are in assembler, then a) ask yourself what you're doing with your life⁵, and b) make the code able to be called from your little C program. There's lots of tricks for doing this, but essentially, you're trying to make something that you can test in isolation, and whilst assembler isn't quite so amenable to that, it's still quite possible.

There will always be special cases that are hard - the more tied up in the RISC OS interfaces you become, the harder it is to test. The point of testing is to give you more confidence in the code and remove bugs, and if the code is tied up in difficult to test interfaces, it will be difficult to understand, so start reworking the code now.

The point for module code is to test it in the safe environment of regular applications, before it makes it into the dangerous world of SVC mode. It's the same reason as we use for desktop applications, but with far more destructive consequences when things go wrong.

In the same sort of way that you extract code out of your application, you can extract parts of your module for testing. They have defined interfaces - commands or SWIs, or even service calls - which can be exercised. You can make calls into the code, running as a simple command line tool, which give it different sets of parameters. This is easier if you're writing modules in C (which of course you should) because you can extract the interfaces that are called by the CMHG header interfaces and see what happens. If, for example, you were building a module like MimeMap which reads in a file and then uses that data to respond to requests about it, there's nothing at all special about it that needs to be in a module. Testing the parts of the code that read files and that respond to particular requests, in that case, should be a trivial exercise.
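
As a rough sketch of that kind of extraction, the following C code parses mapping lines and answers lookups without anything module-specific in it. The one-line 'mime/type extension' format, the structure and the function names are simplifications invented for this example and are not MimeMap's real file format or interface. In a module, the SWI handler would only unpack its registers and call functions like these.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16

typedef struct {
    char mime[32];
    char ext[8];
} map_entry;

typedef struct {
    map_entry entries[MAX_ENTRIES];
    int       count;
} mime_map;

/* Parse one line of the (hypothetical) form "mime/type ext".
 * Returns 1 if an entry was added, 0 for comments, blank or bad lines. */
int mime_map_add_line(mime_map *map, const char *line)
{
    map_entry e;

    assert(map != NULL && line != NULL);
    if (line[0] == '#')
        return 0;                       /* comment line */
    if (map->count >= MAX_ENTRIES)
        return 0;                       /* table is full */
    if (sscanf(line, "%31s %7s", e.mime, e.ext) != 2)
        return 0;                       /* blank or malformed line */
    map->entries[map->count++] = e;
    return 1;
}

/* Look up the MIME type for an extension; NULL if it is not known. */
const char *mime_map_lookup(const mime_map *map, const char *ext)
{
    int i;

    assert(map != NULL && ext != NULL);
    for (i = 0; i < map->count; i++) {
        if (strcmp(map->entries[i].ext, ext) == 0)
            return map->entries[i].mime;
    }
    return NULL;
}
```

Both functions can be driven from an ordinary command line program with whatever awkward lines you can think of, long before the code goes anywhere near SVC mode.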

What about cases that deal with hardware? Well, if you're poking parts of memory mapped devices, you're not going to find that so easy to do in your non-module code - and you shouldn't. Instead of directly poking the devices, separate that code into an access library that you call instead. Then, when you build your tests, link the main code against a fake access library that provides results that 'kinda look like the device you're talking to'. Yes, you'll be implementing a faked little bit of the hardware that you're talking to, and working to what you expect it to do, but that's the point of testing. Your code will work just fine if the thing that you're talking to works the way that you're expecting it to, and you can check that without involving the hardware. Which is also useful if someone without that hardware wants to update things.

You're never going to be able to handle all the cases like this, but you can handle some, and that may be enough - enough to check that the rest of the code, which handles the actual data, is able to do the right things.

But these tests don't test the right thing!

Some of you may be arguing that what matters for the application is that the operations that the user performs work, and not that these little bits of the implementation work. Quite right, that's the ultimate goal. The reasoning is that it's hard to do the system testing, but easier to do the unit and integration testing. So if you do those tests, the number of things that can go wrong will be fewer. In theory the things that can go wrong are only related to the way that the system as a whole is put together, and the calls that it makes to those tested areas of the code. That hugely reduces the number of things that need to be checked when something goes wrong. And if there are fewer things to go wrong for the user, because you've tested all the back end bits, then that's likely to be a better experience for the user, and a better experience for you when a bug report comes in and you have fewer areas of problem to consider. If you can do system testing, then that's great, but having confidence about the lower level features means that you can build on them with less worry.

Of course, testing is never perfect, so you never completely trust that you've tested every possible thing that could go wrong. So you may still end up fixing things that were in areas that you are testing. That's good. Each time you make such a fix, add a test to check that it was fixed. That way you won't break it in a future change - these sorts of checks are 'regression tests'⁶, because they stop you regressing back to a bug you already fixed, or decision that you previously made.

Maybe you're thinking that these tests get in the way of adding new features. That's certainly an effect - adding a new feature means you have to test it, and it may in the process break other tests because the behaviour you expected has changed. I'm pretty much going to dismiss the objection that you have to test your new feature, because if you're not willing to test the new feature then you wouldn't be reading this, and you should always try to increase the amount of things you have tested with everything you do.

The second of these effects is that you break other tests because the behaviour changes. Again, this is good. Either those tests weren't right in the first place and you're now using the code in a way that was expected but not being tested right, or you have changed the behaviour. The former is a test that's wrong, and you can change it to do what you really expected. In the latter case, the warning that you're changing the behaviour is telling you that there will be impact on anyone that is using the interface. Any callers of that interface will have worked the way that the code was expecting before, but now need to be checked to see if they're still expecting the same thing. Yes, that failure told you that the tests changed, but it was a warning that there is a bigger impact of the change than you considered. Isn't it better that your tests tell you that, than your users tell you?

Ways to extend your testing

Let's say that you've added a bunch of tests for your application, or module, or whatever. You have a feeling that something is going wrong on a user's system. Maybe you can send them one of your test programs to run on their machine. Probably it'll be just fine because you've isolated that code away from any dependencies, but maybe something odd shows up. Maybe you'll actually end up with one of your test programs becoming a command line version of your application - able to import your documents, process them and write them out. Maybe that's not a great feature for many users, but in some cases it's amazingly useful.

Or maybe you're feeling that there's a bug in extreme circumstances. If you're running the code in isolation then you may be able to inject failures into the code more easily - in C use some #ifdefs to check for a testing mode and to fail randomly or on request. You might create a variant of your loader test which fails every Nth allocation, then run it for a few thousand iterations to check that it doesn't crash when it's not got a lot of memory ⁷. Or a variant that exhaustively calls an interface with all possible (or a representative set of) combinations of a parameter. Neither of these sorts of test will be fast, but that's fine. You might not include them in your day-to-day testing, but you can run them now and then to check that things still work reliably in the most extreme circumstances.
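As a rough illustration of the allocation-failure idea, here is a minimal sketch in C. Everything in it is hypothetical and invented for the example: the `TESTING` symbol, the `APP_MALLOC` macro, the `squeeze_*` functions and the stand-in `document_load` loader are not taken from any real application or library. It assumes the code under test makes its allocations through a macro that the test build can redirect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Normally the test build would pass -DTESTING; it is forced on here
 * only so that this sketch compiles on its own. */
#ifndef TESTING
#define TESTING 1
#endif

#ifdef TESTING
/* Hypothetical test-only allocator: fails every Nth allocation. */
static unsigned long squeeze_count;      /* allocations seen so far */
static unsigned long squeeze_fail_every; /* 0 means never fail */

static void squeeze_reset(unsigned long fail_every)
{
    squeeze_count = 0;
    squeeze_fail_every = fail_every;
}

static void *squeeze_malloc(size_t size)
{
    squeeze_count++;
    if (squeeze_fail_every != 0 && squeeze_count % squeeze_fail_every == 0)
        return NULL; /* injected failure */
    return malloc(size);
}
#define APP_MALLOC(size) squeeze_malloc(size)
#else
#define APP_MALLOC(size) malloc(size)
#endif

/* Stand-in for the real loader: three allocations, each of which may fail. */
typedef struct document {
    char *title;
    char *body;
} document_t;

static void document_free(document_t *doc)
{
    if (doc == NULL)
        return;
    free(doc->title); /* free(NULL) is harmless */
    free(doc->body);
    free(doc);
}

static document_t *document_load(const char *title, const char *body)
{
    document_t *doc = APP_MALLOC(sizeof(*doc));
    if (doc == NULL)
        return NULL;
    doc->title = APP_MALLOC(strlen(title) + 1);
    doc->body = APP_MALLOC(strlen(body) + 1);
    if (doc->title == NULL || doc->body == NULL) {
        document_free(doc); /* the failure path we want exercised */
        return NULL;
    }
    strcpy(doc->title, title);
    strcpy(doc->body, body);
    return doc;
}

/* Run the loader repeatedly with failures injected; returns how many
 * loads succeeded. Getting to the end without crashing is the real test. */
int squeeze_loads(unsigned long fail_every, int iterations)
{
    int successes = 0;
    int i;

    squeeze_reset(fail_every);
    for (i = 0; i < iterations; i++) {
        document_t *doc = document_load("title", "body text");
        if (doc != NULL) {
            assert(strcmp(doc->title, "title") == 0);
            assert(strcmp(doc->body, "body text") == 0);
            successes++;
            document_free(doc);
        }
    }
    return successes;
}
```

A test program's main might call `squeeze_loads` in a loop, stepping `fail_every` from 1 upwards with a few thousand iterations each time, ideally under a memory checker so that leaks on the failure path show up as well as crashes. Routing allocations through a macro, rather than redefining `malloc` itself, is just one way to do it; a library that replaces the allocator wholesale would serve the same purpose.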

There isn't an end to this process, because there's never enough testing ⁸, but once you have some of these tests in place you can make sure that they get run regularly. Build them into Makefiles so that you can invoke them all in one go. Yeah, it might take a while to run. Do that when you know you're going for a cup of tea. If you've got an Obey file or Makefile that creates you a new release, make it run the tests before it does anything. If you haven't got such a tool, make one - you'll thank yourself later when you get consistent releases created, which have been tested. If you've got a check list of things that you do manually before release, add running any long tests to that list.

And, of course, if you can run these tests on every commit to your source control, do it ⁹. There is nothing better than having an email turn up that tells you that you've broken the application with the last change you made. Except maybe having an email turn up to tell you that everything was working (smile)

Finally...

There's a lot more that can be said on testing. I did a presentation recently which includes some of the discussion here, but also included some worked examples. The slides and speaker notes have been published, along with some references to other sources of information that may be helpful.

You can find them here.

If you disagree with any of these methods, or feel that my view on the focus of testing is wrong, then that's just fine. Obviously I feel that the way I've been doing things works reasonably well for me, but that doesn't mean that it's right for anyone else. If you're thinking about testing, and applying any sort of testing methodology to your product, then that's a good thing - if any of this article or my presentation makes you want to put more or better testing into your product, then that is a win for everyone!

This is a term that comes from "Working effectively with Legacy Code". It's a good book, if you care about trying to test things that just don't seem amenable to testing. If you want to just understand some of the ideas, there's a great summary. ↩
If you've written applications in Toolbox, you'll already have seen that it encourages you to think about how to separate the operation from the trigger. Toolbox user events are there to allow you to attach the operations to many sources - a menu operation, an action button, a key press, etc. ↩
It's generally a bad idea to have large binaries in your source control. However, if they're not stupidly large, and they're unchanging (which such documents should be) there shouldn't be much problem with this. If your sources get large and you don't feel comfortable doing this, then move the stock documents to another repository, or some different storage that is accessible and safe from deletion. ↩
Fortify is a C library that replaces your malloc/free/realloc/etc calls with calls to itself which have checks on the ends of the allocations, and checks for double frees and other bad operations. You can find a copy on the Internet; specifically it's used inside the Nettle source code. It's used in most of my software testing. Other solutions like the Artifex 'memento' library do the same job with a different license. Dr Smith's tools also include HeapCheck which will do a very similar job. On unix systems valgrind can do a similar job. macOS has similar tools in its libraries for checking of memory, and dedicated leak checking tools. ↩
Yeah, I know it's legacy code and you can't help that, but you're not making it worse are you? ↩
Actually most tests that you write this way are regression tests. Because you're automating them to check that you keep the same behaviour they fall into this category of 'regression tests'. ↩
I'm a big fan of this type of 'memory squeeze' testing - introducing failures that trigger the failure code path means that you get to find out whether that path works. Invariably that code path won't be tested often, but when it needs to be used, it is 'unfortunate' if the application crashes because although you wrote code to handle the failure, it was broken! ↩
There may never be enough testing to hit every combination (because the time needed to exercise every combination in any program rapidly exceeds the lifetime of humans, or even the time until the heat death of the universe), but it is important to be pragmatic about testing. You may want to do full system testing, but it is impractical, so you accept unit and integration testing. And you might want to test the core of your product, but initially you can only easily test the utility functions. Maybe the pragmatic decision is to test those utilities, because they're your foundations. But then they might be considered 'battle hardened' because if they were broken, nothing would work - on the other hand, they may be the cause of your problems because just now and then they go wrong and give rise to odd failures. Or you may choose to put the effort into refactoring the code so that it's possible to test closer to the core of your code. Where do you gain the most benefit? Benefit might mean that you feel more confident of the code. Or that you know that there's an area that is scary and you know you'll fix problems when you refactor. Or that you know that you have to rework a section soon, so adding tests now would make reworking it safer. Or that you know that you have to rework a section soon, so it's not worth testing it now. Only you will know what is the best area to attack in the testing of your code. ↩
build.riscos.online. ↩
MyRC was written in about 2002, I guess, and was heavily influenced by Comic Chat (I still think that its goal of making conversations into stories was pretty awesome, and its implementation - excepting the effect on other IRC users - was impressive). It looks hugely dated now (probably looked dated then), but the original was thrown together in about a day, and then the bubbles hacked on to it later. ↩
Stop writing assembler. Please. Or at least only for those bits that you absolutely must. ↩
Anyone who has used IDE refactoring tools will know that they only really work if your code is already reasonably structured. If you can use those tools, go for it! But if you cannot, hopefully these notes help. ↩
Or working with yourself because you're doing multiple branches of work at once. ↩
(facepalm) ↩
You will see that lots of open source components prefer to instead place the legalese of their license at the top of the file. Instead of explaining what the file is and how it works - you know, things that might actually help understand the code and be related to its development - they repeat the license details in every single file. Resist the urge to do this. Use a single line reference to the license that you place it under. What is more important to understanding the file? Details of what it does and how it works, or repeating the same thing you've said elsewhere? If you must include license text, ensure that your explanation of the file is longer than the license text you quote within it. To do otherwise is misplacing the emphasis of what you are trying to do. Sorry, that's got nothing to do with testing, but lots to do with the mindset you should be in when writing code - and if you're thinking more about the restrictions you want to place on your users than about helping them to use your code, you have misplaced values ¹⁶. ↩
In my opinion, of course. Feel free to do what you like, but... (eye-roll) ↩
I'll give an example of testing using a document repository, which I created a couple of years ago. I wanted to check that BBC BASIC was working properly. It's kinda important and I wanted to know that I was building it right, and this was when I was starting out making sure that I could test things, using Pyromaniac.

Creating a collection of documents to test - in this case BBC BASIC programs - was relatively easy. I had written a library¹⁸ to trawl the Rosetta Code site for C files, so that I could feed them to the C compiler (for exactly the same reason), so I just downloaded all the BBC BASIC files.

Next I set up a testing script to run BBC BASIC to load every file and tell me what happened. Either it'll work, or it won't. At least that was the hope. It turned out some of the programs don't work, either because they try to use libraries from 'BBC BASIC for Windows', or because they try to read input (which I cannot generically give them) or because they actually never end. All the extracted documents (because they're text) are actually checked into the repository, so that I have a copy of them exactly as they were.

This means that I can now check that BBC BASIC hasn't regressed through running these tests now and then. Not that I've changed it much recently, but the fun thing was adding the tests.

The results:

Pass 94
Fail 264
Crash 0
Skip 104

There are a lot of failures because of errors due to BBC BASIC for Windows assumptions, but it doesn't crash. ↩
https://github.com/gerph/rosettacode ↩