How to Refactor Code in Python: A Beginner's Guide
What is Code Refactoring?
Refactoring is just a fancy way of describing a process in which you rewrite parts of your code to make them better, without changing what the code actually does. Don't confuse it with rewriting, though: rewriting means starting again from scratch to create a new improved version, whereas refactoring is an incremental process of improving existing code by making changes to small parts, one at a time.
Usually when we write a piece of programming code, in Python or any other language, we start out by just trying to find any solution which works. At this early stage, finding the best possible solution is a very difficult task - it will probably be hard enough just to find something which produces the correct result with no errors! Because there will inevitably be many different solutions to any given problem, there is a high chance that whatever code you write will turn out to be unnecessarily slow to run and difficult to read. This is, of course, doubly the case when you are a beginner.
The process of writing code by finding any answer which works, and then looking at how to improve that code, is formalized in the programming method known as 'Test Driven Development'.
Test driven development (TDD) is a widely used method for writing code in any language. A programmer using this method will begin by writing a test case which will be used to check whether the code they are setting out to write works. Because that code does not exist yet, the test fails at first. They then focus purely on writing code which passes the test - which has the desired functionality and doesn't produce any errors. Once the test case has been passed, the developer will then focus on 'refactoring', or improving their code so that it runs faster, is more readable, and adheres to best practices, re-running the test after each change to check that nothing has broken.
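As a rough sketch of that cycle, the example below writes the test first and then a deliberately simple first solution which passes it. The function name count_vowels and the test values are made up for illustration; they do not come from any particular project.

```python
def test_count_vowels():
    # Step 1: the test is written first and describes the behaviour we want.
    assert count_vowels("refactor") == 3
    assert count_vowels("") == 0
    assert count_vowels("PYTHON") == 1


def count_vowels(text):
    # Step 2: the first solution that works, not necessarily the best one.
    count = 0
    for letter in text:
        if letter in "aeiouAEIOU":
            count = count + 1
    return count


# Step 3: run the test. Once it passes, refactor, and run the test again
# after every change to make sure the behaviour has not altered.
test_count_vowels()
print("test passed")
```

A later refactoring might replace the loop with something shorter, and the same test would tell you straight away whether the new version still behaves correctly.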
Refactoring might also be required for old code, to bring it into line with modern standards or allow for functionality which wasn't available when the program was originally written.
Defining Your Goals When Refactoring Python Code
Before starting to refactor Python code it is a good idea to have a clear picture in your mind of what you want to achieve. This may not be as simple as it sounds. After all, what is 'perfect code'? There really is no such thing, and you will come across many cases for which there are multiple solutions which may seem just as good as each other.
Generally speaking there are four main objectives:
- Making your code more readable for both yourself and others.
- Improving extensibility - making it easier to build on your code and add new functions, or in Python to re-use your code as a module imported into other programs.
- Reducing the runtime of your program / increasing speed.
- Reducing the memory used by your program.
Sometimes there may be some conflict between these different objectives. For example, a recursive function, in which a function calls itself (with a different input, obviously), often requires less code than a loop performing the same task, and may be more readable because of this. But in Python every function call carries some overhead, and the interpreter limits how deeply calls can be nested (1000 levels by default), so the recursive version will usually run more slowly and can fail altogether on large inputs.
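To make that trade-off concrete, here is a small sketch which adds up the numbers from 1 to n both ways. The function names are made up for illustration, and the exact point at which the recursive version fails depends on the recursion limit of your own Python installation.

```python
import sys


def sum_to_recursive(n):
    # Short and close to the mathematical definition,
    # but every call adds a new frame to the call stack.
    if n == 0:
        return 0
    return n + sum_to_recursive(n - 1)


def sum_to_loop(n):
    # A little longer, but the stack does not grow however large n is.
    total = 0
    for number in range(1, n + 1):
        total += number
    return total


print(sum_to_recursive(100))  # 5050
print(sum_to_loop(100))       # 5050

# The recursive version fails once n goes past Python's recursion limit.
too_deep = sys.getrecursionlimit() + 100
try:
    sum_to_recursive(too_deep)
except RecursionError:
    print("recursive version hit the recursion limit")

# The loop handles the same input without any trouble.
print(sum_to_loop(too_deep))
```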
Ideally you should define your priorities before you begin refactoring. Beginners, however, may not be concerned about performance and may want to skip this step to just focus on finding the 'low hanging fruit' detailed below.
If you are a beginner trying to learn what good Python code looks like then you should take the time to read through 'The Hitchhiker's Guide to Python'. This guide will teach you how to recognise the most 'Pythonic' way to write code.
Python 'Code Smell' for Beginners
The term 'code smell' is often used by programmers as a catch-all term to describe a range of indicators which may suggest that your code needs to be refactored. Learning how to identify different types of code smell will therefore point you in the right direction for how to refactor your Python code. Here are some common examples which are good for beginners to look out for:
- Repetition: One of the golden rules of programming is 'don't repeat yourself' (DRY). If you notice that a piece of code is repeated multiple times throughout your program then you should try to consolidate this by putting the repeated code into its own function or assigning it to a variable, which can then be used wherever it is needed. This will make your code shorter and easier to read. It will also make your program more extensible, by making that piece of code easily available in any new functionality which you write.
- Inappropriate Naming: The names of variables, lists, dictionaries and so on should be representative of the things that they contain in some way, but without being overly long. A common practice is to use the first letter or two of an appropriate descriptive word ('s' for sentence, for example, or 'add' for address). As your program grows, however, you may find that what seemed like a good variable name when you wrote it now seems ambiguous when you read it back (Does that s stand for sentence or string, for example, and what the hell was I adding something to?). In that case a longer, more descriptive name is usually the better choice. Alternatively, perhaps, a letter or word you have already used for something seems like it would fit better with a new variable you are introducing to the program. A good code editor or IDE will help here, as it will allow you to find and replace every instance of a variable name without having to read through the whole program yourself (see A Beginners Guide to Notepad++).
- Bloated Functions or Classes: Your functions should be concise and perform a well defined task. Sometimes we write functions or classes which end up taking on too much, because we have to keep adding to them as the program grows to stop them from breaking. Look through your longest and most widely used functions and classes - sometimes dividing them up by making new functions to perform necessary 'side tasks' can make your code more readable and easier to maintain.
- Unnecessarily Complicated Code: Often a problem will seem more complicated than it really is, and the first solution you come up with will be unnecessarily long and complicated. Simplifying and shortening code often improves both performance and readability. One of the best ways for beginners to do this is to look for built-in features which can perform a task more efficiently than the code you have written. There are quite a few built-in functions and methods, and beginners often take a long time before they start to remember enough of them to write 'Pythonic' code. You can find them using IDLE, which you should have installed when you downloaded Python to your machine. For example, if you have a long and complicated looking piece of code performing some kind of string manipulation, just type dir(str) into IDLE and hit enter - this will give you a list of the built-in methods you can use with strings (you may be able to do something similar in your preferred IDE, depending on what it is). You can do the same with dir(dict) for dictionaries, dir(list) for lists, and so on.
- Overly Large Files: You should make sure that you are not trying to fit too much into a single file. If a single file of code is very large, consider ways that you might improve it by breaking it up into separate modules.
- Lack of Useful Comments: If you look back over a piece of code and find that it is difficult to read, but you can't find a way to improve it, then make sure that you have a good comment line (or lines) to explain what is happening.
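As an illustration of the first smell in the list above, repetition, the sketch below shows the same calculation written out twice and then pulled into a single function. The variable names, the prices and the 20% tax rate are all invented for the example.

```python
# Before: the same calculation is written out again for every item.
book_price = 12.50
book_total = round(book_price + book_price * 0.2, 2)
pen_price = 1.20
pen_total = round(pen_price + pen_price * 0.2, 2)

# After: the repeated logic lives in one place, with a descriptive name.
TAX_RATE = 0.2  # assumed value, for illustration only


def price_with_tax(price):
    """Return the price including tax, rounded to two decimal places."""
    return round(price + price * TAX_RATE, 2)


prices = {"book": 12.50, "pen": 1.20}
totals = {item: price_with_tax(price) for item, price in prices.items()}
print(totals)
```

If the tax rate ever changes, the refactored version only needs to be updated in one place, and adding a new item no longer means copying the formula.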
Code to be Refactored - Looks Complicated and Hard to Read
def censor(text, word):
    while word in text:
        text = text[:text.find(word)] + "*" * len(word) + text[text.find(word) + len(word):]
    return text
Refactored Using Built-in Split & Join Functions
def censor(text, word):
    censor_string = '*' * len(word)
    return censor_string.join(text.split(word))
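Because refactoring should change how the code is written and not what it does, it is worth checking that the two versions agree. The sketch below repeats both functions under the hypothetical names censor_original and censor_refactored so that they can be compared on a few made-up inputs.

```python
def censor_original(text, word):
    # The first version: find and replace one occurrence at a time.
    while word in text:
        text = text[:text.find(word)] + "*" * len(word) + text[text.find(word) + len(word):]
    return text


def censor_refactored(text, word):
    # The refactored version: split on the word, join with asterisks.
    censor_string = "*" * len(word)
    return censor_string.join(text.split(word))


samples = [
    ("this is a spam message about spam", "spam"),
    ("nothing to hide here", "spam"),
    ("spamspam", "spam"),
]

# If the refactoring changed the behaviour, one of these checks would fail.
for text, word in samples:
    assert censor_original(text, word) == censor_refactored(text, word)

print(censor_refactored("this is a spam message about spam", "spam"))
```

Neither version copes with an empty word: the original loops forever and the refactored one raises a ValueError, so a real program would need to check for that case first.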
In most cases going through the methods above to remove repetition and unnecessary complexity is enough to improve the performance of your program. Sometimes, however, you might need to place a particular focus on maximising your performance, and it may not be immediately obvious which one of multiple possible solutions would be best for optimizing performance.
In this case there really is no better way than simply experimenting with as many different options as you can think of and comparing their performance. One way to do this is by using time.perf_counter() and the eval() function. (Older tutorials use time.clock for this, but it was removed in Python 3.8.) The eval() function will evaluate a Python expression passed to it as a string, whilst time.perf_counter() can be used to record the time at the start and the time at the finish, to see how long the code takes to run. See below for an example module using this method.
Python Speed Test Example
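import time
from function_to_test import function_to_test  # placeholder: the function you want to time

def time_execution(code):
    # code is a string containing a Python expression
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    result = eval(code)
    run_time = time.perf_counter() - start
    return result, run_time

# using_this is a placeholder for whatever input you want to test with
print(time_execution("function_to_test(using_this)"))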
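Python's standard library also includes the timeit module, which is built for this kind of comparison: it runs a piece of code many times and reports the total time, which smooths out random variation between runs. The sketch below uses it to compare the two censor functions from earlier, renamed censor_loop and censor_split_join here for clarity. The sample text and the number of repetitions are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit


def censor_loop(text, word):
    # The original version: replace one occurrence per pass of the loop.
    while word in text:
        text = text[:text.find(word)] + "*" * len(word) + text[text.find(word) + len(word):]
    return text


def censor_split_join(text, word):
    # The refactored version using the built-in split and join methods.
    return ("*" * len(word)).join(text.split(word))


sample_text = "spam and eggs and spam " * 200

for func in (censor_loop, censor_split_join):
    # timeit.timeit accepts a callable and runs it `number` times.
    seconds = timeit.timeit(lambda: func(sample_text, "spam"), number=20)
    print(f"{func.__name__}: {seconds:.4f} seconds for 20 runs")
```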
Python Code Refactoring Tools
If you are using an 'Integrated Development Environment' (IDE) then there is a good chance that this will have some handy tools which you can use. Because there are so many different IDEs, each with different features and different approaches, there is no way that I can list them all or explain how to use them. The best thing for you to do is to make sure that you have taken the time to learn about your code editor or IDE and what it is capable of doing. Look through the documentation, especially for the search features, to see whether you can restrict a search to variable names, search using regular expressions and so on.
Another useful tool is 'Rope', which is a pretty comprehensive refactoring library for Python. Beginners may find that it takes a bit of time to learn the ropes (pun intended), but it is worth doing if you can. Rope can also be used to upgrade the features of your IDE, such as improving auto-completion, adding a feature for automatic removal of unused or duplicate module imports, and accessing the Python docs from inside your IDE.
If you are using Vi or Emacs then another library which can be used to add refactoring tools to your editor is Bicycle Repair Man. This is very similar to Rope, only with a better name. It is used for automating common refactoring tasks. The name itself is based on the name of a superhero from a Monty Python sketch.