Python 005: Text Generation

This will help you:

Work with dictionaries, read and write text files, and generate semi-random text.

You can create some truly amusing things with a random text generator. In this activity, you'll learn to write your own version of a Markov chain generator, and soon you'll be churning out the next New York Times Worstseller - leaving all the creativity to the computer. If you're pretty confident, skip to the main activity; otherwise, do the warm-ups first.

Time: 1-2 hours / Level: B2

You should already:

Get the code and resources for this activity by clicking below. It will allow you to download the files from a Google Drive folder. Unzip the folder and save it in a sensible location.

Step 1: Warm-up - Working with files

Open file_ops.py and look through it. Once you have looked it over, run it by typing python file_ops.py in the terminal. Complete the following tasks and read the code and comments as you go.

  1. file cursor: Fix the code so the file is read a second time, printed backwards, and then each of the first 10 words is printed on its own line.

  2. read-only files: Keep the code from giving an error by commenting out the line that doesn't work.

  3. write-only files: Keep the code from giving an error by commenting out the 2 lines that don't work.

  4. overwriting: Once w_file is closed, open it to view its contents, then watch what happens when you reopen it with w+ mode.

  5. appending: Once rw_file is closed, view its contents, then watch what happens when you reopen and edit it with a+ mode.
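
The behaviour these tasks explore can be sketched in a few lines. This is a minimal example and is not taken from file_ops.py: the scratch file and the function name demo_file_modes are made up for illustration. It shows that a second read() returns nothing until the cursor is moved back with seek(0), that a+ keeps existing contents, and that w+ empties the file on open.

```python
import os
import tempfile


def demo_file_modes():
    """Return what each step reads back, so the effect of each mode is visible."""
    results = {}
    # Hypothetical scratch file, not one of the activity's files.
    path = os.path.join(tempfile.mkdtemp(), "scratch.txt")

    # "w" creates the file, or empties it if it already exists.
    with open(path, "w") as f:
        f.write("one two three")

    with open(path, "r") as f:
        results["first_read"] = f.read()   # cursor is now at the end of the file
        results["second_read"] = f.read()  # nothing left to read
        f.seek(0)                          # move the cursor back to the start
        results["after_seek"] = f.read()

    # "a+" keeps the existing contents; writes go to the end of the file.
    with open(path, "a+") as f:
        f.write(" four")
        f.seek(0)
        results["after_append"] = f.read()

    # "w+" truncates the file on open, so the old contents are gone.
    with open(path, "w+") as f:
        results["after_w_plus"] = f.read()

    return results


for step, text in demo_file_modes().items():
    print(f"{step}: {text!r}")
```

Using with closes each file automatically, which matters here because the overwriting and appending tasks depend on the file being closed before it is reopened.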

Here is an official tutorial going over file operations.

Step 2: Warm-up - Dictionaries

Open dictionary.py and read through it. Run it by typing python dictionary.py in the terminal. Make sure you understand what's going on. Here is an official tutorial on dictionaries, with further reference here.
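
As a quick reminder of the operations you will rely on later, here is a small sketch. It is not the contents of dictionary.py; the function count_words and the sample sentence are assumptions made for illustration. It covers adding keys, looking values up, testing membership, and looping over key/value pairs.

```python
def count_words(text):
    """Count how often each word appears in a string."""
    counts = {}
    for word in text.split():
        # dict.get returns the default (0 here) when the key is missing.
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words("the cat saw the dog")
print(counts["the"])      # look up a value by its key
print("bird" in counts)   # the "in" test checks keys, not values
for word, n in counts.items():
    print(word, n)
```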

Step 3: Activity - Text generation

Open the file generate_text_simple.py. Read the header comment for the function get_words(), then fill in the lines necessary to make the function work. To find lines that need completing, look for # TODO: (try using Ctrl+F or your editor's find tool). To test your code, type python generate_text_simple.py in the terminal. If your function is working, it should display something like "Text loaded, 202 words"; the exact count depends on your source text.

Once that's working, read the header comment for the function create_lookup(), then fill in the lines necessary to make the function work. You can test your code by running it as before. If the function is working, each unique word should print with a list of the words that follow it.

Some functions which may be useful to understand are choice() from the random module, and list.append(), str.join(), and dict.get(), which are built-in.
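
Here is a small sketch of those four functions working together. The word list and the names followers, picked and sentence are made up for illustration, and the activity's own functions may well be structured differently, so treat this as a hint rather than a solution.

```python
import random

followers = {}                    # hypothetical lookup table: word -> following words
words = ["a", "b", "a", "c"]

# dict.get and list.append: record each word that comes right after another.
for current, nxt in zip(words, words[1:]):
    options = followers.get(current, [])  # empty list if the word is new
    options.append(nxt)
    followers[current] = options

# random.choice picks one item from a non-empty list.
picked = random.choice(followers["a"])

# str.join glues a list of strings together with the separator between them.
sentence = " ".join(["a", picked])

print(followers)
print(sentence)
```

Note that random.choice raises an IndexError on an empty list, so it is worth thinking about what should happen when a word has no followers.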

Finally, read the header comment for the gen_text() function and fill in the lines necessary to make it work. Make sure to read the comments for each line. If you get this function working, the program should generate random text when run.

Step 4: Going further

There is another file, generate_text.py, which has a different implementation of text generation. It uses prefixes of a certain length to choose the next possible words, and if there are no options for a prefix of a certain length, it shortens the prefix.
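
The idea of shortening the prefix can be sketched as follows. This is not the code in generate_text.py; the function names build_prefix_lookup and next_word, and the use of tuples as dictionary keys, are assumptions chosen to keep the example short.

```python
import random


def build_prefix_lookup(words, max_len):
    """Map every prefix of 1..max_len words to the words seen right after it."""
    lookup = {}
    for n in range(1, max_len + 1):
        for i in range(len(words) - n):
            prefix = tuple(words[i:i + n])  # tuples can be dictionary keys, lists cannot
            lookup.setdefault(prefix, []).append(words[i + n])
    return lookup


def next_word(lookup, recent, max_len):
    """Try the longest prefix first, shortening it until some option exists."""
    for n in range(min(max_len, len(recent)), 0, -1):
        options = lookup.get(tuple(recent[-n:]))
        if options:
            return random.choice(options)
    return None  # no prefix of any length has a known follower


source = "the cat sat on the mat".split()
lookup = build_prefix_lookup(source, 2)
print(next_word(lookup, ["on", "the"], 2))   # the two-word prefix is known
print(next_word(lookup, ["dog", "the"], 2))  # falls back to the one-word prefix
```

Longer prefixes tend to reproduce the source more faithfully, while shorter ones give more variety, so the fallback is a trade-off between the two.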

generate_text.py uses a lot of the same code as generate_text_simple.py, so see if you can get it working. Can you find other ways to improve the text generation? What if it keeps shrinking the prefix until there is more than one word to choose from, so that phrases are repeated less often?

Step 5: Make it your own

Find some text online (or multiple sources!), copy it into a text file in the same folder as the text generator, and generate your own nonsense. Print it out (optionally with your name and what text you used) and add it to the random poetry board.