
CS 21 25.2 Laboratory Exercise 3

Runtime Memory Manipulation and Disassembly

Grading change

The regular points of this activity amount to 150% of a Lab Exercise, with a maximum score of 200% when bonus points are included.

Overview

In this activity, you will reverse-engineer parts of a graphical RISC-V program and change its memory contents to directly influence its state.

Prerequisites

You will need the following on your machine:

Required MIT license and copyright distribution for flappy: link

Flappy executable

For this activity, you will be running a program called flappy which is a simple clone of the Flappy Bird game. Pressing any key or clicking the window with the mouse will cause the bird to fly up.

The C source code for flappy will not be given. Instead, you will be supplied with an executable file of it compiled for the RISC-V architecture.

Your goal is to be able to change the data in the memory of flappy while it is running to affect different aspects of the game.

QEMU

Since flappy is a RISC-V executable file, it contains RISC-V machine code that cannot be executed natively on machines with x86-64 (most laptops) or ARM (Apple Silicon) processors.

Instead of procuring an actual RISC-V machine for each student, it is more practical to set up an emulated RISC-V machine that runs on an actual x86-64 or ARM machine using an emulator such as QEMU.

QEMU is a program that allows a compatible machine to do full-system emulation for a different architecture. In particular, the qemu-system-riscv64 emulator allows us to run a full 64-bit RISC-V system despite not having actual RISC-V hardware.

Unless you are using the DCS TL machines, you will need to install this yourself by following the instructions for your OS here: https://www.qemu.org/download/

Image file

An actual machine requires a storage device (usually a disk drive) from which a program (usually the operating system) can be booted up.

A full-system emulator may be supplied with a file called a disk image which contains a snapshot of the content of a storage device that it could use as a virtual hard disk.

For this activity, you are to use a premade image file containing the following:

  • RISC-V version of the Linux-based Debian operating system (Ubuntu alternative)
  • RISC-V version of GDB (to be introduced later)
  • The flappy executable

Running qemu-system-riscv64

To boot up the Debian installation from the image using QEMU, first ensure that cs21252lab03.qcow2, fw_jump.elf and uboot.elf are in the same directory.

Then, navigate to the said directory and run the following command:

qemu-system-riscv64 -machine 'virt' \
-cpu 'rv64' \
-m 1G -device virtio-blk-device,drive=hd \
-drive file=cs21252lab03.qcow2,if=none,id=hd \
-device virtio-net-device,netdev=net \
-netdev user,id=net,hostfwd=tcp::2222-:22 \
-bios fw_jump.elf \
-kernel uboot.elf \
-append "root=LABEL=rootfs console=ttyS0" \
-display gtk \
-device VGA \
-display vnc=:0 \
-device virtio-keyboard-pci \
-device virtio-mouse-pci \
-device intel-hda \
-device hda-duplex \
-serial mon:stdio

If done correctly, a new window should be created that represents the monitor of the virtual RISC-V system emulated by QEMU.

You may log into Debian using the following credentials:

  • Username: root
  • Password: cs21

Once logged in, verify that:

  • The flappy executable file is visible on the desktop
  • The terminal is accessible by clicking the icon at the lower-left corner of the screen and selecting System > LXTerminal

64-bit RISC-V architecture

This activity will involve the 64-bit variant of RISC-V instead of the 32-bit variant discussed in class.

Notable changes are as follows:

  • General-purpose registers are 64 bits wide (from 32 bits)
  • Some instructions are two bytes wide instead of four (these come from the compressed "C" extension)

New instructions are added to support 64-bit words ("doublewords"):

  • ld rd, imm(rs1): Loads the 64-bit value at memory address rs1 + imm into register rd
  • sd rs2, imm(rs1): Stores the 64-bit value in rs2 to memory address rs1 + imm
  • addiw rd, rs1, imm: Computes rs1[31:0] + imm, truncates the result to 32 bits, then sign-extends the truncated result to 64 bits and places it in rd
  • sext.w rd, rs1: Sign-extends rs1[31:0] to 64 bits and places the result in rd

In addition, floating-point registers and instructions are present:

  • 32 floating-point registers: ft0-ft11, fs0-fs11, and fa0-fa7
  • Instructions for loading and storing single-precision (32-bit) floats:
    • flw frd, imm(rs1): Loads the 32-bit value at memory address rs1 + imm into floating-point register frd
    • fsw frs2, imm(rs1): Stores the 32-bit value in frs2 to memory address rs1 + imm

Part 1: Symbol table

For this part, your task is to print out the symbols of flappy.

Prerequisites

A symbol is any named entity tracked by the assembler, linker, or debugger such as a label, function name, section name, or constant.

The symbol table is a collection of all the symbols of the program and is normally stored in the executable file.

The Linux utility nm can be used to print the contents of the symbol table (i.e., the name list) of an executable file.

Actual task

To print the symbol table of flappy, navigate to the folder containing flappy using the terminal, then run nm flappy.

The output of nm has three columns:

  • 1st column: Value of the symbol; for functions and global variables, this is the symbol's address relative to where the executable is loaded in memory
  • 2nd column: Type of symbol (nonexhaustive list of possible values below)
    • T: Part of text segment
    • B: Part of uninitialized global data (BSS)
    • D: Part of initialized global data
  • 3rd column: Symbol name

For this exercise, the output of nm will be used to determine the names of functions and global variables present in the original C source (i.e., only the second and third columns are relevant).

Do now!

Go through the output of nm flappy and verify that a symbol named main exists. Which segment does it belong to?

You will need to go through the output of nm flappy for the tasks that follow.

Part 2: Changing the game state

For this part, your task is to identify how the overall state of the game (e.g., ongoing, game over) is represented, and how it can be changed by modifying data in memory.

Before proceeding, ensure that you know what the game over screen of flappy looks like.

GDB overview

The GNU Debugger (GDB) is a command-line debugger that allows you to do the following for a running program:

  • Execute a program one assembly instruction at a time
  • Pause the execution of a program when an arbitrary instruction in memory is about to be executed
  • Examine and modify arbitrary memory locations

gdb -p <PID> makes GDB attach to the running program with process ID (PID) equal to <PID>. For example, the command gdb -p 21140 makes GDB attach to the running program with PID 21140.

pgrep is another Linux utility that prints the PIDs of all running processes whose names match the given pattern. Assuming only one running program matches the pattern flappy, pgrep flappy prints out its PID.

Tip: Command substitution

You can enclose a command in $() to use its output as part of the text of another command.

For example, if pgrep flappy outputs 21140, gdb -p $(pgrep flappy) effectively runs gdb -p 21140 in a single line.

This may also be done with backticks (e.g., gdb -p `pgrep flappy`), but be careful of edge cases: https://www.shellcheck.net/wiki/SC2006

Do now!

While flappy is running, open another terminal and run gdb -p $(pgrep flappy) to run GDB and attach it to the running flappy program. This causes the flappy program to pause execution.

Once attached, GDB will present a prompt in which commands can be entered while the program is paused. Entering continue (or simply c) will cause program execution to continue. Similarly, hitting Ctrl-C or Cmd-C in the GDB console while the program is running causes the program to pause again.

To quit GDB, enter quit or hit Ctrl-d while the console prompt is empty.

Other GDB commands will be introduced as needed.

Finding and disassembling functions

Ensure that flappy is paused while the game is ongoing (i.e., not in a game-over state) before proceeding. You may need to wait several seconds before a key press can be detected while on the "Press any key" game over screen.

Do now!

Go through the output of nm flappy and verify that a function named set_game_over exists. What does this function likely do?

While flappy is paused, enter disassemble set_game_over (shorthand: disas set_game_over) to see its corresponding assembly code.

Each line of the disassembly consists of the following:

  • Address of the instruction
  • Offset from the starting address of the disassembled function
  • Assembly instruction

Your disassembly will have something similar to the following line:

0x000055556ff97128 <+8>:  auipc  a5,0x3 # 0x55556ff97128 <gamestate>

The line above means that the function starting at address 0x000055556ff97120 contains the instruction auipc a5, 0x3 at 8 bytes from the start (<+8>), or equivalently at address 0x000055556ff97128.

Segment locations are randomized

As mentioned previously, segment locations are randomized to make certain vulnerability exploitation techniques harder to carry out.

Your disassembly will most likely not have the address 0x000055556ff97128 for the line with auipc a5,0x3. Record the address that you see as that of set_game_over + 8.

set_game_over has this form in the original C source:

void set_game_over() {
    gamestate = ____;  // Value intentionally omitted
}

As gamestate is a global variable here, it does not live on the stack, so its address should not be computed relative to sp at the assembly level.

Do now!

Recall that auipc rd, imm20 performs rd = PC + (imm20 << 12).

Compute the exact address written to by the line sw a4, 0(a5) of set_game_over which corresponds to the assignment of a value to gamestate. What is the value written by this instruction?

The address written to by sw a4, 0(a5) is exactly the address containing the gamestate variable. Record this address.

Do now!

Based on the disassembly, what should ____ in gamestate = ____; contain? What does this value likely represent?

Printing memory address values

print <expr> (shorthand: p <expr>) prints the result of evaluating <expr>. For example, p 10 + 20 prints out something similar to $1 = 30 where 30 is the result, and $1 is a new variable you can use to refer to the result later on.

The C dereferencing operator *, when used on an integer, takes the integer as an address and returns a value. Supposing 0x55556ff9a108 is the address of some global integer variable, p *0x55556ff9a108 prints out its value.

print/x <expr> (shorthand: p/x <expr>) formats the result as hex. Other notable variants include /d for integers and /f for floats.

Do now!

Recall that set_game_over + 8 contains the instruction auipc a5, 0x3. Retrieve its address <A> that you recorded earlier. Run p/x *<A> (e.g., p/x *0x000055556ff97128).

Noting that leading zeros are omitted, how is the printed value related to auipc a5, 0x3?

Before proceeding, ensure the following:

  • The game is ongoing (i.e., not in a game-over state)
  • Execution is paused

Actual task

Suppose the value set by set_game_over to the gamestate variable is Z, the address recorded earlier is A (i.e., the address of gamestate, not the address <A> of set_game_over + 8), and the current value in A is X.

Determine X using print, then use the command set *A = Z to change the content of address A to Z.

Verify that the change has been applied correctly via print *A, then execute continue. If done correctly, the game over screen should be displayed despite the regular game over condition not being met.

Next, hit Ctrl-c or Cmd-c while in the GDB console to pause execution. Use the command set *A = X to revert gamestate to its original value, then execute continue. What happened to the overall state of the game?

Part 3: Changing the score

The function responsible for updating the score has the following form:

void function_responsible_for_updating_score(int *pointer) {
    *pointer = *pointer + 1;
}

Your task is to locate the address used to keep track of the score and modify it. The suggested plan of attack is as follows:

  1. Set a breakpoint at the function responsible for updating the score
  2. Identify the portion of the stack frame containing the value of pointer
  3. Take the value in the said portion as an address
  4. Modify the four-byte value stored in that address

Setting breakpoints

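A breakpoint is an address which causes the debugger to pause program execution when PC becomes equal to it. The break *<addr> GDB command (shorthand: b *<addr>) allows the setting of a new breakpoint. The * is required: a plain number after break is interpreted as a source line number, not an address.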

Recall from disas set_game_over that address <A> (equivalent to offset +8 from set_game_over) contains the instruction auipc a5, 0x3.

The most straightforward way to set a breakpoint at that instruction is to execute break *<A> (e.g., break *0x000055556ff97128). More conveniently, break *(set_game_over + 8) may be used to achieve the same effect.

Do now!

Like most GDB commands, break allows the use of C-like expressions for arguments.

Pause the execution of flappy, then execute the command b *(set_game_over+(2*2+5-1)). What does this command do?

Continue execution via continue, then intentionally make the bird hit the ground or a pipe. What did GDB do?

info breakpoints (shorthand: i b) shows the breakpoint number and status of each breakpoint set. Other useful commands related to breakpoints are as follows:

  • disable <n> (shorthand: dis <n>): Prevents execution from pausing when breakpoint number <n> is hit
  • enable <n> (shorthand: en <n>): Enables pausing of execution when breakpoint number <n> is hit (breakpoints are enabled by default)
  • delete <n> (shorthand: d <n>): Deletes breakpoint number <n>

Do now!

Delete the previously set breakpoint using d <n> and confirm that no more breakpoints exist using i b.

Printing register values

Before proceeding, ensure that execution is paused at set_game_over + 8. You may verify this by:

  1. Checking that the output of disas set_game_over contains an arrow => in the line for offset +8, or
  2. Executing info registers (shorthand: i r), hitting Enter until the value of pc is shown, and verifying that it is indeed equal to the value of <A> you recorded earlier (e.g., 0x55556ff97128)

Tip: info registers $<reg> command

In general, registers can be used as variables by prepending $ to the register name.

You may also print only the value of a single register via info registers $<reg> (where $ is prepended to the register name). For example, i r $a5 prints out the current value of the a5 register.

Do now!

Execute disas $ra to disassemble the function that called set_game_over. What is the name of the function? Exactly which instruction listed in the disassembly calls set_game_over? Does the current value of ra refer to the instruction after it?

By the time pc is at set_game_over + 8, sp should be pointing to the stack frame of set_game_over, and the old value of ra should be stored at sp + 8. Since ra is not changed afterwards, the current value of ra should match the eight-byte value stored at sp + 8.

Attempting to dereference sp + 8 and print the value via p *($sp + 8) will fail with the message Attempt to dereference a generic pointer. As GDB has limited type information, it needs to be told explicitly how many bytes the value being pointed to by a pointer occupies.

Eight-byte pointers

As mentioned earlier, this activity involves the 64-bit variant of RISC-V. Pointers in this variant are 64 bits wide instead of 32 bits.

Since we know that ra is eight bytes and that sp + 8 points to a copy of that eight-byte value in memory, we can cast $sp + 8 to the type long *, as long is eight bytes wide on this architecture. As such, the command p/x *((long *) ($sp + 8)) should give the eight-byte value stored at sp + 8.

Do now!

Verify that the outputs of i r $ra and p/x *((long *) ($sp + 8)) match.

Actual task

Recall that the function responsible for updating the score has the following form:

void function_responsible_for_updating_score(int *pointer) {
    *pointer = *pointer + 1;
}

As stated earlier, your task is to locate the address used to keep track of the score and modify it. Do the following:

  1. Using the earlier output of nm flappy, identify the function most likely involved in updating the score
  2. Pause execution, disassemble the said function, and verify that it is consistent with the C code given above
  3. Set a breakpoint at the first instruction of the said function that is not related to setting up the stack frame (which should be at offset +12 if you have identified the function correctly)
  4. Continue execution, play the game until you pass the first set of pipes (which should update the score), and verify that the breakpoint was hit
  5. Using the disassembly, identify the offset from fp/s0 at which the value of pointer is stored (i.e., the address fp - K for some K)
  6. As pointer is an int *, $fp - K is a pointer to an int * value; given this, run the following:

    • p *((int **) ($fp - K)) to print the address held by pointer; record this address
    • p **((int **) ($fp - K)) to print the value pointed to by the address held by pointer

    int ** type cast

    Ensure you understand why the casting of ($fp - K) to an int ** is correct despite pointer being declared as an int *.

  7. Verify that the resulting int matches the displayed score

  8. Change the value of the said int to 1000 via set **((int **) ($fp - K)) = 1000
  9. Continue execution, then verify that the displayed score is now 1000

Part 4: Changing the bird's position

Prerequisites

The Bird type is a struct defined as follows:

typedef struct {
    float y;
    float vel;
} Bird;

The function in charge of updating the position of the bird has the following form:

void function_in_charge_of_updating_bird_position(Bird *bird, float y, float vel) {
    bird->y = y;
    bird->vel = vel;
}

Your goal is to change the value stored in bird->y, which determines the vertical position of the bird.

By convention, the floating-point registers fa0 and fa1 contain the first and second floating-point arguments when performing a function call. They are also stored in the stack frame in the same manner as regular arguments.

Actual task

Look for the function in charge of updating the position of the bird using the output of nm flappy, and disassemble that function.

The ld ra, 24(sp) instruction in the said function is the first instruction of its epilogue; it is reached when the function is about to return. Place a breakpoint at this instruction.

Continue execution and wait for the breakpoint to be hit. Once hit, identify &(bird->y) (i.e., the address of the y field of the Bird value pointed to by bird) and bird->y (i.e., the value stored in the y field) by doing the following:

  • Read the disassembly to figure out where the bird argument is stored on the stack (in particular, at which offset from fp, say fp - C)
  • Run p *((float **) ($fp - C)) to get the value of bird (which is a Bird *, but can also be seen as a float * since the first field of Bird is a float); record this address
  • Run p **((float **) ($fp - C)) to get the value of *bird (which is equivalently (*bird).y and bird->y)

The y value is the distance from the top of the bird to the top of the game window. Change the value of bird->y to some integer Y between 0 and 500 (inclusive) by running set **((float **) ($fp - C)) = Y, then continue execution. Verify that the bird has moved its vertical position as a result of this.

Before proceeding, delete the breakpoint set for this section.

Part 5: Putting everything together

Given the previous parts of this activity, you should now be able to use GDB to modify the correct memory locations of flappy to undo a game over and continue playing normally.

Remove all breakpoints set, intentionally trigger a game over by making the bird hit the ground or a pipe, then start a new game.

Do the following:

  1. Intentionally make the bird hit the ground
  2. Pause the execution using GDB
  3. Reset the game state to undo the game over state
  4. Move the vertical position of the bird so that it does not hit the ground again when execution continues
  5. Continue execution
  6. Intentionally make the bird hit the left side of any incoming pipe
  7. Pause the execution using GDB
  8. Reset the game state to undo the game over state
  9. Move the vertical position of the bird so that it does not hit the pipe again when execution continues
  10. Continue execution

Part 6: Memory map

For this part, your task is to print out the memory map of flappy and identify which kind of memory (e.g., text, global data, stack, or heap) certain addresses belong to.

As mentioned in the lectures, the memory map is the assignment of address ranges of a program to specific purposes such as stack and heap memory.

Running pmap

The Linux utility pmap prints out the memory map of a running program. You will be using pmap later to determine whether certain pieces of data of flappy are stored globally, on the stack, or on the heap.

pmap must be given the PID of a running program. The command pmap 21140 prints the memory map of the running program with PID 21140.

Do now!

While flappy is running, open another terminal and run pmap $(pgrep flappy) to print its memory map.

As it may be more convenient to view the output of pmap $(pgrep flappy) through a file viewer, the command pmap $(pgrep flappy) > map.txt may be used to save the output to map.txt instead of displaying it on screen.

The output of pmap has four columns:

  • 1st column: Starting address of the segment
  • 2nd column: Size of the segment in kilobytes (where K = 1024 bytes)
  • 3rd column: Permission bits (explained later)
  • 4th column: Segment name

Do now!

Locate the segment with name [ stack ]. How large is the segment in bytes? What is its address range?

Note that across multiple executions of the same program, the locations of certain segments such as stack memory are randomized for security purposes. The tasks that follow are meant to be done without restarting the flappy program (i.e., its memory map must stay consistent for all tasks).

Segment permissions

Verify that the output of pmap $(pgrep flappy) starts with the following segments:

000055556ff96000  12K r-x-- flappy
000055556ff99000   4K r---- flappy
000055556ff9a000   4K rw--- flappy

The third column contains the permission bits of a segment. Only the first three are relevant to this activity, and are as follows:

  • r: Can be read from
  • w: Can be written to
  • x: Can execute data as instructions

The first flappy segment has the permissions r and x. Having the x permission set implies that it is the text segment. Not having the w permission means the segment is read-only.

The second flappy segment has only the permission r which implies that it is a read-only portion of the data segment.

The third flappy segment has the permissions r and w which implies that it is a read-write portion of the data segment.

Actual task

Before proceeding, ensure you have made a record of the following:

  • Address of the auipc a5, 0x3 instruction in Part 2 (already stated in the document)
  • Address of the gamestate variable in Part 2
  • Address held by the pointer variable in Part 3
  • Address held by the bird variable in Part 4

For each address above, determine in which segment it is located using the output of pmap generated earlier.

Bonus: Calling hidden functions

Go through the output of nm flappy and look for a function whose name suggests that it is used during debugging to disable collision checks.

Find a way to reliably call the said function using GDB without causing the game to crash. To verify that you have done this correctly, the game should not end when the bird hits the ground or any of the pipes.

Hints

It is possible to set register values through GDB via set $<reg> = <expr>.

You may want to set a breakpoint at a location that would not be problematic when forcing the calling of a function from that location.

When hitting a breakpoint, the instruction at the breakpoint has yet to be executed.

While the GDB j command is not necessary (as you are expected to use set), note that there is a difference between j set_game_over and j *set_game_over; the former skips over the prologue of the function while the latter does not.

Video demonstration

The main deliverable for this activity is an MP4 video of you demonstrating the following (with all explanations verbally delivered):

  1. [30pts] How the address of the gamestate variable in Part 2 is computed from the disassembly
    • The content of the gamestate variable must be printed using GDB
  2. [40pts] How the address held by the pointer variable in Part 3 is computed from the disassembly, register values, and memory content
    • The int value in the said address must be printed using GDB (must be consistent with score shown on screen)
    • The int value in the said address must be changed to 1000 with the game properly reflecting the change
  3. [40pts] How the address held by the bird variable in Part 4 is computed from the disassembly, register values, and memory content
    • The float value in the said address must be printed using GDB
    • The float value in the said address must be changed to some other value between 0 and 500 (inclusive) with the game properly reflecting the change
  4. [40pts] In which segment each of the following is located (using the output of pmap which must be visible in the video):
    1. [10pts] Address of the auipc a5, 0x3 instruction in Part 2
    2. [10pts] Address of the gamestate variable in Part 2
    3. [10pts] Address held by the pointer variable in Part 3
    4. [10pts] Address held by the bird variable in Part 4
  5. [50pts] Bonus task demonstration
    • Each step done must be verbally explained
    • The bird must be shown to slide through the ground and go through pipes without triggering a game over

Video requirements are as follows:

  • Both your face and your screen must be visible throughout the video
  • The flappy window and the terminal must be visible side-by-side
  • Text on screen must be legible in the video
  • Video length must be at most 20 minutes

You may use Zoom to record this as it allows the inclusion of both your screen and your camera in the same video recording.

flappy must stay running throughout the duration of the video.

Deductions

Deductions will be given for each restriction that is not adhered to.

Readme file

A README.md file should also be included containing the following (omitting those that have not been accomplished):

  1. Your full name and student number
  2. Timestamps of the start of the video segment for each of the following:
    • Item #1
    • Item #2
    • Item #3
    • Item #4a
    • Item #4b
    • Item #4c
    • Item #4d
    • Item #5

Warning

Your submission will not be graded if this is not submitted properly.

Submission

  • Google Classroom deliverables:
    • cs21252lab03.mp4
    • README.md
  • Deadline: March 11 (Wed), 11:59 PM
  • Maximum score: 200/100