CS 21 25.2 Laboratory Exercise 4

Machine Language and Binary Patching

🆕 Prerequisites

You will need to install the following to do this activity (already installed on TL machines):

  1. ImHex v1.38.1
  2. RISC-V version of objdump
  3. qemu-riscv32 (not qemu-system-riscv32)

You may find installation instructions below.

🆕 ImHex v1.38.1

  1. Visit https://github.com/WerWolv/ImHex/releases/tag/v1.38.1 and click Assets to show the download links
  2. Download the appropriate file for your operating system:
    • Windows: imhex-1.38.1-Windows-x86_64.msi
    • macOS (ARM/Apple Silicon): imhex-1.38.1-macOS-arm64.dmg
    • macOS (Intel): imhex-1.38.1-macOS-x86_64.dmg
    • Linux: imhex-1.38.1-x86_64.AppImage
  3. Install ImHex using the downloaded file:
    • Windows: Double-click the .msi installer and follow the instructions
    • macOS: Double-click the .dmg file to mount the image, then drag the ImHex.app icon to the Applications icon in the window popup
    • Linux: Run chmod +x imhex-1.38.1-x86_64.AppImage (and optionally rename it to imhex), then optionally make it visible in your PATH (e.g., place it in /usr/local/bin/)

🆕 RISC-V version of objdump

The default objdump available on WSL, macOS, and Linux cannot disassemble the binary file for this activity because the file contains RISC-V machine code.

To install the objdump version that is compatible with RISC-V binaries, do the following for your operating system:

  • Windows:
    1. Install WSL
    2. Run sudo apt update && sudo apt install -y gcc-riscv64-unknown-elf in the terminal
    3. Replace all objdump-rv32 instances in the commands below with riscv64-unknown-elf-objdump (or optionally, run echo "alias objdump-rv32='riscv64-unknown-elf-objdump'" >> ~/.bashrc && source ~/.bashrc to set an alias so that objdump-rv32 can still be used)
  • macOS:
    1. Install Homebrew by following the instructions here: https://brew.sh/
    2. Run brew install riscv64-elf-binutils in the terminal
    3. Replace all objdump-rv32 instances in the commands below with riscv64-unknown-elf-objdump (or optionally, run echo "alias objdump-rv32='riscv64-unknown-elf-objdump'" >> ~/.zshrc && source ~/.zshrc to set an alias so that objdump-rv32 can still be used)
  • Linux:
    1. Run sudo apt update && sudo apt install -y gcc-riscv64-unknown-elf in the terminal
    2. Replace all objdump-rv32 instances in the commands below with riscv64-unknown-elf-objdump (or optionally, run echo "alias objdump-rv32='riscv64-unknown-elf-objdump'" >> ~/.bashrc && source ~/.bashrc to set an alias so that objdump-rv32 can still be used)

🆕 qemu-riscv32

While running a 32-bit RISC-V executable file may be done on an emulated 32-bit RISC-V machine via qemu-system-riscv32, there are no regularly maintained Linux distributions for 32-bit RISC-V (as there is little to no reason to use it over the 64-bit variant).

As a workaround, the user mode emulation variant of QEMU for 32-bit RISC-V, qemu-riscv32, may be used to run 32-bit RISC-V executable files. This variant requires

qemu-riscv32 vs. qemu-system-riscv32

Note that qemu-riscv32 (user mode emulator) is not the same as qemu-system-riscv32 (full system emulator).

To install qemu-riscv32, do the following for your operating system:

  • Windows:
    1. Install WSL
    2. Run sudo apt update && sudo apt install -y qemu-user in the terminal
  • macOS:
    1. Install OrbStack (https://orbstack.dev/) or Docker Desktop (https://www.docker.com/products/docker-desktop/), then run it
    2. Replace qemu-riscv32 instances in the commands below with:
      docker run --rm -it -v "$PWD":/work -w /work jfmcoronel/qemu-riscv32 qemu-riscv32
      
      • Note that the very first run will download the Docker image which may take a while
      • Ensure that OrbStack or Docker Desktop is running whenever you execute this command
      • Sample replacement for the command qemu-riscv32 lab04:
        docker run --rm -it -v "$PWD":/work -w /work jfmcoronel/qemu-riscv32 qemu-riscv32 lab04
        
  • Linux:
    1. Run sudo apt update && sudo apt install -y qemu-user in the terminal

General Instructions

For this laboratory activity, you are to work on one HOPELEx Checkpoint Item that involves the 32-bit RISC-V machine language and the binary file format ELF.

Important Reminder

You must finish and show your working checkpoint tasks to your lab handler before the end of the laboratory period. HOPELEx checkpoint items contribute 1% to your lab grade.

Overview

You will be patching the executable file of a text-based RISC-V program to modify its behavior without having access to its source code.

In particular, you will be doing the following:

  1. Analyze the disassembly of the given binary file
  2. Identify which instructions can be changed and their file offsets in the binary file
  3. Change the instructions byte-wise using a hex editor
  4. Create a binary patching program in C that automates the said changes

Target program

The program to be analyzed and patched, lab04, may be downloaded from here: https://drive.google.com/file/d/1K_iMAeB7h4y0t-1APmFte-zdpPhniEDY/view?usp=sharing

Do now!

Download lab04 and place it in some easily accessible directory.

Navigate to the said directory using the terminal, then run chmod +x lab04 to make lab04 executable (i.e., it can now be executed by doing qemu-riscv32 lab04 in its directory).

lab04 implements a fully offline trial period scheme that prevents the user from proceeding once the period is up. Its trial period is set to 10 seconds from the first time the program is executed.

The checkpoint task requires you to understand how lab04 computes how much time is left for the trial period; you are to do so using its disassembly.

objdump

objdump is a Linux utility that disassembles a given binary file. A version of objdump that works with 32-bit RISC-V binaries is available on DCS TL computers as objdump-rv32.

The arguments of the command objdump-rv32 -d --visualize-jumps <file> are as follows:

  • -d: Disassembles all sections that contain instructions
  • --visualize-jumps: Adds arrows from branch instructions to their target instruction
  • <file>: Binary file to disassemble

Tip: Redirecting program output to a file

As the output is large, redirecting it to a file (say dump.txt) may help in going through it. To do so, append > dump.txt to the objdump-rv32 command mentioned earlier.

Do now!

Generate the disassembly for lab04 using objdump-rv32 (and ideally save its output in a file).

Part of the objdump-rv32 output for lab04 is as follows:

00010318 <_start>:
   10318:       034000ef                jal     1034c <load_gp>
   1031c:       00050793                mv      a5,a0
   10320:       00000517                auipc   a0,0x0
   10324:       02850513                addi    a0,a0,40 # 10348 <__wrap_main>
   10328:       00012583                lw      a1,0(sp)
   1032c:       00410613                addi    a2,sp,4
   10330:       ff017113                andi    sp,sp,-16
   10334:       00000693                li      a3,0
   10338:       00000713                li      a4,0
   1033c:       00010813                mv      a6,sp
   10340:       460000ef                jal     107a0 <__libc_start_main>
   10344:       00100073                ebreak

00010318 <_start>: means that the memory address 0x00010318 is mapped to the label _start.

Regarding 10318: 034000ef jal 1034c <load_gp>:

  • 10318 means that the instruction starts at address 0x10318
  • 034000ef means that the byte sequence 0xef 0x00 0x40 0x03 may be found starting from the address stated earlier
  • jal 1034c <load_gp> means that the byte sequence above corresponds to the instruction jal 0x1034c where 0x1034c is equal to the load_gp label (when taken as an address)

jal pseudoinstruction form

Recall that jal <addr> is shorthand for jal ra, <addr>.

Do now!

Go through the disassembly of lab04, locate main, and identify the address of the instruction that sets its return value (i.e., the last instruction that sets a0 before ret is executed).

Built-in C functions

You may need to review the following C functions:

Hex editor

You will be using the ImHex hex editor (https://github.com/WerWolv/ImHex) for manual binary patching.

To run this on the DCS TL computers, simply execute imhex in the terminal.

Opening a file for inspection (which you may need to drag from the file explorer) will cause several panes to display. Only the Hex editor pane is relevant for this activity.

Each row in the Hex editor pane starts with the file offset, followed by 16 bytes (in hex) corresponding to the sequence starting at that offset, and finally the ASCII symbol equivalents of each byte.

The status bar at the bottom of the window is useful for identifying the current file offset the cursor is under, which range of file offsets are highlighted, and how many bytes are currently highlighted.

Executable and Linkable Format (ELF)

This exercise involves statically modifying the code of the given program by changing the raw bytes of the executable file, so it makes sense to understand how the said file is formatted.

Recall that an assembler outputs machine code, as well as additional information on memory segments such as .data. These outputs must be arranged in a standardized format so that they can be properly loaded into memory just before execution.

One such format is the Executable and Linkable Format (ELF) which is used primarily by Linux. An ELF file is divided into the ELF header, program header, data, and section header portions.

ELF header

An ELF file starts with the ELF header which contains metadata, or data about the ELF data in the file. The ELF header is 52 bytes wide (for the 32-bit format).

The ELF header starts with four magic bytes 0x7f, 0x45, 0x4c, and 0x46 (where the sequence 0x45 0x4c 0x46 is ELF in ASCII) which signal that the file format is ELF.

The following byte (at file offset 0x4) denotes whether the ELF file is in the 32-bit (0x01) or 64-bit (0x02) format. For 32-bit RISC-V, this is expected to be 0x01.

The byte after that (at file offset 0x5) denotes whether the ELF file is in little endian (0x01) or big endian (0x02).

To find out from which file offset the .text section (segment) starts in the file, you will need to parse the section header of .text.

Do now!

Use ImHex to verify that lab04 starts with the ELF magic bytes, is in the 32-bit ELF format, and stores values in little endian.

ELF section header array

Each program section (e.g., .text, .data) has an associated section header which contains information about the associated section. Each section header is 40 bytes wide (for the 32-bit format).

File offset 0x20 (which is part of the ELF header) contains the four-byte file offset denoting where the first section header starts. The second section header starts right after the first (i.e., 40 bytes from the start of the first), and so on until the end of the file, forming a contiguous array of section headers.

Do now!

Use ImHex to identify the file offset of the first section header by forming the four-byte integer stored at file offset 0x20 (keep in mind that its bytes are stored in little endian order).

Tip: Shortcut for jumping to file offset

In ImHex, Ctrl-G or Cmd-G can be used to easily jump to a file offset by specifying it in hexadecimal (prefixed with 0x) or decimal.

.shstrtab section

The .shstrtab section is a special section that contains a sequence of null-terminated strings. Each null-terminated string corresponds to a name of an existing section (e.g., .text).

File offset 0x32 (which is also part of the ELF header) contains a two-byte unsigned integer that denotes the index of the .shstrtab header in the section header array.

Do now!

Use ImHex to do the following:

  • Identify the index \(n\) containing the .shstrtab section header
  • Jump to the first index of the section header array
  • Move forward by some multiple of 40 bytes (size of each section header) to get to index \(n\) containing the .shstrtab section header
  • Examine the four-byte value at offset 0x10 of the said section header; this value is the location (file offset) of the .shstrtab data in the file (which should be a sequence of null-terminated strings)
  • Record this file offset as \(S\)

ELF section header

Each section header starts with a four-byte offset into the data of .shstrtab. The specified location contains the name of the section.

Relevant offsets from the start of a section header are as follows:

  • 0x0c: Default starting memory address of the section (four-byte address)
  • 0x10: Location of the data of the section in the file (four-byte file offset)
  • 0x14: Size in bytes of the data of the section (four-byte unsigned integer)

Do now!

Use ImHex to do the following:

  • Identify the offset from \(S\) (obtained in the previous section) at which the string .text starts
  • Identify which section header corresponds to .text by going through each header and checking which one has that offset in its first four bytes
  • Examine offset 0x10 of the .text section header to get the location of the machine instructions in the file
  • Verify that the four-byte value at the said location matches 0xff010113

Mapping objdump addresses and ELF file offsets

For this section, we will be doing the following:

  1. Identifying the memory address in which the last li a5, 0 instruction of the main function can be found
  2. Locating the file offset of the said instruction using the identified memory address and .text metadata
  3. Changing the instruction from li a5, 0 to li a5, 21 using a hex editor
  4. Locating the file offset of the string Trial period has lapsed.\n
  5. Modifying the said string using a hex editor

Memory addresses via objdump-rv32

As mentioned earlier, objdump-rv32 lists the starting memory address of each instruction in .text.

Do now!

Verify that the objdump-rv32 disassembly of lab04 shows that the last li a5, 0 instruction of main is mapped to memory address 0x1069c.

File offsets via .text metadata

Recall that offset 0x0c of a section header contains the default starting memory address of the section as a four-byte value.

Suppose that:

  • The default starting memory address of .text is 0x10000 (as seen in its 0x0c offset)
  • objdump-rv32 shows that li a5, 0 is in memory address 0x1069c

The offset of li a5, 0 with respect to the start of the data of the .text section can be computed via 0x1069c - 0x10000 = 0x69c.

Recall as well that the four-byte value stored starting at offset 0x10 of the .text header contains the file offset pertaining to the start of the data of the .text section. Suppose this value is 0x2000.

Since we know that the start of the .text section data is at file offset 0x2000 and that the offset of li a5, 0 from this is 0x69c, we can compute the file offset of li a5, 0 by doing 0x2000 + 0x69c = 0x269c.

Do now!

Compute the actual file offset of the last li a5, 0 instruction of main and verify in ImHex that the four-byte machine instruction 0x00000793 is found in that file offset.

Recall that the starting memory address of .text can be found in its section header.

Patching instructions

To change bytes in ImHex, highlight the sequence of bytes to be changed, right click, select Fill..., specify the new sequence of bytes, then click Set. Ensure you press Save to update the binary file with your changes.

Do now!

Change the aforementioned li a5, 0 instruction to li a5, 21 using ImHex.

To verify that you have done this correctly, run ./lab04 while in the directory containing it, run echo $?, and check whether 21 is printed out.

Locating and patching strings

Notice that the disassembly of check_trial has the following instructions:

1044c:   000947b7      lui     a5,0x94
10450:   de07a703      lw      a4,-544(a5) # 93de0 <filepath>

Here, objdump-rv32 is saying that memory address 0x93de0 corresponds to a label called filepath. In the C source, filepath is declared as a char *, so it must contain an address to a char (or a sequence of chars).

filepath itself lives in the .sdata section (static data section), but it is pointing to a location in the .rodata section (readonly data section).

Do now!

Using the section header of .sdata, locate (1) the starting memory address of the .sdata section and (2) the file offset of the .sdata section data, then use them to compute the file offset of filepath. Take the four-byte value there to be \(P\) (which is a memory address).

Then, using the section header of .rodata, locate (1) the starting memory address of the .rodata section and (2) the file offset of the .rodata section data, then use them to compute the file offset of \(P\).

What null-terminated string is this? What do you think this string pertains to?

Checkpoint Task

Make a C program called checkpoint that:

  1. Adds 5 seconds to the trial period upon executing ./checkpoint 1

  2. Patches the main function in lab04 in some way such that the trial period check has no effect upon executing ./checkpoint 2

  3. Patches the check_trial function in lab04 in some way such that the trial period check has no effect upon executing ./checkpoint 3

lab04 is assumed to be in the same folder as checkpoint. No error handling is needed (i.e., assume all operations will succeed).

Machine instruction conversion

You are expected to convert instructions into their machine instruction equivalents and store them in little endian order.

Tip: Patching out instructions with nop

"Patching out" instructions by replacing them with nop is a viable approach.

checkpoint.c template

You may use the following main template for checkpoint.c:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("No argument provided\n");
        return 1;
    }

    if (strcmp(argv[1], "1") == 0) {
        // Do Item 1
    } else if (strcmp(argv[1], "2") == 0) {
        // Do Item 2
    } else if (strcmp(argv[1], "3") == 0) {
        // Do Item 3
    }

    return 0;
}

🆕 Additional lab report deliverable for take-home submissions

Submit a file lab04.pdf together with your checkpoint.c with the following:

  1. [40pts] An explanation of what your implementation of ./checkpoint 1 does at the byte level to increase the trial period of lab04 by 5 seconds (ensure your explanation refers to actual lines of code in checkpoint.c)
  2. [30pts] An explanation of which instructions of main are changed by ./checkpoint 2 into which instructions and how they are patched (ensure your explanation refers to actual lines of code in checkpoint.c)
  3. [30pts] An explanation of which instructions of check_trial are changed by ./checkpoint 3 into which instructions and how they are patched (ensure your explanation refers to actual lines of code in checkpoint.c)

🆕 Expected deliverables

You are expected to submit the following on or before March 11 (Wed), 11:59 PM via Google Classroom:

  • checkpoint.c
  • lab04.pdf

No need to submit dump.txt, item2.txt, and item3.txt.

🆕 Ignore this section for take-home submissions: To do during checking

🆕 Item 1 (60 points)

  1. Run ./lab04 and show that the trial period has lapsed
  2. Run ./checkpoint 1
  3. Run ./lab04 and show that the trial period has increased by ~5 seconds (no need to show or explain implementation)

🆕 Item 2 (20 points)

  1. Make a copy of lab04 in the same directory (any filename)
  2. Run ./lab04 repeatedly until it says that the trial period has lapsed
  3. Run ./checkpoint 2
  4. Run ./lab04 and show that the trial period check has no effect
  5. Explain what exactly was patched and show the relevant source code of checkpoint
  6. Run objdump-rv32 -d --visualize-jumps lab04 > item2.txt and show the instructions of main that were patched

🆕 Item 3 (20 points)

  1. Delete lab04 and rename the backup made earlier as lab04
  2. Run ./lab04 repeatedly until it says that the trial period has lapsed
  3. Run ./checkpoint 3
  4. Run ./lab04 and show that the trial period check has no effect
  5. Explain what exactly was patched and show the relevant source code of checkpoint
  6. Run objdump-rv32 -d --visualize-jumps lab04 > item3.txt and show the instructions of check_trial that were patched