CS 21 25.2 Laboratory Exercise 4
Machine Language and Binary Patching
🆕 Prerequisites
You will need to install the following to do this activity (already installed on TL machines):
- ImHex v1.38.1
- RISC-V version of objdump
- qemu-riscv32 (not qemu-system-riscv32)
You may find installation instructions below.
🆕 ImHex v1.38.1
- Visit https://github.com/WerWolv/ImHex/releases/tag/v1.38.1 and click Assets to show the download links
- Download the appropriate file for your operating system:
  - Windows: imhex-1.38.1-Windows-x86_64.msi
  - macOS (ARM/Apple Silicon): imhex-1.38.1-macOS-arm64.dmg
  - macOS (Intel): imhex-1.38.1-macOS-x86_64.dmg
  - Linux: imhex-1.38.1-x86_64.AppImage
- Install ImHex using the downloaded file:
  - Windows: Double-click the .msi installer and follow the instructions
  - macOS: Double-click the .dmg file to mount the image, then drag the ImHex.app icon to the Applications icon in the window popup
  - Linux: Run chmod +x imhex-1.38.1-x86_64.AppImage (and optionally rename it to imhex), then optionally make it visible in your PATH (e.g., place it in /usr/local/bin/)
🆕 RISC-V version of objdump
The objdump available by default on WSL, macOS, and Linux cannot disassemble the binary file for this activity because the file contains RISC-V code.
To install the objdump version that is compatible with RISC-V binaries, do the following for your operating system:
- Windows:
  - Install WSL
  - Run sudo apt update && sudo apt install -y gcc-riscv64-unknown-elf in the terminal
  - Replace all objdump-rv32 instances in the commands below with riscv64-unknown-elf-objdump (or optionally, run echo "alias objdump-rv32='riscv64-unknown-elf-objdump'" >> ~/.bashrc && source ~/.bashrc to set an alias to retain the use of objdump-rv32)
- macOS:
  - Install Homebrew by following the instructions here: https://brew.sh/
  - Run brew install riscv64-elf-binutils in the terminal
  - Replace all objdump-rv32 instances in the commands below with riscv64-unknown-elf-objdump (or optionally, run echo "alias objdump-rv32='riscv64-unknown-elf-objdump'" >> ~/.zshrc && source ~/.zshrc to set an alias to retain the use of objdump-rv32)
- Linux:
  - Run sudo apt update && sudo apt install -y gcc-riscv64-unknown-elf in the terminal
  - Replace all objdump-rv32 instances in the commands below with riscv64-unknown-elf-objdump (or optionally, run echo "alias objdump-rv32='riscv64-unknown-elf-objdump'" >> ~/.bashrc && source ~/.bashrc to set an alias to retain the use of objdump-rv32)
🆕 qemu-riscv32
While running a 32-bit RISC-V executable file may be done on an emulated 32-bit RISC-V machine via qemu-system-riscv32, there are no regularly maintained Linux distributions for 32-bit RISC-V (as there is little to no reason to use it over the 64-bit variant).
As a workaround, the user mode emulation variant of QEMU for 32-bit RISC-V, qemu-riscv32, may be used to run 32-bit RISC-V executable files. This variant requires
qemu-riscv32 vs. qemu-system-riscv32
Note that qemu-riscv32 (user mode emulator) is not the same as qemu-system-riscv32 (full system emulator).
To install qemu-riscv32, do the following for your operating system:
- Windows:
  - Install WSL
  - Run sudo apt update && sudo apt install -y qemu-user in the terminal
- macOS:
  - Install OrbStack (https://orbstack.dev/) or Docker Desktop (https://www.docker.com/products/docker-desktop/), then run it
  - Replace qemu-riscv32 instances in the commands below with:
    - Note that the very first run will download the Docker image which may take a while
    - Ensure that OrbStack or Docker Desktop is running whenever you execute this command
    - Sample replacement for the command qemu-riscv32 lab04:
- Linux:
  - Run sudo apt update && sudo apt install -y qemu-user in the terminal
Relevant Links
- Google Classroom (slides, submission bin): https://classroom.google.com/u/2/c/ODQwNDkwODYxODg3
- RISC-V Green Card: https://drive.google.com/file/d/1xSll1ON5cSaOQhoGxpkvr4fKe7lGQFy3/view
General Instructions
For this laboratory activity, you are to work on one HOPELEx Checkpoint Item that involves the 32-bit RISC-V machine language and the binary file format ELF.
Important Reminder
You must finish and show your working checkpoint tasks to your lab handler before the end of the laboratory period. HOPELEx checkpoint items contribute 1% to your lab grade.
Overview
You will be patching the executable file of a text-based RISC-V program to modify its behavior without having access to its source code.
In particular, you will be doing the following:
- Analyzing the disassembly of the given binary file
- Identifying which instructions can be changed and their file offsets in the binary file
- Changing the instructions byte-wise using a hex editor
- Creating a binary patching program in C that automates the said changes
Target program
The program to be analyzed and patched, lab04, may be downloaded from here: https://drive.google.com/file/d/1K_iMAeB7h4y0t-1APmFte-zdpPhniEDY/view?usp=sharing
Do now!
Download lab04 and place it in some easily accessible directory.
Navigate to the said directory using the terminal, then run chmod +x lab04 to make lab04 executable (i.e., it can now be executed by doing qemu-riscv32 lab04 in its directory).
lab04 implements a fully offline trial period scheme that prevents the user from proceeding once the period is up. Its trial period is set to 10 seconds from the first time the program is executed.
The checkpoint task requires you to understand how lab04 computes how much time is left for the trial period; you are to do so using its disassembly.
objdump
objdump is a Linux utility that disassembles a given binary file. A version of objdump that works with 32-bit RISC-V binaries is available on the DCS TL computers as objdump-rv32.
The arguments of the command objdump-rv32 -d --visualize-jumps <file> are as follows:
- -d: Disassembles all sections that contain instructions
- --visualize-jumps: Adds arrows from branch instructions to their target instruction
- <file>: Binary file to disassemble
Tip: Redirecting program output to a file
As the output is large, redirecting it to a file (say dump.txt) may help in going through it. To do so, append > dump.txt to the objdump-rv32 command mentioned earlier.
Do now!
Generate the disassembly for lab04 using objdump-rv32 (and ideally save its output in a file).
Part of the objdump-rv32 output for lab04 is as follows:
00010318 <_start>:
10318: 034000ef jal 1034c <load_gp>
1031c: 00050793 mv a5,a0
10320: 00000517 auipc a0,0x0
10324: 02850513 addi a0,a0,40 # 10348 <__wrap_main>
10328: 00012583 lw a1,0(sp)
1032c: 00410613 addi a2,sp,4
10330: ff017113 andi sp,sp,-16
10334: 00000693 li a3,0
10338: 00000713 li a4,0
1033c: 00010813 mv a6,sp
10340: 460000ef jal 107a0 <__libc_start_main>
10344: 00100073 ebreak
00010318 <_start>: means that the memory address 0x00010318 is mapped to the label _start.
Regarding 10318: 034000ef jal 1034c <load_gp>:
- 10318 means that the instruction starts at address 0x10318
- 034000ef means that the byte sequence 0xef 0x00 0x40 0x03 may be found starting from the address stated earlier
- jal 1034c <load_gp> means that the byte sequence above corresponds to the instruction jal 0x1034c where 0x1034c is equal to the load_gp label (when taken as an address)
jal pseudoinstruction form
Recall that jal <addr> is shorthand for jal ra, <addr>.
Do now!
Go through the disassembly of lab04, locate main, and identify the address of the instruction that sets its return value (i.e., the last instruction that sets a0 before ret is executed).
Built-in C functions
You may need to review the following C functions:
- FILE *fopen(const char *restrict pathname, const char *restrict mode)
- size_t fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream)
- size_t fwrite(const void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream)
- int fclose(FILE *stream)
- int putchar(int c)
- int puts(const char *s)
- time_t time(time_t *tloc)
Hex editor
You will be using the ImHex hex editor (https://github.com/WerWolv/ImHex) for manual binary patching.
To run this on the DCS TL computers, simply execute imhex in the terminal.
Opening a file for inspection (which you may need to drag from the file explorer) will cause several panes to display. Only the Hex editor pane is relevant for this activity.
Each row in the Hex editor pane starts with the file offset, followed by 16 bytes (in hex) corresponding to the sequence starting at that offset, and finally the ASCII symbol equivalents of each byte.
The status bar at the bottom of the window is useful for identifying the current file offset the cursor is under, which range of file offsets are highlighted, and how many bytes are currently highlighted.
Executable and Linkable Format (ELF)
This exercise involves statically modifying the code of the given program by changing the raw bytes of the executable file, so it makes sense to understand how the said file is formatted.
Recall that an assembler outputs machine code, as well as additional information on memory segments such as .data. These outputs must be arranged in a standardized format so that they can be properly loaded into memory just before execution.
One such format is the Executable and Linkable Format (ELF) which is used primarily by Linux. An ELF file is divided into the ELF header, program header, data, and section header portions.
ELF header
An ELF file starts with the ELF header which contains metadata, or data about the ELF data in the file. The ELF header is 52 bytes wide (for the 32-bit format).
The ELF header starts with four magic bytes 0x7f, 0x45, 0x4c, and 0x46 (where the sequence 0x45 0x4c 0x46 is ELF in ASCII) which signal that the file format is ELF.
The following byte (at file offset 0x4) denotes whether the ELF file is in the 32-bit (0x01) or 64-bit (0x02) format. For 32-bit RISC-V, this is expected to be 0x01.
The byte after that (at file offset 0x5) denotes whether the ELF file is in little endian (0x01) or big endian (0x02).
To find out from which file offset the .text section (segment) starts in the file, you will need to parse the section header of .text.
Do now!
Use ImHex to verify that lab04 starts with the ELF magic bytes, is in the 32-bit ELF format, and stores values in little endian.
ELF section header array
Each program section (e.g., .text, .data) has an associated section header which contains information about the associated section. Each section header is 40 bytes wide (for the 32-bit format).
File offset 0x20 (which is part of the ELF header) contains the four-byte file offset denoting where the first section header starts. The second section header starts right after the first (i.e., 40 bytes from the start of the first), and so on until the end of the file, forming a contiguous array of section headers.
Do now!
Use ImHex to identify the file offset of the first section header via forming the four-byte integer at file offset 0x20.
Tip: Shortcut for jumping to file offset
In ImHex, Ctrl-G or Cmd-G can be used to easily jump to a file offset by specifying it in hexadecimal (prefixed with 0x) or decimal.
.shstrtab section
The .shstrtab section is a special section that contains a sequence of null-terminated strings. Each null-terminated string corresponds to a name of an existing section (e.g., .text).
File offset 0x32 (which is also part of the ELF header) contains a two-byte unsigned integer that denotes the index of the .shstrtab header in the section header array.
Do now!
Use ImHex to do the following:
- Identify the index \(n\) containing the .shstrtab section header
- Jump to the first index of the section header array
- Move forward by some multiple of 40 bytes (size of each section header) to get to index \(n\) containing the .shstrtab section header
- Examine the four-byte value at offset 0x10 of the said section header; this value is the location (file offset) of the .shstrtab data in the file (which should be a sequence of null-terminated strings)
- Record this file offset as \(S\)
ELF section header
Each section header starts with a four-byte offset into the data of .shstrtab. The specified location contains the name of the section.
Relevant offsets from the start of a section header are as follows:
- 0x0c: Default starting memory address of the section (four-byte address)
- 0x10: Location of the data of the section in the file (four-byte file offset)
- 0x14: Size in bytes of the data of the section (four-byte unsigned integer)
Do now!
Use ImHex to do the following:
- Identify the offset from \(S\) (obtained in the previous section) containing .text
- Identify which section header corresponds to .text by going through each header and checking which of them have the name .text
- Examine offset 0x10 of the .text section header to get the location of the machine instructions in the file
- Verify that the four-byte value at the said location matches 0xff010113
Mapping objdump addresses and ELF file offsets
For this section, we will be doing the following:
- Identifying the memory address in which the last li a5, 0 instruction of the main function can be found
- Locating the file offset of the said instruction using the identified memory address and .text metadata
- Changing the instruction from li a5, 0 to li a5, 21 using a hex editor
- Locating the file offset of the string Trial period has lapsed.\n
- Modifying the said string using a hex editor
Memory addresses via objdump-rv32
As mentioned earlier, objdump-rv32 lists the starting memory address of each instruction in .text.
Do now!
Verify that the objdump-rv32 disassembly of lab04 shows that the last li a5, 0 instruction of main is mapped to memory address 0x1069c.
File offsets via .text metadata
Recall that offset 0x0c of a section header contains the default starting memory address of the section as a four-byte value.
Suppose that:
- The default starting memory address of .text is 0x10000 (as seen in its 0x0c offset)
- objdump-rv32 shows that li a5, 0 is in memory address 0x1069c
The offset of li a5, 0 with respect to the start of the data of the .text section can be computed via 0x1069c - 0x10000 = 0x69c.
Recall as well that the four-byte value stored starting at offset 0x10 of the .text header contains the file offset pertaining to the start of the data of the .text section. Suppose this value is 0x2000.
Since we know that the start of the .text section data is at file offset 0x2000 and that the offset of li a5, 0 from this is 0x69c, we can compute the file offset of li a5, 0 by doing 0x2000 + 0x69c = 0x269c.
Do now!
Compute the actual file offset of the last li a5, 0 instruction of main and verify in ImHex that the four-byte machine instruction 0x00000793 is found in that file offset.
Recall that the starting memory address of .text can be found in its section header.
Patching instructions
To change bytes in ImHex, highlight the sequence of bytes to be changed, right click, select Fill..., specify the new sequence of bytes, then click Set. Ensure you press Save to update the binary file with your changes.
Do now!
Change the aforementioned li a5, 0 instruction to li a5, 21 using ImHex.
To verify that you have done this correctly, run ./lab04 while in the directory containing it, run echo $?, and check whether 21 is printed out.
Locating and patching strings
Notice that the disassembly of check_trial has the following instructions:
Here, objdump-rv32 is saying that memory address 0x93de0 corresponds to a label called filepath. In the C source, filepath is declared as a char *, so it must contain an address to a char (or a sequence of chars).
filepath itself lives in the .sdata section (static data section), but it is pointing to a location in the .rodata section (readonly data section).
Do now!
Using the section header of .sdata, locate (1) the starting memory address of the .sdata section and (2) the file offset of the .sdata section data, then use them to compute the file offset of filepath. Take the four-byte value there to be \(P\) (which is a memory address).
Then, using the section header of .rodata, locate (1) the starting memory address of the .rodata section and (2) the file offset of the .rodata section data, then use them to compute the file offset of \(P\).
What null-terminated string is this? What do you think this string pertains to?
Checkpoint Task
Make a C program called checkpoint that:
- Adds 5 seconds to the trial period upon executing ./checkpoint 1
  Hint: Unix time
  Read about Unix time here:
- Patches the main function in lab04 in some way such that the trial period check has no effect upon executing ./checkpoint 2
- Patches the check_trial function in lab04 in some way such that the trial period check has no effect upon executing ./checkpoint 3
lab04 is assumed to be in the same folder as checkpoint. No error handling is needed (i.e., assume all operations will succeed).
Machine instruction conversion
You are expected to convert instructions into their machine instruction equivalents and store them in little endian order.
Tip: Patching out instructions with nop
"Patching out" instructions by replacing them with nop is a viable approach.
checkpoint.c template
You may use the following main template for checkpoint.c:
#include <stdio.h>   // printf
#include <string.h>  // strcmp

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("No argument provided\n");
        return 1;
    }

    if (strcmp(argv[1], "1") == 0) {
        // Do Item 1
    } else if (strcmp(argv[1], "2") == 0) {
        // Do Item 2
    } else if (strcmp(argv[1], "3") == 0) {
        // Do Item 3
    }

    return 0;
}
🆕 Additional lab report deliverable for take-home submissions
Submit a file lab04.pdf together with your checkpoint.c with the following:
- [40pts] An explanation of what your implementation of ./checkpoint 1 does at the byte level to increase the trial period of lab04 by 5 seconds (ensure your explanation refers to actual lines of code in checkpoint.c)
- [30pts] An explanation of which instructions of main are changed by ./checkpoint 2 into which instructions and how they are patched (ensure your explanation refers to actual lines of code in checkpoint.c)
- [30pts] An explanation of which instructions of check_trial are changed by ./checkpoint 3 into which instructions and how they are patched (ensure your explanation refers to actual lines of code in checkpoint.c)
🆕 Expected deliverables
You are expected to submit the following on or before March 11 (Wed), 11:59 PM via Google Classroom:
- checkpoint.c
- lab04.pdf
There is no need to submit dump.txt, item2.txt, or item3.txt.
🆕 Ignore this section for take-home submissions: To do during checking
🆕 Item 1 (60 points)
- Run ./lab04 and show that the trial period has lapsed
- Run ./checkpoint 1
- Run ./lab04 and show that the trial period has increased by ~5 seconds (no need to show or explain implementation)
🆕 Item 2 (20 points)
- Make a copy of lab04 in the same directory (any filename)
- Run ./lab04 repeatedly until it says that the trial period has lapsed
- Run ./checkpoint 2
- Run ./lab04 and show that the trial period check has no effect
- Explain what exactly was patched and show the relevant source code of checkpoint
- Run objdump-rv32 -d --visualize-jumps lab04 > item2.txt and show the instructions of main that were patched
🆕 Item 3 (20 points)
- Delete lab04 and rename the backup made earlier as lab04
- Run ./lab04 repeatedly until it says that the trial period has lapsed
- Run ./checkpoint 3
- Run ./lab04 and show that the trial period check has no effect
- Explain what exactly was patched and show the relevant source code of checkpoint
- Run objdump-rv32 -d --visualize-jumps lab04 > item3.txt and show the instructions of check_trial that were patched