Crafting and Executing a Polyglot File: A PNG Image with Embedded Data
Polyglot Files
Introduction
A polyglot file is a single file that is valid in more than one format, so different programs can interpret it in different ways. Understanding how to build and handle polyglot files can be valuable in many IT and security-related fields. For instance, polyglots can be used for data exfiltration, obfuscation, and bypassing input validation.
In this blog post, we'll explore how to create a polyglot file that is both a valid PNG image and a carrier for a working Python script. We'll discuss how you can run the script portion with the help of the Unix dd command, without needing complex extraction tools. You can find the complete code here. We will also discuss how to hide binary data that you can pass as command-line arguments to another program, or just hide for the fun of it.
How to Generate the PNG Image
First, we use Pillow, the maintained fork of the Python Imaging Library (PIL), to create a simple PNG image. Here, we're making a 10x10 pixel blue image.
from PIL import Image
import io
img = Image.new("RGB", (10, 10), color="blue")
img_byte_arr = io.BytesIO()
img.save(img_byte_arr, format="PNG")
png_data = img_byte_arr.getvalue()
How to Embed the Python Script
After creating the image, we append a Python script to it. We start by adding a marker #--PYTHON--# to denote the beginning of the Python script. This helps in identifying where the script starts within the binary file.
marker = b"\n\n#--PYTHON--#\n"
python_script = b"""#!/usr/bin/env python3
print('Hello from your polyglot file!')
"""
polyglot_data = png_data + marker + python_script
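Appending works because a PNG is a signature followed by a sequence of chunks, and most decoders stop reading at the IEND chunk. The sketch below walks the chunks to find where the image really ends. To stay dependency-free it builds a 1x1 PNG by hand as a stand-in for Pillow's output; the helper names (make_chunk, make_tiny_png, png_end_offset) are ours and purely illustrative.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, body, CRC-32 over type + body."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


def make_tiny_png() -> bytes:
    """Hand-built 1x1 blue RGB PNG (an assumed stand-in for Pillow's output)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # 1x1, 8-bit, truecolor
    scanline = b"\x00" + bytes([0, 0, 255])  # filter byte 0, then one RGB pixel
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(scanline))
        + make_chunk(b"IEND", b"")
    )


def png_end_offset(data: bytes) -> int:
    """Return the offset just past the IEND chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
        if chunk_type == b"IEND":
            return pos
    raise ValueError("no IEND chunk found")


png = make_tiny_png()
polyglot = png + b"\n\n#--PYTHON--#\n" + b"print('hi')\n"
end = png_end_offset(polyglot)
trailing = polyglot[end:]
print(f"image ends at byte {end}; {len(trailing)} trailing bytes follow")
```

Everything in trailing is invisible to a typical image viewer, which is exactly the space the marker and script occupy.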
How to Save the Polyglot File
We then save this combined data to a file and ensure it has the correct permissions to be executed.
polyglot_path = "executable_polyglot.png"
with open(polyglot_path, "wb") as file:
    file.write(polyglot_data)
import os
os.chmod(polyglot_path, 0o755)
How to Execute the Python Script
To run the embedded script without manually extracting it, you can use the dd command in Unix-based systems. This command is powerful for handling and transferring fixed blocks of data.
Why Use the dd Command?
The dd command is traditionally used for copying and converting raw data. It is extremely useful in our case to skip the initial part of the file that contains the PNG data.
Once we know the exact byte where the Python script starts, just after the #--PYTHON--# marker (offset 90 bytes in this example; the value depends on the exact PNG bytes Pillow produces, so check it for your own file), we can instruct dd to skip directly to this point and pipe the result to Python for execution:
dd if=executable_polyglot.png bs=1 skip=90 | python3
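If you would rather not hard-code the offset, it can be computed from the marker. The following is a minimal sketch of the same skip-and-pipe idea in Python, not a replacement for the dd one-liner. The functions script_offset and run_embedded_script are hypothetical names, and the PNG part is replaced by placeholder bytes so that the example runs without Pillow.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

MARKER = b"\n\n#--PYTHON--#\n"


def script_offset(blob: bytes, marker: bytes = MARKER) -> int:
    """Return the offset of the first byte after the marker."""
    start = blob.find(marker)
    if start == -1:
        raise ValueError("marker not found")
    return start + len(marker)


def run_embedded_script(path: Path) -> str:
    """Skip to the script (like `dd bs=1 skip=N`) and pipe it to Python's stdin."""
    blob = path.read_bytes()
    script = blob[script_offset(blob):]
    result = subprocess.run(
        [sys.executable, "-"],  # "-" makes the interpreter read the program from stdin
        input=script,
        capture_output=True,
        check=True,
    )
    return result.stdout.decode()


# Placeholder for the PNG bytes (an assumption); any prefix shows the offset logic.
fake_png = b"\x89PNG\r\n\x1a\n" + bytes(64)
script = b"#!/usr/bin/env python3\nprint('Hello from your polyglot file!')\n"

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "executable_polyglot.png"
    demo_path.write_bytes(fake_png + MARKER + script)
    offset = script_offset(demo_path.read_bytes())
    output = run_embedded_script(demo_path)

print(f"script starts at byte {offset}")
print(output, end="")
```

The printed offset is the value you would pass to dd as skip= for that particular file.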
Storing Raw Bytes in an Image
In our previous exploration, we embedded a Python script into a PNG and used dd to execute it. What if you don’t want to embed a script, but raw binary data instead, such as a key, a configuration blob, or any arbitrary bytes?
Now we will see how to create a polyglot that is simultaneously a valid PNG and a container for binary data. We’ll still rely on dd (and other Unix tools) to extract exactly what we need.
Generating the PNG
As before, we use the Pillow (PIL) library to create a small PNG image (10×10 blue). We store it in memory via Python’s io.BytesIO(), which gives us valid PNG header and chunk data without writing anything to disk yet.
Appending a Marker and Binary Data
The script example above shows both the versatility of combining different data types in a single file and the practical command-line skills needed to extract and execute parts of a binary file. The same idea carries over to binary payloads, with one complication.
Unlike plain-text or ASCII scripts, binary data might contain null bytes, bytes that do not form valid UTF-8, or other non-printable sequences that could confuse many text-based tools. To keep extraction simple, we:
- Insert a marker (#--BINARY--#) that is easy to spot using strings or grep.
- Immediately follow that marker with our arbitrary binary data, which can include any bytes from \x00 to \xFF.
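To make the point about text tools concrete, here is a small sketch that first shows the example payload failing to decode as text, then embeds and recovers it using byte-level operations only. The embed and extract helpers are hypothetical, and the carrier is a placeholder rather than a real PNG.

```python
MARKER = b"\n#--BINARY--#\n"
payload = b"\x00\x01\x02\x03\xFE\xFF\x00\x10"

# Treating the payload as text fails: 0xFE and 0xFF never occur in valid UTF-8.
try:
    payload.decode("utf-8")
    decodes_as_text = True
except UnicodeDecodeError:
    decodes_as_text = False


def embed(carrier: bytes, data: bytes, marker: bytes = MARKER) -> bytes:
    """Append marker + data, refusing carriers that already contain the marker."""
    if marker in carrier:
        raise ValueError("carrier already contains the marker")
    return carrier + marker + data


def extract(blob: bytes, marker: bytes = MARKER) -> bytes:
    """Return everything after the first occurrence of the marker."""
    _, found, tail = blob.partition(marker)
    if not found:
        raise ValueError("marker not found")
    return tail


carrier = b"\x89PNG\r\n\x1a\n" + bytes(32)  # placeholder for real PNG bytes
blob = embed(carrier, payload)
recovered = extract(blob)
print(decodes_as_text, recovered == payload)
```

The collision check in embed is a design choice of this sketch: if the marker already appeared inside the image bytes, a search for the first occurrence would land in the wrong place.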
Saving the Polyglot File
We combine the original PNG bytes, the marker, and the binary payload into one file, binary_polyglot.png.
import os
import io
from PIL import Image
# 1. Create a simple PNG in memory (10x10 blue image).
img = Image.new("RGB", (10, 10), color="blue")
img_byte_arr = io.BytesIO()
img.save(img_byte_arr, format="PNG")
png_data = img_byte_arr.getvalue()
# 2. Define a marker that we'll use to locate the embedded data.
#    The leading "\n" separates it from the last PNG bytes; the main purpose
#    is to have a textual pivot point we can 'grep' for.
marker = b"\n#--BINARY--#\n"
# 3. Here is an example of arbitrary bytes.
#    This can include null bytes, high (non-ASCII) bytes, control codes, etc.
#    '\x00' = null byte
#    '\x10' = a control code (16 in decimal)
#    '\xFE' = 254 in decimal
#    '\xFF' = 255 in decimal
binary_data = b"\x00\x01\x02\x03\xFE\xFF\x00\x10"
# Combine the PNG data + marker + arbitrary binary data
polyglot_data = png_data + marker + binary_data
# 4. Write out the combined file
polyglot_path = "binary_polyglot.png"
with open(polyglot_path, "wb") as f:
    f.write(polyglot_data)
# 5. (Optional) Make it executable if you want to treat it like a script
os.chmod(polyglot_path, 0o755)
print(f"[+] Created '{polyglot_path}' with {len(polyglot_data)} bytes total.")
print("[+] You can open it in an image viewer to see the 10x10 blue square.")
print("[+] You can also locate '#--BINARY--#' to extract the raw binary bytes.")
Viewed in an image viewer, it’s just a 10×10 blue square. But the extra data is waiting for us at the end.
Extracting the Raw Binary Data
To extract the raw bytes, you must skip the PNG portion until you reach the marker. First, find the marker offset. Several methods are possible, but we will use the strings command:
strings -a -t d binary_polyglot.png | grep "#--BINARY--#"
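The same lookup can be sketched in Python with bytes.find. One detail worth seeing in code: the marker starts with a newline, which strings does not treat as printable by default, so the offset of the # is one byte past the start of the marker. The locate_payload helper is hypothetical, and the 78-byte carrier is an assumed stand-in for the PNG.

```python
MARKER = b"\n#--BINARY--#\n"


def locate_payload(blob: bytes, marker: bytes = MARKER) -> tuple[int, int, int]:
    """Return (marker_start, hash_offset, data_start) for the first marker."""
    marker_start = blob.find(marker)
    if marker_start == -1:
        raise ValueError("marker not found")
    hash_offset = marker_start + marker.index(b"#")  # first printable marker byte
    data_start = marker_start + len(marker)
    return marker_start, hash_offset, data_start


# Hypothetical 78-byte carrier standing in for the PNG portion of the file.
carrier = b"\x89PNG\r\n\x1a\n" + bytes(70)
payload = b"\x00\x01\x02\x03\xFE\xFF\x00\x10"
blob = carrier + MARKER + payload

marker_start, hash_offset, data_start = locate_payload(blob)
print(marker_start, hash_offset, data_start)
```

Here data_start is the number to hand to dd as skip=, whichever tool you used to find the marker.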
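Note that strings reports the offset of the first printable character, the #, which by default sits one byte after the marker’s leading newline. Assuming the marker (leading newline included) begins at offset 78 and is 14 bytes long, your actual data starts at 92. You can then extract the data and pass it to another program like so: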
some_program "$(dd if=binary_polyglot.png bs=1 skip=92 2>/dev/null | base64)"
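The base64 step matters because command-line arguments are NUL-terminated strings, so a payload containing \x00 cannot be passed as-is. A rough Python equivalent of the pipeline is sketched below. The receiver is a hypothetical one-line program standing in for some_program, and the payload is written inline instead of being read from the file.

```python
import base64
import subprocess
import sys

payload = b"\x00\x01\x02\x03\xFE\xFF\x00\x10"  # bytes that dd would extract

# Raw bytes containing \x00 cannot travel in argv; base64 gives plain ASCII text.
encoded = base64.b64encode(payload).decode("ascii")

# Hypothetical receiver: decodes its first argument and prints it as hex.
receiver = "import base64, sys; print(base64.b64decode(sys.argv[1]).hex())"
result = subprocess.run(
    [sys.executable, "-c", receiver, encoded],
    capture_output=True,
    text=True,
    check=True,
)
received_hex = result.stdout.strip()
print(received_hex)
```

The receiving program has to decode the argument itself, so this only works with programs that expect base64 input.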
The complete code can be found here.
Conclusion
By combining image data with either a Python script or raw bytes, we’ve created polyglot files that function both as valid PNG images and as containers for hidden data. Leveraging simple tools like dd (and optionally base64) makes it straightforward to extract or execute the embedded content without specialized utilities. This approach highlights how file formats can be blended to achieve obfuscation, data exfiltration, or just a fun demonstration of Unix command-line skills. As you explore these techniques further, whether embedding scripts or arbitrary data, remember to use them responsibly and ethically. You can adapt the same process to many file types and data formats, showcasing the versatility of polyglot files in various security and development scenarios.