How can I read a table file inside a subfolder of a tar.gz archive directly, without extracting the archive?

Ahua AIGC Lab

2026-5-11

Reading a Specific Nested File from a tar.gz Archive in R (No Full Extraction Needed)

Great question! You can read a specific nested file (such as firstf/secondf/table.txt) from a tar.gz archive in R without unpacking the whole archive, and the original compressed file stays untouched. Here are three ways to do it:

Method 1: Use Base R's `untar()` Function

Base R's tar() creates archives; the function for reading them is untar(). Its files argument restricts extraction to the members you name, so only the one table is written out (here into a temporary directory) and the rest of the archive is never unpacked. This works without any extra packages:

# Extract only the nested file into a temporary directory
tmp_dir <- tempfile("untar_")
dir.create(tmp_dir)
untar(
  tarfile = "myFile.tar.gz",
  files = "firstf/secondf/table.txt",
  exdir = tmp_dir
)

# Read the table from the extracted copy
myData <- read.table(file.path(tmp_dir, "firstf/secondf/table.txt"))

# Remove the temporary copy when finished
unlink(tmp_dir, recursive = TRUE)

Method 2: Tidyverse-Friendly Approach with `readr`

readr itself has no tar-specific reader, but its read_*() functions accept connections. If you're already using the tidyverse, you can pass them a connection created by archive_read() from the archive package (the same function Method 3 uses) and get a tibble that fits into the rest of your pipeline:

library(readr)
library(archive)

# Read the table through a connection to the nested file
# (use read_csv() if it's a CSV instead of a whitespace-separated table)
myData <- read_table(archive_read(
  archive = "myFile.tar.gz",
  file = "firstf/secondf/table.txt"
))

Method 3: Flexible Handling with the `archive` Package

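For broader support of different archive formats (tar, zip, 7z, etc.), the archive package is a great choice. archive_read() returns a connection to a single member of the archive, and read.table() opens and closes that connection itself, so the whole read fits in one expression: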

library(archive)

# Read the specific nested file directly into your data frame
myData <- read.table(archive_read(
  "myFile.tar.gz",
  file = "firstf/secondf/table.txt"
))

Quick Notes:

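Double-check the file path inside the archive: member names are matched exactly and case-sensitively, and some archives store them with a leading "./", so the path must match what is recorded in the tar.gz character for character.
None of these methods unpacks the entire archive. Only the requested member is read or written out, which saves disk space and cleanup for large archives. A tar.gz has no index, though, so the archive is still scanned sequentially until that member is found.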
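
One way to confirm the stored path before reading is to list the archive's members with untar(list = TRUE). The following is a minimal sketch, not part of the original answer: it builds a throwaway archive in a temporary directory so it can run on its own, and the names root, archive_path, members, target and demo_data are illustrative. With your own archive, only the listing and extraction steps would apply.

```r
# Build a small demo archive in a temporary directory (all names are illustrative)
root <- tempfile("demo_")
dir.create(file.path(root, "firstf", "secondf"), recursive = TRUE)
write.table(
  data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)),
  file = file.path(root, "firstf", "secondf", "table.txt"),
  row.names = FALSE
)

archive_path <- file.path(root, "myFile.tar.gz")
old_wd <- setwd(root)  # so member paths are stored relative to the archive root
# tar = "internal" uses R's own implementation, so no system tar is required
tar(archive_path, files = "firstf", compression = "gzip", tar = "internal")
setwd(old_wd)

# List the member names without extracting anything
members <- untar(archive_path, list = TRUE, tar = "internal")
print(members)

# Pick the exact stored name of the table, whatever its prefix
target <- grep("table\\.txt$", members, value = TRUE)

# Extract just that member and read it
out_dir <- file.path(root, "out")
dir.create(out_dir)
untar(archive_path, files = target, exdir = out_dir, tar = "internal")
demo_data <- read.table(file.path(out_dir, target), header = TRUE)
```

Taking the path from the listing instead of typing it by hand avoids mismatches caused by letter case or a leading "./".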

The question in this post comes from Stack Exchange; it was asked by Ahdee.