Aug 29, 2017

Splitting and Re-Assembling Files in Linux

Linux has several utilities for splitting up files.
So why would you want to split your files? One use case is to split a large file into smaller pieces so that it fits on smaller media, like USB sticks. This is also a good trick for moving files on USB sticks when you're stuck with FAT32, which has a maximum file size of 4GB, and your files are bigger than that. Another use case is to speed up network file transfers, because parallel transfers of small files are often faster than one big transfer.

We'll learn how to use csplit, split, and cat to chop up and then put files back together. These work on any file type: text, image, audio, .iso, you name it.

Split Files With csplit

csplit is one of those funny little commands that has been around forever, and when you discover it you wonder how you ever made it through life without it. csplit divides single files into multiple files. This example demonstrates its simplest invocation, which divides the file foo.txt into two files, split at line number 1:
$ csplit foo.txt 1
0
15
csplit creates two new files in the current directory, and prints the sizes of your new files in bytes. The first file holds everything before line 1, which is nothing, so it is 0 bytes; the second file holds the rest. By default, each new file is named xxnn, where nn is a two-digit number starting at 00.
You can view the first ten lines of each of your new files all at once with the head command:
$ head xx0*
==> xx00 <==

==> xx01 <==
2591
3889
2359
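Splitting at more than one line number works the same way. Here is a minimal sketch, assuming GNU csplit and a made-up 10-line file called sample.txt (not part of the original example), that splits before lines 4 and 8 to get three pieces:

```shell
# Work in a throwaway directory so nothing of yours gets overwritten.
cd "$(mktemp -d)"

# sample.txt is a hypothetical 10-line file: "line 1" through "line 10".
seq 1 10 | sed 's/^/line /' > sample.txt

# Split before line 4 and before line 8; -s hides the byte counts.
csplit -s sample.txt 4 8

# xx00 holds lines 1-3, xx01 holds lines 4-7, xx02 holds lines 8-10.
wc -l xx00 xx01 xx02
head -n 1 xx*
```

Each line number marks where a new piece starts, so the piece before it stops one line earlier.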
What if you want to split a file into several files all containing the same number of lines? Specify the number of lines, and then enclose the number of repetitions in curly braces. This example splits at line 1 and then repeats the split 3 more times, and dumps the leftover in the last file:
$ csplit foo.txt 1 {3}
0
5
5
5
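The repetition count is easier to see on a slightly longer file. This is a sketch only, assuming GNU csplit and a hypothetical 20-line sample.txt; the braces are quoted so the shell passes them through untouched:

```shell
cd "$(mktemp -d)"
seq 1 20 > sample.txt            # hypothetical 20-line file

# Split before line 5, then repeat twice more (before lines 10 and 15).
csplit -s sample.txt 5 '{2}'

# xx00 gets lines 1-4, xx01 and xx02 get 5 lines each,
# and xx03 gets the leftover lines 15-20.
wc -l xx*
```

Note that the first piece is one line short of the others, because the split happens before line 5 rather than after it.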
You may use the asterisk wildcard to tell csplit to repeat your split as many times as possible. Which sounds cool, but it fails if the file does not divide evenly:
$ csplit foo.txt 1 {*}
0
5
5
5
The default behavior is to delete the output files on errors. You can foil this with the -k option, which keeps the output files when there are errors. Another gotcha is that every time you run csplit it overwrites the previous files it created, so give your splits new filenames to save them. Use --prefix=prefix to set a different file prefix.
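Here is a hedged sketch of -k and --prefix working together, assuming GNU csplit. The file sample.txt and the prefix part_ are invented for illustration:

```shell
cd "$(mktemp -d)"
seq 1 12 > sample.txt            # hypothetical 12-line file

# '{*}' keeps repeating the split, and csplit can exit with an error
# once the input runs out. -k keeps the pieces anyway, and
# --prefix=part_ names them part_00, part_01, and so on.
# "|| true" stops the non-zero exit status from aborting a script.
csplit -s -k --prefix=part_ sample.txt 5 '{*}' 2>/dev/null || true

ls part_*

# Put the pieces back together with cat and confirm nothing was lost.
cat part_* | cmp - sample.txt && echo "pieces match the original"
```

Because the pieces are numbered, the shell glob part_* lists them in the right order for cat.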

Splitting Files into Sizes

split is similar to csplit. It splits files into specific sizes, which is fabulous when you're splitting large files to copy to small media, or for network transfers. The default size is 1000 lines:
$ split foo.txt
$ ls
foo.txt  mine00  mine01  xaa
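To see the 1000-line default do something, you need a file with more than 1000 lines. A small sketch, assuming a hypothetical 2500-line sample.txt and an invented prefix lines_:

```shell
cd "$(mktemp -d)"
seq 1 2500 > sample.txt          # hypothetical 2500-line file

split sample.txt                 # default: 1000 lines per piece
wc -l xa*                        # xaa and xab hold 1000 lines, xac the last 500

split -l 400 sample.txt lines_   # -l picks a different line count
ls lines_*                       # seven pieces, lines_aa through lines_ag
```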
They come out to a similar size, but you can specify any size you want. This example is 20 megabytes:
$ split -b 20M foo.txt
The size abbreviations are K, M, G, T, P, E, Z, Y (powers of 1024), or KB, MB, GB, and so on for powers of 1000.
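The difference between the two suffix families shows up even on a tiny file. This sketch assumes GNU split (other implementations may not accept the KB form) and uses a made-up 2048-byte file named blob.bin:

```shell
cd "$(mktemp -d)"
head -c 2048 /dev/zero > blob.bin   # hypothetical 2048-byte file

split -b 1K  blob.bin k_     # 1K  = 1024 bytes, so 2 pieces
split -b 1KB blob.bin kb_    # 1KB = 1000 bytes, so 3 pieces (1000 + 1000 + 48)

ls -l k_* kb_*

# cat glues the pieces back together in name order.
cat k_* | cmp - blob.bin && echo "reassembled copy is identical"
```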
Choose your own prefix and suffix for the filenames:
$ split -a 3 --numeric-suffixes=9 --additional-suffix=mine foo.txt SB
240K Aug 21 17:44 SB009mine
214K Aug 21 17:44 SB010mine
220K Aug 21 17:44 SB011mine
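The -a option controls how many digits there are in the suffix. --numeric-suffixes switches the suffix from letters to numbers and sets the starting point for numbering, and --additional-suffix appends extra text to each filename. The default prefix is x, and you can set a different prefix by typing it after the filename.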
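The same naming options can be tried on a small file. A minimal sketch, assuming GNU split and a hypothetical 30-line sample.txt, with .part as an arbitrary additional suffix:

```shell
cd "$(mktemp -d)"
seq 1 30 > sample.txt            # hypothetical 30-line file

# 10 lines per piece, 3-digit numeric suffixes starting at 9,
# ".part" appended to every name, and "SB" as the prefix.
split -l 10 -a 3 --numeric-suffixes=9 --additional-suffix=.part sample.txt SB

ls SB*                           # expect SB009.part, SB010.part, SB011.part

cat SB*.part | cmp - sample.txt && echo "pieces match the original"
```

Fixed-width numeric suffixes keep the pieces in order when you glob them, which matters when you cat them back together.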

 
