5. Developer Notes
The best way to learn about the structure of libmtcelledit is to study the public API header file, and the programs that use it.
5.1 Example Programs
I find libmtcelledit a very useful library when developing my own software so I thought it would be helpful to include a few of these programs with the mtCellEdit package to demonstrate what can be done.
5.1.1 mtNetLog
This program demonstrates how you can create a small console program to monitor network activity, and then use spreadsheet formulas for accounting. Data is saved as values in a TSV file so its contents can be read at the command line at any time (even while logging). You can run this program at bootup and it will record all network activity. I do this by inserting the following code at the end of file like /etc/rc.d/rc.S or /etc/rc.local :
/usr/bin/mtnetlog -network ppp0 -delay 10 /var/log/mtnetlog_ppp0.tsv & /usr/bin/mtnetlog -network eth0 -delay 10 /var/log/mtnetlog_eth0.tsv &
5.1.2 mtRaft
This program demonstrates how you can use a spreadsheet to make the following jobs easier:
- Format numerical output in a table.
- Table sorting.
- Copying table output to the system clipboard.
5.2 Porting and Hacking mtCellEdit
mtCellEdit was designed and built on a GNU/Linux system, with the GUI created with the help of Qt 5. It should possible to port it anywhere else (with a little effort) as it is just a C/C++ program. I don't have enough time and energy to do this porting myself but feel free to port the code if you want (according to the GNU GPL of course) and tell me or the world about your experiences.
5.3 Command Line Examples
Because the TSV file format is so simple its very easy to read using the command line or a script. This is completely independent of the mtCellEdit GUI or even the core libmtcelledit library. This ensures that your data is completely accessible on any properly equipped GNU operating system.
Here is a summary of some useful commands for manipulating a TSV spreadsheet:
Command | Description |
awk | General purpose parsing, manipulation, and reporting |
cat | Concatenate various files together |
cut | Remove columns |
head | Output the top rows of a file |
paste | Paste one files data after another files data (line by line) |
sed | Simple regexp's |
sort | Sort rows |
tac | Reverse the order of the rows (vertical flip) |
tail | Output the bottom of a file |
tee | Read from stdin and output to stdout and files |
tr | Translate or delete characters |
uniq | Remove duplicate rows (see also sort -u) |
The shell also provides the usual redirections:
> | Create a new file |
>> | Append rows to a file |
5.3.1 awk
Writing a small awk program to process a TSV sheet offers a few advantages over a C++ program. For example some simple tasks require just a few awk instructions to complete, compared to a C++ program that would need to be much larger and therefore more time consuming to create and maintain.
5.3.1.1 mtNetLog Queries
You can use the awk program to process the TSV file created by mtNetLog and present the data in a more human readable form like this:
cat /var/log/mtnetlog_ppp0.tsv | awk -F "\t" 'NR==2{for (i=1; i<=68; i++) printf "-"; printf "\n"} NR!=2{printf "%20s %10.2fMB %10.2fMB %10.2fMB %8.2f\n", $1, $7, $8, $9, 100*$8/$7}'
Which displays the in/out/total columns and a perctange of input to output flow. This will convert the data from this file:
61.555277776904 253420306 30916922 330063404 301.02403259277 13.748958587646 314.77299118042 'Start 'End 'Hours 'Bytes In 'Bytes Out 'Bytes Total 'MB In 'MB Out 'MB Total 2010-11-10 12:10:42 2010-11-11 00:39:11 12.474722223356 36193057 11935767 86299634 73.300669670105 9.0010747909546 82.30174446106 2010-11-14 09:59:13 2010-11-15 01:36:16 15.617500000633 15034107 2117117 70948296 67.253649711609 0.40791797637939 67.661567687988 2010-11-19 11:58:12 2010-11-20 02:43:36 14.756666664965 68171569 6224867 36326937 32.803199768066 1.8408670425415 34.644066810608 2010-11-20 10:09:43 2010-11-21 02:21:46 16.200833332725 84574392 1741061 56100792 52.855527877808 0.6463565826416 53.501884460449 2010-11-21 12:29:45 2010-11-21 15:00:05 2.5055555552244 49447181 8898110 80387745 74.810985565186 1.8527421951294 76.663727760315
Into this:
301.02MB 13.75MB 314.77MB 4.57 -------------------------------------------------------------------- 2010-11-10 12:10:42 73.30MB 9.00MB 82.30MB 12.28 2010-11-14 09:59:13 67.25MB 0.41MB 67.66MB 0.61 2010-11-19 11:58:12 32.80MB 1.84MB 34.64MB 5.61 2010-11-20 10:09:43 52.86MB 0.65MB 53.50MB 1.22 2010-11-21 12:29:45 74.81MB 1.85MB 76.66MB 2.48
You can also use the sort command line program in order to sort according to a given field:
cat /var/log/mtnetlog_ppp0.tsv | tail -n+3 | sort -t $'\t' -nk 3 | awk -F "\t" '{ printf "%s\t%5.2f hours\n", $1, $3 }'
Creates this output:
2010-11-21 12:29:45 2.51 hours 2010-11-10 12:10:42 12.47 hours 2010-11-19 11:58:12 14.76 hours 2010-11-14 09:59:13 15.62 hours 2010-11-20 10:09:43 16.20 hours
5.3.1.2 Splitting fixed width text files
If a text file contains a rectangular lump of text with various columns set up using spaces it is possible to chop this up into a TSV file ready for spreadsheet editing:
cat fixed.txt | awk -v FIELDWIDTHS='20 11 2 11 2 11 2 9' -v OFS='\t' '{ $1=$1; print }' > fixed.tsv
This cuts up the output from the mtnetlog example in 5.3.1.1. You may also want to remove spaces using the following sections.
5.3.1.3 Removing all spaces
cat fixed.tsv | tr -d " "
This is a very crude and indiscriminate weapon of mass destruction, so care is needed when using it. For example it destroys the space between the date and time in the example from cutting up the text from 5.3.1.1. To avoid this problem use the following example.
5.3.1.4 Removing all leading and trailing spaces
You can use sed or awk to chop off leading and trailing spaces in each cell. This is useful if you want to preserve the spaces within the cell, but remove useless spaces after splitting a fixed width file:
cat fixed.tsv | sed -e "s/^[ ]\+//" | sed -e "s/[ ]\+$//" | sed -e "s/\t[ ]\+/\t/g" | sed -e "s/[ ]\+\t/\t/g" cat fixed.tsv | awk '{ sub( /^ +/, "" ); sub( / +$/, "" ); gsub( /\t +/, "\t" ); gsub( / +\t/, "\t" ); print }'
5.3.1.5 Creating fixed width text files
To reverse the operation in 5.3.1.3 and to create a fixed width file from a TSV file:
cat fixed.tsv | awk -v FS='\t' '{ printf "%20s%11s%2s%11s%2s%11s%2s%9s\n", $1, $2, $3, $4, $5, $6, $7, $8 }'
In this example, we want all of the chunks of text to be right justified, but if we wanted to have them left aligned we would simply place a '-' before the number like this:
cat fixed.tsv | awk -v FS='\t' '{ printf "%-20s%-11s%-2s%-11s%-2s%-11s%-2s%-9s\n", $1, $2, $3, $4, $5, $6, $7, $8 }'
5.3.2 unzip
The command line program unzip allows you to access the sheets held inside a CED/ZIP file. Here is an example using the example data in the file 'test_suite.tsv.zip' from the mtCellEdit source code distribution:
unzip -p test_suite.tsv.zip "test_suite.tsv/sheet/Data (1959 GE)" | awk -F "\t" 'NR>340 { printf "%s - %s\n", $1, $2 } NR>350 {exit}'
This outputs the first 2 columns from rows 341 to 350, which results in this being displayed:
'King's Lynn - 'DG Bullard 'Kingston upon Thames - 'JA Boyd-Carpenter 'Kinross and West Perthshire - 'WG Leburn 'Kirkcaldy - 'HPH Gourlay 'Knutsford - 'WH Bromley-Davenport 'Lanark - 'JCM Hart 'Lanarkshire North - 'MM Herbison 'Lancaster - 'HJ Berkeley 'Leeds East - 'DW Healey 'Leeds North East - 'K Joseph 'Leeds North West - 'D Kaberry
As you can see it prints out the ' character at the beginning of each field which is undesirable, so we ask awk to cut this out for us using the sub function:
unzip -p test_suite.tsv.zip "test_suite.tsv/sheet/Data (1959 GE)" | awk -F "\t" 'NR>340 { sub(/^\x27/, "", $0); gsub(/\x09\x27/, "\t", $0); printf "%s - %s\n", $1, $2 } NR>350 {exit}'
Which creates this:
King's Lynn - DG Bullard Kingston upon Thames - JA Boyd-Carpenter Kinross and West Perthshire - WG Leburn Kirkcaldy - HPH Gourlay Knutsford - WH Bromley-Davenport Lanark - JCM Hart Lanarkshire North - MM Herbison Lancaster - HJ Berkeley Leeds East - DW Healey Leeds North East - K Joseph Leeds North West - D Kaberry
If you wanted to see the last 10 rows in a file you would simply call the tail command first:
unzip -p test_suite.tsv.zip "test_suite.tsv/sheet/Data (1959 GE)" | tail | awk -F "\t" '{ sub(/^\x27/, "", $1); sub(/^\x27/, "", $2); printf "%s - %s\n", $1, $2 }'
Which creates this:
Worcester - GR Ward Worcestershire South - PG Agnew Workington - TF Peart Worthing - OL Prior-Palmer Wrekin, The - W Yates Wrexham - JI Jones Wycombe - J Hall Yarmouth - A Fell Yeovil - JWW Peyton York - CB Longbottom
5.3.3 zcat
If you have saved some TSV data inside a compressed GZ file you cat use the zcat command to access this data. This example extracts the first and third columns from a file and stores the results in a new compressed output file:
zcat input.tsv.gz | awk -F "\t" '{ printf "%s\t%s\n", $1, $3 }' | gzip > output.tsv.gz
5.3.4 tr
5.3.4.1 Changing Field Separators
Occasionally you may come across a file with an unusual field separator such as : ; or @. Assuming these characters only exist as field separators then its very easy to swap the characters to tabs using tr:
cat input.txt | tr "@" "\t" > output.tsv
For exotic multi-character separators like @;@ you can use awk:
cat input.txt | awk '{ gsub( /@;@/, "\t" ); print }' > output.tsv
You can also change a normal TSV file into a file with rather more eccentric field separators:
cat input.tsv | tr "\t" "@" > output.txt cat input.tsv | awk '{ gsub( /\t/, "@;@" ); print }' > output.txt
5.4 cedutils
As mentioned in section 5.3, a GNU system can manipulate a TSV file in a number of ways. However there are some jobs that are not possible to complete quickly and easily so I have assembled a group of command line utilities that can be used by the shell to manipulate spreadsheet files.
These utilities are designed for accessing CSV or TSV files using basic scripts. If you need access to data inside a CED/ZIP file then you should unzip it to a temp directory and access the sheets there. Any complex or CPU intensive tasks should be left to custom built C++ programs.
These utilities are accessed via the binary program cedutils. You access its functions by calling it via a symlink to execute a single command. Because all commands really exist in a central program, it is also possible to do multiple different operations within the same line by using the argument "-com" followed by the new command. The advantage of this is in terms of speed and fewer file operations. When you call 2 commands separately in two script lines, you action a load and a save operation twice. However, with multiple commands in a single line you only have a single load and a single save operation which saves CPU cycles, and involves fewer file read/write operations. Here is an example which flips a sheet and inserts 10 empty rows:
cedflip input.tsv -com insert -total 10 "" -o output.tsv
Compare this to the two operations done separately:
cedflip input.tsv -o output.tsv cedinsert -total 10 output.tsv -o output.tsv
More information can be obtained by consulting the man page:
man cedutils