Friday, March 18, 2022

Removing Duplicate Lines with VIM

Recently I have become interested in GPS location coordinates. My TapToCount-3W iOS app has a feature that records location information for each tap.

One scenario has puzzled me for a long time: if I make several taps at the same location, the recorded GPS locations are spread over a range, from a few meters to several tens of meters apart (8 meters or 30 meters, for example).

After further investigation, I found that the number of decimal places in a GPS coordinate is related to location accuracy: the more decimal places a coordinate has, the more precisely it pins down a location.

VIM is a powerful tool. The following is a real case from my investigation and analysis of GPS coordinates using VIM.

Export GPS Information

First, I have to get the GPS location coordinates out of my app. I use the Export feature to share the results as a CSV file to my Mac.


I use Numbers to open the CSV file. The tap information covers my sleep times from Dec 2018 to Mar 2022, 7484 lines of taps in total. In the GPS coordinates, I see up to 14 digits after the decimal point.

For example, here is one set of latitude, longitude, and altitude values:
53.55404449631790,	-113.2990255394140,	692.3739891052250
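As a quick check on the number of digits, here is a minimal Python sketch that counts the digits after the decimal point in each field of the example line above. The helper name decimal_digits is my own, made up for illustration; it is not part of the app or of Numbers.

```python
# The example line from the exported CSV (latitude, longitude, altitude).
line = "53.55404449631790,\t-113.2990255394140,\t692.3739891052250"

def decimal_digits(field):
    """Count the digits after the decimal point in one text field."""
    _, _, fraction = field.strip().partition(".")
    return len(fraction)

# Split on commas and strip the tab that follows each comma.
fields = [f.strip() for f in line.split(",")]
digits = [decimal_digits(f) for f in fields]

for name, value, count in zip(("latitude", "longitude", "altitude"), fields, digits):
    print(f"{name}: {value} has {count} decimal digits")
```

Counting on the text rather than on a parsed float keeps trailing zeros, which a float conversion would drop.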

Find Duplicate GPS Information

7484 lines is quite a lot of information. I need to find out whether any of it is duplicated.

Use the following command to find duplicate coordinate information:
/\(T-\d\)\(.*\)\n\(.*\2\n\)\+

Match pattern explanation: the first group begins with "T-" and a digit; the second group is any characters (the coordinates) up to the end of the line; then comes a newline, followed by one or more lines of any characters ending with the matched second group (the same coordinates).
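For readers who do not use VIM, the same idea can be sketched with Python's re module. This is only an illustration under assumptions: the sample rows are invented (a "T-" label plus two coordinates), since the real CSV layout is not shown here, and the function name find_duplicate_runs is hypothetical.

```python
import re

# Hypothetical sample rows; the real export may have other columns.
sample = (
    "T-1,53.55404,-113.29903\n"
    "T-2,53.55404,-113.29903\n"
    "T-3,53.55404,-113.29903\n"
    "T-4,53.55410,-113.29911\n"
    "T-5,53.55420,-113.29920\n"
    "T-6,53.55420,-113.29920\n"
)

# Same structure as the VIM pattern: group 1 is "T-" plus a digit,
# group 2 is the rest of the line, then one or more following lines
# that end with the text captured in group 2.
DUPLICATE_RUN = re.compile(r"(T-\d)(.*)\n(?:.*\2\n)+")

def find_duplicate_runs(text):
    """Return each run of consecutive lines sharing the same coordinates."""
    return [m.group(0).splitlines() for m in DUPLICATE_RUN.finditer(text)]

runs = find_duplicate_runs(sample)
for run in runs:
    print(len(run), "lines share", run[0].split(",", 1)[1])
```

Like the VIM search, this only catches duplicates on consecutive lines; identical coordinates separated by a different line are reported as separate runs or not at all.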

In VIM, the search result is as follows:


Remove Duplicate Coordinate Lines

With the above search result, I can then proceed to remove the duplicate coordinate lines.

Use the following command to remove duplicate coordinate information:
:%s/\(T-\d\)\(.*\)\(\n.*\2\)\+/\1\2/

Replacement command explanation: replace the matched pattern (a run of duplicate GPS coordinate lines) with group 1 and group 2 as defined in the match pattern, so that only the first line of each run is kept.
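The replacement can also be sketched in Python with re.sub. Again this is an illustration, not the method used in the post: the sample rows are invented and the function name collapse_duplicate_runs is hypothetical.

```python
import re

# Hypothetical sample rows (label, latitude, longitude); not real app data.
sample = "\n".join([
    "T-1,53.55404,-113.29903",
    "T-2,53.55404,-113.29903",
    "T-3,53.55404,-113.29903",
    "T-4,53.55410,-113.29911",
    "T-5,53.55420,-113.29920",
    "T-6,53.55420,-113.29920",
])

def collapse_duplicate_runs(text):
    """Keep only the first line of each run of consecutive duplicates.

    Mirrors the VIM substitute: the whole run is matched, then replaced
    with group 1 (the label) and group 2 (the coordinates) of its first line.
    """
    return re.sub(r"(T-\d)(.*)(?:\n.*\2)+", r"\1\2", text)

deduped = collapse_duplicate_runs(sample)
print(deduped)
```

With the six sample rows, three lines remain: T-1, T-4, and T-5. Counting the lines before and after gives the number of duplicates removed.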

Analysis Results

With VIM search and replace as a convenient and powerful tool, I can do my research and analysis on the GPS coordinates collected in my TapToCount-3W app.

Using Numbers, I can produce coordinate values with different numbers of decimal places and find out how many coordinates are duplicates.

7,485 Taps with GPS coordinates
Decimal Places   Degrees                Duplicates   Unique
0                0                      5246         2239
1                0.1                    5245         2240
2                0.01                   5245         2240
3                0.001                  5222         2263
4                0.000 1                4711         2774
5                0.000 01               4638         2847
14               0.000 000 000 000 00   577          6908
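The counting behind a table like this can be sketched in a few lines of Python. This is my own illustration with made-up coordinates, and it rests on two assumptions: values are rounded (not truncated) to the given number of places, and "Duplicates" means the total number of taps minus the number of distinct rounded coordinates. The function name count_duplicates is hypothetical.

```python
def count_duplicates(coords, places):
    """Return (duplicates, unique) after rounding to `places` decimals.

    Assumption: duplicates = total taps - distinct rounded coordinates.
    """
    rounded = {(round(lat, places), round(lon, places)) for lat, lon in coords}
    unique = len(rounded)
    return len(coords) - unique, unique

# Hypothetical (latitude, longitude) pairs, not taken from the app.
coords = [
    (53.554044, -113.299026),
    (53.554049, -113.299021),
    (53.554100, -113.299100),
    (53.560000, -113.300000),
]

for places in (0, 3, 5):
    duplicates, unique = count_duplicates(coords, places)
    print(places, duplicates, unique)
```

For these four sample points, all of them collapse into one coordinate at 0 places, two distinct coordinates remain at 3 places, and all four stay distinct at 5 places.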

Based on the above analysis, I think the best number of decimal places is 5, 4, or 3, corresponding to an accuracy of about +/-0.555 m, +/-5.55 m, or +/-55.5 m. Removing the duplicates found at these settings would reduce the coordinates by up to 68% to 70%.

Using VIM, I did further analysis. I exported a group of taps from Jan 1, 2022 to now (Mar 19, 2022) that I know for certain were all made at the same location (454 taps). The following are two results: one taking altitude into consideration and one without altitude.
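The accuracy figures can be reproduced with a little arithmetic, assuming that one degree of latitude is roughly 111 km (111,000 m). That constant is an approximation I am supplying here, and the function name half_step_meters is made up for this sketch. A degree of longitude is shorter away from the equator, so the east-west figure would be smaller at my latitude.

```python
# Assumption: one degree of latitude is roughly 111,000 meters.
METERS_PER_DEGREE = 111_000

def half_step_meters(places):
    """Half the size of one step at `places` decimal places, in meters.

    A value rounded to `places` decimals can be off by at most half a step,
    which is where the +/- figure comes from.
    """
    step_degrees = 10 ** -places
    return METERS_PER_DEGREE * step_degrees / 2

for places in (3, 4, 5):
    print(f"{places} places: +/-{half_step_meters(places):.3f} m")
```

Under this assumption, 5 places gives about +/-0.555 m, 4 places about +/-5.55 m, and 3 places about +/-55.5 m.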

454 Taps with GPS coordinates and altitude
Decimal Places   Degrees                Duplicates   Unique
0                0                      444          10
1                0.1                    444          10
2                0.01                   444          10
3                0.001                  442          12
4                0.000 1                255          199
5                0.000 01               434          20
14               0.000 000 000 000 00   444          10

454 Taps with GPS coordinates and NO altitude
Decimal Places   Degrees                Duplicates   Unique
0                0                      453          1
1                0.1                    453          1
2                0.01                   453          1
3                0.001                  452          2
4                0.000 1                444          10
5                0.000 01               33           421
14               0.000 000 000 000 00   8            446

These research results help me better understand GPS coordinate information and choose better, more effective strategies for updating my app's features and interfaces for my users.
