TIL that you can’t change the CSV delimiter or enclosure characters on Excel for Mac or Apple Numbers. You have to use Open Office, and even there it is tricky. You need to go to File > Save As, select Text CSV, then make sure you check the “Edit Filter Settings” checkbox:
Then you get the field option AFTER you hit Save, only if the Edit Filter settings box is checked:
I’m using the ImportWP Pro plugin to quickly create pages in the Praxis curriculum portal. One of the custom fields I need to set is for the WPComplete course system we use. It takes JSON for which course the page belongs to and which page to redirect the to once the current page is marked complete. Here is an example:
I needed to build this JSON in Excel with the rest of the page info. The pages I’m setting the redirect title and URL for are being created in the spreadsheet, too. So I wanted to build this JSON cell from the contents of other cells. The commas and quotes were a problem for my typical way of building text from other cells: =B5&" "&C5
Here is the new solution I found: Break the JSON down into constituent parts, put each in its own cell, then build the JSON with the concatenate() function.
I put 1 in A23, 3 in A24, and 5 in A25. Then pulled 2 and 4 from various cells in columns A and B. Then my JSON cells used this formula: =CONCATENATE(A23,A4,A24,B4,A25)
Here is a problem I was faced with at work last week: Individuals who had two different kinds of relationships with an organization had two different contact records stored in two different files in a legacy database. The client wants to collapse those contact records down into one if possible, so we had to do a comparison between then.
Difficulties:
Names were concatenated in one file and stored separately in the other.
Some addresses didn’t match (which is what we were checking in the first place), but even if they did, they were often entered differently between the two files (APT vs #, ST vs St., S vs South).
In one file, all postal codes were stored in one column. In the other file, domestic postal codes were stored in one column and international postal codes were stored in another column.
One file had about 30 duplicates for some reason. This means there were multiple entries in the database, but for some reason they all contained exactly the same information.
First things first: I got the information in a CSV file for comparison. I pulled this into Excel. Thankfully there was a unique ID between the two files that matched, so I was able to quickly deduplicate the first file and get the number of entries to match. Then I sorted the two files in the same order.
Here are the formulas I used for comparisons:
Compare cells on two different sheets and put out Yes if they match, No if they do not: =IF(E2='Sheet 2'!C2,"YES","NO")
Compare the first 5 characters of two cells instead of the whole contents. This is how I solved the whether or not addresses “matched” even if they have slight issues (APT vs #, ST vs St., S vs South). This won’t work for every situation, but I spot checked the cells and didn’t find any issues in my data set. =IF(LEFT(E2,5)=LEFT(F2,5),"YES","NO")
Counting the number of NOs in a particular column: =COUNTIF(G2:G150,"NO")
Count the number of YESes or NOs in a particular row. This will give you a good indicator of whether or not a row matches overall: =COUNTIF(H2:AC2,"NO")=COUNTIF(H2:AD2,"YES")
I took a 10-day break from my TILs and now I’m reinvigorated and back on track. In the past 10 days I spent a lot of time doing things I already knew how to do, but I also worked on a small project related to data visualization to see if I could apply some things I learned over the past few months. The result was my recent blog post on Steph Curry’s stats.
A new project
I got a lot of good feedback on Facebook on some more things I can explore (and about how little I understand basketball.) I decided to work on another personal project to apply some of the data science I’ve been learning by reading Joel Grus’s Data Science from Scratch. I decided to find some data (photo dates and locations), extract it (AppleScript), format it (grep FTW), analyze it (forthcoming in Excel and Python), then visualize the insights (Python and/or D3). You can follow my progress on GitHub.
Dealing with files in AppleScript
AppleScript natively deals with file paths with colons: Macintosh HD:usr:local:bin: If you get a file path then want to pass it to the shell, you’ll first need to turn it into a POSIX path:
If you want to create a file in the same folder as a script you are running, you might have to jump back and forth between AppleScript paths and POSIX paths because it is easier to make files by using the command do shell script. Notice the use of quoted form of, which quotes the file path and keeps you clear of pesky errors caused by characters in the path that need to be escaped:
Then if you want to get the path to the file you just created and make an it an alias to reference later:
Conditional counting in Excel
Suppose you have a [spreadsheet full of photos, the date they were taken, and the days of the week they were taken on] (https://github.com/cagrimmett/apple_photos_analysis) and you want to count how many were taken on Monday, Tuesday, etc. Then you’d use the function COUNTIFS(range,argument) to work it out.
Example: Suppose I have the days of the week in column A and cell H2 contained the word I was looking for, Monday. Then my formula would be:
=COUNTIFS(A2:A8500,H2)
I quickly repeated this for the cells that contained the values for the other days of the week and got instant results:
For example: here is how I’d combine columns of hours and minutes and put a colon in-between: =[@Hour] & ":" & [@Minute]
Convert hours.minutes to hh:mm:ss in Excel
Take 275.75 and convert it to 275:45 =CELL/24 converts the hours.minutes to days Then you format the column by Custom > Time > 37:30:55 (hh:mm:ss)
SQL Dates, Concatenation, and Grouping
I learned some useful things today in SQL: SUM(), CONCAT_WS() to get CSV output, DATE() to get the date part of a datetime stamp, and GROUP BY to get the sums grouped by another column
If you’ve ever tried to write a sum() function in Excel while you have filters on, you likely didn’t get the result you expected. The sum() function adds up what is in the raw cells. It does not take whether or not they’ve been hidden via a filter into account.
subtotal() is the function you are looking for. Specifically, subtotal(9,range). The 9 refers to sum for hidden (not at a result of filtering, though) and non-hidden rows. See the documentation.subtotal() functions only on the results of filtered data.
=SUBTOTAL(9,G1670:G640501) gave me the sum of the the filtered data from G1670 to G640501. (You can see why filtering would be important with such a large data set!)
Remind yourself why
Sometimes I spend too much time and effort trying to convince myself to do something I don’t want to do. This leads to general angst and despair.
My wife reminded me yesterday that the best way of convincing myself is to reflect on why I’m doing it in the first place.
Whatever your reason why is, as long as it outweighs the cost of not doing it, you’ll get over it and find a way. I keep an index card with some why bullet points on it nearby to keep things in perspective.