Archives

Month: June 2016

  • Many-to-Many Relationships in Relational Data Models


    Today I learned about Many-to-Many relationships in relational data models

    • A many-to-many relationship is a type of cardinality between two entities A and B in which one instance in A may be related to many instances in B, and vice versa.
    • For example, a recipe may have many ingredients and a specific ingredient may be used in many recipes.
    • In SQL, this relationship is handled by an associative table. The primary key for this type of table is composed of two different columns, both of which reference columns in other tables that you are associating together.
    • It is a good convention to name these associative tables by how they reference each other: Table1_Table2
    • A SELECT statement on an associative table usually involves JOINing the main table with the associative table.
    • If you don’t want to run these JOINs every time, create a view first. A view is a virtual table based on the result-set of an SQL statement. You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from one single table.

    Example SQL for creating an associative table:

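    CREATE TABLE recipes_ingredients (
        RecipeID INT(11),
        IngredientID INT(11),
        PRIMARY KEY (RecipeID, IngredientID),
        -- MySQL parses but ignores inline column-level REFERENCES, so declare the foreign keys explicitly
        FOREIGN KEY (RecipeID) REFERENCES Recipes (RecipeID),
        FOREIGN KEY (IngredientID) REFERENCES Ingredients (IngredientID)
    );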
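
    To see the associative table, the JOINs, and a view working together, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite rather than MySQL (hence INTEGER instead of INT(11)), and the sample rows and the view name recipe_ingredient_names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Recipes (RecipeID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Ingredients (IngredientID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE recipes_ingredients (
    RecipeID INTEGER REFERENCES Recipes (RecipeID),
    IngredientID INTEGER REFERENCES Ingredients (IngredientID),
    PRIMARY KEY (RecipeID, IngredientID)
);

-- Hypothetical sample data
INSERT INTO Recipes VALUES (1, 'Negroni'), (2, 'Martini');
INSERT INTO Ingredients VALUES (1, 'Gin'), (2, 'Campari'), (3, 'Vermouth');
INSERT INTO recipes_ingredients VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);

-- The view hides both JOINs behind what looks like a single table.
CREATE VIEW recipe_ingredient_names AS
SELECT r.Name AS Recipe, i.Name AS Ingredient
FROM recipes_ingredients ri
JOIN Recipes r ON r.RecipeID = ri.RecipeID
JOIN Ingredients i ON i.IngredientID = ri.IngredientID;
""")

# Which recipes use gin? No JOIN needed in the query itself.
rows = conn.execute(
    "SELECT Recipe FROM recipe_ingredient_names WHERE Ingredient = ? ORDER BY Recipe",
    ("Gin",),
).fetchall()
print(rows)
```

    Querying the view reads like querying one table, which is the point of creating it.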
  • Trifacta Wrangler


    Trifacta Wrangler

    Trifacta Wrangler is a free program (currently in beta) that helps you clean up data sets and gives you a first cut at basic analysis. It is great for quickly turning messy data into structured, manageable formats.

    In the past few days I’ve used it to analyze huge log files and turn messy JSON into structured CSVs that I could import into SQL.

    Quick tips:

    • splitrows always has to come first. The program usually tries to split by \n (new line) first, but that doesn’t always work for JSON. Try splitting by something like },{ instead. If you want to keep the curly brackets for an unnest, do a quick find and replace (swap },{ for }|||{ ) and then split by |||.
    • unnest is very powerful for splitting JSON values out into separate columns titled by their keys.
    • flatten works better than unnest in cases where the JSON does not have keys. It creates new rows and repeats other values in adjacent columns to keep the relation. This works well if you have an ID column and are going to eventually stuff things into a relational database.
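
    The delimiter trick and the flatten idea are easier to see in miniature. This is a rough Python analogy, not how Trifacta Wrangler works internally; the sample JSON and the field names id and tags are invented for the sketch.

```python
import json

# Hypothetical messy input: objects glued together with no newlines.
raw = '{"id": 1, "tags": ["a", "b"]},{"id": 2, "tags": ["c"]}'

# Swap "},{" for "}|||{" so each piece keeps its curly brackets, then split on "|||".
# Caveat: this naive replace would also fire on "},{" inside nested values.
pieces = raw.replace("},{", "}|||{").split("|||")
records = [json.loads(piece) for piece in pieces]

# Flatten-style step: one row per list element, repeating the id to keep the relation.
rows = [{"id": rec["id"], "tag": tag} for rec in records for tag in rec["tags"]]
print(rows)
```

    Each output row carries its id, so the result can go straight into a relational table.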

    Here is documentation for the Transforms.

  • Linux Webserver Cheat Sheet


    There are obviously many more commands than this, so I’ll add more as I encounter and use them.

    Last updated on: July 6, 2016.

    Connecting

    • Connecting: ssh username@serveraddress
    • Switching to root once you get in, if needed: su root

    Where to find things

    • Logs: /var/log/apache2
    • Site roots: /var/www/ if a single site, /var/www/vhosts if multiple sites on virtual host infrastructure
    • Config files: /etc/apache2/sites-available and /etc/apache2/sites-enabled

    Changing directories, creating files and directories, viewing text files

    • Changing directory: cd ~/path/to/folder
    • Going up one directory: cd ..
    • Creating files: touch filename.jpg
    • Creating directories: mkdir directory_name
    • Viewing text files: less filename.txt or cat filename.txt
    • Closing out of less: q

    wget

    Wget retrieves content from web servers.

    Syntax:

    $ wget [option] URL

    Options:

    • -O – The output file that the download should be written to.
    • -q – Quiet mode. Doesn’t show the download status and other output.

    Example – Getting a file from the web:

    $ wget -O /path/to/filename.json http://example.com/URL/to/filename.json

    Reading the manual for commands

    $ man <command>

    Examples:

    • man cron
    • man date

    Crontab

    Cron is a time-based job scheduler in Linux and Unix. Here is my full TIL on cron.

    Open your personal crontab (cron table) file for editing:

    $ crontab -e

    View your personal crontab:

    $ crontab -l

    Syntax:

    min     hour    day of month    month    day of week                 command
    *       *       *               *        *                           command
    0–59    0–23    1–31            1–12     0–6 (Sunday to Saturday)    shell command you want to run at that time

    Example: Download a JSON file from Quandl and overwrite GOLD.json with it Monday through Friday at 5pm server time.

    0 17 * * 1-5 wget -O "/path/to/quandl_data/GOLD.json" "https://www.quandl.com/api/v3/datasets/LBMA/GOLD.json"
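
    To check how the five time fields are read, here is a small Python sketch that tests whether a schedule matches a given moment. It is a simplification, not cron's actual implementation: it only understands *, single numbers, ranges like 1-5, and comma lists (no */5 steps or month and day names), and the function names expand and matches are my own.

```python
from datetime import datetime

def expand(field, lo, hi):
    """Expand one cron field ('*', '5', '1-5', '1,3,5') into a set of ints."""
    if field == "*":
        return set(range(lo, hi + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return values

def matches(schedule, when):
    """Return True if the five-field schedule would fire at the datetime `when`."""
    minute, hour, dom, month, dow = schedule.split()
    cron_dow = when.isoweekday() % 7  # cron counts 0 = Sunday ... 6 = Saturday
    dom_ok = when.day in expand(dom, 1, 31)
    dow_ok = cron_dow in expand(dow, 0, 6)
    # If both day fields are restricted, cron fires when either one matches.
    day_ok = (dom_ok or dow_ok) if dom != "*" and dow != "*" else (dom_ok and dow_ok)
    return (when.minute in expand(minute, 0, 59)
            and when.hour in expand(hour, 0, 23)
            and when.month in expand(month, 1, 12)
            and day_ok)

print(matches("0 17 * * 1-5", datetime(2016, 7, 5, 17, 0)))  # a Tuesday at 5pm
print(matches("0 17 * * 1-5", datetime(2016, 7, 9, 17, 0)))  # a Saturday at 5pm
```

    The first call prints True and the second False, since 1-5 covers Monday through Friday only.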

    Date

    Display a date and time:

    • $ date spits out the date and time on the server
    • $ TZ=US/Pacific date spits out the server’s date and time adjusted to the Pacific timezone
    • $ TZ=US/Eastern date -d 'Tue Jul 5 10:43:07 PDT 2016' converts the timestamp in the -d option to the Eastern timezone
    • $ date -d @1467740657 converts a UNIX timestamp into something you can actually read
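
    The same conversions can be sketched in Python with only the standard library. To keep the example free of any timezone database, it assumes fixed offsets for July 2016 (PDT is UTC-7, EDT is UTC-4); for real use across daylight saving changes, a named zone from the zoneinfo module would be the safer choice.

```python
from datetime import datetime, timedelta, timezone

# Fixed daylight-time offsets: an assumption that only holds for summer dates.
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")

# Like `date -d @1467740657`: turn a UNIX timestamp into a readable date (in UTC here).
stamp = datetime.fromtimestamp(1467740657, tz=timezone.utc)
print(stamp.isoformat())  # 2016-07-05T17:44:17+00:00

# Like `TZ=US/Eastern date -d 'Tue Jul 5 10:43:07 PDT 2016'`: Pacific to Eastern.
pacific = datetime(2016, 7, 5, 10, 43, 7, tzinfo=PDT)
eastern = pacific.astimezone(EDT)
print(eastern.isoformat())  # 2016-07-05T13:43:07-04:00
```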

    Eric Davis has a Simple CLI date calculator writeup on his site.

  • Cleaning up your Mac with Hazel


    Today I learned how to clean up my Mac with Hazel

    Hazel is a preference pane-based application that helps you automate organization on your Mac.

    Within a few hours of using Hazel, I was able to clean out my 1000+ file Downloads folder, tame my unruly Desktop, get rid of all the trash that accumulated in my home folder, and organize my stashes of client files. I also set up rules for the future that will keep these places neat and orderly.

    The next step is to turn my gaze on my photo library to create one master repository.

    Tools and guides

    The best tools/guides on Hazel right now are:

  • Using Word Frequency Charts for Better Word Clouds


    Word clouds

    Data scientists notoriously hate word clouds. Aside from showing what the top 2-3 words are (because they are the biggest), a word cloud makes it difficult to see how much one word is used relative to another. Unfortunately, clients and non-data people love word clouds and sometimes insist on them. What is a self-respecting data nerd to do?

    Pair it with a word frequency chart!

    The easiest way to do this is with Counter from Python’s collections module (here, words is a list of strings):

    from collections import Counter
    Counter(words).most_common()

    Then you can use your favorite charting tool to make a bar chart of the results. I prefer D3.js.
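
    Before reaching for a charting tool, a quick text-only bar chart is enough to sanity-check the counts. This is a minimal sketch: the sample sentence is invented, and the regex tokenizer is a deliberately crude assumption (lowercase letters and apostrophes only).

```python
from collections import Counter
import re

# Hypothetical sample text; swap in your own corpus.
text = "the quick brown fox jumps over the lazy dog and the fox naps"

# Crude tokenizer: lowercase everything, keep runs of letters and apostrophes.
words = re.findall(r"[a-z']+", text.lower())

# most_common(n) returns (word, count) pairs sorted from most to least frequent.
top = Counter(words).most_common(2)

# One row per word: the bar length is the count.
for word, count in top:
    print(f"{word:<6} {'#' * count} {count}")
```

    The (word, count) pairs in top are also the shape most charting libraries want as input.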

    Results

    Word Frequency Chart

    Word Cloud

    If you see both together, you get a better understanding of the words being used. Of course, a single word doesn’t always capture sentiment. Single words can be helpful in smaller data sets, but common phrases are often more helpful in larger ones. For common phrases, use n-gram analysis.
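
    N-gram counting needs very little code on top of Counter. A minimal sketch, assuming the text is already split into a list of words; the helper name ngrams and the sample sentence are my own.

```python
from collections import Counter

def ngrams(words, n):
    """Return every run of n consecutive words, joined into a phrase."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical sample; in practice this would be your tokenized corpus.
words = "i love word clouds but i love bar charts more".split()

# Count two-word phrases (bigrams) the same way single words were counted.
top_phrase = Counter(ngrams(words, 2)).most_common(1)
print(top_phrase)
```

    Here "i love" is the only bigram that appears twice, so it comes out on top.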

    For more on visualizing text, check out episode 62 of the Data Stories podcast and the Text Visualization Browser.

  • Isaac Morehouse Podcast Episode 75 – How to Learn Anything, with Chuck Grimmett


  • Ember.js Basics


    Resources

    What it is

    • Ember.js is an open-source JavaScript framework based on the Model–view–viewmodel (MVVM) pattern.
    • Ember is an opinionated framework. This means that most architectural design decisions have been made for you by the developers of the framework. The advantage of this is that anyone who knows Ember can load your code and understand within a few minutes what is going on.

    Core Concepts

    1. The Ember router maps a URL to a route handler.
    2. The route handler renders a template and loads a model that is then available to the template. Templates use Handlebars syntax.
    3. Models save data in a “persistent state”, which is fancy language for putting it in a database or data store of some kind.
    4. Components control how the UI behaves. They have two parts: a Handlebars template and a JavaScript source file that defines the behavior.

    Installing and running a project

    Install:

    $ npm install -g ember-cli@2.6

    Creating a new app (follow the command with the name of your app):

    $ ember new 

    Starting the development server. You must be in the project folder (cd to it). It will serve on http://localhost:4200/

    $ ember server

    or ember s for short

  • Link Posts with Jekyll


    Today I learned:

    How to make link posts (or external post links) with Jekyll

    I’m an avid reader of Marco.org and Daring Fireball. They both have these nifty posts that link directly to external pages. Now that I have two blogs, a podcast, and occasionally have work published elsewhere, I want to keep a record of these things on my cagrimmett.com blog. Link posts are an excellent way to do that.

    Jekyll doesn’t have this capability out of the box, but with a little Liquid magic you can make it happen. I followed Christian-Frederik Voigt’s method. This takes advantage of the ability to create items in each post’s YAML front-matter and use them in template files:

    First, create a variable called link: in the YAML front-matter of the post you want to link elsewhere. Fill it in with the URL you want the post to link to:

    ---
    layout: post
    title: Snack Time Episode 3 - Negroni Week
    date: 2016-06-09
    feature-img: "/img/snacktime-pattern.png"
    excerpt: blah blah blah excerpt here
    link: http://snacktime.fm/episodes/2016/6/9/episode-3-negroni-week
    ---

    Then, anywhere in the template files that generate post headlines, you’ll need to write a conditional to check if the new link: variable is present. If it is, you’ll want to write the headline’s link there (post.link) instead of the post.url. I also added an arrow inside another conditional to specify that this is an outgoing link.

    You’ll want to use this anywhere your posts are listed. For me that is in index.html and 404.html.
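
    The conditional itself is simple enough to sketch outside of Liquid. The Python below only illustrates the logic and is not the template code used on this site; the function name headline and the dictionary shape are hypothetical stand-ins for Jekyll’s post object.

```python
def headline(post):
    """Return (href, label) for a post headline.

    `post` is a plain dict standing in for Jekyll's post object (an assumption).
    """
    link = post.get("link")
    if link:
        # Link post: point at the external URL and mark it with an arrow.
        return link, post["title"] + " →"
    # Regular post: point at its own page on the site.
    return post["url"], post["title"]

link_post = {
    "title": "Snack Time Episode 3 - Negroni Week",
    "url": "/hypothetical/local/path/",  # placeholder, not a real path
    "link": "http://snacktime.fm/episodes/2016/6/9/episode-3-negroni-week",
}
print(headline(link_post))
```

    In a Liquid template the same check would be an if/else block around the headline’s anchor tag.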

    See this in action on my homepage.

    This method works great if you want direct links to other places and don’t need a permalink page on your own site that you can reference. I don’t care about permalink pages for link posts, so I didn’t build them for my site. If you want them, you’ll need to refer to Christian-Frederik Voigt’s method. The short story is that you’ll also need to modify the post and page templates.

  • Snack Time Episode 3 – Negroni Week