How to clean up text pasted from Google Docs with Atom and Regular Expressions

Have you ever pasted text from Google Docs onto your blog (WordPress or otherwise) and had to fix wacky formatting? Here is how to quickly strip out all those extra HTML tags using regular expressions with Atom.io, a free text editor.
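Here is a minimal sketch of the idea using Python's re module. It assumes the extra markup is span wrappers like <span style="font-weight: 400;">, and the sample string is hypothetical, not taken from a real paste. The pattern is also valid JavaScript regex syntax, so it should work in Atom's find-and-replace panel with regex mode turned on.

```python
import re

# Hypothetical example of the markup a Google Docs paste can leave behind
pasted = '<p><span style="font-weight: 400;">Hello world</span></p>'

# Remove opening <span ...> tags (with any attributes) and closing </span> tags,
# keeping the text and the surrounding <p> structure intact.
cleaned = re.sub(r"</?span[^>]*>", "", pasted)
print(cleaned)  # <p>Hello world</p>
```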


Regex Lookahead and Lookbehind

Today I learned:


  • (?=foo) Lookahead: asserts that what immediately follows the current position in the string is foo
  • (?<=foo) Lookbehind: asserts that what immediately precedes the current position in the string is foo
  • (?!foo) Negative lookahead: asserts that what immediately follows the current position in the string is not foo
  • (?<!foo) Negative lookbehind: asserts that what immediately precedes the current position in the string is not foo

Example

Expression: Hell(?=o) - Lookahead

  • Hellish fails
  • Hello passes
  • MelloHello passes
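You can try all four assertions with Python's re module. This is a small sketch. The Hell/Hello strings come from the example above, and the others are made up for illustration. One caveat: Python's re only supports lookbehind patterns of fixed width.

```python
import re

# Lookahead: match "Hell" only when the next character is "o" (the "o" is not consumed)
lookahead = {word: bool(re.search(r"Hell(?=o)", word))
             for word in ["Hellish", "Hello", "MelloHello"]}
print(lookahead)  # {'Hellish': False, 'Hello': True, 'MelloHello': True}

text = "cost $42 or 17"
# Lookbehind: digits immediately preceded by a dollar sign
print(re.findall(r"(?<=\$)\d+", text))  # ['42']
# Negative lookbehind: whole numbers NOT preceded by a dollar sign
print(re.findall(r"(?<!\$)\b\d+", text))  # ['17']
# Negative lookahead: words starting with "Hell" where the next character is not "o"
print(re.findall(r"\bHell(?!o)\w*", "Hello Hellish Hell"))  # ['Hellish', 'Hell']
```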

Use

The basic example above isn’t very useful on its own, but lookarounds come in handy for password rule validation and for advanced find-and-replace operations.
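For password validation, a common approach is to stack several lookaheads in a single pattern. Each (?=...) is tested from the same starting position without consuming characters, so the rules stay independent. The rules below are a hypothetical policy chosen for illustration: a lowercase letter, an uppercase letter, a digit, no whitespace, and at least 8 characters.

```python
import re

# Hypothetical password policy; every lookahead is checked from the start of the string.
PASSWORD_RULES = re.compile(
    r"(?=.*[a-z])"  # at least one lowercase letter
    r"(?=.*[A-Z])"  # at least one uppercase letter
    r"(?=.*\d)"     # at least one digit
    r"(?!.*\s)"     # no whitespace anywhere
    r".{8,}"        # at least 8 characters in total
)

def is_valid_password(password):
    # fullmatch anchors the pattern to the entire string
    return PASSWORD_RULES.fullmatch(password) is not None

for candidate in ["Passw0rd", "password", "Pass w0rd", "Pa55"]:
    print(candidate, is_valid_password(candidate))
```

The first candidate passes. The other three fail because of no uppercase letter, a space, and too few characters, respectively.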

Web crawlers, Regex for Markdown URLs, and Removing your site from Google search results

Today I learned:

Web Crawlers

Need a web crawler but don’t want to write one?


Getting pages removed from Google cache

Have an old site that you need to keep live but don’t want the results to show on Google searches? Here are a few things you need to do:

  1. Update your robots.txt or password-protect your site to keep search engines from indexing it.
  2. Log in to Google Webmaster Tools and submit the site to the URL Removal tool.
  3. Finish whatever you still need the site for as soon as possible, then take it offline.
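For step 1, a robots.txt with User-agent: * and Disallow: / asks every well-behaved crawler to stay out of the whole site. As a sketch, you can check the rules with Python's built-in urllib.robotparser before uploading the file. The URL below is hypothetical. Keep in mind that robots.txt controls crawling, and pages Google already knows about can linger in results, which is what the removal tool in step 2 is for.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that disallows every path for every user agent
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules from a list of lines

# Hypothetical page on the old site
allowed = parser.can_fetch("Googlebot", "https://old-site.example/any/page")
print(allowed)  # False
```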

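Regex for Markdown URLs

This matches the links above (each URL sits at the end of its line and ends in m, o, 7, b, or /) and wraps each one in Markdown link syntax:

  • Search: ([\w\S]*[mo7b/])$
  • Replace: [\1](\1)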
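Here is a sketch of the same search and replace in Python, with the backslashes written out (\w\S in the search, \1 in the replacement). The sample text is hypothetical. Be aware that the pattern only checks the last character, so any line ending in a word like "from" gets wrapped too.

```python
import re

# re.MULTILINE makes $ match at the end of every line, not only at the end of the text
URL_AT_LINE_END = re.compile(r"([\w\S]*[mo7b/])$", re.MULTILINE)

def linkify(text):
    # \1 inserts the captured URL twice: once as the link text, once as the target
    return URL_AT_LINE_END.sub(r"[\1](\1)", text)

sample = "Atom: https://atom.io\nDocs: https://example.com/"
print(linkify(sample))
# Atom: [https://atom.io](https://atom.io)
# Docs: [https://example.com/](https://example.com/)
```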

Amending Commits, Matplotlib, and More Python

I’ve been on vacation and spent the last two days catching up instead of learning much, so I’ve been lazy about putting up TIL posts. That is over. (I did, however, push some updates to my Apple Photos Analysis project.) Here is a small collection of things I learned in the last week.


Amending commits

Say you forgot to add a file to your last commit or you made a typo in your commit message. You can amend it!

Make the necessary changes and stage them (for example, with git add), then run:

git commit --amend -m "Commit message here"

If you’ve already pushed the original commit to a remote repository, you’ll need to force the push, because the remote branch no longer matches your rewritten history. If branch protection is turned on, you’ll need to make a new commit instead. Make sure you aren’t overwriting anything important!

git push origin master --force

Here are the docs.


Adding data labels to the top of bar charts in Matplotlib

Matplotlib is a great plotting library for Python.

import matplotlib.pyplot as plt

def autolabel(rects):
    # Attach a text label with each bar's height just above the bar
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x() + rect.get_width() / 2., 5 + height,
                 '%d' % int(height),
                 ha='center', va='bottom')

# To use (xs, counted_hours, and color come from your own data):
rect = plt.bar(xs, counted_hours, color=color)
autolabel(rect)
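Here is a self-contained version of the same idea that runs as-is. It passes the Axes in explicitly instead of relying on plt's current figure, and the hours and counts are made-up sample data standing in for xs and counted_hours. It uses the Agg backend so no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no window required
import matplotlib.pyplot as plt

def autolabel(ax, bars, offset=5):
    """Write each bar's height just above it; offset is in data units."""
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width() / 2, height + offset,
                "%d" % int(height), ha="center", va="bottom")

# Hypothetical sample data
hours = ["8 AM", "9 AM", "10 AM"]
counts = [12, 30, 21]

fig, ax = plt.subplots()
bars = ax.bar(hours, counts, color="steelblue")
autolabel(ax, bars)
```

On Matplotlib 3.4 or newer, the built-in ax.bar_label(bars, padding=3) does the same job without a helper. Its padding is measured in points rather than data units.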

Saving images in Matplotlib

plt.savefig('directory/filename.png')
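Here is a slightly fuller sketch. savefig picks the image format from the file extension, and it won't create a missing folder, so the example creates the folder first. The folder and file names are hypothetical.

```python
import os

import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display needed
import matplotlib.pyplot as plt

out_dir = "charts"  # hypothetical output folder
os.makedirs(out_dir, exist_ok=True)  # savefig does not create missing directories

plt.plot([1, 2, 3], [2, 4, 1])
# dpi sets the resolution; bbox_inches="tight" trims extra whitespace around the plot
plt.savefig(os.path.join(out_dir, "line.png"), dpi=150, bbox_inches="tight")
plt.close()  # release the figure once it has been written
```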

Counting items that match a regex pattern

import re

def hour_finder(regex, lines):
    # Count the lines that match regex at their start
    time_counter = 0
    for line in lines:
        if re.match(regex, line):
            time_counter += 1
    return time_counter

# To use (counts entries in the 8 PM hour):
hour_finder(r'^8:[0-9]{2,}:[0-9]{2,}\sPM', time_csv)
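A more compact way to write the same count is to compile the pattern once and sum over a generator. This is a sketch. The function name count_matches and the sample rows are hypothetical, and the rows assume an "H:MM:SS AM/PM" layout. The \s before PM matches the space.

```python
import re

def count_matches(pattern, lines):
    """Count lines that match pattern at their start (re.match semantics)."""
    regex = re.compile(pattern)  # compile once, reuse for every line
    return sum(1 for line in lines if regex.match(line))

# Hypothetical sample rows
time_csv = ["8:15:02 PM", "8:47:30 PM", "9:01:12 PM", "8:05:59 AM"]
print(count_matches(r"^8:[0-9]{2,}:[0-9]{2,}\sPM", time_csv))  # 2
```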

Splitting!

Split each string on a space (' ') and keep the item after the split (index [1], because counting starts at 0):

second_items = [row.split(' ')[1] for row in time_csv]  # avoid naming it list, which shadows the built-in
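Here is a quick sketch of how split(' ') behaves, using hypothetical rows of "date time". One gotcha: split(' ') treats every single space as a separator, so doubled spaces produce empty strings. Calling split() with no argument splits on runs of whitespace instead.

```python
# Hypothetical rows: a date and a time separated by one space
time_csv = ["2018-07-02 8:15:02", "2018-07-02 9:01:12"]
second_items = [row.split(' ')[1] for row in time_csv]
print(second_items)  # ['8:15:02', '9:01:12']

print("a  b".split(' '))  # ['a', '', 'b']  (empty string between the two spaces)
print("a  b".split())     # ['a', 'b']      (runs of whitespace act as one separator)
```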