Getting python.exe to run from any directory on my PC so I could use D3’s external data file load function

The only way I could get python (python.exe) to run from any directory at the command line was to set the SYSTEM variable PATH, *not* the USER variable path. Arghhh. It took an hour of searching the Oracle (i.e., Google) to finally discover this.
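To see what the command line is actually working with, you can ask Python to print the PATH entries and report where "python" resolves. This is only a minimal sketch: the helper name path_entries is my own, and it has to be run from a prompt where Python already starts. Open a fresh command prompt after changing the variable, since running windows keep the old value.

```python
import os
import shutil


def path_entries(env=None):
    """Return the directories listed in PATH, in search order."""
    env = os.environ if env is None else env
    raw = env.get("PATH", "")
    # os.pathsep is ';' on Windows and ':' on Linux/macOS
    return [entry for entry in raw.split(os.pathsep) if entry]


# shutil.which searches PATH much as the command line does and returns
# the full path of the first match, or None if nothing is found.
print("python resolves to:", shutil.which("python"))
for entry in path_entries():
    print(entry)
```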

Where I was headed: I needed to navigate to a local directory on the command line in order to start a local web server for D3 … http://localhost:8000/whatever.html. I started the server with 'python -m SimpleHTTPServer' (that is the Python 2 module name; in Python 3 the equivalent is 'python -m http.server'). Loading a local external file in D3, like:

d3.tsv("data.tsv", function(data) {
  console.log(data[0].x);
});

requires a web server to be running (due to AJAX calls).

Unlike other frameworks / apps, D3 does *not* read files straight from the local machine’s file system; it fetches them over HTTP, so it needs a web server. Who knew? Arghhh (redux).
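If you are on Python 3, the same local server can also be started from a short script instead of the one-line command. The sketch below is just one way to do it: the function name make_server and the default port 8000 are my own choices, and it assumes Python 3.7 or later (for ThreadingHTTPServer and the handler's directory argument).

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build a static-file server for `directory` on localhost:`port`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, so the server is not exposed to the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# To serve the current folder, then browse to
# http://localhost:8000/whatever.html, call:
#     make_server(".", 8000).serve_forever()
```

Stop the server with Ctrl+C when you are done.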

Frustration Peter Bakke

Can’t find the location of the ENRON email files on my Windows PC

On a Windows PC:

I happened to unzip the ENRON maildir files in a random location on my Windows machine, and I was getting the “directory not found” error. (It would have been nice if the ML 6.4 video suggested unzipping the maildir email contents into the same folder as all the other datacamp lessons… perhaps I’m just an idiot for not doing so.)

In any case, you can insert your own absolute directory path per the following.

In the Vectorize_Text.py file I fixed the not found problem:

# Old code
path = os.path.join('..', path[:-1])

Fix the directory not found error by inserting your PC’s absolute path like this (use Windows Explorer to find the maildir directory, click on the address bar, and copy the path):

# New code (the r prefix keeps Python from treating the backslashes as escapes)
path = os.path.join(r'C:\your_PCs_maildir_directory_path', path[:-1])

I hope this suggestion can save a lot of people precious time. :slight_smile: … we have to first assume that you successfully unzipped all of the ENRON email files – which consist of over a gigabyte of data. Cool!
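If you want the script to tell you exactly which path it failed on, a small check before opening each file helps. This is a sketch only: MAILDIR_ROOT and resolve_email_path are names I made up for illustration, and the root is still the placeholder you must replace with your own folder.

```python
import os

# Placeholder: replace with the folder that contains your unzipped maildir.
MAILDIR_ROOT = r"C:\your_PCs_maildir_directory_path"


def resolve_email_path(relative_path, root=MAILDIR_ROOT):
    """Join `root` and `relative_path`, failing early with a clear message."""
    if not os.path.isdir(root):
        raise FileNotFoundError(f"maildir root not found: {root}")
    full_path = os.path.join(root, relative_path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"email file not found: {full_path}")
    return full_path
```

In Vectorize_Text.py this would be called as path = resolve_email_path(path[:-1]).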

Set the system path for Python Jupyter notebooks

In Jupyter, when I was a newbie, I often needed to reference some Python library code located in some weird place on my PC, so I did this at the top of each Jupyter notebook I created:

import sys
sys.path.append(r'C:\users\name\code\my-Python-object-location')

Doing so made the path (temporarily) part of sys.path for as long as that session was active. But when I started a new notebook, I always had to include sys.path.append() again at the top of each new notebook. Drove me nuts.

Here’s the fix:

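Add your Python object path(s) to “PYTHONPATH” in your system environment variables (via the Windows Control Panel). Python reads PYTHONPATH when it builds sys.path; the general Windows “path” variable is only used to find executables, so adding library folders there will not help with imports.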

How to do it:

On Windows 10, type “control panel” in the “Type here to search” box (bottom left of the screen) and open it. Then, in the search box at the upper right of the Control Panel, search for “environment” and click on “Set your environment variables”.

Next, in the Environment Variables section (see image below), check if you already have PYTHONPATH. If yes, select it, click “Edit” and add additional paths as needed. If it’s not there, click “New” and add PYTHONPATH. (Keep it separate from the generalized Windows system ‘path’ variable, which Python does not consult when importing modules.)

Paths in environment variables such as PYTHONPATH need to be separated with a semicolon, “;” … like this: C:\users\name\code\my-library111;C:\users\name\code\my-library222;C:\users\name\code\my-library333

So, click ‘Save’ farther down at the bottom of the Environment Variables box and you are done.

Remove the sys.path.append() code from your notebooks and restart them and you should be good to go. (Just to be safe, adjust one notebook first and check it out to make sure this system path fix is working for you!)
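To check that the fix took, you can run something like the following in the first cell of a restarted notebook. It is a sketch under my own assumptions: the helper names are invented, and Jupyter has to be fully restarted (not just the kernel) so that it inherits the new environment variable.

```python
import os
import sys


def pythonpath_entries(env=None):
    """Return the folders listed in PYTHONPATH (empty list if unset)."""
    env = os.environ if env is None else env
    raw = env.get("PYTHONPATH", "")
    # os.pathsep is ';' on Windows, matching the separator described above
    return [p for p in raw.split(os.pathsep) if p]


def missing_from_sys_path(entries):
    """Return the entries that did not make it onto sys.path."""
    on_path = {os.path.normcase(os.path.abspath(p)) for p in sys.path}
    return [p for p in entries
            if os.path.normcase(os.path.abspath(p)) not in on_path]


entries = pythonpath_entries()
print("PYTHONPATH entries:", entries)
print("Not on sys.path:", missing_from_sys_path(entries))
```

If the second list is empty and the first one shows your folders, the sys.path.append() lines are no longer needed.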

Good luck. The game is afoot!

 

add python path to system Peter Bakke

Mars Spacecraft was Lost in Translation

NASA lost its $125-million Mars Climate Orbiter because spacecraft engineers failed to convert from English to metric measurements when exchanging vital data before the craft was launched, space agency officials said.

A navigation team at the Jet Propulsion Laboratory used the metric system of millimeters and meters in its calculations, while Lockheed Martin Astronautics in Denver, which designed and built the spacecraft, provided crucial acceleration data in the English system of inches, feet, and pounds.

As a result, JPL engineers mistook acceleration readings measured in English units of pound-seconds for a metric measure of force called newton-seconds.

In a sense, the spacecraft was lost in translation.

Mars Climate Orbiter

From 1999:

Source:  http://articles.latimes.com/1999/oct/01/news/mn-17288

Do I Love or Hate R’s knitr?

Depends. I’m sharing my love/hate experience with knitr. When it works, it’s divine. When it doesn’t, it’s diabolical. knitr is finally working well in RStudio on my local PC, but I find that I’m running it every hour or so to make sure I have not introduced something into my .Rmd file that breaks it. I had wrongly assumed that knitr basically takes a “screen capture” or the like of my .Rmd file and outputs the page as HTML, PDF, etc. In fact, knitr executes EVERYTHING in my ever-growing project .Rmd file, and if, say, a variable is undefined or two R code chunks share the same name, knitr will barf and halt execution, resulting in nada, nil, zip, nothing, nichevo. That’s why I run knitr several times a day, and it’s a good thing I do, because it catches stuff. I don’t want to wait until my project is complete only to find that I have to spend hours fixing the errors that knitr so gleefully finds. Onward.


 

Setting an (almost) unique seed for a random generator

In data analysis and other programming endeavors, we frequently have to set a seed for a random number generator to, say, select a sample of observations from a very large dataset.

You might want to select a 10,000-observation sample from a million-observation dataset. That’s a good idea if you want to avoid crashing your PC, and you should set a random seed when you do it.

If you truly want an (almost) unique random seed, try my tried and true method (from my IBM software engineering days) of using date/time … preferably utilizing milliseconds.

Example: a seed of 1471300214792 milliseconds converted to date/time is 16 August 2016 01:30:14.792. Set your seed to today’s date/time in milliseconds. You’ll never see that particular random seed again in this lifetime (perhaps only if 1 million monkeys tapped on a calculator for 100+ million years… or the time it takes for Elon Musk’s Falcon Heavy-launched Tesla to fall into the sun). Think about it.
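Here is how that might look in Python. Treat it as a sketch: the function names are mine, and the million-row dataset is stood in for by a plain range of row numbers. It prints the seed so you can reuse it later if you ever need the same sample again.

```python
import random
import time
from datetime import datetime, timedelta, timezone


def millisecond_seed():
    """Current time as whole milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


def ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=ms)


seed = millisecond_seed()
# Print the seed: reusing it later reproduces exactly the same sample.
print("seed:", seed, "->", ms_to_utc(seed).isoformat())

rng = random.Random(seed)
# 10,000 distinct row numbers drawn from a million rows
sample = rng.sample(range(1_000_000), 10_000)
```

Note that ms_to_utc reports UTC. The example seed above, 1471300214792, comes out as 15 August 2016 22:30:14.792 UTC, which is 01:30:14.792 on 16 August in a UTC+3 time zone.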

Have fun, brainiacs.

 


How to plot a variable in R that has spaces in it?

There are thousands of datasets on the web available for analysis using R. Many of them are organized by country, with roughly 175 country names such as “United States” or “Cote d’Ivoire”.

So, ignoring for a moment that the experts say never name a variable with spaces, in the real world how do you plot a variable with spaces in its name?

Simple. When programming, encase the variable name with backticks. Like so: `United States`

Example: see the line below with y = `Costa Rica`

ggplot(df, aes(Year, group = 1)) +
  ylab("Country") +
  geom_line(aes(y = `Costa Rica`, color = "Costa Rica")) +
  geom_point(aes(y = Belgium, color = "Belgium"))


 

Unable to install packages in R

Running RStudio on Windows for more than a day or so without shutting down? Are you now getting errors installing packages? I suggest Windows has “lost” its mind. Save your work, then restart #rstudio & #windows. Likely your problems will go away. #DataAnalytics #DataAnalyst #Statistics


Are You Having Trouble Plotting a Time Series in R? Here’s one solution.

Context: I was using Gapminder.com data, so the data is presented in csv format.
The data consists of various country-related data with rows as country names and columns as years.

Step 1. Read into a data.frame the local .csv file that you downloaded from the Gapminder website.

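Note that row.names = 1 uses the first column (the 175 or so country names) as the row names, and check.names = F keeps the year column names as they are instead of prefixing them with an X. Columns will be years.

df <- read.csv('path.to.your.local.file.here', header = T, row.names = 1, check.names = F)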

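Step 2. Transpose the data.frame.

Years are now rows instead of columns. Country names are now cols. Note that t() returns a matrix, so wrap it in as.data.frame() to get a data.frame back.

df_t <- as.data.frame(t(df))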

Step 3. Add a “Year” column created from the row.names. For me, this was the missing link for successfully plotting the data; without it I was floundering.

Reason for this: the years of the time series need to be in a numeric column in order to plot them. The row names are text, so convert them with as.numeric().

df_t$Year <- as.numeric(row.names(df_t))

Step 4. That’s it! Now you can plot. It’s that simple.

For example:

ggplot(df_t, aes(Year, group = 1)) +
  geom_line(aes(y = Albania, color = "Albania")) +
  geom_point(aes(y = India, color = "India"))

Etc…

Peter Bakke - Data Analyst Gapminder.com plotting time series

How to Fix the GIT GUI Error “Drive already exists”

For GIT GUI users on Windows… When browsing for repository locations you may get the error “Drive already exists…” Simply remove the “C:\” from the GIT GUI command line after selecting the folder in Windows and you’ll be good to go!

Fix the GIT GUI error message “Drive already exists ” Peter Bakke