Excel and .csv lesson learned… Includes painful waste of time, unfortunately.

I found out that when you export data in .csv format from #Excel, you need to make sure there is no trailing space after any column name. Mine were exported as “col_name ” with the space AND quotes. Beware when using those names to index your array! The lookup will fail, because “col_name ” and “col_name” are different strings. Simply delete any hidden spaces in your column names before exporting. Please. For your own sanity. #DataAnalytics #ERROR #painful lesson.
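If the file has already been exported, the stray spaces can also be cleaned up on the reading side. Here is a minimal sketch using only Python's standard csv module. The sample text and the function name read_rows_with_clean_headers are made up for illustration; they are not from my actual export.

```python
import csv
import io

# Hypothetical CSV text, as if exported with a trailing space after "col_name".
raw = "col_name ,value\nwidget,3\ngadget,5\n"

def read_rows_with_clean_headers(text):
    """Parse CSV text and strip stray whitespace from every column name."""
    reader = csv.DictReader(io.StringIO(text))
    # Accessing fieldnames reads the header row; replace it with stripped names.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    return list(reader)

rows = read_rows_with_clean_headers(raw)
print(rows[0]["col_name"])  # the key now matches without the trailing space
```

The same idea applies in other tools: strip the header names once, right after loading, and every later lookup uses the clean name.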

Peter Bakke DOH!

Python Immutable strings, integers, booleans

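In Python, STRINGS, INTS, BOOLs etc. are IMMUTABLE… meaning that the object itself can never be changed in place. For example, you cannot turn a df column of strings into a column of ints by modifying the existing values. However, you can ASSIGN the converted values (new objects) to another variable, create a new instance of that column in another dataframe, or use .astype(int) to perform an intermediate computation. Since .astype(int) returns a new column rather than changing the old one, you can also assign its result back to the same column name.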
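A small sketch of what immutability means in practice, in plain Python with no dataframe involved. The list ages_as_text is hypothetical data standing in for a string column.

```python
# A plain list of strings stands in for a string column (hypothetical data).
ages_as_text = ["31", "45", "27"]

first = ages_as_text[0]
mutation_error = None
try:
    first[0] = "9"  # strings are immutable, so item assignment raises TypeError
except TypeError as err:
    mutation_error = str(err)

# Conversion builds brand-new int objects; the original strings are untouched.
ages_as_int = [int(value) for value in ages_as_text]

print(mutation_error)
print(ages_as_text, ages_as_int)
```

The conversion never edits the strings; it creates new ints and binds them to a new name, which is the same pattern as assigning the result of .astype(int) to a column.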

Getting python.exe to run from any directory on my PC so I could use D3’s external data file load function

The only way I could get python (python.exe) to run from any directory from the command line was to set the SYSTEM variable PATH, *not* the USER variable PATH. Arghhh. It took an hour of searching the Oracle (i.e., Google) to finally discover this.
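To check what the command line will actually find, a rough sketch like the following can help. It uses shutil.which from the standard library; the helper name find_on_path is my own invention. If python is not yet on PATH, the script has to be started with the full path to python.exe.

```python
import os
import shutil

def find_on_path(program="python"):
    """Return the full path the shell would run for `program`, or None."""
    return shutil.which(program)

# Each PATH entry is one directory the command line searches, in order.
path_entries = os.environ.get("PATH", "").split(os.pathsep)

print(find_on_path("python"))
print(len(path_entries), "directories on PATH")
```

A result of None means no directory on the PATH seen by that process contains the program. Open a new command window after editing PATH, since already-open windows keep the old value.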

Where I was headed was that I needed to navigate to a local directory in the command line in order to start a local web server for using D3 … http://localhost:8000/whatever.html. I started a local web server using ‘python -m SimpleHTTPServer’ (that is the Python 2 command; in Python 3 it is ‘python -m http.server’). Loading a local external file in D3, like:

d3.tsv("data.tsv", function(data) {
  // data is the array of parsed rows; log the x field of the first row
  console.log(data[0].x);
});

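requires a web server to be running, because D3 fetches the file with an AJAX (XMLHttpRequest) call.

Unlike other frameworks / apps, D3 does *not* read files through the local machine’s OS file system. It asks the browser to fetch them over HTTP, and most browsers block such requests from pages opened directly as file:// URLs, so it needs a web server. Who knew? Arghhh (redux).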
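For reference, here is roughly what the one-line server command does, written out as a small Python 3 script. This is a sketch, not what I actually ran: the function name make_server and its defaults are assumptions, and it needs Python 3.7 or later for the directory argument.

```python
import functools
import http.server

def make_server(directory=".", port=8000):
    """Build (but do not start) an HTTP server that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only, so the files are not exposed to the network.
    return http.server.HTTPServer(("127.0.0.1", port), handler)

# To serve the current directory until Ctrl+C is pressed:
#     make_server(".", 8000).serve_forever()
```

With the server running, a page at http://localhost:8000/whatever.html can load data.tsv from the same directory, because the request now goes over HTTP instead of the file system.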

Do I Love or Hate R’s knitr?

Depends. I’m sharing my love/hate experience with knitr. When it works, it’s divine. When it doesn’t, it’s diabolical. knitr is finally working well in my local PC’s RStudio, but I find that I’m running knitr every hour or so to make sure I have not introduced something into my .rmd file that breaks it. I had wrongly assumed that knitr basically takes a “screen capture” or the like of my .rmd file and outputs the page as HTML, PDF, etc. In fact, knitr executes EVERYTHING in my ever-growing project .rmd file, and if, say, a variable is undefined or two R code chunks share the same name, knitr will barf and halt execution, resulting in nada, nil, zip, nothing, nichevo. That’s why I run knitr several times a day, and it’s a good thing that I do, because it catches stuff. I don’t want to wait for my project to be complete only to find that I have to spend hours fixing the errors (syntax or otherwise) that knitr so gleefully finds. Onward.

How to plot a variable in R that has spaces in it?

There are thousands of datasets on the web available for analysis using R. Many of them are organized by country, roughly 175 of them, with names like “United States” or “Cote d’Ivoire”.

So, ignoring for a moment that the experts say never name a variable with spaces, in the real world how do you plot a variable with spaces in its name?

Simple. When programming, encase the variable name with backticks. Like so: `United States`

Example: see below the line with, y=`Costa Rica`

library(ggplot2)

# df is assumed to have the columns Year, `Costa Rica` and Belgium
ggplot(df, aes(Year, group = 1)) +
  ylab("Country") +
  geom_line(aes(y = `Costa Rica`, color = "Costa Rica")) +
  geom_point(aes(y = Belgium, color = "Belgium"))
