On a Windows PC:
I happened to unzip the ENRON maildir files in a random location on my Windows machine, and I was getting the “directory not found” error. (It would have been nice if the ML 6.4 video suggested unzipping the maildir email contents into the same folder as all the other datacamp lessons… perhaps I’m just an idiot for not doing so.)
In any case, you can insert your own absolute directory path per the following.
In the Vectorize_Text.py file I fixed the not found problem:
# Old code
path = os.path.join('..', path[:-1])
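Fix the directory not found error by inserting your PC’s absolute path like this. (Use Windows Explorer to find the maildir directory, click on the address bar, and copy the path.) Note the r prefix: it makes the path a raw string, so backslash sequences in a Windows path such as \t or \n are not read as escape characters.
# New code
path = os.path.join(r'C:\your_PCs_maildir_directory_path', path[:-1])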
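If you would rather not hard-code the path inside the loop, here is a minimal sketch of the same idea as a small helper. The names MAILDIR_ROOT and resolve_email_path are my own, not part of the course code, and the root shown is only a placeholder for wherever you unzipped the files. It assumes, like the original line, that each entry read from the email list ends with one trailing character to drop.

```python
import os

# Placeholder root (hypothetical): replace with your own maildir location.
# The r prefix keeps the backslashes literal.
MAILDIR_ROOT = r'C:\your_PCs_maildir_directory_path'


def resolve_email_path(line, root=MAILDIR_ROOT):
    """Join the root folder with one entry from the email list file.

    Mirrors the original path[:-1]: the last character of the entry
    (assumed to be the trailing newline) is dropped before joining.
    """
    return os.path.join(root, line[:-1])


def check_root(root=MAILDIR_ROOT):
    """Return True if the root folder exists, so a bad path fails early."""
    return os.path.isdir(root)
```

Calling check_root() once before the loop tells you right away whether the path was copied correctly, instead of failing on the first email.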
I hope this suggestion can save a lot of people precious time. … we have to first assume that you successfully unzipped all of the ENRON email files – which consist of over a gigabyte of data. Cool!
Geesh! Last week I wrote what I thought was a rather clever blog post about what I presumed might be “Preferred” Google abbreviations in NAPS (Name, Address, Phone) usage across the Web. In fact, I’m taking a very difficult Data Analysis (DA) nanodegree with Udacity, and in my Data Wrangling project I strongly suggest that we Data Analysts should use consistent abbreviations when doing Search Engine Marketing, data cleanup, etc. One of our tasks is to clean up street names in a large dataset taken from OpenStreetMap (OSM). Thus, the abbreviation idea for street name cleanup. My point: standard abbreviations will actually reduce computing time across the planet. Haha. Maybe.
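To make the street name cleanup idea concrete, here is a rough sketch of the kind of function I mean. The mapping below is purely illustrative: it is not an official or “Preferred” list from Google or anyone else, and the names ABBREVIATIONS and normalize_street_name are my own. It only looks at the last word of the street name.

```python
import re

# Illustrative mapping only (an assumption, not an official list).
ABBREVIATIONS = {
    'street': 'St', 'st': 'St', 'st.': 'St',
    'avenue': 'Ave', 'ave': 'Ave', 'ave.': 'Ave',
    'road': 'Rd', 'rd': 'Rd', 'rd.': 'Rd',
}

# Matches the last whitespace-delimited token of the name.
LAST_TOKEN_RE = re.compile(r'\S+$')


def normalize_street_name(name):
    """Replace a known street type at the end of the name with one abbreviation."""
    name = name.strip()
    match = LAST_TOKEN_RE.search(name)
    if not match:
        return name  # empty string: nothing to do
    replacement = ABBREVIATIONS.get(match.group().lower())
    if replacement is None:
        return name  # unknown street type: leave the name untouched
    return name[:match.start()] + replacement
```

With this mapping, 'Main Street', 'Main st.' and 'Main ST' all come out as 'Main St', while a name like 'Broadway' is left alone.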
Well, my PeterBakke.com site that contained the aforementioned blog post got hammered. It tumbled from page 1 to page 6 of the SERP, with no end in sight, when searching for “Peter Bakke,” c’est moi. Perhaps the word “Google” appeared too many times in my post and Google penalized me for keyword stuffing. Or perhaps Google thinks people writing about Google are pandering. Dunno.
In any case, I’m also learning about machine learning [ML aka AI] in my nanodegree and the Google automagic search result demotion is certainly an example of ML, good or bad. C’est la vie.