Python: Add thousand separator

In python, we can use ‘{:,}’.format() to convert numerical to a string separated by ‘,’.

for example:

Top15['PopEst'] = Top15['Energy Supply']/Top15['Energy Supply per Capita']

Australia             2.331602e+07
Brazil                2.059153e+08
Canada                3.523986e+07
China                 1.367645e+09
Germany               8.036970e+07
Spain                 4.644340e+07
France                6.383735e+07
United Kingdom        6.387097e+07
India                 1.276731e+09
Iran                  7.707563e+07
Italy                 5.990826e+07
Japan                 1.274094e+08
South Korea           4.980543e+07
Russian Federation    1.435000e+08
United States         3.176154e+08
Name: PopEst, dtype: float64

Change format:

Australia              23,316,017.316017315
Brazil                 205,915,254.23728815
Canada                  35,239,864.86486486
China                 1,367,645,161.2903225
Germany                 80,369,696.96969697
Spain                    46,443,396.2264151
France                  63,837,349.39759036
United Kingdom         63,870,967.741935484
India                 1,276,730,769.2307692
Iran                    77,075,630.25210084
Italy                  59,908,256.880733944
Japan                  127,409,395.97315437
South Korea            49,805,429.864253394
Russian Federation            143,500,000.0
United States          317,615,384.61538464
Name: PopEst, dtype: object

[Pandas]Load files and Concate

This is from the coursera course (Data Science)


Pandas version: 0.23.4

There are 3 files (excel, csv) to load into data frame:

1. Load the energy data from the file Energy Indicators.xls


“Keep in mind that this is an Excel file, and not a comma separated values file. Also, make sure to exclude the footer and header information from the data file. The first two columns are unnecessary, so you should get rid of them, and you should change the column labels so that the columns are:

['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']

“Convert Energy Supply to gigajoules (there are 1,000,000 gigajoules in a petajoule)”

Rename the following list of countries (for use in later questions):

"Republic of Korea": "South Korea",
"United States of America": "United States",
"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
"China, Hong Kong Special Administrative Region": "Hong Kong"



import pandas as pd

Energy = pd.read_excel('Energy_Indicators.xlsx', 
sheet_name=0, usecols=[2, 3, 4, 5],
names=['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'],
skiprows = list(range(1, 18)), 

# rename country
Energy['Country'] = Energy.Country.str.replace('[0-9]+?$', '', regex=True)

Energy['Country'] = Energy.Country.str.replace(' \(.*\)', '', regex=True)

Energy.replace({"Republic of Korea":"South Korea", 
"United States of America": "United States", 
"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
"China, Hong Kong Special Administrative Region": "Hong Kong"}, regex = True, inplace=True)
Energy.set_index('Country', inplace=True)

# convert petajoules to gigajoules
Energy['Energy Supply'] = Energy['Energy Supply'].apply(lambda x: x*1000000)



Energy Supply Energy Supply per Capita % Renewable
Afghanistan 3.210000e+08 10.0 78.669280
Albania 1.020000e+08 35.0 100.000000
Algeria 1.959000e+09 51.0 0.551010
American Samoa NaN NaN 0.641026
Andorra 9.000000e+06 121.0 88.695650



2. Load GDP file world_bank.csv


Make sure to skip the header, and rename the following list of countries:

"Korea, Rep.": "South Korea",
"Iran, Islamic Rep.": "Iran",
"Hong Kong SAR, China": "Hong Kong"


GDP = pd.read_csv('world_bank.csv', skiprows=list(range(0,4)), usecols=list(range(0, 62)))
GDP.replace({'Korea, Rep.':'South Korea', 'Iran, Islamic Rep.':'Iran', 'Hong Kong SAR, China':'Hong Kong'}, 
            regex = True, 
GDP.set_index('Country Name', inplace=True)



Country Code Indicator Name Indicator Code 1960 1961
Country Name
Aruba ABW GDP (current US$) NY.GDP.MKTP.CD NaN NaN
Afghanistan AFG GDP (current US$) NY.GDP.MKTP.CD 5.377778e+08 5.488889e+08

3. Load ranking file scimagojr-3.xlsx, which ranks countries based on their journal contributions in the aforementioned area. Call this DataFrame ScimEn.

ScimEn = pd.read_excel('scimagojr.xlsx', 
index_col = 'Country')


4. Concatenate


Join the three datasets: GDP, Energy, and ScimEn into a new dataset (using the intersection of country names). Use only the last 10 years (2006-2015) of GDP data and only the top 15 countries by Scimagojr ‘Rank’ (Rank 1 through 15).

The index of this DataFrame should be the name of the country, and the columns should be
[‘Rank’, ‘Documents’, ‘Citable documents’, ‘Citations’, ‘Self-citations’, ‘Citations per document’, ‘H index’, ‘Energy Supply’, ‘Energy Supply per Capita’, ‘% Renewable’, ‘2006’, ‘2007’, ‘2008’, ‘2009’, ‘2010’, ‘2011’, ‘2012’, ‘2013’, ‘2014’, ‘2015’].


df should be 20 entries and 15 columns:

df = pd.concat([ScimEn.iloc[0:15, :], Energy, GDP.loc[:, '2006':'2015']], axis=1, ignore_index=False, join='inner')



(15, 20)


Simple Home Plant Auto Water System with ESP8266 (Part-1: Connect to the motor)

First, connect the wires as follow:



to control the motor, we need a relay ( the black box ).

this is from official document:

Pulse width modulation (PWM) is a way to get an artificial analog output on 
a digital pin. It achieves this by rapidly toggling the pin from low to 
There are two parameters associated with this: the frequency of the 
toggling, and the duty cycle. The duty cycle is defined to be how long the 
pin is high compared with the length of a single period (low plus high time). 
Maximum duty cycle is when the pin is high all of the time, and minimum is 
when it is low all of the time.

On the ESP8266 the pins 0, 2, 4, 5, 12, 13, 14 and 15 all support PWM. 
The limitation is that they must all be at the same frequency, and the 
frequency must be between 1Hz and 1kHz.
import machine
p13 = machine.Pin(13)
pwm13 = machine.PWM(p13)

# set the frequency to 500

# the duty cycle is bwteen 0 (all off) and 1023 (all on)
pwm13.duty(512) # half duty cycle, this is when the pump turns on

# to de-initialise pwm mode


[R]Use twitteR to detect and visualize crop disease

To create an shiny app that captures the outbreaks of crop disease from twitter tweets:

load packages:



You must have a twitter API account, where you will get key to access.

follow below:

Function 1: to retrieve a certain user account twitters

retrieve.twitter.user <- function(consumer_key, consumer_secret, access_token, access_secret, userName, ntweets, inclRt = TRUE){
 setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
 user = getUser(userName)
 tweets = userTimeline(user, n = ntweets, includeRts = inclRt)
 df = twListToDF(tweets)
 # clean tweets
 # remove http links, stop words in the tweets
 #replacement = paste(stop_words$word, collapse = "|")
 # states
 states_abb =
 states_full =
 statesPattern = paste0(states_full, "|", states_abb)
 pattern1 = "http://.+|https://.+"
 tweets_clean = df %>%
 mutate(clean_text = str_trim(str_replace(str_replace_all(tolower(str_trim(str_replace(text, pattern1, " "), "right")), "[[:punct:]]|[[0-9]]", " "), "^rt", ""), "both"))
 states = vector(length = nrow(tweets_clean))
 tweets_text = tweets_clean$text
 for(i in 1:length(tweets_text)){
 for(j in 1:length(statesPattern)){
 if(str_detect(tweets_text[i], statesPattern[j])){
 states[i] = unlist(str_split(statesPattern[j], "\\|"))[1]
 return(cbind(tweets_clean, states))

Function 2: generate a data frame that filter tweets contains corresponding crop diseases

retrieve.twitter.term <- function(twitterDF, diseaseDB){
 # load disease list and reformat
 pathogen_df = read.delim(diseaseDB, header = FALSE, col.names = "common_names")
 pathogen = pathogen_df %>% 
 mutate(patho_names = str_replace(common_names, " or ", "|")) %>% 
 mutate(patho_names2 = str_replace(patho_names, " \\(and ", "|")) %>%
 mutate(patho_names3 = str_replace(patho_names2, " \\(", "|")) %>%
 mutate(patho_names4 = str_replace(patho_names3, "\\)", "")) %>% 
 mutate(patho_names5 = str_replace(patho_names4, " and ", " ")) %>%
 mutate(patho_names6 = str_replace(patho_names5, "'s", "")) %>% .[-c(109:142),] %>%
 # tweets text
 tw = twitterDF$clean_text
 path_names = pathogen$patho_names6
 # tweet date and states
 dates = as.Date(twitterDF$created)
 states = twitterDF$states
 # create a matrix
 mx = matrix(nrow = length(dates), ncol = length(pathogen$patho_names6), byrow = TRUE)
 for( i in 1:length(tw)){
 for( j in 1:length(path_names)){
 if(score(tw[i], path_names[j]) >= 19){
 mx[i, j] = path_names[j]
 dimnames(mx) <- list(as.character(dates), path_names)
 mx.df = data.frame(mx)
 mx.df = cbind(mx.df, dates = as.character(dates), states = states)
 mx.df = mx.df %>%
 gather(pathogen_list, pathogen_reported, 1:146, na.rm = TRUE) %>%
 select(dates, states, pathogen_reported) %>% arrange(desc(dates))
 # adding coordinates to the df
 states_names <- ggplot2::map_data("state")
 # get avg of the coordinate for each state, since we want to draw a dot on them
 # computer centroid with mean(range(long))
 states_av <- states_names %>% group_by(region) %>% summarise(avg_long = mean(range(long)), avg_lat = mean(range(lat)))
 # change the format
 states_av$region = str_to_title(states_av$region)
 colnames(states_av)[1] = "states"
 mx.df.cor =, states_av, by = "states")
 mx.df.cor = mx.df.cor %>% arrange(desc(dates))

Function 3: a score system used in the function 2 to determine matching condition

score = function(string, pattern){
 matching = unlist(str_split(tolower(pattern), "\\s|\\|"))

score = 0
 if( str_detect(string, pattern) ){
 score = score + 20
 } else {
 for(i in 1:length(matching)){
 # perfect case
 if( i == 1 & grepl(matching[i], string)){
 score = score + 10
 } else if( i == 2 & grepl(matching[i], string)){
 score = score + 5
 } else if ( i == 3 & grepl(matching[i], string)){
 score = score + 4
 } else if ( i == 4 & grepl(matching[i], string)){
 score = score + 3
 } else if ( i == 5 & grepl(matching[i], string)){
 score = score + 2
 } else if ( i == 6 & grepl(matching[i], string)){
 score = score + 1
 } else {
 score = score + 0

Visualization use ggplot2 and its map

retrieve.twitter.DA = function(mx, yr = "2017", mn = "9"){
 # aggregate
 mx.aggre = mx %>% group_by(dates, states, avg_long, avg_lat, pathogen_reported) %>% mutate(year = year(dates), month = month(dates)) %>% group_by(year, month, avg_long, avg_lat, pathogen_reported) %>% summarise(total = sum(n())) %>% arrange(desc(year), desc(month))
 mx.aggre.custom <- mx.aggre %>% filter(year == yr & month == mn)
 # load states
 states = map_data("state")
 basemap = ggplot() + geom_polygon(data = states, aes(x = long, y = lat, group = group), fill = "grey", col = 'white') + coord_fixed(1.3) + guides(fill = FALSE)
 outmap = basemap + 
 geom_point(data = mx.aggre.custom, aes(x = avg_long, y = avg_lat, size = total)) +
 geom_text_repel(data = mx.aggre.custom, aes(x = avg_long, y = avg_lat, label = pathogen_reported)) +
 ggtitle(paste0("Plant pathogen reported on twitter ", "(Year: ", yr, " Month:", mn, ")")) + theme(plot.title = element_text(hjust = 0.5))




In the case, we are interested in “@corndisease”, this is where farmers would tweet if corn disease occurring to their fields

tw = retrieve.twitter.user(consumer_key, consumer_secret, access_token, access_secret, '@corndisease', 456)

tw_df = tw

mx = retrieve.twitter.term(tw_df, "Corn_diseases.lst")


retrieve.twitter.DA(mx, "2016", "8")

Fixed Seagate 2T external drive mount on Ubuntu 16.04LTS


plugging external drive won’t show the icon, but it can be detected in the $ lsusb

~$ lsusb
Bus 004 Device 002: ID 0bc2:ab24 Seagate RSS LLC 
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 004: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
Bus 003 Device 003: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
Bus 003 Device 002: ID 0e0f:0003 VMware, Inc. Virtual Mouse
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 002: ID 0e0f:0002 VMware, Inc. Virtual USB Hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub


Not showing in the fdisk

sudo fdisk -l result:

Disk /dev/sda: 200 GiB, 214748364800 bytes, 419430400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x8181a6d5

Device     Boot     Start       End   Sectors  Size Id Type

/dev/sda1  *         2048 385878015 385875968  184G 83 Linux
/dev/sda2       385880062 419428351  33548290   16G  5 Extended
/dev/sda5       385880064 419428351  33548288   16G 82 Linux swap / Solaris

dmesg | tail result:

[   52.431393] usb 4-1: stat urb: status -32
[   52.431418] scsi 33:0:0:0: tag#0 data cmplt err -32 uas-tag 1 inflight: CMD 
[   52.431420] scsi 33:0:0:0: tag#0 CDB: Inquiry 12 00 00 00 24 00
[   73.964896] scsi 33:0:0:0: tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD 
[   73.964905] scsi 33:0:0:0: tag#0 CDB: Inquiry 12 00 00 00 24 00
[   73.965084] scsi host33: uas_eh_bus_reset_handler start
[   74.088200] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
[   74.111388] usb 4-1: device firmware changed
[   74.140001] scsi host33: uas_post_reset: alloc streams error -19 after reset
[   74.140022] usb 4-1: USB disconnect, device number 2

it seems an uas problem.


go to /var/log/syslog to find idVendor and idProduct:

 [ 1582.945936] usb 4-1: New USB device found, idVendor=0bc2, idProduct=ab24

turn off uas:

echo options usb-storage quirks=0bc2:ab24:u > /etc/modprobe.d/ignore_uas.conf

Do this (I don’t understand why it needs this though):

locale-gen --purge --no-archive

unplug the external hard drive and reboot VM

reference source:

Hope it helps you.


Install latest R and Rstudio in Ubuntu 16.04 LTS



Add R repository to /etc/apt/sources.list:

sudo echo "deb xenial/" | sudo tee -a /etc/apt/sources.list


Add R to ubuntu keyring:

gpg --keyserver --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add


Install R base:

sudo apt update
sudo apt-get r-base r-base-dev

Install Rstudio:

sudo apt-get install gdebi-core
sudo gdebi -n rstudio-xenial-1.1.453-amd64.deb
rm rstudio-xenial-1.1.453-amd64.deb


There you go! Latest R and Rstudio have been installed in your system.

SAS base Q32: retain initial value

Consider the following data step:

    set work.old;
    Count + 1


The variable Count is created using a sum statement. Which statement regarding this variable is true?

A. It is assigned a value 0 when the data step begins execution.

B. It is assigned a value of missing when the data step begins execution.

C. It is assigned a value 0 at compile time.

D.It is assigned a value of missing at compile time.

In SAS documentation you will find:

“The sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:

retain variable 0;

so count will be initialized to 0 at compile time:

retain Count 0;

Count = sum(Count, 1);

SAS: use SELECT-WHEN statement

We tend to use IF-ELSE-THEN statement for data filtering or evaluation of conditions. In SAS the SELECT-WHEN can be also useful:

The expression is written as followed:

SELECT ( a variable );

             WHEN ( variable VALUE TRUE )        VARIABLE=;

           OTHERWISE     VARIABLE=;


if you want to create more than one variables in each WHEN statement, use DO;…END; statement

SELECT ( a variable );

             WHEN ( variable VALUE TRUE )   DO;



           OTHERWISE   DO;





We will be using orion nonsales data set for this demo (check SAS e-learning for the data set).

As the boss of the company, if we want to give out gifts to our employees. If female workers, we give them scarfs and pedometer, if male workers, we give them gloves and money clips, if they are claimed to be other gender types, we give them coffee and calender.


data gifts;

   set orion.nonsales; 


         when('F') do; 




         when('M') do; 


             gift2='Money Clip';


         otherwise do; 





The result:


Part 2: Communicate esp8266 through WIFI module

The cool thing about feather esp8266 with wifi module is that you can remotely control it through webrepl:

This is how I set it up in Win10:

  1. connect esp8266 to PuTTY.
  2. enable webrepl in PuTTY: >>import webrepl_setup # press E, set your password for using the webrepl client (see below) then press y to reboot according to the interactive prompt
  3. Get the webrepl client (interface) from (
  4. Now you can detect MicroPython-xxx in your wifi list. Enter the default password to continue (password: micropythoN)
  5. Open webrepl client, click connect, enter your password. Now you are connecting to your esp8266 through web service. However, it is slow. We need to make use of the wifi.

Connect to webrepl through wifi:

  1. go back to putty: >> import network
  2. >> wlan = network.WLAN(network.STA_IF)
  3. >>
  4. >> wlan.connect(SSID, PASSWORD) # SSID is your network access point
  5.  >> wlan.ifconfig() will return a tuple of 4 addresses. The first is the IP assigned to the board.
  6.  In putty: >> import webrepl
  7. webrepl.start() this will return 2 addresses the machine is listening to. Remember the one that is the IP address
  8. open webrepl client, change the default ws:// address to the ws://IP address
  9. Click then!


NOW you are using WIFI network talking to your ESP8266 from your computer! Isn’t it awesome!

Part-1: Load Micropython into ESP8266 feather microcontroller board

Adafruit Feather ESP8266( Adafruit official site ) microcontroller with WIFI module looks so tiny but it processes huge functions. You can load micropython interpreter into it so that you can control it with python codes 🙂

feather ESP8266


Now to do that, you need to empty its memory first and flash it with micropython firmware.

we first connect the board to computer through serials port

The process was done in ubuntu 16.04 for me (you might have to do it as sudo):

  1. connect esp8266 to your computer using a USB to micro B cable. I use my kindle2 cable.
  2. open terminal to find which COM port it has connected to: $cd /dev/; $dmesg | grep tty   ##remember the path to the port, for me it is /dev/ttyUSB0
  3. then install  $ sudo pip install esptool
  4. go to firmware download the latest stable micropython firmware
  5. use esptool to erase the flash: $sudo –port /dev/ttyUSB0 erase_flash
  6. In terminal,  go to where you store the micropython firmware file. Then load this file into the machine: $sudo –port /dev/ttyUSB0 –baud 115200 write_flash –flash_size 4MB 0 esp8266-20170612-v1.9.1.bin                                               # Note: 115200 is the transfer speed and in flash_size write down the actual megabytes 
  7. It will take a few seconds and your machine will be flashing blue led.
  8. Remember to push reset button on your esp8266 board
  9. You need to download screen tool as well: $sudo apt-get install screen to communicate with the machine through the port. In Windows, you can use PuTTY.
  10. To start the repl: $sudo screen /dev/ttyUSB0 115200

Then you are all set to play!