Blog Archives
LiDAR Training
Source: U.S. Forest Service Remote Sensing Applications Center
Analyzing 1.1 Billion NYC Taxi and Uber Trips
I just came across an interesting article, "Analyzing 1.1 Billion NYC Taxi and Uber Trips, with a Vengeance": an open-source exploration of the city’s neighborhoods, nightlife, airport traffic, and more, through the lens of publicly available taxi and Uber data.
Quoted from the author, Todd W. Schneider:
“The New York City Taxi & Limousine Commission has released a staggeringly detailed historical dataset covering over 1.1 billion individual taxi trips in the city from January 2009 through June 2015. Taken as a whole, the detailed trip-level data is more than just a vast list of taxi pickup and drop-off coordinates: it’s a story of New York. How bad is the rush hour traffic from Midtown to JFK? Where does the Bridge and Tunnel crowd hang out on Saturday nights? What time do investment bankers get to work? How has Uber changed the landscape for taxis? And could Bruce Willis and Samuel L. Jackson have made it from 72nd and Broadway to Wall Street in less than 30 minutes? The dataset addresses all of these questions and many more.
I mapped the coordinates of every trip to local census tracts and neighborhoods, then set about in an attempt to extract stories and meaning from the data. This post covers a lot, but for those who want to pursue more analysis on their own: everything in this post—the data, software, and code—is freely available. Full instructions to download and analyze the data for yourself are available on GitHub.”
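The excerpt doesn't say how each trip was matched to a census tract. One common approach is a point-in-polygon test for every pickup or drop-off. Below is a minimal base-R ray-casting sketch of that idea; the two square "tracts" and the pickup coordinates are hypothetical toy data, and the helper names are made up. A real analysis of a billion trips would use a spatial library or database with a spatial index rather than looping over every polygon.

```r
# Ray casting: count how many polygon edges a horizontal ray from the point
# crosses; an odd count means the point is inside.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Edge (i, j) straddles the ray's y, and the crossing lies right of px
    if (((vy[i] > py) != (vy[j] > py)) &&
        (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical helper: return the id of the first tract containing the point
assign_tract <- function(px, py, tracts) {
  for (id in names(tracts)) {
    poly <- tracts[[id]]
    if (point_in_polygon(px, py, poly$x, poly$y)) return(id)
  }
  NA_character_
}

# Toy data: two adjacent unit squares and three pickups (the last one outside)
tracts <- list(
  A = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  B = list(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)
pickups <- data.frame(x = c(0.5, 1.5, 3), y = c(0.5, 0.2, 0.5))
pickups$tract <- mapply(assign_tract, pickups$x, pickups$y,
                        MoreArgs = list(tracts = tracts))
print(pickups)
```

Points that fall in no polygon get NA, which makes trips outside the mapped area easy to filter out afterwards.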
Rbridge for ArcGIS
Source: http://blogs.esri.com/esri/esri-insider/2015/07/20/building-a-bridge-to-the-r-community/
Rbridge for ArcGIS: https://github.com/R-ArcGIS/r-bridge
R-ArcGIS: https://r-arcgis.github.io/
Today at the Esri User Conference in San Diego, Esri announced a new initiative to build a collaborative community for R and ArcGIS users.
Esri has been teaching and promoting integration with R at the User Conference and Developer Summit for several years. During this time we have seen a significant increase in interest, and received useful feedback from our ArcGIS users and R users about a variety of needs and techniques for integrating ArcGIS and R. Based upon this feedback, we are working with ArcGIS and R users to develop a community to promote learning, sharing, and collaboration. This community will include a repository of free, open-source R scripts, geoprocessing tools, and tutorials.
I recently sat down with Steve Kopp, Senior Product Engineer on the spatial analysis team, and Dawn Wright, Esri’s Chief Scientist, to talk about what this focus on building a bridge to the R community means for ArcGIS users and other users of R.
Matt Artz: What is R?
Steve Kopp: R (aka the R Project for Statistical Computing) is an extremely popular and the fastest-growing environment for statistical computing. In addition to the core R software, it includes more than 6,000 community-contributed packages for solving a wide range of statistical problems, including a variety of spatial statistical data analysis methods.
Dawn Wright: R is widely used by environmental scientists of all stripes, as well as statisticians. Since R has limited data management and mapping capabilities, many of our users find good synergy in using R and ArcGIS together.
Matt Artz: Does the ArcGIS community use R today?
Steve Kopp: Yes, R has become very popular in the ArcGIS community over the last several years. Many in our user community have been asking for a mix of its functionality with our own, as well as better code-sharing interaction with the R community.
Dawn Wright: A great example from the marine ecology community is Duke University’s Marine Geospatial Ecology Tools, which has long since integrated R and ArcGIS for Desktop.
Matt Artz: What is the R – ArcGIS Bridge?
Steve Kopp: This is a free, open source R package which allows ArcGIS and R to dynamically access data without creating intermediate files on disk.
Matt Artz: Why did Esri build the R – ArcGIS Bridge?
Steve Kopp: It was built for three reasons: to improve the performance and scalability of projects which combine R and ArcGIS; to create a developer experience that was simple and familiar to the R user; and to enable an end-user experience that is familiar to the ArcGIS user.
Dawn Wright: The bottom line is that this project is about helping our user community and the R user community to be more successful combining these technologies.
Matt Artz: So is this initiative just some software code?
Steve Kopp: No. The R – ArcGIS Bridge is simply enabling technology; the real effort and value of the R – ArcGIS Community initiative will be in developing and sharing useful tools, tutorials, and tips. The bridge itself is a free, open-source library that makes it fast and easy to move data between ArcGIS and R, plus additional work that makes it possible to run an R script from an ArcGIS geoprocessing tool.
Dawn Wright: This community will be important and useful for R users who need to access ArcGIS data, for ArcGIS users who need to access R analysis capabilities from ArcGIS, and for developers who are familiar with both ArcGIS and R who want to build integrated tools or applications to share with the community.
Steve Kopp: The community of tools will be user developed and user driven. Esri will develop a few sample toolboxes and tutorials, but our primary interest is to facilitate the community and help them build what they find useful.
Matt Artz: How do you see the ArcGIS community using the R – ArcGIS Bridge? What does it give them that they don’t have today?
Steve Kopp: The R – ArcGIS Bridge allows developers with experience in R and ArcGIS to create custom tools and toolboxes that integrate ArcGIS and R, both for their own use and to share with others, within their organization or with other ArcGIS users.
Dawn Wright: R developers can quickly access ArcGIS datasets from within R, save R results back to ArcGIS datasets and tables, and easily convert between ArcGIS datasets and their equivalent representations in R.
Steve Kopp: It allows our users to integrate R into their workflows, without necessarily learning the R programming language directly.
Matt Artz: What about the R user who doesn’t use ArcGIS?
Steve Kopp: It’s not uncommon in an organization for a non-GIS person to need to work with GIS data; these people will be able to use the bridge to access ArcGIS data directly, without creating intermediate shapefiles or tables and without needing to know ArcGIS.
Matt Artz: How can people start using the R Bridge?
Steve Kopp: The R – ArcGIS community samples, tutorials, and bridge are all part of a public GitHub community site similar to other Esri open source projects. And if you happen to be at the Esri User Conference in San Diego this week, this project will be discussed as part of a workshop on Wednesday.
Google Developers R Programming Video Lectures
Google Developers recognized that most developers learn R in bits and pieces, which can leave significant knowledge gaps. To help fill these gaps, they created a series of introductory R programming videos that provide a solid foundation in programming tools, data manipulation, and functions in the R language and software. The short videos are organized into four sections: intro to R, loading data and more data formats, data processing, and writing functions. Start with the full YouTube playlist, or watch an individual lecture:
1.1 – Initial Setup and Navigation
1.2 – Calculations and Variables
1.3 – Create and Work With Vectors
1.4 – Character and Boolean Vectors
1.5 – Vector Arithmetic
1.6 – Building and Subsetting Matrices
1.7 – Section 1 Review and Help Files
2.1 – Loading Data and Working With Data Frames
2.2 – Loading Data, Object Summaries, and Dates
2.3 – if() Statements, Logical Operators, and the which() Function
2.4 – for() Loops and Handling Missing Observations
2.5 – Lists
3.1 – Managing the Workspace and Variable Casting
3.2 – The apply() Family of Functions
3.3 – Access or Create Columns in Data Frames, or Simplify a Data Frame using aggregate()
4.1 – Basic Structure of a Function
4.2 – Returning a List and Providing Default Arguments
4.3 – Add a Warning or Stop the Function Execution
4.4 – Passing Additional Arguments Using an Ellipsis
4.5 – Make a Returned Result Invisible and Build Recursive Functions
4.6 – Custom Functions With apply()
Source: http://gettinggeneticsdone.blogspot.it/2013/08/google-developers-r-programming-video.html
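Several of the section 4 topics (default arguments, returning a list, stop() and warning(), passing extra arguments through an ellipsis, invisible results, recursion, and custom functions with apply()) fit into a few lines of R. This is our own illustration, not code from the lectures; the function names are made up.

```r
# Default argument (na.rm), input check with stop(), a warning, extra
# arguments forwarded to mean() through `...`, and a list as the return value.
summarize_vector <- function(x, na.rm = TRUE, ...) {
  if (!is.numeric(x)) stop("x must be numeric")
  if (anyNA(x)) warning("x contains missing values")
  list(mean = mean(x, na.rm = na.rm, ...), n = length(x))
}

# invisible(): the value is returned but not auto-printed at the console.
log_value <- function(x) {
  message("value: ", x)
  invisible(x)
}

# A recursive function: n! computed as n * (n - 1)!.
factorial_rec <- function(n) {
  if (n <= 1) return(1)
  n * factorial_rec(n - 1)
}

# A custom (anonymous) function applied to each column of a matrix.
m <- matrix(1:6, nrow = 2)
col_sq_sums <- apply(m, 2, function(col) sum(col^2))

# `trim = 0.25` is not a summarize_vector() argument; it travels via `...`.
s <- summarize_vector(c(1, 2, 3, 100), trim = 0.25)
str(s)
```

Forwarding `...` keeps summarize_vector() small while still exposing every option of mean() to the caller.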
In-depth introduction to machine learning in 15 hours of expert videos
Chapter 1: Introduction (slides, playlist)
 Opening Remarks and Examples (18:18)
 Supervised and Unsupervised Learning (12:12)
Chapter 2: Statistical Learning (slides, playlist)
 Statistical Learning and Regression (11:41)
 Curse of Dimensionality and Parametric Models (11:40)
 Assessing Model Accuracy and Bias-Variance Tradeoff (10:04)
 Classification Problems and K-Nearest Neighbors (15:37)
 Lab: Introduction to R (14:12)
Chapter 3: Linear Regression (slides, playlist)
 Simple Linear Regression and Confidence Intervals (13:01)
 Hypothesis Testing (8:24)
 Multiple Linear Regression and Interpreting Regression Coefficients (15:38)
 Model Selection and Qualitative Predictors (14:51)
 Interactions and Nonlinearity (14:16)
 Lab: Linear Regression (22:10)
Chapter 4: Classification (slides, playlist)
 Introduction to Classification (10:25)
 Logistic Regression and Maximum Likelihood (9:07)
 Multivariate Logistic Regression and Confounding (9:53)
 Case-Control Sampling and Multiclass Logistic Regression (7:28)
 Linear Discriminant Analysis and Bayes Theorem (7:12)
 Univariate Linear Discriminant Analysis (7:37)
 Multivariate Linear Discriminant Analysis and ROC Curves (17:42)
 Quadratic Discriminant Analysis and Naive Bayes (10:07)
 Lab: Logistic Regression (10:14)
 Lab: Linear Discriminant Analysis (8:22)
 Lab: K-Nearest Neighbors (5:01)
How to edit the rule sets of decision tree in R
Sometimes it is desirable to edit the rule sets derived from the training data before making predictions on the test data. There are many packages for decision tree classification in R; today I explored how to edit the rule sets produced by the C50 package. The approach is to save the rule sets to an ASCII file, edit them in that file, and then import them back into R for prediction. See the sample code and illustration below.
####################################################
library(C50)
data(churn)  # older C50 releases provide churnTrain/churnTest this way
ruleModel <- C5.0(churn ~ ., data = churnTrain, rules = TRUE)  # construct the rule sets
ruleModel
summary(ruleModel)  # print out the rule sets
ruleText <- ruleModel$rules
writeLines(ruleText, "ruleText.txt")  # save the rule sets to an ASCII file; edit it as needed
ruleModel$rules <- ""  # empty the original rule sets
try(predict(ruleModel, newdata = churnTest))  # no rules left in the model, so this call fails
ruleText <- paste(readLines("ruleText.txt"), collapse = "\n")  # import the modified rule sets
ruleModel$rules <- ruleText  # assign the modified rules back to the rule model
predict(ruleModel, newdata = churnTest)  # predict with the edited rule sets
####################################################
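If the edit is a simple threshold change, it can also be made in R rather than in a text editor. The sketch below assumes that numeric conditions in the C5.0 rule text are written as type="2" att="<name>" cut="<value>" result="<"; that matches rule strings we have seen from the C50 package, but check your own ruleModel$rules before relying on it. edit_rule_cut() is a hypothetical helper, and the rule fragment is made up for illustration.

```r
# Hypothetical helper: replace the cut-point of every numeric condition on one
# attribute in a C5.0 rule text. ASSUMES conditions are written as
# att="<name>" cut="<value>". `att` is used inside a regular expression,
# so escape any regex metacharacters in the attribute name.
edit_rule_cut <- function(rule_text, att, new_cut) {
  pattern <- paste0('(att="', att, '" cut=")[^"]*(")')
  # \\1 and \\2 put back the text around the old value
  gsub(pattern, paste0("\\1", new_cut, "\\2"), rule_text)
}

# Made-up fragment in the assumed format
rule_text <- paste(
  'conds="2" cover="120" ok="110" lift="5.1" class="yes"',
  'type="2" att="total_day_minutes" cut="264.39999" result=">"',
  'type="2" att="number_customer_service_calls" cut="3" result="<"',
  sep = "\n"
)
edited <- edit_rule_cut(rule_text, "total_day_minutes", 250)
cat(edited, "\n")
```

With a fitted model, the equivalent step would be ruleModel$rules <- edit_rule_cut(ruleModel$rules, "total_day_minutes", 250) before calling predict(). Only the targeted attribute's cut-point changes; other conditions are left untouched.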
Extract Multiband Raster Values from Points using R
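library(raster)  # also loads sp, which provides coordinates()
rasStack <- stack("D:/Dropbox/Research/Data/Mesonet/KKF/kkf01.tif")  # all bands of the multiband GeoTIFF
pointCoordinates <- read.csv("D:/Dropbox/Research/Data/Mesonet/ok_stations_coordinates.csv")
# Promote the table to spatial points; coordinates must be in the raster's CRS
coordinates(pointCoordinates) <- ~ LONGITUDE + LATITUDE
rasValue <- extract(rasStack, pointCoordinates)  # one row per point, one column per band
# as.data.frame() keeps LONGITUDE/LATITUDE, which coordinates<- moved out of @data
combinePointValue <- cbind(as.data.frame(pointCoordinates), rasValue)
write.csv(combinePointValue, "D:/Dropbox/Research/Data/Mesonet/combinedPointValue.csv", row.names = FALSE)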
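For readers curious what extract() does for point locations, the core idea on a simple north-up grid is to convert each coordinate into a row and column index and read that cell from every band, roughly what its default method = "simple" does. The base-R sketch below is only a conceptual illustration under stated assumptions (no rotation, square cells of size res, top-left corner at xmin/ymax); it is not the raster package's implementation and ignores CRS handling and bilinear interpolation. extract_points() and the toy grid are hypothetical.

```r
# Conceptual sketch only: nearest-cell lookup of point values in a
# multiband grid stored as a rows x cols x bands array.
# Assumes a north-up grid with square cells of size `res` whose
# top-left corner is (xmin, ymax); points outside the grid get NA.
extract_points <- function(bands, xmin, ymax, res, x, y) {
  nr <- dim(bands)[1]
  nc <- dim(bands)[2]
  nb <- dim(bands)[3]
  col <- floor((x - xmin) / res) + 1  # columns count eastward from xmin
  row <- floor((ymax - y) / res) + 1  # rows count southward from ymax
  inside <- row >= 1 & row <= nr & col >= 1 & col <= nc
  out <- matrix(NA_real_, nrow = length(x), ncol = nb,
                dimnames = list(NULL, paste0("band", seq_len(nb))))
  if (any(inside)) {
    for (b in seq_len(nb)) {
      # three-column matrix index: (row, col, band) for each inside point
      out[inside, b] <- bands[cbind(row[inside], col[inside], b)]
    }
  }
  out
}

# Toy 2 x 4 grid with 3 bands (hypothetical values), cell size 1, origin (0, 2)
bands <- array(1:24, dim = c(2, 4, 3))
vals <- extract_points(bands, xmin = 0, ymax = 2, res = 1,
                       x = c(0.5, 3.5, 10), y = c(1.5, 0.5, 1))
print(vals)
```

The result has the same shape as rasValue above: one row per point and one column per band, which is why it can be bound column-wise to the point attributes.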
Extract Raster Values from Points using R
Point_ID  LONGITUDE  LATITUDE
1         48.765863  94.745194
2         48.820552  122.485318
3         48.233939  107.857685
4         48.705964  99.817363

Point_ID  LONGITUDE  LATITUDE    raster1  raster2  raster3
1         48.765863  94.745194   200      500      100
2         48.820552  122.485318  178.94   18.90    10.94
3         48.233939  107.857685  30.74    30.74    0.4
4         48.705964  99.817363   0        110      0.7