Monthly Archives: February 2014
How To Manage Packages Using apt-get, apt-cache, apt-file and dpkg Commands (With 13 Practical Examples)
Debian-based systems (including Ubuntu) use apt-* commands for managing packages from the command line.
In this article, using Apache 2 installation as an example, let us review how to use apt-* commands to view, install, remove, or upgrade packages.
1. apt-cache search: Search Repository Using Package Name
If you are installing Apache 2, you may guess that the package name is apache2. To verify that it is a valid package name, search the repository for that exact name as shown below.
$ apt-cache search ^apache2$
apache2 - Apache HTTP Server metapackage
2. apt-cache search: Search Repository Using Package Description
If you don’t know the exact name of the package, you can still search using the package description as shown below.
$ apt-cache search "Apache HTTP Server"
apache2 - Apache HTTP Server metapackage
apache2-doc - Apache HTTP Server documentation
apache2-mpm-event - Apache HTTP Server - event driven model
apache2-mpm-prefork - Apache HTTP Server - traditional non-threaded model
apache2-mpm-worker - Apache HTTP Server - high speed threaded model
apache2.2-common - Apache HTTP Server common files
3. apt-file search: Search Repository Using a Filename from the Package
Sometimes you may know the name of a configuration file or an executable from the package that you would like to install. In that case, search the repository by file name using the apt-file command as shown below. The following example shows that the apache2.conf file is part of the apache2.2-common package. (Note: apt-file is not installed by default; install it with apt-get install apt-file and run apt-file update once before the first search.)
$ apt-file search apache2.conf
apache2.2-common: /etc/apache2/apache2.conf
apache2.2-common: /usr/share/doc/apache2.2-common/examples/apache2/apache2.conf.gz
4. apt-cache show: Basic Information About a Package
The following example displays basic information about the apache2 package.
$ apt-cache show apache2
Package: apache2
Priority: optional
Maintainer: Ubuntu Core Developers
Original-Maintainer: Debian Apache Maintainers
Version: 2.2.11-2ubuntu2.3
Depends: apache2-mpm-worker (>= 2.2.11-2ubuntu2.3) | apache2-mpm-prefork (>= 2.2.11-2ubuntu2.3) | apache2-mpm-event (>= 2.2.11-2ubuntu2.3)
Filename: pool/main/a/apache2/apache2_2.2.11-2ubuntu2.3_all.deb
Size: 46350
Description: Apache HTTP Server metapackage
 The Apache Software Foundation's goal is to build a secure, efficient and extensible HTTP server as standards-compliant open source software.
Homepage: http://httpd.apache.org/
5. apt-cache showpkg: Detailed Information About a Package
“apt-cache show” displays basic information about a package. Use “apt-cache showpkg” to display detailed information about a package as shown below.
$ apt-cache showpkg apache2
Package: apache2
Versions:
2.2.11-2ubuntu2.3 (/var/lib/apt/lists/us.archive.ubuntu.com_ubuntu_dists_jaunty-updates_main_binary-i386_Packages) (/var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_jaunty-security_main_binary-i386_Packages)
 Description Language:
                 File: /var/lib/apt/lists/us.archive.ubuntu.com_ubuntu_dists_jaunty-updates_main_binary-i386_Packages
                  MD5: d24f049cd70ccfc178dd8974e4b1ed01
Reverse Depends:
  squirrelmail,apache2
  squid3-cgi,apache2
  phpmyadmin,apache2
  mahara-apache2,apache2
  ipplan,apache2
Dependencies:
2.2.11-2ubuntu2.3 - apache2-mpm-worker (18 2.2.11-2ubuntu2.3) apache2-mpm-prefork (18 2.2.11-2ubuntu2.3) apache2-mpm-event (2 2.2.11-2ubuntu2.3)
2.2.11-2ubuntu2 - apache2-mpm-worker (18 2.2.11-2ubuntu2) apache2-mpm-prefork (18 2.2.11-2ubuntu2) apache2-mpm-event (2 2.2.11-2ubuntu2)
Provides:
2.2.11-2ubuntu2.3 -
2.2.11-2ubuntu2 -
Reverse Provides:
apache2-mpm-itk 2.2.6-02-1build4.3
apache2-mpm-worker 2.2.11-2ubuntu2.3
apache2-mpm-prefork 2.2.11-2ubuntu2.3
apache2-mpm-prefork 2.2.11-2ubuntu2
apache2-mpm-event 2.2.11-2ubuntu2
6. apt-file list: List all the Files Located Inside a Package
Use “apt-file list” to display all the files located inside the apache2 package as shown below.
$ apt-file list apache2 | more
apache2: /usr/share/bug/apache2/control
apache2: /usr/share/bug/apache2/script
apache2: /usr/share/doc/apache2/NEWS.Debian.gz
apache2: /usr/share/doc/apache2/README.Debian.gz
apache2: /usr/share/doc/apache2/changelog.Debian.gz
...
7. apt-cache depends: List all Dependent Packages
Before installation, if you would like to view all the packages it depends on, use “apt-cache depends” as shown below.
$ apt-cache depends apache2
apache2
 |Depends: apache2-mpm-worker
 |Depends: apache2-mpm-prefork
  Depends: apache2-mpm-event
8. dpkg -l: Is the Package Already Installed?
Before installing a package, use the dpkg -l command as shown below to make sure it is not already installed. If the command produces no output, the package is not installed.
$ dpkg -l | grep -i apache
9. apt-get install: Install a Package
Finally, install the package using “apt-get install” as shown below.
$ sudo apt-get install apache2
[sudo] password for ramesh:
The following NEW packages will be installed:
  apache2 apache2-mpm-worker apache2-utils apache2.2-common libapr1 libaprutil1 libpq5
0 upgraded, 7 newly installed, 0 to remove and 26 not upgraded.
10. dpkg -l: Verify Whether the Package Was Installed Successfully
After installing the package, use “dpkg -l” to verify that it was installed successfully.
$ dpkg -l | grep apache
ii  apache2             2.2.11-2ubuntu2.3   Apache HTTP Server metapackage
ii  apache2-mpm-worker  2.2.11-2ubuntu2.3   Apache HTTP Server - high speed threaded mod
ii  apache2-utils       2.2.11-2ubuntu2.3   utility programs for webservers
ii  apache2.2-common    2.2.11-2ubuntu2.3   Apache HTTP Server common files
11. apt-get remove: Delete a Package
Use “apt-get purge” or “apt-get remove” to delete a package as shown below.
$ sudo apt-get purge apache2
(or)
$ sudo apt-get remove apache2
The following packages were automatically installed and are no longer required:
  apache2-utils linux-headers-2.6.28-11 libapr1 apache2.2-common
  linux-headers-2.6.28-11-generic apache2-mpm-worker libpq5 libaprutil1
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  apache2
0 upgraded, 0 newly installed, 1 to remove and 26 not upgraded.
Removing apache2 ...
- apt-get remove will not delete the configuration files of the package
- apt-get purge will delete the configuration files of the package
12. apt-get -u install: Upgrade a Specific Package
The following example shows how to upgrade one specific package.
$ sudo apt-get -u install apache2
Reading package lists... Done
Building dependency tree
Reading state information... Done
apache2 is already the newest version.
The following packages were automatically installed and are no longer required:
  linux-headers-2.6.28-11 linux-headers-2.6.28-11-generic
Use 'apt-get autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 26 not upgraded.
13. apt-get -u upgrade: Upgrade all Packages
To upgrade all packages to their latest versions, use “apt-get -u upgrade” as shown below.
$ sudo apt-get -u upgrade
The following packages will be upgraded:
  libglib2.0-0 libglib2.0-data libicu38 libsmbclient libwbclient0
  openoffice.org-base-core openoffice.org-calc openoffice.org-common
  openoffice.org-core openoffice.org-draw openoffice.org-emailmerge
  openoffice.org-gnome openoffice.org-gtk openoffice.org-impress
  openoffice.org-math openoffice.org-style-human openoffice.org-writer
  python-uno samba-common smbclient ttf-opensymbol tzdata
26 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
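Note that apt-get upgrades from its locally cached package lists, so it is worth refreshing them first:

```shell
sudo apt-get update      # refresh the package lists from the repositories
sudo apt-get -u upgrade  # then upgrade using the fresh lists
```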
Many people want WordPress to power their site’s root (e.g. http://example.com) but they don’t want all of the WordPress files cluttering up their root directory. WordPress allows you to install it into a subdirectory, but have your blog exist in the site root.
The process to move WordPress into its own directory is as follows:
- Install duplicator on WP
- Create a new package using duplicator
- Download installer.php and package
- Create a wordpress folder under the root folder on the new server
- Put installer.php and the package under the /wordpress folder
- Use phpMyAdmin or MySQL to create a new database for WP
- Go to http://domain.com/wordpress/installer.php to install WP
- Go to WP Settings – General – Site Address (URL); if it is greyed out, edit wp-config.php.
- Delete the “WP_SITEURL” and “WP_HOME” entries.
- Change Site URL to: http://domain.com
- COPY index.php and .htaccess to root folder
- MOVE wp-config.php to root folder.
- In the index.php you copied to the root folder, change the following line and save the file:
require( dirname( __FILE__ ) . '/wp-blog-header.php' );
to the following, using your directory name for the WordPress core files:
require( dirname( __FILE__ ) . '/wordpress/wp-blog-header.php' );
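The copy and move steps above can be sketched in the shell. The scratch directory below is only there so the commands can run anywhere; on a real server you would run the cp and mv lines in your actual site root:

```shell
#!/bin/sh
# Demo of the "COPY index.php and .htaccess" and "MOVE wp-config.php" steps.
# A temporary directory stands in for the site root (setup is illustrative only).
set -e
ROOT=$(mktemp -d)
mkdir "$ROOT/wordpress"
touch "$ROOT/wordpress/index.php" "$ROOT/wordpress/.htaccess" "$ROOT/wordpress/wp-config.php"
cd "$ROOT"
cp wordpress/index.php wordpress/.htaccess .   # COPY: originals stay in /wordpress
mv wordpress/wp-config.php .                   # MOVE: only one copy should exist
```

The distinction matters: index.php and .htaccess must exist in both places conceptually (the root copy is then edited), while wp-config.php must live only in the root.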
Linux users can use LibreOffice, Google Docs, and even Microsoft’s Office Web Apps, but some people still need — or just want — the desktop version of Microsoft Office. Luckily, there are ways to run Microsoft Office on Linux.
This is particularly useful if you’re still on the soon-to-be-unsupported Windows XP and don’t want to pay for an upgrade to Windows 7 or 8. This obviously isn’t supported by Microsoft, but it still works fairly well.
Ways to Install Microsoft Office
There are several different ways to install Microsoft Office on Linux:
- Wine: Wine is a Windows compatibility layer that allows you to run Windows programs on Linux. It’s not perfect, but it’s optimized enough to run popular programs like Microsoft Office well. Wine will work better with older versions of Office, so the older your version of Office, the more likely it is to work without any trouble. Wine is completely free, although you may have to do some tweaking yourself.
- CrossOver: CrossOver is a paid product that uses code from the free version of Wine. While it costs money, CrossOver does more of the work for you. They test their code to ensure that popular programs like Microsoft Office run well and ensure upgrades won’t break them. CrossOver also provides support — so if Office doesn’t run well, you have someone to contact who will help you.
- Virtual Machine: You could also install Microsoft Windows in a virtual machine using a program like VirtualBox or VMware and install Microsoft Office inside it. With Seamless Mode or Unity Mode, you could even have the Office windows appear on your Linux desktop. This method provides the best compatibility, but it’s also the heaviest — you have to run a full version of Windows in the background. You’ll need a copy of Windows, such as an old Windows XP disc you have lying around, to install in the virtual machine.
We’ll be focusing on using Wine or CrossOver to install Office directly on Linux. If you want to use a virtual machine, all you have to do is install VirtualBox or VMware Player and create a new virtual machine. The program will walk you through installing Windows, and you can install Office inside your virtualized Windows as you normally would.
Installing Microsoft Office With Wine
We tested Office 2007 with this process, as Office 2013 is known not to work properly and Office 2010 doesn’t appear to be well supported. If you want to use an older version of Office, like Office 2003, you’ll likely find that it works even better. If you want to install Office 2010, you may need to perform some more tweaks — check the Wine AppDB page for the version of Office you want to install for more information.
First, install the Wine package from your Linux distribution’s software package repository. On Ubuntu, open the Ubuntu Software Center, search for Wine, and install the Wine package.
Next, insert the Microsoft Office disc into your computer. Open it in your file manager, right-click the setup.exe file, and open the .exe file with Wine.
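If you prefer the terminal, the same two steps look roughly like this; the mount point /media/cdrom is an assumption and varies by distribution and disc label:

```shell
# Terminal equivalent of the steps above (/media/cdrom is an assumed mount point).
sudo apt-get install wine   # install Wine from the distribution's repositories
cd /media/cdrom             # change to the mounted Office disc
wine setup.exe              # launch the Office installer under Wine
```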
The installer will appear and, if everything goes well, you should be able to go through the installation process on Linux as you normally would on Windows.
We didn’t run into any problems while installing Office 2007, but this will vary depending on your version of Wine, Linux distribution, and especially the release of Microsoft Office you’re trying to use. For more tips, read the Wine AppDB and search for the version of Microsoft Office you’re trying to install. You’ll find more in-depth installation instructions there, filled with tips and hacks other people have used.
You could also try using a third-party tool like PlayOnLinux, which will help you install Microsoft Office and other popular Windows programs. Such an application may speed things up and make the process easier on you. PlayOnLinux is also available for free in the Ubuntu Software Center.
Why You Might Want to Use CrossOver
If the Wine method doesn’t work or you encounter problems, you may want to try using CrossOver instead. CrossOver offers a free two-week trial, but the full version will cost you $60 if you want to keep using it.
After downloading and installing CrossOver, you’ll be able to open the CrossOver application and use it to install Office. You can do everything you can do with CrossOver with the standard version of Wine, but CrossOver may require less hacking around to get things working. Whether this is worth the cost is up to you.
Using Microsoft Office on Linux
After the installation, you’ll find the Microsoft Office applications in your desktop’s launcher. On Ubuntu, we had to log out and log back in before the shortcuts would appear in the Unity desktop’s launcher.
Office works pretty well on Linux. Wine presents your home folder to Word as your My Documents folder, so it’s easy to save files and load them from your standard Linux file system.
The Office interface obviously doesn’t look as at home on Linux as it does on Windows, but it performs fairly well. Each Office program should work normally, although it’s possible that some features — particularly little-used ones that haven’t been tested very much — may not work properly in Wine.
Of course, Wine isn’t perfect and you may run into some issues while using Office in Wine or CrossOver. If you really want to use Office on a Linux desktop without compatibility issues, you may want to create a Windows virtual machine and run a virtualized copy of Office. This ensures you won’t have compatibility issues, as Office will be running on a (virtualized) Windows system.
Note: The following steps for mounting disc images with CDemu were tested on Linux Mint 14 Cinnamon.
To install CDemu, open the Terminal by pressing CTRL+ALT+T, add the CDemu PPA, update the repositories, and install the CDemu packages.
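The commands for these steps would look like the following; the PPA and package names are assumptions, so verify them against the CDemu project’s documentation:

```shell
# Assumed PPA and package names -- check the CDemu project page for the current ones.
sudo add-apt-repository ppa:cdemu/ppa                  # add the CDemu PPA
sudo apt-get update                                    # refresh the repositories
sudo apt-get install cdemu-daemon cdemu-client gcdemu  # daemon, CLI client, GTK applet
```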
To start CDemu, click Menu -> System Settings -> gCDemu. An icon will be displayed in your task bar.
Click the gCDemu icon to get started. Select Device #00 or Device #01 to mount your images. Click “Load” and browse to the ISO to load:
The ISO will be mounted in your File manager:
To unload your ISO after use, simply press “Unload” in the gCDemu window or eject it from the file manager:
That’s it. Enjoy!
- Map / Mount Network Shares in both Windows 7 and Ubuntu 12.04
How to Map Network Shares in Windows 7
To map network shares in Windows 7, go to Start –> Computer and select ‘Map network drive’ from the top menu shown below.
Next, choose a Drive letter and type in the folder path to the shared resource. The format to connect to network resources in Windows is shown below:
\\IP_Address_or_Hostname\share_name
Finally, click Finish and you should see the resource mapped.
How to Map Network Shares in Ubuntu 12.04 (Precise Pangolin)
To map network shares in Ubuntu 12.04, press Ctrl+Alt+T on your keyboard to open a terminal. When it opens, run the command below to edit the fstab file.
sudo gedit /etc/fstab
Next, add the line shown below at the end of the file and save it. Replace <username> and <Ubuntu_username> with your Ubuntu username, and <windows_username> and <windows_password> with the credentials of the Windows account that owns the share.
//IP_Address_or_Hostname/Share_Name /home/<username>/Windows cifs username=<windows_username>,password=<windows_password>,uid=<Ubuntu_username>,defaults 0 0
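To see how the placeholders fit together, here is a small sketch that assembles the entry from example values; every value below is illustrative, so substitute your own server, share, and credentials:

```shell
#!/bin/sh
# Assemble the fstab entry from placeholder values (all values are examples).
SERVER="192.168.1.10"              # IP address or hostname of the Windows machine
SHARE="Share_Name"                 # name of the Windows share
MOUNTPOINT="/home/ubuntu/Windows"  # mount point in your home directory
WINUSER="windows_username"
WINPASS="windows_password"
UBUNTUUSER="ubuntu"                # used for the uid= mount option
echo "//$SERVER/$SHARE $MOUNTPOINT cifs username=$WINUSER,password=$WINPASS,uid=$UBUNTUUSER,defaults 0 0"
```

The echoed line is exactly what goes at the end of /etc/fstab.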
After that, run the command below to create a Windows mount point in your home directory.
mkdir ~/Windows
Next, run the command below to install the smbfs package.
sudo apt-get install smbfs
Restart your computer (or run sudo mount -a to mount the new entry right away) and you should see the shares.
Right whales, unfortunately, were so named because they were considered the “right” whales to hunt. These slow-moving, docile whales tend to stick close to coastlines, and their high blubber content made them ideal targets for whale hunters. During the height of whale hunting in the early 20th century, tens of thousands of right whales were harvested. In 1937, with the number of right whales estimated to be in the low 300s, a moratorium on hunting was declared, although illegal hunting continued for decades. While the right whale population has increased since the abatement of hunting, it is estimated at less than 15% of pre-whaling numbers: roughly 55,000–70,000 whales before whaling versus about 7,500 today.
Because of the docile nature of right whales, mortality from boat collisions and entanglement in fishing lines continues to threaten their survival. Peter Fretwell of the Mapping and Geographic Information Centre, British Antarctic Survey, discovered that the shallow-water-preferring right whales can easily be viewed in satellite imagery, making them an ideal case study for tracking via remote sensing. In a paper co-written with Iain J. Staniland and Jaume Forcada of the Ecosystems Department at the British Antarctic Survey and published in the journal PLOS ONE, Fretwell proposed a method of using Very High Resolution (VHR) satellite imagery to identify and count southern right whales (Eubalaena australis) in their breeding grounds near Golfo Nuevo, Península Valdés in Argentina.
Research area in Golfo Nuevo used for the remote sensing study.
The paper, entitled “Whales from Space: Counting Southern Right Whales by Satellite,” used a September 2012 WorldView-2 satellite image covering a 70-square-mile area surrounding Golfo Nuevo. The image has a maximum resolution of 50 cm in the panchromatic band and 2 m in its eight colour spectral bands, and also contains a water-penetrating coastal band in the far-blue part of the spectrum that allowed the researchers to locate whales below the surface of the water.
To automate the detection of whales in the satellite imagery, the researchers used ENVI5 image processing software and ArcGIS; “automatic detection of whale-like features in the water column was tested using maximum likelihood supervised classification, unsupervised classification (isoData and k-means) and thresholding of specific bands.”
From the abstract:
Using an image covering 113 km2, we identified 55 probable whales and 23 other features that are possibly whales, with a further 13 objects that are only detected by the coastal band … This is the first successful study using satellite imagery to count whales; a pragmatic, transferable method using this rapidly advancing technology that has major implications for future surveys of cetacean populations.
A selection of 20 comparable false colour image chips (bands 1-8-5) of probable whales found by the automated analysis.
Several of the images could be interpreted as whale pairs, or as a mother and calf; others may be displaying behaviour such as tail slapping, rolling or blowing. On several images there is a strong return at one end of the feature, which is most likely the calluses on the whale’s head. Reprinted under a CC BY license with permission from British Antarctic Survey and DigitalGlobe.
The authors concluded: “We have shown that the use of current satellite imagery can be used to identify individual whales both at, and just below, the surface. The methods described here readily lend themselves to the calculation of population abundance estimates and suggest that behavioural patterns could also be elucidated. The automation of the methods means that counts can be carried out more quickly and efficiently than using traditional methods.” As opposed to the time-intensive and expensive approach of counting whales visually from boat expeditions or airplane flyovers, the authors propose that counting whales via remote sensing will allow more frequent whale counts, which should lead to more accurate population estimates. They also anticipate that future improvements in satellite image resolution will further increase the accuracy of identifying and counting whales.
Fretwell PT, Staniland IJ, Forcada J (2014) Whales from Space: Counting Southern Right Whales by Satellite. PLoS ONE 9(2): e88655. doi:10.1371/journal.pone.0088655
For my blog I do a lot of testing on my own ‘production’ Synology DS410. Because I host all my media and personal documents on this DS, it would be very annoying if anything went wrong. I was thinking of buying a low-priced DS (like the DS212j) for testing when I discovered a community project called XPEnology. XPEnology is a modified Synology DSM firmware that runs on virtual hardware (and some physical hardware too), of course without any support from Synology, but great for testing.
In this post I’ll guide you through the process of installing Synology DSM inside a virtual machine.
Installing the hypervisor
First we need some virtualisation software to run the virtual machine. I work a lot with VMware products, but because VMware doesn’t support virtualised SATA controllers I can’t use them here. A free alternative is Oracle VM VirtualBox, which does support virtualised SATA controllers and disks, and the XPEnology download contains a pre-configured virtual disk for VM VirtualBox. You can download Oracle VM VirtualBox at https://www.virtualbox.org/
You can download XPEnology at http://xpenology.com; the download link is available in one of the forum posts. In this guide I’m using the patched DS3612xs 4.2 Beta build 3161 .pat file. After downloading and unpacking the package you’ll find three files:
- The modified DSM firmware (.pat file) to run on your virtual Synology DS (based on the Synology DS3612xs)
- A boot image which emulates your hardware as a Synology DS
- A virtual disk for Oracle VM VirtualBox containing the above boot image
Creating the virtual machine
Now we’re ready to create the VM. Open VirtualBox and click at the ‘New’ button.
The wizard that will guide you through the configuration of the VM will open. Select ‘Linux’ as the type, ‘Linux 2.6 (64-bit)’ as the version and click ‘Next’.
Choose an amount of RAM for the VM (minimum 512 MB; I’m using 2048 MB, the same as the DS3612xs’s minimum) and click ‘Next’.
In the next step we need to create a new virtual disk; this disk will hold the Synology DSM operating system and provide the usable storage for your media.
Choose your favorite type of disk.
Choose ‘Dynamically allocated’ (thin provisioned) or ‘Fixed size’.
Select a location, name and size for the disk and click ‘Create’ to create the VM and virtual harddisk.
Configure the VM for DSM
Now that we’ve created the VM, it’s time to change some settings. Select the VM in VirtualBox and click ‘Settings’.
Select ‘Storage’ and click the button next to ‘Controller IDE’ to add the XPEnology boot disk.
Click ‘Choose existing disk’.
Browse to and open the SynoBoot virtual disk.
Finally we need to alter the networking configuration. Click on ‘Network’ and set the adapter to ‘Bridged Adapter’. Change the ‘MAC Address’ to 00113208D62A, this is necessary for DS Assistant to detect the VM as Synology hardware. After you’ve changed the settings click ‘OK’.
The configuration part of the VM is now completed.
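For reference, the same network settings can also be applied with VirtualBox’s command-line tool; the VM name “XPEnology” and the host interface name eth0 below are assumptions, so substitute your own:

```shell
# CLI equivalent of the network settings above (VM name and host NIC are assumptions).
VBoxManage modifyvm "XPEnology" --nic1 bridged --bridgeadapter1 eth0
VBoxManage modifyvm "XPEnology" --macaddress1 00113208D62A
```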
The VM is now ready for DSM installation. For the installation we use Synology DS Assistant (the same tool you would use to install a regular DiskStation). Download DS Assistant from the Synology Download Center (choose any model; DS Assistant is the same for all models).
Before we can detect and install DSM we have to power on the VM. Select the VM in VirtualBox and click ‘Start’.
The VM is ready when you see ‘Booting the kernel.’.
Now start the DS Assistant application; if you’ve done everything right, it should detect the VM as a Synology DS3612xs.
Now right-click the detected DiskStation and click ‘Install’.
Select the .pat file from the XPEnology archive (don’t try an official Synology .pat file, it won’t work!) and click ‘Next’.
Enter a password for the admin user and provide a name for your XPEnology.
Accept the warning by clicking ‘OK’.
Enter the network settings and click ‘Finish’.
Sit back while DSM is being installed.
And a few minutes later the installation is completed.
Log on to DSM
Open a web browser and browse to the IP address you configured in DS Assistant. Log on with the username ‘admin’ and the password you chose during the installation and hit ‘Enter’. Enjoy testing Synology DSM!
Please note: this is a fully working version of Synology DiskStation Manager, but it is not meant for production use. If you like DSM, consider buying one of their products: http://www.synology.com/products/index.php?lang=enu
Two of Esri’s principal GIS viewing products, ArcGIS Explorer Desktop and ArcGIS Online, offer a variety of tools for both business and personal use. However, there are some key differences between the two programs to examine before deciding which one is more suitable for your needs.
Esri’s website markets Explorer as a GIS viewer to “explore, visualize, and share GIS information.” The program is designed for one “authoritative” source to distribute data to a broad audience.
ArcGIS Online, on the other hand, is designed with “interactive mapping” in mind. Esri designed ArcGIS Online with ready-to-use content and applications to be used efficiently with the web, smartphones, and tablets.
Both products offer a free service. Explorer is free to both download and use. To register with ArcGIS Online, one of two accounts is required: a free public account for non-commercial use or a subscription account for either commercial or non-commercial use. Customers can choose among four subscription plans depending on the number of users they wish to include: five users, 50 users, 100 users, or more than 100 users (this requires a customized plan). While public accounts can manage content, add data, share maps, and store data in Esri’s cloud, a subscription account is necessary for storing a large amount of data.
Both Explorer and ArcGIS Online offer a number of basemaps, such as imagery, street, and political boundary maps.
The two programs make adding data simple and quick. Both Explorer and a subscription account to ArcGIS Online allow the user to display a large amount of data. Explorer supports conventional GIS data such as geodatabases, shapefiles, KML/KMZs, and more. ArcGIS Online, however, opens a multitude of data types, from PDFs to files for mobile applications.
Furthermore, both programs work with additional mapping services such as ArcGIS for Server, Open Geospatial Consortium, and GeoRSS feeds.
3D viewing is included in each program. While Explorer has an integrated 2D and 3D display, ArcGIS Online offers 3D viewing through the CityEngine Web Viewer. CityEngine was created with urban planning, architecture, and design in mind. The web viewer is a web app for viewing 3D scenes in a browser. Since the app is based on WebGL technology, the user is able to view 3D images without installing extra plug-ins.
Both applications also offer the option to include additional tools. In Explorer, these are known as ‘add-ins,’ and in ArcGIS Online, they are called applications. ArcGIS Online includes both ready-to-use apps, which are included with the user’s account, and downloadable apps from the ArcGIS Marketplace. The ArcGIS Marketplace is a “one-stop destination for apps and data services provided by authorized Esri partners, distributors, and Esri.”
ArcGIS Explorer Screenshot
Both Explorer and ArcGIS Online are localized. In Explorer, all interface elements are available in six different languages. ArcGIS Online, however, is available in ten languages.
While maps in Explorer can be emailed directly from the program, ArcGIS Online maps are easily sharable in a number of formats. It’s simple to embed online maps into blog posts, web pages, and social networking sites such as Facebook and Twitter.
ArcGIS Online screenshot.
It’s also possible to perform spatial analyses within both applications. Explorer offers the option to conduct simple analyses such as visibility, modeling, and proximity searches, while ArcGIS Online can calculate drive times, create buffers, and more.
Ultimately, the largest differences between the two programs are price, ease of sharing maps, and customization. Either way, both products offer suitable options for use with other Esri products.
In papers from recent years at heavyweight conferences such as ICCV, quite a few articles (ICCV 2009 in particular) relate to Boosting and random forests. Model-ensemble + decision-tree algorithms come in two basic forms: random forests and GBDT (Gradient Boost Decision Tree); most newer ensemble + tree algorithms are extensions of these two. This article focuses mainly on GBDT and only touches on random forests, which are comparatively simple.
Here I only intend to cover the basics, drawing largely on others’ articles. Two topics are important for both random forests and GBDT: information gain and decision trees. I particularly recommend Andrew Moore’s Decision Trees Tutorial and Information Gain Tutorial; Moore’s Data Mining Tutorial series is excellent. Make sure you understand those two topics before reading further.
- When building a random forest, an unbiased estimate of the generalization error is used
When building each decision tree, two points deserve attention: sampling and full splitting. First come two random sampling steps: a random forest samples the input data by both row and column. Row sampling is done with replacement, so the sampled set may contain duplicate samples; if there are N input samples, N samples are drawn. As a result, each tree is trained on a subset rather than all of the samples, which makes over-fitting less likely. Column sampling then selects m features out of the M available (m << M). A decision tree is then built on the sampled data by splitting fully, so that every leaf node either cannot be split further or contains samples that all belong to the same class. Many decision-tree algorithms include an important pruning step, but it is skipped here: the randomness provided by the two sampling steps means over-fitting does not occur even without pruning.
For the random forest procedure, see Mahout’s random forest page. It is explained fairly clearly there; the one part that may be unclear is information gain, for which see Moore’s pages recommended above.
Gradient Boost Decision Tree:
GBDT is a widely used algorithm that can do both classification and regression, and it performs well on many datasets. The algorithm goes by several other names, such as MART (Multiple Additive Regression Tree), GBRT (Gradient Boost Regression Tree), and TreeNet; they all refer to the same thing (see Wikipedia – Gradient Boosting). Its inventor is Friedman.
The difference between Gradient Boost and traditional Boost is that each iteration aims to reduce the previous iteration’s residual, and to do so it builds a new model in the gradient direction in which the residual decreases. So in Gradient Boost, each new model is built to push the residual of the existing models down along the gradient, which is very different from traditional Boost’s re-weighting of correctly and incorrectly classified samples.
In classification there is an important topic called Multi-Class Logistic, i.e. the logistic problem with more than two classes. It applies when the number of classes is greater than 2 and the result is not a single hard class assignment: we obtain the probability of sample x belonging to each of several classes (one can also say the estimate y of sample x follows a certain geometric distribution). This belongs to the material discussed under Generalized Linear Models, which I will not go into here; perhaps a dedicated section later. For now we use one conclusion: if a classification problem follows a geometric distribution, the logistic transform can be used for the subsequent computation.
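The logistic transform used here is the softmax. For K classes, the transform and the per-class gradient are:

```latex
p_k(x) = \frac{e^{F_k(x)}}{\sum_{j=1}^{K} e^{F_j(x)}},
\qquad
g_k = y_k - p_k(x).
```

For example, with F(x) = (0, 0.3, 0.6, 0, 0) the denominator is 1 + e^{0.3} + e^{0.6} + 1 + 1 ≈ 6.17, giving p(x) ≈ (0.162, 0.218, 0.295, 0.162, 0.162), i.e. (0.16, 0.21, 0.29, 0.16, 0.16) truncated to two decimal places.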
Suppose input x can belong to 5 classes (numbered 1–5), and in the training data x belongs to class 3, so y = (0, 0, 1, 0, 0). Suppose the model’s estimate is F(x) = (0, 0.3, 0.6, 0, 0); after the logistic transform, p(x) = (0.16, 0.21, 0.29, 0.16, 0.16), and y − p gives the gradient g = (−0.16, −0.21, 0.71, −0.16, −0.16). An interesting observation can be made here:
3. means: perform the following operation for each of the K classes (the for loop can also be read as a vector operation: every sample point xi corresponds to K possible classes yi, so yi, F(xi), and p(xi) are all K-dimensional vectors, which may make it easier to understand)
(0.5, 0.8, 0.1), (0.2, 0.6, 0.3), (0.4, 0.3, 0.3) – the final classification is the second class, because the largest number of decision trees chooses class 2.
Besides the sources already linked in the article, the main reference is Friedman’s paper: Greedy Function Approximation: A Gradient Boosting Machine.
Location provides a common context through which we can compare different domains of data in order to understand complex relationships. Government agencies gather this data as part of their mission and daily operations – but no one cares as much about their neighborhood as the people who live there. By making this data available to the public, we have a unique opportunity to leverage the local expertise to understand, analyze, and act on the issues that affect communities.
Growing over the past seven years, open data has become a goal with increasing support and validation. Most recently, the US Congress has passed laws directing the government to standardize and publish data for increased transparency. Internationally, there is also increasing government adoption.
Nuances of Open Data
There are common principles and patterns that have been developed by the community to clarify the important aspects of open data. Accessible, Complete, Authoritative, and Timely are a few of these common principles.
While there is a tremendous opportunity, there are still many challenges for government agencies to sustainably and effectively make their data accessible to the public. Historically, these sites have been custom-built and require burdensome data migration, and consequently have a limited lifetime, resulting in a lack of utilization and validation.
Preview of ArcGIS Open Data
Our community works across nearly all levels of government, edges of the globe and domains of knowledge. They currently leverage ArcGIS to manage, update and publish their authoritative data both internally as well as to the web. In a recent discussion with a local government, they pointed out that 70% of their open data catalog came directly from the GIS department. So why should they have to export and migrate their data in order to make it available to citizens?
Instead, we considered how to keep data at its source, within existing workflows and make it easy – indeed transparent – to make this data open and accessible. This is required if we are to accomplish the principles of open data where the data are authoritative and also up-to-date. By connecting directly to the source we ensure that the open data continues just to be the data, made open.
Today at the FedGIS conference we previewed Esri’s new open data initiative. We are adding capabilities to ArcGIS Online that enable any organization to quickly configure a custom view of their public items and open data groups. We built on the existing web services hosted online, on-premise, or through other data services to provide an easy-to-use interface designed for the open data community. Open data has been part of Esri for years; this initiative aims to make it easier and more focused on the community.
Based on our experience, both as citizens, and open data developers, we focused on three key capabilities:
- Discoverable: users need simple search and recommendations for finding and following relevant data
- Explorable: right away, people should be able to see a linked map and table view, with the ability to filter geographically or by any attribute
- Accessible: one-click download in common data formats and developer APIs
These capabilities are coming this spring and will be included for ArcGIS Online. You can start today by publishing your web services as public items and creating open data groups in preparation for launching your own open data sites.
Future of Open Data
These new capabilities aim to remove, or at least mitigate, the technology question of open data publishing. Data owners can leverage their existing infrastructure and web services without having to migrate and update data. This makes initiatives far more sustainable.
However, the policy and process questions remain difficult to resolve. Many agencies are still dealing with historic cost-recovery policies or concerns about releasing their data for public consumption. These are improving, and as more government datasets come online, the validation and examples will accelerate adoption.
We believe that through this initiative and working with you that open data will increase by an order of magnitude, dramatically improving citizen engagement and government collaboration.