
Getting started with SOLR from .NET

SOLR can be run locally on your dev box, and since it’s Java-based it can also run on a standard Windows server. But to me, a large part of the appeal of NoSQL and other SQL Server alternatives is the ability to run many instances on an open source stack. If we can offload some of the load on the main SQL Server by spinning up a few cheap instances running LAMP/Lucene/SOLR with no licensing costs… that’s a much easier sell to the guys in charge of the budget.

So for this tutorial I’ll show how to set up SOLR on an EC2 instance. Feel free to use a local copy of Ubuntu Server, or even fire up a VirtualBox image of Ubuntu Server, if you don’t have an Amazon AWS account.

Before you fire up your instance on AWS, create a security group and allow HTTP, HTTPS, SSH, and port 8983 from your IP. 8983 is the default listening port used by the SOLR server; the other protocols should be self-explanatory.

There are a number of community AMIs available with SOLR pre-installed. To create one of those instances, just search for “SOLR” in the community AMIs tab of the “classic” EC2 launch window. I chose the ami-6deb4004 image, which is the BitNami SOLR 3.6 distro as of today. It can also be launched from the AWS Marketplace, if you’re more familiar with that.

So, launch your EC2 instance with the security group you created, and log in via SSH using your private key and the login name “bitnami.” Note: if you are using the BitNami SOLR image, don’t log in as “ubuntu” or “ec2-user” – use “bitnami”. Once this is done you should be at the standard shell prompt:

Likewise, just by opening a browser window and pointing it to your EC2 instance address, you should get a welcome page:

By clicking on “access my application” and then “Solr admin,” you’re greeted with the Solr admin panel, all set up and ready!

SOLR comes with some sample data, so let’s start with that. From your SSH prompt, change to the example docs directory and use the supplied “SimplePostTool” to post a bunch of XML files to SOLR and populate some data:

cd /opt/bitnami/apache-solr/exampledocs
java -jar post.jar *.xml

You now have data in your SOLR instance and, using the admin page, can run some simple queries. One of the example docs is a product catalog, so search for something like “LCD” or “iPod” to see some data come back in XML format.

Now that we have SOLR up and running, we need to connect from our .NET app. Other libraries are available, but I used SolrNet from http://code.google.com/p/solrnet/. After downloading the code, add a reference to both SolrNet.dll and Microsoft.Practices.ServiceLocation.dll in your application.

The next steps follow closely from the examples at http://code.google.com/p/solrnet/wiki/BasicUsage2. Create a POCO class to model the data that you already have in your SOLR instance:

public class Product {
    [SolrUniqueKey("id")]
    public string Id { get; set; }

    [SolrField("manu_exact")]
    public string Manufacturer { get; set; }

    [SolrField("cat")]
    public ICollection<string> Categories { get; set; }

    [SolrField("price")]
    public decimal Price { get; set; }

    [SolrField("inStock")]
    public bool InStock { get; set; }
}

The examples on the SolrNet page are quite helpful, but here’s the bare minimum you’ll need to connect to SOLR, run a query, and return results:

Startup.Init<Product>("http://ec2-xx-xx-xxx-xx.compute-1.amazonaws.com/solr");
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
var results = solr.Query(new SolrQuery("iPod"));
foreach (var r in results)
{
    // Print one of the properties mapped on the Product class above
    Console.WriteLine(r.Id);
}

Note: if you neglect to add a reference to Microsoft.Practices.ServiceLocation, the ServiceLocator call won’t resolve. Both libraries are needed for the example to work.

Explore the different options for Query and SolrQuery – as far as I can tell, SolrNet supports all of the functionality of the SOLR engine: range queries, geospatial queries, and so on.
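
For instance, here’s roughly what a range query and a field query against the sample catalog might look like – a sketch only, with made-up price bounds:

// Products priced between 100 and 500 (bounds are hypothetical)
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
var byPrice = solr.Query(new SolrQueryByRange<decimal>("price", 100m, 500m));

// Products matching a single field value
var inStock = solr.Query(new SolrQueryByField("inStock", "true"));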

 

Optimize AWS SimpleDB Deletes with BatchDeleteAttributes

I’ve found that the pricing model for SimpleDB can be somewhat complex. EC2 is easy: the longer you leave your machine up, the more it costs. With SimpleDB, however, there is no single machine for your databases. Each request to SimpleDB takes up a certain number of CPU cycles, and at the end of the month those cycles are added up, translated into a number of machine hours used, and then translated into a bill.

Amazon SimpleDB measures the machine utilization of each request and charges based on the amount of machine capacity used to complete the particular request (SELECT, GET, PUT, etc.), normalized to the hourly capacity of a circa 2007 1.7 GHz Xeon processor.

It’s easy to check up on the machine utilization for your SimpleDB account. Log in to AWS, go to Account, then Account Activity, and download an XML or CSV file of the current month’s usage. This report lists all requests, along with the usage for each one:

<OperationUsage>
	<ServiceName>AmazonSimpleDB</ServiceName>
	<OperationName>DeleteAttributes</OperationName>
	<UsageType>Requests</UsageType>
	<StartTime>07/14/11 18:00:00</StartTime>
	<EndTime>07/14/11 19:00:00</EndTime>
	<UsageValue>158</UsageValue>
</OperationUsage>

On a recent project we saw a spike in SimpleDB costs after about a month of usage. The app was using SimpleDB to store some logging and transaction information, and after a month it was deemed safe to delete it. However, each of these records was being deleted with a single request – which adds up if you’re deleting hundreds at a time. BatchDeleteAttributes lets you delete up to 25 items per request – not perfect, but at least it’s better than one at a time. The AWS C# library supports this request:


// Delete up to 25 items in a single BatchDeleteAttributes request
var client = AWSClientFactory.CreateAmazonSimpleDBClient(ID, KEY);
BatchDeleteAttributesRequest deleteRequest = new BatchDeleteAttributesRequest()
    .WithDomainName(domainName);
deleteRequest.Item = new List<DeleteableItem>();
foreach (var r in recordIDs)
{
    deleteRequest.Item.Add(new DeleteableItem() { ItemName = r });
}
client.BatchDeleteAttributes(deleteRequest);
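
One caveat with the snippet above: it stuffs every ID into a single request, which only works if recordIDs holds 25 items or fewer. Here’s a rough sketch of batching an arbitrary list – it assumes recordIDs is a list of strings and that System.Linq is available for Skip/Take:

// Delete an arbitrary number of record IDs, 25 per request
for (int i = 0; i < recordIDs.Count; i += 25)
{
    var batch = recordIDs.Skip(i).Take(25);
    var request = new BatchDeleteAttributesRequest().WithDomainName(domainName);
    request.Item = new List<DeleteableItem>();
    foreach (var id in batch)
    {
        request.Item.Add(new DeleteableItem() { ItemName = id });
    }
    client.BatchDeleteAttributes(request);
}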

SimpleDB also has a BatchPutAttributes request, which lets you group your inserts in the same way.
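
For completeness, a batched put looks roughly like this – again just a sketch; the logEntries collection and its fields are made up, and the With* method names are my assumption based on the SDK’s fluent style:

// Group up to 25 puts into a single BatchPutAttributes request (logEntries is hypothetical)
var putRequest = new BatchPutAttributesRequest().WithDomainName(domainName);
putRequest.Item = new List<ReplaceableItem>();
foreach (var entry in logEntries)
{
    putRequest.Item.Add(new ReplaceableItem()
        .WithItemName(entry.Id)
        .WithAttribute(new ReplaceableAttribute()
            .WithName("message")
            .WithValue(entry.Message)
            .WithReplace(true)));
}
client.BatchPutAttributes(putRequest);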

Free Programs for new Businesses to get into Cloud Computing

The fierce competition in the cloud marketplace today has resulted in some great deals for small businesses. Both Amazon and Microsoft currently have programs that offer a free tier of all their major cloud offerings for new accounts or new businesses. Amazon’s is very simple – a free tier of service is offered for the first 12 months once you sign up. There is no application process and it’s open to individuals. To sign up, just go to http://aws.amazon.com/free/. The restrictions are as follows:

AWS Free Usage Tier (Per Month):

  • 750 hours of Amazon EC2 Linux Micro Instance usage (613 MB of memory and 32-bit and 64-bit platform support) – enough hours to run continuously each month
  • 750 hours of an Elastic Load Balancer plus 15 GB data processing
  • 10 GB of Amazon Elastic Block Storage, plus 1 million I/Os, 1 GB of snapshot storage, 10,000 snapshot Get Requests and 1,000 snapshot Put Requests
  • 5 GB of Amazon S3 storage, 20,000 Get Requests, and 2,000 Put Requests
  • 30 GB of internet data transfer (15 GB of data transfer “in” and 15 GB of data transfer “out” across all services except Amazon CloudFront)
  • 25 Amazon SimpleDB Machine Hours and 1 GB of Storage
  • 100,000 Requests of Amazon Simple Queue Service
  • 100,000 Requests, 100,000 HTTP notifications and 1,000 email notifications for Amazon Simple Notification Service

This should be more than enough for a basic website with typical database needs.

Google’s App Engine continues to have a free usage tier. As of this writing it is 500 MB of storage and up to 5 million page views a month, but Google is making changes with the introduction of “Apps for Business,” so it’s best to check directly for updates.

Microsoft has several programs for trying out its services, but if you’re a small, starting business you MUST try to join BizSpark. In addition to the networking and visibility benefits, you get a full MSDN subscription and the following impressive package of Windows Azure services:
  • Windows Azure Small compute instance 750 hours / month
  • Windows Azure Storage 10 GB
  • Windows Azure Transactions 1,000,000 / month
  • AppFabric Service Bus Connections 5 / month
  • AppFabric Access Control Transactions 1,000,000 / month
  • SQL Azure Web Edition databases (1GB) 3
  • SQL Azure Data Transfers 7 GB in / month, 14 GB out / month

A Quick Tour of Amazon’s Mobile App Developer Program

OK, my mobile app isn’t quite ready yet, but this post from the people at AWS caught my attention. One of the main difficulties in developing Android applications is that there’s not one app store (not even one draconian one), but several different app stores available. Amazon hopes to fill that void by developing its own app store for any Android device, and while only time will tell if it is successful, given Amazon’s track record of quality and market reach, any mobile developer needs a foothold here. If you sign up now, it’s free for the first year:

If you are using the SDK to build an Android application, I would like to encourage you to join our new Appstore Developer Program and to submit your application for review. Once your application has been approved and listed, you’ll be able to sell it on Amazon.com before too long (according to the Appstore FAQ, we expect to launch later this year). If you join the program now we’ll waive the $99 annual fee for your first year in the program.

You can list both free and paid applications, and you’ll be paid 70% of the sale price or 20% of the list price, whichever is greater. You will be paid each month as long as you have a balance due of at least $10 for US developers and $100 for international developers. The Amazon Developer Portal will provide you with a number of sales and earnings reports.

The store will provide rich merchandising capabilities. Each product page will be able to display multiple images and videos along with a detailed product description.

Joining the program is simple. If you already have an Amazon.com customer or affiliate account (and who doesn’t?), you can simply use that account:

After this, it’s about 4-5 confirmations until you’re signed up. Is this your name? Agree to the terms of service? Agree to pay us the $99 after your first year? If you charge for apps, what’s your bank account info?

By the way… only a $10 minimum payout is very cool…

After that you’re in!

Of course, the rest of the site is incomplete. They do have samples of the app submission page, reports, and account pages that are interesting. It looks like you’ll have considerable control over your application’s launch cycle – including pre-orders and limited release windows. The reports look basic but adequate for most developers. I do hope they open up an API that lets you get more information on the who/what/where of downloads… but it’s a welcome and much-needed addition to the Android marketplace.

Reducing your Amazon S3 costs… with a catch

Amazon just recently announced a “Reduced Redundancy Storage” option for S3 objects. In short, you can slash the costs of S3 storage by 33% by accepting a slightly greater chance of losing your data. So ask yourself…

Do I feel lucky? Well, do ya, punk?

In truth, the chances of any data loss in Amazon S3 are minuscule, under both the traditional model and under RRS. If you use S3, I highly recommend starting with Werner Vogels’ article on RRS and durability.

The same goes for durability; core to the design of S3 is that we go to great lengths to never, ever lose a single bit. We use several techniques to ensure the durability of the data our customers trust us with, and some of those (e.g. replication across multiple devices and facilities) overlap with those we use for providing high-availability. One of the things that S3 is really good at is deciding what action to take when failure happens, how to re-replicate and re-distribute such that we can continue to provide the availability and durability the customers of the service have come to expect. These techniques allow us to design our service for 99.999999999% durability.

Under RRS, instead of 99.999999999% durability, your object is stored in such a way that it will only survive the loss of data in a single facility, giving 99.99% durability:

We can now offer these customers the option to use Amazon S3 Reduced Redundancy Storage (RRS), which provides 99.99% durability at significantly lower cost. This durability is still much better than that of a typical storage system as we still use some forms of replication and other techniques to maintain a level of redundancy. Amazon S3 is designed to sustain the concurrent loss of data in two facilities, while the RRS storage option is designed to sustain the loss of data in a single facility. Because RRS is redundant across facilities, it is highly available and backed by the Amazon S3 Service Level Agreement.

Yes, it’s still covered by the SLA! Finally, to summarize the real risk in terms your manager can understand, take this from the RRS announcement on the AWS blog:

The new REDUCED_REDUNDANCY storage class activates a new feature known as Reduced Redundancy Storage, or RRS. Objects stored using RRS have a durability of 99.99%, or four 9’s. If you store 10,000 objects with us, on average we may lose one of them every year. RRS is designed to sustain the loss of data in a single facility.

I suspect that for most business applications 99.99% durability is “good enough” and a 33% cost savings is a great trade-off.

Finally, for my fellow .NET developers… Amazon did update their .NET SDK with this announcement. Be sure to download the latest version.
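
Here’s roughly what opting in looks like from the .NET SDK – a sketch only; the bucket, key, and file path are made up, and the With* calls reflect my reading of the SDK’s fluent style:

// Upload an object using the REDUCED_REDUNDANCY storage class
var s3 = AWSClientFactory.CreateAmazonS3Client(ID, KEY);
var putRequest = new PutObjectRequest()
    .WithBucketName("my-log-bucket")            // hypothetical bucket
    .WithKey("logs/2011-07-14.log")             // hypothetical key
    .WithFilePath(@"C:\logs\2011-07-14.log")    // hypothetical local file
    .WithStorageClass(S3StorageClass.ReducedRedundancy);
s3.PutObject(putRequest);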

AWS Announces Spot Instances: Market-Priced Cloud Computing

AWS recently announced a new service: Spot Instances.

Today we launched a new option for acquiring Amazon EC2 Compute resources: Spot Instances. Using this option, customers bid any price they like on unused Amazon EC2 capacity and run those instances for as long as their bid exceeds the current “Spot Price.” Spot Instances are ideal for tasks that can be flexible as to when they start and stop. This gives our customers an exciting new approach to IT cost management.

The central concept in this new option is that of the Spot Price, which we determine based on current supply and demand and will fluctuate periodically. If the maximum price a customer has bid exceeds the current Spot Price then their instances will be run, priced at the current Spot Price. If the Spot Price rises above the customer’s bid, their instances will be terminated and restarted (if the customer wants it restarted at all) when the Spot Price falls below the customer’s bid. This gives customers exact control over the maximum cost they are incurring for their workloads, and often will provide them with substantial savings. It is important to note that customers will pay only the existing Spot Price; the maximum price just specifies how much a customer is willing to pay for capacity as the Spot Price changes.

Interestingly, this isn’t so much a technological innovation as a major business innovation. The instances on offer are the same instances offered in the tried-and-true EC2 system. However, Amazon can now offer these instances at a (presumably) lower price, with the caveat that you may lose your instance if the market price for that compute power goes above what you are willing to pay for it.

What strikes me about this is the amazing efficiency of the system. Amazon could (in theory) rent out 100% of their available computing power through the EC2/spot instance system. If Amazon needs the compute power back, such as during the Christmas shopping season, they can raise the spot price and reclaim many of the resources. If a third party needs more compute power than is available, they increase their bid and drive up the price.

It should be interesting to see applications built around this model. Protein folding is the obvious example, but I can also see this being very useful for graphics rendering, or even mundane tasks such as sending out newsletters.
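
If you want to experiment from .NET, placing a bid looks roughly like this with the AWS SDK – just a sketch; the AMI, instance type, and bid price are made up, and the class and method names reflect my reading of the EC2 API:

// Bid $0.05/hour for a small instance on the spot market (all values hypothetical)
var ec2 = AWSClientFactory.CreateAmazonEC2Client(ID, KEY);
var spotRequest = new RequestSpotInstancesRequest()
    .WithSpotPrice("0.05")
    .WithLaunchSpecification(new LaunchSpecification()
        .WithImageId("ami-xxxxxxxx")
        .WithInstanceType("m1.small"));
var response = ec2.RequestSpotInstances(spotRequest);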