Author Archives: ceswp

The CESWP Blog has moved to a new home!

We’ve moved to ceswp.ca. Please update your bookmarks.

Divide and Conquer

Our project has three quite different goals:

  1. build a private cloud
  2. deploy the Canadian Space Science Data Portal in a cloud, and
  3. run simulations in a cloud.

In our initial plans, it seemed obvious and necessary that we must first build the cloud before deploying anything in it.  This is rather dangerous, as there is little for users to see until very late in the project, when it is difficult to react.  We very much wanted to show visible progress to our users as early as possible, and use their comments to guide our work.  Upon reflection, we realised that we could achieve this if we split our project into two independent tracks.

One project track will focus on the establishment of a multi-node, geographically distributed, private cloud.  This work will allow us to fully understand the costs of owning and operating a private cloud, and to deal with the myriad technical and organisational issues that are involved.  The private cloud will be fully compatible with the commercial cloud, so moving from the commercial cloud to the private cloud will be fairly simple.

The other project track will focus on deploying the data portal and running simulations in an Amazon commercial cloud.  This work will produce visible results much, much sooner than otherwise possible, giving our physics researchers an early opportunity to guide the direction of our work.  It will also allow us to assess the costs of using a commercial cloud, so that we can make an informed comparison to the private cloud.

In fact, we have already begun.  Our first result is here.  It is admittedly trivial, but it is a real simulation running on a ‘real’ virtual server, and we learned a lot during its construction, both technically and in terms of usability.

We have also been considering exactly how the simulation platform should integrate with the Data Portal.  Our original assumption was that the simulation platform would simply be a tightly integrated extension of the Data Portal.  Lately we’ve been wondering if some other arrangement would be better.  Some users are vitally interested in data, and not at all interested in the simulations.  Other users want to make use of the simulations, but don’t care about the observational data.  We will need to consider the various use cases more carefully to decide this issue, and to do this, we will have to listen to what the users (of both the Data Portal and the Simulation Platform) have to say.

Over the next few weeks, we will continue our early work with the simulation platform.  After that, our next goal will likely be the deployment of the Data Portal into an Amazon cloud.

A Survey of Space Researcher Models

Over the past two months we have compiled a survey of some of the space researchers’ models from the space science modeling group at the University of Alberta.  During that time we’ve learned a lot about space physics and the Earth’s magnetosphere in general.  We’ve also learned a lot about the models the researchers have been developing in order to better understand and make predictions about space weather.

While talking to the researchers we quizzed them on the nuts and bolts of the software that makes their models tick.  Some of the aspects we learned about are:

  • What the researchers are modeling
  • What operating system their simulations run on (e.g. Linux, Windows, Mac, etc.)
  • What programming language the model is written in
  • What applications/libraries they use in addition to the code they write
  • How many lines of code they have written for their model
  • How long it takes for their simulation to run
  • Where their simulations run (e.g. WestGrid, a desktop PC, etc.)
  • How many people outside of their department are interested in running their model
  • How long the model has been in development
  • Whether or not their code is under source control
  • What the input/output of their model is going to be
  • And, of course, many many notes on what the model does and how it does it

Now that we have a better understanding of the models, we’ve selected one regarding magneto-seismology to be the first model we are going to put in the cloud.  With that in mind we’ve laid out the CESWP Usage Map, which is a roadmap of how we envision a researcher will develop, stage and deploy a model and how an interested party may run it and use the output.  Our next step is to build a prototype web application that handles the stage, deploy and run processes.

Adding an Application to the Cloud

This week we took advantage of the need to set up some of our project software to get some solid, hands-on experience working with Amazon Elastic Compute Cloud (EC2) and Elastic Block Store (EBS).

With a suitable base Amazon Machine Image selected, we created a virtual instance, then installed and configured our software.

It was amazing to see how easy it was to create resources out of thin air, and how convincing the virtualization is.  Once in the virtual environment, it’s practically impossible to detect any difference from working with a physical machine.  These are still the early days of “computing-as-utility”, yet it’s marvelous to see how well-developed the tools and support are.

In fact, the most difficult part of the whole task had nothing to do with the virtual nature of our hardware, but is an unavoidable chore in any environment.  Installing software on any server –virtual or physical– often requires the skills and knowledge of a System Administrator.  Most organizations with physical hardware have already been forced to acquire such skills, but might be tempted to believe that by getting rid of their hardware, they no longer need the skilled personnel.  This is certainly not the case, and System Administrators can sleep well tonight, knowing their jobs are safe.

This will be an important point to remember as our project progresses: acquiring computing resources is easy; getting them to perform some useful purpose is not.

Agile Software Development

CESWP has decided to use an agile methodology to govern our software development.  Agile software development means different things to different people, but it can be summed up in the 12 principles behind the Agile Manifesto:

  1. Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
  2. Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
  3. Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  4. Business people and developers must work together daily throughout the project.
  5. Build projects around motivated individuals.  Give them the environment and support they need and trust them to get the job done.
  6. The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  7. Working software is the primary measure of progress.
  8. Agile processes promote sustainable development.  The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
  9. Continuous attention to technical excellence and good design enhances agility.
  10. Simplicity–the art of maximizing the amount of work not done–is essential.
  11. The best architectures, requirements, and designs emerge from self-organizing teams.
  12. At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

In particular, we are going to be using the Scrum style of agile software development.  In Scrum, development consists of a series of sprints, each typically two to four weeks long (with the length being decided by the team). During each sprint the team creates a potentially shippable product deliverable (for example, working and tested software). The set of features that go into a sprint comes from the product backlog, which is a prioritized set of high-level requirements for work to be done. Which backlog items go into the sprint is determined during the sprint planning meeting. During this meeting, the Product Owner informs the team of the items in the product backlog that he or she wants completed. The team then determines how many of these items they can commit to completing during the next sprint, and those items go into the sprint backlog. During a sprint, no one is allowed to change the sprint backlog, which means that the requirements are frozen for that sprint. After a sprint is completed, the team demonstrates the use of the deliverable.
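
The sprint planning step described above can be illustrated with a small Python sketch. It is only a simplification under stated assumptions: real planning is a conversation, not an algorithm, and the names used here (BacklogItem, plan_sprint, the "points" estimates and the capacity figure) are hypothetical and are not part of Scrum or of any tool we use. Items are taken in priority order until the team's capacity is used up, and the result is returned as a tuple to reflect that the sprint backlog is frozen once the sprint begins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    """One product backlog entry (hypothetical structure)."""
    title: str
    priority: int  # lower number = more important to the Product Owner
    estimate: int  # effort in hypothetical "points"


def plan_sprint(product_backlog, capacity):
    """Pick items in priority order that still fit the team's capacity.

    Returns a tuple so the sprint backlog cannot be changed afterwards.
    """
    sprint_backlog = []
    remaining = capacity
    for item in sorted(product_backlog, key=lambda i: i.priority):
        if item.estimate <= remaining:
            sprint_backlog.append(item)
            remaining -= item.estimate
    return tuple(sprint_backlog)


# Illustrative backlog; titles and numbers are made up for the example.
backlog = [
    BacklogItem("Run a model", priority=3, estimate=3),
    BacklogItem("Stage a model", priority=1, estimate=5),
    BacklogItem("Deploy a model", priority=2, estimate=8),
]
sprint = plan_sprint(backlog, capacity=10)
for item in sprint:
    print(item.title)
```

Note that this greedy version skips an item that does not fit and moves on to the next one; a team could just as reasonably stop at the first item that does not fit, or split it into smaller pieces.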


To this end we evaluated a number of web applications that would help us manage our sprints and the Scrum process.  The short list of applications we tried out was:

  1. Agilo
  2. ExtremePlanner
  3. Mingle
  4. VersionOne
  5. XPlanner+

We evaluated the web applications on a number of criteria, such as functionality, usability, technology platform and cost. Eventually we decided to use XPlanner+ as the tool to manage our sprints.  XPlanner+ is Java-based, free and open source, which will allow us to host it in the cloud on AWS at minimal cost and give us some more experience “cloudifying” applications.  It is a lightweight tool that should provide just enough functionality and usability for us to get the job done.

Selenium

After a few weeks with our heads in the clouds (and above) we came back down to Earth and put our hands in the dirt.  We decided to take on the task of learning Selenium, an automated web application testing framework.  There were two main benefits to learning this framework.  First, when it comes time for CESWP to go into testing we’ll already be familiar with Selenium so we can write our tests quickly and effectively.  Second, we began creating tests for our sister project CSSDP.  This gives the CSSDP team a head start on their testing and us some real world experience with the framework.

So far our work with Selenium has been largely positive.  After installing the Selenium IDE you get an integrated development environment (IDE) in the Firefox web browser.  You use this IDE to record the actions you take in the browser, such as clicking on a link or typing text into a field.  With the IDE you also verify that the web pages the browser visits are correct (e.g. this page has the phrase “Space Science Data Portal”, that page has a link to Data Availability, etc.).  Once your actions and verifications are recorded as a test you can play them back automatically as often as necessary.
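
To make the idea of a "verification" concrete, here is a minimal Python sketch of the two kinds of check mentioned above: that a page contains a given phrase, and that it has a link with a given text. This is not Selenium and does not use its API; it only uses Python's standard html.parser on a made-up sample page, and the names PageScanner, verify_page and SAMPLE_PAGE are hypothetical. Selenium performs checks of this kind against a live browser session rather than a string.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects all text on a page and, separately, the text of each link."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.link_texts = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._in_link:
            self.link_texts[-1] += data


def verify_page(html, expected_phrase, expected_link):
    """Report whether the phrase and the link text both appear on the page."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    # Collapse runs of whitespace so line breaks in the markup do not matter.
    page_text = " ".join("".join(scanner.text_parts).split())
    links = [" ".join(text.split()) for text in scanner.link_texts]
    return {
        "phrase_present": expected_phrase in page_text,
        "link_present": expected_link in links,
    }


# A made-up page standing in for what the browser would display.
SAMPLE_PAGE = """
<html><body>
  <h1>Space Science Data Portal</h1>
  <a href="/availability">Data Availability</a>
</body></html>
"""

result = verify_page(SAMPLE_PAGE, "Space Science Data Portal", "Data Availability")
print(result)
```

A recorded test is essentially a list of browser actions interleaved with checks like these, replayed in order.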

The ultimate goal of writing a number of tests like this is to be able to easily and quickly tell if changes made to one area of a web application have broken something in another area of the web application.  After a software developer makes a change to the application they will automatically run all of the Selenium tests.  If any of the tests fail, the developer must go back and fix what was broken.  If all of the tests are successful, the developer’s job is done!

Learning about CSSDP

CSSDP is the Canadian Space Science Data Portal. It is our sister project–our older sister, so to speak: it was part of the first round of Network Enabled Platform (NEP-1) projects funded by CANARIE (our project, CESWP, is part of the second round, or NEP-2), and it is still in progress. While we are not strictly tied to the CSSDP project, there are a number of connection points, and it is very important for us to understand what the CSSDP team has done (and will do). Since the project began we have been keeping in touch with the CSSDP team and looking for opportunities for both teams to benefit from each other’s work.

As a bit of background, the main purpose of the CSSDP project is to make a range of space weather data available through a single portal. In turn, CESWP will take the CSSDP portal and move it into the cloud. As part of the process of understanding the nuts and bolts of how CSSDP is built and deployed, Barton and Everett have set up a copy of the CSSDP development environment on their machines–well, actually on their virtual machines: they are setting the environment up on both the native OS and on Ubuntu Linux VMs. They are also setting up a framework for CSSDP System Testing, with a few example tests to get things started. They will then be able to reuse this framework for CESWP, and will gain a deeper insight into the details of CSSDP in the process.

Meanwhile, Patrick Mann, Cybera’s CTO, has graciously offered to let us use a Cybera server for a period of time as a sandbox for experimenting with virtualisation and setting up a private Eucalyptus cloud. This will help us immensely as we work out some of the details of which technologies are best suited for CESWP and how best to deploy them.

And of course we continue to learn more about the models and simulations that we hope to make available through CESWP. As we are learning, we are building up a taxonomy of the models vis-à-vis CESWP. That means asking lots of questions: How long do they run? How much parallelization is already there? How much parallelization is possible? How many cores (processors) do they need? How many people in the world are interested in running them? Can they be specified and run without expert intervention? And so on. We hope to use the classification to help us understand the spectrum of different types of models and simulations that CESWP must support, from short running and fully automated to long running with expert assistance required. Many of the models are described on the FDAM (Facility for Data Analysis and Modeling) site.
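
As a rough illustration of what such a classification might look like, the Python sketch below records a few of the answers to those questions and places each model on the spectrum from short running and fully automated to long running with expert assistance. Everything in it is an assumption made for the example: the ModelProfile fields, the classify function, the 24-hour threshold, and the two sample models are hypothetical and do not describe any real model from the survey.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """Answers to a few of the survey questions for one model (hypothetical)."""
    name: str
    runtime_hours: float  # how long a typical run takes
    cores_needed: int     # how many cores (processors) it needs
    needs_expert: bool    # can it be run without expert intervention?


def classify(profile, long_run_hours=24.0):
    """Place a model on the short/long and automated/assisted spectrum.

    The 24-hour cut-off is an arbitrary assumption for illustration.
    """
    if profile.runtime_hours >= long_run_hours:
        duration = "long running"
    else:
        duration = "short running"
    operation = "expert assisted" if profile.needs_expert else "fully automated"
    return f"{duration}, {operation}"


# Made-up examples, not real survey results.
quick_model = ModelProfile("example-quick", runtime_hours=0.5, cores_needed=1, needs_expert=False)
heavy_model = ModelProfile("example-heavy", runtime_hours=72.0, cores_needed=64, needs_expert=True)

for model in (quick_model, heavy_model):
    print(model.name, "->", classify(model))
```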

Getting Buffeted by Space Weather

We’ve been busy working on a number of fronts these last few weeks, but today we’ll tell you about our efforts to figure out just what the heck Space Physicists do. Don’t worry, we have no pretensions of becoming experts, let alone space physicists! But we do think it’s really important to understand CESWP’s eventual users. The better we understand what they do, the more likely it is we’ll build something that they actually need and want to use.

So, who have we met so far, and what have we learned? Let’s start with the PI, Dr. Robert Rankin. Dr. Rankin has been very generous with his time and has provided us with books and articles to get us started on geomagnetism, the magnetosphere, and solar weather. Early on he even took the time to explain, very simply and lucidly, the basics of how magnetic substorms work. Today we heard Dr. Ian Mann, a member of the Virtual Organization, give a lecture on Ultra Low Frequency (ULF) waves in the magnetosphere. In September we met Dr. Moritz Heimpel, another member of the Virtual Organization, who talked about modeling planetary dynamos. Dr. Heimpel kindly invited us to attend the ISSET Planetary Science Research Symposium, where we heard a number of interesting lectures, including a talk by Dr. Paul Abell (NASA) about the “Scientific Exploration of Near-Earth Objects” (Drs. Rankin, Mann and Heimpel also gave presentations at the symposium).

On September 19, Barton and John attended a “Star Party” at Black Nugget Lake, Alberta, where Dr. Heimpel and Dr. Clare Watt both gave talks. Clare’s talk was on “The Plasma Universe,” where we learned what plasma is, and that it comprises more than 99% of visible matter in the universe. [John and his son also won a pair of binoculars and a replica of Galileo's original telescope, first built 400 years ago. Barton was shut out of the Star Party prizes, poor fellow.] Today we had another meeting with Clare, and she reviewed some recent research on the Aurora Borealis with us. She also told us about her research into how shear Alfvén waves may drive the Aurora by imparting energy to electrons in the magnetosphere. Clare explained how she built a model of the phenomenon, and we gained a better understanding of what is involved in developing and running such a model.

We’ve also been attending weekly Space Physics seminars, meeting other scientists, and reviewing and classifying the types of simulations that may be candidates for parallelization (in one way or another) and for moving into the cloud. Next week we’ll meet with Drs. Dmytro Sydorenko, Alex Degeling, Johnny Rae, and Konstantin Kabin, all of the University of Alberta Space Physics group, to learn more about their research and models.

One last thing: we’ve gathered all of the materials necessary to make a soda bottle magnetometer. Once assembled, we’ll keep it in our project office and duly track any evidence of passing magnetic storms. Oh, and we have about two dozen rare earth magnets stuck to various objects throughout our office, because they’re fun.

CESWP in Banff

We arrived in Banff on Tuesday afternoon after a snowy trip from Edmonton. In the shadow of the mountains here at the Banff Centre we heard Rich Wolski, founder of Eucalyptus, talk about cloud computing, how Eucalyptus was conceived, and where it’s going. He had some interesting observations, including his take on why cloud computing is being adopted at a pace that dwarfs anything that grid computing has seen so far (one reason: abandoning federation as a top level concern). Who knows if he’s right? Regardless, it’s certainly true that the cloud concept, as exemplified by Amazon Web Services (AWS), has caught people’s attention. We also attended a grid computing session where beleaguered grid operations people from both sides of the Atlantic stoically defended their offerings against an onslaught of criticism from (at times) hostile users. The one thing that kept jumping out for us was this comment from the users: “It’s too hard to use.” Words for us to heed. We take those comments seriously, and will try not to let them stray far from our consciousness as CESWP proceeds. To succeed, CESWP must be very easy to use.

Another big event for us today: we met Chen Zhang, the fourth member of our team. Chen is working on his PhD at the University of Waterloo. His supervisor is Hans De Sterck, one of the members of the CESWP Virtual Organization. Chen’s area of research is Cloud Computing, and he is an expert on Hadoop and workflow. We had a great discussion with Chen and look forward to working with him and drawing on his expertise as the project proceeds. Everett even had a chance to practice his CESWP elevator pitch on Chen (see the “Welcome” page for the full version).

Over the next few days we’ll immerse ourselves in all things ‘cloud’ here at the Cybera/CANARIE National Summit, as well as listening in on CANARIE’s Network Enabled Platform (NEP) Round Table Session and attending the NEP Lounge, where we’ll have a chance to learn something about all the other NEP projects–including CSSDP, our NEP-1 sister project–and hopefully share and learn from their experiences.

And in those few quiet moments between sessions, we’ll hope for a glimmer of sunshine as we gaze up in awe at the Rocky Mountains and think about clouds.

And they’re off…

Barton and Everett started today, so the CESWP project is officially underway. If you’re interested in keeping track of what’s happening on the project, this is the place to be.

We have a few communication channels that you can follow, depending on your style. The same information will be available across the channels, so don’t worry if you don’t use Twitter or Facebook!

1. CESWP Project Blog (you’re looking at it now): This is the primary tool for communication between the project team and project stakeholders. We will aim to post a short weekly news item each Wednesday. Our target is to keep it short, relevant, and newsy.

2. CESWP News: This is a Twitter account that will announce project news headlines and point people back to the project blog or other content sources. We’ll try to keep the update frequency low but regular (about 1-2 per week), and keep the content pertinent and informative. It will largely amount to headlines taken from the blog posts (and linked back to the blog).

3. CESWP Delicious bookmarks: Bookmarks to sites relevant to the project.

All channels support RSS feeds.