Friday, October 17, 2014

A Non-technical Overview of the SageMathCloud Project

What problems is the SageMathCloud project trying to solve? What pain points does it address? Who are the competitors and what is the state of the technology right now?


What problems you’re trying to solve and why are these a problem?

  • Computational Education: How can I teach a course that involves mathematical computation and programming?
  • Computational Research: How can I carry out collaborative computational research projects?
  • Cloud computing: How can I get easy user-friendly collaborative access to a remote Linux server?

What are the pain points of the status quo and who feels the pain?

  • Student/Teacher pain:
    • Getting students to install software needed for a course on their computers is a major pain; sometimes it is just impossible, due to no major math software (not even Sage) supporting all recent versions of Windows/Linux/OS X/iOS/Android.
    • Getting university computer labs to install the software you need for a course is frustrating and expensive (time and money).
    • Even if computer labs worked, they are often being used by another course, stuffy, and students can't possibly do all their homework there, so computation gets short shrift. Lab keyboards, hardware, etc., all hard to get used to. Crappy monitors.
    • Painful confusing problems copying files around between teachers and students.
    • Helping a student or collaborator with their specific problem is very hard without physical access to their computer.
  • Researcher pain:
    • Making backups every few minutes of the complete state of everything when doing research often hard and distracting, but important for reproducibility.
    • Copying around documents, emailing or pushing/pulling them to revision control is frustrating and confusing.
    • Installing obscuring software is frustarting and distracting from the research they really want to do.
  • Everybody:
    • It is frustrating not having LIVE working access to your files wherever you are. (Dropbox/Github doesn't solve this, since files are static.)
    • It is difficult to leave computations running remotely.

Why is your technology poised to succeed?

  • When it works, SageMathCloud solves every pain point listed above.
  • The timing is right, due to massive improvements in web browsers during the last 3 years.
  • I am on full sabbatical this year, so at least success isn't totally impossible due to not working on the project.
  • I have been solving the above problems in less scalable ways for myself, colleagues and students since the 1990s.
  • SageMathCloud has many users that provide valuable feedback.
  • We have already solved difficult problems since I started this project in Summer 2012 (and launched first version in April 2013).

Who are your competitors?

There are no competitors with a similar range of functionality. However, there are many webapps that have some overlap in capabilities:
  • Mathematical overlap: Online Mathematica: "Bring Mathematica to life in the cloud"
  • Python overlap: Wakari: "Web-based Python Data Analysis"; also PythonAnywhere
  • Latex overlap: ShareLaTeX, WriteLaTeX
  • Web-based IDE's/terminals: target writing webapps (not research or math education): c9.io, nitrous.io, codio.com, terminal.com
  • Homework: WebAssign and WebWork
Right now, SageMathCloud gives away for free far more than any other similar site, due to very substantial temporary financial support from Google, the NSF and others.

What’s the total addressable market?

Though our primary focus is the college mathematics classroom, there is a larger market:
  • Students: all undergrad/high school students in the world taking a course involving programming or mathematics
  • Teachers: all teachers of such courses
  • Researchers: anybody working in areas that involve programming or data analysis
Moreover, the web-based platform for computing that we're building lends itself to many other collaborative applications.

What stage is your technology at?

  • The site is up and running and has 28,413 monthly active users
  • There are still many bugs
  • I have a precise todo list that would take me at least 2 months fulltime to finish.

Is your solution technically feasible at this point?

  • Yes. It is only a matter of time until the software is very polished.
  • Morever, we have compute resources to support significantly more users.
  • But without money (from paying customers or investment), if growth continues at the current rate then we will have to clamp down on free quotas for users.

What technical milestones remain?

  • Infrastructure for creating automatically-graded homework problems.
  • Fill in lots of details and polish.

Do you have external credibility with technical/business experts and customers?

  • Business experts: I don't even know any business experts.
  • Technical experts: I founded the Sage math software, which is 10 years old and relatively well known by mathematicians.
  • Customers: We have no customers; we haven't offered anything for sale.

Market research?

  • I know about math software and its users as a result of founding the Sage open source math software project, NSF-funded projects I've been involved in, etc.

Is the intellectual property around your technology protected?

  • The IP is software.
  • The website software is mostly new Javascript code we wrote that is copyright Univ. of Washington and mostly not open source; it depends on various open source libraries and components.
  • The Sage math software is entirely open source.

Who are the team members to move this technology forward?

I am the only person working on this project fulltime right now.
  • Everything: William Stein -- UW professor
  • Browser client code: Jon Lee, Andy Huchala, Nicholas Ruhland -- UW undergrads
  • Web design, analytics: Harald Schilly -- Austrian grad student
  • Hardware: Keith Clawson

Why are you the ideal team?

  • We are not the ideal team.
  • If I had money maybe I could build the ideal team, leveraging my experience and connections from the Sage project...

Thursday, October 16, 2014

Public Sharing in SageMathCloud, Finally

SageMathCloud (SMC) is a free (NSF, Google and UW supported) website that lets you collaboratively work with Sage worksheets, IPython notebooks, LaTeX documents and much, much more. All work is snapshotted every few minutes, and copied out to several data centers, so if something goes wrong with a project running on one machine (right before your lecture begins or homework assignment is due), it will pop up on another machine. We designed the backend architecture from the ground up to be very horizontally scalable and have no single points of failure.

This post is about an important new feature: You can now mark a folder or file so that all other users can view it, and very easily copy it to their own project.




This solves problems:
  • Problem: You create a "template" project, e.g., with pre-installed software, worksheets, IPython notebooks, etc., and want other users to easily be able to clone it as a new project. Solution: Mark the home directory of the project public, and share the link widely.

  • Problem: You create a syllabus for a course, an assignment, a worksheet full of 3d images, etc., that you want to share with a group of students. Solution: Make the syllabus or worksheet public, and share the link with your students via an email and on the course website. (Note: You can also use a course document to share files with all students privately.) For example...


  • Problem: You run into a problem using SMC and want help. Solution: Make the worksheet or code that isn't working public, and post a link in a forum asking for help.
  • Problem: You write a blog post explaining how to solve a problem and write related code in an SMC worksheet, which you want your readers to see. Solution: Make that code public and post a link in your blog post.
Here's a screencast.

Each SMC project has its own local "project server", which takes some time to start up, and serves files, coordinates Sage, terminal, and IPython sessions, etc. Public sharing completely avoids having anything to do with the project server -- it works fine even if the project server is not running -- it's always fast and there is no startup time if the project server isn't running. Moreover, public sharing reads the live files from your project, so you can update the files in a public shared directory, add new files, etc., and users will see these changes (when they refresh, since it's not automatic).
As an example, here is the cloud-examples github repo as a share. If you click on it (and have a SageMathCloud account), you'll see this:


What Next?

There is an enormous amount of natural additional functionality to build on top of public sharing.

For example, not all document types can be previewed in read-only mode right now; in particular, IPython notebooks, task lists, LaTeX documents, images, and PDF files must be copied from the public share to another project before people can view them. It is better to release a first usable version of public sharing before systematically going through and implementing the additional features needed to support all of the above. You can make complicated Sage worksheets with embedded images and 3d graphics, and those can be previewed before copying them to a project.
Right now, the only way to visit a public share is to paste the URL into a browser tab and load it. Soon the projects page will be re-organized so you can search for publicly shared paths, see all public shares that you have previously visited, who shared them, how many +1's they've received, comments, etc.

Also, I plan to eventually make it so public shares will be visible to people who have not logged in, and when viewing a publicly shared file or directory, there will be an option to start it running in a limited project, which will vanish from existence after a period of inactivity (say).

There are also dozens of details that are not yet implemented. For example, it would be nice to be able to directly download files (and directories!) to your computer from a public share. And it's also natural to share a folder or file with a specific list of people, rather than sharing it publicly. If somebody is viewing a public file and you change it, they should likely see the update automatically. Right now when viewing a share, you don't even know who shared it, and if you open a worksheet it can automatically execute Javascript, which is potentially unsafe.  Once public content is easily found, if somebody posts "evil" content publicly, there needs to be an easy way for users to report it.

Sharing will permeate everything

Sharing has been thought about a great deal during the last few years in the context of sites such as Github, Facebook, Google+ and Twitter. With SMC, we've developed a foundation for interactive collaborative computing in a browser, and will introduce sharing on top of that in a way that is motivated by your problems. For example, as with Github or Google+, when somebody makes a copy of your publicly shared folder, this copy should be listed (under "copies") and it could start out public by default. There is much to do.

One reason it took so long to release the first version of public sharing is that I kept imagining that sharing would happen at the level of complete projects, just like sharing in Github. However, when thinking through your problems, it makes way more sense in SMC to share individual directories and files. Technically, sharing at this level works works well for read-only access, not for read-write access, since projects are mapped to Linux accounts. Another reason I have been very hesitant to support sharing is that I've had enormous trouble over the years with spammers posting content that gets me in trouble (with my University -- it is illegal for UW to host advertisements). However, by not letting search engines index content, the motivation for spammers to post nasty content is greatly reduced.

Imagine publicly sharing recipes for automatically gradable homework problems, which use the full power of everything installed in SMC, get forked, improved, used, etc.

Wednesday, October 1, 2014

SageMathCloud Course Management