John Licato's Summer Work Log: June 2011

Sunday, June 26, 2011

Update 6/26 - Border questions

In my previous post I said that we would limit this first version to the Kumasi area. The problem is that I can't find definite borders anywhere (different sites give me inconsistent GPS coordinates for where the exact borders are), and there seems to be a recent movement (within the last few months) to slightly expand Kumasi into a "Greater Kumasi" metropolitan area (1) (2).

Because of this I've decided to limit our search area to the following bounding box:
latitude = 6.583333 to 6.766667 (degrees N)
longitude = 1.516667 to 1.7 (degrees W)
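
A minimal sketch of how a point could be tested against this box in Python. It assumes coordinates are stored as signed decimal degrees, so the west longitudes above become negative; the constant and function names are made up for illustration.

```python
# Search-area limits from the post, as signed decimal degrees.
# Assumption: west longitudes are stored as negative numbers.
LAT_MIN, LAT_MAX = 6.583333, 6.766667
LON_MIN, LON_MAX = -1.7, -1.516667


def in_search_area(lat, lon):
    """Return True if (lat, lon) lies inside the bounding box, edges included."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


if __name__ == "__main__":
    print(in_search_area(6.69, -1.62))  # a point inside the box
    print(in_search_area(6.69, 1.62))   # same numbers, wrong sign on longitude
```

Forgetting the sign on the longitude is the easy mistake here: 1.62 degrees W is -1.62, and a positive value falls outside the box.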

Using data from Google Earth, I've gathered and manually entered data for 137 schools, colleges, universities, churches, hotels, lodges, and major locations like the airport and primary post office. The database's structure allows multiple names to refer to the same location, so for example "X Primary School" may also be referred to as "X School", "X Primary", and whatever other unambiguous aliases I could think of.
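
One way such a many-names-to-one-location structure could look, as a rough sketch. The table and column names are hypothetical, and it uses Python's built-in sqlite3 module so the example runs on its own; the real database may be laid out quite differently.

```python
import sqlite3


def normalize(name):
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(name.lower().split())


# Hypothetical schema: one row per location, many alias rows pointing at it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (
        id INTEGER PRIMARY KEY,
        canonical_name TEXT NOT NULL,
        latitude REAL,
        longitude REAL
    );
    CREATE TABLE aliases (
        alias TEXT PRIMARY KEY,  -- normalized; the key forces aliases to be unambiguous
        location_id INTEGER NOT NULL REFERENCES locations(id)
    );
""")


def add_location(canonical_name, aliases, latitude=None, longitude=None):
    """Insert a location plus every name that should resolve to it."""
    cur = conn.execute(
        "INSERT INTO locations (canonical_name, latitude, longitude) VALUES (?, ?, ?)",
        (canonical_name, latitude, longitude),
    )
    location_id = cur.lastrowid
    # The canonical name is stored as an alias too, so one lookup covers everything.
    for alias in {normalize(n) for n in [canonical_name, *aliases]}:
        conn.execute(
            "INSERT INTO aliases (alias, location_id) VALUES (?, ?)",
            (alias, location_id),
        )
    return location_id


def find_location(text):
    """Return the canonical name for an exact (normalized) alias, or None."""
    row = conn.execute(
        "SELECT l.canonical_name FROM aliases a "
        "JOIN locations l ON l.id = a.location_id WHERE a.alias = ?",
        (normalize(text),),
    ).fetchone()
    return row[0] if row else None
```

With this layout, `add_location("X Primary School", ["X School", "X Primary"])` makes `find_location("x  school")` return `"X Primary School"`. Making the alias the primary key means an ambiguous alias is rejected at insert time rather than discovered later.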

Of course, we end up with the problem that some locations may have names or spellings I don't know about. Initially I'm expecting a lot of recognition failures, and the plan is to learn from these over time. The data we collect in this regard will be quite valuable.

Saturday, June 25, 2011

Update: 6/25

Top priority, as I mentioned, is getting the program as bug-free and stable as possible before the team leaves for Ghana. The first step was to finalize a flowchart, based on the discussions yesterday. The new version assumes that the user is in the Kumasi Metropolitan district.

I'm working on implementing those changes today, after which I will probably spend the rest of the weekend creating a simple form and database on the website that will allow for submission of provider locations and landmarks (both of which will be subject to manual verification before being added to the primary database). After that come the backup system; the "follow-up system" (the part that asks for a brief quality check after the session is completed); the web system, which will let people who prefer not to text go online and access the information there; and manually entering some starting landmark data: hospitals, schools, churches, etc.

Monday, June 20, 2011

Update 6/20

I've designed and implemented the database, and now I'm testing it. Looks good so far. I'm writing the Python script that will ultimately handle the flow I described in my last post, which is arguably the "meat" of the project. The hardest part to implement will be the matching of landmarks to valid locations -- I have no idea what kind of input we'll actually get from users. It's an open question in that respect; to my knowledge nobody has done something like this before. This all points to a lot of tweaking that will have to be done in the first few weeks after deployment. Because of this, I'm designing all of the relevant software to be easy for me to change remotely (I'm using LogMeIn, so I'll be able to access the computer from America, assuming that there is sufficient internet access).
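
To make the matching problem concrete, here is a small sketch of approximate string matching with Python's standard difflib module. This is only one possible technique, not necessarily what the script does; the landmark names and the 0.6 cutoff are placeholders.

```python
import difflib

# Placeholder names; the real list would come from the landmark database.
KNOWN_LANDMARKS = ["X Primary School", "Y Hospital", "Z Post Office"]


def match_landmark(text, known=KNOWN_LANDMARKS, cutoff=0.6):
    """Return up to three known landmarks that resemble the user's text, best first."""
    # Compare in lowercase, then map back to the original spelling.
    by_lowercase = {name.lower(): name for name in known}
    hits = difflib.get_close_matches(
        text.lower().strip(), list(by_lowercase), n=3, cutoff=cutoff
    )
    return [by_lowercase[hit] for hit in hits]


if __name__ == "__main__":
    print(match_landmark("y hospitl"))  # a misspelling of a known name
    print(match_landmark("qqqqqq"))     # nothing similar enough
```

The cutoff is the knob that will need the tweaking mentioned above: too low and unrelated names match, too high and ordinary misspellings are rejected. Inputs that return an empty list are the recognition failures worth logging.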

That being said, whether the landmark system will produce valid results is an interesting question; I think it will provide valuable research data. I'm realizing that the flowchart I described in the last post has too many possibilities for syntactic failure, which conflicts with an implicit design goal--to make it so first-time users can, with absolutely minimal instruction, use it successfully and want to use it again. So I've made some changes that make it easier to use, with the trade-off that it may require more text messages back and forth to complete each session. I'll post an updated graphic soon; right now my priority is getting the system running and as stable as possible before the 30th.

I also just realized that my previous post had the subject line "5/12" when it was supposed to be "6/12". I won't edit it; it will stand forevermore as a monument to the dangers of not knowing what month it is.

Sunday, June 12, 2011

Update: 5/12

I set up FrontlineSMS on the mini Acer laptop, with a Huawei GSM modem and a T-Mobile SIM card on a "pay as you go" plan with unlimited SMS (and very expensive per-minute charges; luckily we won't be using those). After a bit of configuration (Windows 7's stock installation sure requires a lot of updates!) I was able to get it to work, and was able to receive and auto-respond to SMS messages. I left it running in the office so that I could remotely log in while I was gone for the week (using logmein.com), but it shut down for some reason. When I return I'll have to try to figure out why; I'm blaming Windows's aggressive auto-update and restart policy, which I thought I had disabled.

Meanwhile, FrontlineSMS has a feature that allows for a local command to be executed--I currently have it starting a Python script that adds the received text to a local MySQL database. That works great!

Now that the framework is set up, the next thing I have to do is figure out what kind of database we need. First of all, we need a simple logging system, something that records the information available: the text that was sent, the date/time, and the number it was sent from (and probably the response given as well). Designing the database also requires that we figure out what we need to know about the provider locations and how we intend to access and provide the correct information.
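
A rough sketch of what such a logging table could look like. It uses Python's built-in sqlite3 module instead of MySQL so that it runs standalone, and the table, column, and function names are hypothetical. How FrontlineSMS hands the sender and message text to the script is left out; here they simply arrive as function arguments.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (and if needed create) the message log. sqlite3 stands in for MySQL here."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS message_log (
            id INTEGER PRIMARY KEY,
            received_at TEXT NOT NULL,  -- ISO 8601 timestamp, UTC
            sender TEXT NOT NULL,       -- phone number the text came from
            body TEXT NOT NULL,         -- the text that was sent
            response TEXT               -- what we replied, if anything
        )
    """)
    return conn


def log_message(conn, sender, body, response=None):
    """Record one incoming text and return the id of the new log row."""
    received_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO message_log (received_at, sender, body, response) "
        "VALUES (?, ?, ?, ?)",
        (received_at, sender, body, response),
    )
    conn.commit()
    return cur.lastrowid
```

Storing the phone number as text rather than as an integer keeps any leading "+" or zeros intact.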

It is here that the limitations of the system and our experimental approach come in. The following things apply:

Text messages are limited to 160 characters.
Ghanaians are likely to know their region, but not necessarily their district.
Ghanaians may not know more detailed information like street addresses, as a uniform addressing system doesn't seem to exist like it does in the US. They are likely to use references to landmarks: "near the hospital", or "between the graveyard and the post office".

With these things in mind, I think the best thing to use is a system that gets their region, a district if they know it, and then any landmarks they can use to pinpoint their location. We will then use some sort of probability-based ranking system (something like what Google Maps uses) to provide a list of locations. The following flow chart describes my current idea:

I'm going to try to start implementing it this week. The hard part will be trying to figure out what ranking algorithm to use, and the specifics of how to store the data.
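
As a starting point for thinking about ranking, here is a deliberately simple sketch that orders providers by their average great-circle distance to the landmarks a user mentions. This is a plain distance heuristic standing in for a probability-based ranking, not the algorithm itself; the function names and data layout are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rank_providers(providers, landmarks, top_n=3):
    """Return the top_n providers closest, on average, to the given landmarks.

    providers: list of (name, lat, lon) tuples
    landmarks: list of (lat, lon) tuples already matched from the user's text
    """
    if not landmarks:
        return []  # nothing to rank against

    def mean_distance(provider):
        _, lat, lon = provider
        total = sum(haversine_km(lat, lon, la, lo) for la, lo in landmarks)
        return total / len(landmarks)

    return sorted(providers, key=mean_distance)[:top_n]
```

A text like "between the graveyard and the post office" maps naturally onto this: both landmarks go into the list, and the provider with the smallest average distance to the two comes out first. Keeping `top_n` small also helps the reply fit within the 160-character limit.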