Tuesday, 13 May 2014

How I Learn ( And What I'm Learning )

This post might be a bit off-topic, as it's more about my own learning.

As part of whatever job I'm doing, there are normally regular forays into learning new things. Times when I have to throw myself into something I know pretty much nothing about. It's often learning something that I have a hunch will be useful for making things with, but sometimes it's more esoteric and tangentially related to my work.

As I threw myself headlong into another foray, I noticed that I was noticing the feelings and approaches I was going through as it happened. I thought I'd share it, but I don't really have a reason why. I just have a gut feeling that I should. So here it is...

The Task in Hand

The cunning plan, in its simplest form, is to create a sort of Twitterwall for our Google Education Conference in our 3Sixty space at the University of York. It's a largish room with projectors pointing at 4 huge walls. ( I wrote about it in a previous blog post here ).

Research The Alternatives

I needed some kind of visualisation tool. The 3Sixty space is such that even a cheesy Powerpoint transition makes you feel visually assaulted, in a good way. I asked Twitter and jumped in.

I looked at Apple's Quartz Composer. Quartz Composer is a visual tool in which you build flow diagrams to create absolutely stunning visualizations ( watch the showreel below ).


Although Quartz Composer is a visual tool ( you might not know how much of a fan of visual tools / programming languages I am. Did I mention... ) , being visual isn't necessarily enough to make something simple ( see below ).



If I'm honest, Quartz Composer is exactly the tool I should have chosen, but I felt that the learning involved wouldn't have enough general application to warrant the effort. It seems very much the sort of tool you could happily use for broadcast graphics, but it didn't seem to have much in the way of "now add some sparkly stars" features. I managed to hack in an RSS feed pulling Tweet data to a Google Spreadsheet ( a TAGS spreadsheet created by +Martin Hawksey ) but, although it looks nice ( shown below ) and, from an animation perspective, "is swooshy", to anyone who has seen Quartz Composer before this is pretty much old hat. I struggled to find which parameters I could change without it just blowing up and crashing.





It also seemed that, although the tool and its features were amazing, Apple might have "killed it off"; I wasn't clear either way. The community surrounding it seemed to have moved on, and all the examples were either very old, missing, broken or all three.



I looked at d3.js, a JavaScript visualisation library. It looks great but my JavaScript is far from great. All the cool kids seem to be using d3.js but it looked like my experience with it would be like a man from the future ringing my doorbell, saying here's your toolkit, and leaving me with a million odd gadgets that I didn't recognise ( he would probably have green eyebrows and the gadgets would glow and chirrup ). In other words, it looked a bit beyond me.



I looked at Processing. Processing is a great visualisation tool in which you can write, in pseudo-Java, scripts that make images. A few years ago, I was lucky enough to attend a workshop run by Jer Thorpe, a talented guy who whips brilliant things together using Processing. Take a look at some of his twitter and other visualisations here (see below).  Jer convinced me that Processing was a great tool to *sketch* ideas into shape.


I downloaded heaps of examples from the Processing site, selecting the ones that a. looked fancy and b. looked simple and hackable.

I made something that *animated* and showed tweets ( shown below and source code here ). But I really found adding a ";" at the end of every line and HAVING to declare object types difficult. In short, both Java and JavaScript are outside my comfort zone. I keep trying but they don't really stick.


One of the weird things I noticed about exploring Processing was that I was actually copying kids' homework projects ( that they'd shared online ) . Struck me as weirdly hilarious.
--


So, realising I needed more Python in my attempts, I had a look at Py-Processing, a Processing clone that lets you write in Python. I started to feel like I was getting somewhere, quickly finding a Twitter API library and splatting fading text on top of each other ( shown below ). But the problem was that all the learning materials ( for Processing ) need translating into Py-Processing. Getting Java .jar libraries to work was, at times, impossible, and running your scripts through a little helper app separates you from any serious debugging, which is as bad as using Java.

I also noticed that my animations ( I managed to make Twitter avatars forage around the screen and bump into each other ) were getting slow.






I looked at Nodebox. I vaguely recalled that Nodebox was a Python tool with a little editor in which small code fragments could be turned into visualisations. I was right, but since I last looked ( years ago ) they have removed the Python bit and replaced it with a rather swish visual programming environment.

Although it looked great, and I managed to grab a JSON feed and show it ( see below ), and although some of the features ( like being able to turn text into vectors that you can easily mess with ) look great... I found it slow going.




AT THIS POINT I'M CRAPPING IT

So, I've been learning all of this in-between doing my day job and so far, for my efforts I pretty much have nothing except a few crap animations and a constant sick feeling from having tried stuff and having not quite "got it". I recognise this feeling, I've had it before a million times and I've learned to "just press on" and not let it get in the way too much.

But the clock is still ticking... ignore it. 

I was inspired by the work on show here and the fact that they reveal the *how* as well as the *what*, like this fish project here.



Then Someone Said It...

PyGame.

OR did they? I can't remember. But I thought I'd take a look.

PyGame is a tool for creating games with Python. I've looked at it before, found it a bit hard, found it really hard to install, and found it not really relevant to my other web work ( PyGame being desktop-oriented ). But then I thought, hang on... what I want to create is more like a computer game than an online visualisation. And it's PYTHON...

In a day I'd managed to make something that was simple to create, connected to Twitter, and bounced avatars around the screen.

I found a simple Twitter library called Tweepy that happily listened to Twitter and let me "do something" when a tweet with a hashtag was posted ( source code here ).
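
A minimal sketch of what that "do something" hook can look like, using only Python's standard library. It doesn't use Tweepy itself: fake_listener stands in for the real listener, and the names on_tweet, drain_tweets and the "#example" hashtag are all made up for illustration. It also assumes the listener runs in a background thread of the same program that does the drawing, which may not be how things are actually arranged. The idea is just that the listener drops matching tweets onto a queue, and whatever draws the screen picks them up when it's ready.

```python
import queue
import threading

# Hypothetical hand-off point between a Twitter listener and a drawing loop.
tweet_queue = queue.Queue()

def on_tweet(text, hashtag="#example"):
    """Called by the listener for each tweet; only queues matching ones."""
    if hashtag.lower() in text.lower():
        tweet_queue.put(text)

def drain_tweets():
    """Called once per frame by the drawing loop; never blocks."""
    tweets = []
    while True:
        try:
            tweets.append(tweet_queue.get_nowait())
        except queue.Empty:
            return tweets

def fake_listener():
    """Stand-in for the real listener, feeding in three pretend tweets."""
    for text in ["Hello #example", "no tag here", "More #EXAMPLE please"]:
        on_tweet(text)

# Run the pretend listener in its own thread, then collect what it queued.
worker = threading.Thread(target=fake_listener)
worker.start()
worker.join()
received = drain_tweets()
print(received)
```

In a real game loop, drain_tweets() would be called every frame and each returned tweet turned into something on screen.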




My Next Problem...

... was working out how to pull my Twitter data into the PyGame environment. The wisdom seemed to be that there were two approaches: using low-level web sockets or using Twisted ( a Python tool for working with servers and sockets ).

I tried both and found I couldn't send big packets of data with sockets. I couldn't with Twisted either. 
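
One common reason big payloads seem to fail over raw sockets ( whether or not it was the cause here ) is that a single recv() call may hand back only part of what was sent. A minimal sketch of the usual workaround, a length header followed by a loop that keeps reading until everything has arrived, is below. The function names and the 4-byte header are choices made for this example, not anything from the project, and it uses a local socketpair rather than a real network connection.

```python
import socket
import struct
import threading

def send_message(sock, payload):
    """Send a 4-byte big-endian length header, then the payload bytes."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exactly(sock, count):
    """Keep calling recv() until exactly `count` bytes have arrived."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read the length header first, then exactly that many payload bytes."""
    (length,) = struct.unpack(">I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)

# Two connected sockets in one process, standing in for listener and game.
left, right = socket.socketpair()
big_payload = b"x" * 500000

# Send from a thread so the sender can block while the receiver catches up.
sender = threading.Thread(target=send_message, args=(left, big_payload))
sender.start()
received_payload = recv_message(right)
sender.join()
left.close()
right.close()
print(len(received_payload))
```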

Sick feeling. Going round in circles. 

Sometimes Getting Side-lined Is A Good Thing

Four years ago I tried to learn neo4j, a graph database, to help with my PPPeoplePowered project - a tool to help researchers find related researchers at the University of York, Sheffield and Leeds. But I failed with neo4j, simply not able to translate my thinking into a query that returned anything, so I bottled it and wired something together using sellotape and MySQL.

With my aim being having something to store tweets ( and connections between tweets ) I thought I'd take a look at neo4j again. How hard could it be?

Layers Upon Layers Upon Layers

With increasing background desperation, I did what I always do. Thrash around "in the field" and try all the tools that seem to make sense, trying to understand the job at a high level and use the tools that make everything "more sensible" and easier and familiar.

Rolling up my sleeves and hoping, I got neo4j working with neo4django ( Django is a Python tool I'd used before ) so that seemed sensible. Except it wasn't. neo4django needs an older version of neo4j to work. And once again, I find myself with bugs and errors that I have no idea are my fault, neo4django's fault or a root neo4j fault. And again, all the documentation needs mapping from the tool to the layer I'm using.

So, I decide to use neomodel, a tool ( based on Django ) but more closely tied to neo4j. It too needs an older version of neo4j. Again, I'm working with the layer rather than the tool.

Feeling sick. 

On the advice of David Bigelow, I decide to work with py2neo, a Python neo4j library without any of the Django/model familiarity and promise.

Start again. Again.

And after a few hours I have something that *works*. And in doing so, I also realise that both neo4django and neomodel ( whilst really useful ) sort of hindered me by keeping my database thinking stuck in an "old fashioned" mindset of tables and schemas and normalisation. They keep you focussing on your data rather than your relationships.

I say that as if I now understand graph databases. I don't, but working with a *simpler* tool like py2neo is revealing something of the expressiveness of graph databases.

I'm hoping I can jump the next hurdle. Time is running out and I need to get a shifty on.

After a bit of tweaking in PyGame, I was able to create a "4 screen layout" and fire "stars", 3D star field style, every time a tweet appeared ( each of those grey stars on the pink background is a tweet ). You can see the neo4j instances rolling by in the background...

It was a start!
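
For anyone wondering what a "3D star field" boils down to, here is a minimal sketch of the arithmetic, without any PyGame in it. This is not the code from the project: the function names, the 800 x 600 screen size and the speed value are all assumptions for illustration. Each star has an x, y and a depth z; moving it towards the viewer means shrinking z, and dividing x and y by z is what makes it appear to fly outwards from the centre.

```python
import random

def make_star(rng):
    """A new star starts at a random x, y and far away ( z = 1.0 )."""
    return {"x": rng.uniform(-1.0, 1.0), "y": rng.uniform(-1.0, 1.0), "z": 1.0}

def advance(star, speed=0.02):
    """Move the star towards the viewer by reducing z ( never below 0.01 )."""
    star["z"] = max(star["z"] - speed, 0.01)

def project(star, width=800, height=600):
    """Perspective projection: divide by depth, then centre on the screen."""
    screen_x = width / 2 + (star["x"] / star["z"]) * (width / 2)
    screen_y = height / 2 + (star["y"] / star["z"]) * (height / 2)
    return int(screen_x), int(screen_y)

# One star per pretend tweet, moved forward ten frames.
rng = random.Random(1)
star = make_star(rng)
for _ in range(10):
    advance(star)
print(project(star))
```

In a drawing loop, the pair returned by project() would be where the star gets drawn on that frame; stars that leave the screen can simply be dropped.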




Create My Own Tools On The Way To Understanding

Because I'm not a proper programmer, whilst I thoroughly approve of object orientation, I tend to create my own tools that are far from object oriented and simply simplify working with collections of complex objects, so that I might end with simple functions like... add_tweet() and add_person() etc. And then I work with those, having dipped down into the more complex stuff, in a fairly procedural and expressive way.

So that's what I'm doing now.
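
A minimal sketch of the shape those helper functions might take. To keep it runnable on its own, the "database" here is just a dict and a list, NOT real neo4j or py2neo calls, and the "TWEETED" relationship name is made up for the example. The point is only that the fiddly part lives inside the helpers, and the rest of the code gets to stay procedural.

```python
# Stand-in for the database: plain Python containers, not neo4j.
nodes = {}          # key -> dict of properties
relationships = []  # (start_key, relationship_type, end_key)

def add_person(screen_name):
    """Create the person if they are new; return their key either way."""
    key = ("Person", screen_name)
    nodes.setdefault(key, {"screen_name": screen_name})
    return key

def add_tweet(twitter_id, text, screen_name):
    """Create the tweet ( and its author ) once, and link the two together."""
    person = add_person(screen_name)
    key = ("Tweet", twitter_id)
    if key not in nodes:
        nodes[key] = {"twitter_id": twitter_id, "text": text}
        relationships.append((person, "TWEETED", key))  # hypothetical type name
    return key

# The calling code stays simple, and repeating a call does no harm.
add_tweet("5454545455", "Elvis Presley lives lol", "someone")
add_tweet("5454545455", "Elvis Presley lives lol", "someone")
print(len(nodes), len(relationships))
```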

I've wired that Twitter Listener up to neo4j and my basic data looks like this...


It's a start! An encouraging start ( I hope )


Identify Some Helpful People

As an official "not really a programmer" I always find that I need, and am lucky to get help from people who are. I try not to bother them with the nitty-gritty questions ( which I can often answer myself with lots of trial and error ) and stick to the conceptual questions ( which trial and error often doesn't catch ) and the "design" questions ( like is this a "bad idea" ).

My next step is to try and test all my assumptions in code in a way that means I can ask some really stupid questions of these helpful people because I'm more than aware that, although I've got something that looks like it works, I'm not quite sure how. I also don't know if what I'm doing is even approximately "a good idea".

So here goes...

Some Very Silly neo4j and py2neo Questions

Some context. My model looks like this... it's very simple. I show how the nodes are connected, and the dotted circles are instances of the node types. The red dotty lines are how things *might* have a connection ( in the long term... I've no idea how this'll actually work, but, potentially a HashTag called "Google" may link to a "Company:name= Google" ).





1. My Index Muddiness

Is there a difference between an index and a node? Shouldn't every node be indexed? Why wouldn't you index a node?

At the start of some code, is it sensible to...
people = graph_db.get_or_create_index(neo4j.Node, "People")
tweets = graph_db.get_or_create_index(neo4j.Node, "Tweets")
entity_types = graph_db.get_or_create_index(neo4j.Node, "EntityTypes")


I think I don't really understand Indexes. I don't understand whether, when I create a node, that is the same as creating an indexed node.

Is this...

tweet = graph_db.get_or_create_indexed_node("Tweet", "twitter_id", "5454545455", {"text":"Elvis Presley lives lol"})

...the same as this?


tweet, = graph_db.create(node({"twitter_id":"5454545455","text":"Elvis Presley lives lol"}))  # create the node
index = graph_db.get_or_create_index(neo4j.Node, "Tweets")
index.add("twitter_id", "5454545455", tweet)  # then add it to the index by its twitter_id

Does the first create a node AND add it to an index? Is the second one a node that needs to be saved still, especially if I add extra relationships to it? And do I have to save it after adding each relationship... and what is the difference between...

store.save(tweet)

... and ...

tweet.save() <- Doesn't work, not sure why I can't do this

What would be the point of creating a node that isn't indexed? Would a rule of thumb be just to index everything ( assuming you know your data set won't be THAT big )?

Do you, like in Django, typically create a node object, set its properties, add a few more, add some relationships, and then lastly, save()? Or does it save as it goes?



---

2. My Label Muddiness

What are Labels? Are they different from a name? Could I use them to approximate Hierarchy/inheritance?

For example, if I create a node that has a label "Person" and "TwitterPerson"... and add it to the index of "People"... and I add a hashtag node labelled "Person" ... might I at some point, connect the two items in a query?

I'm unclear about how you'd create an index called "People" and a node labelled "Person", whether the two will ever meet and, if so, in what circumstances ( i.e. a Cypher query ).

--

3. My Linking Muddiness

When I'm creating a link between objects do I have to...

graph_db.create(rel(e, "IS_A", et), {'relevance':relevance})
store.save(e)

... do I also have to have saved the et node or has that been done already? So in the example above, do the nodes e and et both get saved when I store.save(e)?

I think I can save nodes, without involving the store? Is the store just a tool to make saving easier or does it save a node differently? I guess I only want one way/method of saving...

---
4. My DateTime Muddiness

This looks a nightmare. I might end up storing a date as seconds ( an integer ) and as a string... is this sensible or should I just bite the bullet and work out how to handle the Gregorian model ( I know I should really... when in Rome etc. )
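
The "seconds plus a string" option is at least easy to sketch. The example below assumes the date arrives in the text format Twitter's API uses for created_at ( like "Tue May 13 16:53:20 +0000 2014" ); the property names created_seconds and created_iso are made up for illustration. It produces two plain values that could be stored as node properties: an integer that is easy to sort and compare, and a string that is easy to read. Whether that is a good idea compared to modelling dates properly in the graph is still an open question.

```python
from datetime import datetime, timezone

# The text layout Twitter's API uses for a tweet's created_at field.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def date_properties(created_at):
    """Turn a created_at string into an integer and an ISO 8601 string."""
    moment = datetime.strptime(created_at, TWITTER_FORMAT).astimezone(timezone.utc)
    return {
        "created_seconds": int(moment.timestamp()),  # seconds since 1970, UTC
        "created_iso": moment.isoformat(),           # human-readable version
    }

props = date_properties("Tue May 13 16:53:20 +0000 2014")
print(props)
```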

--

5. My Cypher Muddiness 

Are there "generic ways" to get long paths and short paths regardless of what objects and relationships you have... A sort of show me some interesting/random connections?

--

Conclusion


I guess all of these add up to a pretty fundamental lack of clarity about what the different concepts are, such as Nodes, Labels, Indexes... The irony here is that I feel that I'm working quite successfully, but know that I don't really know what I'm doing.

And as for working with the nodes, again, I can fairly successfully save and retrieve nodes but I have no idea if I'm doing it right, no idea if I'm both creating duff/null nodes and indexed nodes or attempting to build something sensible or doing something like storing a novel in a normalised MySQL database ( it'll work but... ).

I'm now at that point of feeling familiar but not quite at ease with neo4j. And feeling positive at the potential...






















