TumbleStream

Your Curated Tumblr Experience Awaits!

Data Science - Blog Posts

3 months ago

as a young afab queer person going into computer/data science, it makes me so sad that the face of the tech industry is a largely misogynistic homophobic transphobic trump-suck-up unethical billionaire bro club like musk, bezos, and zuckerberg. like, computers and the internet have limitless potential, but we’re using it for this????

i cannot wait until all these dipshits get what’s coming to them so a new generation of leaders can rise up and make tech kind.


Tags
4 years ago

zero to pandas

Zero to Pandas: Data Analysis with Python

There are alot of Python courses out there that we can jump into and get started with. But to a certain extent in that attempt to learn the language, the process becomes unbearably long and frustratingly slow. We all know the feeling of wanting to run before we could learn how to walk; we really wanna get started with some subtantial project but we do not know enough to even call the data into the terminal for viewing.

Back in August, freeCodeCamp in collaboration with Jovian.ai, organized a very interesting 6-week MOOC called Data Analysis with Python: Zero to Pandas and as a self-proclaimed Python groupie, I pledged my allegiance!

If there are any expectation that I've managed to whizz myself through the course and obtained a certificate, nothing of that sort happened; I missed the deadline cause I was busy testing out every single code I found and work had my brain on overdrive. I can't...I just...can't. Even with the extension, I was short of 2 Pythonic answers required to earn the certificate. But don't mistake my blunders for the quality of the content this course has to offer; is worth every gratitude of its graduates!

Zero to Pandas MOOC is a course that spans over 6 weeks with one lecture webinar per week that compacts the basics of Python modules that are relevant in executing data analysis. Like the play on its name, this course assumes no prior knowledge in Python language and aims to teach prospective students the basics on Python language structure AND the steps in analyzing real data. The course does not pretend that data analytics is easy and cut-corners to simplify anything. It is a very 'honest' demonstration that effectively gives overly ambitious future data analysts a flick on the forehead about data analysis. Who are we kidding? Data analysis using programming language requires sturdy knowledge in some nifty codes clean, splice and feature engineer the raw data and real critical thinking on figuring out 'Pythonic' ways to answer analytical questions. What does it even mean by Pythonic ways? Please refer to this article by Robert Clark, How to be Pythonic and Why You Should Care. We can discuss it somewhere down the line, when I am more experienced to understand it better. But for now, Packt Hub has the more comprehensive simple answer; it simply is an adjective coined to describe a way/code/structure of a code that utilizes or take advantage of the Python idioms well and displays the natural fluency in the language.

The bottom line is, we want to be able to fully utilize Python in its context and using its idioms to analyze data.

The course is conducted at Jovian.ai platform by its founder; Aakash and it takes advantage of Jupyter-like notebook format; Binder, in addition to making the synchronization available at Kaggle and Google's Colab. Each webinar in this course spans over close to 2 hours and each week, there are assignments on the lecture given. The assignments are due in a week but given the very disproportionate ratio of students and instructors, there were some extensions on the submission dates that I truly was grateful for. Forum for students is available at Jovian to engage students into discussing their ideas and question and the teaching body also conducts office hours where students can actively ask questions.

The instructor's method of teaching is something I believe to be effective for technical learners. In each lectures, he will be teaching the codes and module requires to execute certain tasks in the thorough procedure of the data analysis task itself. From importing the .csv formatted data into Python to establishing navigation to the data repository...from explaining what the hell loops are to touching base with creating functions. All in the controlled context of two most important module for the real objective of this course; Numpy and Pandas.

My gain from this course is immensely vast and that's why I truly think that freeCodeCamp and Jovian.ai really put the word 'tea' to 'teachers'. Taking advantage of the fact that people are involuntarily quarantined in their house, this course is something that should not be placed aside in the 'LATER' basket. I managed to clear my head to understand what 'loop' is! So I do think it can solve the world's problem!

In conclusion, this is the best course I have ever completed (90%!) on data analysis using Python. I look forward to attending it again and really finish up that last coursework.

Oh. Did I not mention why I got stuck? It was the last coursework. We are required to demonstrate all the steps of data analysis on data of our choice, create 5 questions and answer them using what we've learned throughout the course. Easy eh? Well, I've always had the tendency of digging my own grave everytime I get awesome cool assignments. But I'm not saying I did not do it :). Have a look-see at this notebook and consider the possibilities you can grasp after you've completed the course. And that's just my work...I'm a standard C-grade student.

And the exciting latest news from Jovian.ai is that they have upcoming course at Jovian for Deep Learning called Deep Learning with PyTorch: Zero to GANS! That's actually yesterday's news since they organized it earlier this year...so yeah...this is an impending second cohort! Tentatively, the course will start on Nov 14th. Click the link below to sign-up and get ready to attack the nitty-gritty. Don't say I didn't warn ya.

Deep Learning with PyTorch: Zero to GANS

And that's me, reporting live from the confinement of COVID pandemic somewhere in a developing country at Southeast Asia....


Tags
4 years ago

wildlife study design & analysis

Wildlife Study Design & Analysis

To cater for my lack of knowledge in biological data sampling and analysis, I actually signed up for the 'Wildlife Study Design and Data Analysis' organized by Biodiversity Conservation Society Sarawak

So, this new year, I've decided to take it down a notch and systematically choose my battlefield. Wildlife species data has always been mystery at me. As we all know, biologists hold them close to their hearts to the point of annoyance sometimes (those movies with scientists blindly running after some rare orchids or snakes or something like that really wasn't kidding). Hey...I get it and I totally agree - the data that belongs to the organization has to be treated with utmost confidentiality and all by the experts that collects them. Especially since we all know that they are not something so easily retrieved. Even more so, I optimistically support for the enthusiasm to be extended to their data cleaning and storing too while they're at it. But it doesn't mean I have to like the repercussions. Especially not when someone expects a habitat suitability map from me and I have no data to work with and all I had is a ping-pong game of exchanging jargon in the air with the hopes that the other player gets what you mean cough up something you can work with. Yes...there is not a shred of shame here when I talk about how things work in the world, but it is what it is and I'm not mad. It's just how it works in the challenging world of academics and research. 

To cater for my lack of knowledge in biological data sampling and analysis, I actually signed up for the 'Wildlife Study Design and Data Analysis' organized by

Biodiversity Conservation Society Sarawak (BCSS for short)

or

Pertubuhan Biodiversiti Konservasi Sarawak

It just ended yesterday and I can't say I did not cry internally. From pain and gratitude and accomplishment of the sort. 10 days of driving back and forth between the city center and UNIMAS was worth the traffic shennanigans.  

It is one of those workshops where you really do get down to the nitty-gritty part of understanding probability distribution from scratch; how to use it for your wildlife study data sampling design and analyzing them to obtain species abundance, occupancy or survival. And most importantly, how Bayes has got anything to do with it. I've been hearing and seeing Bayesian stats, methods and network on almost anything that involves data science, R and spatial stats that I am quite piffed that I did not understand a thing. I am happy to inform that now, I do. Suffice to say that it was a bootcamp well-deserved of the 'limited seats' reputation and the certificate really does feel like receiving a degree. It dwindles down to me realizing a few things I don't know:

I did not know that we have been comparing probabilities instead of generating a 'combined' one based on a previous study all these years.

I did not know that Ronald Fisher had such strong influence that he could ban the usage of Bayesian inference by deeming it unscientific.

I did not know that, for Fisher, if the observation cannot be repeated many times and is uncertain, then, the probability cannot be determined - which is crazy! You can't expect to shoot virus into people many times and see them die to generate probability that it is deadly!

I did not know that Bayes theorem actually combines prior probability and the likelihood data you collected on the field for your current study to generate the posterior probability distribution!

I did not know that Thomas Bayes was a pastor and his theory was so opposed to during his time. It was only after Ronald Fisher died that Bayesian inference gain favor especially in medical field. 

I did not know...well...almost anything at all about statistics!

It changed the way I look at statistics basically. But I self-taught myself into statistics for close to 9 years and of course I get it wrong most of the time; now I realize that for the umpph-th time. And for that, I hope the statistics power that be forgives me. Since this boot camp was so effective, I believe it is due to their effort in developing and executing the activities that demonstrates what probability distribution models we were observing. In fact, I wrote down the activities next to the topic just to remember what the deal was. Some of the stuffs covered are basics on Binomial Distribution, Poisson Distribution, Normal/Gaussian Distribution, Posterior probability, Maximum Likelihood Estimate (MLE), AIC, BACI, SECR, Occupancy and Survival probability. Yes...exhausting and I have to say, it wasn't easy. I could listen and distracted by paper falling for a fraction of time just to find myself lost in the barrage of information. What saved me was the fact that we have quizzes that we have to fill in to evaluate our understanding of the topic for the day and discuss them first thing in the next session. Best of all, we were using R with the following packages: wiqid, unmarked, rjags and rasters. Best locations for camera traps installation was discussed as well and all possible circumstances of your data; management and collection itself on the field, were covered rigorously. 

For any of you guys out there who are doing wildlife study, I believe that this boot camp contains quintessential information for you to understand to design your study better. Because once the data is produced, all we can do it dance around finding justification of some common pitfalls that we could've countered quite easily. 

In conclusion, not only that this workshop cast data analysis in a new light for me, but it also helps establishes the correct steps and enunciates the requirements to gain most out of your data. And in my case, it has not only let me understand what could be going on with my pals who go out into the jungle to observe the wildlife first hand, it has also given me ideas on looking for the resources that implements Bayesian statistics/methods on remote sensing and GI in general. Eventhough location analysis was not discussed beyond placing the locations of observation and occasions on the map, I am optimistic in further expanding what I understood into some of the stuff I'm planning; habitat suitability modeling and how to not start image classification from scratch...every single time if that's even possible. 

For more information on more workshops by BCSS or wildlife study design and the tools involved, check out the links below:

Biodiversity Conservation Society Sarawak (BCSS) homepage: https://bcss.org.my/index.htm

BCSS statistical tutorials: https://bcss.org.my/tut/

Mike Meredith's home page: http://mikemeredith.net/

And do check out some of these cool websites that I have referred to for more information as well as practice. Just to keep those brain muscles in loop with these 'new' concepts:

Statistical Rethinking: A Bayesian Course with Examples in R and Stan: https://github.com/rmcelreath/statrethinking_winter2019

Probability Concepts Explained: Introduction by Jonny Brooks-Bartlett: https://towardsdatascience.com/probability-concepts-explained-introduction-a7c0316de465 

Probability Concepts Explained: Maximum Likelihood Estimation by Jonny Brooks-Bartlett: https://towardsdatascience.com/probability-concepts-explained-maximum-likelihood-estimation-c7b4342fdbb1

Probability Concepts Explained: Bayesian Inference for Parameter Estimation by Jonny Brooks-Bartlett 

I'll be posting some of the things I am working on while utilizing the Bayesian stats. I'd love to see yours too!

P/S: Some people prefer to use base R with its simple interface, but if you're the type who works better with everything within your focal-view, I suggest you install RStudio. It's an IDE for R that helps to ease the 'anxiety' of using base R. 

P/S/S: Oh! Oh! This is the most important part of all. If you're using ArcGIS Pro like I do, did you know that it has R-Bridge that can enable the accessibility of R workspace in ArcGIS Pro? Supercool right?! If you want to know more on how to do that, check out this short 2 hour course on how to get the extension in and an example on how to use it: 

Using the R-Bridge: https://www.esri.com/training/catalog/58b5e417b89b7e000d8bfe45/using-the-r-arcgis-bridge/


Tags
8 years ago
Wondering whether you should use R or Python for your next data analysis post? Check our infographic "Data Science Wars: R vs Python".

Tags
8 years ago
Statistics, code and data

An analysis of Pokémon GO created with R. OMG They’re so beautiful!!

image

“I see nice things here:

The cluster are determined by the type of pokemon.

The algorithm keep the chain evolution side by side. You can see charmander, charmeleon, charizard together.

Put steelix, onix and wailord (the most heavy pokémons) together.”


Tags
Immersive Insights: IBM’s AR Data Visualization Tool At SXSW – Medium
Immersive Insights: IBM’s AR Data Visualization Tool At SXSW – Medium
Immersive Insights: IBM’s AR Data Visualization Tool At SXSW – Medium

Immersive Insights: IBM’s AR Data Visualization Tool at SXSW – Medium

The IBM is making space at SXSW from March 10–14 showcased dozens of projects in AI, robotics, VR, AR, and other emerging technology fields. My team and I had the fortunate opportunity to demo our AR project, Immersive Insights.

Immersive Insights aims to be a tool that provides data scientists with insights to large-scale data through a spatial visualization experience. While the demo is still in an early stage, we have real-time data streaming to the headset and displayed in 3D, fully manipulable and scalable. It is also our way to begin exploring, creating and refining the next generation of enterprise applications for data analysis.

Read more:


Tags
Loading...
End of content
No more pages to load
Explore Tumblr Blog
Search Through Tumblr Tags