KZK's blog

Lead engineer

Thought Experiment

Thought experiments let us dig deeper into “what if” questions. They help us to: Improve our understanding of the world, by asking “what if…” and then checking how the world would change, which also tells you a lot about what you don’t know. Become conscious of our plans. For example: I buy 1000 of a stock. What if it goes up 10% in a single day, should I sell? What if it goes down, but little by little?

Bank of trust: each interaction is an opportunity to gain or lose trust. Mental Models

What is an eigenvector? An eigenvector of a matrix $$A$$ is a nonzero vector $$\vec{v}$$ such that multiplying the matrix by the vector gives the same vector multiplied by a scalar $$\lambda$$ (the eigenvalue): $$A\vec{v} = \lambda\vec{v}$$ What is the geometric explanation of eigenvectors? They tell you which directions do not change when the transformation matrix $$A$$ is applied: the eigenvectors keep their direction and are only stretched by the factor of the eigenvalue $$\lambda$$.
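As a quick check of the definition, here is a minimal sketch in plain Python. The helper names `mat_vec` and `is_eigenvector` and the example matrix are assumptions made for illustration; they are not from the original post.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (a list of rows) by a vector (a list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def is_eigenvector(matrix, vector, eigenvalue, tol=1e-9):
    """Return True when matrix * vector equals eigenvalue * vector."""
    transformed = mat_vec(matrix, vector)
    return all(abs(t - eigenvalue * x) <= tol
               for t, x in zip(transformed, vector))


# Hypothetical example matrix, chosen so the eigenvectors are easy to see.
A = [[2.0, 1.0],
     [1.0, 2.0]]

along_diagonal = is_eigenvector(A, [1.0, 1.0], 3.0)    # stretched by 3
anti_diagonal = is_eigenvector(A, [1.0, -1.0], 1.0)    # left unchanged
not_eigen = is_eigenvector(A, [1.0, 0.0], 2.0)         # direction changes
```

With this matrix, `[1, 1]` and `[1, -1]` keep their direction under `A`, while `[1, 0]` is rotated towards `[2, 1]`, so it is not an eigenvector.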

Cosine Similarity

What is cosine similarity? It is a metric that compares two vectors by the angle between them, so it is not affected by the size of the vectors. The cosine of the angle $$\beta$$ between two vectors is: $$\cos(\beta) = \frac{\vec{v} \cdot \vec{w}}{\|\vec{v}\| \, \|\vec{w}\|}$$ If the vectors are orthogonal, the angle is 90° and its cosine is 0, so a cosine similarity of 0 means that the vectors are orthogonal, i.e. that they are not similar at all.
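The formula can be sketched in a few lines of plain Python. The function name `cosine_similarity` and the sample vectors are illustrative assumptions, and raising an error for a zero vector is a design choice of this sketch (the cosine is undefined there).

```python
import math


def cosine_similarity(v, w):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(v, w))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_w = math.sqrt(sum(b * b for b in w))
    if norm_v == 0 or norm_w == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_v * norm_w)


orthogonal = cosine_similarity([1, 0], [0, 1])      # perpendicular vectors
same_direction = cosine_similarity([1, 2], [10, 20])  # only the size differs
opposite = cosine_similarity([1, 0], [-1, 0])       # pointing opposite ways
```

Scaling a vector, as in `[1, 2]` versus `[10, 20]`, does not change the result, which is the reason the metric ignores vector size.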

JSII - Create libraries in TypeScript, use them everywhere!

jsii is a toolchain that allows you to write libraries in TypeScript and use them from Python, C# and Java.

Likelihood

Naive Bayes

In Naive Bayes we estimate how likely a sentence is to belong to each class by using the probabilities of its words in each class. The Naive Bayes formula is just the ratio between the two classes: the ratio of the priors multiplied by the product of the per-word likelihood ratios.

Why is Naive Bayes called naive? Because it assumes that the features used for classification are independent.

Algorithm: Get the frequency of each word in each class, freq(word, class). Also get the total number of words in each class, countWords(class). Now we can calculate P(word | class) as $$\frac{freq(word, class)}{countWords(class)}$$

To infer whether a sentence of $$m$$ words belongs to one class or the other, we compute $$\frac{P(positive)}{P(negative)} \prod_{i=1}^{m} \frac{P(w_i | positive)}{P(w_i | negative)}$$ If the value is > 1, the sentence is overall positive. The prior is $$\frac{P(positive)}{P(negative)}$$ and the likelihood is $$\prod_{i=1}^{m} \frac{P(w_i | positive)}{P(w_i | negative)}$$

Log Likelihood

Why do we use the log likelihood? For numerical stability: multiplying many small probabilities can underflow, so we take the logarithm of the whole expression, $$\log\left(\frac{P(positive)}{P(negative)} \prod_{i=1}^{m} \frac{P(w_i | positive)}{P(w_i | negative)}\right)$$ Since $$\log(a \cdot b) = \log(a) + \log(b)$$, we can rewrite Naive Bayes using the log likelihood as $$\log\frac{P(positive)}{P(negative)} + \sum_{i=1}^{m} \log\frac{P(w_i | positive)}{P(w_i | negative)}$$ Since this is now a logarithm, a sentence is positive if the result is > 0.

Assumptions

Naive Bayes assumes that the features are independent. In NLP specifically, this means assuming that there is no overlap of meaning between words, which is not true.
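The log form of the score (log prior plus the sum of per-word log likelihood ratios) can be sketched in plain Python as below. The function names, the tiny training set, and the use of Laplace (add-one) smoothing are all assumptions of this sketch: the notes above use the raw ratio freq(word, class) / countWords(class), which would produce a log of zero for any word missing from one class.

```python
import math
from collections import Counter


def train(labeled_sentences):
    """Build per-class word frequencies, the vocabulary and the log prior."""
    freq = {"positive": Counter(), "negative": Counter()}
    sentences_per_class = Counter()
    for sentence, label in labeled_sentences:
        freq[label].update(sentence.lower().split())
        sentences_per_class[label] += 1
    vocab = set(freq["positive"]) | set(freq["negative"])
    log_prior = math.log(sentences_per_class["positive"]
                         / sentences_per_class["negative"])
    return freq, vocab, log_prior


def word_prob(word, label, freq, vocab):
    """P(word | class) with add-one smoothing (an assumption of this sketch)."""
    count_words = sum(freq[label].values())
    return (freq[label][word] + 1) / (count_words + len(vocab))


def score(sentence, freq, vocab, log_prior):
    """Log prior plus the sum of log likelihood ratios; > 0 means positive."""
    total = log_prior
    for word in sentence.lower().split():
        if word in vocab:  # words never seen in training are skipped
            total += math.log(word_prob(word, "positive", freq, vocab)
                              / word_prob(word, "negative", freq, vocab))
    return total


# Hypothetical toy data, only to exercise the functions.
data = [("I am happy", "positive"), ("great happy day", "positive"),
        ("I am sad", "negative"), ("bad sad day", "negative")]
freq, vocab, log_prior = train(data)
happy_score = score("happy day", freq, vocab, log_prior)
sad_score = score("sad day", freq, vocab, log_prior)
```

On this toy data the classes are balanced, so the log prior is 0 and the sign of the score is decided only by the words: "happy day" scores above 0 and "sad day" below 0.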

Keycloak

What is Keycloak? It is an open source Identity and Access Management solution. Similar services to Keycloak? Okta and Auth0.

GraalVM

GraalVM is an alternative to the JVM (though it is less a runtime than a compiler), built with special attention to performance and polyglot support. When executing Java in the JVM, you can tweak the JAVA_PATH dynamically, so the Java compiler cannot apply the optimizations and tricks that compilers for compiled languages (GCC, LLVM) can. GraalVM is actually more like a Java compiler: it can compile Java code ahead of time to a native binary, skipping the bytecode step at run time.

Notes on NLP Course on Coursera

Week 1: One easy way to convert a text into an array of numbers is a technique named Bag of Words, which consists of assigning every word in the vocabulary an index; a sentence then gets a 0 or 1 at each index depending on whether the word appears in it. This is a sparse representation: for each tweet you need a vector whose size equals your vocabulary, with a lot of features equal to 0. With a sparse representation, a logistic regression model needs to learn n+1 parameters, where n is the size of the vocabulary. This is problematic for big vocabularies, because training takes longer than needed. We can build a frequency table for each category (in sentiment analysis it would be one for positive and one for negative).
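A minimal sketch of both ideas, the sparse Bag of Words vector and the per-category frequency table, in plain Python. The function names, the whitespace tokenization and the two sample tweets are assumptions for illustration, not material from the course.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Assign each distinct word an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def bag_of_words(sentence, vocab):
    """Vector of vocabulary size: 1 where the word appears, 0 elsewhere."""
    vector = [0] * len(vocab)
    for word in sentence.lower().split():
        if word in vocab:
            vector[vocab[word]] = 1
    return vector


def frequency_table(labeled_sentences):
    """Count how often each word appears in each category."""
    table = {}
    for sentence, label in labeled_sentences:
        table.setdefault(label, Counter()).update(sentence.lower().split())
    return table


# Hypothetical sample tweets.
tweets = [("I am happy", "positive"), ("I am sad", "negative")]
vocab = build_vocabulary(text for text, _ in tweets)
sad_vector = bag_of_words("I am sad", vocab)
table = frequency_table(tweets)
```

Even with four words the vector already contains a zero; with a realistic vocabulary almost every entry is 0, which is where the n+1 parameter cost comes from. The frequency table stores only two counts per word instead.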